
Bayesian modelling approach for detecting RNA flexibility changes in high-throughput structure probing data under different conditions, based on an extension of the BUM-HMM method.

marangiop/diff_BUM_HMM

diffBUM-HMM

Bayesian modelling approach for detecting RNA flexibility changes in high-throughput structure probing data

The code implementing diffBUM-HMM is based on the original BUM-HMM Bioconductor package hosted in this repository.

Background

RNA structure is a key regulator of many important processes, such as RNA stability, transcription, and mRNA translation. RNA structural regulatory elements are interrogated with chemical and enzymatic structure probing. In these experiments, a chemical reagent or enzyme reacts with the RNA molecule in a structure-dependent way, cleaving or otherwise modifying its flexible parts. These modified positions can then be detected, providing valuable structural information that can be used for structure prediction. Specifically, chemical adducts halt the reverse transcriptase (RT) reaction, causing it to drop off at the modified positions and truncating the cDNA transcript (RT-stop methods). By changing the reaction conditions, one can instead force RT to misincorporate non-complementary nucleotides or introduce deletions into the cDNA transcript (RT-mutate methods). These drop-off or mutated positions can then be mapped back to the reference sequence. Regardless of the approach, one challenge lies in the stochasticity of this process, as the RT can also drop off or introduce mutations randomly. To address this, a complementary control experiment, in which no probing reagent is used, is routinely performed to monitor random RT drop-offs or mutations.

Beta-uniform mixture hidden Markov model (BUM-HMM) is a statistical framework for modelling reactivity scores from an RNA structure probing experiment such as SHAPE or ChemModSeq (Selega2017). In short, BUM-HMM outputs posterior probabilities of modification for all nucleotides under a single condition, by comparing treated samples against control ones.
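The beta-uniform mixture idea can be illustrated with a minimal Python sketch. It ignores the hidden Markov chain entirely and only shows the per-nucleotide mixture: P values from unmodified nucleotides are treated as Uniform(0, 1), and P values from modified nucleotides as Beta(a, b) skewed towards 0. The function names and the values of `prior_modified`, `a` and `b` are illustrative assumptions, not parameters taken from the BUM-HMM package.

```python
import math

def beta_pdf(p, a, b):
    """Density of a Beta(a, b) distribution at p, for 0 < p < 1."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * p ** (a - 1) * (1 - p) ** (b - 1)

def posterior_modified(p, prior_modified=0.5, a=0.5, b=5.0):
    """Posterior probability that a P value came from the 'modified' component.

    Unmodified component: Uniform(0, 1), density 1 everywhere.
    Modified component:   Beta(a, b), with most of its mass near 0.
    prior_modified, a and b are illustrative values (assumptions of this sketch).
    """
    modified = prior_modified * beta_pdf(p, a, b)
    unmodified = (1.0 - prior_modified) * 1.0
    return modified / (modified + unmodified)

# Small P values are attributed to the modified component, large ones are not.
for p_value in (0.001, 0.05, 0.5):
    print(p_value, round(posterior_modified(p_value), 3))
```

In the full model, this emission density is combined with transition probabilities between neighbouring nucleotides, which this sketch leaves out.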

Once we have established whether a given nucleotide is modified under condition X, where X can be a chemical reagent, a temperature, or a genotype, how can we compare this with the degree of modification of the same nucleotide under a different condition Y?

Differential BUM-HMM (diffBUM-HMM) is a natural extension of BUM-HMM, where the number of hidden states is increased from 2 to 4, to allow modelling probabilities of modification between two conditions (see figure below). diffBUM-HMM requires the coverage and drop-off/mutation counts for the differentially probed RNA of interest, to compute drop-off/mutation rates. For each experimental condition (e.g. Condition 1 and 2), the log-ratios for drop-off/mutation rates (LDRs/LMRs) at each nucleotide position are computed for pairs of control samples to give a null distribution, in order to quantify variability in drop-off or mutation rates observed by chance. LDRs/LMRs are also computed similarly for all possible treatment-control comparisons. Coverage-dependent biases are then removed by applying a variance stabilization transformation.
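The rate and log-ratio step described above can be sketched in Python as follows. This is a simplified illustration, not the pipeline's R code: the function names are hypothetical, positions with zero coverage or a zero rate are simply marked as missing (`None`), and the variance-stabilizing transformation is not included.

```python
import math
from itertools import combinations, product

def rates(counts, coverage):
    """Per-nucleotide drop-off (or mutation) rate; None where coverage is 0."""
    return [n / c if c > 0 else None for n, c in zip(counts, coverage)]

def log_ratio(rates_a, rates_b):
    """Per-nucleotide log-ratio of two rate profiles; None where undefined."""
    out = []
    for a, b in zip(rates_a, rates_b):
        if a is None or b is None or a <= 0 or b <= 0:
            out.append(None)  # treating these as missing is an assumption of this sketch
        else:
            out.append(math.log(a / b))
    return out

def all_log_ratios(controls, treatments):
    """Null log-ratios from control pairs, plus every treatment-control comparison."""
    null = [log_ratio(a, b) for a, b in combinations(controls, 2)]
    treated = [log_ratio(t, c) for t, c in product(treatments, controls)]
    return null, treated

# Toy data for three nucleotides: two control and two treated replicates.
control_1 = rates([2, 1, 0], [100, 100, 100])
control_2 = rates([1, 1, 3], [100, 50, 0])
treated_1 = rates([8, 1, 5], [100, 100, 100])
treated_2 = rates([4, 2, 6], [100, 100, 100])

null_ldrs, treatment_ldrs = all_log_ratios(
    [control_1, control_2], [treated_1, treated_2]
)
print(null_ldrs)
print(len(treatment_ldrs))
```

With two controls and two treated replicates, this gives one control-control comparison for the null distribution and four treatment-control comparisons; the same computation would be repeated for each experimental condition.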

Subsequently, per-nucleotide empirical P values are computed for all possible treatment-control comparisons in each condition, by comparing the corresponding log-ratios to the null distribution. diffBUM-HMM is run on P values associated with the two independent conditions as observations, leaving out any nucleotides with missing data. The resulting output is a posterior probability of modification for each nucleotide, ranging from 0 to 1. diffBUM-HMM reports whether nucleotides were unmodified in both conditions, modified in either of the conditions or modified in both conditions.
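A generic empirical P value calculation, and the step of dropping nucleotides with missing data, might look like the following Python sketch. It is a simplified stand-in for the pipeline's own computation: the function names are hypothetical, and the `+1` pseudo-count (which keeps P values away from exactly 0) is an assumption of this sketch.

```python
def empirical_p_value(observed, null_values):
    """Fraction of null log-ratios at least as large as the observed log-ratio.

    Returns None when the observed log-ratio is missing. The +1 pseudo-count
    in numerator and denominator is an assumption of this sketch.
    """
    if observed is None:
        return None
    exceed = sum(1 for v in null_values if v >= observed)
    return (exceed + 1) / (len(null_values) + 1)

def paired_observations(p_cond1, p_cond2):
    """Keep (position, p1, p2) only where both conditions have a P value."""
    return [
        (i, p1, p2)
        for i, (p1, p2) in enumerate(zip(p_cond1, p_cond2))
        if p1 is not None and p2 is not None
    ]

# Toy null distribution of control-control log-ratios.
null = [-0.4, -0.1, 0.0, 0.2, 0.3]

# Observed treatment-control log-ratios for three nucleotides in each condition.
p_condition_1 = [empirical_p_value(x, null) for x in (1.5, 0.0, None)]
p_condition_2 = [empirical_p_value(x, null) for x in (0.1, 1.2, 0.9)]

print(paired_observations(p_condition_1, p_condition_2))
```

Only the first two positions survive the pairing step here, because the third has no P value in condition 1.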

In our paper (Marangio2021), we demonstrate that, compared to the existing approaches including dStruct and deltaSHAPE, diffBUM-HMM displays higher sensitivity while calling virtually no false positives. diffBUM-HMM analysis of ex vivo and in vivo Xist lncRNA SHAPE-MaP data detected many more RNA structural differences, involving mostly single-stranded nucleotides located at or near protein-binding sites. Collectively, our analyses demonstrate the value of diffBUM-HMM for quantitatively detecting RNA structural changes and reinforce the notion that RNA structure probing is a very powerful tool for identifying protein-binding sites.

An overview of the diffBUM-HMM model is shown below:

Images/Figure_1.jpg
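As a reading aid for the four hidden states, the Python sketch below shows how a per-nucleotide posterior over the states could be summarised. The state labels, their ordering and the function name are assumptions made for illustration; they are not taken from the diffBUM-HMM code.

```python
# Assumed state order: first letter is condition 1, second is condition 2
# (U = unmodified, M = modified). This ordering is hypothetical.
STATES = ("UU", "UM", "MU", "MM")

def summarise(posterior):
    """Summarise four state probabilities (in STATES order, summing to 1)."""
    uu, um, mu, mm = posterior
    best = max(range(len(STATES)), key=lambda k: posterior[k])
    return {
        "modified_in_condition_1": mu + mm,
        "modified_in_condition_2": um + mm,
        "differentially_modified": um + mu,
        "most_likely_state": STATES[best],
    }

# A nucleotide that is most likely modified in condition 2 only.
print(summarise((0.1, 0.6, 0.1, 0.2)))
```

Summing the two states in which the conditions disagree gives a single probability that the nucleotide is differentially modified.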

Citing the diffBUM-HMM paper

Marangio, P., Law, K., Sanguinetti, G., Granneman, S. diffBUM-HMM: a robust statistical modeling approach for detecting RNA flexibility changes in high-throughput structure probing data. Genome Biology 22, 165 (2021). https://doi.org/10.1186/s13059-021-02379-y

Reproducing figures from the paper

Figure   Instructions for raw data analysis   Jupyter Notebook for figure generation
2-3      Instructions                         Notebook
4        Instructions                         Notebook
5A-B     Instructions                         Notebook
5C       Instructions                         Notebook
5D       Instructions                         Notebook
6        Instructions                         Notebook
S1       Instructions                         Notebook
S2       N/A                                  Notebook
S3       Instructions                         N/A
S4       Instructions                         N/A
S5       Instructions                         N/A
S6       Instructions                         Notebook
S7       Instructions                         Notebook

The table above only includes instructions for figures (or subpanels of figures) from the paper that have been generated programmatically.

Dependencies

The pipeline is built in R. Python and Jupyter Notebook are needed for some of the raw data analysis and figure generation.

  • R 4.0.0 (2020-04-24) (version 3.6.3 2020-02-29 tested working)
  • RStudio 1.2.5001 (version 1.1.442 tested working)
  • Python 3.7.6 (unless specified otherwise)

If the .R scripts are run with an earlier version of R, Bioconductor 3.10 needs to be installed.

Requirements

The renv package was used to record the versions of the packages in the R environment. We also provide a requirements file listing the Python packages, with versions, used in the development and benchmarking of the diffBUM-HMM pipeline described in this study, both as a reference and to enable quick installation of the packages.

# Restore the state of the R development environment from renv.lock (run in an R session)
renv::restore()
# Restore the state of the Python development environment from python_requirements.txt (run in a shell)
pip install -r python_requirements.txt

A note on OS compatibility

The entire repository has been tested successfully on macOS. All .R scripts have been tested successfully on macOS and Windows. The Jupyter notebooks for Figures 4, 6 and S2 have partial dependencies (the pyCRAC software package and the MEME Suite) that are only supported on macOS and Linux, and hence cannot be fully run on Windows.
