major-lab/mirbooking

miRBooking

Implementation of the miRBooking algorithm and metrics in C

  • fast and memory efficient
  • usable from Python, JavaScript and Vala via GObject introspection
  • memory-mapped score tables and target and miRNA FASTA files for a low memory footprint in parallel execution
  • binary with static linking support for better portability
  • stdin/stdout support for piping from and into other tools

Usage

mirbooking --targets targets.fa
           --mirnas mirnas.fa
           --seed-scores scores-7mer-3mismatch-ending
           [--accessibility-scores accessibility-scores[.gz]]
           [--supplementary-model none]
           [--supplementary-scores scores-3mer]
           [--input stdin]
           [--output stdout]
           [--output-format tsv]
           [--sparse-solver best-available]
           [--max-iterations 100]
           [--5prime-footprint 9]
           [--3prime-footprint 7]
           [--cutoff 100]
           [--relative-cutoff 0]
           [--blacklist blacklist.tsv]

For detailed usage and the full list of options, run mirbooking --help.

The command-line program expects the following inputs:

  • --targets, a FASTA file of RNA transcripts whose identifiers are accessions; alternative identifier flavours are supported for NCBI RefSeq and GenBank via --ncbi-targets, and for GENCODE via --gencode-targets
  • --mirnas, a FASTA file of mature miRNAs whose identifiers are accessions; the miRBase flavour is supported via --mirbase-mirnas
  • --seed-scores, a sparse score table of seed free energies, which can be generated with the generate-score-table program described below
  • --accessibility-scores, a table of position-wise free energy contributions (or penalties) on the targets
  • --supplementary-scores, a score table of either 4mer or 3mer supplementary bindings
  • --input, a quantity file mapping target and miRNA accessions to expressed quantities in picomolar
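The exact layout of the --input quantity file is not spelled out here. As a minimal sketch, assuming one tab-separated accession and picomolar quantity per line (an assumption, not the documented format), each line could be parsed as follows; quantity_entry and parse_quantity_line are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record for one line of the quantity file. */
typedef struct {
    char accession[64];
    double quantity; /* picomolar */
} quantity_entry;

/* Parses a line assumed to be "accession<TAB>quantity".
   Returns 0 on success and -1 on malformed or negative input. */
static int parse_quantity_line(const char *line, quantity_entry *entry)
{
    assert(line != NULL && entry != NULL);
    if (sscanf(line, "%63[^\t]\t%lf", entry->accession, &entry->quantity) != 2)
        return -1;
    return entry->quantity >= 0.0 ? 0 : -1;
}
```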

Tables for seed and supplementary scores are provided in the data folder. They were computed with RNAcofold binding energies from the ViennaRNA package.

Note that the Yan et al. model (--supplementary-model=yan-et-al-2018) requires a 3mer table, whereas the Zamore et al. model (--supplementary-model=zamore-et-al-2012) requires a 4mer table.

Tables for seed and supplementary bindings are automatically located (new in 2.3).

The --cutoff parameter can exploit a known upper bound on the complex concentration to adjust the granularity of the model. Only interactions that can ideally reach the specified picomolar concentration are modeled.

The --relative-cutoff parameter is similar, but filters on the ideal bound fraction of the substrate instead.
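How miRBooking computes the ideal quantity is not detailed here. For illustration only, the sketch below assumes it is the quasi-steady-state duplex concentration C of a miRNA (total E) and a site (total S) taken in isolation, which satisfies (E - C)(S - C) = Km·C, and solves it by bisection on [0, min(E, S)]. ideal_quantity and passes_cutoffs are hypothetical helpers, with cutoff playing the role of --cutoff (pM) and relative_cutoff that of --relative-cutoff.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed ideal quantity: root of f(C) = (E - C)(S - C) - Km*C on [0, min(E, S)].
   f is positive at 0 and non-positive at min(E, S), and decreasing in between,
   so bisection brackets the single root. */
static double ideal_quantity(double e, double s, double km)
{
    assert(e >= 0.0 && s >= 0.0 && km >= 0.0);
    double lo = 0.0, hi = e < s ? e : s;
    for (int i = 0; i < 100; i++) {
        double mid = 0.5 * (lo + hi);
        if ((e - mid) * (s - mid) - km * mid > 0.0)
            lo = mid; /* still left of the root */
        else
            hi = mid;
    }
    return 0.5 * (lo + hi);
}

/* Keeps the pair if its ideal quantity reaches the absolute cutoff (pM)
   and its ideal bound fraction C/S reaches the relative cutoff. */
static bool passes_cutoffs(double e, double s, double km,
                           double cutoff, double relative_cutoff)
{
    double c = ideal_quantity(e, s, km);
    return c >= cutoff && c >= relative_cutoff * s;
}
```

Comparing c against relative_cutoff * s rather than dividing avoids a division by zero when the substrate quantity is 0.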

The output is a TSV file with the following columns:

Column            Description
gene_accession    Gene accession with version (new in 2.3)
gene_name         Name of the gene or N/A if unknown (new in 2.3)
target_accession  Target accession with version
target_name       Name of the target or N/A if unknown
target_quantity   Total target concentration in picomolar
position          Site position on the target
mirna_accession   miRNA accession
mirna_name        Name of the miRNA or N/A if unknown
mirna_quantity    Total miRNA concentration in picomolar
score             Michaelis-Menten constant of the miRNA::MRE duplex
quantity          miRNA::MRE duplex concentration at this target position in picomolar
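For downstream processing, a row of the default TSV output (2.3 or later, with the gene columns) can be read with sscanf. This minimal sketch assumes tab-separated fields in the column order above and no empty fields (unknown names are N/A); output_row and parse_output_row are hypothetical names, and the 64-character field limit is arbitrary.

```c
#include <assert.h>
#include <stdio.h>

/* One row of the default TSV output, in the column order of the table above. */
typedef struct {
    char gene_accession[64];
    char gene_name[64];
    char target_accession[64];
    char target_name[64];
    double target_quantity; /* pM */
    long position;
    char mirna_accession[64];
    char mirna_name[64];
    double mirna_quantity;  /* pM */
    double score;           /* Michaelis-Menten constant */
    double quantity;        /* duplex concentration, pM */
} output_row;

/* Returns 0 if all 11 columns were parsed, -1 otherwise (e.g. on the header line). */
static int parse_output_row(const char *line, output_row *row)
{
    assert(line != NULL && row != NULL);
    int n = sscanf(line,
                   "%63[^\t]\t%63[^\t]\t%63[^\t]\t%63[^\t]\t%lf\t%ld\t"
                   "%63[^\t]\t%63[^\t]\t%lf\t%lf\t%lf",
                   row->gene_accession, row->gene_name,
                   row->target_accession, row->target_name,
                   &row->target_quantity, &row->position,
                   row->mirna_accession, row->mirna_name,
                   &row->mirna_quantity, &row->score, &row->quantity);
    return n == 11 ? 0 : -1;
}
```

A header line fails at the first numeric column, so the caller can simply skip lines for which parse_output_row returns -1.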

The detailed TSV output, which expands the score into its constituents, can be produced with --output-format=tsv-detailed (new in 2.3). In this mode, the score column is replaced by kf, kr, kcleave, krelease, kcat, kother, kd and km.
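Assuming the kd and km columns follow the textbook definitions Kd = kr/kf and Km = (kr + kcat)/kf, they can be recomputed from the rate columns as sketched below. Both the definitions as used by miRBooking and how kcat is composed from kcleave, krelease and kother are assumptions not stated here; the function names are hypothetical.

```c
#include <assert.h>

/* Textbook dissociation constant: Kd = kr / kf (assumed definition). */
static double dissociation_constant(double kf, double kr)
{
    assert(kf > 0.0);
    return kr / kf;
}

/* Textbook Michaelis-Menten constant: Km = (kr + kcat) / kf (assumed definition). */
static double michaelis_menten_constant(double kf, double kr, double kcat)
{
    assert(kf > 0.0);
    return (kr + kcat) / kf;
}
```

Under these definitions Km is never smaller than Kd, since kcat is non-negative.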

GFF3 output can be produced with --output-format=gff3. The score then indicates the bound fraction at the position.

Wiggle output can also be produced with --output-format=wig. The score is then the position-wise bound fraction of the substrate, which properly accounts for overlapping microRNAs.
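One way to picture a position-wise bound fraction: each bound site contributes its duplex quantity divided by the target quantity to every position its footprint covers, and overlapping footprints add up. The sketch below assumes the footprint spans fp5 positions upstream and fp3 positions downstream of the site position (compare --5prime-footprint and --3prime-footprint); bound_site and position_wise_bound_fraction are hypothetical, and this is not necessarily how miRBooking computes the wig track.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bound site: position on the target and duplex quantity (pM). */
typedef struct {
    size_t position;
    double quantity;
} bound_site;

/* Fills fraction[0..length-1] with the summed bound fraction of every site
   whose assumed footprint [position - fp5, position + fp3] covers the position. */
static void position_wise_bound_fraction(const bound_site *sites, size_t n_sites,
                                         double target_quantity,
                                         size_t fp5, size_t fp3,
                                         double *fraction, size_t length)
{
    assert(target_quantity > 0.0 && length > 0);
    for (size_t i = 0; i < length; i++)
        fraction[i] = 0.0;
    for (size_t s = 0; s < n_sites; s++) {
        size_t p = sites[s].position;
        size_t lo = p >= fp5 ? p - fp5 : 0;                   /* clamp at 5' end */
        size_t hi = p + fp3 < length ? p + fp3 : length - 1;  /* clamp at 3' end */
        for (size_t i = lo; i <= hi; i++)
            fraction[i] += sites[s].quantity / target_quantity;
    }
}
```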

The --blacklist parameter indicates a file containing interactions that the model should ignore. This is particularly useful if you know beforehand that they will be too weak at equilibrium to be worth modeling. The format is a three-column TSV containing only the target_accession, position and mirna_accession columns of the output format.
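A blacklist can be derived from a previous run's output. The sketch below, using a hypothetical interaction record and an arbitrary picomolar threshold, writes the three-column format for every interaction whose equilibrium quantity falls below the threshold. This is in the spirit of how mirbooking-iterative excludes weak interactions, but its actual criterion is not described here.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record holding the columns needed for a blacklist entry. */
typedef struct {
    const char *target_accession;
    long position;
    const char *mirna_accession;
    double quantity; /* equilibrium duplex concentration, pM */
} interaction;

/* Writes target_accession, position and mirna_accession (tab-separated) for
   every interaction below threshold (pM); returns how many lines were written. */
static size_t write_blacklist(FILE *out, const interaction *rows, size_t n,
                              double threshold)
{
    assert(out != NULL && (rows != NULL || n == 0));
    size_t written = 0;
    for (size_t i = 0; i < n; i++) {
        if (rows[i].quantity < threshold) {
            fprintf(out, "%s\t%ld\t%s\n", rows[i].target_accession,
                    rows[i].position, rows[i].mirna_accession);
            written++;
        }
    }
    return written;
}
```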

Installation

You'll need Meson and Ninja as well as GLib development files installed on your system.

mkdir build && cd build
meson --buildtype=release ..
ninja
ninja install

To generate fast code, configure with meson -Doptimization=3.

You can perform a local installation with meson --prefix=$HOME/.local, but LD_LIBRARY_PATH must then be set accordingly, since the mirbooking program links against a shared library. Alternatively, link statically by configuring with meson --default-library=static.

To generate introspection metadata, use meson -Dwith_introspection=true. To generate Vala bindings, use meson -Dwith_vapi=true.

CBLAS is required. Instead of the default netlib CBLAS, you can opt for the ATLAS (-Dwith_atlas=true) or OpenBLAS (-Dwith_openblas=true) implementation. If configured with -Dwith_mkl=true, MKL CBLAS is used instead. The OpenMP flavour of OpenBLAS is used when configured with -Dwith_openmp=true.

FFTW can optionally be used to compute silencing more accurately by configuring with meson -Dwith_fftw3=true. If you redistribute the miRBooking source code, do not enable this option by default, since this dependency is covered by the GPL. If you have access to Intel MKL, you can instead use its FFTW3 implementation with -Dwith_mkl_fftw3=true.

OpenMP can optionally be used to parallelize the evaluation of partial derivatives and some supported solvers by configuring with -Dwith_openmp=true.

MPI can optionally be used to distribute the computation across multiple machines on supported solvers (i.e. mkl-cluster) by configuring with -Dwith_mpi=true.

Solver        Build options
LAPACK        None, since this is the fallback solver
SuperLU       -Dwith_superlu=true
SuperLU MT    -Dwith_superlu_mt=true
UMFPACK       -Dwith_umfpack=true
cuSOLVER      -Dwith_cuda=<cuda_toolkit_api_version> -Dwith_cusolver=true
MKL DSS       -Dwith_mkl=true -Dmkl_root=<path to mkl> -Dwith_mkl_dss=true
MKL Cluster   -Dwith_mpi=true -Dwith_mkl=true -Dmkl_root=<path to mkl> -Dwith_mkl_cluster=true
MKL LAPACK    -Dwith_mkl=true -Dmkl_root=<path to mkl> -Dwith_mkl_lapack=true
PARDISO       -Dwith_pardiso=true

LAPACK is not a sparse linear solver and therefore handles typical workloads poorly, but it can be orders of magnitude faster on dense Jacobians.

cuSOLVER requires the CUDA toolkit, whose API version must be specified with -Dwith_cuda=<cuda_toolkit_api_version>.

MKL DSS and MKL Cluster can benefit from TBB instead of OpenMP, which can be enabled with -Dwith_mkl_tbb=true.

MKL DSS and MKL Cluster can use the 64-bit integer interface with -Dwith_mkl_ilp64=true, which allows much larger systems to be solved. However, this breaks other solvers because it loads a 64-bit BLAS.

PARDISO cannot be used along with MKL DSS because they define common symbols.

By default, the best sparse solver available among the following will be used (new in 2.3):

  1. MKL-DSS
  2. PARDISO
  3. UMFPACK
  4. SuperLU
  5. LAPACK
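The default selection amounts to walking the priority list above and taking the first solver that was compiled in. A minimal sketch, with hypothetical availability flags standing in for the -Dwith_* options; the names are display names from the list, not necessarily the values accepted by --sparse-solver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical entry: a solver name and whether it was compiled in. */
typedef struct {
    const char *name;
    bool available;
} solver_option;

/* Returns the first available solver in priority order, or NULL if none is. */
static const char *pick_best_solver(const solver_option *solvers, size_t n)
{
    assert(solvers != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (solvers[i].available)
            return solvers[i].name;
    return NULL;
}
```

Since LAPACK needs no build option, keeping it last in the list means the selection always finds a solver.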

Numerical integration

In addition to determining the steady state, miRBooking can perform numerical integration of the microtargetome through its programming API.
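To illustrate what numerical integration means here, the sketch below integrates a toy single-pair model, dC/dt = kf(E - C)(S - C) - (kr + kcat)C, with explicit Euler steps. The toy equation and the helper names are assumptions for illustration; miRBooking integrates the full microtargetome through its own API, whose functions are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Toy single-pair model: dC/dt = kf (E - C)(S - C) - (kr + kcat) C. */
static double toy_dcdt(double c, double e, double s, double kf, double kr,
                       double kcat)
{
    return kf * (e - c) * (s - c) - (kr + kcat) * c;
}

/* Explicit Euler integration from C(0) = c0 over `steps` steps of size dt. */
static double integrate_euler(double c0, double e, double s, double kf,
                              double kr, double kcat, double dt, size_t steps)
{
    assert(dt > 0.0);
    double c = c0;
    for (size_t i = 0; i < steps; i++)
        c += dt * toy_dcdt(c, e, s, kf, kr, kcat);
    return c;
}
```

With E = S = 10, kf = 1, kr = 3 and kcat = 2 (so Km = 5), the trajectory from C = 0 approaches the steady state C = 5. Explicit Euler is only stable for a small enough dt, which is why stiff systems usually call for implicit schemes.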

Other tools

In addition to the mirbooking binary, this package ships a number of utilities.

The generate-score-table program computes a hybridization energy table for a given seed mask. Either ViennaRNA or mcff is required to compute the energies.

generate-score-table [--method=RNAcofold]
                     [--temperature=310.5]
                     [--mask=||||...]
                     [--hard-mask=||||...]
                      --output scores

The seed mask defines folding constraints on the target, with | for a canonical match, x for a canonical mismatch and . for no constraint. It also determines the seed length. If a hard mask is provided, interactions that do not satisfy it are filtered out (new in 2.3).
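To make the mask semantics concrete, the sketch below checks a seed against an aligned target segment. It assumes that only Watson-Crick pairs count as canonical (whether G:U wobbles do is not stated), that target[i] is the nucleotide facing seed[i] (target already reversed), and that x means the position must not pair; satisfies_mask is a hypothetical helper, not part of generate-score-table.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Watson-Crick pairing only (assumption). */
static bool is_canonical_pair(char a, char b)
{
    return (a == 'A' && b == 'U') || (a == 'U' && b == 'A') ||
           (a == 'G' && b == 'C') || (a == 'C' && b == 'G');
}

/* Returns true if seed and aligned target satisfy the mask:
   '|' requires a canonical pair, 'x' forbids one, '.' imposes nothing. */
static bool satisfies_mask(const char *mask, const char *seed, const char *target)
{
    size_t len = strlen(mask); /* the mask determines the seed length */
    assert(strlen(seed) >= len && strlen(target) >= len);
    for (size_t i = 0; i < len; i++) {
        bool paired = is_canonical_pair(seed[i], target[i]);
        if (mask[i] == '|' && !paired)
            return false;
        if (mask[i] == 'x' && paired)
            return false;
    }
    return true;
}
```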

It is also possible to adjust the folding temperature (new in 2.3).

The number of workers can be tuned by setting the OMP_NUM_THREADS environment variable.

The mirbooking-iterative tool is a wrapper script around miRBooking that takes advantage of the --blacklist flag by solving the equilibrium gradually and excluding weak interactions from subsequent models.

It takes the same arguments as mirbooking, except that --cutoff indicates the target cutoff.

C API

The API follows GLib conventions and supports a wide range of uses. It is fairly easy to use, and a typical experimentation session goes as follows:

  1. create a broker via mirbooking_broker_new
  2. create some sequence objects with mirbooking_target_new and mirbooking_mirna_new
  3. set up quantities via mirbooking_broker_set_sequence_quantity
  4. call mirbooking_broker_evaluate and mirbooking_broker_step repeatedly to perform a full hybridization or numerical integration
  5. retrieve and inspect the microtargetome with mirbooking_broker_get_target_sites
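The evaluate/step cycle resembles a Newton-type iteration, consistent with the Jacobians and sparse solvers discussed above. As a toy analogue only (the real broker functions and their signatures are not reproduced here), the sketch below finds the steady state of a single miRNA/site pair, (E - C)(S - C) - Km·C = 0, with Newton's method: each iteration evaluates the residual and its one-dimensional Jacobian, then steps. toy_evaluate and solve_steady_state are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Residual and derivative of the toy steady-state condition. */
typedef struct {
    double residual;
    double jacobian;
} toy_evaluation;

/* "Evaluate": residual (E - C)(S - C) - Km*C and its derivative with respect to C. */
static toy_evaluation toy_evaluate(double c, double e, double s, double km)
{
    toy_evaluation ev;
    ev.residual = (e - c) * (s - c) - km * c;
    ev.jacobian = -(e - c) - (s - c) - km;
    return ev;
}

/* Repeats evaluate + Newton step from C = 0 until |residual| < tol or
   max_iterations is reached; the iteration count is stored in *iterations. */
static double solve_steady_state(double e, double s, double km, double tol,
                                 size_t max_iterations, size_t *iterations)
{
    assert(e >= 0.0 && s >= 0.0 && km >= 0.0 && tol > 0.0);
    double c = 0.0;
    size_t i;
    for (i = 0; i < max_iterations; i++) {
        toy_evaluation ev = toy_evaluate(c, e, s, km);
        double err = ev.residual < 0.0 ? -ev.residual : ev.residual;
        if (err < tol)
            break;
        c -= ev.residual / ev.jacobian; /* "step": Newton update */
    }
    if (iterations != NULL)
        *iterations = i;
    return c;
}
```

Starting from C = 0, the residual is a convex quadratic that decreases toward its smaller root, so the iterates increase monotonically toward it without overshooting.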

For more detailed usage and a complete code example, see the main program source in bin/mirbooking.c: it performs a full session and outputs all target sites.

Cite this software

Poirier-Morency, G. Modélisation des réseaux de régulation de l’expression des gènes par les microARN. (Université de Montréal, 2021). https://doi.org/1866/25104