NVIDIA/VariantWorks

VariantWorks SDK

VariantWorks is a framework for developing Deep Learning-based genomic read processing tasks such as variant calling and consensus calling. It provides a library of data encoding and parsing functions commonly applicable to read processing, along with a simple way to plug them into a Deep Learning pipeline.

For the Deep Learning pipeline, VariantWorks leverages the NeMo framework, which provides an easy-to-use, graph-based representation of high-level computations.

VariantWorks is aimed at two audiences:

  1. Variant Caller developers - for existing developers in the variant calling community, VariantWorks intends to provide a convenient way to start designing variant callers built using Deep Learning.
  2. Deep Learning practitioners - for existing deep learning practitioners, VariantWorks can lower the barrier to applying novel Deep Learning techniques to the field of genomic variant calling.

Core Features

  • Encoders - Pre-written, commonly used (and in the future, optimized) encoders for reads.
  • I/O - Readers and writers for common genomics file formats.
  • Reference Models - Collection of neural network architectures well suited for variant calling.
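To give a rough idea of what a read encoder does, here is a minimal sketch that turns a read sequence into a one-hot matrix. It does not use the VariantWorks API: the function name `one_hot_encode`, the `BASES` channel order, and the all-zero treatment of unknown bases are assumptions made purely for illustration.

```python
from typing import List

# Channel order is an assumption for this sketch, not a VariantWorks convention.
BASES = "ACGT"


def one_hot_encode(read: str) -> List[List[int]]:
    """Return one row per base; bases outside BASES (e.g. N) become all zeros."""
    index = {base: i for i, base in enumerate(BASES)}
    encoded = []
    for base in read.upper():
        row = [0] * len(BASES)
        if base in index:
            row[index[base]] = 1
        encoded.append(row)
    return encoded
```

A matrix like this (one row per position, one column per base) is the kind of tensor a neural network can consume directly. Real encoders typically add further channels such as base quality or strand.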

Requirements

  1. Python 3.7+
  2. NVIDIA GPU (Pascal+ architecture)
  3. NVIDIA Apex library (for multi-GPU training in supported pipelines)
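The first two requirements can be checked before installing. The sketch below is not part of VariantWorks, and its helper names (`python_ok`, `gpu_ok`, `detect_gpu_capability`) are hypothetical. It relies on the fact that Pascal GPUs report CUDA compute capability 6.x, and it assumes PyTorch may or may not be installed yet.

```python
import sys
from typing import Optional, Tuple

MIN_PYTHON = (3, 7)
MIN_COMPUTE_CAPABILITY = (6, 0)  # Pascal GPUs report compute capability 6.x


def python_ok(version: Tuple[int, int] = sys.version_info[:2]) -> bool:
    """True if the given (major, minor) Python version is 3.7 or newer."""
    return tuple(version) >= MIN_PYTHON


def gpu_ok(capability: Tuple[int, int]) -> bool:
    """True if a (major, minor) compute capability is Pascal or newer."""
    return tuple(capability) >= MIN_COMPUTE_CAPABILITY


def detect_gpu_capability() -> Optional[Tuple[int, int]]:
    """Return the first GPU's compute capability, or None if it cannot be queried."""
    try:
        import torch  # optional here; only used to query the GPU
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)
```

For example, calling `python_ok()` checks the running interpreter, and `gpu_ok(cap)` can be applied to the result of `detect_gpu_capability()` whenever it is not `None`.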

Getting Started

  • Install the latest development code from source:
git clone --recursive https://github.com/clara-parabricks/VariantWorks.git
cd VariantWorks
pip install -r python-style-requirements.txt
pip install -r requirements.txt
pip install -e .
# Install pre-push hooks to run tests
ln -nfs "$(readlink -f hooks/pre-push)" .git/hooks/pre-push