Skip to content

This a snakemake pipeline to detect Somatic mutations (GATK4 and Mutect2)

License

Notifications You must be signed in to change notification settings

SherineAwad/SomaticMutations

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

30 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Snakemake License DOI

Snakemake Workflow for somatic mutation calling

The is GATK4/Mutect2 pipeline for Somatic Mutation.

Requirments

  • trim-galore=0.6.6
  • star=2.7.10a
  • picard=2.25.6
  • gatk4=4.2.0.0

You can run the pipeline in -use-conda mode to pull these tools automatically. See use conde section below.

Edit config file

You will need to edit the config file to match your samples and parameters.

The pipeline expects samples with suffix ".r_1.fq.gz" and ".r_2.fq.gz" if the samples are paired-end. Any prefix before this suffix is the sample name and to be written in the "samples.tsv". For single-end reads, the samples suffix is ".fq.gz" and any prefix before this suffix is written in the "samples.tsv". For example, if your sample name is sample1.s_1.r_1.fq.gz, then your sample name in the samples file should be sample1.s_1.

You need to update the config file with whether your samples are paired-end or single reads. If your samples are paired-end, then the PAIRD entry in the config file should be set to TRUE, otherwise, set the PAIRED entry in the config file to FALSE. You can change the samples.tsv name in the config file.

The samples.tsv has the following format:

Tumors Normals
SLX-18967.UDP0126.HT3G5DMXX.s_1 SLX-18967.UDP0129.HT3G5DMXX.s_1
SLX-18967.UDP0146.HT3G5DMXX.s_1 SLX-18967.UDP0149.HT3G5DMXX.s_1

You will need to edit the names and directory of your genome, your genome index, GTF, adapters, read groups in the GENOME, INDEX, GTF, ADAPTERS, and RG entries in the config file respectively. You will also need to have your DBSNP vcf, indels vcf, gold standard vcf, and AF only gnomAD in the DBSNP, INDELS, GOLD_STANDARD, and AFONLYGNOMAD entries respectively in the config file. If you are using human genome, these resources can be pulled from Broad institute Resource Bundle

You need to update your interval list, by editing the intervals.list file to list only the chromosomes of your interest. You can change the name of this file by editing the config file entry INTERVALS.

The pipeline will automatically pull the biallelic gnomAD. You can change its name/location by editing the GNOMAD_BIALLELIC entry in the config file.

Run Snakemake pipeline

Once you edit the config file to match your needs, then:

snakemake -jn 

where n is the number of cores for example for 10 cores use:

snakemake -j10 

For a dry run use:

snakemake -j1 -n 

and to print command in dry run use:

snakemake -j1 -n -p 

Use Conda

For less frooodiness, to pull automatically the same versions of dependencies use:

snakemake -jn --use-conda

This will pull the same versions of tools we used. Conda has to be installed in your system.

For example, for 10 cores:

snakemake -j10 --use-conda

Dry run

for a dry run use:

snakemake -j1 -n

and you can see the command printed on a dry run using:

snakemake -j1 -n -p

Keep going option

You can try the following to keep going if any issues happen, like no variants is found by one tool:

snakemake -j1 --keep-going

Collect some stats

snakemake -j 10 --keep-going --stats run.stats

Citations

If you use this pipeline, please cite us as follows:

Sherine Awad. (2022). SherineAwad/SomaticMutations: v1.0.0 (v1.0.0). Zenodo. https://doi.org/10.5281/zenodo.6202482

About

This a snakemake pipeline to detect Somatic mutations (GATK4 and Mutect2)

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages