
🐍 🧬 Turning my metagenomic MAG (metagenome-assembled genome) pipeline into a snakemake pipeline for increased reproducibility and scalability.


patriciatran/MAG_pipeline


About:

I built this Snakemake pipeline to show how raw FASTQ reads can be processed all the way to a set of good-quality, dereplicated MAGs.


The general steps are:

  • Quality-check the reads using FastQC.
  • Assemble the FASTQ reads using SPAdes.
  • Bin the assembled contigs into MAGs using metaWRAP.
  • Refine the MAGs using DAS Tool.
  • Dereplicate the MAGs (if relevant) using dRep.
  • Determine MAG quality using CheckM.
  • Select only MAGs that meet the MIMAG quality standard for further analyses (e.g. >50% complete, <10% contamination).
  • Assign taxonomy to this MAG set using GTDB-Tk.
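The quality-selection step above can be sketched in a few lines of Python. This is only an illustration of the thresholding logic, not the pipeline's actual rule: the function name `select_mimag_bins`, the column names (`Bin Id`, `Completeness`, `Contamination`), and the example table are assumptions made for the sketch.

```python
import csv
import io


def select_mimag_bins(rows, min_completeness=50.0, max_contamination=10.0):
    """Return the IDs of bins that pass the completeness/contamination cutoffs.

    `rows` is an iterable of dicts with the (assumed) keys
    "Bin Id", "Completeness" and "Contamination", both values in percent.
    """
    kept = []
    for row in rows:
        completeness = float(row["Completeness"])
        contamination = float(row["Contamination"])
        # Keep bins that are >50% complete and <10% contaminated.
        if completeness > min_completeness and contamination < max_contamination:
            kept.append(row["Bin Id"])
    return kept


# Hypothetical quality table, made up purely for illustration.
example_table = (
    "Bin Id\tCompleteness\tContamination\n"
    "bin.1\t95.2\t1.3\n"
    "bin.2\t48.0\t0.5\n"
    "bin.3\t72.4\t12.9\n"
    "bin.4\t61.0\t4.2\n"
)

rows = list(csv.DictReader(io.StringIO(example_table), delimiter="\t"))
selected = select_mimag_bins(rows)
print(selected)  # ['bin.1', 'bin.4']
```

Here `bin.2` is dropped for low completeness and `bin.3` for high contamination; the cutoffs are keyword arguments so stricter thresholds can be tried without editing the function.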

[Figure: DAG of the workflow]

Example result folder:

https://github.com/patriciatran/MAG_pipeline/blob/main/example_results_folder.txt

Relevant folders for output:

  • results/{sample}/final_bin_set/*.fasta : all the final bins in FASTA format
  • results/{sample}/taxonomy_final_bin_set.tsv : final GTDB-Tk taxonomic assignment for the final set of MAGs
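As a rough sketch of how this output layout could be summarized, the snippet below counts the final bins for each sample. It assumes the results live in a top-level `results` directory and that every final bin file ends in `.fasta`; the helper name `count_final_bins` is hypothetical and not part of the pipeline.

```python
from pathlib import Path


def count_final_bins(results_dir):
    """Map each sample name to the number of FASTA files in its final_bin_set folder."""
    results = Path(results_dir)
    if not results.is_dir():
        # Nothing to report if the results folder does not exist yet.
        return {}
    counts = {}
    # Assumed layout: <results_dir>/<sample>/final_bin_set/<bin>.fasta
    for bin_dir in sorted(results.glob("*/final_bin_set")):
        sample = bin_dir.parent.name
        counts[sample] = len(list(bin_dir.glob("*.fasta")))
    return counts


if __name__ == "__main__":
    for sample, n_bins in count_final_bins("results").items():
        print(f"{sample}\t{n_bins}")
```

Run from the directory that contains `results/`, this prints one tab-separated line per sample with its MAG count.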

Status:

April 17, 2023

  • The pipeline runs without errors!
  • Next step: improve the documentation and distribute the pipeline as a package.
  • Add ways to report summary information, e.g. the run time of the pipeline and how many MAGs are in the final bin set for each sample.
  • Add ways to visualize the final results, e.g. a bar plot of taxonomies across samples.

Thanks to:

This pipeline exists because of the folks who make these programs available. Please cite their work.

Further Reading:

MIMAG Standards: https://www.nature.com/articles/nbt.3893

Data used for pipeline testing:

Tisza MJ et al., "A catalog of tens of thousands of viruses from human metagenomes reveals hidden associations with chronic diseases.", Proc Natl Acad Sci U S A, 2021 Jun 8;118(23)
