
This page contains all the software and commands I've used for my mtDNA analysis.


AKMARTIAN/mtDNA-Analysis-pipeline


Pipeline for Analysis of Mitochondrial DNA Sequence Data

Overview

This document describes the pipeline used for the bioinformatic analysis of mitochondrial DNA (mtDNA) sequence data. It processes long-read sequencing data from Oxford Nanopore Technologies (ONT) using three core tools: minimap2, samtools, and bcftools.

Pipeline Workflow

The pipeline processes raw sequencing data (in FASTQ format) to ultimately produce filtered variant call files (VCF). Below are the steps involved:

Step 1: Sequence Alignment with minimap2

Aligns the raw sequencing reads to a reference mtDNA sequence.

minimap2 -ax map-ont <MTDNA_Reference.fna> <Sample.fastq> > <Output_file_name.sam>

Step 2: Conversion to BAM Format with samtools

Converts the SAM file generated by minimap2 to a binary format (BAM).

samtools view -Sb <input.sam> > <output.bam>

Step 3: Sorting BAM File

Sorts the BAM file by genomic coordinates.

samtools sort <input.bam> -o <sorted_output.bam>

Step 4: Indexing the Sorted BAM File

Creates an index for the sorted BAM file.

samtools index <sorted_output.bam>
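Steps 1 to 4 can also be chained so that the intermediate SAM and unsorted BAM files are never written to disk. The following is a minimal sketch under assumptions, not the author's command: align_sorted is a hypothetical function name, the default of 4 threads is arbitrary, and DRY_RUN is an illustrative variable that prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: align, sort, and index in one pass (hypothetical helper).
# Usage: align_sorted <reference.fna> <sample.fastq> <sorted_output.bam> [threads]
# Set DRY_RUN=1 to print the commands instead of running them.
align_sorted() {
    ref=$1; fastq=$2; bam=$3; threads=${4:-4}
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "minimap2 -ax map-ont -t $threads $ref $fastq | samtools sort -@ $threads -o $bam -"
        echo "samtools index $bam"
        return 0
    fi
    # minimap2 writes SAM to stdout; samtools sort reads it from stdin ("-")
    minimap2 -ax map-ont -t "$threads" "$ref" "$fastq" |
        samtools sort -@ "$threads" -o "$bam" - &&
        samtools index "$bam"
}
```

Running it with DRY_RUN=1 first is a cheap way to check the file names before committing to a long alignment.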

Step 5: Variant Calling with bcftools

Performs variant calling on the aligned and sorted reads.

bcftools mpileup -q20 -Ou -f <MTDNA_Reference.fa> <sorted_output.bam> |
  bcftools call -cv --ploidy 1 -f GQ -Ou |
  bcftools filter -i 'QUAL>20' -Ov -o <filtered_results.vcf>

To also filter by base quality, add -Q20 (minimum base quality) alongside -q20 (minimum mapping quality):

bcftools mpileup -Q20 -q20 -Ou -f <MTDNA_Reference.fa> <sorted_output.bam> |
  bcftools call -cv --ploidy 1 -f GQ -Ou |
  bcftools filter -i 'QUAL>20' -Ov -o <filtered_results.vcf>
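A quick sanity check on the filtered VCF can be done with awk alone. The sketch below is not part of the original pipeline; vcf_summary is a hypothetical helper that counts the variant records and separates single-nucleotide variants from everything else (indels or multi-allelic sites), assuming an uncompressed, tab-separated VCF.

```shell
#!/bin/sh
# Sketch: summarise a plain-text VCF (hypothetical helper).
# Usage: vcf_summary <filtered_results.vcf>
vcf_summary() {
    awk -F '\t' '
        /^#/ { next }    # skip meta-information and header lines
        {
            total++
            # REF is column 4 and ALT is column 5; both of length 1 means a SNV
            if (length($4) == 1 && length($5) == 1) snv++
            else other++
        }
        END { printf "total=%d snv=%d other=%d\n", total, snv, other }
    ' "$1"
}
```

A total of zero usually points to an upstream problem (for example a mismatched reference) rather than a sample with no variants.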

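Basecalling with Dorado

Basecalls the raw POD5 signal data into the FASTQ file used as input in Step 1. The model is the first positional argument, the kit is passed with --kit-name, and --emit-fastq requests FASTQ output (Dorado writes unaligned BAM by default).

dorado basecaller <basecalling_model> /path/to/pod5_pass --kit-name <kitname> --emit-fastq > /path/to/desired_output_directory/output.fastq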

Input/Output Formats

  • Input: FASTQ file from Oxford Nanopore sequencing.
  • Intermediate: SAM/BAM files for aligned reads.
  • Output: VCF file containing the filtered list of variants.
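One way to keep the input, intermediate, and output files of a sample together is to derive every file name from the FASTQ name. The naming scheme below is an assumption for illustration only; derive_names is a hypothetical helper and the suffixes are not prescribed by the pipeline.

```shell
#!/bin/sh
# Sketch: derive intermediate and output file names from a FASTQ path
# (hypothetical helper; the suffixes are an illustrative convention).
# Prints: <sam> <bam> <sorted bam> <filtered vcf>
derive_names() {
    base=$(basename "$1")
    base=${base%.gz}       # drop a trailing .gz if present
    base=${base%.fastq}    # drop .fastq ...
    base=${base%.fq}       # ... or .fq
    echo "$base.sam $base.bam $base.sorted.bam $base.filtered.vcf"
}
```

For example, `derive_names /data/Sample1.fastq` prints the four names built on `Sample1`.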

Dependencies

  • minimap2 for read alignment.
  • samtools for manipulation of SAM/BAM files.
  • bcftools for variant calling and filtering.
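Before starting a run, it can help to confirm that all three tools are on the PATH. This is a small sketch, with check_deps as a hypothetical helper name; it only tests that each command can be found and says nothing about versions.

```shell
#!/bin/sh
# Sketch: verify that the required tools are on PATH (hypothetical helper).
# Reports each missing tool on stderr and returns non-zero if any is absent.
check_deps() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example: check_deps minimap2 samtools bcftools || exit 1
```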

Usage Notes

  • Ensure that the reference mtDNA file (MTDNA_Reference.fna/.fa) is accurate and up to date, and use the same reference for alignment (Step 1) and variant calling (Step 5).
  • Quality thresholds and other parameters in bcftools commands can be adjusted based on specific project requirements.
  • For large datasets, consider using more threads (for example minimap2 -t and samtools sort -@) to shorten processing times.
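To make the thresholds from Step 5 adjustable without editing the command each time, they can be read from environment variables. The sketch below keeps the flags from Step 5 unchanged; call_variants and the variables MIN_MQ, MIN_BQ, MIN_QUAL, and DRY_RUN are hypothetical names introduced here, with defaults of 20 chosen to match the commands above.

```shell
#!/bin/sh
# Sketch: Step 5 with adjustable thresholds (hypothetical helper).
# Usage: call_variants <reference.fa> <sorted.bam> <filtered.vcf>
# MIN_MQ   minimum mapping quality (-q), default 20
# MIN_BQ   minimum base quality (-Q), only passed when set
# MIN_QUAL minimum QUAL kept by bcftools filter, default 20
# DRY_RUN=1 prints the command instead of running it
call_variants() {
    ref=$1; bam=$2; vcf=$3
    min_mq=${MIN_MQ:-20}
    min_qual=${MIN_QUAL:-20}
    bq_opt=${MIN_BQ:+-Q$MIN_BQ}
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "bcftools mpileup${bq_opt:+ $bq_opt} -q$min_mq -Ou -f $ref $bam | bcftools call -cv --ploidy 1 -f GQ -Ou | bcftools filter -i 'QUAL>$min_qual' -Ov -o $vcf"
        return 0
    fi
    # $bq_opt is left unquoted on purpose so it disappears when empty
    bcftools mpileup $bq_opt -q"$min_mq" -Ou -f "$ref" "$bam" |
        bcftools call -cv --ploidy 1 -f GQ -Ou |
        bcftools filter -i "QUAL>$min_qual" -Ov -o "$vcf"
}
```

Leaving MIN_BQ unset reproduces the first command in Step 5, and setting MIN_BQ=20 reproduces the base-quality variant.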

Contact

For questions or issues related to this pipeline, please contact [Ahmed Khalid/[email protected]].
