GitHub - balwierz/mappabilityPipeline: Optimised Genome Mappability pipeline with bowtie2

Output: a bigWig file with signal value equal to the shortest read length necessary to unique map a read to a given locus.

Dependencies:

bowtie2
bedGraphToBigWig (Jim Kent tools from UCSC)
samtools
perl
perl modules IO::Zlib, File::Temp

System requirements:

lots of RAM: I am executing mouse and human genomes with 200GB allocation.
up to 1TB of temp disk space.

Usage: The use has to download the genomes in fasta format (can be gzipped). Check perl script header for genome file locations and temporary (scratch) directory. Edit. compile the cpp file Execute the perl script with options mentioned in the header.

Uses bowtie2. Iteratively maps a chopped genome to itself to test if the mapping is unique. Filters for mapping quality q>=10 (can be easily changed). Keeps a hash table of genomic coordinates in the memory. For the mouse genome at least 90GB or RAM is necessary.

Note: Bowtie2 sometimes reports a very good alignment (q=42) of a sequence to a wrong locus. Probably somehow does not discover the second (intended) locus and reports only one perfect match somewhere else. This is really wrong, but it is bowtie2 where this should be fixed. The vast majority of positions and mappings are OK.

Piotr Balwierz

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
README.md		README.md
bam2mappablePositions2.cpp		bam2mappablePositions2.cpp
unifiedPipeline.pl		unifiedPipeline.pl

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

bam2mappablePositions2.cpp

bam2mappablePositions2.cpp

unifiedPipeline.pl

unifiedPipeline.pl

Repository files navigation

About

Releases

Packages

Languages

balwierz/mappabilityPipeline

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Stars

Watchers

Forks

Languages