Skip to content

balwierz/mappabilityPipeline

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 

Repository files navigation

Output: a bigWig file with signal value equal to the shortest read length necessary to unique map a read to a given locus.

Dependencies:

  • bowtie2
  • bedGraphToBigWig (Jim Kent tools from UCSC)
  • samtools
  • perl
  • perl modules IO::Zlib, File::Temp

System requirements:

  • lots of RAM: I am executing mouse and human genomes with 200GB allocation.
  • up to 1TB of temp disk space.

Usage: The use has to download the genomes in fasta format (can be gzipped). Check perl script header for genome file locations and temporary (scratch) directory. Edit. compile the cpp file Execute the perl script with options mentioned in the header.

Uses bowtie2. Iteratively maps a chopped genome to itself to test if the mapping is unique. Filters for mapping quality q>=10 (can be easily changed). Keeps a hash table of genomic coordinates in the memory. For the mouse genome at least 90GB or RAM is necessary.

Note: Bowtie2 sometimes reports a very good alignment (q=42) of a sequence to a wrong locus. Probably somehow does not discover the second (intended) locus and reports only one perfect match somewhere else. This is really wrong, but it is bowtie2 where this should be fixed. The vast majority of positions and mappings are OK.

Piotr Balwierz

About

Optimised Genome Mappability pipeline with bowtie2

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published