Great tit (Parus major) INDEL analysis

Pipeline from Barton and Zeng (2019).

Henry Juho Barton
Department of Animal and Plant Sciences, The University of Sheffield

Introduction

This repository outlines the pipeline used to generate and analyse an INDEL dataset from 10 high-coverage (mean coverage = 44X) great tit (Parus major) genomes (described here: Corcoran et al. 2017). The repository is subdivided by processing step.

Programs required

Python 2.7.2
GATK version 3.4-46-gbc02625 available from: https://software.broadinstitute.org/gatk/download/archive
VCFtools version 0.1.12b available from: https://sourceforge.net/projects/vcftools/files/
SAMtools version 1.2 available from: https://sourceforge.net/projects/samtools/files/samtools/
BCFtools version 1.3
bedtools version 2.23.0
anavar version 1.2.2
q_sub.py and qsub_gen.py available from https://github.com/henryjuho/python_qsub_wrapper
pysam version 0.11.2.1 available from https://github.com/pysam-developers/pysam

Note that most scripts make use of the script 'qsub_gen.py', which is designed to submit jobs, in the form of shell scripts, to the 'Sun Grid Engine'. If only the shell scripts are required, the '-OM' option in the 'qsub_gen.py' command line within the scripts can be changed from 'q' to 'w'. Alternatively, some scripts make use of the Python qsub wrapper module qsub.py, described here: https://github.com/henryjuho/python_qsub_wrapper.

Pre-prepared files required for analysis

Reference genome: /fastdata/bop15hjb/GT_ref/Parus_major_1.04.rename.fa
Reference genome index file: /fastdata/bop15hjb/GT_ref/Parus_major_1.04.rename.fa.fai
GFF annotation file: /fastdata/bop15hjb/GT_ref/GCF_001522545.1_Parus_major1.0.3_genomic.gff.gz
All sites VCF: /fastdata/bop15hjb/GT_data/BGI/bgi_10birds.raw.snps.indels.all_sites.vcf
Repeat masker bed file: /fastdata/bop15hjb/GT_data/BGI_10_repeats/ParusMajorBuild1_v24032014_reps.bed
BAM files for SAMtools calling: /fastdata/bop15hjb/GT_data/BGI_10_BAM/*.bam
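
Before running any step it can help to confirm that the pre-prepared inputs are in place. The following is a minimal sketch, assuming the paths listed above; the helper name `find_missing` is hypothetical and is not part of this repository. The BAM entry is a glob pattern, so it is expanded with `glob` rather than tested directly.

```python
import glob
import os

# Paths copied from the list above; adjust them to your own file system.
REQUIRED = [
    '/fastdata/bop15hjb/GT_ref/Parus_major_1.04.rename.fa',
    '/fastdata/bop15hjb/GT_ref/Parus_major_1.04.rename.fa.fai',
    '/fastdata/bop15hjb/GT_ref/GCF_001522545.1_Parus_major1.0.3_genomic.gff.gz',
    '/fastdata/bop15hjb/GT_data/BGI/bgi_10birds.raw.snps.indels.all_sites.vcf',
    '/fastdata/bop15hjb/GT_data/BGI_10_repeats/ParusMajorBuild1_v24032014_reps.bed',
]
BAM_PATTERN = '/fastdata/bop15hjb/GT_data/BGI_10_BAM/*.bam'


def find_missing(paths, patterns=()):
    """Return the paths that are not files, then the patterns that match nothing."""
    missing = [p for p in paths if not os.path.isfile(p)]
    missing += [pat for pat in patterns if not glob.glob(pat)]
    return missing


if __name__ == '__main__':
    # Hypothetical usage: report anything that still needs to be prepared.
    for item in find_missing(REQUIRED, [BAM_PATTERN]):
        print('missing: ' + item)
```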

Pipeline

Generating the dataset

The variant calling and filtering pipeline for both SNPs and INDELs is described here: variant_calling/.

Multispecies alignment and INDEL polarisation

The generation of a multiple species alignment between zebra finch, great tit and flycatcher, and its use in polarising variants and identifying ancestral repeats, is described here: alignment_and_polarisation/.
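
To illustrate what polarisation with two outgroups involves, the sketch below applies a simple parsimony rule: an allele is called ancestral only when both outgroup sequences carry it. This rule is an assumption made for illustration and may differ from the criteria used in alignment_and_polarisation/; the function name and its inputs are hypothetical.

```python
def polarise(ref, alt, zebra_finch, flycatcher):
    """Return the ancestral allele, or None if it cannot be resolved.

    Parsimony rule (assumed for illustration): both outgroups must agree,
    and their shared state must match one of the two great tit alleles.
    """
    if zebra_finch != flycatcher:
        return None  # outgroups disagree, so the site stays unpolarised
    if zebra_finch == ref:
        return ref
    if zebra_finch == alt:
        return alt
    return None  # outgroup state matches neither great tit allele


# Both outgroups carry the longer allele, so 'ATG' is ancestral and the
# shorter allele 'A' is inferred to be a derived deletion.
print(polarise('ATG', 'A', 'ATG', 'ATG'))
```

Sites for which the function returns None would be left unpolarised under this rule.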

Annotating the data

Variant annotation using the NCBI GFF file is described here: annotation/.

Summary statistics and analyses

The calculation of summary statistics and other data summary analyses are documented here: summary_analyses/.

Anavar analyses

Analysis of the INDEL data with the anavar package is described here: anavar_analyses/.

Proximity analyses

Analysis of INDEL data in windows of increasing distance from exons is described here: gene_proximity_analyses/.
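
As a rough illustration of the binning idea, the sketch below measures the distance from a variant position to its nearest exon and assigns it to a distance window. The function names, the 0-based half-open exon coordinates and the 2000 bp window size are all assumptions chosen for the example, not values taken from gene_proximity_analyses/.

```python
import bisect


def distance_to_nearest_exon(pos, exons):
    """Distance in bp from a 0-based position to the nearest exon.

    exons: sorted, non-overlapping (start, end) half-open intervals on one
    chromosome. Returns 0 inside an exon and None if there are no exons.
    """
    starts = [start for start, _ in exons]
    i = bisect.bisect_right(starts, pos)
    distances = []
    if i > 0:
        # Closest exon starting at or before pos.
        end = exons[i - 1][1]
        distances.append(max(0, pos - end + 1))
    if i < len(exons):
        # Closest exon starting after pos.
        distances.append(exons[i][0] - pos)
    return min(distances) if distances else None


def window_index(distance, window_size=2000):
    """Index of the distance window; window 0 is the one closest to exons."""
    return distance // window_size


exons = [(100, 200), (1000, 1100)]
print(window_index(distance_to_nearest_exon(5000, exons)))
```

The list of exon starts is rebuilt on every call to keep the example short; for a whole genome it would be computed once per chromosome.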

Recombination analyses

Pipeline for relating INDEL diversity and Tajima's D with recombination rate is documented here: recombination_analyses/.
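
For readers unfamiliar with the statistic, the sketch below computes Tajima's D from a site frequency spectrum using the standard formula from Tajima (1989). It is an illustration only and not the implementation used in recombination_analyses/; the function name and the input layout are assumptions.

```python
import math


def tajimas_d(sfs):
    """Tajima's D from an SFS.

    sfs[i - 1] is the number of segregating sites at which the counted allele
    appears i times among n sampled chromosomes, so len(sfs) == n - 1.
    Returns None when D is undefined (n < 4 or no segregating sites).
    """
    n = len(sfs) + 1
    s = float(sum(sfs))
    if n < 4 or s == 0:
        return None

    # Mean number of pairwise differences (pi).
    pi = sum(count * 2.0 * i * (n - i) / (n * (n - 1))
             for i, count in enumerate(sfs, start=1))

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1.0) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2.0) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * s + e2 * s * (s - 1)
    if variance <= 0:
        return None
    return (pi - s / a1) / math.sqrt(variance)


# An SFS proportional to 1/i (the neutral expectation) gives D close to zero;
# an excess of singletons gives a negative D.
print(tajimas_d([12, 6, 4, 3]))
print(tajimas_d([10, 0, 0, 0]))
```

With 10 diploid birds the sample contains 20 chromosomes, so a full spectrum for this dataset would have 19 entries.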

Length analyses

Analysis of the impact of INDEL length on the site frequency spectrum (SFS) is documented here: length_analyses/.
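
The idea of a length-specific SFS can be sketched as follows: variants are grouped by INDEL length and a separate spectrum is tallied for each group. The input format and the function name are hypothetical, and grouping by exact length is an assumption; the scripts in length_analyses/ may bin lengths differently.

```python
from collections import defaultdict


def sfs_by_length(indels, n):
    """Tally one SFS per INDEL length.

    indels: iterable of (length_bp, allele_count) pairs.
    n: number of chromosomes sampled.
    Returns {length_bp: sfs}, where sfs[i - 1] counts sites at which the
    allele is seen i times.
    """
    spectra = defaultdict(lambda: [0] * (n - 1))
    for length, count in indels:
        if 0 < count < n:  # skip sites that are not segregating in the sample
            spectra[length][count - 1] += 1
    return dict(spectra)


# Toy input: three 1 bp INDELs and two 2 bp INDELs among 5 chromosomes;
# the last site is fixed for the other allele and is skipped.
print(sfs_by_length([(1, 1), (1, 1), (2, 3), (1, 4), (2, 0)], 5))
```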
