Identify the most shared allele-specific sgRNA pairs for a gene or locus in the 1000 Genomes cohort

Kathleen Keough edited this page Dec 2, 2020 · 2 revisions

1.) Download the relevant VCF/BCF to your computer or server.

Download the 1000 Genomes VCF for the relevant chromosome. E.g., if you're analyzing the gene BEST1, which is located on chromosome 11, download the VCF and index files for chromosome 11. The chromosome VCFs for hg38 are located here.

If you're low on space, once you've downloaded the VCF you can extract just your gene's region of interest using bcftools view, making sure the output is gzipped. More info on bcftools here. Make sure you create a new index file for the extracted VCF!
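The region extraction described above can be sketched in Python as follows. This is a minimal illustration, not part of the repository: the function name `bcftools_subset_commands`, the file names, and the region coordinates are hypothetical placeholders (substitute your gene's real coordinates), and the `chr11` contig naming is an assumption about how your VCF labels chromosomes. It only builds and prints the two command lines: `bcftools view` with `-r` (region), `-O z` (compressed VCF output), and `-o` (output file), followed by `bcftools index -t` to write a new tabix index.

```python
import shlex


def bcftools_subset_commands(vcf_in, region, vcf_out):
    """Build argv lists for extracting a region and indexing the result.

    vcf_in  : path to the indexed, compressed chromosome VCF
    region  : region string such as "chr11:START-END" (placeholder format)
    vcf_out : path for the compressed subset VCF
    """
    # -r restricts output to the region (requires an index on vcf_in),
    # -O z writes compressed VCF, -o names the output file.
    view_cmd = ["bcftools", "view", "-r", region, "-O", "z", "-o", vcf_out, vcf_in]
    # -t writes a tabix (.tbi) index for the new file.
    index_cmd = ["bcftools", "index", "-t", vcf_out]
    return [view_cmd, index_cmd]


if __name__ == "__main__":
    # Hypothetical file names and coordinates, for illustration only.
    commands = bcftools_subset_commands(
        "chr11.vcf.gz", "chr11:1000000-2000000", "my_region.vcf.gz"
    )
    for cmd in commands:
        print(shlex.join(cmd))
```

To actually run the commands rather than print them, each list could be passed to `subprocess.run(cmd, check=True)`, assuming bcftools is installed and on your PATH.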

2.) Use get_gens_df.py (in preprocessing) to make a gens df for your gene/locus of interest.

3.) Use annot_variants.py (in preprocessing) to annotate the variants for sgRNA generation based on the type(s) of Cas you're using.

4.) Use ExcisionFinder.py in --exhaustive mode to identify which 1000 Genomes individuals have which targetable variant pairs.

5.) Use gen_arcplot_input.py to generate an arcplot input dataframe.

6.) Open the arcplot input dataframe with Excel, and sort by the n_inds column. This will put the most shared pairs at the top of the file!
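If Excel is not convenient (for example, on a server), the same sort can be sketched with Python's standard library. This is a rough illustration under stated assumptions: it assumes the arcplot input dataframe was saved as a delimited text file with a header row containing the `n_inds` column mentioned above. The tab delimiter, the function name `sort_by_n_inds`, and the `var1`/`var2` column names in the demo are hypothetical, so adjust them to match your actual file.

```python
import csv
import io


def sort_by_n_inds(handle, delimiter="\t"):
    """Read a delimited table and return (fieldnames, rows sorted by n_inds, descending)."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    rows = list(reader)
    # float() tolerates counts stored as "12" or "12.0".
    rows.sort(key=lambda row: float(row["n_inds"]), reverse=True)
    return reader.fieldnames, rows


if __name__ == "__main__":
    # Made-up demo table; var1/var2 are hypothetical column names.
    demo = io.StringIO("var1\tvar2\tn_inds\n100\t200\t3\n100\t300\t12\n")
    fields, rows = sort_by_n_inds(demo)
    print("\t".join(fields))
    for row in rows:
        print("\t".join(row[name] for name in fields))
```

For a real file, pass an open file handle instead of the `io.StringIO` demo, e.g. `with open(path, newline="") as handle: sort_by_n_inds(handle)`. The most shared pairs then appear first in the returned rows.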

Note: each variant pair analyzed here may have more than one allele-specific sgRNA that can be designed for it. sgRNA design currently must be done based on an individual genome, so to get the sgRNAs associated with a variant pair, identify an individual that has both variants and design sgRNAs against that individual's genome.