-
Notifications
You must be signed in to change notification settings - Fork 191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue with VCF generated after running vg haplotypes #4258
Comments
@glennhickey Do you know what deconstruct might be thinking to come up with the wrong reference traversals? |
This looks like the usual confusion about the node ids being different between the GFA (unclipped) and GBZ (clipped to 1024bp). You can generally get the GFA node ids out of
A few more details about node clipping in general can be found here: https://github.com/ComparativeGenomicsToolkit/cactus/blob/master/doc/pangenome.md#node-chopping |
1. What were you trying to do?
I'm trying to run the haplotype sampling algorithm to create a VCF of the resulting subgraph with vg deconstruct. I used the commands shown below based on (CHM13-based) MC graph "hgsvc3_and_hprc-dec16-mc-chm13" available here: https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html?prefix=pangenomes/scratch/2023_12_16_minigraph_cactus_hgsvc3_and_hprc/, and Illumina short reads for sample HG00733 from the 1000G.
2. What did you want to happen?
I want to create a VCF from the subsampled graph to call bubbles relative to the reference genome (CHM13).
3. What actually happened?
The commands run successfully, but I noticed some issue with the resulting VCF file. According to the AT fields in the VCF, some reference alleles traverse nodes previously not traversed by the reference path in the GFA. In other words, nodes appear in these AT traversals that are not annotated as reference nodes in the GFA (missing SN:Z and SO:i tags) and additionally are not contained in the W line of the respective reference path. One such example is the node with ID "8521880". I attached a file containing an example of such a line in the VCF (for node: 8521880). I'm confused how this can happen. Did I do anything wrong?
4. If you got a line like
Stack trace path: /somewhere/on/your/computer/stacktrace.txt
, please copy-paste the contents of that file here:5. What data and command can the vg dev team use to make the problem happen?
see above (1.)
6. What does running
vg version
say?vcf-line.txt
The text was updated successfully, but these errors were encountered: