Genotyping SVs in a minigraph-cactus graph yields many similar alleles in output vcf #4281

Open
henrivkgt opened this issue May 6, 2024 · 1 comment

Comments

@henrivkgt

Hello,

I have been trying to genotype structural variants from a graph built with minigraph-cactus by mapping short reads with vg giraffe, then running vg pack and vg call to get a VCF. This runs without error, but the output is sometimes hard to interpret for longer variants: small nested variants within a larger structural variant each get their own allele in the VCF, leading to sites with ~10 alleles (depending on the number of input genomes), most of which are more than 95% similar to each other. Ideally, I would like to remove the small nested bubbles from the graph before calling, so that only the large ones are called. vg simplify sounds like it does what I want, but it gives me a segmentation fault. Do you know of a strategy to deal with this?

Thanks for any help with this,

Henri

@glennhickey
Contributor

Yeah, this is a known issue that we're actively working on. I think your best bet until we get it sorted out is to merge the SVs together in the VCF output using something like truvari.
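
To make the idea of merging near-identical alleles concrete, here is a minimal Python sketch. It is not truvari's algorithm and not part of vg: the function names `similarity` and `cluster_alleles`, the greedy grouping strategy, and the 0.95 threshold (taken from the "more than 95% similar" figure in the question) are all assumptions made for illustration. It groups the allele sequences of one multi-allelic site so that each group can be treated as a single SV allele.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two allele sequences."""
    # autojunk=False keeps difflib from ignoring frequent bases in long sequences.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def cluster_alleles(alleles, min_similarity=0.95):
    """Greedily group alleles; returns lists of allele indices.

    The first index in each group acts as that group's representative.
    This is a hypothetical illustration, not how truvari matches variants.
    """
    clusters = []
    for i, allele in enumerate(alleles):
        for cluster in clusters:
            representative = alleles[cluster[0]]
            if similarity(representative, allele) >= min_similarity:
                cluster.append(i)
                break
        else:
            # No existing group is similar enough, so start a new one.
            clusters.append([i])
    return clusters


# Toy site: a 100 bp insertion, the same insertion with one nested SNP,
# and an unrelated 100 bp sequence.
base = "ACGT" * 25
with_snp = base[:50] + "T" + base[51:]
unrelated = "G" * 100

print(cluster_alleles([base, with_snp, unrelated]))  # [[0, 1], [2]]
```

The first two alleles differ by a single nested SNP and fall into one group, while the unrelated sequence stays separate. On a real call set the grouping would need to operate on VCF records and update genotypes accordingly, which is the job a dedicated tool such as truvari is suggested for above.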
