Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VCF file empty when calling SV on ONT data #4279

Open
SwenDiepstraten opened this issue Apr 29, 2024 · 9 comments
Open

VCF file empty when calling SV on ONT data #4279

SwenDiepstraten opened this issue Apr 29, 2024 · 9 comments

Comments

@SwenDiepstraten
Copy link

1. What were you trying to do?
calling structural variation after mapping long-read data on whole genome pangenome graph

2. What did you want to happen?
create a VCF file which contained the variants based on the reads

3. What actually happened?
The VCF file stays empty

4. If you got a line like Stack trace path: /somewhere/on/your/computer/stacktrace.txt, please copy-paste the contents of that file here:

Place stacktrace here.

5. What data and command can the vg dev team use to make the problem happen?
reads were aligned with GraphAligner:
GraphAligner -t 32 -g whole_genome.gfa -f reads/LW1.fastq.gz -a mapped_LW.gam -x vg

And I wanted to call variants following the tutorial on https://github.com/vgteam/vg/wiki/SV-Genotyping-and-variant-calling

vg convert -g whole_genome.gfa -p -t 32 > output_WG.pg
vg augment output_WG.pg mapped_LW.gam -t 32 -m 4 -q 5 -Q 5 -A mapped_LW_aug.gam > mapped_LW_aug.pg
vg snarls -t 32 mapped_LW_aug.pg > mapped_LW_aug.snarls
vg pack -t 32 -x mapped_LW_aug.pg -g mapped_LW_aug.gam -o mapped_LW_aug.pack
vg call mapped_LW_aug.pg -t 32 -r mapped_LW_aug.snarls -k mapped_LW_aug.pack > mapped_LW_snarls.vcf

6. What does running vg version say?

vg version v1.55.0 "Bernolda"
Compiled with g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 on Linux
Linked against libstd++ 20230528
Built by jeizenga@mustard

Hopefully you can assist me!

@glennhickey
Copy link
Contributor

I'd suggest not using vg augment -- just call the SVs directly on your graph.

Others have had issues where GraphAligner does not write mapping qualities (you can check your GAM with vg view -a). In this case, any MAPQ filter (vg pack -Q 5 or in your case vg augment -Q 5) will filter all the reads leading to no calls. Try

vg convert -g whole_genome.gfa -p -t 32 > output_WG.pg
vg snarls -t 32 output_WG.pg > mapped_LW_aug.snarls
vg pack -t 32 -x output_WG.pg -g mapped_LW.gam -o mapped_LW.pack
vg call output_WG.pg -t 32 -r mapped_LW.snarls -k mapped_LW.pack > mapped_LW_snarls.vcf

@SwenDiepstraten
Copy link
Author

after running this code for over 4 hours it is still an empty file. the first couple steps took only a fraction, but the vcf file stays empty

@SwenDiepstraten
Copy link
Author

I was also wondering how the vcf file is generated. Does the entire vcf have to be loaded into memory and is then pasted into the output file, or is it procedurally generated in the output?

@glennhickey
Copy link
Contributor

It outputs the VCF all at once at the end. vg call can be very slow on some complex graphs. You can often manage this by using -C to limit the size of alt alleles to search for in the graph.

@SwenDiepstraten
Copy link
Author

Do you know what would be a good cutoff value be when looking in a mammalian genome?

@SwenDiepstraten
Copy link
Author

And what timeframe should I keep in mind for producing the VCF file, my gam file is 27.7 GB, the reads are 38.3 GB, my VG graph is 6.06 GB and contains 52.4 million nodes, 71.9 million edges and a total length of 2.7 billion.

Thank!

@glennhickey
Copy link
Contributor

There's a --progress option that may help you judge where it is. Otherwise, the running time is extremely dependent on the graph. If you have many reference paths, using -p/-S to select a reference can help. If you have many haplotypes in general in your graph, you can convert it to gbz with vg gbwt and run call with -z to only explore these haplotypes (speeding up the search)

@SwenDiepstraten
Copy link
Author

Unfortunately the --progress option is not available for me in vg call, but I will try to run with gbz and see how that goes

@SwenDiepstraten
Copy link
Author

When trying to run with a gbz file, I encounter the following error, what could cause this? I am running the following code now:

vg gbwt -G /minigraph_cactus/output_WG/output_WG.gfa -o output_WG.gbwt -d temp -p
vg gbwt -G /minigraph_cactus/output_WG/output_WG.gfa --graph-name output_WG.gbz --gbz-format -p

vg snarls -t 32 output_WG.gbz > mapped_LW.snarls
vg pack -t 32 -x output_WG.gbz -g graphaligner/mapped_LW.gam -o mapped_LW.pack

vg call output_WG.gbz -z -t 32 -r mapped_LW.snarls -k mapped_LW.pack -C 100000 > mapped_C100000_LW_snarls_git.vcf

vg: /private/groups/patenlab/jeizenga/GitHub/vg/include/sdsl/int_vector.hpp:1391: sdsl::int_vector< >::reference sdsl::int_vector< >::operator[](const size_type&) [with unsigned char t_width = 0; sdsl::int_vector< >::reference = sdsl::int_vector_reference<sdsl::int_vector<0> >; sdsl::int_vector< >::size_type = long unsigned int]: Assertion `idx < this->size()' failed.
��������������������
Crash report for vg v1.55.0 "Bernolda"
Stack trace (most recent call last) in thread 872463:
#14 Object "", at 0xffffffffffffffff, in
#13 Object "/lustre/nobackup/TOPIGS/shared/swen/cactus-bin-v2.8.0/bin/vg", at 0x2160633, in __clone
#12 Object "/lustre/nobackup/TOPIGS/shared/swen/cactus-bin-v2.8.0/bin/vg", at 0x20b9d4a, in start_thread
#11 Object "/lustre/nobackup/TOPIGS/shared/swen/cactus-bin-v2.8.0/bin/vg", at 0x205c5dd, in gomp_thread_start
#10 Object "/lustre/nobackup/TOPIGS/shared/swen/cactus-bin-v2.8.0/bin/vg", at 0x205ef27, in gomp_team_barrier_wait_end
#9 Object "/lustre/nobackup/TOPIGS/shared/swen/cactus-bin-v2.8.0/bin/vg", at 0x205682a, in gomp_barrier_handle_tasks
#8 Object "/lustre/nobackup/TOPIGS/shared/swen/cactus-bin-v2.8.0/bin/vg", at 0xdbc505, in void vg::io::for_each_parallel_implvg::Alignment(std::istream&, std::function<void (vg::Alignment&, vg::Alignment&)> const&, std::function<void (vg::Alignment&)> const&, std::function<bool ()> const&, unsigned long) [clone ._omp_fn.1]
#7 Object "/lustre/nobackup/TOPIGS/shared/swen/cactus-bin-v2.8.0/bin/vg", at 0x1272b8a, in vg::Packer::add(vg::Alignment const&, int, int, int)
#6 Object "/lustre/nobackup/TOPIGS/shared/swen/cactus-bin-v2.8.0/bin/vg", at 0x126a338, in vg::Packer::increment_coverage(unsigned long)
#5 Object "/lustre/nobackup/TOPIGS/shared/swen/cactus-bin-v2.8.0/bin/vg", at 0x1264a71, in sdsl::int_vector<(unsigned char)0>::operator[](unsigned long const&) [clone .isra.0]
#4 Object "/lustre/nobackup/TOPIGS/shared/swen/cactus-bin-v2.8.0/bin/vg", at 0x2088545, in __assert_fail
#3 Object "/lustre/nobackup/TOPIGS/shared/swen/cactus-bin-v2.8.0/bin/vg", at 0x5e6053, in __assert_fail_base.cold
#2 Object "/lustre/nobackup/TOPIGS/shared/swen/cactus-bin-v2.8.0/bin/vg", at 0x5e612b, in abort
#1 Object "/lustre/nobackup/TOPIGS/shared/swen/cactus-bin-v2.8.0/bin/vg", at 0x208eb55, in raise
#0 Object "/lustre/nobackup/TOPIGS/shared/swen/cactus-bin-v2.8.0/bin/vg", at 0x20bb56c, in __pthread_kill
ERROR: Signal 6 occurred. VG has crashed. Visit https://github.com/vgteam/vg/issues/new/choose to report a bug.
Please include this entire error log in your bug report!
��������������������

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants