Merging multiple graph files #4289

Open
Lucio-Yang opened this issue May 7, 2024 · 4 comments

@Lucio-Yang

Hi!

I have constructed multiple graphs and I want to combine them into a single graph file. I found that both vg combine and vg ids appear to combine multiple vg files, but the output sizes differ. Which one should I use, and what is the difference?

Thank you very much!

@glennhickey
Contributor

Only vg combine can combine multiple graphs into a single graph file -- so use it.

@Lucio-Yang
Author

Thanks!
I used vg combine to merge the vg files of multiple chromosomes and then tried to generate the corresponding VCF file, but the following error occurred. Why do the paths disappear from the merged file?

Error [vg deconstruct]: No specified reference path or prefix found in graph

My code:
vg combine chr1.vg chr2.vg chr3.vg chr4.vg chr5.vg chr6.vg chr7.vg chr8.vg chr9.vg chr10.vg chr11.vg chr12.vg chr13.vg chr14.vg > merged.vg
vg view --threads 128 merged.vg > merged.gfa
vg deconstruct -P TW_t2 -H "#" -e -a -t 128 merged.gfa > merged.vcf

vg version v1.40.0 "Suardi"
Compiled with g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0 on Linux
Linked against libstd++ 20210601
Built by stephen@lubuntu

The chromosome names follow the format TW_t2#1#chr1, TW_t2#1#chr2 ... TW_t2#1#chr14

@ashleethomson

I am also trying to combine graphs, but I have full graphs (containing all chromosomes) that I have augmented to contain variants specific to different individuals. Can vg combine be used to merge these graphs to make a single graph that is relative to the reference path that was used?

@adamnovak
Member

@Lucio-Yang You can try vg paths --list -x merged.vg and vg paths --list -x merged.gfa to see what paths are in the graphs. Sometimes converting paths to/from GFA can hit bugs in how we represent path names, especially on such an old build of vg.
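
If vg itself is not handy, the same check can be roughed out directly on the GFA text. The following is only a minimal sketch, not how vg works internally: `gfa_path_names` and `names_with_prefix` are hypothetical helper names, and joining the W-line sample, haplotype and sequence fields with `#` is an assumption based on the `TW_t2#1#chr1` naming reported above. vg may name walks differently, so its own `vg paths --list` output remains the authority.

```python
def gfa_path_names(gfa_text):
    """Collect path names from the P lines and W lines of a GFA document."""
    names = []
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "P" and len(fields) >= 2:
            # P line: the second column is the path name.
            names.append(fields[1])
        elif fields[0] == "W" and len(fields) >= 4:
            # W line: columns 2-4 are sample, haplotype index, sequence id.
            # Joining them with "#" is an assumption made for illustration.
            names.append("#".join(fields[1:4]))
    return names


def names_with_prefix(names, prefix):
    """Return only the names that start with the given prefix."""
    return [name for name in names if name.startswith(prefix)]


# Tiny made-up GFA used only to exercise the helpers.
example = "\n".join([
    "H\tVN:Z:1.1",
    "S\t1\tACGT",
    "S\t2\tGG",
    "L\t1\t+\t2\t+\t0M",
    "P\tTW_t2#1#chr1\t1+,2+\t*",
    "W\tsampleA\t1\tchr1\t0\t6\t>1>2",
])

if __name__ == "__main__":
    print(names_with_prefix(gfa_path_names(example), "TW_t2"))
```

An empty result for the prefix passed to `-P` would be consistent with the "No specified reference path or prefix found in graph" error, and would suggest the names were altered or dropped during conversion.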

I would recommend upgrading to a more recent release of vg, and also maybe adding an RS tag to your GFA to indicate which sample is the reference you want to use.
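
As a rough illustration of the RS tag suggestion, the sketch below appends a reference-sample tag to the GFA header line. It rests on assumptions that should be checked against the documentation for your vg release: that the tag is written as `RS:Z:<sample>` on the `H` line, and that a bare `H` line carrying only that tag is acceptable when the file has no header. `add_reference_sample` is a hypothetical helper, not part of vg.

```python
def add_reference_sample(gfa_text, sample):
    """Return GFA text whose header line carries an RS:Z:<sample> tag.

    Assumption: the reference sample is declared as an RS:Z: tag on the H line.
    An existing RS tag is left untouched rather than overwritten.
    """
    tag = "RS:Z:" + sample
    lines = gfa_text.splitlines()
    for i, line in enumerate(lines):
        fields = line.split("\t")
        if fields[0] == "H":
            if not any(f.startswith("RS:Z:") for f in fields[1:]):
                lines[i] = line + "\t" + tag
            return "\n".join(lines) + "\n"
    # No header line found: prepend one that holds only the RS tag.
    return "\n".join(["H\t" + tag] + lines) + "\n"


if __name__ == "__main__":
    # Made-up two-line GFA; TW_t2 is the sample name from this thread.
    print(add_reference_sample("H\tVN:Z:1.0\nS\t1\tACGT\n", "TW_t2"), end="")
```

Only the first header line is modified and every other line is passed through unchanged, so the edited text can be written back to a new file and compared against the original with a plain diff.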

I know @glennhickey is revising deconstruct; I'm not sure whether it will help with your particular problem.

@ashleethomson unfortunately vg combine can't weld multiple graphs together along a shared set of linear reference paths. I don't believe we have a tool in vg that can do that, but that sort of graph welding might be exposed in https://github.com/ComparativeGenomicsToolkit/cactus ? Especially if you take all your graphs back to MAF or PAF? It's definitely possible using the https://github.com/ComparativeGenomicsToolkit/pinchesAndCacti library and some walking of paths, but I don't know if there's a tool that can do it yet.
