Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Negative end of path positions in vg giraffe mapping #4249

Open
iamdeaardappel opened this issue Mar 18, 2024 · 1 comment
Open

Negative end of path positions in vg giraffe mapping #4249

iamdeaardappel opened this issue Mar 18, 2024 · 1 comment

Comments

@iamdeaardappel
Copy link

1. What were you trying to do?

I am working on a GAF parser, and currently trying to calculate alignment lengths. Alignment block length (column 11) typically mentions 151, even when the actual alignment is shorter. To obtain the length, I subtract the end position on the path (GFA column 9) from the start position (column 8), which gives me the correct length in 99.9% of the mappings. However, in some cases, the end position is lower than the start position. Sometimes, the end position is negative. I have two examples where I suspect the end position is incorrect.

2. What did you want to happen?

I expected end positions to be higher, at least not be negative.

3. What actually happened?

Negative end position example.
Alignment starts at 0 and ends at -273. I can’t seem to find any information about the end being negative, also the path is only 79 bases long.

ST-E00288:114:H3NJ5ALXX:3:1102:13250:34817:1	151	0	151	+	>49055347>49055348>49055350>49055351>49055353>49055354>49055356>49055357>49055359>49055360>49055361>49055364>49055365>49055366>49055380>49055381>49055382>49055384>49055385>49055387>49055388>49055390>49055391>49055393>49055395>49055396>49055398>49055400>49055401>49056185>49056186>49056196>49056197>49056198>49056201>49056202>49056205>49056206>49056208>49056209>49056211	79	0	-273	79	151	60	AS:i:84	bq:Z:AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJFJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ	cs:Z::79+ATTTGCATTGCGAGATTTATTAGTAAGTGAAATTTTCAATAGAATTTTTCTTATATTTTAGGCTCAATTCTG	dv:f:0.4768	fn:Z:ST-E00288:114:H3NJ5ALXX:3:1102:13250:34817:2	pd:b:1

Example where coordinates don't seem to match.
Unsure if this second example is a bug, but I’m unable to explain to obtained coordinates.
Alignment starts at 1274 and ends 955., I thought maybe the read is mapped in reverse. But looking at the cigar, there is no large insertion that explains why the alignment length is 319 bases. A length close to 151 is to be expected.

ST-E00288:114:H3NJ5ALXX:3:1101:27011:5774:1     151     0       151     +       >105708692      1414    1274    955     139     151     60      AS:i:140        bq:Z:AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ<FJ<JJJF<AAF--A7<--7F<<------7 cs:Z::128*TC:11+CCCCCCGCCGC     dv:f:0.0795     fn:Z:ST-E00288:114:H3NJ5ALXX:3:1101:27011:5774:2        pd:b:1

4. If you got a line like Stack trace path: /somewhere/on/your/computer/stacktrace.txt, please copy-paste the contents of that file here:

5. What data and command can the vg dev team use to make the problem happen?

source venv-cactus-v2.6.8/bin/activate
cactus-pangenome ./js genomes.txt --outDir potato-pg --outName potato-pg --reference DM8 --vcf --giraffe --gfa --gbz --maxCores 48
./vg giraffe --named-coordinates -o gaf -t 12 -Z cactus/potato-pg/potato-pg.d2.gbz -m cactus/potato-pg/potato-pg.d2.min -d cactus/potato-pg/potato-pg.d2.dist -f XXX_L003_R1_001.fastq.gz  -f XXX_L003_R2_001.fastq.gz > run.gaf

6. What does running vg version say?

vg version v1.51.0 "Quellenhof"
Compiled with g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 on Linux
Linked against libstd++ 20230528
Built by [email protected]

Thanks for reading!

@ASLeonard
Copy link

I saw some similar issues that were fixed in #3996, but came across further errors in #4008 which remain unresolved. In general the --named-coordinates flag seems to be troublesome, but it is definitely nicer to use for a lot of applications.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants