Base quality and consensus generation #58

Open
nriddiford opened this issue Apr 6, 2022 · 7 comments

Comments

@nriddiford

nriddiford commented Apr 6, 2022

First off, thanks for all the great work on tracy. It's quite amazing to me how few tools there are for performing trace file assembly, so thanks for filling this void with a very nice tool!

I have been using tracy quite a bit recently to assemble trace files and perform variant calling relative to a reference sequence. Generally, this seems to work very well, but I have a question (related to a previous issue) on the interplay between base-call confidence (on the chromatogram) and consensus formation.

I'm seeing incorrect consensus calls being made for a particular base where one of the trace files contains a low-confidence call and the other a high-confidence call. From what I understand (based on your previous explanation), tracy does not use the base quality from the chromatogram, and I guess it just chooses one base over the other when there's a disagreement?

Here's what I'm seeing:

[Screenshot 2022-04-06 at 16:19:32: two trace files aligned in Geneious, with two positions highlighted in red]

This shows 2 trace files in Geneious. When I assemble these using tracy assemble --format fastq --inccons trace1.ab1 trace2.ab1, the resulting consensus contains insertions at both positions highlighted in red. This is strange to me: the base quality in trace 2 is clearly higher than in trace 1. Or is it the case that, with an insertion in one trace file, there is no base to compare to in the second trace file, so the insertion is included in the consensus irrespective of quality?

Is this expected behaviour?

Thanks for any help!

@blex-max

blex-max commented Apr 7, 2022

Just to add, I've also been wondering about this!

@blex-max

@tobiasrausch forgive me for pinging you, but are you intending to respond to this?

@tobiasrausch
Member

For tracy assemble it's a simple majority vote. If you have, for instance, 3 traces and 2 support a gap (-) and 1 a nucleotide, then the gap (-) is chosen. Ties are broken arbitrarily, and tracy assemble does not take qualities into account at the moment. For the pairwise case, tracy consensus does use the qualities, but gaps don't have any to begin with. Therefore, for tracy consensus it depends on whether you use -i or not.
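To make the majority-vote rule concrete, here is a minimal Python sketch of column-wise voting over rows that are already aligned, assuming gaps are written as '-'. It only illustrates the rule as described above and is not tracy's actual implementation; the function name majority_consensus is hypothetical, and ties are resolved by first-encountered order, which stands in for "arbitrary".

```python
from __future__ import annotations

from collections import Counter

GAP = "-"


def majority_consensus(rows: list[str]) -> str:
    """Column-wise majority vote over equal-length aligned rows.

    Ties go to whichever symbol appears first in the column (Counter's
    first-encountered order), so the result can depend on input order.
    Columns whose winning symbol is a gap are dropped from the consensus.
    """
    if not rows:
        return ""
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("aligned rows must have equal length")
    out = []
    for col in range(width):
        symbols = [r[col] for r in rows]
        winner, _ = Counter(symbols).most_common(1)[0]
        if winner != GAP:
            out.append(winner)
    return "".join(out)
```

With three rows, majority_consensus(["ACGTA", "AC-TA", "AC-TA"]) drops the G because two rows vote for a gap. With only two rows every disagreement is a 1:1 tie, so swapping the input order flips the outcome.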

@nriddiford
Author

@tobiasrausch Thanks for the clarification. What about cases where you have 2 traces (like the image above)? Is it just a 50:50 chance to incorporate a low-quality insertion?

@tobiasrausch
Member

Indeed, it's a 50:50 chance in theory but in order to make the algorithm deterministic the code currently favours nucleotides over gaps.
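The deterministic tie-break described above (nucleotides win over gaps) can be sketched as follows. With only two traces, an insertion opposite a gap is always a 1:1 tie, so under this rule the inserted base always ends up in the consensus regardless of its quality, which matches the behaviour reported at the top of this issue. The names resolve_column and consensus are hypothetical, and picking the alphabetically smallest base when two nucleotides tie is an assumption made here only to keep the sketch reproducible.

```python
from __future__ import annotations

from collections import Counter

GAP = "-"


def resolve_column(symbols: tuple[str, ...]) -> str:
    """Majority vote; on a tie a nucleotide beats a gap.

    Among tied nucleotides the alphabetically smallest wins (an assumed,
    arbitrary but deterministic rule).
    """
    counts = Counter(symbols)
    best = max(counts.values())
    tied = [s for s, c in counts.items() if c == best]
    nucleotides = sorted(s for s in tied if s != GAP)
    return nucleotides[0] if nucleotides else GAP


def consensus(rows: list[str]) -> str:
    """Apply resolve_column to every alignment column and drop gaps."""
    if len({len(r) for r in rows}) > 1:
        raise ValueError("aligned rows must have equal length")
    columns = (resolve_column(col) for col in zip(*rows))
    return "".join(c for c in columns if c != GAP)
```

Unlike the order-dependent vote, consensus(["AC-TA", "ACGTA"]) and consensus(["ACGTA", "AC-TA"]) both return "ACGTA": the insertion is always kept in the two-trace case.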

@blex-max

Is this likely to change? As @nriddiford says, it seems a shame to have a 50% chance to incorporate a low quality base when the information is available to make the better call.

@tobiasrausch
Member

I think tracy consensus in tracy v0.7.5 now properly handles the low-quality vs. high-quality base problem but low-quality insertions vs. gaps is still something I need to work on. Do you have some example traces that you can share with me where you think the insertion is incorrect? Thanks.
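For the still-open case of a low-quality insertion opposite a gap, one possible quality-aware rule is sketched below. This is a hypothetical approach, not what tracy v0.7.5 implements. When both traces call a base, the higher Phred score wins. Because a gap carries no quality, a base opposite a gap is kept only if its own quality reaches a threshold. The function pairwise_consensus and its min_ins_qual parameter (default 20) are illustrative inventions.

```python
from __future__ import annotations

GAP = "-"


def pairwise_consensus(seq1: str, qual1: list[int],
                       seq2: str, qual2: list[int],
                       min_ins_qual: int = 20) -> str:
    """Quality-aware consensus of two aligned traces (hypothetical rule).

    seq1/seq2 are equal-length aligned rows ('-' = gap); qual1/qual2 hold one
    Phred score per column (the value at a gap position is ignored).
    min_ins_qual is an assumed threshold for keeping a base opposite a gap.
    """
    if not (len(seq1) == len(seq2) == len(qual1) == len(qual2)):
        raise ValueError("aligned rows and qualities must have equal length")
    out = []
    for b1, q1, b2, q2 in zip(seq1, qual1, seq2, qual2):
        if b1 == GAP and b2 == GAP:
            continue
        if b1 == GAP or b2 == GAP:
            # Base vs. gap: the gap has no quality, so judge the base alone.
            base, q = (b2, q2) if b1 == GAP else (b1, q1)
            if q >= min_ins_qual:
                out.append(base)
            continue
        # Both traces called a base: keep the higher-quality call
        # (ties go to seq1, an arbitrary choice).
        out.append(b1 if q1 >= q2 else b2)
    return "".join(out)
```

With this rule, a Q8 insertion opposite a gap is dropped, while a Q35 insertion is kept; the threshold value would need to be tuned against real traces.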

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants