Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Choosing longest or shortest alignment #213

Open
brendanf opened this issue Apr 5, 2023 · 1 comment
Open

Choosing longest or shortest alignment #213

brendanf opened this issue Apr 5, 2023 · 1 comment

Comments

@brendanf
Copy link

brendanf commented Apr 5, 2023

I am testing edlib for the purpose of calculating pairwise identites between sequences, using the BLAST/USEARCH definition of sequence identity; i.e. $\text{id} = \frac{\text{edit distance}}{\text{alignment length}}$. One difficulty I am finding is that there can be multiple alignments which have the same edit distance, but different lengths, even in the fully global case, because it may be possible to trade, e.g., 2 mismatches for 1 insertion and 1 deletion. This means that the pairwise identity is not uniquely defined. Is there any way for edlib to give me a maximally long and/or maximally short CIGAR, or even just calculate what the maximum and minimum lengths are?

If this is possible but would require some coding, I'd be willing to give it a shot if you give me some pointers.

@Martinsos
Copy link
Owner

@brendanf I am not 100% sure as it has been some time since I have implemented the core part of Edlib, but off the bat I would guess that is probably not easy to do. Edlib is fast because it is optimized to use edit distance as scoring mechanism, and adding any additional criteria, like for example to try and find minimal size of cigar, will almost certainly be hard to fit into that.
One thing you can do is tell it to return multiple possible positions at which the alignment ends -> is that helpful? You can't make sure those are the min and max though if I am correct.

I can't give much better answer right now without spending significant amount of time on figure this all out. If you are intersted, there is a link to an article for Edlib that I wrote, that explains inner workings in detail, so that could shine some light on your question also.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants