[Search] Query the text size #2934
Labels: feature/proposal
Comments
Some ideas, without looking much at the code:
Question: Do we also need the sizes of individual texts in the collection?
(cc @SGSSGene)
For calculating e-values, among other purposes, it is often necessary to query the text size of a (bi-)FM index.

I have experimented with the `size()` function of `seqan3::bi_fm_index<dna4, text_layout::collection>` in order to calculate it myself. According to the documentation, the value of `size()` includes sentinels. Assume that I have stored a list of sequence names somewhere, so I know nseq, the number of indexed sequences. Then for nseq > 1, I can compute the text size as nchar = `index.size()` - nseq. For nseq == 1 there is a special case, nchar = `index.size()` - 2, because a single sequence has 2 sentinels.

I suggest providing a function `get_text_size()` for the index that performs these calculations. One issue is that the number of sequences stored in the index has to be tracked somewhere (which I could solve with the length of the names list).