Difference between get_batch_scores and get_scores method #10

Open
soumya-ranjan-sahoo opened this issue Aug 24, 2020 · 1 comment

soumya-ranjan-sahoo commented Aug 24, 2020

Hi Team,

I need your help here.
To give a brief overview: I have about 500k documents in my corpus and a set of only 7k query-document pairs, and I want to calculate the BM25 score for each individual pair. So far:

  1. I have indexed all 500k documents.
  2. I understand that I can use the get_scores method to get the BM25 scores of a query against all 500k documents, which gives a vector of length 500k, and then pick out the entry for the paired document. For example, for a query paired with the document at index i, the score for that query-document pair is bm25score[i].
    But this method takes ages to calculate the scores, so I am looking for a way around it.
    Can the get_batch_scores method be of any help here? My guess is that it would index only the subset of documents passed to it and not all 500k documents.

My objective is to index the 500k documents once and then, given a query-document pair, calculate the BM25 score for that pair.

Thanks in advance!

soumya-ranjan-sahoo (Author) commented

Can someone kindly help me answer this? I want to know how get_batch_scores differs from get_scores.
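
To make the question concrete, here is a minimal, self-contained sketch of the distinction being asked about. It assumes that get_batch_scores(query, doc_ids) scores one query against only the listed document indices while still relying on statistics (IDF, average document length) computed over the whole indexed corpus. TinyBM25 and all its attribute names are hypothetical stand-ins written for illustration, not the library's code, and the smoothed IDF used here may differ from the variant the library implements.

```python
import math
from collections import Counter


class TinyBM25:
    """Hypothetical minimal Okapi-style BM25 index (illustration only)."""

    def __init__(self, tokenized_corpus, k1=1.5, b=0.75):
        self.k1 = k1
        self.b = b
        # Per-document term counts and lengths, built once for the whole corpus.
        self.doc_freqs = [Counter(doc) for doc in tokenized_corpus]
        self.doc_len = [len(doc) for doc in tokenized_corpus]
        n_docs = len(tokenized_corpus)
        self.avgdl = sum(self.doc_len) / n_docs
        # Document frequency of each term, then a smoothed, always-positive IDF.
        df = Counter(term for freqs in self.doc_freqs for term in freqs)
        self.idf = {
            term: math.log(1 + (n_docs - count + 0.5) / (count + 0.5))
            for term, count in df.items()
        }

    def _score_pair(self, query, doc_id):
        """BM25 score of one tokenized query against one indexed document."""
        freqs = self.doc_freqs[doc_id]
        norm = self.k1 * (1 - self.b + self.b * self.doc_len[doc_id] / self.avgdl)
        total = 0.0
        for term in query:
            tf = freqs.get(term, 0)
            total += self.idf.get(term, 0.0) * tf * (self.k1 + 1) / (tf + norm)
        return total

    def get_scores(self, query):
        """Score the query against every document: cost grows with corpus size."""
        return [self._score_pair(query, i) for i in range(len(self.doc_freqs))]

    def get_batch_scores(self, query, doc_ids):
        """Score the query against the listed documents only."""
        return [self._score_pair(query, i) for i in doc_ids]


corpus = [
    "the cat sat on the mat".split(),
    "dogs chase cats in the park".split(),
    "bm25 ranks documents for a query".split(),
]
index = TinyBM25(corpus)  # the full corpus is indexed exactly once

# (tokenized query, index of its paired document)
pairs = [("cat mat".split(), 0), ("bm25 query".split(), 2)]

# One score per pair, without building a corpus-length vector for each query.
pair_scores = [index.get_batch_scores(query, [doc_id])[0] for query, doc_id in pairs]
```

If that assumption holds for the library, the 500k documents would still be indexed once, and each of the 7k pairs would need a single call along the lines of bm25.get_batch_scores(tokenized_query, [i]) instead of a full 500k-long get_scores vector per query. In this sketch the two methods return identical values for the same document, so only the amount of work differs.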
