Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IDF in BM25Okapi is not version from Atire #35

Open
msalwen opened this issue Oct 30, 2023 · 0 comments
Open

IDF in BM25Okapi is not version from Atire #35

msalwen opened this issue Oct 30, 2023 · 0 comments

Comments

@msalwen
Copy link

msalwen commented Oct 30, 2023

In Trotman, Jia & Crane the idf measure is given as log(N/df_t) (see top of page 5), where N is the corpus size and df_t is the number of docs containing term t. This is always non-negative. In the implementation of BM25Okapi, you have used an earlier version of the idf computation (Robertson-Spark Jones, see eq (4) bottom of page 2 in Trotman, Puurula & Burgess), which can become negative and which you handle by setting negative values to eps.

The balance of the score calculation in BM25Okapi follows Atire exactly, and the implementations for BM25L and BM25+ align perfectly with the descriptions in Trotman, Puurula & Burgess. I wonder why the idf implementation for BM25Okapi deviates from the Atire specification.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant