IDF in BM25Okapi is not version from Atire #35

msalwen · 2023-10-30T20:26:41Z

In Trotman, Jia & Crane the idf measure is given as log(N/df_t) (see top of page 5), where N is the number of documents in the corpus and df_t is the number of docs containing term t. Since df_t ≤ N, this is always non-negative. In the implementation of BM25Okapi, you have used an earlier version of the idf computation (Robertson-Spärck Jones, see eq (4) at the bottom of page 2 in Trotman, Puurula & Burgess), which can become negative and which you handle by setting negative values to eps.
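To make the difference concrete, here is a minimal sketch comparing the two idf variants on a toy corpus. It does not call the library. The function names `idf_atire` and `idf_rsj` are mine, and I am assuming the usual 0.5-smoothed form of the Robertson-Spärck Jones weight, log((N - df_t + 0.5) / (df_t + 0.5)).

```python
import math


def idf_atire(n_docs: int, df: int) -> float:
    """idf as log(N / df_t); non-negative whenever 1 <= df <= n_docs."""
    return math.log(n_docs / df)


def idf_rsj(n_docs: int, df: int) -> float:
    """Robertson-Sparck Jones idf, assuming the common 0.5-smoothed form."""
    return math.log((n_docs - df + 0.5) / (df + 0.5))


# Toy corpus of 10 documents; df is the number of docs containing the term.
n_docs = 10
for df in (1, 5, 6, 10):
    print(f"df={df:2d}  atire={idf_atire(n_docs, df):+.3f}  rsj={idf_rsj(n_docs, df):+.3f}")
```

With these definitions the second variant turns negative as soon as a term occurs in more than half of the documents (df_t > N/2), while log(N/df_t) only drops to zero when the term occurs in every document.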

The rest of the score calculation in BM25Okapi follows Atire exactly, and the implementations of BM25L and BM25+ match the descriptions in Trotman, Puurula & Burgess. I wonder why the idf implementation for BM25Okapi deviates from the Atire specification.
