
Support for vectorized/batch inference? #18

Open
Smu-Tan opened this issue Mar 1, 2022 · 6 comments

Comments

@Smu-Tan

Smu-Tan commented Mar 1, 2022

Hi, I'm just wondering: is there any method to speed up the retrieval process, for example vectorized or batch inference (i.e., doing the retrieval for a batch/list of queries at the same time)?

I'm trying to use BM25 to retrieve the top-n docs for large data (over 10k queries against 50k docs), and if I do this by calling bm25.get_top_n() in a for loop, the inference time is unacceptably long.

@dorianbrown
Owner

Have you checked out the get_batch_scores method yet? It sounds like this might be what you're looking for.

@Smu-Tan
Author

Smu-Tan commented Mar 2, 2022

> Have you checked out the get_batch_scores method yet? It sounds like this might be what you're looking for.

I think get_batch_scores computes the BM25 scores between one query and a subset of the corpus. What I need is to compute the BM25 scores between a list of queries and the whole corpus, and because the query list is huge (10k queries), computing them one at a time is very slow.
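One way to get true multi-query batching (not part of rank_bm25's API; a minimal NumPy sketch of the standard Okapi BM25 formula, with a +1 inside the log so idf weights stay non-negative) is to precompute a docs × vocab matrix of per-term BM25 contributions once, after which scoring every query is a single matrix multiplication and top-n extraction is an argpartition:

```python
import numpy as np

def build_bm25_matrix(corpus_tokens, k1=1.5, b=0.75):
    """Precompute a (docs x vocab) matrix of per-term BM25 contributions."""
    vocab = {t: i for i, t in enumerate(sorted({t for doc in corpus_tokens for t in doc}))}
    tf = np.zeros((len(corpus_tokens), len(vocab)))
    for di, doc in enumerate(corpus_tokens):
        for t in doc:
            tf[di, vocab[t]] += 1.0
    doc_len = tf.sum(axis=1)
    avgdl = doc_len.mean()
    df = (tf > 0).sum(axis=0)
    # Okapi idf with +1 smoothing so very common terms don't go negative
    idf = np.log(1.0 + (len(corpus_tokens) - df + 0.5) / (df + 0.5))
    denom = tf + (k1 * (1.0 - b + b * doc_len / avgdl))[:, None]
    return idf * tf * (k1 + 1.0) / denom, vocab

def batch_top_n(queries_tokens, doc_term_scores, vocab, n=10):
    """Score all queries at once; return (queries x n) indices of top docs."""
    Q = np.zeros((len(queries_tokens), len(vocab)))
    for qi, q in enumerate(queries_tokens):
        for t in q:
            if t in vocab:
                Q[qi, vocab[t]] += 1.0
    scores = Q @ doc_term_scores.T                     # (queries x docs)
    n = min(n, scores.shape[1])
    top = np.argpartition(-scores, n - 1, axis=1)[:, :n]
    # order each row's top-n candidates by descending score
    rows = np.arange(scores.shape[0])[:, None]
    order = np.argsort(-scores[rows, top], axis=1)
    return top[rows, order]
```

For dense matrices this costs memory proportional to docs × vocab, so at 50k docs a `scipy.sparse` term matrix would be the practical choice; the scoring step stays a single sparse-dense product.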

@puzzlecollector

Has this problem been resolved? I am having the same sort of issue: I have 50k queries and it takes a long time to compute (approx. 150k seconds, almost 42 hours).

@wise-east

@Smu-Tan @puzzlecollector were you able to find an alternative to this implementation to speed up the process?

@Smu-Tan
Author

Smu-Tan commented Sep 2, 2022

> @Smu-Tan @puzzlecollector were you able to find an alternative to this implementation to speed up the process?

Check out Pyserini.
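For anyone staying with an in-process NumPy pipeline instead, a further stopgap is to split the query matrix into chunks and score them on a thread pool; NumPy releases the GIL inside the matmul, so chunks can overlap. An illustrative sketch, where `Q` and `S` stand for a hypothetical query-term matrix and a precomputed document-term score matrix (not any library's API):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def chunked_scores(Q, S, n_chunks=4):
    """Multiply query chunks against S.T on a thread pool."""
    chunks = np.array_split(Q, n_chunks)  # handles uneven splits
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        parts = list(pool.map(lambda chunk: chunk @ S.T, chunks))
    return np.vstack(parts)  # reassemble to (queries x docs)
```

Chunking also bounds peak memory: only one chunk's worth of the (queries × docs) result needs to be materialized at a time if the top-n is extracted per chunk.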

@AmenRa

AmenRa commented Nov 17, 2022

Hi @Smu-Tan, @puzzlecollector, and @wise-east,

I have just released a new Python-based search engine called retriv.
It only takes ~40ms to query 8M documents on my machine, and it can perform multiple searches in parallel.
If you try it, please let me know whether it works for your use case.
