Wrong facet counts and returned hits when using hybrid search #4494
Comments
Hello @bb 👋 The behavior is expected here. As all your documents have an embedding, they are all candidates for the query. Another way of seeing semantic search is that it sorts the documents that have an embedding according to their similarity with the embedding of the query. We have plans to add a score threshold so that documents under the threshold don't appear in the results or facet counts. Would that help your use case?
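To make the point above concrete, here is a minimal sketch in plain Python, not Meilisearch's implementation. It assumes cosine similarity as the score; the documents, vectors, and the `threshold` parameter are made up for illustration. Without a threshold every embedded document is a candidate, so the facet counts equal the index totals; with a threshold, low-scoring documents drop out of both the hits and the counts.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def semantic_candidates(query_vec, docs, threshold=None):
    """Rank every embedded document by similarity to the query.

    `threshold` is a hypothetical score cutoff: documents scoring below it
    are dropped from the candidate set.
    """
    scored = [(cosine(query_vec, d["vec"]), d) for d in docs]
    if threshold is not None:
        scored = [(s, d) for s, d in scored if s >= threshold]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored]

def facet_counts(docs, attribute):
    """Count documents per facet value over the candidate set."""
    counts = {}
    for d in docs:
        counts[d[attribute]] = counts.get(d[attribute], 0) + 1
    return counts

# Toy index: every document has an embedding, so every document is a candidate.
docs = [
    {"id": 1, "country": "France", "vec": [1.0, 0.0]},
    {"id": 2, "country": "France", "vec": [0.9, 0.1]},
    {"id": 3, "country": "Germany", "vec": [0.0, 1.0]},
]
query = [1.0, 0.0]

all_hits = semantic_candidates(query, docs)
kept = semantic_candidates(query, docs, threshold=0.5)

print(facet_counts(all_hits, "country"))  # counts over the whole index
print(facet_counts(kept, "country"))      # counts over documents above the cutoff
```

In this toy data the unrelated Germany document is still returned (last) when no threshold is set, which mirrors the "8217 total results" seen in the report.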
Yes, selecting a facet adds a filter, which removes documents that don't match the filter from the results.
Now I'm not sure I understand the behavior here. Could you indicate the search request that resulted in the facet behavior being the same as with
When doing a hybrid search, we are also considering the candidates as the union of the candidates of the semantic search and the candidates of the keyword search, which should result in all the documents matching the filter, or all the documents of the index when there is no filter.
That said, in the hybrid search, we don't run the semantic search if we get a sufficient number of results from the keyword search that are relevant enough (ranking score is high enough). With a low semantic ratio, this can easily happen, so that might have been the case with your query, explaining the shift in behavior you noticed.
I hope that clarifies the behavior you've been seeing!
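The two rules described above (candidates are the union of keyword and semantic candidates, and the semantic search is skipped when the keyword results are already good enough) can be sketched as follows. This is a toy model, not the engine's code: the `good_enough` cutoff, the `limit` check, and all names are assumptions made for illustration.

```python
def hybrid_candidates(keyword_hits, semantic_search, limit, good_enough=0.9):
    """Return (candidate_ids, semantic_was_run).

    keyword_hits: list of (doc_id, ranking_score) pairs from keyword search.
    semantic_search: zero-argument callable returning the ids of all
        semantic candidates; it is only called when needed.
    limit: number of hits requested.
    good_enough: hypothetical ranking-score cutoff for "relevant enough".
    """
    keyword_ids = {doc_id for doc_id, _ in keyword_hits}
    strong = [doc_id for doc_id, score in keyword_hits if score >= good_enough]
    if len(strong) >= limit:
        # Enough relevant keyword results: the semantic search is skipped,
        # so candidates (and facet counts) look like a pure keyword search.
        return keyword_ids, False
    # Otherwise the candidates are the union of both searches.
    return keyword_ids | set(semantic_search()), True

# Toy index of 10 documents, all of which have an embedding.
all_embedded = lambda: range(10)
keyword_hits = [(0, 0.95), (1, 0.93), (2, 0.40)]

skipped, ran_a = hybrid_candidates(keyword_hits, all_embedded, limit=2)
merged, ran_b = hybrid_candidates(keyword_hits, all_embedded, limit=3)

print(len(skipped), ran_a)
print(len(merged), ran_b)
```

Under these assumptions the same query yields either 3 candidates or all 10, depending only on whether the keyword results were judged sufficient, which is the kind of shift in facet counts described in the comment.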
Hello, I released a prototype to add the
Unfortunately, its effect on the facet counts is not as good as planned. See the linked thread for details. We're looking for feedback on this prototype and, in particular, whether it serves the issue here.
Describe the bug
I tried vector search for the first time, using huggingface (BAAI/bge-base-en-v1.5) with v1.7.0 (from the official Docker image).
When doing a hybrid query with any semantic ratio greater than 0, e.g. 0.1, I receive all facets with their total absolute counts and also all documents instead of a filtered list, as long as I don't filter. When also filtering, the facets seem to work (but I'm not 100% sure yet), but the results returned are still not what I expected.
To Reproduce
Test Data
I have a data set with 8217 documents.
Facet counts with neither filtering nor search (excerpt)
Czechia 33
Denmark 15
Estonia 18
Finland 47
France 488
Georgia 1
Germany 592
Greece 202
Hungary 340
Iceland 4
Ireland 24
Searching for paris, no filtering
semanticRatio=0
959 total results
Facet counts, without filtering (excerpt)
Czechia 2
Denmark 3
Estonia 3
Finland 4
France 153
Germany 21
Greece 20
Hungary 15
Iceland 1
Ireland 1
semanticRatio>0, e.g. 0.1, 0.2, 0.5, 0.9 etc. (same results as if nothing is searched/filtered)
8217 total results
Facet counts, without filtering (excerpt)
Czechia 33
Denmark 15
Estonia 18
Finland 47
France 488
Georgia 1
Germany 592
Greece 202
Hungary 340
Iceland 4
Ireland 24
Now when I start selecting a facet, I actually see a smaller number of result hits -- this number is the same as the facet value clicked.
E.g. when clicking (filtering) "Germany 592", I get 592 results. All results are from Germany which is good. Overall, this is better, but not the expected behavior.
Interestingly, now the facet counts are changed:
Czechia 2
Denmark 3
Estonia 3
Finland 4
France 153
Germany 21
Greece 20
Hungary 15
Iceland 1
Ireland 1
This is basically the same as with semanticRatio=0.
It's definitely not what I expected but it's better than no filtering of facet values.
It's inconsistent to have Germany 21 here but Germany 592 in the list. Those should be the same.
Not sure if it's relevant here, but this is using the multi-search endpoint.
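For reference, a request of roughly the following shape should exercise the scenario above. This is a sketch, not the exact request used in the report: the index name `places`, the facet attribute `country`, and the embedder name `default` are assumptions. The helper only builds the JSON body for a multi-search call and does not contact a server.

```python
import json

def build_multi_search_body(index_uid, query, semantic_ratio,
                            facet_attribute, facet_value=None):
    """Build a multi-search body with one hybrid query.

    index_uid, facet_attribute and the "default" embedder name are
    placeholders; substitute the values from your own index settings.
    """
    search = {
        "indexUid": index_uid,
        "q": query,
        "facets": [facet_attribute],
        "hybrid": {"semanticRatio": semantic_ratio, "embedder": "default"},
    }
    if facet_value is not None:
        # Selecting a facet value is expressed as a filter on that attribute.
        search["filter"] = f'{facet_attribute} = "{facet_value}"'
    return {"queries": [search]}

# Unfiltered hybrid query, then the same query with the Germany facet selected.
unfiltered = build_multi_search_body("places", "paris", 0.1, "country")
filtered = build_multi_search_body("places", "paris", 0.1, "country", "Germany")

print(json.dumps(unfiltered, indent=2))
print(json.dumps(filtered, indent=2))
```

Sending both bodies and comparing the total hits with the facet distribution in each response would show whether the Germany count and the number of hits agree.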
Expected behavior
With hybrid search, facets should work basically the same as without hybrid search. Of course I expect a few more or a few less results depending on the query and a bit differing facet counts, but not all results / total facet counts.
Meilisearch version:
[e.g. v1.7.0]
Additional context
Docker for Desktop, macOS