Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request]: Set ef_search per query #2176

Open
atroyn opened this issue May 10, 2024 · 3 comments
Open

[Feature Request]: Set ef_search per query #2176

atroyn opened this issue May 10, 2024 · 3 comments
Labels
enhancement New feature or request needs-cip

Comments

@atroyn
Copy link
Contributor

atroyn commented May 10, 2024

Describe the problem

Currently we only set the ef_search parameter when we init the index, usually on creation of a Collection. However, this parameter is actually per-query at the level of HNSW.

Generally we only need to set this parameter once since in principle there is a 'best' setting for the average case on any collection, but there are cases where we may want to let the user set it dynamically, for example to conduct a parameter sweep against a particular dataset.

Additionally, EF should be at least as large as the number of elements we want to return, which means it may have to be dynamic anyway.

Describe the proposed solution

Allow EF search to be set on a per-query basis.

Alternatives considered

No response

Importance

would make my life easier

Additional Information

This is somewhat of a pain in the ass. Besides piping in this parameter on query (which now creates the need for a query_parameters argument...) the way the HNSW implementation handles this is, also, bad, since it's set as a property of the index: https://github.com/chroma-core/hnswlib/blob/d52dd85106c5a1955a7dbd745a76ba2a8259451a/hnswlib/hnswalg.h#L33

even though it's only a property of each query.

To do this 'right' (for example, to handle concurrency correctly), it would take a refactor of our HNSW impl too.

@atroyn atroyn added enhancement New feature or request needs-cip labels May 10, 2024
@atroyn atroyn changed the title [Feature Request]: Set search_ef per query [Feature Request]: Set ef_search per query May 10, 2024
@atroyn
Copy link
Contributor Author

atroyn commented May 10, 2024

I took a look at how voyager does this and I think it's buggy, because they just call set_ef on the index. What happens if you have concurrent queries asking for different EF's into the same index?

@tazarov
Copy link
Contributor

tazarov commented May 11, 2024

Would it make sense to add this to the searchKnn:

https://github.com/chroma-core/hnswlib/blob/d52dd85106c5a1955a7dbd745a76ba2a8259451a/hnswlib/hnswalg.h#L1595-L1601

As far as I understand the code, the algo picks max(k,ef), so maybe we can trick it by passing per query ef that doesn't change the default behavior of the index if such is not provided (making it possible to parallelize queries).

@atroyn
Copy link
Contributor Author

atroyn commented May 11, 2024

No, we should instead change the interface.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request needs-cip
Projects
None yet
Development

No branches or pull requests

2 participants