RFC: Setting search defaults as an index option #235
Comments
Support select * from emb_table where xx with (ivfflat.probes=10)
@GPF199541 I'm not following. Today you can do:
SET ivfflat.probes TO 10;
SELECT * FROM emb_table WHERE ...
What this proposes is setting a default for using the index, so if you plan to use SELECT * FROM emb_table ... ORDER BY ... and pgvector would use 10 probes for the search if you created the index with
I believe something like But having a default value for each index as @jkatz proposed would be the most convenient, as otherwise different values of ivfflat.probes for each separate index will need to be stored somewhere else (probably, in a separate table or some memory cache). I also want to emphasize that having connection-specific
can become
|
Can the syntax like
As for probes/ef_search as an index option, I think it's a good idea.
@jkatz is this something you still want to pursue? This would be great to simplify my use case at @discourse where we automated the index creation, but setting this parameter is still brittle.
@xfalcox Yes, I'd be interested in pursuing it -- the RFC is out there to collect use cases before adding to the code 😄 I don't think a PR would be too difficult.
Fixes a regression from 140359c which caused us to set this globally based on post count, making the estimated cost of an index scan on the topics table so high that the planner, correctly, stopped using the index. Hopefully pgvector/pgvector#235 lands soon.
Just to upvote use case no. 1 mentioned (#235 (comment)), where multiple tables in the same database use different default probes parameters: a per-index default would avoid specifying probes at query time and the overhead associated with that.
We keep getting reports of problems caused by the lack of this feature. On @discourse we automatically create vectors for new topics and posts, and since this is based on dynamic user generated content, we have a background task that updates the index and the For the parameter, we set it following the instructions from @ankane here, however we received bug reports from people running with a user without We also have multiple vector tables, and each one needs a different parameter so, once again, I'd love to see this parameter default feature become a reality.
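Until a per-index default exists, the per-table values have to live in the application. A minimal sketch of that workaround follows, assuming a hypothetical in-memory map of table names to probes values (the table names and numbers are illustrative, not taken from this thread). It only builds the statement sequence and leaves execution to whichever database driver is in use.

```python
# Hypothetical application-side workaround: the table names and probes
# values below are illustrative and do not come from the thread.
PROBES_BY_TABLE = {"topic_embeddings": 10, "post_embeddings": 40}


def statements_for_search(table: str, query_sql: str) -> list[str]:
    """Wrap a search query so ivfflat.probes applies to one transaction only."""
    probes = PROBES_BY_TABLE[table]
    return [
        "BEGIN",
        # SET LOCAL is reverted at COMMIT/ROLLBACK, so the value does not
        # leak into later queries that reuse the same connection.
        f"SET LOCAL ivfflat.probes = {int(probes)}",
        query_sql,
        "COMMIT",
    ]
```

Every search then costs extra statements and the map has to be kept in sync with the indexes by hand, which is the overhead an index-level default would remove.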
Use cases
There are cases where a user wants to set a default search value (e.g. ivfflat.probes, hnsw.ef_search) on a per-index basis, for example:
Proposal
The proposal is to add new index options on the index AMs:
- ivfflat: default_probes (default: NULL)
- hnsw: default_ef_search (default: NULL)
PostgreSQL tracks when a GUC is changed. For this case, we can use the PGC_S_SESSION indicator to determine that the GUC was changed in the database session, in which case we do not need to consult the index option.
The proposal would then use this heuristic:
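One way to read that precedence, as a minimal sketch rather than pgvector's actual code: a value set in the session wins, otherwise a non-NULL index option is used, otherwise the GUC's own value applies. The ordering is an assumption drawn from the PGC_S_SESSION remark above. The `GucSource` enum here is a simplified stand-in for PostgreSQL's enum of the same name, and `effective_probes` is a hypothetical helper.

```python
from enum import Enum
from typing import Optional


class GucSource(Enum):
    # Simplified stand-in for PostgreSQL's GucSource enum, which has more
    # members; only "set in this session or not" matters for the sketch.
    DEFAULT = "PGC_S_DEFAULT"
    SESSION = "PGC_S_SESSION"


def effective_probes(
    guc_value: int, guc_source: GucSource, index_default: Optional[int]
) -> int:
    """Pick the probes value for a scan (assumed precedence, see lead-in)."""
    if guc_source is GucSource.SESSION:
        # The user ran SET in this session: honor it and skip the index option.
        return guc_value
    if index_default is not None:
        # No session override: fall back to the index's default_probes option.
        return index_default
    # Neither is set: use the GUC's current (default) value.
    return guc_value
```

The same shape would apply to hnsw.ef_search with default_ef_search in place of default_probes.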
Alternatives
With additional work, we could set "smarter" automatic defaults for ivfflat.probes / hnsw.ef_search, but folks may still want to set their own index-specific defaults based on the above use cases.
Seeking feedback on this proposal.