Add 8-bit scalar quantization support for IVF index #231
base: master
Conversation
It brings the following benefits on top of the existing `ivfflat` index:

* Up to 2X index query time improvement.
* ~25% faster index build time.
* 4X savings on index storage.
* Vectors with 8,000 dimensions can be supported with quantization.
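For readers unfamiliar with the technique, here is a minimal sketch of what 8-bit scalar quantization (SQ8) does. The per-dimension min/max training scheme and the function names here are illustrative assumptions, not this PR's actual implementation:

```python
# Illustrative SQ8 sketch (NOT the PR's code): map each float component
# into [0, 255] using per-dimension min/max learned from a training
# sample, storing 1 byte instead of 4 per dimension.

def train_sq8(vectors):
    """Learn per-dimension min and max over the training sample."""
    dims = len(vectors[0])
    mins = [min(v[d] for v in vectors) for d in range(dims)]
    maxs = [max(v[d] for v in vectors) for d in range(dims)]
    return mins, maxs

def encode_sq8(v, mins, maxs):
    """Quantize one vector to one byte per dimension."""
    codes = []
    for x, lo, hi in zip(v, mins, maxs):
        span = (hi - lo) or 1.0  # avoid div-by-zero on constant dimensions
        codes.append(min(255, max(0, round((x - lo) / span * 255))))
    return bytes(codes)

def decode_sq8(codes, mins, maxs):
    """Reconstruct an approximate float vector from the byte codes."""
    return [lo + c / 255 * (hi - lo) for c, lo, hi in zip(codes, mins, maxs)]
```

The 4X storage figure falls out directly: one byte per dimension instead of four, at the cost of a bounded per-dimension reconstruction error of at most half a quantization step.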
L2 distance

```sql
CREATE INDEX ON items USING ivf (embedding vector_l2_ops) WITH (lists = 100, quantizer = 'SQ8');
```
Why is this implemented as a new index access method instead of as an option on the existing `ivfflat` code? Having it as an option on the existing index would make it simpler for users to manage their indexes.
We discussed this with @ankane. Our understanding is that "flat" means no encoding; here are some examples from Milvus and Faiss, and it would be good to be consistent with them.

Agree that we should make it simpler for users to use. We think `ivf` with a `quantizer` option would provide more flexibility; we could also support `ivf WITH (quantizer = 'flat')` if needed.
Or, as another example:

```sql
CREATE INDEX ON items USING ivf (embedding vector_l2_ops) WITH (lists = 100); -- defaults to quantizer = 'flat' or quantizer = NULL, or however one wants to represent "flat"
CREATE INDEX ON items USING ivf (embedding vector_l2_ops) WITH (lists = 100, quantizer = 'SQ8');
```

That said, given that the `ivfflat` index AM is out there, we do need to be careful about introducing new access methods. Effectively we need to treat `ivfflat` as if it's not going away. Maybe `ivf` becomes the preferred choice when creating an IVF index and the default is to leverage the `ivfflat` infrastructure.
> Maybe ivf becomes the preferred choice when creating an IVF index and the default is to leverage the ivfflat infrastructure.

We discussed the same proposal with @ankane as well. I can update the PR to support `quantizer='flat'`, aliasing to `ivfflat`. +1 on using `ivf` as the preferred choice.
Thanks for the proposal! I have a few general comments:

Do you have any tests that you ran against a known data set (e.g. ANN Benchmarks) that show how this implementation compares against
I have the same question as @jkatz - why a new index type, why not the "ops" part, so
As for recall - at least for some ANN Benchmarks datasets - there should not be any difference, as for example the SIFT 128 dataset has only integer values in the range 0 ... 218 in its vectors, though still stored as FP32.
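A quick sketch of that point (the sample values are illustrative, assuming integer-valued FP32 components as described above for SIFT):

```python
# Integer-valued FP32 components in [0, 218] (as in SIFT) fit entirely in
# one unsigned byte, so an 8-bit representation loses nothing on such data.
vec_fp32 = [0.0, 17.0, 218.0, 64.0]         # stored as 4 bytes per component
vec_u8 = bytes(int(x) for x in vec_fp32)    # 1 byte per component (4X smaller)
restored = [float(b) for b in vec_u8]
assert restored == vec_fp32                 # exact round trip, zero recall impact
```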
Thanks for your comments! And yes, we ran the ANN benchmark with various datasets, an example gain from Deep1B on
Hi @bohanliu5, thanks for the PR! I really appreciate all the work. It looks like this introduces a lot of complexity to the code. I think there's some that can be removed (using the existing index type, no cross page storage at first), but I'm also concerned that there's enough that can't (which may not justify the benefits right now). I'd like to see how the performance and complexity compare to product quantization, which I'm planning to focus on after HNSW, so I think it makes sense to wait on this.
Does scalar quantization only make sense with the IVF index, or can it be used with HNSW too?
@hlinnaka SQ is a general technique to reduce the number of bytes required to store a vector, so the short answer is yes. However, compared to product quantization (PQ), SQ can only reduce the storage so much, as you can ultimately only shave off so many bits before you lose too much information. I agree with Andrew's analysis above (in many ways), in particular around focusing on PQ first, which should have a more dramatic effect on reducing memory consumption, though we'd have to test for the impact on recall.
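To make the SQ-vs-PQ storage point concrete, some rough back-of-the-envelope arithmetic (the subquantizer count `m = 16` is an illustrative assumption, not a figure from this PR):

```python
# Illustrative per-vector storage arithmetic for a 128-dim FP32 vector
# (numbers are a sketch, not measurements from the PR).
dim = 128
fp32_bytes = dim * 4   # 512 bytes uncompressed
sq8_bytes = dim * 1    # 128 bytes: SQ8 is a fixed 4X, one byte per dimension
m = 16                 # assumed number of PQ subquantizers (tunable)
pq_bytes = m * 1       # 16 bytes: PQ with 256 centroids per subquantizer
print(fp32_bytes // sq8_bytes, fp32_bytes // pq_bytes)  # prints: 4 32
```

This is why PQ can compress far beyond SQ's fixed 4X: its ratio scales with the number of subquantizers rather than the dimension count, at the cost of a lossier encoding.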