Add 8-bit scalar quantization support for IVF index. #231

Open · wants to merge 1 commit into master

Conversation

bohanliu5

It brings the following benefits on top of the existing ivfflat index:

  • Up to 2X index query time improvement.
  • ~25% faster index build time.
  • 4X savings on index storage.
  • Vectors with 8,000 dimensions can be supported with quantization.

L2 distance

```sql
CREATE INDEX ON items USING ivf (embedding vector_l2_ops) WITH (lists = 100, quantizer = 'SQ8');
```
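For reference, a query against this index follows the usual pgvector pattern; the statement below is an illustrative sketch (assuming the `items` table and `embedding` column from the example above), not part of this PR's diff:

```sql
-- Nearest neighbors by L2 distance; <-> is pgvector's L2 distance operator,
-- and the ORDER BY ... LIMIT form lets the planner use the IVF index.
SELECT id FROM items ORDER BY embedding <-> '[1,2,3]' LIMIT 5;
```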
Contributor

Why is this implemented as a new index access method instead of as an option on the existing ivfflat code? Making it an option on the existing index would make it simpler for users to manage their indexes.

Author

We discussed this with @ankane. Our understanding is that "flat" means no encoding (there are examples of this usage in Milvus and Faiss), and it would be good to be consistent with that.

Agree that we should make it simpler for users. We think `ivf` with a quantizer option would provide more flexibility; we could also support `ivf WITH (quantizer = 'flat')` if needed.

Contributor

Or, as other examples:

```sql
CREATE INDEX ON items USING ivf (embedding vector_l2_ops) WITH (lists = 100); -- defaults to quantizer = 'flat', or NULL, or however one wants to represent "flat"
CREATE INDEX ON items USING ivf (embedding vector_l2_ops) WITH (lists = 100, quantizer = 'SQ8');
```

That said, given that the `ivfflat` index AM is already out there, we do need to be careful about introducing new access methods. Effectively, we need to treat `ivfflat` as if it's not going away. Maybe `ivf` becomes the preferred choice when creating an IVF index and the default is to leverage the `ivfflat` infrastructure.

Author

> Maybe `ivf` becomes the preferred choice when creating an IVF index and the default is to leverage the `ivfflat` infrastructure.

We discussed the same proposal with @ankane as well. I can update the PR to support `quantizer = 'flat'`, aliasing to `ivfflat`. +1 on using `ivf` as the preferred choice.
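For illustration, the aliasing described above could look like the following (a sketch only; the exact spelling of the option is still being settled in this thread):

```sql
-- 'flat' would mean no encoding, i.e. the same behavior as today's ivfflat index.
CREATE INDEX ON items USING ivf (embedding vector_l2_ops) WITH (lists = 100, quantizer = 'flat');
```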

@jkatz
Contributor

jkatz commented Aug 17, 2023

Thanks for the proposal! I have a few general comments:

  1. Why does this proposal implement the quantizer as a separate index access method? It'd be more convenient if quantization were an option for the existing ivfflat index.
  2. Generally, I'm not in favor of making specific performance claims (e.g. 2x faster) in documentation, as this can vary based upon workload. With ANN searches, it's also important to discuss the performance / recall tradeoff.

Do you have any tests that you ran against a known data set (e.g. ANN Benchmarks) that show how this implementation compares against ivfflat or others, specifically in:

  1. Performance/recall
  2. Index build time
  3. Index size

@postsql

postsql commented Aug 17, 2023

I have the same question as @jkatz: why a new index type, and why not use the "ops" part instead, i.e. `... vector_l2_ops_uint8) ...`, which looks like the correct integration point to me.
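Spelled out, that suggestion would look roughly like this (illustrative only; `vector_l2_ops_uint8` is a hypothetical operator class name from this comment, not something pgvector currently provides):

```sql
-- Quantization selected via the operator class rather than a new access method or option.
CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops_uint8) WITH (lists = 100);
```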

@postsql

postsql commented Aug 17, 2023

> Thanks for the proposal! I have a few general comments:
>
> 1. Why does this proposal implement the quantizer as a separate index access method? It'd be more convenient if quantization were an option for the existing ivfflat index.
> 2. Generally, I'm not in favor of making specific performance claims (e.g. 2x faster) in documentation, as this can vary based upon workload. With ANN searches, it's also important to discuss the performance / recall tradeoff.
>
> Do you have any tests that you ran against a known data set (e.g. ANN Benchmarks) that show how this implementation compares against ivfflat or others, specifically in:
>
> 1. Performance/recall
> 2. Index build time
> 3. Index size

As for recall, at least for some ANN Benchmarks datasets there should not be any difference: for example, the SIFT-128 dataset has only integer values in the range 0 ... 218 in its vectors, though they are still stored as FP32.

@bohanliu5
Author

Thanks for your comments!
Re 1: please see my comments above.
Re 2: point taken. Let me update the doc to be more specific about the performance/recall trade-off.

And yes, we ran the ANN benchmark with various datasets. An example gain from Deep1B on an Intel(R) Xeon(R) CPU @ 2.00GHz:

  • Index build time (single-threaded build prior to 0.5.0): 90s (ivfflat) down to 68s (with quantization).
  • Index size: 3903MB (ivfflat) down to 1115MB.
  • Recall loss varies with datasets, but overall we saw a 1-1.5% recall loss compared to ivfflat at the same probes when vectors are not normalized (i.e., L2 distance), and less than 1% for normalized cases.

@ankane
Member

ankane commented Aug 18, 2023

Hi @bohanliu5, thanks for the PR! I really appreciate all the work.

It looks like this introduces a lot of complexity to the code. I think there's some that can be removed (using the existing index type, no cross page storage at first), but I'm also concerned that there's enough that can't (which may not justify the benefits right now).

I'd like to see how the performance and complexity compare to product quantization, which I'm planning to focus on after HNSW, so I think it makes sense to wait on this.

@hlinnaka
Contributor

hlinnaka commented Sep 8, 2023

Does scalar quantization only make sense with an IVF index, or can it be used with HNSW too?

@jkatz
Contributor

jkatz commented Sep 8, 2023

@hlinnaka SQ is a general technique to reduce the number of bytes required to store a vector, so the short answer is yes.

However, compared to product quantization (PQ), SQ can only reduce the storage so much, since you can only drop so many bits before you lose too much information. I agree with Andrew's analysis above (in many ways), in particular around focusing on PQ first, which should have a more dramatic effect on reducing memory consumption, though we'd have to test the impact on recall.
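For background, a common formulation of 8-bit scalar quantization (the exact scheme used in this PR isn't spelled out in the thread) maps each float32 component to an integer in [0, 255] using trained min/max bounds, so a d-dimensional vector shrinks from 4d bytes to roughly d bytes, which lines up with the 4X storage figure in the description:

```math
q_i = \mathrm{round}\!\left(255 \cdot \frac{x_i - \min_i}{\max_i - \min_i}\right),
\qquad
\hat{x}_i = \min_i + \frac{q_i}{255}\,\bigl(\max_i - \min_i\bigr)
```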
