Add 8-bit scalar quantization support for IVF index. #231

Open · wants to merge 1 commit into master

Conversation

bohanliu5

It brings the following benefits on top of the existing ivfflat index:

  • Up to 2X index query time improvement.
  • ~25% faster index build time.
  • 4X savings on index storage.
  • Vectors with 8,000 dimensions can be supported with quantization.

L2 distance

```sql
CREATE INDEX ON items USING ivf (embedding vector_l2_ops) WITH (lists = 100, quantizer = 'SQ8');
```
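For reference, a query against this index follows the usual pgvector pattern; the statement below is an illustrative sketch (assuming the `items` table and `embedding` column from the example above), not part of this PR's diff:

```sql
-- Nearest neighbors by L2 distance; <-> is pgvector's L2 distance operator,
-- and the ORDER BY ... LIMIT form lets the planner use the IVF index.
SELECT id FROM items ORDER BY embedding <-> '[1,2,3]' LIMIT 5;
```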
Contributor

Why is this implemented as a new index access method instead of as an option on the existing ivfflat code? Making it an option on the existing index would make it simpler for users to manage their indexes.

Author

We discussed this with @ankane. Our understanding is that "flat" means no encoding (there are examples of this usage in Milvus and Faiss), and it would be good to be consistent with that.

Agree that we should make it simpler for users. We think `ivf` with a quantizer option would provide more flexibility; we could also support `ivf WITH (quantizer = 'flat')` if needed.

Contributor

Or, as other examples:

```sql
CREATE INDEX ON items USING ivf (embedding vector_l2_ops) WITH (lists = 100); -- defaults to quantizer = 'flat', or NULL, or however one wants to represent "flat"
CREATE INDEX ON items USING ivf (embedding vector_l2_ops) WITH (lists = 100, quantizer = 'SQ8');
```

That said, given that the `ivfflat` index AM is already out there, we do need to be careful about introducing new access methods. Effectively, we need to treat `ivfflat` as if it's not going away. Maybe `ivf` becomes the preferred choice when creating an IVF index and the default is to leverage the `ivfflat` infrastructure.

Author

> Maybe `ivf` becomes the preferred choice when creating an IVF index and the default is to leverage the `ivfflat` infrastructure.

We discussed the same proposal with @ankane as well. I can update the PR to support `quantizer = 'flat'`, aliasing to `ivfflat`. +1 on using `ivf` as the preferred choice.
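For illustration, the aliasing described above could look like the following (a sketch only; the exact spelling of the option is still being settled in this thread):

```sql
-- 'flat' would mean no encoding, i.e. the same behavior as today's ivfflat index.
CREATE INDEX ON items USING ivf (embedding vector_l2_ops) WITH (lists = 100, quantizer = 'flat');
```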

@jkatz
Contributor

jkatz commented Aug 17, 2023

Thanks for the proposal! I have a few general comments:

  1. Why does this proposal implement the quantizer as a separate index access method? It'd be more convenient if quantization were an option for the existing ivfflat index.
  2. Generally, I'm not in favor of making specific performance claims (e.g. 2x faster) in documentation, as this can vary based upon workload. With ANN searches, it's also important to discuss the performance / recall tradeoff.

Do you have any tests that you ran against a known data set (e.g. ANN Benchmarks) that show how this implementation compares against ivfflat or others, specifically in:

  1. Performance/recall
  2. Index build time
  3. Index size

@postsql

postsql commented Aug 17, 2023

I have the same question as @jkatz: why a new index type, and why not use the "ops" part instead, i.e. `... vector_l2_ops_uint8) ...`, which looks like the correct integration point to me.
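Spelled out, that suggestion would look roughly like this (illustrative only; `vector_l2_ops_uint8` is a hypothetical operator class name from this comment, not something pgvector currently provides):

```sql
-- Quantization selected via the operator class rather than a new access method or option.
CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops_uint8) WITH (lists = 100);
```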

@postsql

postsql commented Aug 17, 2023

> Thanks for the proposal! I have a few general comments:
>
> 1. Why does this proposal implement the quantizer as a separate index access method? It'd be more convenient if quantization were an option for the existing ivfflat index.
> 2. Generally, I'm not in favor of making specific performance claims (e.g. 2x faster) in documentation, as this can vary based upon workload. With ANN searches, it's also important to discuss the performance / recall tradeoff.
>
> Do you have any tests that you ran against a known data set (e.g. ANN Benchmarks) that show how this implementation compares against ivfflat or others, specifically in:
>
> 1. Performance/recall
> 2. Index build time
> 3. Index size

As for recall, at least for some ANN Benchmarks datasets there should not be any difference: for example, the SIFT-128 dataset has only integer values in the range 0 ... 218 in its vectors, though they are still stored as FP32.

@bohanliu5
Author

Thanks for your comments!
Re 1: please see my comments above.
Re 2: point taken. Let me update the doc to be more specific about the performance/recall trade-off.

And yes, we ran the ANN benchmark with various datasets. An example gain from Deep1B on an Intel(R) Xeon(R) CPU @ 2.00GHz:

  • Index build time (single-threaded build prior to 0.5.0): 90s (ivfflat) down to 68s (with quantization).
  • Index size: 3903MB (ivfflat) down to 1115MB.
  • Recall loss varies with datasets, but overall we saw a 1-1.5% recall loss compared to ivfflat at the same probes when vectors are not normalized (i.e., L2 distance), and less than 1% for normalized cases.

@ankane
Member

ankane commented Aug 18, 2023

Hi @bohanliu5, thanks for the PR! I really appreciate all the work.

It looks like this introduces a lot of complexity to the code. I think there's some that can be removed (using the existing index type, no cross page storage at first), but I'm also concerned that there's enough that can't (which may not justify the benefits right now).

I'd like to see how the performance and complexity compare to product quantization, which I'm planning to focus on after HNSW, so I think it makes sense to wait on this.

@hlinnaka
Contributor

hlinnaka commented Sep 8, 2023

Does scalar quantization only make sense with an IVF index, or can it be used with HNSW too?

@jkatz
Contributor

jkatz commented Sep 8, 2023

@hlinnaka SQ is a general technique to reduce the number of bytes required to store a vector, so the short answer is yes.

However, compared to product quantization (PQ), SQ can only reduce the storage so much, since you can only drop so many bits before you lose too much information. I agree with Andrew's analysis above (in many ways), in particular around focusing on PQ first, which should have a more dramatic effect on reducing memory consumption, though we'd have to test the impact on recall.
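For background, a common formulation of 8-bit scalar quantization (the exact scheme used in this PR isn't spelled out in the thread) maps each float32 component to an integer in [0, 255] using trained min/max bounds, so a d-dimensional vector shrinks from 4d bytes to roughly d bytes, which lines up with the 4X storage figure in the description:

```math
q_i = \mathrm{round}\!\left(255 \cdot \frac{x_i - \min_i}{\max_i - \min_i}\right),
\qquad
\hat{x}_i = \min_i + \frac{q_i}{255}\,\bigl(\max_i - \min_i\bigr)
```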
