Are either int8 or fp8 vectors planned? #521
Some data:
@xfalcox Do you happen to have recall values? The sizes generally look good, and from eyeballing the data you showed the results look good, but it'd help to have some more numerical values 😄
While recall numbers are indeed the best, our feature is more "fuzzy" than that. Given a topic, I only need to find and show 5 topics that are related to the one the user just finished reading. So while you do want to show very good matches, most of the time there are far more than 5 "good enough" fits, and even using the bit vectors the results were acceptable in most cases. And this is with "just" 1024 dimensions; I'd expect it to be even better with one of the slower 4096-dimension models. I know I could generate a "correct" set using fp32 embeddings for each topic, but that would be more for my academic curiosity than something I'd use to justify a change in the product. Which brings me to my hopefulness that int8/fp8 would be a good enough fit, given that even bit produces somewhat acceptable results.
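For reference, the "correct set from fp32 embeddings" check described above amounts to recall@k: take the top-k topics by exact fp32 distance as ground truth and measure how many of them the quantized search also returns. A minimal sketch (the topic IDs are toy values, not real data):

```python
def recall_at_k(exact_ids, approx_ids, k=5):
    """Fraction of the exact fp32 top-k that the quantized search also returned."""
    return len(set(exact_ids[:k]) & set(approx_ids[:k])) / k

# Toy example: fp32 ground truth vs. a bit/int8-quantized ranking of the same topics.
exact = [3, 7, 1, 9, 4]   # top-5 topic IDs by fp32 cosine distance
approx = [3, 1, 7, 2, 9]  # top-5 topic IDs by quantized distance
print(recall_at_k(exact, approx))  # 0.8 — 4 of the 5 topics overlap
```

Order within the top-k is deliberately ignored here, which matches the "any 5 good enough fits" framing above.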
Hi @xfalcox, thanks for sharing! The data is really interesting. I've done some initial work for int8 vectors in the intvec branch, but want to focus on improving filtering before possibly adding more types.
I don't know if you came across this blog post, which is really interesting, but they also use int8 and bit quantization:
@jpbalarini I have read that before. It'll be great to see the results with pgvector now that v0.7.0 supports bit (binary) quantization (as well as quantization to fp16)! |
Recently I've run a benchmark and recall test on the new halfvec and bit types, and they both yielded impressive results.
All my tests were run against public data on https://meta.discourse.org/, testing which topics would be selected as the "Related Topics" we show at the end of a topic. Embeddings were bge-large-en-v1.5, computed via huggingface/text-embeddings-inference.

Our embeddings are all computed in bfloat16 already, so halfvec will cut our storage costs in half while losing us nothing. It's literally a free storage reduction.
On the bit front, a naive test that simply swapped vectors for bits and ran full scans without indexes did show visible recall changes, but the results were good enough to be used as-is without most users batting an eye.
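That naive bit test can be sketched outside the database too: keep one bit per dimension (here with a sign threshold, which I believe matches what pgvector's `binary_quantize` does, though that is an assumption) and compare vectors by Hamming distance:

```python
import numpy as np

def binary_quantize(v):
    """Keep one bit per dimension: 1 where the component is positive."""
    return (np.asarray(v, dtype=np.float32) > 0).astype(np.uint8)

def hamming(a, b):
    """Distance between two bit vectors = number of differing bits."""
    return int(np.count_nonzero(a != b))

a = binary_quantize([0.3, -1.2, 0.8, -0.1])  # -> [1 0 1 0]
b = binary_quantize([0.5, -0.9, -0.4, 0.2])  # -> [1 0 0 1]
print(hamming(a, b))  # 2
```

A 1024-dimension embedding shrinks from 4 KB of fp32 to 128 bytes this way, which is why the storage numbers look so dramatic even when recall dips.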
Since bit performs better than expected in our use case, I got interested in checking how something in between would perform, namely int8 or fp8.
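The "in between" idea can be illustrated with a simple symmetric int8 scheme: scale each vector by its largest magnitude into [-127, 127] and store one byte per dimension. This is just one possible scheme (real systems often calibrate the scale over the whole corpus rather than per vector), so treat it as a sketch, not how pgvector's intvec branch works:

```python
import numpy as np

def int8_quantize(v):
    """Per-vector symmetric int8 quantization: scale into [-127, 127]."""
    v = np.asarray(v, dtype=np.float32)
    scale = float(np.abs(v).max()) / 127.0 or 1.0  # guard the all-zero vector
    q = np.round(v / scale).astype(np.int8)
    return q, scale

q, scale = int8_quantize([0.3, -1.2, 0.8, -0.1])
print(q)                              # one byte per dimension
print(q.astype(np.float32) * scale)   # dequantized approximation of the input
```

At 1024 dimensions this is 1 KB per vector, a quarter of fp32 and half of halfvec, while keeping far more gradation than a single bit.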
Either way, thank you a lot for the new types and continued work on pgvector.