Bug: Issues index dtype=i8 with Inner Product Metrics #405
Hi @keshusharmamrt! Can you please try the Cosine distance and share the dtype and shape of the embeddings?
Hi @ashvardanian, thanks for the prompt reply. I tried the Cosine distance too, but it also doesn't seem to work perfectly, as I was getting keys via
I think the issue might be in the distribution that produces those embeddings. USearch has no learned quantization; all it does is linear scaling from [-1.0, 1.0] to integers in [-100, 100]. If you know that your values won't properly scale into that range, you may want to normalize them somehow, or just directly convert to int8 and later pass to USearch. That should work 🤗

PS: I'd appreciate a PR for the main README, if you think it makes sense to list it there 🤗
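For clarity, a minimal sketch of that scaling in NumPy (the [-100, 100] range is taken from this comment; the exact constant inside USearch may differ):

```python
import numpy as np

def scale_to_i8(embeddings: np.ndarray) -> np.ndarray:
    # Illustration of the linear scaling described above: values are
    # clipped to [-1.0, 1.0] and mapped to integers in [-100, 100].
    clipped = np.clip(embeddings, -1.0, 1.0)
    return np.round(clipped * 100).astype(np.int8)

# Anything outside [-1, 1] saturates, so poorly scaled embeddings lose information.
print(scale_to_i8(np.array([-2.0, -0.5, 0.31, 1.5])))  # [-100  -50   31  100]
```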
Hi, thanks for your suggestion, but it doesn't seem to work either. Sharing an example code snippet which demonstrates the issue (using random uniform embeddings).
I get the following observations by running the above code:
Is this expected when quantizing? If one is quantizing the embeddings, shouldn't one prefer the Hamming distance?
Reference: Li, S., Guo, H., Tang, X., Tang, R., Hou, L., Li, R., & Zhang, R. (2024). Embedding Compression in Recommender Systems: A Survey. ACM Computing Surveys, 56(5), 1-21.

Expanding on @keshusharmamrt's example code snippet:

```python
import numpy as np
from usearch.index import Index
from sentence_transformers.quantization import quantize_embeddings

# `embeddings` comes from the earlier snippet (random uniform vectors).
embeddings_f32 = embeddings.astype(np.float32)
embeddings_i8 = quantize_embeddings(embeddings_f32, precision="binary")
searcher1 = Index(
    ndim=embeddings_f32.shape[1],
    metric="hamming",
    dtype="i8",
    connectivity=32,
    expansion_add=128,
    expansion_search=64,
)
print("dim: ", embeddings_f32.shape[1])
searcher1.add(range(len(embeddings_f32)), embeddings_f32)
searcher2 = Index(
    ndim=embeddings_i8.shape[1],
    metric="hamming",
    dtype="i8",
    connectivity=32,
    expansion_add=128,
    expansion_search=64,
)
print("dim: ", embeddings_i8.shape[1])
searcher2.add(range(len(embeddings_i8)), embeddings_i8)
query = embeddings_f32[:5]
query_i8 = quantize_embeddings(query, precision="binary")
matches1 = searcher1.search(query, 4)
print("int8 quantization:")
print("Keys = \n", matches1.keys)
print("Distances = \n", matches1.distances)
matches2 = searcher2.search(query_i8, 4)
print("binary quantization:")
print("Keys = \n", matches2.keys)
print("Distances = \n", matches2.distances) Results: dim: 1024
dim: 128
int8 quantization:
Keys =
[[ 0 16000 12400 400]
[ 1 11801 14801 4801]
[ 2 2402 7002 19002]
[ 3 1403 1003 8603]
[ 4 17604 6404 14004]]
Distances =
[[ 0.0000 309.0000 329.0000 343.0000]
[ 0.0000 346.0000 377.0000 382.0000]
[ 0.0000 328.0000 332.0000 341.0000]
[ 0.0000 362.0000 366.0000 371.0000]
[ 0.0000 334.0000 340.0000 354.0000]]
binary quantization:
Keys =
[[ 0 18000 1800 16600]
[ 1 17601 10001 18201]
[ 2 3002 8002 12402]
[ 3 7603 13003 16403]
[ 4 8804 12604 2604]]
Distances =
[[ 0.0000 9.0000 10.0000 12.0000]
[ 0.0000 12.0000 12.0000 12.0000]
[ 0.0000 13.0000 14.0000 14.0000]
[ 0.0000 5.0000 7.0000 8.0000]
[ 0.0000 7.0000 8.0000 8.0000]]
```

Note that each query element is included in the corresponding matches, and that
@keshusharmamrt, your points are too close to each other to differentiate them with a simple scale-up. You'd need a custom quantization scheme. I'd recommend sticking to

```python
import numpy as np
from usearch.index import Index, MetricKind
np.set_printoptions(formatter={"float": "{: 0.4f}".format})
np.random.seed(42)
num_centroids = 200
num_examples = 100
num_embeddings = num_examples * num_centroids
dim = 1024
centroids = np.random.uniform(low=-1, high=1, size=(num_centroids, dim))
offsets = np.random.uniform(low=-0.05, high=0.05, size=(num_embeddings, dim))
centroids = np.tile(centroids, (num_examples, 1))
embeddings = centroids + offsets
embeddings = embeddings / np.linalg.norm(embeddings, axis=1)[:, None]
for dtype in ["f32", "f16", "i8"]:
    print("Running test for dtype = ", dtype)
    for metric in [MetricKind.IP, MetricKind.Cosine, MetricKind.L2sq]:
        searcher = Index(
            ndim=dim,
            metric=metric,
            dtype=dtype,
            connectivity=32,
            expansion_add=128,
            expansion_search=64,
        )
        print("Running test ", dtype, metric, searcher.hardware_acceleration)
        searcher.add(range(len(embeddings)), embeddings, log=True)
        matches = searcher.search(embeddings[:5], 4)
        print("Keys = \n", matches.keys)
        print("Distances = \n", matches.distances)
        exact_distances = 1 - (embeddings[:5] @ embeddings.T)
        exact_keys = exact_distances.argsort(axis=1)
        exact_distances.sort(axis=1)
        if metric == MetricKind.L2sq:
            exact_distances = 2 * exact_distances
        print("Exact Keys = \n", exact_keys[:, :4])
        print("Exact Distances = \n", exact_distances[:, :4])
```
@adolfogc, sadly the functionality of
I also believe that you may have wanted to use a different `dtype`:

```python
import numpy as np
from usearch.index import Index, MetricKind
np.set_printoptions(formatter={"float": "{: 0.4f}".format})
np.random.seed(42)
num_examples = 100
num_centroids = 200  # assumed; matches the previous snippet
num_embeddings = num_examples * num_centroids
dim = 1024
embeddings = np.random.uniform(low=-1, high=1, size=(num_embeddings, dim))
from sentence_transformers.quantization import quantize_embeddings
embeddings_f32 = embeddings.astype(np.float32)
embeddings_b1 = quantize_embeddings(embeddings_f32, precision="binary").astype(np.uint8)
searcher_f32 = Index(
    ndim=dim,
    metric="cos",
    dtype="f32",
)
print("dim: ", embeddings_f32.shape[1])
searcher_f32.add(range(len(embeddings_f32)), embeddings_f32)
searcher_b1 = Index(
    ndim=dim,
    metric="hamming",
    dtype="b1",
)
print("dim: ", embeddings_b1.shape[1])
searcher_b1.add(range(len(embeddings_b1)), embeddings_b1)
queries_f32 = embeddings_f32[:5]
queries_b1 = embeddings_b1[:5]
matches_f32 = searcher_f32.search(queries_f32, 4)
print("Original:")
print("Keys = \n", matches_f32.keys)
print("Distances = \n", matches_f32.distances)
matches_b1 = searcher_b1.search(queries_b1, 4)
print("Binary:")
print("Keys = \n", matches_b1.keys)
print("Distances = \n", matches_b1.distances) I think I should update the main README snippet. Will mark you as a co-author 🤗 |
@ashvardanian thank you for pointing that out! In that case, I think the snippet could use the `ubinary` precision:

```python
quantize_embeddings(embeddings_f32, precision="ubinary")
```

Defined like so: I thought that since the

Also, perhaps this (using

What do you think? BTW, superb library, thank you for making it available!
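For reference, a sketch of what the two precisions are assumed to produce (sign bits packed with `np.packbits`; check the sentence-transformers source for the authoritative definition):

```python
import numpy as np

def to_ubinary(embeddings: np.ndarray) -> np.ndarray:
    # One bit per dimension, packed into uint8 values --
    # this layout matches USearch's `b1` dtype.
    return np.packbits(embeddings > 0, axis=-1)

def to_binary(embeddings: np.ndarray) -> np.ndarray:
    # The same bits, shifted into the signed int8 range.
    return (np.packbits(embeddings > 0, axis=-1).astype(np.int16) - 128).astype(np.int8)
```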
Hey @ashvardanian, @keshusharmamrt has prepared a very simple example which shows that for the same data, using different metrics with int8 quantization gives unexpected results. It looks like there is a bug in the int8 quantization (maybe only in the Python client). You can see that for
Our results for 1M embeddings of size 1024:

For FAISS we use the index setup from the usearch benchmarks repository. As you can see, the results for USearch int8 are far from the FAISS and Qdrant equivalents.
As suggested by @adolfogc (in unum-cloud/usearch#405) the USearch implementation should use `b1` instead of `i8` for binary representations.
@adolfogc, sure, that makes a lot of sense! I've proposed the patch.
@kmkolasinski and @keshusharmamrt, thank you for the benchmarks and comparisons, those are very helpful! Can I also ask you to report the
On older versions I've made this mistake.

```cpp
// Unfortunately we can't use the `_mm512_dpbusd_epi32` intrinsics here either,
// as it's asymmetric with respect to the sign of the input arguments:
//     Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
// So we have to use the `_mm512_dpwssd_epi32` intrinsics instead, upcasting
// to 16-bit beforehand.
ab_i32s_vec = _mm512_dpwssd_epi32(ab_i32s_vec, a_vec, b_vec);
a2_i32s_vec = _mm512_dpwssd_epi32(a2_i32s_vec, a_vec, a_vec);
b2_i32s_vec = _mm512_dpwssd_epi32(b2_i32s_vec, b_vec, b_vec);
```
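To see the asymmetry numerically, here is a scalar re-enactment in NumPy (illustrative arithmetic, not SimSIMD code): `dpbusd` zero-extends its first operand and sign-extends its second, so swapping the arguments changes the result for negative bytes, while the 16-bit `dpwssd` path is symmetric.

```python
import numpy as np

a = np.array([-1], dtype=np.int8)  # bit pattern 0xFF
b = np.array([2], dtype=np.int8)

# dpbusd-style product: first operand zero-extended (treated as unsigned),
# second operand sign-extended (treated as signed).
dpbusd_ab = a.astype(np.uint8).astype(np.int32) * b.astype(np.int32)  # 255 * 2 = 510
dpbusd_ba = b.astype(np.uint8).astype(np.int32) * a.astype(np.int32)  # 2 * -1 = -2

# dpwssd-style product: both operands sign-extended to 16 bits first.
dpwssd = a.astype(np.int16).astype(np.int32) * b.astype(np.int16).astype(np.int32)  # -2

print(dpbusd_ab, dpbusd_ba, dpwssd)  # [510] [-2] [-2]
```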
Hi @ashvardanian, thanks for your support.
That means this kernel is used. I've just tried implementing a few tests for it on random data, but can't yet see a noticeable difference in distance calculations between that and the serial code. Any chance you have an example of two arrays for which you get a big error, calling:

```python
from simsimd import cos

cos(a, b)
```
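For anyone who wants to reproduce such a check, a minimal sketch comparing `cos` from the snippet above against a float64 NumPy reference on random int8 vectors:

```python
import numpy as np
from simsimd import cos

def cos_reference(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine distance computed in float64 as a reference.
    a, b = a.astype(np.float64), b.astype(np.float64)
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(42)
a = rng.integers(-100, 100, size=1024, dtype=np.int8)
b = rng.integers(-100, 100, size=1024, dtype=np.int8)

print(abs(cos(a, b) - cos_reference(a, b)))  # should be close to zero
```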
Hey, we will check it later. I don't know how it is used later in USearch, but we see the problem with
On integer representations like
Hi, I also tried to use

Clearly, with
Hi @ashvardanian, if int8 requires some special treatment on the user's side, could you provide an example of how to do it, or maybe there are some docs for it?
In other words, I wonder what we can do on our side to get results similar to other frameworks for int8? There are two options: either we are doing something wrong, or there is indeed some problem with the int8 quantization.
Hi @ashvardanian, do you have any update regarding this?
I'm a bit overwhelmed by other USearch-related projects and haven't had the chance to investigate deeper, @keshusharmamrt 😔
It's true, most other libraries come with a lot of options for different quantization schemes. We don't currently aim to enforce any of them or implement them in the C++ layer. But maybe we can add a couple of simple quantization schemes as utilities in the Python layer for now. Any chance you can contribute, @kmkolasinski?
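For illustration, such a utility could be as simple as the following sketch (a hypothetical helper, not an existing USearch API):

```python
import numpy as np

def quantize_i8(embeddings: np.ndarray) -> np.ndarray:
    # Hypothetical Python-layer utility: scale the batch by its largest
    # absolute value so the full int8 range is used, then round and cast.
    scale = 127.0 / np.abs(embeddings).max()
    return np.round(embeddings * scale).astype(np.int8)
```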
Hi, I can't promise it. The results are weird and at this moment it looks like there is some bug.
We can check for bugs in the SimSIMD kernels, but they may already be sufficiently covered with tests.
OK, last question: let's say I would like to use int8 quantization and the dot product metric. Are there some requirements on the vectors to make it work properly? Do you have some example code which we could use to benchmark such a scenario on our data?
Sure, the most obvious benchmark/example would be to generate random integers (instead of quantizing random floats), construct the index, and measure self-recall. That should work fine.
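A minimal sketch of that benchmark, assuming random int8 vectors are passed to the index directly (sizes are illustrative):

```python
import numpy as np
from usearch.index import Index, MetricKind

rng = np.random.default_rng(42)
vectors = rng.integers(-100, 100, size=(10_000, 1024), dtype=np.int8)

index = Index(ndim=1024, metric=MetricKind.IP, dtype="i8")
index.add(np.arange(len(vectors)), vectors)

# Self-recall: query with the indexed vectors themselves and check that
# each vector's own key comes back as the nearest match.
matches = index.search(vectors, 1)
self_recall = (matches.keys.flatten() == np.arange(len(vectors))).mean()
print(f"self-recall@1 = {self_recall:.4f}")
```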
Hi @ashvardanian @kimihailv, we tried one last thing. We took the benchmarking script from https://github.com/unum-cloud/usearch-benchmarks/tree/main and tried to compare the metrics on a smaller dataset, which is
In the benchmark script, the ground truth is loaded like this:

```python
groundtruth = load_matrix(groundtruth_path)[:, :1]
```

Later you use it as:

```python
recall_at_one = recall(nn_ids, groundtruth, ats=(1,))
```

but this should be replaced with:

```python
recall_at_one = recall(nn_ids, groundtruth[:, 0], ats=(1,))
```

or the `recall` function should be fixed, as it adds a new axis to the tensor, causing weird behaviour. A small example which demonstrates the issue:

```python
from tqdm import tqdm
import numpy as np
from usearch.index import Index
from usearch.io import load_matrix
from utils.metrics import recall
train_embeddings = load_matrix("data/base.10M.fbin")[:500_000]
query_embeddings = load_matrix("data/query.10K.fbin")
def best_match(query):
    return (train_embeddings @ query[:, None]).ravel().argmax()

groundtruths = []
for query in tqdm(query_embeddings, desc="Computing ground truths"):
    best_index = best_match(query)
    groundtruths.append(best_index)
groundtruths = np.array(groundtruths)
print("Benchmark f16 index")
i_half = Index(ndim=96, metric="cos", dtype="f16")
i_half.add(None, train_embeddings, log=True)
result = i_half.search(query_embeddings, 1, log=True)
recall_at_one = recall(result.keys, groundtruths, ats=(1,))
print(f" >> recall@1 = {recall_at_one[0]:.5f}")
print("Benchmark i8 index")
i_quarter = Index(ndim=96, metric="cos", dtype="i8")
i_quarter.add(None, train_embeddings, log=True)
result = i_quarter.search(query_embeddings, 1, log=True)
recall_at_one = recall(result.keys, groundtruths, ats=(1,))
print(f" >> recall@1 = {recall_at_one[0]:.5f}") After running this code you will see
I think this is clear evidence that there is a bug in the int8 index, and we cannot simply solve it by adding some custom wrapper for the index class. Hope this helps somehow.
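As a side note on the `recall` pitfall above, a hypothetical minimal reproduction (not the actual `utils.metrics.recall`) of how an extra axis silently breaks the comparison through broadcasting:

```python
import numpy as np

nn_ids = np.array([[0], [1], [2]])  # top-1 search results, shape (3, 1)
labels_1d = np.array([0, 9, 2])     # shape (3,)
labels_2d = labels_1d[:, None]      # shape (3, 1), as loaded with [:, :1]

def recall_at_1(nn_ids, expected):
    # A helper that adds an axis to its `expected` argument.
    return (nn_ids == np.asarray(expected)[:, None]).any(axis=-1).mean()

print(recall_at_1(nn_ids, labels_1d))  # 0.666... -- correct
print(recall_at_1(nn_ids, labels_2d))  # compares shape (3, 1) with (3, 1, 1) -- wrong
```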
@ashvardanian, final remarks and conclusions after spending more time on this issue:
From this you can see what IP is actually computing (see `usearch/include/usearch/index_plugins.hpp`, line 977 at commit 5ea48c8).
2.12.0.
Add: Swift bindings & `int8` dot-products

With these changes we can make USearch behavior more predictable for `int8` inputs.

unum-cloud/usearch#405
unum-cloud/usearch#419
Describe the bug
Hi, I am using USearch with the int8 index dtype.
I have 928K embeddings which I add to both an i8 and an f16 index with the Inner Product metric, and then search the first 10 embeddings to get the top 10 matches for each.
With f16, the query vector itself shows up among its own top 10 matches, but with i8 it does not.
Also, when I change the metric from MetricKind.InnerProduct to MetricKind.L2sq, those vectors start appearing, but the distances are scaled differently (distances for IP are of the order of 0.61365813, and for L2sq around 4517; maybe this is also because of i8), which I am not able to understand.
Also, adding embeddings with MetricKind.InnerProduct and the i8 dtype is very slow.
Steps to reproduce
I use the following code to create and populate the index with the i8 dtype:
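A minimal sketch of that setup (a hypothetical reconstruction from the description above; the exact original snippet may have differed):

```python
import numpy as np
from usearch.index import Index, MetricKind

# `embeddings` stands in for the 928K float vectors described above.
index = Index(
    ndim=embeddings.shape[1],
    metric=MetricKind.IP,  # Inner Product
    dtype="i8",
)
index.add(np.arange(len(embeddings)), embeddings)

# Search the first 10 embeddings for their top-10 matches.
matches = index.search(embeddings[:10], 10)
print(matches.keys)
print(matches.distances)
```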
I get the following results:
When I only change `metric=MetricKind.L2sq`, I start getting the expected embeddings (the same index as the first key).

Note: with other dtypes like f16 and f32, MetricKind.InnerProduct works completely fine.
Expected behavior
Ideally I should get something like this with i8 + MetricKind.InnerProduct:
The following is the result I get with MetricKind.L2sq:
USearch version
2.11.7
Operating System
Ubuntu 22.04
Hardware architecture
x86
Which interface are you using?
Python bindings
Contact Details
[email protected]
Are you open to being tagged as a contributor?
I am open to being mentioned in the project's `.git` history as a contributor.

Is there an existing issue for this?
Code of Conduct