[Bug]: Incorrect cosine similarity returned in the search results #33072
Comments
/assign @liliu-z |
I don't quite understand it. |
I think searching for an exact vector against itself should return 0 as the distance, but it returns -1 with cosine. |
For an exact match the value should be 1; 0 means not related at all. |
If the vector points in the opposite direction, then the distance is -1. |
Yes, that's exactly the problem. The values I get back from the search query are incorrect. If you look more closely at the example I provided above, I've added three vectors to the database (
Those values are incorrect for both cosine similarity and cosine distance. From the behaviour I'm seeing, they represent cosine similarity multiplied by -1. But... why? Here is a snippet of code using scipy & scikit-learn computing those metrics on the same vectors:
>>> from scipy.spatial import distance
>>> distance.cosine([-1, -1, -1, -1], [1, 1, 1, 1])
2.0
>>> distance.cosine([-1, -1, -1, -1], [1, 1, -1, -1])
1.0
>>> distance.cosine([-1, -1, -1, -1], [-1, -1, -1, -1])
0.0
>>> from sklearn.metrics.pairwise import cosine_similarity
>>> cosine_similarity([[-1, -1, -1, -1]], [[1, 1, 1, 1]])
array([[-1.]])
>>> cosine_similarity([[-1, -1, -1, -1]], [[1, 1, -1, -1]])
array([[0.]])
>>> cosine_similarity([[-1, -1, -1, -1]], [[-1, -1, -1, -1]])
array([[1.]]) |
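To make the three candidate readings of the `distance` field easier to compare, here is a minimal standard-library sketch on the same vectors as the scipy / scikit-learn session above. The helper names `cosine_similarity` and `cosine_distance` are defined here for illustration only; this is not how Milvus computes the metric internally, and the "negated" column only mirrors the observation that the returned values look like similarity multiplied by -1.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between a and b: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Cosine distance as scipy defines it: 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


query = [-1, -1, -1, -1]
stored = [
    [1, 1, 1, 1],      # opposite direction to the query
    [1, 1, -1, -1],    # orthogonal to the query
    [-1, -1, -1, -1],  # identical to the query
]

for vector in stored:
    sim = cosine_similarity(query, vector)
    # Columns: similarity, distance (1 - similarity), negated similarity.
    print(vector, sim, cosine_distance(query, vector), -sim)
```

Only the first column matches the expected behaviour described in this issue (1 for the identical vector, 0 for the orthogonal one, -1 for the opposite one); the last column matches the values reported as incorrect.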
/assign @liliu-z |
Tried in Knowhere & Milvus side, cannot reproduce this. Will try milvus-lite |
What versions of Milvus packages did you use for your reproduction? I'll try to bump my environment to the latest version and see if it's still there. |
I've just reinstalled my environment from scratch with However, I've just upgraded both of these packages to the latest available versions:
$ pip list | grep milvus
milvus-lite 2.4.4
pymilvus 2.4.3
And it looks like that has fixed it! 🙌
[
[
{
"id": 449917763324477442,
"distance": 1.0,
"entity": {
"embedding": [
-1.0,
-1.0,
-1.0,
-1.0
]
}
},
{
"id": 449917763324477441,
"distance": 0.0,
"entity": {
"embedding": [
1.0,
1.0,
-1.0,
-1.0
]
}
},
{
"id": 449917763324477440,
"distance": -1.0,
"entity": {
"embedding": [
1.0,
1.0,
1.0,
1.0
]
}
}
]
]
Has anything changed there? Or is it just the combination of packages I've used before? |
This bug was fixed in milvus-lite 2.4.4. |
That's great! Thank you for your help! 🙌 |
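For anyone landing here with the same symptom, a quick way to check whether the local environment already has the release mentioned above is to read the installed package version. This is a rough sketch using only the standard library: `parse_version`, `has_cosine_fix`, and `installed_version` are hypothetical helper names, and the only fact taken from this thread is that the fix is in milvus-lite 2.4.4. Which earlier versions are affected is not stated here.

```python
from importlib import metadata
from itertools import takewhile

# Version reported in this thread as containing the fix.
FIXED_IN = (2, 4, 4)


def parse_version(text):
    """Turn '2.4.4' into (2, 4, 4), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def has_cosine_fix(version_text):
    """True if the given milvus-lite version is at least FIXED_IN."""
    return parse_version(version_text) >= FIXED_IN


def installed_version(package="milvus-lite"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("milvus-lite is not installed in this environment")
else:
    print("milvus-lite", version, "includes fix:", has_cosine_fix(version))
```

If the check reports an older version, upgrading milvus-lite (and pymilvus alongside it, as done above) is the step that resolved the issue in this thread.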
Is there an existing issue for this?
Environment
Current Behavior
It looks like the distance value returned from a search on a collection of vectors with the cosine similarity metric is wrong. It's the opposite of what it should be, currently giving us:
- -1 for proportional vectors,
- 0 for orthogonal vectors,
- 1 for opposite vectors.
Expected Behavior
I would expect it to return the value of the cosine similarity, according to the definition, which is:
- 1 for proportional vectors,
- 0 for orthogonal vectors,
- -1 for opposite vectors.
Steps To Reproduce
Here is a full reproduction script:
My environment:
Milvus Log
No response
Anything else?
No response