Default normalization in distances is counterintuitive (or wrong) #670

Open
mcschmitz opened this issue Oct 16, 2023 · 1 comment
Labels: bug, documentation
Milestone: v3.0

@mcschmitz

Not a real bug, and maybe it's just personal preference, but I feel like the normalization in several distances is counterintuitive.
For example, the documentation for CosineSimilarity says:

> This class is equivalent to DotProductSimilarity(normalize_embeddings=True).

Which is of course correct; however, the default DotProductSimilarity itself normalizes the input vectors.
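
To make the overlap concrete, here is a minimal plain-Python sketch of a dot product with and without L2 normalization. It is not the library's code: the helpers `l2_normalize` and `dot` and the example vectors are made up for illustration, and only the `normalize_embeddings` name comes from the docs quoted above.

```python
import math


def l2_normalize(v):
    """Scale v to unit L2 norm (assumes v is not the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical example vectors: same direction, different lengths.
a = [3.0, 4.0]
b = [6.0, 8.0]

# Without normalization the result grows with the vector lengths.
raw_dot = dot(a, b)

# With normalization the result depends only on direction (cosine similarity).
cosine = dot(l2_normalize(a), l2_normalize(b))

print(raw_dot)                     # 50.0
print(math.isclose(cosine, 1.0))   # True
```

A dot-product similarity that normalizes its inputs by default would return the second value here, which is why it ends up indistinguishable from a cosine similarity.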

Also, the documentation for LpDistance says:

> With default parameters, this is the Euclidean distance.

This is not true, as the Euclidean distance operates on unnormalized vectors, whereas LpDistance normalizes its inputs by default.
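
A small plain-Python sketch of the difference, under the assumption that "normalization" means scaling each vector to unit L2 norm before the distance is taken. The helpers `l2_normalize` and `lp_distance` and the example vectors are hypothetical and only illustrate the math; they are not the library's implementation.

```python
import math


def l2_normalize(v):
    """Scale v to unit L2 norm (assumes v is not the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def lp_distance(a, b, p=2):
    """Lp distance between two equal-length vectors; p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


# Hypothetical example vectors.
a = [3.0, 0.0]
b = [0.0, 4.0]

# Euclidean distance on the raw vectors.
euclidean = lp_distance(a, b)

# The same p=2 distance after L2-normalizing both inputs.
normalized = lp_distance(l2_normalize(a), l2_normalize(b))

print(euclidean)                                  # 5.0
print(math.isclose(normalized, math.sqrt(2.0)))   # True
```

The two results differ (5.0 versus roughly 1.414), so a p=2 distance that normalizes first is the Euclidean distance between the unit-length versions of the vectors, not between the vectors themselves.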

So maybe it is a bug after all?

@mcschmitz changed the title from "Normalization in distances is counterintuitive" to "Default normalization in distances is counterintuitive (or wrong)" on Oct 16, 2023
@KevinMusgrave
Owner

> Which is of course correct; however, the default DotProductSimilarity itself normalizes the input vectors.

Hmm, yeah it would make more sense for DotProductSimilarity to not normalize the vectors. That might be a surprising change for anyone using it currently, so I will leave that for v3.0.

> Also, the documentation for the LpDistance says...

Thanks for pointing that out, I will update the docs.

@KevinMusgrave added the documentation label on Oct 17, 2023
@KevinMusgrave added this to the v3.0 milestone on Oct 17, 2023
@KevinMusgrave added the bug label on Oct 17, 2023