[Feature Request]: Auto-set distance metric based on embedding function #2128

Open
atroyn opened this issue May 3, 2024 · 2 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@atroyn
Contributor

atroyn commented May 3, 2024

Describe the problem

It's up to the user to select the distance metric they use with their collection. However, embedding models are trained with a predetermined similarity metric, so knowing the embedding function should tell us the required distance metric automatically. Chroma defaults to the L2 metric, which isn't right for many embedding models.
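
To illustrate why the metric matters, here is a minimal, self-contained sketch (plain Python, no Chroma calls) in which Euclidean (L2) distance and cosine distance pick different nearest neighbors for the same query. The vectors and the document names are made up for the example.

```python
import math


def l2_distance(a, b):
    # Euclidean (L2) distance between two equal-length vectors.
    return math.dist(a, b)


def cosine_distance(a, b):
    # 1 - cosine similarity; depends only on direction, not magnitude.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Toy data (illustrative only): one document points the same way as the
# query but is much longer; the other is short but points elsewhere.
query = [1.0, 0.0]
docs = {
    "same_direction_long": [10.0, 0.0],
    "different_direction_short": [1.0, 1.0],
}

nearest_l2 = min(docs, key=lambda name: l2_distance(query, docs[name]))
nearest_cos = min(docs, key=lambda name: cosine_distance(query, docs[name]))

print("L2 nearest:    ", nearest_l2)
print("cosine nearest:", nearest_cos)
```

With unnormalized embeddings the two metrics disagree here, so a model trained for cosine similarity can return different results when its collection is indexed with L2.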

Describe the proposed solution

When the user sets an embedding function on a collection, we should have a property on that EF which the collection reads to automatically select the right distance metric if available, unless it's overridden by collection params.

This is slightly hairy when loading a collection. We should error if you try to load with the wrong distance metric.
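
A rough sketch of how this proposal could look, written as standalone Python rather than against Chroma's actual classes. Every name here (`default_space`, `resolve_space`, `check_on_load`, `MetricMismatchError`, `CosineTrainedEF`) is hypothetical and only illustrates the idea. The load-time check reflects one possible reading of "error if you try to load with the wrong distance metric": it raises only when an explicit or EF-declared metric disagrees with what the collection was created with.

```python
from typing import Optional


class MetricMismatchError(ValueError):
    """Hypothetical error for loading a collection with the wrong metric."""


class EmbeddingFunction:
    # Hypothetical property: the metric the model was trained with,
    # or None when the embedding function does not declare one.
    default_space: Optional[str] = None


class CosineTrainedEF(EmbeddingFunction):
    # Stand-in for a model trained with cosine similarity.
    default_space = "cosine"


# The issue states that Chroma currently defaults to L2.
FALLBACK_SPACE = "l2"


def resolve_space(ef: Optional[EmbeddingFunction],
                  requested: Optional[str] = None) -> str:
    """Pick the metric at creation time: explicit params win, then the EF."""
    if requested is not None:
        return requested
    ef_space = getattr(ef, "default_space", None)
    return ef_space if ef_space is not None else FALLBACK_SPACE


def check_on_load(stored_space: str,
                  ef: Optional[EmbeddingFunction] = None,
                  requested: Optional[str] = None) -> str:
    """Raise if the caller's expected metric disagrees with the stored one."""
    expected = requested if requested is not None else getattr(
        ef, "default_space", None)
    if expected is not None and expected != stored_space:
        raise MetricMismatchError(
            f"collection uses {stored_space!r}, but {expected!r} was expected")
    return stored_space
```

For example, `resolve_space(CosineTrainedEF())` returns `"cosine"`, `resolve_space(CosineTrainedEF(), requested="ip")` returns `"ip"`, and `check_on_load("l2", CosineTrainedEF())` raises `MetricMismatchError`.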

Alternatives considered

Leaving it as-is.

Importance

would make my life easier

Additional Information

No response

@atroyn atroyn added enhancement New feature or request good first issue Good for newcomers labels May 3, 2024
@BalasubramanyamEvani

I can work on this. It would be great if you could share any pointers regarding which files I should focus on to implement this.

@atroyn
Contributor Author

atroyn commented May 13, 2024

@BalasubramanyamEvani thanks for proposing, we might want #2034 to land first.
