Idea: Cohere Wikipedia Dataset #393
Comments
Good idea! Do you want to add it to https://github.com/erikbern/ann-benchmarks/blob/main/ann_benchmarks/datasets.py (for English)? I'm about to run a new round of benchmarks so we could include that as one dataset.
I'm pretty new to this, so it would probably take me some time to get it working 😬 I may give it a try next week, if nobody else does.
Ok, no rush, I can also take a look at it. But you're very welcome to look at it too, in case I don't have time!
I believe Cohere's recently released Wikipedia Embedding Archives could be a good addition to the benchmark datasets.
The multilingual nature of the dataset is worth noting.