Add support for other models in AutoEval #44

Open
4 tasks
NivekT opened this issue Aug 1, 2023 · 12 comments

@NivekT
Collaborator

NivekT commented Aug 1, 2023

🚀 The feature

This is a good task for a new contributor

We have a few utility functions to perform AutoEval:

https://github.com/hegelai/prompttools/blob/main/prompttools/utils/autoeval.py
https://github.com/hegelai/prompttools/blob/main/prompttools/utils/autoeval_scoring.py
https://github.com/hegelai/prompttools/blob/main/prompttools/utils/expected.py

Currently, each of them tends to support only one model. Someone can refactor the code in each so that it supports multiple models. I would recommend making sure they all support the best-known models, such as GPT-4 and Claude 2.

We can even consider LLaMA, but that is less urgent.

Tasks

  • Update this file such that autoeval_binary_scoring can take in model as an argument. Let's make sure gpt-4 and claude-2 are both accepted and invoke the right completion function.
  • Same as the above but for this file and the function autoeval_scoring. OpenAI needs to be added here.
  • Same as the above but for this file and the function compute_similarity_against_model
  • Allow auto evaluation by multiple models (e.g. both gpt-4 and claude-2) at the same time
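
As a rough illustration of the tasks above, here is a minimal sketch of dispatching on a `model` argument and then scoring with several models at once. Everything in it is an assumption made for illustration: the function names `autoeval_binary_scoring_sketch` and `autoeval_with_models`, the `COMPLETION_FNS` registry, the prompt template, and the two stub completion functions (which stand in for real OpenAI and Anthropic calls and make no network requests). The actual signatures in prompttools may differ.

```python
from typing import Callable, Dict, List

# Hypothetical type: a completion function maps a prompt string to the model's text reply.
CompletionFn = Callable[[str], str]

# Illustrative evaluation prompt; not the template used by prompttools.
EVALUATION_PROMPT = (
    "Decide whether the RESPONSE answers the PROMPT well.\n"
    "Reply with exactly RIGHT or WRONG.\n"
    "PROMPT: {prompt}\nRESPONSE: {response}\nANSWER:"
)


def _stub_openai_completion(prompt: str) -> str:
    """Stand-in for a real OpenAI completion call (assumption: always says RIGHT)."""
    return "RIGHT"


def _stub_anthropic_completion(prompt: str) -> str:
    """Stand-in for a real Anthropic completion call (assumption: always says WRONG)."""
    return "WRONG"


# Registry mapping each accepted model name to the function that should be invoked.
COMPLETION_FNS: Dict[str, CompletionFn] = {
    "gpt-4": _stub_openai_completion,
    "claude-2": _stub_anthropic_completion,
}


def autoeval_binary_scoring_sketch(prompt: str, response: str, model: str = "gpt-4") -> float:
    """Score a response as 1.0 (RIGHT) or 0.0 (anything else) using the chosen model."""
    if model not in COMPLETION_FNS:
        raise ValueError(f"Unsupported model {model!r}; expected one of {sorted(COMPLETION_FNS)}")
    reply = COMPLETION_FNS[model](EVALUATION_PROMPT.format(prompt=prompt, response=response))
    return 1.0 if "RIGHT" in reply.upper() else 0.0


def autoeval_with_models(prompt: str, response: str, models: List[str]) -> Dict[str, float]:
    """Evaluate the same response with several models and return one score per model."""
    return {m: autoeval_binary_scoring_sketch(prompt, response, model=m) for m in models}


if __name__ == "__main__":
    print(autoeval_with_models("What is 2 + 2?", "4", ["gpt-4", "claude-2"]))
```

Keeping the model-to-function mapping in one registry means adding another model (LLaMA, for example) would only require registering one more completion function, and unsupported names fail early with a clear error.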

Motivation, pitch

Allowing people to auto-evaluate with different top models would be ideal.

Alternatives

No response

Additional context

No response

NivekT added the "help wanted" (Extra attention is needed) and "good first issue" (Good for newcomers) labels on Aug 1, 2023
@rachittshah

@NivekT I think if we do this together with #31, it might be faster, and we can build on top of LlamaIndex's evals.

What do you think?

@divij9

divij9 commented Aug 1, 2023

So we want to accept an array of models and evaluate against all of them, right?

@NivekT
Collaborator Author

NivekT commented Aug 2, 2023

@rachittshah We can consider adding LlamaIndex's eval if it integrates well with the pattern we have here. Feel free to propose something and we can have a look.

@divij9 That can be part of it, but each of the eval functions linked above currently supports only OpenAI or only Anthropic.

I will update the main issue to break the request into pieces that are easier for first-time contributors to work on.

@NivekT
Collaborator Author

NivekT commented Aug 2, 2023

I have updated the request so that it is broken into bite-size tasks. Feel free to comment if anything is unclear!

@Divij97

Divij97 commented Aug 2, 2023

I think I understand what our goal is. Can you please assign this to me?

@NivekT
Collaborator Author

NivekT commented Aug 2, 2023

@Divij97 Sure! Let us know if you plan to work on all 4 subtasks or a specific one. Feel free to pick whichever you think you can contribute to. Thanks!

@ishaan-jaff

I'd love to help add support for new models using https://github.com/BerriAI/litellm. Let me know if I can help out on this too.

@steventkrawczyk
Contributor

@ishaan-jaff Awesome! Could you create an issue for it so I can assign it to you? I think the best approach would be to create a LitellmExperiment; you can follow some examples we've done for other APIs:

@Divij97

Divij97 commented Aug 4, 2023

Hi @steventkrawczyk, I have made the required changes but I am not able to push. I signed the CLA, but the push still returns a 403 for me.

@steventkrawczyk
Contributor

@Divij97

Divij97 commented Aug 4, 2023

Never mind, it was a keychain issue with my stupid Mac. Can you please take a look at this PR: #59? It's not complete, but I wanted to understand whether I am headed in the right direction.

@Divij97

Divij97 commented Aug 8, 2023

I wanted to give an update on my progress. I am done with all the changes but would like some help testing them with Anthropic. How do I generate an Anthropic key to test?


6 participants