Add support for other models in AutoEval #44

NivekT · 2023-08-01T07:30:30Z

🚀 The feature

This is a good task for a new contributor

We have a few utility functions to perform AutoEval:

https://github.com/hegelai/prompttools/blob/main/prompttools/utils/autoeval.py
https://github.com/hegelai/prompttools/blob/main/prompttools/utils/autoeval_scoring.py
https://github.com/hegelai/prompttools/blob/main/prompttools/utils/expected.py

Currently, they tend to only support one model each. Someone can refactor the code for each of them to support multiple models. I would recommend making sure they all support the best-known models, such as GPT-4 and Claude 2.

We can even consider LLaMA, but that is less urgent.

Tasks

Update this file such that autoeval_binary_scoring can take in model as an argument. Let's make sure gpt-4 and claude-2 are both accepted and invoke the right completion function.
Same as the above but for this file and the function autoeval_scoring. OpenAI needs to be added here.
Same as the above but for this file and the function compute_similarity_against_model
Allow auto evaluation by multiple models (e.g. both gpt-4 and claude-2) at the same time
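
A minimal sketch of the first task, assuming a plain name-to-provider lookup. The signature shown here is simplified for illustration and is not the one in prompttools; the helper names (`_openai_completion`, `_anthropic_completion`, `COMPLETION_FNS`), the `completion_fns` parameter, and the prompt text are all hypothetical. The provider calls are left as stubs so the dispatch logic can be read and exercised without API keys.

```python
from typing import Callable, Dict, Optional

# Hypothetical evaluation prompt; the real wording lives in the linked files.
EVALUATION_PROMPT = (
    "Determine whether the RESPONSE follows the PROMPT. "
    "Answer with exactly RIGHT or WRONG.\n"
    "PROMPT: {prompt}\nRESPONSE: {response}\nANSWER:"
)

CompletionFn = Callable[[str, str], str]  # (model, text) -> raw model reply


def _openai_completion(model: str, text: str) -> str:
    # Stub: a real implementation would call the OpenAI client here.
    raise NotImplementedError("wire up the OpenAI client")


def _anthropic_completion(model: str, text: str) -> str:
    # Stub: a real implementation would call the Anthropic client here.
    raise NotImplementedError("wire up the Anthropic client")


# Assumed mapping from accepted model names to completion functions.
COMPLETION_FNS: Dict[str, CompletionFn] = {
    "gpt-4": _openai_completion,
    "claude-2": _anthropic_completion,
}


def autoeval_binary_scoring(
    prompt: str,
    response: str,
    model: str = "gpt-4",
    completion_fns: Optional[Dict[str, CompletionFn]] = None,
) -> float:
    """Return 1.0 if the chosen evaluator model answers RIGHT, else 0.0."""
    fns = COMPLETION_FNS if completion_fns is None else completion_fns
    if model not in fns:
        raise ValueError(f"Unsupported model {model!r}; expected one of {sorted(fns)}")
    text = EVALUATION_PROMPT.format(prompt=prompt, response=response)
    verdict = fns[model](model, text)
    # Only a reply that starts with RIGHT counts as a pass.
    return 1.0 if verdict.strip().upper().startswith("RIGHT") else 0.0
```

Passing the lookup table in as an argument is one way to keep the function testable with fake completion functions; a module-level table alone would also work.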

Motivation, pitch

Allowing people to auto-evaluate with different top models would be ideal.

Alternatives

No response

Additional context

No response


rachittshah · 2023-08-01T14:39:06Z

@NivekT I think if we add this together with #31, it might be faster, and we can build on top of LlamaIndex's evals.

What do you think?

divij9 · 2023-08-01T14:53:50Z

So we want to accept an array of models and evaluate against all of them, right?
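
One way to picture the "array of models" idea, as a rough sketch only. The names `autoeval_with_models` and `majority_vote` are hypothetical and do not exist in prompttools, and the scorer is assumed to be any callable that takes `(prompt, response, model)` and returns a score between 0 and 1.

```python
from typing import Callable, Dict, Iterable

# Assumed scorer shape: (prompt, response, model) -> score in [0, 1].
Scorer = Callable[[str, str, str], float]


def autoeval_with_models(
    prompt: str, response: str, models: Iterable[str], scorer: Scorer
) -> Dict[str, float]:
    """Score one response with every evaluator model, keyed by model name."""
    return {model: scorer(prompt, response, model) for model in models}


def majority_vote(scores: Dict[str, float], threshold: float = 0.5) -> float:
    """Collapse per-model scores into one verdict; ties count as failing."""
    if not scores:
        raise ValueError("at least one model score is required")
    mean = sum(scores.values()) / len(scores)
    return 1.0 if mean > threshold else 0.0
```

Keeping the per-model scores in a dictionary leaves the choice of aggregation (majority vote, mean, or reporting each model separately) to the caller.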

NivekT · 2023-08-02T01:00:26Z

@rachittshah We can consider adding LlamaIndex's eval if it integrates well with the pattern we have here. Feel free to propose something and we can have a look.

@divij9 That can be part of it, but each of the eval functions linked above currently supports only OpenAI or Anthropic, not both.

I will update the main issue to break the request into pieces that are easier for first-time contributors to work on.

NivekT · 2023-08-02T01:05:27Z

I have updated the ask to be bite-size. Feel free to comment if anything is unclear!

Divij97 · 2023-08-02T19:09:40Z

I think I understand what our goal is. Can you please assign this to me?

NivekT · 2023-08-02T19:24:56Z

@Divij97 Sure! Let us know if you plan to work on all 4 subtasks or a specific one. Feel free to pick whichever you think you can contribute to. Thanks!

ishaan-jaff · 2023-08-02T21:56:57Z

I'd love to help add support for new models using https://github.com/BerriAI/litellm. Let me know if I can help out on this too.

steventkrawczyk · 2023-08-02T22:00:15Z

@ishaan-jaff Awesome! Could you create an issue for it and I can assign that to you? I think the best approach would be to create a LitellmExperiment, you can follow some examples we've done for other APIs:

Divij97 · 2023-08-04T19:23:11Z

Hi @steventkrawczyk, I have made the required changes but I am not able to push. I signed the CLA, but the push still returns a 403 for me.

steventkrawczyk · 2023-08-04T19:32:30Z

@Divij97 you will need to push to a fork and raise a PR from the fork to the original repo

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request-from-a-fork

Divij97 · 2023-08-04T19:44:05Z

Never mind, it was a keychain issue with my stupid Mac. Can you please take a look at this PR: #59. It's not complete, but I wanted to understand if I am headed in the right direction.

Divij97 · 2023-08-08T21:00:42Z

Wanted to update about my progress. I am done with all the changes but wanted some help testing them out for Anthropic. How do I generate an Anthropic key to test?

NivekT added the "help wanted" and "good first issue" labels Aug 1, 2023

NivekT assigned Divij97 Aug 2, 2023

Divij97 mentioned this issue Aug 8, 2023

Add support for other models in AutoEval #59

Open

krrishdholakia mentioned this issue Sep 12, 2023

Integrate with prompt tools BerriAI/litellm#342

Closed
