Model As A Judge Eval Patterns

In this repo, we examine different techniques for using a model (LLM) as a judge when evaluating LLM-based solutions.

Content in this repo

We will cover various types of evals. The datasets provided were all synthetically generated using LLMs and are located in the /data directory. The evals are Jupyter notebooks located in the /eval directory.

Types of Evals

  1. 00_basic_chat_evaluation: This eval consumes a chat conversation from a chatbot/agent and scores it against a rubric. The purpose of this notebook is to demonstrate a basic eval pattern (see the sketch after this list). Subsequent notebooks will cover more advanced evals for comparing chats from different models.

More evaluations coming soon.
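
As a rough illustration of the rubric-based judging pattern described above, the sketch below asks a judge model (via the Amazon Bedrock Converse API) to score a chat transcript against a rubric and return structured scores. The model ID, rubric criteria, and prompt wording are illustrative assumptions, not the exact contents of the notebook.

```python
import json
import boto3

# Illustrative sketch only: the rubric, prompts, and model ID are assumptions,
# not the notebook's actual configuration.
bedrock = boto3.client("bedrock-runtime")

RUBRIC = """Score the assistant in the conversation from 1-5 on each criterion:
- helpfulness: Did the assistant resolve the user's request?
- accuracy: Are the assistant's statements factually correct?
- tone: Is the assistant polite and professional?
Respond with JSON only: {"helpfulness": int, "accuracy": int, "tone": int, "rationale": str}"""


def judge_chat(chat_transcript: str,
               model_id: str = "anthropic.claude-3-5-sonnet-20240620-v1:0") -> dict:
    """Ask a judge model to score a chat transcript against the rubric."""
    response = bedrock.converse(
        modelId=model_id,
        system=[{"text": "You are an impartial evaluator of chatbot conversations."}],
        messages=[
            {
                "role": "user",
                "content": [{"text": f"{RUBRIC}\n\nConversation to evaluate:\n{chat_transcript}"}],
            }
        ],
        inferenceConfig={"temperature": 0.0},
    )
    # The judge is instructed to reply with JSON only; parse it into a dict of scores.
    return json.loads(response["output"]["message"]["content"][0]["text"])
```

The notebooks apply this same idea to the synthetic conversations in /data; using a low temperature and a fixed rubric helps keep the judge's scores consistent across runs.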

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.
