The all-in-one LLM developer platform: prompt management, evaluation, human feedback, and deployment in one place.
🪢 Open source LLM engineering platform: Observability, metrics, evals, prompt management, playground, datasets. Integrates with LlamaIndex, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
Test your prompts, agents, and RAGs. Use LLM evals to improve your app's quality and catch problems. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.
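A rough sketch of the kind of declarative, assertion-based model comparison such a tool automates. The test-case layout, the `run_suite` helper, and the stand-in model callables are illustrative assumptions, not the tool's actual config format or API.

```python
# Illustrative sketch only: stand-in model callables, not any specific tool's API.
from typing import Callable, Dict, List

PROMPT = "Answer concisely: {question}"

# Declarative test cases: template variables plus an expected substring.
TESTS: List[Dict] = [
    {"vars": {"question": "What is 2 + 2?"}, "expect_contains": "4"},
    {"vars": {"question": "What is the capital of France?"}, "expect_contains": "Paris"},
]

def run_suite(models: Dict[str, Callable[[str], str]]) -> Dict[str, float]:
    """Return the pass rate per model over the declarative test cases."""
    scores = {}
    for name, call in models.items():
        passed = sum(
            1
            for test in TESTS
            if test["expect_contains"].lower()
            in call(PROMPT.format(**test["vars"])).lower()
        )
        scores[name] = passed / len(TESTS)
    return scores

# Usage with stand-in models; a real harness would call provider APIs here.
print(run_suite({
    "model_a": lambda p: "The answer is 4.",
    "model_b": lambda p: "Paris.",
}))
```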
🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving
TypeScript SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
🐢 Open-Source Evaluation & Testing for LLMs and ML models
Python SDK for running evaluations on LLM-generated responses
The official evaluation suite and dynamic data release for MixEval.
A prompt collection for testing and evaluation of LLMs.
The LLM Evaluation Framework
Awesome papers involving LLMs in Social Science.
[ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models
Summary Evaluation Tool
A list of LLM Tools & Projects
Open-Source Evaluation for GenAI Application Pipelines
LeanEuclid is a benchmark for autoformalization in the domain of Euclidean geometry, targeting the proof assistant Lean.
Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without a custom rubric or reference answer, in absolute or relative grading modes, and more. It also lists available tools, methods, repos, and code for hallucination detection, LLM evaluation, grading, and more.
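As a generic illustration of rubric-based absolute grading with an LLM judge (the general technique this line describes), the sketch below assembles a judge prompt from a rubric, an optional reference answer, and the candidate response, then parses an integer score. The `judge_llm` callable, the 1-5 scale, and the prompt wording are assumptions for illustration, not PHUDGE's actual prompts or code.

```python
import re
from typing import Callable, Optional

# Assumed judge prompt; not taken from the paper or repo.
JUDGE_TEMPLATE = """You are grading a model response on a 1-5 scale.

Rubric:
{rubric}

Question:
{question}

Reference answer (may be empty):
{reference}

Response to grade:
{response}

Reply with only the integer score (1-5)."""

def absolute_grade(
    judge_llm: Callable[[str], str],   # hypothetical wrapper around a judge model
    question: str,
    response: str,
    rubric: str,
    reference: str = "",
) -> Optional[int]:
    """Ask the judge model for a 1-5 score and parse the first digit it returns."""
    prompt = JUDGE_TEMPLATE.format(
        rubric=rubric, question=question, reference=reference, response=response
    )
    raw = judge_llm(prompt)
    match = re.search(r"[1-5]", raw)
    return int(match.group()) if match else None

# Usage with a stand-in judge that always answers "4":
score = absolute_grade(
    judge_llm=lambda p: "4",
    question="Explain overfitting in one sentence.",
    response="Overfitting is when a model memorizes training data and generalizes poorly.",
    rubric="5 = accurate and concise; 1 = incorrect or off-topic.",
)
print(score)  # 4
```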