The TypeScript LLM Evaluations Library
EvalKit is an open-source library designed for TypeScript developers to evaluate and improve the performance of large language models (LLMs) with confidence. Ensure your AI models are reliable, accurate, and trustworthy.
See the official EvalKit documentation for details.
The documentation covers how to use EvalKit and its architecture, and includes tutorials and recipes for various use cases and LLM providers.
Feature | Availability | Docs |
---|---|---|
Bias Detection Metric | ✅ | 🔗 |
Coherence Metric | ✅ | 🔗 |
Dynamic Metric (G-Eval) | ✅ | 🔗 |
Faithfulness Metric | ✅ | 🔗 |
Hallucination Metric | ✅ | 🔗 |
Intent Detection Metric | ✅ | 🔗 |
Semantic Similarity Metric | ✅ | 🔗 |
Reporting | 🚧 | 🚧 |
Looking for a metric/feature that's not listed here? Open an issue and let us know!
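To give a feel for what a metric like Semantic Similarity computes, here is a minimal, self-contained sketch of cosine similarity over embedding vectors. This is an illustration of the underlying idea only, not EvalKit's implementation: the real metric obtains embeddings from an LLM provider, whereas the vectors below are hard-coded stand-ins.

```typescript
// Cosine similarity between two embedding vectors: 1 means identical
// direction, 0 means orthogonal (no similarity). In practice the
// vectors would come from an embedding model, not literals.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Identical vectors score 1; orthogonal vectors score 0.
const identical = cosineSimilarity([1, 0, 1], [1, 0, 1]);
const orthogonal = cosineSimilarity([1, 0], [0, 1]);
```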
- Node.js 18+
- OpenAI API Key
EvalKit currently ships a single core package that includes all evaluation-related functionality. Install it by running the following command:
npm install --save-dev @evalkit/core
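For the actual API, refer to the EvalKit documentation. As a rough sketch of the evaluate-and-score pattern such metric libraries follow, here is a self-contained example; the interface and class names below are illustrative assumptions, not EvalKit's real exports.

```typescript
// Hypothetical metric shape, for illustration only — not EvalKit's API.
interface MetricResult {
  score: number;   // normalized to 0..1
  passed: boolean; // whether the score cleared the threshold
}

interface Metric {
  evaluate(input: { output: string; reference?: string }): Promise<MetricResult>;
}

// A trivial stand-in metric: exact match against a reference answer.
// Real metrics (bias, coherence, faithfulness, ...) would call an LLM.
class ExactMatchMetric implements Metric {
  async evaluate(input: { output: string; reference?: string }): Promise<MetricResult> {
    const score = input.output.trim() === input.reference?.trim() ? 1 : 0;
    return { score, passed: score >= 0.5 };
  }
}
```

The async `evaluate` signature mirrors the fact that LLM-backed metrics are network-bound, so results arrive as promises.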
We welcome contributions from the community! Please feel free to submit pull requests or create issues for bugs or feature suggestions.
This repository's source code is available under the Apache 2.0 License.