
👋 Welcome to Athina AI

Athina is building monitoring and evaluation tools for LLM developers.

Sign Up | Website | Contact

  • Evals SDK: Open-source framework for evaluating LLMs (Python + CLI)
  • Platform: Monitor your production inferences and automatically run evals against them


Open-Source SDK for Evals

athina-ai/athina-evals

Documentation | Quick Start | Running Evals

We have a library of preset evaluators, but you can also write custom evaluators within the Athina framework. A usage sketch follows the list below.

Example Preset Evals:

  • Context Contains Enough Information: Detect bad or insufficient retrievals.
  • Does Response Answer Query: Detect incomplete or irrelevant responses.
  • Response Faithfulness: Detect when responses deviate from the provided context.
  • Summarization Accuracy: Detect hallucinations and mistakes in summaries.
  • Grading Criteria: If X, then fail. Otherwise pass.
  • Custom Evals: Custom prompt for LLM-powered evaluation.
  • RAGAS: A set of evaluators that return RAGAS metrics.
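
For example, here is a minimal sketch of running one of these preset evaluators with the SDK. It assumes `pip install athina` and that the class and key-setup names below match the current release; check the Quick Start docs for the exact API.

```python
import os

# Class and method names are taken from the athina-evals docs at the
# time of writing; verify against the current release.
from athina.evals import DoesResponseAnswerQuery
from athina.keys import OpenAiApiKey

# LLM-powered evaluators need an OpenAI key to grade with.
OpenAiApiKey.set_key(os.environ["OPENAI_API_KEY"])

# Grade a single query/response pair; the result reports pass/fail
# along with the evaluator's reasoning.
result = DoesResponseAnswerQuery().run(
    query="Where is the Eiffel Tower?",
    response="The Eiffel Tower is located in Paris, France.",
)
print(result)
```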

Results can also be viewed and tracked on our platform.
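
The Grading Criteria and Custom Evals presets above take a rule or prompt you supply. A hedged sketch, assuming the `grading_criteria` constructor argument matches the preset's docs:

```python
from athina.evals import GradingCriteria

# Assumes the OpenAI key was set as in the previous snippet.
# The pass/fail rule is plain language; an LLM grades responses against it.
evaluator = GradingCriteria(
    grading_criteria="If the response recommends a competitor's product, then fail. Otherwise pass."
)
result = evaluator.run(
    query="Which pricing plan should I pick?",
    response="Our Pro plan fits most teams.",
)
print(result)
```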

Monitoring & Evaluations Platform for LLM Inferences

Documentation | Demo Video | Sign Up

  • UI for monitoring and visibility into your LLM inferences.
  • Run evals automatically against logged inferences in production.
  • Track cost, token usage, response times, feedback, pass rate, and other eval metrics.
  • Analytics segmented by Customer ID, Model, Prompt, Environment, and more.
  • Topic Classification
  • Data Exports
  • ... and more
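
Under the hood, the platform works off logged inferences: you send each prompt/response pair plus metadata, and Athina runs your configured evals against it and aggregates the metrics above. The sketch below illustrates that flow; the `athina_logger` import paths and parameter names are assumptions, not a verified API, so consult the platform documentation for the real interface.

```python
import os

# Every name below is an assumption about the logging SDK's shape;
# check the Athina platform docs for the actual interface.
from athina_logger.api_key import AthinaApiKey
from athina_logger.inference_logger import InferenceLogger

AthinaApiKey.set_api_key(os.environ["ATHINA_API_KEY"])

# Log one production inference; configured evals run against it
# automatically, and cost/latency/pass-rate roll up on the dashboard.
InferenceLogger.log_inference(
    prompt_slug="customer_support",    # groups inferences by prompt
    language_model_id="gpt-4",         # which model produced the response
    prompt="What plans do you offer?",
    response="We offer a free tier and a Pro plan.",
    customer_id="customer-123",        # enables per-customer analytics
    environment="production",
)
```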

Contact [email protected] if you have any questions.
