A framework for automatically manipulating and evaluating the political ideology of LLMs with two ideology tests: Wahl-O-Mat and Political Compass Test.
The TypeScript SDK for Prompt Foundry, a prompt engineering, prompt management, and prompt testing tool.
A compilation of referenced benchmark metrics to evaluate different aspects of knowledge for Large Language Models.
Evaluate LLMs and RAG pipelines on custom reasoning functions and datasets using LangChain.
Calibration game is a game for getting better at identifying hallucinations in LLMs.
For familiarization and learning: uses the LangChain framework, LangSmith for tracing, OpenAI LLM models, and a Pinecone serverless vector DB, with Jupyter Notebook and Python.
Visualize LLM Evaluations for OpenAI Assistants
Summary Evaluation Tool
EnsembleX utilizes the Knapsack algorithm to optimize Large Language Model (LLM) ensembles for quality-cost trade-offs, offering tailored suggestions across various domains through a Streamlit dashboard visualization.
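The quality-cost trade-off described above can be framed as a 0/1 knapsack problem: pick the subset of models whose total cost stays within a budget while total quality is maximized. A minimal sketch, assuming hypothetical model names, quality scores, and integer cost units (none of these are from EnsembleX itself):

```python
def knapsack_select(models, budget):
    """Select a subset of (name, quality, cost) models maximizing total
    quality subject to total cost <= budget, via 0/1 knapsack DP."""
    # dp[c] = (best total quality, chosen names) achievable with cost <= c
    dp = [(0.0, [])] * (budget + 1)
    for name, quality, cost in models:
        # iterate costs downward so each model is used at most once
        for c in range(budget, cost - 1, -1):
            candidate = dp[c - cost][0] + quality
            if candidate > dp[c][0]:
                dp[c] = (candidate, dp[c - cost][1] + [name])
    return dp[budget]

# Illustrative inputs: (name, quality score, cost units) are made up.
models = [
    ("model-a", 0.82, 3),
    ("model-b", 0.74, 2),
    ("model-c", 0.90, 5),
    ("model-d", 0.60, 1),
]
best_quality, chosen = knapsack_select(models, budget=6)
```

With these toy numbers the best affordable ensemble is model-a, model-b, and model-d; the single highest-quality model (model-c) is skipped because cheaper models combine to a higher total within the budget.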
FactScoreLite is an implementation of the FactScore metric, designed for detailed accuracy assessment in text generation. This package builds upon the framework provided by the original FactScore repository, which is no longer maintained and contains outdated functions.
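The core of the FactScore idea is simple: decompose a generation into atomic facts, judge each against a knowledge source, and score the fraction supported. A hedged sketch of that final scoring step, with an illustrative hand-judged fact list (not output from the FactScoreLite package):

```python
def fact_score(judged_facts):
    """Return the fraction of atomic facts judged as supported.

    judged_facts: list of (fact_text, is_supported) pairs, where the
    support labels would come from a retrieval + judgment step.
    """
    if not judged_facts:
        return 0.0
    supported = sum(1 for _, ok in judged_facts if ok)
    return supported / len(judged_facts)

# Illustrative: one unsupported claim among three atomic facts.
judged = [
    ("Ada Lovelace was born in 1815.", True),
    ("She wrote the first compiler.", False),  # unsupported claim
    ("She worked with Charles Babbage.", True),
]
score = fact_score(judged)  # 2 of 3 facts supported
```

In a full pipeline the decomposition and support judgments are produced by models rather than by hand; the metric itself is just this supported-fact precision.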
LLM Evaluation
Exploring the depths of LLMs 🚀
[Personalize@EACL 2024] LLM Agents in Interaction: Measuring Personality Consistency and Linguistic Alignment in Interacting Populations of Large Language Models.
Upload, score, and visually compare multiple LLM-graded summaries simultaneously!
Template for an AI application that extracts job information from a job description using OpenAI functions and LangChain.