Visualize LLM Evaluations for OpenAI Assistants
Updated Mar 27, 2024 - TypeScript
A framework for automatically manipulating and evaluating the political ideology of LLMs with two ideology tests: the Wahl-O-Mat and the Political Compass Test.
[ICLR 2024] Jaiswal, A., Gan, Z., Du, X., Zhang, B., Wang, Z., & Yang, Y. Compressing LLMs: The Truth is Rarely Pure and Never Simple.
The TypeScript SDK for Prompt Foundry, a tool for prompt engineering, prompt management, and prompt testing.
EnsembleX uses the knapsack algorithm to optimize Large Language Model (LLM) ensembles for quality-cost trade-offs, offering domain-specific suggestions through a Streamlit dashboard (a rough sketch of the knapsack idea follows this list).
🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving
A prompt collection for testing and evaluation of LLMs.
Evaluate LLMs and RAG pipelines using custom reasoning functions and datasets with LangChain.
LLM Evaluation
Upload, score, and visually compare multiple LLM-graded summaries simultaneously!
This repository contains the lab work for the Coursera course "Generative AI with Large Language Models".
A compilation of referenced benchmark metrics for evaluating different aspects of knowledge in Large Language Models.
Summary Evaluation Tool
Exploring the depths of LLMs 🚀
The implementation for the EMNLP 2023 paper "Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators".
Cookbooks and tutorials on Literal AI
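For context on the EnsembleX entry above: a quality-cost knapsack over candidate models can be sketched in a few lines of TypeScript. This is a minimal illustration of the general technique the description names, not EnsembleX's actual code; the model names, quality scores, and costs below are invented for the example.

```typescript
// 0/1 knapsack: pick the subset of models that maximizes total quality
// without exceeding a cost budget (costs assumed scaled to integer units).
interface Model {
  name: string;
  quality: number; // e.g. an aggregate eval score in [0, 1]
  cost: number;    // e.g. price per 1K tokens, in integer cents
}

function selectEnsemble(models: Model[], budget: number): Model[] {
  // best[c] = highest total quality (and its picks) achievable with cost <= c
  const best = Array.from({ length: budget + 1 }, () => ({
    quality: 0,
    picks: [] as Model[],
  }));
  for (const m of models) {
    // Iterate budgets downward so each model is used at most once.
    for (let c = budget; c >= m.cost; c--) {
      const candidate = best[c - m.cost].quality + m.quality;
      if (candidate > best[c].quality) {
        best[c] = { quality: candidate, picks: [...best[c - m.cost].picks, m] };
      }
    }
  }
  return best[budget].picks;
}

// Hypothetical inputs; with a budget of 5, the cheaper pair wins over the
// single expensive model.
const picked = selectEnsemble(
  [
    { name: "model-a", quality: 0.92, cost: 6 },
    { name: "model-b", quality: 0.81, cost: 2 },
    { name: "model-c", quality: 0.77, cost: 1 },
  ],
  5,
);
console.log(picked.map((m) => m.name)); // ["model-b", "model-c"]
```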