Visualize LLM Evaluations for OpenAI Assistants
Updated Mar 27, 2024 - TypeScript
A framework for automatically manipulating and evaluating the political ideology of LLMs with two ideology tests: the Wahl-O-Mat and the Political Compass Test.
[ICLR 2024] Jaiswal, A., Gan, Z., Du, X., Zhang, B., Wang, Z., & Yang, Y. Compressing LLMs: The Truth is Rarely Pure and Never Simple.
The TypeScript SDK for Prompt Foundry, a tool for prompt engineering, prompt management, and prompt testing.
EnsembleX uses the knapsack algorithm to optimize Large Language Model (LLM) ensembles for quality-cost trade-offs, offering domain-specific suggestions through a Streamlit dashboard (a rough sketch of the knapsack idea follows this list).
🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving
A prompt collection for testing and evaluation of LLMs.
Evaluate LLMs and RAG pipelines using custom reasoning functions and datasets with LangChain.
LLM Evaluation
Upload, score, and visually compare multiple LLM-graded summaries simultaneously!
This repository contains the lab work for the Coursera course "Generative AI with Large Language Models".
A compilation of referenced benchmark metrics for evaluating different aspects of knowledge in Large Language Models.
Summary Evaluation Tool
Exploring the depths of LLMs 🚀
The implementation for the EMNLP 2023 paper "Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators".
Cookbooks and tutorials on Literal AI
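For context on the EnsembleX entry above: a quality-cost knapsack over candidate models can be sketched in a few lines of TypeScript. This is a minimal illustration of the general technique the description names, not EnsembleX's actual code; the model names, quality scores, and costs below are invented for the example.

```typescript
// 0/1 knapsack: pick the subset of models that maximizes total quality
// without exceeding a cost budget (costs assumed scaled to integer units).
interface Model {
  name: string;
  quality: number; // e.g. an aggregate eval score in [0, 1]
  cost: number;    // e.g. price per 1K tokens, in integer cents
}

function selectEnsemble(models: Model[], budget: number): Model[] {
  // best[c] = highest total quality (and its picks) achievable with cost <= c
  const best = Array.from({ length: budget + 1 }, () => ({
    quality: 0,
    picks: [] as Model[],
  }));
  for (const m of models) {
    // Iterate budgets downward so each model is used at most once.
    for (let c = budget; c >= m.cost; c--) {
      const candidate = best[c - m.cost].quality + m.quality;
      if (candidate > best[c].quality) {
        best[c] = { quality: candidate, picks: [...best[c - m.cost].picks, m] };
      }
    }
  }
  return best[budget].picks;
}

// Hypothetical inputs; with a budget of 5, the cheaper pair wins over the
// single expensive model.
const picked = selectEnsemble(
  [
    { name: "model-a", quality: 0.92, cost: 6 },
    { name: "model-b", quality: 0.81, cost: 2 },
    { name: "model-c", quality: 0.77, cost: 1 },
  ],
  5,
);
console.log(picked.map((m) => m.name)); // ["model-b", "model-c"]
```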