Develop better LLM apps by testing different models and prompts in bulk.

Crucible

Prompt evaluation ("evals") package: test multiple models, prompts, and variables in bulk.

Uses Ollama to run LLMs locally.

How to use

  1. Setup: `python -m venv venv`, `source venv/bin/activate`, `pip install -r requirements.txt`
  2. Set the models in `eval_models.py`, prompts in `eval_prompts.py`, and variables in `eval_variables.py`. See the Parameters section below.
  3. (Not implemented) Set the grading style in `main.py`.
    • "binary": the response is either right or wrong
    • "qualitative": ask Claude to judge the response
  4. Run `python eval.py`.
  5. Logs from the run are written to `output/<datetime>.yaml`.
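
The configuration files in step 2 are plain Python. As a hedged sketch of what `eval_models.py` might contain (the `Model` definition below is an illustrative stand-in based on the Parameters section, and `MODELS` is a hypothetical variable name, not necessarily the package's actual API):

```python
from dataclasses import dataclass

# Assumption: the package's Model class is a thin wrapper around an
# Ollama model id, as the Parameters section suggests. This stand-in
# dataclass is illustrative, not the package's actual definition.
@dataclass
class Model:
    id: str

# eval_models.py would then simply list the models to benchmark:
MODELS = [
    Model("llama3"),
    Model("mistral"),
]
```

Each entry's `id` must be a model name Ollama recognizes, so every listed model needs to be pulled locally before the run.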

Parameters

  • model

    • id (str): model name as understood by Ollama; you may need to download it first (e.g. `ollama pull llama3`)

            Model("llama3")
      
  • prompt

    • id (str): name of the test case

    • slots (str): placeholder string marking where the variable's content is inserted into the prompt (e.g. `{variable}`)

    • content (str): the actual prompt text. The example below is in Portuguese: it asks whether the given text mentions a need to buy medicine or health items, and instructs the model to answer "<<SIM>>" (yes) or "<<NÃO>>" (no).

            Prompt(
                id="test_3",
                slots="{variable}",
                content="""Sua tarefa é analisar e responder se o texto a seguir menciona a necessidade de comprar remédios ou itens de saúde. Aqui está o texto:\n\n###\n\n{variable}\n\n###\n\n\nPrimeiro, analise cuidadosamente o texto em um rascunho. Depois, responda: a solicitação citada menciona a necessidade de comprar remédios ou itens de saúde? Responda "<<SIM>>" ou "<<NÃO>>".""",
            )
      
  • variable

    • id (str): name of the test case

    • content (str): text of the snippet to be inserted into the prompt. The example below is in Portuguese: it describes a single-parent family requesting general vulnerability assistance, with no mention of medicine.

    • expected (str list): values that would be considered correct

    • options (str list): all values that the response could take

            Variable(
                id="despesas_essenciais",
                content="Família monoparental composta por Josefa e 5 filhos com idades entre 1 e 17 anos. Contam apenas com a renda de coleta de material reciclável e relatam dificuldade para manter as despesas essenciais. Solicita-se, portanto, o auxílio vulnerabilidade.",
                expected=["<<NAO>>", "<<NÃO>>"],
                options=["<<NAO>>", "<<NÃO>>", "<<SIM>>"],
            ),
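
Putting the three parameter types together, a run presumably takes every combination of model, prompt, and variable, fills each prompt's slot, and checks the model's response against `expected`. A minimal sketch of that logic (the `render` and `grade_binary` helpers are illustrative names, not the package's actual API):

```python
from dataclasses import dataclass

# Stand-ins for the package's Prompt and Variable classes (assumption:
# simple dataclass-like containers mirroring the fields documented above).
@dataclass
class Prompt:
    id: str
    slots: str    # placeholder token, e.g. "{variable}"
    content: str  # prompt template containing the placeholder

@dataclass
class Variable:
    id: str
    content: str          # text substituted into the prompt
    expected: list[str]   # answers counted as correct
    options: list[str]    # all answers the response could take

def render(prompt: Prompt, variable: Variable) -> str:
    # Fill the prompt's slot with the variable's content.
    return prompt.content.replace(prompt.slots, variable.content)

def grade_binary(response: str, variable: Variable) -> bool:
    # "binary" grading: correct iff one of the expected markers
    # appears in the model's response.
    return any(marker in response for marker in variable.expected)

# The full run would iterate over every (model, prompt, variable)
# combination, send each rendered prompt to Ollama, and log the
# graded results to output/<datetime>.yaml.
```

Listing multiple markers in `expected` (as in the example above, which accepts both "<<NAO>>" and "<<NÃO>>") makes the binary grade robust to small spelling variations in the model's answer.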
      

TODO

  • add tests
  • add qualitative eval
  • add asyncio
  • add details on which answers tend to be wrong (a summary is expected)
