IAAR-Shanghai / xFinder
xFinder: Robust and Pinpoint Answer Extraction for Large Language Models
Topics: benchmark, regex, reliability, evaluation, dataset, gpt, large-language-models, llm, open-compass, lm-evaluation, xfinder, reliable-evaluation, key-answer-extraction
Stars: 44. Language: Python. Updated May 31, 2024.

hitz-zentroa / latxa
Latxa: An Open Language Model and Evaluation Suite for Basque
Topics: evaluation, language-model, basque, huggingface, gpt-neox, llm, lm-evaluation, latxa
Stars: 19. Language: Shell. Updated May 15, 2024.