
Pull requests: EleutherAI/lm-evaluation-harness


Alghafa benchmark
#1946 opened Jun 11, 2024 by khalil-Hennara
samples is newline delimited
#1930 opened Jun 5, 2024 by baberabb
[New Task] Add Paloma benchmark
#1928 opened Jun 5, 2024 by zafstojano
Multiprompt
#1922 opened Jun 4, 2024 by lintangsutawika (draft)
Confusion matrix metric
#1921 opened Jun 4, 2024 by minaremeli
mlx Model (loglikelihood & generate_until)
#1902 opened May 29, 2024 by chimezie
add arc_challenge_mt
#1900 opened May 29, 2024 by jonabur
Add LegalBench tasks
#1878 opened May 23, 2024 by zafstojano
Test coverage for optimum_lm.py
#1872 opened May 22, 2024 by zafstojano
Added tests for Anthropic LLMs
#1868 opened May 21, 2024 by zafstojano
Draft - Support ov models via genai
#1862 opened May 20, 2024 by sstrehlk
mmlu-pro for the Italian language
#1860 opened May 19, 2024 by giux78
[WIP] Fix NeuralMagic tests
#1859 opened May 19, 2024 by haileyschoelkopf
Fix m_mmlu target
#1853 opened May 18, 2024 by jordane95
Implement Exams benchmark
#1852 opened May 17, 2024 by snova-zoltanc
Fix self.max_tokens in anthropic_llms.py
#1848 opened May 16, 2024 by lozhn
Adding LLaVa support
#1832 opened May 13, 2024 by ashvinnihalani
Financial PhraseBank (FPB) Eval Metric
#1815 opened May 9, 2024 by bcicc
Fix cost_estimate.py
#1810 opened May 8, 2024 by xksteven
Fix --gen_kwargs and VLLM (temperature not respected) [label: bug]
#1800 opened May 7, 2024 by haileyschoelkopf