---
language:
- en
license: apache-2.0
library_name: transformers
datasets:
- monology/VMware-open-instruct-higgsfield
pipeline_tag: text-generation
model-index:
- name: openinstruct-mistral-7b
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 59.73
      name: normalized accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 82.77
      name: normalized accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 60.55
      name: accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 48.76
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 79.56
      name: accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 50.49
      name: accuracy
---

# OpenInstruct Mistral-7B

**1st among commercially-usable 7B models on the Open LLM Leaderboard!\***

This is [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) finetuned on [VMware/open-instruct](https://huggingface.co/datasets/VMware/open-instruct).
I quantized it to FP16 and released it under the Apache-2.0 license.
Compute generously provided by Higgsfield AI.

Prompt format: Alpaca

```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
[your instruction goes here]

### Response:
```
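As a minimal sketch of using this prompt format with the `transformers` library: the repo ID `monology/openinstruct-mistral-7b` is an assumption (it is not stated in this card), and `device_map="auto"` assumes `accelerate` is installed.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "monology/openinstruct-mistral-7b"  # assumed repo ID, not confirmed by this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # card says weights are FP16
    device_map="auto",    # requires the accelerate package
)

# Build the Alpaca-style prompt exactly as shown above.
instruction = "Summarize what this model is in one sentence."
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    f"### Instruction:\n{instruction}\n\n### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```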

Recommended preset:

- temperature: 0.2
- top_k: 50
- top_p: 0.95
- repetition_penalty: 1.1
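Continuing the sketch above, this preset maps directly onto `generate()` keyword arguments in `transformers`; `do_sample=True` is my assumption, since temperature/top-k/top-p only take effect when sampling is enabled:

```python
outputs = model.generate(
    **inputs,
    do_sample=True,          # assumed: sampling must be on for the preset to apply
    temperature=0.2,
    top_k=50,
    top_p=0.95,
    repetition_penalty=1.1,
    max_new_tokens=256,
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```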

\*As of 21 Nov 2023. "Commercially-usable" means both an open-source base model and a non-synthetic open-source finetune dataset. Updated leaderboard results are available here.

Detailed results can be found here.

| Metric                            | Value |
|-----------------------------------|------:|
| Avg.                              | 63.64 |
| AI2 Reasoning Challenge (25-Shot) | 59.73 |
| HellaSwag (10-Shot)               | 82.77 |
| MMLU (5-Shot)                     | 60.55 |
| TruthfulQA (0-shot)               | 48.76 |
| Winogrande (5-shot)               | 79.56 |
| GSM8k (5-shot)                    | 50.49 |