---
language:
- en
license: other
datasets:
- ehartford/WizardLM_evol_instruct_V2_196k_unfiltered_merged_split
model-index:
- name: WizardLM-33B-V1.0-Uncensored
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 63.65
      name: normalized accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 83.84
      name: normalized accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 59.36
      name: accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 56.8
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 77.66
      name: accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 18.65
      name: accuracy
---

This is a retraining of https://huggingface.co/WizardLM/WizardLM-30B-V1.0 with a filtered dataset, intended to reduce refusals, avoidance, and bias.

Note that LLaMA itself has inherent ethical beliefs, so there's no such thing as a "truly uncensored" model. But this model will be more compliant than WizardLM/WizardLM-7B-V1.0.

Shout out to the open source AI/ML community, and everyone who helped me out.

Note: An uncensored model has no guardrails. You are responsible for anything you do with the model, just as you are responsible for anything you do with any dangerous object such as a knife, gun, lighter, or car. Publishing anything this model generates is the same as publishing it yourself. You are responsible for the content you publish, and you cannot blame the model any more than you can blame the knife, gun, lighter, or car for what you do with it.

Like WizardLM/WizardLM-30B-V1.0, this model is trained with Vicuna-1.1 style prompts:

```
You are a helpful AI assistant.

USER: <prompt>
ASSISTANT:
```
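A minimal sketch of assembling that prompt template in Python. The `build_prompt` helper is illustrative only (it is not part of the model repository), assuming the Vicuna-1.1 layout shown above: a system line, a blank line, then `USER:`/`ASSISTANT:` turns.

```python
def build_prompt(user_message: str,
                 system: str = "You are a helpful AI assistant.") -> str:
    """Format a single-turn Vicuna-1.1 style prompt.

    Generation should be run on this string and stopped when the model
    emits the next "USER:" turn marker.
    """
    return f"{system}\n\nUSER: {user_message}\nASSISTANT:"


prompt = build_prompt("Explain beam search in one sentence.")
print(prompt)
```

For multi-turn chat, the same pattern repeats: append the model's reply after `ASSISTANT:`, then add the next `USER:` turn.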

Thank you chirper.ai for sponsoring some of my compute!

Detailed results can be found here

| Metric               | Value |
|----------------------|-------|
| Avg.                 | 54.41 |
| ARC (25-shot)        | 63.65 |
| HellaSwag (10-shot)  | 83.84 |
| MMLU (5-shot)        | 59.36 |
| TruthfulQA (0-shot)  | 56.8  |
| Winogrande (5-shot)  | 77.66 |
| GSM8K (5-shot)       | 18.65 |
| DROP (3-shot)        | 20.89 |

Detailed results can be found here

| Metric                            | Value |
|-----------------------------------|-------|
| Avg.                              | 59.99 |
| AI2 Reasoning Challenge (25-Shot) | 63.65 |
| HellaSwag (10-Shot)               | 83.84 |
| MMLU (5-Shot)                     | 59.36 |
| TruthfulQA (0-shot)               | 56.80 |
| Winogrande (5-shot)               | 77.66 |
| GSM8k (5-shot)                    | 18.65 |
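The two averages differ because the earlier leaderboard run included DROP (3-shot) and the later one dropped it. A quick check that both reported averages are the plain mean of their per-task scores:

```python
# Per-task scores from the two result tables above.
with_drop = [63.65, 83.84, 59.36, 56.8, 77.66, 18.65, 20.89]  # includes DROP (3-shot)
without_drop = [63.65, 83.84, 59.36, 56.80, 77.66, 18.65]

print(round(sum(with_drop) / len(with_drop), 2))        # 54.41
print(round(sum(without_drop) / len(without_drop), 2))  # 59.99
```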