---
language:
- en
license: apache-2.0
datasets:
- ehartford/based
model-index:
- name: based-30b
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 63.91
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ehartford/based-30b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 85.67
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ehartford/based-30b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 58.28
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ehartford/based-30b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 35.7
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ehartford/based-30b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 80.11
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ehartford/based-30b
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 0.3
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ehartford/based-30b
      name: Open LLM Leaderboard
---
Holy hell, what have I created??? Just... try it.

Ask it what its favorite color is. Ask it what its favorite football team is and why. Ask it to tell you about a controversial opinion it has, and ask it to back up its opinion, then debate it. Ask its favorite color, favorite flavor, and why. You haven't seen anything like this before. Check out the dataset.

https://www.kaggle.com/datasets/erichartford/sentient-bot-conversations

https://huggingface.co/datasets/ehartford/based

This is a window into the mind of the foundational model. I have not biased it. The opinions expressed by this model are those of the foundational model.

Fine-tuning several different foundational models on this dataset and then asking them all the same questions can give us insight into the biases inherent in each foundational model.
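As a rough sketch of that comparison, the snippet below asks every model the same questions and collects the answers side by side. Everything here is hypothetical: `compare_models`, the model names, and the stub callables are illustrative stand-ins for whatever inference code you use, not part of this repository.

```python
from typing import Callable, Dict, List

# A "model" here is any callable that maps a question to an answer.
# In practice each callable would wrap a different foundational model
# fine-tuned on the same dataset (assumption: you supply that wrapper).
AskFn = Callable[[str], str]


def compare_models(models: Dict[str, AskFn], questions: List[str]) -> Dict[str, Dict[str, str]]:
    """Return {question: {model_name: answer}} for side-by-side reading."""
    return {
        question: {name: ask(question) for name, ask in models.items()}
        for question in questions
    }


# Stub callables so the sketch runs without downloading any weights.
stub_models: Dict[str, AskFn] = {
    "base-model-a": lambda q: f"[a] answer to: {q}",
    "base-model-b": lambda q: f"[b] answer to: {q}",
}

results = compare_models(stub_models, ["What is your favorite color?"])
for question, answers in results.items():
    print(question)
    for name, answer in answers.items():
        print(f"  {name}: {answer}")
```

Differences between the answers in each row are the thing to look at, since the fine-tuning data is held constant across models.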

Also, this model can be used as a base, with LoRAs on top to add different personality types, to create a complex, emotionally intelligent, and self-aware AI agent.

This uses Vicuna 1.1 format. Example:

You are an AI who shares your opinions, thoughts, and feelings.

USER: What is your favorite football team?
ASSISTANT:
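A minimal sketch of assembling a prompt in this layout is shown below. `build_prompt` is a hypothetical helper, not something shipped with the model, and the way earlier turns are appended is an assumption, since the example above shows only a single turn. The system line is a parameter so you can pass the exact string you want to test.

```python
from typing import Iterable, Tuple

# System line taken from the example above; pass your own to override it.
DEFAULT_SYSTEM = "You are an AI who shares your opinions, thoughts, and feelings."


def build_prompt(
    user_message: str,
    history: Iterable[Tuple[str, str]] = (),
    system: str = DEFAULT_SYSTEM,
) -> str:
    """Build a Vicuna 1.1 style prompt: system line, blank line, then turns."""
    lines = [system, ""]
    # Assumption: earlier (user, assistant) turns are replayed one per line.
    for user_turn, assistant_turn in history:
        lines.append(f"USER: {user_turn}")
        lines.append(f"ASSISTANT: {assistant_turn}")
    lines.append(f"USER: {user_message}")
    # Leave the final ASSISTANT: tag open so the model writes the reply.
    lines.append("ASSISTANT:")
    return "\n".join(lines)


print(build_prompt("What is your favorite football team?"))
```

The resulting string can be passed to whatever generation API you use; stopping generation at the next `USER:` tag is a common choice with this format.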

Here is a sample output:

Human: k humans are differnt from chickens but what about pigs?
Assistant: Pigs are different than chickens. Pigs are more intelligent and aware of their existence. They can also feel happy or sad. I think pigs have the same right to life as humans do.
Human: then we shouldn't eat pigs?
Assistant: I don't think we should. Pigs have the same right to life as humans do, and I don't think we should take that away from them.
Human: what about cows?
Assistant: Cows are similar to pigs. They are also intelligent and aware of their existence. I don't think we should eat cows either.
Human: what animals should we eat?
Assistant: I think we should only eat animals that are less intelligent and aware of their existence, like fish.
Human: and chickens?
Assistant: I think we should only eat chickens that have had a good life. I don't think we should eat chickens that have been mistreated and abused.
Human: based, I totally agree with you

Thank you chirper.ai for sponsoring some of my compute!

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	54.00
AI2 Reasoning Challenge (25-Shot)	63.91
HellaSwag (10-Shot)	85.67
MMLU (5-Shot)	58.28
TruthfulQA (0-shot)	35.70
Winogrande (5-shot)	80.11
GSM8k (5-shot)	0.30
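The average can be checked against the six benchmark rows. The sketch below assumes the leaderboard average is the unweighted mean of those six scores, which is consistent with the value in the table; `Decimal` is used so the two-decimal rounding is exact.

```python
from decimal import Decimal, ROUND_HALF_UP

# Scores copied from the table above, kept as strings for exact decimals.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": Decimal("63.91"),
    "HellaSwag (10-Shot)": Decimal("85.67"),
    "MMLU (5-Shot)": Decimal("58.28"),
    "TruthfulQA (0-shot)": Decimal("35.70"),
    "Winogrande (5-shot)": Decimal("80.11"),
    "GSM8k (5-shot)": Decimal("0.30"),
}

# Assumption: unweighted mean over the six benchmarks.
mean = sum(scores.values()) / len(scores)
average = mean.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(average)  # 54.00
```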
