language: en
license: apache-2.0
model-index name: supermario-v2
task: text-generation (Text Generation)
source: Open LLM Leaderboard (https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=janhq/supermario-v2)

Benchmark	Dataset	Config	Split	num_few_shot	Metric	Value
AI2 Reasoning Challenge (25-Shot)	ai2_arc	ARC-Challenge	test	25	acc_norm (normalized accuracy)	68.52
HellaSwag (10-Shot)	hellaswag	-	validation	10	acc_norm (normalized accuracy)	86.51
MMLU (5-Shot)	cais/mmlu	all	test	5	acc (accuracy)	64.88
TruthfulQA (0-shot)	truthful_qa	multiple_choice	validation	0	mc2	60.58
Winogrande (5-shot)	winogrande	winogrande_xl	validation	5	acc (accuracy)	81.37
GSM8k (5-shot)	gsm8k	main	test	5	acc (accuracy)	72.18
Jan - Discord

Model Description

This model merges three models using the DARE_TIES merge method:

OpenHermes-2.5-neural-chat-v3-3-Slerp
MetaMath-Cybertron-Starling
Marcoroni-7B-v3

Base model: Mistral-7B-v0.1

The YAML config file for this merge:

base_model: mistralai/Mistral-7B-v0.1
dtype: bfloat16
merge_method: dare_ties
models:
- model: mistralai/Mistral-7B-v0.1
- model: Weyaxi/OpenHermes-2.5-neural-chat-v3-3-Slerp
  parameters:
    density: 0.8
    weight: 0.4
- model: Q-bert/MetaMath-Cybertron-Starling
  parameters:
    density: 0.8
    weight: 0.3
- model: AIDC-ai-business/Marcoroni-7B-v3
  parameters:
    density: 0.8
    weight: 0.3
parameters:
  int8_mask: true
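
To give a rough intuition for what `density` and `weight` mean in the config above, here is a toy sketch of a DARE-then-TIES style merge on flat lists of numbers. It is not mergekit's implementation: the function names `dare_prune` and `dare_ties_merge` are hypothetical, the sign election by weighted sum and the renormalization over agreeing weights are simplifying assumptions, and real merges operate on full model tensors.

```python
import random


def dare_prune(delta, density, rng):
    """DARE step: keep each entry with probability `density`,
    rescaling survivors by 1/density so the expected value is unchanged."""
    return [d / density if rng.random() < density else 0.0 for d in delta]


def dare_ties_merge(base, finetuned, weights, density, seed=0):
    """Toy DARE + TIES merge over flat parameter lists (illustrative only)."""
    rng = random.Random(seed)
    # Task vectors: difference between each fine-tuned model and the base.
    deltas = [[f - b for f, b in zip(model, base)] for model in finetuned]
    pruned = [dare_prune(d, density, rng) for d in deltas]

    merged = []
    for i, b in enumerate(base):
        weighted = [w * p[i] for w, p in zip(weights, pruned)]
        # TIES-style sign election: here, the sign of the weighted sum.
        sign = 1.0 if sum(weighted) >= 0 else -1.0
        # Keep only contributions that agree with the elected sign.
        agree = [(w, v) for w, v in zip(weights, weighted) if v * sign > 0]
        norm = sum(w for w, _ in agree)
        update = sum(v for _, v in agree) / norm if norm else 0.0
        merged.append(b + update)
    return merged


# Same weights and density as the config above, on made-up 4-parameter "models".
base = [0.0, 0.0, 0.0, 0.0]
finetuned = [
    [0.5, -0.2, 0.1, 0.0],
    [0.4, 0.3, -0.1, 0.2],
    [0.6, 0.1, -0.3, 0.1],
]
merged = dare_ties_merge(base, finetuned, weights=[0.4, 0.3, 0.3], density=0.8)
print(merged)
```

With `density: 0.8`, roughly 20% of each model's delta entries are dropped at random before merging; the `weight` values set how strongly each surviving delta contributes.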

Prompt template:

ChatML

<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant

System

### System:
{system}
### User:
{user}
### Assistant:
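
As a small illustration, the two templates above can be filled in with plain string formatting. The helper names `format_chatml` and `format_system` are hypothetical, and the trailing newline after the assistant marker is an assumption about how the template is meant to end.

```python
def format_chatml(system_message: str, prompt: str) -> str:
    """Fill in the ChatML template shown above."""
    return (
        f"<|im_start|>system\n{system_message}<|im_end|>\n"
        f"<|im_start|>user\n{prompt}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


def format_system(system: str, user: str) -> str:
    """Fill in the '### System' style template shown above."""
    return f"### System:\n{system}\n### User:\n{user}\n### Assistant:\n"


print(format_chatml("You are a helpful assistant.", "What is 2 + 2?"))
print(format_system("You are a helpful assistant.", "What is 2 + 2?"))
```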

Run this model

You can run this model using Jan Desktop on Mac, Windows, or Linux.

Jan is an open-source ChatGPT alternative that is:

💻 100% offline on your machine: Your conversations remain confidential and visible only to you.
🗂️ An Open File Format: Conversations and model settings stay on your computer and can be exported or deleted at any time.
🌐 OpenAI Compatible: A local server on port 1337 exposes OpenAI-compatible endpoints.
🌍 Open Source & Free: We build in public; check out our GitHub.
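
As a rough sketch of what talking to the local server might look like, the snippet below builds an OpenAI-style chat completion request using only the standard library. Only the port (1337) comes from this README; the `/v1/chat/completions` path, the model id `supermario-v2`, and the helper names are assumptions, so check them against your Jan installation.

```python
import json
import urllib.request

# Port 1337 is stated in the README; the /v1 prefix is an assumption
# based on the usual OpenAI endpoint layout.
JAN_BASE_URL = "http://localhost:1337/v1"
MODEL_ID = "supermario-v2"  # assumed model id; use the id shown in your Jan app


def build_chat_request(prompt, system_message="You are a helpful assistant."):
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": MODEL_ID,
        "messages": [
            {"role": "system", "content": system_message},
            {"role": "user", "content": prompt},
        ],
    }
    return urllib.request.Request(
        f"{JAN_BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request; requires the Jan local server to be running."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


request = build_chat_request("Explain model merging in one sentence.")
print(request.full_url)
# To actually query the model, with the server running: print(send(request))
```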

About Jan

Jan believes in the need for an open-source AI ecosystem and is building the infra and tooling to allow open-source AIs to compete on a level playing field with proprietary ones.

Jan's long-term vision is to build a cognitive framework for future robots, who are practical, useful assistants for humans and businesses in everyday life.

Jan Model Merger

This is a test project for merging models.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here.

Metric	Value
Avg.	72.34
ARC (25-shot)	68.52
HellaSwag (10-shot)	86.51
MMLU (5-shot)	64.88
TruthfulQA (0-shot)	60.58
Winogrande (5-shot)	81.37
GSM8K (5-shot)	72.18

Acknowledgement

mergekit
DARE
lm-evaluation-harness

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	72.34
AI2 Reasoning Challenge (25-Shot)	68.52
HellaSwag (10-Shot)	86.51
MMLU (5-Shot)	64.88
TruthfulQA (0-shot)	60.58
Winogrande (5-shot)	81.37
GSM8k (5-shot)	72.18
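
The average can be checked directly from the six benchmark scores in the table, assuming it is the unweighted mean (the variable names here are illustrative):

```python
# Scores copied from the table above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 68.52,
    "HellaSwag (10-Shot)": 86.51,
    "MMLU (5-Shot)": 64.88,
    "TruthfulQA (0-shot)": 60.58,
    "Winogrande (5-shot)": 81.37,
    "GSM8k (5-shot)": 72.18,
}

# Unweighted mean over the six benchmarks, rounded to two decimals.
average = round(sum(scores.values()) / len(scores), 2)
print(average)
```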
