Experiments

To reproduce our experiments, we publicly release our experimental code and data here, organized by the corresponding ability.

We also welcome contributions of computing resources to help us conduct more comprehensive experiments.

Code and data

Results

Instruction Tuning Experiments Results

Here are the results of the instruction-tuning experiments (all in single-turn conversations) based on the LLaMA (7B) model under the chat and QA settings. We apply four instruction improvement strategies to the Self-Instruct-52K dataset, i.e., enhancing the complexity (w/ complexity), increasing the diversity (w/ diversity), balancing the difficulty (w/ difficulty), and scaling the instruction number (w/ scaling). ∗Since we select the LLaMA-7B model fine-tuned on Self-Instruct-52K as the baseline, we omit the win rate of the model fine-tuned on Self-Instruct-52K against itself.

[Table: instruction-tuning experiment results]
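The win rates reported above come from pairwise comparisons against the Self-Instruct-52K baseline. As a minimal sketch of how such a metric could be computed, assuming each comparison is recorded as a "win", "tie", or "lose" label (the labels and the tie-handling convention here are illustrative assumptions, not the repository's actual evaluation code):

```python
def win_rate(outcomes):
    """Fraction of pairwise comparisons won against the baseline.

    `outcomes` is a list of "win" / "tie" / "lose" labels, one per
    instruction; ties are counted as half a win (a common convention,
    assumed here for illustration).
    """
    if not outcomes:
        return 0.0
    score = sum(
        1.0 if o == "win" else 0.5 if o == "tie" else 0.0
        for o in outcomes
    )
    return score / len(outcomes)

# Example: 6 wins, 2 ties, 2 losses over 10 instructions -> 0.7
print(win_rate(["win"] * 6 + ["tie"] * 2 + ["lose"] * 2))
```

This also makes clear why the baseline's win rate against itself is omitted: by symmetry it would be 0.5 by construction and carries no information.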

Ability Evaluation Experiments Results

Here are the results of evaluating the eight abilities of LLMs on specially selected tasks. The shades of the orange and blue fonts denote the performance rankings of the closed-source and open-source models, respectively. This table will be continuously updated as the results of more models are incorporated.

[Tables: ability evaluation results]