---
license: llama2
---

Experiment with DARE (Drop And REscale): most of the delta parameters can be set directly to zero without affecting the capabilities of SFT LMs, and larger models can tolerate a higher proportion of discarded parameters.

Merged from the six DARE-processed models listed in the table below, using the following settings:

`weight_mask_rate: 0.85` / `use_weight_rescale: True` / `mask_strategy: random` / `scaling_coefficient: 1.0`
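For reference, here is a minimal PyTorch sketch of what these settings mean, assuming hypothetical helper names `dare_delta` and `merge` (the actual merge was produced with the settings above, not necessarily with this code):

```python
import torch

def dare_delta(base: torch.Tensor, finetuned: torch.Tensor,
               weight_mask_rate: float = 0.85,
               use_weight_rescale: bool = True) -> torch.Tensor:
    # Delta parameters: what the SFT model adds on top of the base weights.
    delta = finetuned - base
    # Random mask strategy: each entry is dropped independently with
    # probability `weight_mask_rate` (here 85%).
    drop = torch.bernoulli(torch.full_like(delta, weight_mask_rate))
    delta = delta * (1.0 - drop)
    # Rescale the surviving entries by 1 / (1 - p) so the expected value
    # of the delta is preserved.
    if use_weight_rescale:
        delta = delta / (1.0 - weight_mask_rate)
    return delta

def merge(base: torch.Tensor, deltas: list[torch.Tensor],
          scaling_coefficient: float = 1.0) -> torch.Tensor:
    # Add the sparsified deltas from all source models back onto the base.
    return base + scaling_coefficient * torch.stack(deltas).sum(dim=0)
```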

| Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
|-------|---------|-----|-----------|------|------------|------------|-------|
| Intel/neural-chat-7b-v3-1 | 61.59 | 66.21 | 83.64 | 62.37 | 59.65 | 78.14 | 19.56 |
| migtissera/SynthIA-7B-v1.3 | 59.34 | 62.12 | 83.45 | 62.65 | 51.37 | 78.85 | 17.59 |
| bhenrym14/mistral-7b-platypus-fp16 | 58.71 | 63.05 | 84.15 | 64.11 | 45.07 | 78.53 | 17.36 |
| jondurbin/airoboros-m-7b-3.1.2 | 58.75 | 61.86 | 83.51 | 61.91 | 53.75 | 77.58 | 13.87 |
| teknium/CollectiveCognition-v1.1-Mistral-7B | 62.92 | 62.12 | 84.17 | 62.35 | 57.62 | 75.37 | 15.62 |
| uukuguy/speechless-mistral-dolphin-orca-platypus-samantha-7b | 62.06 | 64.33 | 84.4 | 63.72 | 52.52 | 78.37 | 21.38 |
| speechless-mistral-7b-dare-0.85 (Merge 6 DARE models) | 64.69 | 63.57 | 84.82 | 64.29 | 50.66 | 79.24 | 45.56 |