Questions about performance improvement in Open LLM leaderboard #21
Comments
Hi, thanks for your interest! This question is indeed interesting. We have a couple of speculations that might shed some light:
Thanks for sharing your insights and thoughts on my question! I also agree with the second point, that the model has the potential to answer correctly but does not do so consistently.
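To make that consistency point concrete: the Open LLM Leaderboard evaluates ARC-Challenge and MMLU by comparing the log-likelihood the model assigns to each answer choice, so a model can place reasonable probability on the correct option without ranking it first every time. Below is a minimal sketch of that likelihood-based scoring; it only mirrors the general protocol used by lm-evaluation-harness, and the `gpt2` checkpoint and the `choice_loglikelihood` helper are placeholders for illustration, not this repository's evaluation code.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder checkpoint for illustration only
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()


def choice_loglikelihood(prompt: str, choice: str) -> float:
    """Sum of log-probabilities the model assigns to the choice tokens given the prompt."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + choice, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits  # [1, seq_len, vocab]
    # Position i predicts token i+1, so drop the last position before softmax.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    cont_start = prompt_ids.shape[1]
    cont_ids = full_ids[0, cont_start:]
    # Gather the log-prob of each continuation token under teacher forcing.
    return log_probs[cont_start - 1:].gather(1, cont_ids.unsqueeze(1)).sum().item()


question = "Question: Which planet is known as the Red Planet?\nAnswer:"
choices = [" Venus", " Mars", " Jupiter", " Saturn"]
scores = [choice_loglikelihood(question, c) for c in choices]
print(choices[scores.index(max(scores))])  # accuracy = share of items where the gold choice wins
```

Under this protocol the model never generates an answer; small shifts in the relative likelihoods of the options are enough to move accuracy, which is consistent with a model that "knows" the answer but does not rank it highest reliably.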
A potential explanation might be the presence of STEM-related samples within the UltraFeedback dataset.
Hi,
First of all, thank you for sharing your wonderful work!
I was searching for efficient ways of mining instructions used in instruction-tuning LLMs.
While reading the manuscript and investigating the open-sourced 6k & 10k datasets you provide,
I could not intuitively understand why the SFT (6k) + DPO (10k) training method improves performance on
multiple-choice question answering tasks such as ARC-Challenge and MMLU.
In the datasets, the instances consist of conversations between humans and GPT that contain no explicit cues about how to solve multiple-choice QA problems.
Do you have any idea why this works?
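For reference, my understanding of the DPO objective applied to the 10k preference pairs is sketched below; the function signature and the beta value are my own illustrative assumptions, not this repository's training code.

```python
import torch
import torch.nn.functional as F


def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss: -log sigmoid(beta * (policy margin - reference margin))."""
    policy_margin = policy_chosen_logps - policy_rejected_logps
    reference_margin = ref_chosen_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (policy_margin - reference_margin)).mean()


# Toy batch of 4 preference pairs with random summed log-probabilities.
pc, pr, rc, rr = (torch.randn(4) for _ in range(4))
print(dpo_loss(pc, pr, rc, rr))
```

Nothing in this objective is specific to multiple-choice QA, which is exactly why the leaderboard gains are surprising to me.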