Can real-time fine-tuning be implemented by adding traditional RL? #157
The goal is to continuously update the model's parameters using new data collected from user conversations, while preventing catastrophic forgetting.
However, each batch consists of only a handful of sentences, which is too little data overall for conventional fine-tuning. The current idea is to design a reward model and tune the parameters via preference signals, i.e., traditional reinforcement learning (RL)?
I'd like to discuss concrete examples and practical approaches.
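For concreteness, here is a minimal toy sketch of the idea described above: a policy updated online with REINFORCE against a reward model, with an L2 anchor toward the pre-update parameters as a crude stand-in for anti-forgetting regularizers such as EWC. Everything here is an assumption for illustration (the one-layer softmax "policy", the hand-built `reward_model`, the hyperparameters); it is not a recipe for a real LLM, only a sketch of the update rule.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB = 4                       # toy "vocabulary" of 4 possible responses
theta = rng.normal(size=VOCAB)  # policy logits (the parameters we fine-tune)
theta_old = theta.copy()        # snapshot of pre-update params (the anchor)

def reward_model(action):
    # Hypothetical reward model: strongly prefers response 2, mildly likes 1.
    return [0.0, 0.5, 1.0, 0.1][action]

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

LR, LAMBDA = 0.5, 0.1           # learning rate and anchoring strength

for step in range(1000):        # "online" updates from a trickle of samples
    probs = softmax(theta)
    a = rng.choice(VOCAB, p=probs)
    r = reward_model(a)
    # REINFORCE gradient of r * log pi(a) w.r.t. the logits:
    # (one-hot(a) - probs) * r
    grad = -probs * r
    grad[a] += r
    # L2 pull toward the old parameters to limit drift/forgetting
    grad -= LAMBDA * (theta - theta_old)
    theta += LR * grad

print(int(np.argmax(theta)))    # the policy should now favour action 2
```

The `LAMBDA` term trades plasticity against stability: larger values keep the policy closer to its original behaviour (less forgetting, slower adaptation). In practice one would replace the L2 anchor with a Fisher-weighted penalty (EWC) or a KL penalty to the reference policy, as in standard RLHF setups.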