Can real-time fine-tuning be implemented by adding traditional RL? #157
The goal is to continuously update the model's parameters using new data collected from user conversations, while preventing catastrophic forgetting.
However, each batch consists of only a handful of sentences, which is too little data overall for conventional fine-tuning. The current idea is to design a reward model and tune the parameters via preference signals, i.e., traditional reinforcement learning (RL)?
I'd like to discuss concrete examples and practical approaches.
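For concreteness, here is a minimal toy sketch of the idea described above: a policy updated online with REINFORCE against a reward model, with an L2 anchor toward the pre-update parameters as a crude stand-in for anti-forgetting regularizers such as EWC. Everything here is an assumption for illustration (the one-layer softmax "policy", the hand-built `reward_model`, the hyperparameters); it is not a recipe for a real LLM, only a sketch of the update rule.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB = 4                       # toy "vocabulary" of 4 possible responses
theta = rng.normal(size=VOCAB)  # policy logits (the parameters we fine-tune)
theta_old = theta.copy()        # snapshot of pre-update params (the anchor)

def reward_model(action):
    # Hypothetical reward model: strongly prefers response 2, mildly likes 1.
    return [0.0, 0.5, 1.0, 0.1][action]

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

LR, LAMBDA = 0.5, 0.1           # learning rate and anchoring strength

for step in range(1000):        # "online" updates from a trickle of samples
    probs = softmax(theta)
    a = rng.choice(VOCAB, p=probs)
    r = reward_model(a)
    # REINFORCE gradient of r * log pi(a) w.r.t. the logits:
    # (one-hot(a) - probs) * r
    grad = -probs * r
    grad[a] += r
    # L2 pull toward the old parameters to limit drift/forgetting
    grad -= LAMBDA * (theta - theta_old)
    theta += LR * grad

print(int(np.argmax(theta)))    # the policy should now favour action 2
```

The `LAMBDA` term trades plasticity against stability: larger values keep the policy closer to its original behaviour (less forgetting, slower adaptation). In practice one would replace the L2 anchor with a Fisher-weighted penalty (EWC) or a KL penalty to the reference policy, as in standard RLHF setups.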