Can real-time fine-tuning be achieved by adding traditional RL? #157

Open
LIzhiqian-cassie opened this issue Sep 8, 2023 · 0 comments

Comments

@LIzhiqian-cassie

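The goal is to keep updating the model's parameters on new data collected from user conversations, while avoiding the forgetting problem.
However, each new batch of data is only a few sentences, which is not enough overall for fine-tuning. My current idea is to design a reward model and tune the parameters on preferences, i.e. traditional reinforcement learning (RL)?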

I would like to discuss concrete examples and practical approaches.
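As a starting point for discussion, here is a minimal sketch of the pairwise preference loss (Bradley-Terry style) commonly used to train a reward model. It is not tied to this repository: the linear reward function, the two-dimensional feature vectors, and the names `reward`, `pairwise_loss`, and `train_reward_model` are all hypothetical, chosen only to illustrate the idea. In practice the reward model would more likely be a neural network scoring whole responses.

```python
import math
from typing import List, Sequence, Tuple

Pair = Tuple[Sequence[float], Sequence[float]]  # (chosen features, rejected features)

def reward(weights: Sequence[float], features: Sequence[float]) -> float:
    """Hypothetical linear reward model: a weighted sum of response features."""
    return sum(w * x for w, x in zip(weights, features))

def pairwise_loss(margin: float) -> float:
    """-log(sigmoid(margin)), where margin = r_chosen - r_rejected (stable form)."""
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

def train_reward_model(pairs: List[Pair], dim: int,
                       lr: float = 0.1, epochs: int = 200) -> List[float]:
    """Plain SGD on the pairwise loss over a small set of preference pairs."""
    weights = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # d(loss)/d(margin) = -1 / (1 + exp(margin))
            grad_scale = -1.0 / (1.0 + math.exp(margin))
            for i in range(dim):
                weights[i] -= lr * grad_scale * (chosen[i] - rejected[i])
    return weights

# Toy preference data (made up): the first response in each pair was preferred.
toy_pairs: List[Pair] = [
    ([1.0, 0.0], [0.0, 1.0]),
    ([0.8, 0.2], [0.1, 0.9]),
]
learned_weights = train_reward_model(toy_pairs, dim=2)
```

Because only the ordering within each pair is used, a handful of comparisons per update can still provide a training signal, which may suit the small per-session data described above. The sketch does not address forgetting; that would need a separate mechanism, for example mixing earlier data into each update.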
