Why use L2 regularization in reward model training? #3675
Hello, respected developers of Open Assistant. @andreaskoepf, while studying your reward model training code, I noticed that, besides the ranking loss, there is an additional L2 regularization term. What is the purpose of this regularization term? Are there any papers that mention it?

Open-Assistant/model/model_training/utils/losses.py
Line 76 in 7e40ee3
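For readers without the repository at hand, here is a minimal sketch of the pattern I am asking about. The function name, the coefficient, and its default value are hypothetical; this is not the exact code at the link above, just the general shape of a pairwise ranking loss with an L2 penalty on the scores.

```python
import torch
import torch.nn.functional as F

def ranking_loss_with_l2(pos_scores: torch.Tensor,
                         neg_scores: torch.Tensor,
                         l2_coef: float = 0.001) -> torch.Tensor:
    """Pairwise ranking loss plus an L2 penalty on the raw scores.

    pos_scores: reward scores of the preferred replies, shape (batch,)
    neg_scores: reward scores of the rejected replies, shape (batch,)
    l2_coef:    regularization weight (hypothetical default)
    """
    # Bradley-Terry style ranking objective: push each preferred
    # score above its rejected counterpart.
    rank_loss = -F.logsigmoid(pos_scores - neg_scores).mean()

    # L2 term on the raw scores. The ranking term depends only on the
    # difference pos - neg, so it is invariant to shifting all scores
    # by a constant; the penalty keeps their magnitude bounded.
    l2_term = (pos_scores.pow(2) + neg_scores.pow(2)).mean()

    return rank_loss + l2_coef * l2_term

# Example usage with random scores for a batch of 8 preference pairs.
loss = ranking_loss_with_l2(torch.randn(8), torch.randn(8))
```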
Comments

My apologies, I don't know if there is any research that covers it, as I am new to AI systems. However, I can explain that I am interested in attempting to make a conversational AI companion that responds with more personality so it's less boring. I understand if you would like me to cease some of my testing on your online version.
--
The one, the only, Kaegan R. Bruce