Why use L2 regularization in reward model training? #3675
Hello, respected developers of Open Assistant. @andreaskoepf, while studying your reward model training code, I noticed that, besides the ranking loss, there is an additional L2 regularization term. What is the purpose of this regularization term? Are there any papers that mention it?

Open-Assistant/model/model_training/utils/losses.py
Line 76 in 7e40ee3
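For readers without the repository at hand, here is a minimal sketch of the pattern I am asking about. The function name, the coefficient, and its default value are hypothetical; this is not the exact code at the link above, just the general shape of a pairwise ranking loss with an L2 penalty on the scores.

```python
import torch
import torch.nn.functional as F

def ranking_loss_with_l2(pos_scores: torch.Tensor,
                         neg_scores: torch.Tensor,
                         l2_coef: float = 0.001) -> torch.Tensor:
    """Pairwise ranking loss plus an L2 penalty on the raw scores.

    pos_scores: reward scores of the preferred replies, shape (batch,)
    neg_scores: reward scores of the rejected replies, shape (batch,)
    l2_coef:    regularization weight (hypothetical default)
    """
    # Bradley-Terry style ranking objective: push each preferred
    # score above its rejected counterpart.
    rank_loss = -F.logsigmoid(pos_scores - neg_scores).mean()

    # L2 term on the raw scores. The ranking term depends only on the
    # difference pos - neg, so it is invariant to shifting all scores
    # by a constant; the penalty keeps their magnitude bounded.
    l2_term = (pos_scores.pow(2) + neg_scores.pow(2)).mean()

    return rank_loss + l2_coef * l2_term

# Example usage with random scores for a batch of 8 preference pairs.
loss = ranking_loss_with_l2(torch.randn(8), torch.randn(8))
```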
Comments

My apologies, I don't know if there is any research that covers it, as I am new to AI systems. However, I can explain that I am interested in attempting to make a conversational AI companion that responds with more personality so it's less boring. I understand if you would like me to cease some of my testing on your online version.
--
The one, the only, Kaegan R. Bruce