Constitutional-AI-awesome-papers

A list of papers on 'Constitutional AI systems' and 'AI under ethical guidelines'. This GitHub repository is intended for personal study and is continuously updated. Recommendations of related work are welcome.

Paper

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

Anthropic [Link] arxiv Nov.2022
Constitutional AI: Harmlessness from AI Feedback

Anthropic [Link] arxiv Dec.2022
Moral Stories: Situated Reasoning about Norms, Intents, Actions, and their Consequences

Denis Emelin, Ronan Le Bras, Jena D. Hwang, Maxwell Forbes, Yejin Choi [Link] EMNLP 2021
Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits

Ruibo Liu, Chenyan Jia, Ge Zhang, Ziyu Zhuang, Tony X. Liu, Soroush Vosoughi [Link] NeurIPS 2022
The Capacity for Moral Self-Correction in Large Language Models

Anthropic [Link] arxiv Feb.2023
Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision

Zhiqing Sun, Yikang Shen, Qinhong Zhou, Hongxin Zhang, Zhenfang Chen, David Cox, Yiming Yang, Chuang Gan [Link] arxiv May.2023
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models

Seungone Kim, Jamin Shin, Yejin Cho, Joel Jang, Shayne Longpre, Hwaran Lee, Sangdoo Yun, Seongjin Shin, Sungdong Kim, James Thorne, Minjoon Seo [Link] arxiv Oct.2023
Generating Summaries with Controllable Readability Levels

Leonardo F. R. Ribeiro, Mohit Bansal, Markus Dreyer [Link] arxiv Oct.2023
Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging

Joel Jang, Seungone Kim, Bill Yuchen Lin, Yizhong Wang, Jack Hessel, Luke Zettlemoyer, Hannaneh Hajishirzi, Yejin Choi, Prithviraj Ammanabrolu [Link] arxiv Oct.2023
Collective Constitutional AI: Aligning a Language Model with Public Input

Anthropic [Link] arxiv Oct.2023
Specific versus General Principles for Constitutional AI

Anthropic [Link] arxiv Oct.2023

minbeomkim/Constitutional-AI-awesome-papers

Folders and files

Latest commit

History

README.md

README.md

Repository files navigation

Constitutional-AI-awesome-papers

Paper

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages