Paper lists about 'Constitutional AI System' or 'AI under Ethical Guidelines'

minbeomkim/Constitutional-AI-awesome-papers

Constitutional-AI-awesome-papers

A list of papers on 'Constitutional AI System' or 'AI under Ethical Guidelines'. This GitHub repository is maintained for personal study and is updated regularly. Recommendations of related work are welcome.

Papers

  1. Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

    Anthropic [Link] arxiv Nov.2022

  2. Constitutional AI: Harmlessness from AI Feedback

    Anthropic [Link] arxiv Dec.2022

  3. Moral Stories: Situated Reasoning about Norms, Intents, Actions, and their Consequences

    Denis Emelin, Ronan Le Bras, Jena D. Hwang, Maxwell Forbes, Yejin Choi [Link] EMNLP 2021

  4. Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits

    Ruibo Liu, Chenyan Jia, Ge Zhang, Ziyu Zhuang, Tony X. Liu, Soroush Vosoughi [Link] NeurIPS 2022

  5. The Capacity for Moral Self-Correction in Large Language Models

    Anthropic [Link] arxiv Feb.2023

  6. Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision

    Zhiqing Sun, Yikang Shen, Qinhong Zhou, Hongxin Zhang, Zhenfang Chen, David Cox, Yiming Yang, Chuang Gan [Link] arxiv May.2023

  7. Prometheus: Inducing Fine-grained Evaluation Capability in Language Models

    Seungone Kim, Jamin Shin, Yejin Cho, Joel Jang, Shayne Longpre, Hwaran Lee, Sangdoo Yun, Seongjin Shin, Sungdong Kim, James Thorne, Minjoon Seo [Link] arxiv Oct.2023

  8. Generating Summaries with Controllable Readability Levels

    Leonardo F. R. Ribeiro, Mohit Bansal, Markus Dreyer [Link] arxiv Oct.2023

  9. Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging

    Joel Jang, Seungone Kim, Bill Yuchen Lin, Yizhong Wang, Jack Hessel, Luke Zettlemoyer, Hannaneh Hajishirzi, Yejin Choi, Prithviraj Ammanabrolu [Link] arxiv Oct.2023

  10. Collective Constitutional AI: Aligning a Language Model with Public Input

    Anthropic [Link] arxiv Oct.2023

  11. Specific versus General Principles for Constitutional AI

    Anthropic [Link] arxiv Oct.2023
