A curated list of awesome responsible machine learning resources.
Updated May 24, 2024
🐢 Open-Source Evaluation & Testing for LLMs and ML models
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
Aligning AI With Shared Human Values (ICLR 2021)
PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Safety Workshop 2022
Open-source toolkit for building trustworthy LLM applications: TigerArmor (AI safety), TigerRAG (embeddings, retrieval-augmented generation), TigerTune (fine-tuning)
[NeurIPS '23 Spotlight] Thought Cloning: Learning to Think while Acting by Imitating Human Thinking
RuLES: a benchmark for evaluating rule-following in language models
Code accompanying the paper "Pretraining Language Models with Human Preferences"
LAMBDA is a model-based reinforcement learning agent that uses Bayesian world models for safe policy optimization
[ICCV 2021 Oral] Fooling LiDAR by Attacking GPS Trajectory
An attack that induces hallucinations in LLMs
Feature Space Singularity for Out-of-Distribution Detection (SafeAI 2021)
📚 A curated list of papers & technical articles on AI Quality & Safety
A project that adds scalable, state-of-the-art out-of-distribution detection (open-set recognition) support by changing two lines of code. It performs efficient inference (no increase in inference time) and detection without a drop in classification accuracy, hyperparameter tuning, or collecting additional data.
An unrestricted attack based on diffusion models that can achieve both good transferability and imperceptibility.
Full code accompanying the sparse probing paper.
A curated list of awesome resources for getting started with, and staying current on, Artificial Intelligence Alignment research.
Scan your AI/ML models for problems before you put them into production.