🐢 Open-Source Evaluation & Testing for LLMs and ML models
A curated list of awesome responsible machine learning resources.
RuLES: a benchmark for evaluating rule-following in language models
Scan your AI/ML models for problems before you put them into production.
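To illustrate the kind of pre-production scan described above, here is a minimal sketch using the open-source giskard Python package as one example; the API names (Model, Dataset, scan) reflect my understanding of that package and may differ between versions, and the toy data and prediction function are hypothetical stand-ins for a real fitted model.

```python
# Minimal sketch of an automated pre-production model scan.
# Assumes the `giskard` package; argument names may differ between versions.
import numpy as np
import pandas as pd
import giskard

# Toy evaluation data with a binary target column.
df = pd.DataFrame({
    "age": [25, 40, 31, 58],
    "income": [30_000, 85_000, 52_000, 61_000],
    "label": [0, 1, 0, 1],
})

def predict_proba(data: pd.DataFrame) -> np.ndarray:
    # Hypothetical stand-in for a real fitted classifier:
    # predicted probability of class 1 rises with income.
    p = (data["income"] > 50_000).astype(float).to_numpy()
    return np.column_stack([1 - p, p])

model = giskard.Model(
    model=predict_proba,
    model_type="classification",
    classification_labels=[0, 1],
    feature_names=["age", "income"],
)
dataset = giskard.Dataset(df, target="label")

# Run the automated scan (robustness, performance bias, etc.) and review issues.
report = giskard.scan(model, dataset)
print(report)
```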
[ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning
A novel physical adversarial attack tackling the Digital-to-Physical Visual Inconsistency problem.
The official implementation of the paper "Data Contamination Calibration for Black-box LLMs" (ACL 2024)
Code for our paper "ModelObfuscator: Obfuscating Model Information to Protect Deployed ML-Based Systems", published at ISSTA 2023
An attack that induces hallucinations in LLMs
DPLL(T)-based Verification tool for DNNs
Website to track people, organizations, and products (tools, websites, etc.) in AI safety
Extended multi-agent and multi-objective (MaMoRL) environments based on DeepMind's AI Safety Gridworlds: a suite of reinforcement learning environments illustrating various safety properties of intelligent agents, made compatible with OpenAI Gym/Gymnasium and the Farama Foundation's PettingZoo.
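Since these environments follow the Gymnasium API, a minimal interaction loop looks like the sketch below; the environment ID is hypothetical and stands in for whatever IDs the suite actually registers, and a random policy is used purely for illustration.

```python
# Minimal sketch of driving a Gymnasium-compatible safety gridworld.
# The environment ID is hypothetical; substitute one registered by the suite.
import gymnasium as gym

env = gym.make("SafetyGridworld-v0")  # hypothetical ID
obs, info = env.reset(seed=0)

terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # random policy, for illustration only
    obs, reward, terminated, truncated, info = env.step(action)

env.close()
```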
Awesome PrivEx: Privacy-Preserving Explainable AI (PPXAI)
📦 Redwood Research's transformer interpretability tools, conveniently packaged in a Docker container for simple and reproducible deployments.
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
Aira is a series of chatbots developed as an experimentation playground for value alignment.