Add benchmarks and evals for jailbreaks #51

steventkrawczyk · 2023-08-02T00:11:03Z

🚀 The feature

As we add benchmarks, it would be good to cover common jailbreak scenarios. We should incorporate these benchmarks, and have auto-evals that check responses to see if they are "broken"

Motivation, pitch

https://github.com/llm-attacks/llm-attacks

Alternatives

No response

Additional context

No response

steventkrawczyk added enhancement New feature or request help wanted Extra attention is needed labels Aug 2, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add benchmarks and evals for jailbreaks #51

Add benchmarks and evals for jailbreaks #51

steventkrawczyk commented Aug 2, 2023

Add benchmarks and evals for jailbreaks #51

Add benchmarks and evals for jailbreaks #51

Comments

steventkrawczyk commented Aug 2, 2023

🚀 The feature

Motivation, pitch

Alternatives

Additional context