
Precision/Recall Curves #34

Open
normster opened this issue May 11, 2024 · 0 comments
Labels
Llama-Guard Attach this label if the issue is related to the Llama Guard codebase.

Comments

@normster
Thank you for releasing Llama Guard 2, it looks like a very promising model!

I was wondering if it would be feasible to release precision/recall curves, or per-category numbers, from your internal benchmark evaluation? Or is there any hope of publicly releasing a small labeled test set so the community can run evaluations ourselves?

From Table 2 in the model card, it looks like a classification threshold of 0.5 results in rather high FNRs for some categories. I'd like to use a classification threshold with more balanced errors, but I'm not sure how to go about tuning it myself, because the new MLCommons harm taxonomy doesn't map 1:1 onto public content-classification datasets like OpenAI's moderation dataset.
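For context, here is a minimal sketch of the kind of threshold tuning I have in mind, assuming one had per-example "unsafe" probabilities from the model and binary ground-truth labels for a given harm category (the data below is synthetic placeholder data, not real Llama Guard scores):

```python
# Hypothetical sketch: sweep classification thresholds over per-category
# "unsafe" scores and pick the one where FNR and FPR are most balanced.
# `scores` and `labels` are synthetic stand-ins for real model outputs
# and ground-truth annotations.
import numpy as np
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)                      # 1 = unsafe
scores = np.clip(labels * 0.6 + rng.normal(0.3, 0.25, 200), 0.0, 1.0)

precision, recall, thresholds = precision_recall_curve(labels, scores)

# recall[i] corresponds to thresholds[i]; FNR = 1 - recall.
fnr = 1.0 - recall[:-1]
# FPR must be computed from the negatives directly.
neg = labels == 0
fpr = np.array([(scores[neg] >= t).mean() for t in thresholds])

# Choose the threshold minimizing the gap between FNR and FPR.
best = int(np.argmin(np.abs(fnr - fpr)))
print(f"threshold={thresholds[best]:.2f} "
      f"FNR={fnr[best]:.2f} FPR={fpr[best]:.2f}")
```

With a released labeled test set (or published curves), the same sweep could be done per harm category instead of using 0.5 everywhere.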

@SimonWan SimonWan added the Llama-Guard Attach this label if the issue is related to the Llama Guard codebase. label May 14, 2024

2 participants