Robustness evaluation #69
Labels: help wanted (Extra attention is needed), good first issue (Good for newcomers), added by steventkrawczyk on Aug 10, 2023

Comments
Hey, I am working on this issue and have generated two sample scripts: one for prompt substitution and the other for auto-evaluation. I am using Promptbench in both scripts. Could you please guide me on how to integrate the scripts into your experiments? I am joining your Discord group, where we can discuss this issue in detail.
🚀 The feature
Request from a potential user: "There are two main aspects: 1) adjusting prompts so that changing semantic words does not trigger hallucination, 2) making the prompt itself such that the LLM doesn't slip away from the instruction."

Idea: for (1), use prompt templates to substitute words, then run evals to check the semantic similarity of all the results. For (2), use auto-evaluation: given the instruction, prompt, and response, determine whether the LLM followed the instruction.
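A minimal sketch of how the two checks could look. Nothing here is from the repo: the function names (`substitute`, `robustness_score`, `judge_prompt`) are hypothetical, and `difflib` is only a crude surface-overlap stand-in for a real semantic-similarity metric such as embedding cosine similarity.

```python
import difflib
from string import Template

def substitute(template: str, variants: dict[str, list[str]]) -> list[str]:
    """Expand a prompt template into one prompt per synonym variant.
    Assumes one placeholder per variant key, e.g. 'Capital of $country.'"""
    prompts = []
    for key, words in variants.items():
        for word in words:
            prompts.append(Template(template).substitute({key: word}))
    return prompts

def similarity(a: str, b: str) -> float:
    # Stand-in for a real semantic-similarity metric; difflib only
    # measures character-level overlap, not meaning.
    return difflib.SequenceMatcher(None, a, b).ratio()

def robustness_score(responses: list[str]) -> float:
    """Minimum pairwise similarity across responses to paraphrased prompts.
    A low score suggests the answer is sensitive to wording (aspect 1)."""
    scores = [
        similarity(responses[i], responses[j])
        for i in range(len(responses))
        for j in range(i + 1, len(responses))
    ]
    return min(scores) if scores else 1.0

def judge_prompt(instruction: str, prompt: str, response: str) -> str:
    """Build an auto-evaluation prompt asking a judge LLM whether the
    response followed the instruction (aspect 2)."""
    return (
        "You are an evaluator. Given the instruction, prompt, and response,\n"
        "answer YES if the response follows the instruction, otherwise NO.\n\n"
        f"Instruction: {instruction}\n"
        f"Prompt: {prompt}\n"
        f"Response: {response}\n"
        "Verdict:"
    )
```

For example, `substitute("Name the capital of $country.", {"country": ["France", "the French Republic"]})` yields two paraphrased prompts; each would be sent to the model, and `robustness_score` applied to the collected responses, while `judge_prompt` output would be sent to a judge model for the instruction-following check.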
Motivation, pitch
We got this request from a potential user, and robustness is also a common concern in LLM evaluation.
Alternatives
No response
Additional context
No response