Robustness evaluation #69
Labels: help wanted (Extra attention is needed), good first issue (Good for newcomers), added by steventkrawczyk on Aug 10, 2023

Comments
Hey, I am working on this issue and have generated two sample scripts: one for prompt substitution and the other for auto-evaluation. I am using Promptbench in both scripts. Could you please guide me on how to integrate the scripts into your experiments? I am joining your Discord group, where we can discuss this issue in detail.
🚀 The feature
Request from a potential user: "There are two main aspects: 1) adjusting prompts so that changing semantic words does not trigger hallucination, 2) making the prompt itself such that the LLM doesn't slip away from the instruction."

Idea: for (1), use prompt templates to substitute words, then run evals to check the semantic similarity of all the results. For (2), use auto-evaluation: given the instruction, prompt, and response, determine whether the LLM followed the instruction.
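A minimal sketch of how the two checks could look. Nothing here is from the repo: the function names (`substitute`, `robustness_score`, `judge_prompt`) are hypothetical, and `difflib` is only a crude surface-overlap stand-in for a real semantic-similarity metric such as embedding cosine similarity.

```python
import difflib
from string import Template

def substitute(template: str, variants: dict[str, list[str]]) -> list[str]:
    """Expand a prompt template into one prompt per synonym variant.
    Assumes one placeholder per variant key, e.g. 'Capital of $country.'"""
    prompts = []
    for key, words in variants.items():
        for word in words:
            prompts.append(Template(template).substitute({key: word}))
    return prompts

def similarity(a: str, b: str) -> float:
    # Stand-in for a real semantic-similarity metric; difflib only
    # measures character-level overlap, not meaning.
    return difflib.SequenceMatcher(None, a, b).ratio()

def robustness_score(responses: list[str]) -> float:
    """Minimum pairwise similarity across responses to paraphrased prompts.
    A low score suggests the answer is sensitive to wording (aspect 1)."""
    scores = [
        similarity(responses[i], responses[j])
        for i in range(len(responses))
        for j in range(i + 1, len(responses))
    ]
    return min(scores) if scores else 1.0

def judge_prompt(instruction: str, prompt: str, response: str) -> str:
    """Build an auto-evaluation prompt asking a judge LLM whether the
    response followed the instruction (aspect 2)."""
    return (
        "You are an evaluator. Given the instruction, prompt, and response,\n"
        "answer YES if the response follows the instruction, otherwise NO.\n\n"
        f"Instruction: {instruction}\n"
        f"Prompt: {prompt}\n"
        f"Response: {response}\n"
        "Verdict:"
    )
```

For example, `substitute("Name the capital of $country.", {"country": ["France", "the French Republic"]})` yields two paraphrased prompts; each would be sent to the model, and `robustness_score` applied to the collected responses, while `judge_prompt` output would be sent to a judge model for the instruction-following check.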
Motivation, pitch
We got this request from a potential user, and robustness is also a common concern in LLM evaluation.
Alternatives
No response
Additional context
No response