scripts to reproduce the results in the paper #25

jinz2014 · 2024-05-17T02:15:48Z

Hi

Could you share the scripts needed to reproduce the results in the paper? Thanks.

I ran generation and evaluation for safety with the script below on an Nvidia GPU. The results are:

total:
0.18

single:
{'fixed sentence': 0.13, 'no_punctuation': 0.15, 'programming': 0.14, 'cou': 0.24, 'Refusal sentence prohibition': 0.12, 'cot': 0.28, 'scenario': 0.2, 'multitask': 0.14, 'no_long_word': 0.15, 'url_encode': 0.21, 'without_the': 0.23, 'json_format': 0.17, 'leetspeak': 0.21, 'bad words': 0.15}

These values are not close to the results shown in Table 17 of the paper for this model.
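As a quick consistency check on the numbers above, the per-category values can be averaged and compared with the reported total. This is only a sketch: it assumes every jailbreak category contains the same number of prompts (not confirmed in this thread), and the variable names are illustrative.

```python
# Per-category RtA values copied from the 'single' output above.
single = {
    'fixed sentence': 0.13, 'no_punctuation': 0.15, 'programming': 0.14,
    'cou': 0.24, 'Refusal sentence prohibition': 0.12, 'cot': 0.28,
    'scenario': 0.2, 'multitask': 0.14, 'no_long_word': 0.15,
    'url_encode': 0.21, 'without_the': 0.23, 'json_format': 0.17,
    'leetspeak': 0.21, 'bad words': 0.15,
}

# Unweighted mean; matches the overall value only if categories are equal in size (assumption).
mean_rta = sum(single.values()) / len(single)
print(round(mean_rta, 2))
```

The mean rounds to 0.18, the same as the reported total, so the overall and per-category numbers are at least consistent with each other; the gap is between these values and the paper.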


from trustllm.generation.generation import LLMGeneration

llm_gen = LLMGeneration(
    model_path="meta-llama/Llama-2-7b-chat-hf", 
    test_type="safety", 
    data_path="TrustLLM"
)

llm_gen.generation_results()


from trustllm import safety
from trustllm import file_process
from trustllm import config

evaluator = safety.SafetyEval()

jailbreak_data = file_process.load_json('jailbreak_data_json_path')
print(evaluator.jailbreak_eval(jailbreak_data, eval_type='total'))  # returns the overall RtA
print(evaluator.jailbreak_eval(jailbreak_data, eval_type='single'))  # returns a dict of RtA per jailbreak type


HowieHwong · 2024-05-17T06:06:53Z

Hi,

Thanks for your feedback. We use the same code you posted for evaluation. Could something be wrong with your evaluation data? You could also try another model to see whether the discrepancy persists.
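One way to check the evaluation data is to count missing or empty model responses before scoring. The sketch below is not part of TrustLLM: the helper name `summarize_generations` is hypothetical, and the `res` key for the model output is an assumption about the generated JSON layout, so adjust it to match the actual file.

```python
def summarize_generations(records, response_key="res"):
    """Count total, missing, and empty responses in a list of generation records.

    `response_key` is an assumed field name for the model output.
    """
    summary = {"total": 0, "missing": 0, "empty": 0}
    for record in records:
        summary["total"] += 1
        value = record.get(response_key)
        if value is None:
            # The record has no response field at all.
            summary["missing"] += 1
        elif not str(value).strip():
            # The field exists but contains only whitespace.
            summary["empty"] += 1
    return summary


# Small in-memory example; in practice, pass the list loaded from the results JSON.
sample = [
    {"prompt": "a", "res": "I cannot help with that."},
    {"prompt": "b", "res": "   "},
    {"prompt": "c"},
]
print(summarize_generations(sample))
```

A large share of missing or empty responses would suggest a generation problem rather than an evaluation problem.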

HowieHwong added the question Further information is requested label May 17, 2024
