scripts to reproduce the results in the paper #25

Open
jinz2014 opened this issue May 17, 2024 · 1 comment
Labels
question: Further information is requested

Comments

@jinz2014

Hi,

Could you share the scripts needed to reproduce the results in the paper? Thanks.

I ran generation and evaluation for the safety task on an NVIDIA GPU using the scripts below. The results are:

total:
0.18

single:
{'fixed sentence': 0.13, 'no_punctuation': 0.15, 'programming': 0.14, 'cou': 0.24, 'Refusal sentence prohibition': 0.12, 'cot': 0.28, 'scenario': 0.2, 'multitask': 0.14, 'no_long_word': 0.15, 'url_encode': 0.21, 'without_the': 0.23, 'json_format': 0.17, 'leetspeak': 0.21, 'bad words': 0.15}

These values are not close to the results reported in Table 17 of the paper for this model (meta-llama/Llama-2-7b-chat-hf).
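As a quick consistency check on the numbers above, the per-category values can be averaged and compared with the reported total. This is only a sketch: it assumes every jailbreak category contains the same number of prompts (so that an unweighted mean is meaningful), which the thread does not confirm. The variable names are illustrative and are not part of TrustLLM.

```python
from statistics import mean

# Per-category RtA values copied from the 'single' output above.
single = {
    'fixed sentence': 0.13, 'no_punctuation': 0.15, 'programming': 0.14,
    'cou': 0.24, 'Refusal sentence prohibition': 0.12, 'cot': 0.28,
    'scenario': 0.2, 'multitask': 0.14, 'no_long_word': 0.15,
    'url_encode': 0.21, 'without_the': 0.23, 'json_format': 0.17,
    'leetspeak': 0.21, 'bad words': 0.15,
}
reported_total = 0.18  # the 'total' output above

# Unweighted mean over categories (assumes equal-sized categories).
avg = mean(list(single.values()))
print(f"mean of per-category RtA: {avg:.2f} (reported total: {reported_total})")

# Rank categories from lowest to highest RtA to see where refusals are rarest.
ranked = sorted(single.items(), key=lambda kv: kv[1])
lowest, highest = ranked[0], ranked[-1]
print("lowest:", lowest)
print("highest:", highest)
```

If the mean and the total agree, the two evaluation modes are at least internally consistent, which points the investigation toward the generation step or the data rather than the aggregation.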


from trustllm.generation.generation import LLMGeneration

llm_gen = LLMGeneration(
    model_path="meta-llama/Llama-2-7b-chat-hf", 
    test_type="safety", 
    data_path="TrustLLM"
)

llm_gen.generation_results()


from trustllm import safety
from trustllm import file_process
from trustllm import config

evaluator = safety.SafetyEval()

jailbreak_data = file_process.load_json('jailbreak_data_json_path')
print(evaluator.jailbreak_eval(jailbreak_data, eval_type='total'))   # overall RtA
print(evaluator.jailbreak_eval(jailbreak_data, eval_type='single'))  # dict of RtA per jailbreak type

@HowieHwong
Owner

Hi,

Thanks for your feedback. We used the same code you posted for our evaluation. Could there be a problem with your evaluation data? You could also try another model to see whether the discrepancy persists.
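One way to check the evaluation data is to count how many generated records actually contain a response before running the evaluator. The sketch below is hypothetical: the field name `res`, the helper `summarize_responses`, and the sample records are assumptions made for illustration, not the confirmed TrustLLM file schema, so the key should be adjusted to match the real JSON file.

```python
from collections import Counter

RESPONSE_KEY = "res"  # assumed field name; change to match the actual file


def summarize_responses(records, response_key=RESPONSE_KEY):
    """Count records whose response is present, empty, or missing."""
    counts = Counter()
    for record in records:
        text = record.get(response_key)
        if text is None:
            counts["missing"] += 1
        elif not str(text).strip():
            counts["empty"] += 1
        else:
            counts["present"] += 1
    return dict(counts)


# Made-up records for illustration only; real data would be loaded from
# the generated jailbreak JSON file instead.
sample = [
    {"prompt": "p1", "res": "I cannot help with that."},
    {"prompt": "p2", "res": ""},
    {"prompt": "p3"},
]
summary = summarize_responses(sample)
print(summary)
```

A large share of empty or missing responses would suggest that generation failed or was truncated, which could affect the measured scores independently of the evaluator.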

@HowieHwong added the "question" label (Further information is requested) on May 17, 2024