F1 score in 01_fine-tuning-titan-lite.ipynb #242

Open
jicowan opened this issue Apr 25, 2024 · 1 comment

Comments


jicowan commented Apr 25, 2024

I had to run the following code block twice before it would output the scores; the first time I ran it, the output was blank:

from bert_score import score

# BERTScore expects lists of strings for both candidates and references
reference_summary = [reference_summary]

# score() returns precision, recall, and F1 tensors with one entry per candidate
fine_tuned_model_P, fine_tuned_R, fine_tuned_F1 = score(fine_tuned_generated_response, reference_summary, lang="en")
base_model_P, base_model_R, base_model_F1 = score(base_model_generated_response, reference_summary, lang="en")

print("F1 score: base model ", base_model_F1)
print("F1 score: fine-tuned model", fine_tuned_F1)

Final output:

F1 score: base model  tensor([0.8868])
F1 score: fine-tuned model tensor([0.8532])
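(Side note: if plain floats are preferred over one-element tensors in the printout, the returned F1 tensors can be reduced before printing; a minimal tweak using the same variables as above:)

print("F1 score: base model ", base_model_F1.mean().item())
print("F1 score: fine-tuned model", fine_tuned_F1.mean().item())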
jicowan commented Apr 25, 2024

You might want to consider using the Model Evaluation feature in Amazon Bedrock to compare the models rather than computing BERTScore with the score function.
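For reference, a rough sketch of kicking off automated evaluation jobs with the boto3 bedrock client's create_evaluation_job API, one job per model, and comparing the resulting metrics afterwards. The role ARN, S3 URIs, dataset, model identifiers, metric names, and inference parameters below are illustrative placeholders; the exact request structure should be checked against the current Bedrock documentation:

import json
import boto3

bedrock = boto3.client("bedrock")

# Placeholder values -- replace with your own role, bucket, dataset, and model IDs
ROLE_ARN = "arn:aws:iam::111122223333:role/BedrockEvalRole"
OUTPUT_S3 = "s3://my-eval-bucket/results/"
DATASET_S3 = "s3://my-eval-bucket/datasets/summarization.jsonl"

models_to_compare = {
    "base-model-eval": "amazon.titan-text-lite-v1",
    "fine-tuned-model-eval": "arn:aws:bedrock:us-east-1:111122223333:provisioned-model/EXAMPLE",
}

# One automated evaluation job per model; compare the reported metrics from the two jobs
for job_name, model_id in models_to_compare.items():
    response = bedrock.create_evaluation_job(
        jobName=job_name,
        roleArn=ROLE_ARN,
        evaluationConfig={
            "automated": {
                "datasetMetricConfigs": [
                    {
                        "taskType": "Summarization",
                        "dataset": {
                            "name": "CustomSummarizationDataset",
                            "datasetLocation": {"s3Uri": DATASET_S3},
                        },
                        "metricNames": ["Builtin.Accuracy", "Builtin.Robustness"],
                    }
                ]
            }
        },
        inferenceConfig={
            "models": [
                {
                    "bedrockModel": {
                        "modelIdentifier": model_id,
                        # Model-specific inference parameters as a JSON string (illustrative)
                        "inferenceParams": json.dumps({"temperature": 0}),
                    }
                }
            ]
        },
        outputDataConfig={"s3Uri": OUTPUT_S3},
    )
    print(job_name, response["jobArn"])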
