Why is Cosine Similarity Scaled for Zero-Shot Image Classification? #763
Hi all, I have a simple question about the zero-shot image classification example in the README. Why is the cosine similarity multiplied by 100? Is it to reverse the normalization done in the previous steps? The line I am referring to is pasted below. Thank you very much in advance!
`text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)`
Answered by
gabrielilharco
Dec 21, 2023
Answer selected by
rsorbello
Hi @rsorbello. The 100 comes from the learnable logit scaling parameter used in the original paper, which they clip at 100. In general, many models have a `logit_scale` parameter that is learned during training and used to scale the logits.
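To illustrate why the scale matters: cosine similarities lie in [-1, 1], so a softmax over the raw values comes out close to uniform, while multiplying by 100 first makes the distribution much sharper. Below is a minimal sketch in plain Python rather than the README's PyTorch code. The similarity values are made up for illustration, and `softmax` here is a small helper written for this example, not a library function.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical cosine similarities between one image and three text prompts.
# These numbers are invented for illustration; real values depend on the model.
cosine_sims = [0.30, 0.25, 0.20]

# Softmax over the raw similarities: the gaps are tiny, so the result is nearly uniform.
unscaled = softmax(cosine_sims)

# Softmax after multiplying by 100, as in the README line quoted in the question.
scaled = softmax([100.0 * s for s in cosine_sims])

print("unscaled:", [round(p, 4) for p in unscaled])
print("scaled:  ", [round(p, 4) for p in scaled])
```

The ranking of the classes is the same in both cases. Only the confidence of the resulting probabilities changes, so the factor acts like a softmax temperature and does not undo the feature normalization.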