Question about answer ranking #118

dhansmair · 2023-01-31T13:55:59Z

Hi there, I see that in line

Line 198 in b9727e4

log_probs_sum = log_probs.sum(1)

you are using a sum to accumulate the loss for the tokens in the answer sequence. How does this behave if the possible answers have varying lengths? Shouldn't the loss be divided by the sequence length to get the average loss per token? Otherwise, won't the ranking be biased towards shorter sequences?

LiJunnan1992 · 2023-02-02T03:01:42Z

Hi, the sum of log_probs is the log of the product of the token probabilities, which is the log of the sequence probability.
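
Written out, the identity in this reply looks as follows. The notation is an assumption for illustration, not taken from the repository: $x$ is the model input (e.g. image and question), and $a = (a_1, \dots, a_T)$ are the tokens of one candidate answer.

```latex
\sum_{t=1}^{T} \log p(a_t \mid a_{<t}, x)
  = \log \prod_{t=1}^{T} p(a_t \mid a_{<t}, x)
  = \log p(a \mid x)
```

Since every term $\log p(a_t \mid a_{<t}, x)$ is at most zero, adding more tokens can only lower the sum, which is the length effect raised in the question.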

MLAlex1 · 2023-06-28T13:47:06Z

I think @dhansmair makes a good point. I also think the ranking will be biased if we do not divide by the length of each answer sequence.
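
A minimal sketch of the two scoring rules discussed in this thread, using made-up per-token probabilities rather than real model outputs. The candidate answers, the numbers, and the helper names `sum_log_prob` and `mean_log_prob` are all hypothetical; only the idea of summing log-probs comes from the quoted line.

```python
import math

# Hypothetical per-token probabilities for two candidate answers.
# These numbers are invented for illustration, not model outputs.
candidates = {
    "yes": [0.5],                   # one token
    "a red bus": [0.7, 0.7, 0.7],   # three tokens
}


def sum_log_prob(token_probs):
    """Log-probability of the whole sequence (sum over tokens)."""
    return sum(math.log(p) for p in token_probs)


def mean_log_prob(token_probs):
    """Length-normalized score: average log-probability per token."""
    return sum_log_prob(token_probs) / len(token_probs)


# Rank candidates from best to worst under each scoring rule.
rank_by_sum = sorted(
    candidates, key=lambda a: sum_log_prob(candidates[a]), reverse=True
)
rank_by_mean = sorted(
    candidates, key=lambda a: mean_log_prob(candidates[a]), reverse=True
)

print("sum :", rank_by_sum)
print("mean:", rank_by_mean)
```

With these numbers the two rules disagree: the sum prefers the one-token answer, while the per-token mean prefers the three-token answer whose individual tokens are each more likely. The sum is the exact sequence log-probability; the mean is a length-insensitive score. Which one suits answer ranking depends on the goal, and the thread does not settle it.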
