
How to evaluate BLEU score on LM1B? #6

Open
jzhang38 opened this issue Dec 15, 2022 · 6 comments

Comments

@jzhang38

Dear authors,

I understand that you plan to release your code in January, but could you share more details on how you evaluate the BLEU score and PPL on the LM1B dataset? I am also working on diffusion models for text and may cite your paper. Thanks!

@Hzfinfdu
Owner

Hi,

We computed BLEU using the entire test set as references and reported the BLEU score averaged over the generated sentences. We sampled 1K sentences each for evaluating BLEU and Self-BLEU (S-BLEU).
For PPL, the ELBO on the test set is an upper bound on the token-wise NLL, so we first convert this bound to a per-word NLL and then exponentiate it to obtain the per-word PPL.
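A minimal sketch of this BLEU/Self-BLEU protocol with NLTK, in case it helps. Here `generated` and `test_refs` are hypothetical names for the 1K samples and the test sentences as token lists, and the tokenization and smoothing choices are illustrative assumptions, not necessarily what we used:

```python
# Illustrative sketch only: `generated` is the list of 1K sampled sentences and
# `test_refs` is the LM1B test set, both as lists of token lists. The exact
# tokenization and smoothing used in the paper may differ.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

smooth = SmoothingFunction().method1

def avg_bleu(generated, test_refs):
    """Score each generated sentence against ALL test references, then average."""
    scores = [sentence_bleu(test_refs, hyp, smoothing_function=smooth)
              for hyp in generated]
    return sum(scores) / len(scores)

def self_bleu(generated):
    """Score each sample against the other samples (O(n^2), slow for large n)."""
    scores = [sentence_bleu(generated[:i] + generated[i + 1:], hyp,
                            smoothing_function=smooth)
              for i, hyp in enumerate(generated)]
    return sum(scores) / len(scores)
```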
Hope this helps!

@yujianll

yujianll commented Jan 5, 2023

@Hzfinfdu Thanks for the great work!
I have a follow-up question. When you say per-word NLL, do you mean calculating $\mathcal{L}_{vlb}$ in Eq. 3 for each token? Do you sum the NLL over all tokens in the sequence and use that as the NLL for the sequence?
Also, I noticed that in Fig. 4 the validation ELBO is around 110 after training, while the test set PPL is around 60~70. I wonder why these two values differ so much.

@Hzfinfdu
Owner

Hzfinfdu commented Jan 6, 2023

@yujianll Hi,

  1. Yes, we sum the NLL over all tokens in the sequence to get the sequence-level NLL.
  2. The validation ELBO is around 110, and the average number of words per sequence in the test set is around 26, so the per-word NLL is around 110 / 26 ≈ 4.23. The test PPL is exp(4.23) ≈ 68.7, which lands in the 60~70 range.
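A quick numeric check of the arithmetic in point 2:

```python
import math

elbo_per_seq = 110.0   # validation ELBO per sequence (upper bound on NLL, in nats)
words_per_seq = 26.0   # average number of words per test sequence

nll_per_word = elbo_per_seq / words_per_seq  # ≈ 4.23
ppl = math.exp(nll_per_word)                 # ≈ 68.7, i.e. in the 60~70 range
```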

@yujianll

yujianll commented Jan 6, 2023

@Hzfinfdu Thanks for the reply!
I have another low-level question. When you calculate the NLL on the test set, do you sum over all T diffusion steps, or do you sample a subset of timesteps? If you sample, how many timesteps do you use?

@Hzfinfdu
Owner

Hzfinfdu commented Jan 6, 2023

@yujianll Hi,

We trained DiffusionBERT with 512 diffusion steps and used DDIM sampling to uniformly subsample 128 steps on the test set, for both the NLL calculation and generation.
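In case it's useful, a minimal sketch of what uniform 512→128 respacing could look like. The `linspace` indexing here is an assumption for illustration, not necessarily the repo's exact scheme:

```python
import numpy as np

T_TRAIN, T_EVAL = 512, 128  # training steps vs. DDIM evaluation steps

# Uniformly respace the 512 training timesteps down to 128 evaluation
# timesteps (a stride of roughly 4); the same subsequence would be used
# for both the NLL computation and generation.
eval_timesteps = np.linspace(0, T_TRAIN - 1, T_EVAL).round().astype(int)
assert len(set(eval_timesteps.tolist())) == T_EVAL  # no duplicate steps
```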

Hope this helps!

@yujianll

yujianll commented Jan 6, 2023

Thanks, this helps a lot!
