Joint loss in pretraining #21

zhangliang-04 · 2021-09-15T02:22:36Z

Hi,
We found that the video-text joint loss in pretraining is calculated from the masked video and text. Why not use the original video and text, as in retrieval fine-tuning?

UniVL/modules/modeling.py

Line 258 in 0a7c07f

 sim_matrix_text_visual = self.get_similarity_logits(sequence_output_alm, visual_output_alm, 

ArrowLuo · 2021-09-15T08:43:13Z

Hi @zhangliang-04, we use the masked sequences for consistency with the other losses. A more elaborate design for the retrieval task may benefit from a non-masked version; however, we have not tested it. It might improve performance further.
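
To make the two options in this thread concrete, here is a minimal, dependency-free sketch of a symmetric contrastive loss over a text-video similarity matrix. It is not UniVL's code: `similarity_matrix` and `contrastive_loss` are hypothetical helpers, the dot-product similarity is an assumption, and the toy embeddings are invented stand-ins for encoder outputs. The only point is that the same loss can be fed either the masked or the original (non-masked) representations.

```python
import math


def similarity_matrix(text_embs, video_embs):
    """Pairwise dot products: rows index texts, columns index videos."""
    return [[sum(a * b for a, b in zip(t, v)) for v in video_embs]
            for t in text_embs]


def log_softmax(row):
    """Numerically stable log-softmax of one row."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def contrastive_loss(sim):
    """Symmetric cross-entropy; matching pairs sit on the diagonal."""
    n = len(sim)
    # Text-to-video direction: each row should pick its own column.
    t2v = -sum(log_softmax(sim[i])[i] for i in range(n)) / n
    # Video-to-text direction: same thing on the transposed matrix.
    sim_t = [list(col) for col in zip(*sim)]
    v2t = -sum(log_softmax(sim_t[i])[i] for i in range(n)) / n
    return t2v + v2t


# Toy stand-ins for encoder outputs (illustrative values only).
text_orig = [[1.0, 0.0], [0.0, 1.0]]
video_orig = [[1.0, 0.0], [0.0, 1.0]]
text_masked = [[0.6, 0.4], [0.4, 0.6]]    # hypothetical outputs for masked text
video_masked = [[0.7, 0.3], [0.3, 0.7]]   # hypothetical outputs for masked video

# Option discussed in the reply: joint loss on the masked sequences.
loss_masked = contrastive_loss(similarity_matrix(text_masked, video_masked))
# Option raised in the question: joint loss on the original sequences.
loss_orig = contrastive_loss(similarity_matrix(text_orig, video_orig))
```

Switching between the two variants only changes which embeddings are passed to `similarity_matrix`. Whether the non-masked variant helps retrieval is untested according to the reply above, so the toy numbers here should not be read as evidence either way.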
