ITM loss #126
My recent work has the same problem. Because of the overlap between texts or images, the model cannot learn to distinguish the negative samples, so the ITC and ITM losses do not converge. Have you solved this problem? Thanks!
Hi, thanks again for this great work!
During the pre-training phase, taking the VG dataset as an example, we have multiple captions corresponding to the same image. It's not clear to me what happens in the ITM loss when the same image appears multiple times in a batch with different captions: one of those captions can then be sampled as a hard negative for the image, even though it is actually a valid description of it, and as implemented it will still receive label 0, i.e., not a match. Could you please explain the reasoning here? Should we somehow prevent the same image from appearing multiple times in a batch to avoid this issue?
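For what it's worth, one way to avoid the issue described above is to build batches so that each image ID appears at most once; then no in-batch caption can be a valid description of another sample's image. Below is a minimal, framework-free sketch of such a batching scheme (the function name and data layout are hypothetical, not from this repo; in practice you would implement the same idea as a custom `Sampler`):

```python
import random
from collections import defaultdict

def unique_image_batches(samples, batch_size, seed=0):
    """Group (image_id, caption) pairs into batches where each image_id
    occurs at most once, so an in-batch hard-negative caption can never
    actually describe the anchor image.

    `samples` is a list of (image_id, caption) tuples; hypothetical layout.
    """
    rng = random.Random(seed)
    # Bucket captions by image so each pass can draw one caption per image.
    by_image = defaultdict(list)
    for img_id, caption in samples:
        by_image[img_id].append(caption)

    batches, current = [], []
    while by_image:
        # Snapshot and shuffle the remaining image IDs for this pass.
        img_ids = list(by_image.keys())
        rng.shuffle(img_ids)
        for img_id in img_ids:
            current.append((img_id, by_image[img_id].pop()))
            if not by_image[img_id]:
                del by_image[img_id]  # all captions for this image consumed
            if len(current) == batch_size:
                batches.append(current)
                current = []
    if current:
        batches.append(current)
    return batches
```

Note that within a single batch the image IDs are guaranteed unique, but duplicate-caption pairs across the whole epoch are still preserved, so no data is dropped.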