
Question about the loss calculation method when there are multiple GPUs #101

Open
YinAoXiong opened this issue Mar 3, 2023 · 1 comment


@YinAoXiong

```python
if self.training:
    # all_gather the features from every GPU, so each rank holds the global batch
    visual_output = allgather(visual_output, self.task_config)
    video_mask = allgather(video_mask, self.task_config)
    sequence_output = allgather(sequence_output, self.task_config)
    torch.distributed.barrier()

# normalize, mean-pool over frames, then re-normalize the video features
visual_output = visual_output / visual_output.norm(dim=-1, keepdim=True)
visual_output = self._mean_pooling_for_similarity_visual(visual_output, video_mask)
visual_output = visual_output / visual_output.norm(dim=-1, keepdim=True)

# normalize the text features
sequence_output = sequence_output.squeeze(1)
sequence_output = sequence_output / sequence_output.norm(dim=-1, keepdim=True)

# global (world_batch x world_batch) similarity matrix, computed on every GPU
logit_scale = self.clip.logit_scale.exp()
retrieve_logits = logit_scale * torch.matmul(sequence_output, visual_output.t())
```

The current code seems to compute the loss on the full global similarity matrix on every GPU. Computing the loss only between each GPU's local features and the gathered global features, as described in openai/CLIP#132, seems to be more compute- and memory-efficient (see the sketch below).
Sorry to bother you if I have misunderstood the code.
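
For reference, here is a minimal sketch of the local-loss variant discussed in openai/CLIP#132. It is not the repository's code: the helper names `gather_no_grad` and `local_contrastive_loss` are hypothetical, and it assumes equal per-rank batch sizes and already L2-normalized text/video features.

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F

def gather_no_grad(t):
    # Plain all_gather: the returned tensors do not carry gradients.
    gathered = [torch.zeros_like(t) for _ in range(dist.get_world_size())]
    dist.all_gather(gathered, t)
    return gathered

def local_contrastive_loss(text_feat, video_feat, logit_scale):
    # text_feat / video_feat: normalized features of this rank's *local* batch.
    # Each rank builds only a (local_batch x global_batch) logit matrix instead
    # of the full (global x global) matrix on every GPU.
    rank = dist.get_rank()
    local_bs = text_feat.size(0)

    all_text = gather_no_grad(text_feat)
    all_video = gather_no_grad(video_feat)
    # Re-insert the local tensors so their gradients stay in the autograd graph.
    all_text[rank] = text_feat
    all_video[rank] = video_feat
    all_text = torch.cat(all_text, dim=0)
    all_video = torch.cat(all_video, dim=0)

    # local rows vs. global columns
    logits_t2v = logit_scale * text_feat @ all_video.t()
    logits_v2t = logit_scale * video_feat @ all_text.t()

    # The positive pair for local sample i sits at column rank * local_bs + i.
    labels = torch.arange(local_bs, device=text_feat.device) + rank * local_bs
    return (F.cross_entropy(logits_t2v, labels) + F.cross_entropy(logits_v2t, labels)) / 2
```

Because gradients flow only through the local copies, DDP's gradient averaging across ranks recovers the synchronized update without ever materializing the global x global matrix on a single GPU.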


zsnoob commented Nov 23, 2023

My idea is the same as yours. After debugging, I found that during a training epoch all GPUs compute the same global loss from the same sim_matrix, instead of each computing a local loss and then gathering and averaging them, so there is clear redundant computation. I also noticed that in the function "train_epoch" there is a loss.mean() call after model.forward() that seems to do nothing. We only need to compute the local loss following openai/CLIP#132 and call loss.backward(); gradient synchronization is handled automatically by DDP (see the sketch below).
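
As an illustration (not the repository's `train_epoch`), a hypothetical DDP training step, assuming the model is wrapped in `torch.nn.parallel.DistributedDataParallel` and returns a scalar local loss for this rank's sub-batch:

```python
def train_step(model, batch, optimizer):
    # Hypothetical DDP training step, not the repository's train_epoch.
    optimizer.zero_grad()
    loss = model(*batch)   # assumed to already be a scalar, so an extra loss.mean() would be a no-op
    loss.backward()        # DDP all-reduces (averages) gradients across ranks here
    optimizer.step()
    return loss.item()
```

Since DistributedDataParallel averages gradients during `backward()`, no extra reduction of the loss value on each rank is needed.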
