Loss is None while training #26
Comments
We never faced this problem. The factor for random scaling is chosen between 1.0 and 1.4, so it is quite unlikely to pick a batch full of void. Which dataset and batch size do you use for training?
I was using the SUNRGBD dataset with the default parameters (thus, batch_size = 8). It was the very first run I did with the code, so I did not modify anything. I suspected the cropping because the error occurred at a random epoch, so it should not be a problem of a corrupted image or something similar.
I tried training with the SUNRGBD dataset and got the error `Loss is None`. Inspecting the code, it seems like it can only be caused by the loss function in `ESANet/src/utils.py`, and specifically here:

```python
number_of_pixels_per_class = torch.bincount(targets.flatten().type(self.dtype),
                                            minlength=self.num_classes)
divisor_weighted_pixel_sum = torch.sum(number_of_pixels_per_class[1:] * self.weight)  # without void
losses.append(torch.sum(loss_all) / divisor_weighted_pixel_sum)
```
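As a minimal sketch of how that divisor can end up at zero (standalone code with assumed values for `num_classes` and the class weights, not the repository's actual loss class):

```python
# Minimal sketch, not the repository's code: illustrates the failure mode
# when a randomly cropped target contains only void pixels (class 0).
import torch

num_classes = 38                                     # assumption: 37 SUNRGBD classes + void
class_weights = torch.ones(num_classes - 1)          # assumption: weights for the non-void classes

targets = torch.zeros(2, 32, 32, dtype=torch.long)   # crop labelled entirely as void
loss_all = torch.zeros(2, 32, 32)                    # per-pixel loss (void pixels contribute 0)

number_of_pixels_per_class = torch.bincount(targets.flatten(), minlength=num_classes)
divisor_weighted_pixel_sum = torch.sum(number_of_pixels_per_class[1:] * class_weights)  # without void

print(divisor_weighted_pixel_sum)                        # tensor(0.)
print(torch.sum(loss_all) / divisor_weighted_pixel_sum)  # tensor(nan)
```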
My assumption is that `divisor_weighted_pixel_sum` can be 0 with some very 'unlucky' random cropping. The following modification seems to solve the problem:

```python
divisor_weighted_pixel_sum = torch.sum(number_of_pixels_per_class[1:] * self.weight).clamp(min=1e-5)  # without void
```
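As a quick check of the proposed clamp on the same all-void example (again a standalone sketch with assumed values):

```python
# Same all-void example, with the proposed clamp applied (assumed values).
import torch

num_classes = 38
class_weights = torch.ones(num_classes - 1)
targets = torch.zeros(2, 32, 32, dtype=torch.long)   # crop containing only void
loss_all = torch.zeros(2, 32, 32)

counts = torch.bincount(targets.flatten(), minlength=num_classes)
divisor = torch.sum(counts[1:] * class_weights).clamp(min=1e-5)  # never zero
print(torch.sum(loss_all) / divisor)                  # tensor(0.) instead of tensor(nan)
```

The clamp keeps such a batch in the training loop; an alternative would be to skip batches whose targets contain only void pixels.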
Let me know if you ever experienced something similar, or if you have a better fix.