Loss is None while training #26
Comments
We never faced this problem. The factor for random scaling is chosen between 1.0 and 1.4, so it is quite unlikely to pick a batch full of void. Which dataset and batch size do you use for training?
I was using the SUNRGBD dataset with the default parameters (thus, batch_size = 8). It was the very first run I did with the code, so I did not modify anything. I suspected the cropping because the error occurred at a random epoch, so it should not be a problem of a corrupted image or something similar.
I tried training with the SUNRGBD dataset and got the error `Loss is None`. Inspecting the code, it seems like it can only be caused by the loss function in `ESANet/src/utils.py`, and specifically here:

```python
number_of_pixels_per_class = torch.bincount(targets.flatten().type(self.dtype),
                                            minlength=self.num_classes)
divisor_weighted_pixel_sum = torch.sum(number_of_pixels_per_class[1:] * self.weight)  # without void
losses.append(torch.sum(loss_all) / divisor_weighted_pixel_sum)
```
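As a minimal sketch of how that divisor can end up at zero (standalone code with assumed values for `num_classes` and the class weights, not the repository's actual loss class):

```python
# Minimal sketch, not the repository's code: illustrates the failure mode
# when a randomly cropped target contains only void pixels (class 0).
import torch

num_classes = 38                                     # assumption: 37 SUNRGBD classes + void
class_weights = torch.ones(num_classes - 1)          # assumption: weights for the non-void classes

targets = torch.zeros(2, 32, 32, dtype=torch.long)   # crop labelled entirely as void
loss_all = torch.zeros(2, 32, 32)                    # per-pixel loss (void pixels contribute 0)

number_of_pixels_per_class = torch.bincount(targets.flatten(), minlength=num_classes)
divisor_weighted_pixel_sum = torch.sum(number_of_pixels_per_class[1:] * class_weights)  # without void

print(divisor_weighted_pixel_sum)                        # tensor(0.)
print(torch.sum(loss_all) / divisor_weighted_pixel_sum)  # tensor(nan)
```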
My assumption is that `divisor_weighted_pixel_sum` can be 0 with some very 'unlucky' random cropping. The following modification seems to solve the problem:

```python
divisor_weighted_pixel_sum = torch.sum(number_of_pixels_per_class[1:] * self.weight).clamp(min=1e-5)  # without void
```
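As a quick check of the proposed clamp on the same all-void example (again a standalone sketch with assumed values):

```python
# Same all-void example, with the proposed clamp applied (assumed values).
import torch

num_classes = 38
class_weights = torch.ones(num_classes - 1)
targets = torch.zeros(2, 32, 32, dtype=torch.long)   # crop containing only void
loss_all = torch.zeros(2, 32, 32)

counts = torch.bincount(targets.flatten(), minlength=num_classes)
divisor = torch.sum(counts[1:] * class_weights).clamp(min=1e-5)  # never zero
print(torch.sum(loss_all) / divisor)                  # tensor(0.) instead of tensor(nan)
```

The clamp keeps such a batch in the training loop; an alternative would be to skip batches whose targets contain only void pixels.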
Let me know if you ever experienced something similar, or if you have a better fix.