RuntimeError: expected scalar type Half but found Float #2

Closed
Jessejx opened this issue Apr 6, 2021 · 7 comments
Labels
bug Something isn't working

Comments

@Jessejx

Jessejx commented Apr 6, 2021

Thanks for your great work. I met this problem when I use resnet as backbone.

Traceback (most recent call last):
  File "train.py", line 527, in <module>
    train(hyp, opt, device, tb_writer, wandb)
  File "train.py", line 293, in train
    pred = model(imgs) # forward
  File "/home/jx/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/jx/Workings/flexible-yolov5-main/od/models/model.py", line 67, in forward
    out = self.backbone(x)
  File "/home/jx/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/jx/Workings/flexible-yolov5-main/od/models/backbone/resnet.py", line 208, in forward
    x2 = self.layer2(x1)
  File "/home/jx/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/jx/anaconda3/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/home/jx/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/jx/Workings/flexible-yolov5-main/od/models/backbone/resnet.py", line 119, in forward
    out = self.conv2(out, offset)
  File "/home/jx/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/jx/anaconda3/lib/python3.8/site-packages/torchvision/ops/deform_conv.py", line 143, in forward
    return deform_conv2d(input, offset, self.weight, self.bias, stride=self.stride,
  File "/home/jx/anaconda3/lib/python3.8/site-packages/torchvision/ops/deform_conv.py", line 76, in deform_conv2d
    return torch.ops.torchvision.deform_conv2d(
RuntimeError: expected scalar type Half but found Float

@Bobo-y
Owner

Bobo-y commented Apr 7, 2021

@Jessejx Hi, this problem is caused by using ResNet with DCN (deformable convolution). I have updated the configuration and code to set dcn to False, so you can use ResNet without DCN. Please update your code.
I am also trying to solve this problem so that the network can be trained with DCN:

if __name__ == '__main__':
    import torch
    args = {'version': '50', 'dcn': True}
    x = torch.zeros(2, 3, 640, 640).cuda()
    net = resnet(pretrained=False, **args).cuda()
    y = net(x)
    for u in y:
        print(u.shape)
    print(net.out_channels)

When I test only the ResNet backbone this way, there is no error, but when I train the whole network, I get the same error as you. I have also checked the weights:

Weights of DeformConv2d: [image]

Weights of normal Conv2d: [image]

Both are float32, yet the error indicates that the former (the DeformConv2d weights) is expected to be in half precision. I don't know why yet. So please don't use DCN for now; I will fix it when I have time. Thanks!

@Bobo-y Bobo-y closed this as completed Apr 7, 2021
@Bobo-y
Owner

Bobo-y commented Apr 12, 2021

I think this is caused by apex mixed-precision training, which DCN does not support well. I trained ResNet with DCN for DBNet and there was no problem. @Jessejx

@Jessejx
Author

Jessejx commented Apr 15, 2021

> I think this is caused by apex mixed-precision training, which DCN does not support well. I trained ResNet with DCN for DBNet and there was no problem. @Jessejx

Thanks again.

@Bobo-y
Owner

Bobo-y commented Dec 24, 2021

ultralytics/yolov5#2981

@ratom

ratom commented Jul 17, 2022

I tried training today and this issue is still not fixed. Setting half=False did not help either. The error occurs during training, not during evaluation.
[image]

@ratom

ratom commented Jul 17, 2022

Moreover, after training completes, the model outputs only the bounding boxes, without the corresponding class names of the detected objects.
[image]

@Bobo-y
Owner

Bobo-y commented Jul 17, 2022

@ratom Please give me more information about your configs; I will check it next week.
