CUDA Illegal Memory Access In FastRCNN with ROIAlign #72

Open
tkreiman opened this issue Feb 5, 2021 · 0 comments

tkreiman commented Feb 5, 2021

Line 162 in fast_rcnn.py runs the following code:
roi_align_res = self.roi_align(img_feats['body4'], rois).type(images.dtype)

After this line runs, accessing the bounding boxes tensor fails with a CUDA illegal memory access error, for example when I try to print it. Without the print, the same error occurs later in the code, on the first access to the variable boxes, but I pinpointed that it starts happening once line 162 has run:
Traceback (most recent call last):
 File "vqa/mytrain_end2end.py", line 65, in <module>
   main()
 File "vqa/mytrain_end2end.py", line 57, in main
   rank, model = train_net(args, config)
 File "/home/gabriel/Desktop/Toby/VL-BERT/vqa/../vqa/function/mytrain.py", line 336, in train_net
   gradient_accumulate_steps=config.TRAIN.GRAD_ACCUMULATE_STEPS)
 File "/home/gabriel/Desktop/Toby/VL-BERT/vqa/../common/trainer.py", line 115, in train
   outputs, loss = net(*batch)
 File "/home/gabriel/anaconda3/envs/vl-bert/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
   result = self.forward(*input, **kwargs)
 File "/home/gabriel/Desktop/Toby/VL-BERT/vqa/../common/module.py", line 22, in forward
   return self.train_forward(*inputs, **kwargs)
 File "/home/gabriel/Desktop/Toby/VL-BERT/vqa/../vqa/modules/myresnet_vlbert_for_vqa.py", line 203, in train_forward
   segms=None)
 File "/home/gabriel/anaconda3/envs/vl-bert/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
   result = self.forward(*input, **kwargs)
 File "/home/gabriel/Desktop/Toby/VL-BERT/vqa/../common/fast_rcnn.py", line 163, in forward
   print("3", boxes)
 File "/home/gabriel/anaconda3/envs/vl-bert/lib/python3.6/site-packages/torch/tensor.py", line 179, in __repr__
   return torch._tensor_str._str(self)
 File "/home/gabriel/anaconda3/envs/vl-bert/lib/python3.6/site-packages/torch/_tensor_str.py", line 372, in _str
   return _str_intern(self)
 File "/home/gabriel/anaconda3/envs/vl-bert/lib/python3.6/site-packages/torch/_tensor_str.py", line 352, in _str_intern
   tensor_str = _tensor_str(self, indent)
 File "/home/gabriel/anaconda3/envs/vl-bert/lib/python3.6/site-packages/torch/_tensor_str.py", line 241, in _tensor_str
   formatter = _Formatter(get_summarized_data(self) if summarize else self)
 File "/home/gabriel/anaconda3/envs/vl-bert/lib/python3.6/site-packages/torch/_tensor_str.py", line 89, in __init__
   nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0))
RuntimeError: CUDA error: an illegal memory access was encountered
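
CUDA kernels are launched asynchronously, so the Python line named in a traceback like this one is often just the first place the program synchronizes with the GPU, not the place where the fault happened. That matches the observation that the error only appears once the roi_align call has run. A minimal sketch for getting a more precise traceback, assuming the training script can be edited at its entry point, is to force synchronous kernel launches through the `CUDA_LAUNCH_BLOCKING` environment variable:

```python
import os

# Debugging aid only: forces every CUDA kernel launch to complete before
# Python continues, so the error is raised at the call that caused it.
# It must be set before the first CUDA call; the simplest place is the very
# top of the entry script, before torch is imported. Training will run slower.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```

The same effect can be had by setting the variable in the shell when starting the script.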

My environment is as follows:

torchvision 0.8.2
pytorch 1.7.1
cudatoolkit 11.0.221

Is there a different ROIAlign implementation I could use instead, or another way to work around this issue? Thanks for the help.
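
One thing worth ruling out before swapping implementations is malformed input to ROIAlign, since an out-of-range batch index or a NaN coordinate in rois can make a GPU kernel read outside the feature tensor. The following is a minimal sketch, not a confirmed diagnosis. It assumes rois uses the (K, 5) layout [batch_idx, x1, y1, x2, y2] that `torchvision.ops.roi_align` expects; the helper name `find_bad_rois` is hypothetical and works on plain Python lists so that it can run on the CPU before the GPU call.

```python
import math


def find_bad_rois(rois, batch_size):
    """Return (row_index, reason) pairs for ROI rows that look unsafe.

    rois: list of rows [batch_idx, x1, y1, x2, y2] (assumed layout).
    batch_size: number of images in the feature tensor passed to ROIAlign.
    """
    problems = []
    for i, row in enumerate(rois):
        if len(row) != 5:
            problems.append((i, "expected 5 values"))
            continue
        if not all(math.isfinite(v) for v in row):
            problems.append((i, "non-finite value"))
            continue
        batch_idx = row[0]
        # The batch index must be a whole number that selects an existing image.
        if batch_idx != int(batch_idx) or not 0 <= batch_idx < batch_size:
            problems.append((i, "batch index out of range"))
    return problems


# Example with made-up values: rows 1 and 3 point at images that do not
# exist in a batch of 2, and row 2 contains a NaN coordinate.
example = [
    [0, 1.0, 2.0, 3.0, 4.0],
    [2, 0.0, 0.0, 1.0, 1.0],
    [0, float("nan"), 0.0, 1.0, 1.0],
    [-1, 0.0, 0.0, 1.0, 1.0],
]
bad = find_bad_rois(example, batch_size=2)
```

In the setting described above, this could be called just before line 162 with `rois.detach().cpu().tolist()` and `img_feats['body4'].shape[0]` as the batch size. An empty result would suggest the input is well formed and the problem lies elsewhere, for example in a ROIAlign extension built against a different CUDA or PyTorch version.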
