CUDA Illegal Memory Access In FastRCNN with ROIAlign #72

tkreiman · 2021-02-05T00:21:23Z

After line 162 in fast_rcnn.py runs, I get a CUDA memory error. Line 162 is:
roi_align_res = self.roi_align(img_feats['body4'], rois).type(images.dtype)

The error is raised when I try to print the bounding boxes tensor (boxes) right after that line. Without the print, the same error occurs later in the code, on the first access to the variable boxes, but I have pinpointed that it only starts happening once line 162 has run. The traceback is:
Traceback (most recent call last):
 File "vqa/mytrain_end2end.py", line 65, in <module>
   main()
 File "vqa/mytrain_end2end.py", line 57, in main
   rank, model = train_net(args, config)
 File "/home/gabriel/Desktop/Toby/VL-BERT/vqa/../vqa/function/mytrain.py", line 336, in train_net
   gradient_accumulate_steps=config.TRAIN.GRAD_ACCUMULATE_STEPS)
 File "/home/gabriel/Desktop/Toby/VL-BERT/vqa/../common/trainer.py", line 115, in train
   outputs, loss = net(*batch)
 File "/home/gabriel/anaconda3/envs/vl-bert/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
   result = self.forward(*input, **kwargs)
 File "/home/gabriel/Desktop/Toby/VL-BERT/vqa/../common/module.py", line 22, in forward
   return self.train_forward(*inputs, **kwargs)
 File "/home/gabriel/Desktop/Toby/VL-BERT/vqa/../vqa/modules/myresnet_vlbert_for_vqa.py", line 203, in train_forward
   segms=None)
 File "/home/gabriel/anaconda3/envs/vl-bert/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
   result = self.forward(*input, **kwargs)
 File "/home/gabriel/Desktop/Toby/VL-BERT/vqa/../common/fast_rcnn.py", line 163, in forward
   print("3", boxes)
 File "/home/gabriel/anaconda3/envs/vl-bert/lib/python3.6/site-packages/torch/tensor.py", line 179, in __repr__
   return torch._tensor_str._str(self)
 File "/home/gabriel/anaconda3/envs/vl-bert/lib/python3.6/site-packages/torch/_tensor_str.py", line 372, in _str
   return _str_intern(self)
 File "/home/gabriel/anaconda3/envs/vl-bert/lib/python3.6/site-packages/torch/_tensor_str.py", line 352, in _str_intern
   tensor_str = _tensor_str(self, indent)
 File "/home/gabriel/anaconda3/envs/vl-bert/lib/python3.6/site-packages/torch/_tensor_str.py", line 241, in _tensor_str
   formatter = _Formatter(get_summarized_data(self) if summarize else self)
 File "/home/gabriel/anaconda3/envs/vl-bert/lib/python3.6/site-packages/torch/_tensor_str.py", line 89, in __init__
   nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0))
RuntimeError: CUDA error: an illegal memory access was encountered

My environment is as follows:

torchvision 0.8.2
pytorch 1.7.1
cudatoolkit 11.0.221

Is there a different ROIAlign implementation I could use instead, or some other way to work around this issue? Thanks for the help.
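
One check that might narrow this down is to validate rois just before the roi_align call. The sketch below is only an illustration: the helper name check_rois is hypothetical, and it assumes rois has the K x 5 layout (batch_index, x1, y1, x2, y2) that torchvision.ops.roi_align accepts for a single tensor of boxes. Whether self.roi_align in this repository uses the same layout is an assumption. A batch index outside the feature batch is one possible cause of an out-of-bounds read, but it is not confirmed to be the cause here.

```python
import math


def check_rois(rois, batch_size):
    """Return a list of problems found in `rois` (hypothetical helper).

    rois: iterable of rows (batch_index, x1, y1, x2, y2), for example
          the result of rois.detach().cpu().tolist().
    batch_size: size of the first dimension of the feature map.
    """
    problems = []
    for i, row in enumerate(rois):
        # Assumed layout: one batch index followed by four coordinates.
        if len(row) != 5:
            problems.append(f"row {i}: expected 5 values, got {len(row)}")
            continue
        # NaN or inf anywhere in the row makes the other checks meaningless.
        if not all(math.isfinite(v) for v in row):
            problems.append(f"row {i}: non-finite value in {row}")
            continue
        idx, x1, y1, x2, y2 = row
        # The batch index must be an integer in [0, batch_size).
        if idx != int(idx) or not 0 <= int(idx) < batch_size:
            problems.append(
                f"row {i}: batch index {idx} outside [0, {batch_size})"
            )
        # A box whose far corner precedes its near corner is malformed.
        if x2 < x1 or y2 < y1:
            problems.append(f"row {i}: inverted box {(x1, y1, x2, y2)}")
    return problems
```

It could be called before line 162 as check_rois(rois.detach().cpu().tolist(), batch_size=img_feats['body4'].shape[0]), printing whatever it returns. Note also that CUDA kernels run asynchronously, so the error is reported at a later synchronization point (here the print) rather than at the kernel that caused it. Running with the environment variable CUDA_LAUNCH_BLOCKING=1 makes the failing call report the error itself.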
