
an inplace operation #22

Open
zhl98 opened this issue Sep 19, 2023 · 5 comments
zhl98 commented Sep 19, 2023

Hello, I encountered this problem during training. Do you know where the problem is? The dimension of the torch.cuda.LongTensor is [1, 25].

[screenshot of the error traceback]
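For anyone else hitting this message: the error is PyTorch's autograd version check, and the class of failure can be reproduced outside TubeDETR in a few lines. A minimal, repo-independent sketch (none of this is TubeDETR code):

import torch

x = torch.randn(3, requires_grad=True)
h = x * 2           # intermediate tensor
y = h * x           # mul saves h, since it is needed for x's gradient
h.add_(1.0)         # in-place update bumps h's version counter
y.sum().backward()  # RuntimeError: ... has been modified by an inplace operation:
                    # [torch.FloatTensor [3]] is at version 1; expected version 0

In this issue the saved tensor that changes is a torch.cuda.LongTensor (token indices fed to an embedding lookup), but the mechanism is the same: something writes into a tensor, in place, after it has been saved for the backward pass.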

@antoyang (Owner)

Hi, can you give more context on the issue so that I can help you?

@SkylerSuen

> Hello, I encountered this problem during training. Do you know where the problem is? The dimension of the torch.cuda.LongTensor is [1, 25].

Hi, I have the same problem, did you figure out how to fix this?

@Infinitywxh

I encountered a similar error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.LongTensor [1, 13]] is at version 3; expected version 2 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

I set torch.autograd.set_detect_anomaly(True) and got the following output:


/scratch/miniconda3/lib/python3.8/site-packages/torch/autograd/__init__.py:200: UserWarning: Error detected in EmbeddingBackward0. Traceback of forward call that caused the error:
  File "main_new.py", line 651, in <module>
    main(args)
  File "main_new.py", line 591, in main
    train_stats = train_one_epoch(
  File "/scratch/TubeDETR-main/engine.py", line 67, in train_one_epoch
    memory_cache = model(
  File "/scratch/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/scratch/miniconda3/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1156, in forward
    output = self._run_ddp_forward(*inputs, **kwargs)
  File "/scratch/miniconda3/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1110, in _run_ddp_forward
    return module_to_run(*inputs[0], **kwargs[0])  # type: ignore[index]
  File "/scratch/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/scratch/TubeDETR-main/models/tubedetr.py", line 190, in forward
    memory_cache = self.transformer(
  File "/scratch/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/scratch/TubeDETR-main/models/transformer.py", line 256, in forward
    encoded_text = self.text_encoder(**tokenized)
  File "/scratch/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/scratch/miniconda3/lib/python3.8/site-packages/transformers/models/roberta/modeling_roberta.py", line 828, in forward
    embedding_output = self.embeddings(
  File "/scratch/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/scratch/miniconda3/lib/python3.8/site-packages/transformers/models/roberta/modeling_roberta.py", line 126, in forward
    token_type_embeddings = self.token_type_embeddings(token_type_ids)
  File "/scratch/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/scratch/miniconda3/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 162, in forward
    return F.embedding(
  File "/scratch/miniconda3/lib/python3.8/site-packages/torch/nn/functional.py", line 2210, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
 (Triggered internally at ../torch/csrc/autograd/python_anomaly_mode.cpp:114.)
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
Traceback (most recent call last):
  File "main_new.py", line 651, in <module>
    main(args)
  File "main_new.py", line 591, in main
    train_stats = train_one_epoch(
  File "/scratch/TubeDETR-main/engine.py", line 148, in train_one_epoch
    losses.backward()
  File "/scratch/miniconda3/lib/python3.8/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/scratch/miniconda3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.LongTensor [1, 13]] is at version 3; expected version 2 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
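If I read the anomaly trace correctly, the tensor being modified is the token_type_ids passed to self.token_type_embeddings(...) inside the RoBERTa embeddings, which matches the [1, 13] LongTensor in the error (embedding indices are saved for the backward pass). The "is at version 3; expected version 2" wording refers to PyTorch's per-tensor version counter, which every in-place write increments. A tiny illustration using the (internal) _version attribute:

import torch

ids = torch.zeros(1, 13, dtype=torch.long)
print(ids._version)  # 0
ids += 1             # any in-place write bumps the counter
print(ids._version)  # 1

So something is writing into that tensor, in place, between the forward pass that saved it and the backward pass; see the DDP-related workaround below.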

@furkancoskun

I encountered the same problem. Has anyone solved it?

@furkancoskun

I solved the problem by adding broadcast_buffers=False to torch.nn.parallel.DistributedDataParallel.

Change main.py line 373 as follows:

        model = torch.nn.parallel.DistributedDataParallel(
            model, device_ids=[args.gpu], find_unused_parameters=True, broadcast_buffers=False,
        )
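For context (my understanding of why this helps, not something verified against this repo): with the default broadcast_buffers=True, DistributedDataParallel re-broadcasts every registered buffer from rank 0 at each forward call, and that broadcast writes into the buffers in place. If one of those buffers (such as the token_type_ids buffer in the RoBERTa embeddings) was saved for a pending backward pass, its version counter no longer matches and exactly this error appears. Setting broadcast_buffers=False skips the broadcast; the trade-off is that buffers such as BatchNorm running statistics are no longer kept in sync across processes.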
