
Get the error: AttributeError: Can't pickle local object 'convert_frame.<locals>._convert_frame' #93470

Open
fladventurerob opened this issue Dec 7, 2022 · 20 comments
Assignees
Labels
bug dynamo-must-fix These bugs affect TorchDynamo reliability. high priority module: dynamo module: startup-tracing-compile Compilation mechanism or time spent in (re)compilation, tracing, startup oncall: pt2 triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@fladventurerob

fladventurerob commented Dec 7, 2022

🐛 Describe the bug

When I add the line model = torch.compile(model) after loading the model, this error occurs. When I remove the line, the script works as intended.

Error logs

File "/opt/anaconda3/envs/ml1/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/opt/anaconda3/envs/ml1/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/opt/anaconda3/envs/ml1/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/opt/anaconda3/envs/ml1/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in init
super().init(process_obj)
File "/opt/anaconda3/envs/ml1/lib/python3.8/multiprocessing/popen_fork.py", line 19, in init
self._launch(process_obj)
File "/opt/anaconda3/envs/ml1/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
reduction.dump(process_obj, fp)
File "/opt/anaconda3/envs/ml1/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'convert_frame.._convert_frame'
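The traceback suggests the compiled model is being pickled while a worker process is spawned; a guess at the kind of setup that would produce it (the checkpoint path, worker, and input shape are placeholders, not the actual script):

import torch
import torch.multiprocessing as mp

def worker(model):
    model(torch.randn(1, 3, 224, 224))

if __name__ == "__main__":
    model = torch.compile(torch.load("model.pt"))  # "model.pt" is a placeholder checkpoint
    ctx = mp.get_context("spawn")
    p = ctx.Process(target=worker, args=(model,))
    p.start()  # spawn pickles the Process object, and with it the compiled model -> AttributeError
    p.join()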

Minified repro

No response

cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @wconstab @bdhirsh @anijain2305 @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @aakhundov @Xia-Weiwen @ipiszy @soumith @ngimel

@voznesenskym
Collaborator

voznesenskym commented Dec 7, 2022

torch.compile does not emit a model in the way you expect it to, and I think maybe some new documentation has led you astray.

If you intend to pickle the exported model, give export a try. See the export section of https://pytorch.org/get-started/pytorch-2.0/
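For reference, a minimal sketch of that export path using the PT 2.0-era torch._dynamo.export API (the model and input are stand-ins, and the exact signature may differ in newer releases):

import torch
import torch._dynamo as dynamo

model = torch.nn.Linear(4, 2)
example_input = torch.randn(1, 4)

# export() captures a standalone FX GraphModule plus guards; the GraphModule
# can be serialized, unlike the OptimizedModule returned by torch.compile.
graph_module, guards = dynamo.export(model, example_input)
torch.save(graph_module, "exported_model.pt")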

@wconstab
Contributor

wconstab commented Dec 7, 2022

it sounds like @fladventurerob is trying to compile the model after loading it, not before exporting it.

however, @fladventurerob it would be helpful if you can provide a runnable script for us to look at, rather than just a description.

@soumith
Member

soumith commented Dec 7, 2022

it might be because of multiprocess compile.
@fladventurerob can you give this a try:

  1. At the top of your script, after you import torch, add these two lines:
     from torch._inductor import config
     config.compile_threads = 1
  2. Then, add back the torch.compile call (see the combined sketch below).
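Putting the two steps together (the model-loading line is a placeholder for however the original script loads its model):

import torch
from torch._inductor import config

config.compile_threads = 1            # disable the multiprocess compile pool

model = torch.load("model.pt")        # placeholder for however the model is loaded
model = torch.compile(model)          # the line that previously triggered the error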

Does the error reproduce?

@fladventurerob
Author

it sounds like @fladventurerob is trying to compile the model after loading it, not before exporting it.

however, @fladventurerob it would be helpful if you can provide a runnable script for us to look at, rather than just a description.

You are correct. The model exists already; I was loading it into a forward test script. Based on the documentation, I assumed this line was needed to use an existing model, rather than to export the model.

@wconstab
Contributor

wconstab commented Dec 9, 2022

@fladventurerob any update on whether compile_threads=1 helps, or are you able to provide a repro script for us?

@ConnorBaker
Contributor

I had the same issue -- setting compile_threads=1 fixed it for me!

@malfet malfet transferred this issue from pytorch/torchdynamo Feb 1, 2023
@ConnorBaker
Contributor

For what it's worth, I'm using Triton at HEAD (built today) and this isn't an issue I run into any more; I no longer need to specify compile_threads=1.

I haven't followed the changes made very closely recently, but I did see this merged today (though perhaps unrelated): triton-lang/triton#1133.

@ezyang ezyang closed this as completed Feb 2, 2023
@VasLem

VasLem commented Jun 2, 2023

I would not close this issue. It appears at random when you go to torch.save a model that took a long time to train, which means nothing gets saved. Could you update the __getstate__ of that model internally, so that any multiprocessing-related pickling issues won't occur? I consider this a bug in torch.compile().
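In the meantime, one workaround sketch (assuming the goal is just checkpointing weights) is to save the underlying eager module, which torch.compile keeps on the wrapper as _orig_mod:

import torch

model = torch.compile(torch.nn.Linear(4, 2))   # stand-in for the real, expensive-to-train model
# ... long training run ...

# torch.compile returns an OptimizedModule that keeps the original nn.Module as
# `_orig_mod`; saving that module's state_dict sidesteps pickling the compiled wrapper.
torch.save(model._orig_mod.state_dict(), "checkpoint.pt")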

@ezyang ezyang reopened this Jun 2, 2023
@jansel
Contributor

jansel commented Jun 4, 2023

I believe this one will be fixed by #101651 when it lands

@jansel jansel added triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module module: dynamo labels Jun 4, 2023
@kxzxvbk

kxzxvbk commented Jul 22, 2023

I ran into the same bug; here is the simplest test case I can offer:

import torch
from torch import nn


class CNNModel(nn.Module):

    def __init__(
        self,
        class_number=10,
        input_channel=3,
        dropout=0.1,
        kernel_sizes=[5, 3, 3],
        paddings=[2, 1, 1],
        hidden_dims=[32, 32, 32]
    ):
        super(CNNModel, self).__init__()

        self.layers = []
        self.layers.append(nn.Conv2d(input_channel, hidden_dims[0], kernel_size=kernel_sizes[0], padding=paddings[0]))
        self.layers.append(nn.ReLU())
        for i in range(len(hidden_dims) - 1):
            self.layers.append(
                nn.Conv2d(hidden_dims[i], hidden_dims[i + 1], kernel_size=kernel_sizes[i + 1], padding=paddings[i + 1])
            )
            self.layers.append(nn.ReLU())
        self.layers.append(nn.Dropout(p=dropout))
        self.layers = nn.Sequential(*self.layers)

        self.glp = nn.AdaptiveAvgPool2d((1, 1))

        self.fc = nn.Sequential(nn.Flatten(), nn.Linear(hidden_dims[-1], class_number))

    def forward(self, x):
        x = self.layers(x)
        x = self.glp(x)
        x = self.fc(x)
        return x


if __name__ == '__main__':
    import pickle
    model = CNNModel()
    from torch._inductor import config
    config.compile_threads = 1
    model = torch.compile(model)
    pickle.dumps(model)

The error is:

Traceback (most recent call last):
  File "test", line 46, in <module>
    pickle.dumps(model)
AttributeError: Can't pickle local object 'convert_frame.<locals>._convert_frame'

Maybe this information helps :)

@Parskatt

Parskatt commented Oct 13, 2023

I'm having a similar issue, but with dataloaders with num_workers > 0.

(In particular, when using spawn instead of fork; I think this is probably because it copies the entire environment.)
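A guess at the kind of setup that hits this (the dataset and compiled transform here are made up for illustration):

import torch
from torch.utils.data import DataLoader, Dataset

class ToyDataset(Dataset):
    def __init__(self, transform):
        self.transform = transform

    def __len__(self):
        return 8

    def __getitem__(self, idx):
        return self.transform(torch.randn(3, 32, 32))

if __name__ == "__main__":
    transform = torch.compile(lambda x: x * 2)
    # With the spawn context, each worker is started by pickling the dataset,
    # and with it the compiled transform -> Can't pickle local object ...
    loader = DataLoader(ToyDataset(transform), num_workers=2,
                        multiprocessing_context="spawn")
    next(iter(loader))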

@Parskatt

#101107 seems relevant.

@sascharo

config.compile_threads = 1 doesn't fix the error for me with PyTorch 2.1.1.

@antoinebrl

Hello! I encountered this issue while compiling a transformation used within the multiprocessing context of the data loader.

Even though I intend for the transformation to run on the CPU, the GPU is detected leading to the following error:

RuntimeError: Cannot re-initialize CUDA in forked subprocess. 
    To use CUDA with multiprocessing, you must use the 'spawn' start method

Despite attempting to investigate the issue by commenting out calls to torch.cuda.get_rng_state() and torch.cuda.set_rng_state() in torch/_dynamo/convert_frame.py::wrap_convert_context and torch/_dynamo/utils.py::preserve_rng_state, the problem persists. I suspect that these two functions are called elsewhere (possibly in the _inductor or the compiled code itself). I'm happy to turn this into a dedicated issue.

In an effort to resolve this error, I followed the recommendation and utilized spawn as the multiprocessing context for the DataLoader. However, this led to the error mentioned in this issue:

AttributeError: Can't pickle local object 'convert_frame.<locals>._convert_frame'

While preparing a pull request, I attempted to move the definition of _convert_frame() outside of convert_frame(). This introduced a new challenge: two attributes are added to the function, which changes the object and breaks serialization:

_pickle.PicklingError: Can't pickle <function _convert_frame at 0x7f64e40a5ea0>:
    it's not the same object as torch._dynamo.convert_frame._convert_frame
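For context, a standalone illustration (a guess at the mechanism, not PyTorch code) of pickle's by-reference identity check for functions, which produces this kind of error:

import pickle

def original(x):
    return x

def wrapper(x):
    return original(x)

# pickle serializes plain functions by reference: it records __module__ and
# __qualname__, then checks that looking those names up yields the very same object.
wrapper.__module__ = original.__module__
wrapper.__qualname__ = original.__qualname__

try:
    pickle.dumps(wrapper)
except pickle.PicklingError as err:
    print(err)  # Can't pickle <function original ...>: it's not the same object as __main__.original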

@penguinwu penguinwu added the module: startup-tracing-compile Compilation mechanism or time spent in (re)compilation, tracing, startup label Dec 11, 2023
@msaroufim msaroufim removed their assignment Dec 11, 2023
@msaroufim
Member

So I'm unassigning myself from this because I couldn't justify spending more time on it; I got stuck getting 3 final tests to pass here: #101651

The core idea is just to wrap some functions into classes so they become picklable
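A minimal sketch of that idea (the names are illustrative, not the actual code in #101651):

import pickle

def make_callback(state):
    # A closure like convert_frame.<locals>._convert_frame can't be pickled.
    def _callback(frame):
        return frame, state
    return _callback

class ConvertFrameCallback:
    # Holding the same state in a top-level class makes the callback picklable.
    def __init__(self, state):
        self.state = state

    def __call__(self, frame):
        return frame, self.state

pickle.dumps(ConvertFrameCallback("state"))  # works
# pickle.dumps(make_callback("state"))       # AttributeError: Can't pickle local object 'make_callback.<locals>._callback'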

@anijain2305 anijain2305 added the dynamo-must-fix These bugs affect TorchDynamo reliability. label Jan 31, 2024
@jansel
Contributor

jansel commented Mar 26, 2024

@anijain2305 to provide update

@anthai0908

anthai0908 commented May 25, 2024

AttributeError: Can't pickle local object 'TrainAugmentation.__init__.<locals>.<lambda>'. Hi, I got this error and I actually don't know how to fix it.

@ringohoffman
Contributor

@anthai0908 qfgaohao/pytorch-ssd#71 seems more relevant to you. I think your comment is really long and unrelated; it might be nice to remove it so as not to clutter this issue.

@anthai0908

Thanks @ringohoffman. After training and exporting to ONNX, I have one question: is it possible to deploy inference on Python 3.6.9 with the ONNX format?
