multiprocessing.Pool.map hangs (joblib.Parallel/delayed works) #1396
Comments
When trying to submit this bug I managed to reproduce the second problem, namely staleness:
Note: For reproduction it would be nice to be able to simulate the complete lifecycle. I think you can do that from the test code, but doing it from within a notebook would be nice, even behind a "beware, API quicksand" warning, e.g. `marimo.edit_cell_by_id(id, new_code)`.
Note 2: The hang above happened with the current git version, which could be related. But I suspect some shared-memory issue, since when I closed the notebook I got a Python warning about that (I failed to copy it).
Note 3: joblib's `Parallel`/`delayed` seems to work fine.
Thanks for the thorough bug reports -- will look into it.
Does it always hang, or only sometimes? Unfortunately, I'm unable to reproduce the hang on my machine so far.
Similarly, is this something you can reproduce consistently, or only sometimes? I also couldn't reproduce this. :/ EDIT: Just kidding -- seeing the staleness issue now ...
If I recreate the process pool, it uses the latest value of the function, and it doesn't hang. I would suggest using the pool as a context manager; that way you don't have to think about managing your pool and recreating it:

```python
import marimo

__generated_with = "0.5.2"
app = marimo.App()


@app.cell
def __():
    from multiprocessing import Pool
    return Pool,


@app.cell
def __():
    def f(x):
        return 10 + x
    return f,


@app.cell
def __(Pool, f):
    with Pool() as pool:
        outputs = list(pool.map(f, [1, 2, 3]))
    outputs
    return outputs, pool


if __name__ == "__main__":
    app.run()
```
For what it's worth, here's a plain Python script that fails in an analogous way:

```python
from multiprocessing import Pool


def f(x):
    return x + 10


if __name__ == "__main__":
    pool = Pool()
    print(list(pool.map(f, [1, 2, 3])))

    # try to redefine `f` -- the pool won't pick it up
    def f(x):
        return x + 11

    # uses the "old" value of `f`
    print(list(pool.map(f, [1, 2, 3])))

    # try to call `g` -- the pool will hang
    def g(x):
        return x + 11

    print(list(pool.map(g, [1, 2, 3])))
    print("I won't be printed")
```
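One likely explanation for the staleness (my reading of the standard library, so treat it as an assumption): `multiprocessing` pickles a function by reference -- its module and qualified name -- not by its bytecode, and each worker resolves that name in its own interpreter, which still holds whatever was defined when the worker was forked. The by-name behavior can be sketched with `pickle` directly, no pool involved:

```python
import pickle


def f(x):
    return x + 10


# a pickled function is just a reference to ("__main__", "f"),
# not a copy of its code
data = pickle.dumps(f)


def f(x):  # redefine under the same name
    return x + 11


# unpickling looks the name up *now*, so it finds the new definition;
# a pool worker does the same lookup in its own, older namespace
restored = pickle.loads(data)
print(restored(1))  # prints 12 -- the redefined f
```

In the pool's case, the lookup happens inside the worker process, whose `__main__` still contains the original `f`, which is why redefinitions are silently ignored until the pool is recreated.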
In summary, my understanding is that there's nothing we can do to fix this, or even to fail gracefully.
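Failing gracefully is hard because the error surfaces on the worker side: if `g` was never defined in the worker, unpickling raises there, out of the caller's sight. The lookup failure itself can be reproduced without a pool (a sketch; `del g` stands in for a worker that forked before `g` existed):

```python
import pickle


def g(x):
    return x + 11


data = pickle.dumps(g)  # stored as a reference to __main__.g
del g                   # simulate a worker that never saw g

# the by-name lookup now fails; inside a pool worker this error
# is raised far from the caller, and the map can end up hanging
try:
    pickle.loads(data)
    lookup_failed = False
except AttributeError:
    lookup_failed = True
print("lookup failed:", lookup_failed)
```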
Thanks, that works. This way it is still faster than `joblib.Parallel(backend='multiprocessing')`, and recreating the pool is probably what joblib did that caused it to work. You can close this (or should I? I'm not sure which workflow you prefer).
Great, thanks for confirming. I'll close the issue. |
Describe the bug
I'm trying to use `multiprocessing.Pool.map`. The reproducer below hangs; locally I could also get it to run, but it then ignored updates to the called function.
Running this notebook, the 4th cell hangs, and when interrupted (the notebook is still responsive) the stack is:
Environment
Code to reproduce