Memory usage increases across multiple parallel_apply
#264
Comments
hogan-roblox changed the title from "Memory usage continuously increases over time" to "Memory usage increases across multiple parallel_apply" on Mar 11, 2024.
Could you please attach a sample CSV and the simplest code sample that reproduces the problem? I'm unable to reproduce your memory usage problems with:

```python
import pandas as pd
import pandarallel

pandarallel.pandarallel.initialize(progress_bar=True, nb_workers=120)

for _ in range(10):
    df = pd.DataFrame({"foo": range(100_000)})
    df = pd.DataFrame.from_dict(
        df.sample(frac=1.0).parallel_apply(lambda x: x + 1, axis=1).to_dict(),
        orient="columns",
    )
```

You mentioned that this issue is no longer a blocker for you, so if you don't reply in a while, this issue should probably be closed.
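One way to make memory reports like this easier to verify is to record memory between iterations of the loop. Below is a minimal sketch using only the standard library; note that `tracemalloc` tracks Python-level allocations in the current process only, so it would not capture RSS held by pandarallel's worker processes. The `run_task` helper is a hypothetical stand-in, not code from this issue:

```python
import gc
import tracemalloc

def run_task(n):
    # Hypothetical stand-in for one processing task; in the real report
    # this would build a large DataFrame and call parallel_apply on it.
    data = [i + 1 for i in range(n)]
    return sum(data)

tracemalloc.start()
readings = []
for _ in range(5):
    run_task(100_000)
    gc.collect()  # release anything the finished task left behind
    current, _peak = tracemalloc.get_traced_memory()
    readings.append(current)

# A leak across iterations would show `current` climbing each time;
# a well-behaved loop stays roughly flat.
print(readings)
```

Logging a series like this alongside the bug report makes it much easier to distinguish genuine accumulation from a one-time allocation plateau.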
General

Acknowledgement

- [x] I have checked that this issue is not present when using pandas alone (without pandarallel) before writing a new bug report.
Bug description

If I run continuous data processing tasks, each with a huge DataFrame using `parallel_apply`, their memory footprints somehow accumulate.

Observed behavior
My code logic looks like the pseudocode below.

All tasks should have similar memory footprints. However, from the image below, one can tell that memory drops after the first task finishes but soon climbs back up once the second task is loaded.
Expected behavior

Given that the two tasks have similar memory footprints, I would expect the memory pattern to repeat rather than accumulate.
Minimal but working code sample to ease bug fix for the pandarallel team

As in the pseudocode I attached above.
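The attached pseudocode is not preserved in this text. Purely as an illustration (not the reporter's actual code), the pattern described above, sequential tasks that each process a large table in parallel with explicit cleanup in between, could be sketched as follows, with a stdlib thread pool standing in for pandarallel's worker pool:

```python
import gc
from multiprocessing.dummy import Pool  # thread-based stand-in for the worker pool

def transform(x):
    # Stand-in for the per-row function passed to parallel_apply.
    return x + 1

def run_task(task_id):
    # Each "task" loads a large dataset and processes it in parallel.
    rows = list(range(50_000))
    with Pool(4) as pool:
        result = pool.map(transform, rows)
    total = sum(result)
    # Drop references and collect, so the next task starts from a clean
    # slate; if memory still climbs across tasks, something else is
    # holding on to it (e.g. worker processes or cached state).
    del rows, result
    gc.collect()
    return total

totals = [run_task(i) for i in range(2)]
print(totals)
```

If a pattern like this still shows monotonically growing memory with pandarallel, attaching the concrete loop plus per-task memory readings would let maintainers bisect whether the growth sits in the parent process or in the workers.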