
consider taking in global environment variable for nb_workers and possibly other parameters too #257

SiRumCz opened this issue Nov 14, 2023 · 3 comments



SiRumCz commented Nov 14, 2023

Please write here what feature pandarallel is missing: I would like to control the number of workers being spawned without touching the code when running on different machines.

nalepae (Owner) commented Jan 23, 2024

Pandaral·lel is looking for a maintainer!
If you are interested, please open a GitHub issue.

shermansiu commented

I'd prefer to keep the code in the core package minimal to make things easier to maintain.

Wouldn't you be able to achieve the same functionality by reading the environment variables with os.environ and passing them to pandarallel.initialize?

IMO this is a wontfix issue, unless a compelling reason is given.
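For illustration, a minimal sketch of that suggestion, assuming the caller picks their own environment variable (PANDARALLEL_NB_WORKERS here is a hypothetical name, not something pandarallel reads itself):

```python
import os

import pandas as pd
from pandarallel import pandarallel

# Hypothetical environment variable; unset means "use pandarallel's default".
nb_workers = os.environ.get("PANDARALLEL_NB_WORKERS")

if nb_workers is not None:
    pandarallel.initialize(nb_workers=int(nb_workers))
else:
    # Default: pandarallel picks the number of workers from the available cores.
    pandarallel.initialize()

df = pd.DataFrame({"x": range(10)})
df["y"] = df.parallel_apply(lambda row: row["x"] ** 2, axis=1)
```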


IceFreez3r commented May 13, 2024

One of the tools I use, which relies on pandarallel, consistently fails with out-of-memory errors in a cluster environment.
According to vladr on Stack Overflow:

Memory-wise, we already know that subprocess.Popen uses fork/clone under the hood, meaning that every time you call it you're requesting once more as much memory as Python is already eating up, i.e. in the hundreds of additional MB, all in order to then exec a puny 10kB executable such as free or ps. In the case of an unfavourable overcommit policy, you'll soon see ENOMEM.

This wouldn't be a problem in the general case, but overcommitting memory is disabled on the cluster. Since the cluster has a lot of cores, this easily eats up the entire RAM, even for processes that would be fine with 10 GB of memory.
I've run the tool with the exact same commands on a machine with only a few cores and overcommitting enabled, and it worked fine.

If I could just limit the number of workers/subprocesses this problem wouldn't occur.

Edit: Also note that I cannot just edit the call to pandarallel.initialize, since I'm using someone else's code.
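For reference, a minimal sketch of one possible workaround in that situation, assuming the third-party tool is started from a Python entry point you control, calls pandarallel.initialize only after your code has run, and passes its options as keyword arguments; the environment variable name and the tool's import are hypothetical:

```python
import os

from pandarallel import pandarallel

# Keep a reference to the real initialize before replacing it.
_original_initialize = pandarallel.initialize

def _capped_initialize(*args, **kwargs):
    # PANDARALLEL_NB_WORKERS is a hypothetical variable name chosen by the user.
    nb_workers = os.environ.get("PANDARALLEL_NB_WORKERS")
    if nb_workers is not None:
        kwargs["nb_workers"] = int(nb_workers)
    return _original_initialize(*args, **kwargs)

# Monkeypatch initialize before the third-party tool gets a chance to call it.
pandarallel.initialize = _capped_initialize

# Only afterwards import and run the tool (hypothetical entry point):
# from some_tool import main
# main()
```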
