Limit number of progress bars on larger multi-core systems #242

Open
applio opened this issue Jun 8, 2023 · 3 comments

Comments

applio commented Jun 8, 2023

Currently, pandarallel creates one progress bar per worker. On multi-core systems with a large number of cores (say 128 or more), that many progress bars can be overwhelming. In these situations, it may be more useful to display a smaller number of progress bars (not necessarily a single overall bar), with each worker mapped to one of the displayed bars.

What is proposed:

  1. For N workers, offer the option to display M progress bars where N >= M and each worker contributes to one progress bar (i.e. worker n contributes to progress bar m such that m = (n % M)).
  2. If an error occurs during execution, that worker's progress bar (which may represent progress from multiple workers) will indicate an error occurred, matching current functionality.
  3. Keep the existing default behavior unchanged so that not specifying a maximum number of progress bars to display results in as many progress bars as workers.
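
The mapping in item 1, together with the default in item 3, can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the code from the pull request: the function names `bar_for_worker` and `group_workers_by_bar` are hypothetical, nothing here calls pandarallel, and clamping the number of bars to the number of workers is an assumption made so that N >= M always holds.

```python
from typing import Dict, List, Optional


def bar_for_worker(worker_index: int, n_bars: int) -> int:
    """Return the progress bar index m for worker n, using m = n % M."""
    if n_bars < 1:
        raise ValueError("n_bars must be at least 1")
    return worker_index % n_bars


def group_workers_by_bar(
    n_workers: int, max_bars: Optional[int] = None
) -> Dict[int, List[int]]:
    """Group worker indices by the progress bar they contribute to.

    If max_bars is None, every worker gets its own bar (the unchanged default).
    Assumption: a max_bars larger than n_workers is clamped to n_workers.
    """
    n_bars = n_workers if max_bars is None else min(max_bars, n_workers)
    groups: Dict[int, List[int]] = {m: [] for m in range(n_bars)}
    for n in range(n_workers):
        groups[bar_for_worker(n, n_bars)].append(n)
    return groups


if __name__ == "__main__":
    # 8 workers shown on 3 bars: workers 0, 3, 6 share bar 0, and so on.
    for bar, workers in group_workers_by_bar(8, 3).items():
        print(f"bar {bar}: workers {workers}")
```

With this modulo assignment, neighbouring workers land on different bars, so each displayed bar aggregates workers spread across the whole DataFrame rather than one contiguous block.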

Additional motivation:
We have successfully used pandarallel on systems with far more than 128 cores, where seeing as many progress bars as workers is genuinely problematic. We benefit greatly from the progress bars and do not want to simply disable them: we want to monitor the progress of our parallel_apply() and parallel_map() operations in a digestible way, without flooding the screen or notebook with too much information.

Proposed implementation:
A working implementation has been prepared along with unit tests; a pull request will be added to this issue.

Author

applio commented Jun 8, 2023

Example of the code from PR #243 in use in a Jupyter notebook:
[Image: pandarallel_jupyter_notebook_run_finished]

Author

applio commented Jun 8, 2023

Example of the code from PR #243 in use in the IPython console:
[Image: pandarallel_ipython_console_run_underway]

Each of the 20 workers gets 5M rows from a 100M-row DataFrame. Because 20 is not divisible by 3, the first 2 progress bars each represent 7 workers and the last progress bar represents 6 workers.
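
The 7/7/6 split follows directly from the m = (n % M) rule and can be checked with a short calculation. The sketch below is an illustration only, with a hypothetical helper name `workers_per_bar`; it assumes workers are numbered from 0 and that rows are divided evenly among them, as in the example above.

```python
from typing import List


def workers_per_bar(n_workers: int, n_bars: int) -> List[int]:
    """Count how many workers map to each bar under m = n % M.

    The first (n_workers % n_bars) bars receive one extra worker.
    """
    base, extra = divmod(n_workers, n_bars)
    return [base + 1 if m < extra else base for m in range(n_bars)]


if __name__ == "__main__":
    total_rows = 100_000_000
    n_workers, n_bars = 20, 3
    rows_per_worker = total_rows // n_workers  # 5,000,000 rows each

    counts = workers_per_bar(n_workers, n_bars)
    print(counts)  # [7, 7, 6]

    # Rows represented by each bar: 35M, 35M and 30M.
    print([c * rows_per_worker for c in counts])
```

A consequence is that the bars do not all cover the same amount of work: here the last bar tracks 30M rows while the first two track 35M each.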

Owner

nalepae commented Jan 23, 2024

Pandaral·lel is looking for a maintainer!
If you are interested, please open a GitHub issue.
