Provide a way to work out the total number of iterations while the loop is running #1550

Open
5 of 6 tasks
kalekundert opened this issue Feb 7, 2024 · 0 comments

kalekundert commented Feb 7, 2024

  • I have marked all applicable categories:
    • documentation request (i.e. "X is missing from the documentation." If instead I want to ask "how to use X?" I understand StackOverflow#tqdm is more appropriate)
    • new feature request
  • I have visited the source website, and in particular
    read the known issues
  • I have searched through the issue tracker for duplicates
  • I have mentioned version numbers, operating system and
    environment, where applicable:
    import tqdm, sys
    print(tqdm.__version__, sys.version, sys.platform)

Sometimes it takes a long time just to figure out how many iterations are going to happen. For example, consider the following:

from pathlib import Path
from tqdm import tqdm

paths = Path.cwd().glob('**/*')

for path in tqdm(paths):
    ...  # Do some work...

Working out the number of paths that will be matched by the glob requires interacting with the filesystem, and could take a long time in a big directory. But it's really useful to have this number, because without it there's no way to actually render a progress bar.

If I want the above script to have a true progress bar, I can only think of two ways to do it:

  • Read all the paths into a list, then iterate through that.
  • Iterate once through all the paths to come up with a count, then iterate through them again to actually do the work.

Neither of these approaches is ideal. The first could use a prohibitive amount of memory, and both could cause the program to take a long time before even beginning to render the progress bar.
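For concreteness, here is a minimal sketch of the two workarounds. The helper names `paths_as_list` and `paths_counted_twice` are made up for this illustration and are not part of tqdm:

```python
from pathlib import Path


def paths_as_list(root):
    """Workaround 1: read every path into a list before iterating."""
    paths = list(Path(root).glob('**/*'))  # holds all paths in memory
    return paths, len(paths)


def paths_counted_twice(root):
    """Workaround 2: walk once to count, then walk again for the work."""
    total = sum(1 for _ in Path(root).glob('**/*'))  # first walk: count only
    return Path(root).glob('**/*'), total            # second walk: fresh generator
```

Either result can then be passed along as `tqdm(paths, total=total)`. In both cases nothing is drawn until the first full walk of the directory tree has finished.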

I'd like to propose a new API that provides a better alternative. The idea is to let the user provide a function that can be called to get the total number of expected iterations, then to run that function in a background thread. Before the function finishes, the "progress bar" would just display the same thing it currently does when the total number of iterations isn't known, i.e. the current iteration number, the elapsed time, etc. After the function finishes, a true progress bar would be displayed. I think this gives the best of both worlds: the progress bar would start immediately with the information it has, and once better information is available, it would provide the user an estimate for how long they'll need to wait.

As for the actual API, I can imagine two options. I think I slightly prefer the second, but I'd be happy with either:

  • Allow the total argument of tqdm(total=...) to be a function, in which case it will be handled as described above.
  • Add a new tqdm(total_bg=...) argument that only accepts functions.
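To make the intended behavior concrete, here is a rough sketch of how it could be approximated today. It is not the proposed API: `total_bg` does not exist, and `set_total_in_background` and `demo` are hypothetical names. The sketch assumes that a tqdm instance has a writable `total` attribute and a `refresh()` method, and that updating them from another thread is acceptable for display purposes.

```python
import threading
from pathlib import Path


def set_total_in_background(bar, count_fn):
    """Run count_fn() in a daemon thread and store its result in bar.total.

    `bar` is assumed to behave like a tqdm instance: it has a writable
    `total` attribute and a `refresh()` method.
    """
    def worker():
        bar.total = count_fn()
        bar.refresh()  # redraw so the bar switches to showing a total

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def demo(root=None):
    # Imported here so the helper above does not itself depend on tqdm.
    from tqdm import tqdm

    root = Path.cwd() if root is None else Path(root)

    def count_paths():
        # A second, independent walk of the tree, done off the main thread.
        return sum(1 for _ in root.glob('**/*'))

    with tqdm(root.glob('**/*')) as bar:
        set_total_in_background(bar, count_paths)
        for path in bar:
            pass  # Do some work...
```

Until `count_paths` returns, the bar shows only the iteration count and elapsed time; afterwards it shows a total. The filesystem is still walked twice, but the two walks overlap instead of running back to back.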

If there's interest in adding a feature like this, let me know. No guarantees, but I might be able to make a PR.
