tqdm versions of generators

Created on 6 Aug 2016 · 5 comments · Source: tqdm/tqdm

The request to give the itertools utilities a __len__ was rejected. Let's make tqdm versions of them so the progress bar can use their length.

p4-enhancement-future 🧨 question/docs ‽ submodule ⊂

All 5 comments

There's no way around this: because they are generators, it is by design that we can't know their length. But it's possible to do:

for combo in tqdm(list(itertools.combinations(iterable, r))):
    ...

Or, if you don't want to convert to a list (which can be memory-intensive if the number of combinations is huge), you need to find another way to precompute the total length of your loop. Unfortunately, it's not possible to know generically what an arbitrary function is going to produce; it depends entirely on the user's intent.
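When the iterator's size has a closed form, the total can be computed up front instead of materializing a list. A minimal sketch, assuming Python 3.8+ for `math.comb`; the helper name `combinations_with_total` is made up for illustration:

```python
import itertools
import math


def combinations_with_total(iterable, r):
    """Return (lazy combinations iterator, number of combinations)."""
    pool = tuple(iterable)  # materialize the inputs only, not the combinations
    total = math.comb(len(pool), r)  # n! / (r! * (n - r)!), and 0 when r > n
    return itertools.combinations(pool, r), total


combos, total = combinations_with_total(range(20), 3)
print(total)  # 1140
```

The pair can then be handed to tqdm as `tqdm(combos, total=total)`, which gives the bar a length without building the full list.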

Also, even if we made a special case that could detect the use of itertools.combinations(), what if the user combines several functions, as in itertools.permutations(itertools.combinations(iterable, r), s)? We would have to inspect every subfunction fed through tqdm and walk up the stack to a point that is hard to define (when should we stop the inspection?)...
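The caller who wrote such a pipeline does know its shape, so the total for a nested expression like the one above can be worked out by hand. A sketch, assuming a sized input and Python 3.8+ (`math.comb`, `math.perm`); the concrete values are illustrative only:

```python
import itertools
import math

iterable, r, s = "abcd", 2, 2

# Inner stage: C(n, r) combinations of the input.
n_combos = math.comb(len(iterable), r)
# Outer stage: ordered selections of s items out of those combinations.
total = math.perm(n_combos, s)

nested = itertools.permutations(itertools.combinations(iterable, r), s)
print(total)  # 30
```

Passing `total=total` alongside `nested` would then give tqdm the length, with no need for tqdm to inspect the nested calls.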

I would like to suggest reopening this issue, but with an alternative approach: provide some functions in the tqdm namespace (e.g. a tqdm.itertools subpackage) that behave like itertools, but provide the length if available, so that one can do (e.g.)

from tqdm import tqdm, itertools  # tqdm.itertools, not the stdlib itertools

for ... in tqdm(itertools.product(...)): ....

where tqdm.itertools.product is defined essentially as

import itertools  # the stdlib module
import math

class product:
    def __init__(self, *args):
        self._it = itertools.product(*args)  # the stdlib one
        try:
            # Total = product of the input lengths (1 if there are no inputs).
            self._len = math.prod(len(arg) for arg in args)
        except TypeError:  # raised by len() on an unsized input
            self._len = None

    def __iter__(self):
        # Return the stdlib C iterator so there's no overhead when iterating,
        # only when initializing.
        return self._it

    def __len__(self):
        if self._len is not None:
            return self._len
        # len() raises TypeError for objects without a length, so do the same.
        raise TypeError("length is not available")

Thoughts?

This is the same issue as the one mentioned in the FAQ regarding other generators such as zip and enumerate. Attempting to fix this violates Python's philosophy and definition of generators, but at the very least we should perhaps update the documentation to make the workarounds clearer and, as you say, possibly provide wrappers.

Idea: tqdm.queue. Given two generator functions (a producer and a consumer), it creates a queue between them, and the progress bar shows how many items have been made by the producer but not yet consumed. It would also show the overall number of items produced.

We could even make it pickle the generator functions to disk.
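The queue idea could be prototyped outside tqdm first. The sketch below is a hypothetical `MonitoredQueue` (not an existing tqdm API) that, as a simplification, takes a producer iterable rather than a generator function, runs it in a background thread, and tracks how many items have been produced, consumed, and are still pending:

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the producer's output


class MonitoredQueue:
    """Hypothetical helper: buffer a producer's items and count them."""

    def __init__(self, producer, maxsize=0):
        self._queue = queue.Queue(maxsize)
        self._lock = threading.Lock()
        self.produced = 0  # items made by the producer so far
        self.consumed = 0  # items handed to the consumer so far
        worker = threading.Thread(
            target=self._fill, args=(producer,), daemon=True
        )
        worker.start()

    def _fill(self, producer):
        try:
            for item in producer:
                # Count before enqueueing so consumed never exceeds produced.
                with self._lock:
                    self.produced += 1
                self._queue.put(item)
        finally:
            # Always signal the end, even if the producer raised.
            self._queue.put(_DONE)

    @property
    def pending(self):
        """Items produced but not yet consumed."""
        with self._lock:
            return self.produced - self.consumed

    def __iter__(self):
        while True:
            item = self._queue.get()
            if item is _DONE:
                return
            with self._lock:
                self.consumed += 1
            yield item


squares = MonitoredQueue(x * x for x in range(5))
results = list(squares)  # the consumer side drains the queue
```

A consumer loop could report these counters on a manually updated tqdm bar, for example by calling `bar.update(1)` and `bar.set_postfix(pending=squares.pending)` for each item.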

Sure, this could be part of a submodule (which should be easier now that #800 is merged).
