From https://travis-ci.com/iterative/dvc/jobs/228530997 (https://github.com/iterative/dvc/pull/2428):
Traceback (most recent call last):
File "/home/travis/build/iterative/dvc/dvc/remote/base.py", line 447, in upload
no_progress_bar=no_progress_bar,
File "/home/travis/build/iterative/dvc/dvc/remote/oss.py", line 114, in _upload
to_info.path, from_file, progress_callback=pbar.update_to
File "/home/travis/virtualenv/python2.7.15/lib/python2.7/site-packages/tqdm/_tqdm.py", line 1015, in __exit__
self.close()
File "/home/travis/virtualenv/python2.7.15/lib/python2.7/site-packages/tqdm/_tqdm.py", line 1202, in close
self._decr_instances(self)
File "/home/travis/virtualenv/python2.7.15/lib/python2.7/site-packages/tqdm/_tqdm.py", line 536, in _decr_instances
inst.clear(nolock=True)
File "/home/travis/virtualenv/python2.7.15/lib/python2.7/site-packages/tqdm/_tqdm.py", line 1240, in clear
self.sp('')
AttributeError: 'Tqdm' object has no attribute 'sp'
------------------------------------------------------------
@casperdcl Could you please take a look?
I think this is an ignorable error caused by the traceback's attempt to print a tqdm bar that has already been closed. The actual error seems to happen earlier, at https://travis-ci.com/iterative/dvc/jobs/228530997#L2853.
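For context, the call pattern implied by the traceback is roughly this (a minimal sketch; bucket, the Tqdm kwargs and the update_to signature are my assumptions, not dvc's exact code):

from tqdm import tqdm

class Tqdm(tqdm):
    def update_to(self, current, total=None):
        # progress callback in the (bytes_done, total_bytes) style
        if total is not None:
            self.total = total
        self.update(current - self.n)

def _upload(bucket, from_file, to_path, no_progress_bar=False):
    with Tqdm(desc=to_path, disable=no_progress_bar,
              unit="B", unit_scale=True) as pbar:
        # if this call raises, Tqdm.__exit__ -> close() runs while the
        # exception unwinds, which is where the traceback above ends up
        bucket.put_object_from_file(to_path, from_file,
                                    progress_callback=pbar.update_to)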
@casperdcl The line you've linked is a wrapper that collects all the errors that happen in the workers when they are unable to upload/download something, which is a rather frequent occurrence in dvc. Is there any way we could handle that situation gracefully with tqdm? That missing-attribute error looks like a bug.
ah the error is https://travis-ci.com/iterative/dvc/jobs/228530997#L2622
Hmm, not sure how to clean this up in tqdm, since sp is an attribute that is definitely created in its __init__. In fact, there are also two places in tqdm that raise a warning if sp is not defined.
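A defensive workaround on the caller's side could look something like this (a hypothetical subclass sketch, not an actual tqdm fix), guarding teardown so a bar whose sp never got set, or was already torn down by another thread, doesn't blow up during cleanup:

from tqdm import tqdm

class SafeTqdm(tqdm):
    def clear(self, *args, **kwargs):
        # skip clearing if sp was never set on this instance
        if hasattr(self, "sp"):
            super(SafeTqdm, self).clear(*args, **kwargs)

    def close(self):
        try:
            super(SafeTqdm, self).close()
        except AttributeError:
            pass  # ignore teardown races; the bar is being discarded anyway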
> ah the error is https://travis-ci.com/iterative/dvc/jobs/228530997#L2622
That one is actually unrelated. It is about ssh, while the one we are discussing is about oss.
It seems weird that it throws an AttributeError when something unrelated to the progress bar itself fails. It looks like if anything inside a with Tqdm block raises an error, tqdm is not able to handle it gracefully. It should instead clean up the progress bar, and definitely not throw attribute errors, right?
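For reference, the kind of pattern I mean is roughly this (a minimal sketch, not a confirmed repro; the worker and its simulated failure are made up):

from __future__ import print_function
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

def worker(n):
    # an exception raised inside the with block forces
    # tqdm.__exit__ -> close() during the unwind, concurrently
    # with other workers' bars closing
    with tqdm(total=n, leave=False) as pbar:
        pbar.update(1)
        raise IOError("simulated failed upload")

errors = []
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(worker, 10) for _ in range(100)]
    for fut in futures:
        try:
            fut.result()
        except IOError as exc:
            errors.append(exc)  # collected per worker and reported later

print(len(errors), "workers failed")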
Yes, I think that's https://github.com/tqdm/tqdm/issues/548; nobody's posted a solution yet.
@casperdcl Some uploads/downloads failing is a normal situation for us (e.g. someone running dvc pull on an unstable network on a train), so we'll need to fix that.
Hmm, I've just pushed a commit to tqdm:devel. Does pip install -e git+https://github.com/tqdm/tqdm.git@devel#egg=tqdm fix this issue?
@efiop it would be great if you could post a minimal reproducible example of this. I've tried and can't get the same error.
@casperdcl Not able to reproduce with other remotes or with a minimal example without threads. Probably a race condition or something.
@efiop I've tried to reproduce with threads and still couldn't manage:
from __future__ import print_function
from tqdm import tqdm, trange
from concurrent.futures import ThreadPoolExecutor

N = int(1e7)


def fun(N):
    try:
        with trange(N, desc="1", leave=False) as t:
            for i in t:
                if i == N // 10:
                    break
    except:
        with trange(N, desc="2", leave=False) as t:
            for i in t:
                if i == N // 10:
                    raise ValueError("fun times")
    else:
        with trange(N, desc="3", leave=False) as t:
            for i in t:
                if i == N // 10:
                    raise ValueError("fun times")


tqdm.get_lock()
with ThreadPoolExecutor(max_workers=8) as executor:
    executor.map(fun, [N] * 400)
@casperdcl I see; well, races are always hard to reproduce reliably :) The fix from the devel branch seems like an acceptable workaround; do you plan on merging it into master? If so, we could close this issue as well.
It doesn't have to be perfectly reliable, but being able to reproduce it at least once would help.
I'm hesitant to merge that devel commit into master, considering nobody else seems to have reported this issue and it doesn't seem like you've managed to reproduce it either. Did it happen more than once on Travis?
@casperdcl True, it was the first and only time I've seen it so far. It was also on 2.7, which might be related. Alright, let's close this one, and if it ever happens again we can get back to it.
Getting it once again on py2 https://travis-ci.com/iterative/dvc/jobs/255935049#L3369
@casperdcl What do you think? If this takes any effort, we can just forget about it, as py2 is almost dead anyway.
@efiop is this the actual error? https://travis-ci.com/iterative/dvc/jobs/255935049#L2554
Not sure how the trace output is set up. I don't know if, for example, there's another error and the printing of that error on a killed thread is also causing a different error at L3369...
@casperdcl I'm not seeing any other error, so it seems like that is the cause.
Will be fixed by #1818