Version:
DVC version: 0.73.0
Python version: 3.6.8
Platform: Linux-4.15.0-1056-aws-x86_64-with-Ubuntu-18.04-bionic
Binary: False
Package: None
Filesystem type (cache directory): ('xfs', '/dev/nvme1n1')
Filesystem type (workspace): ('xfs', '/dev/nvme1n1')
Reproduce:
git clone [email protected]:iterative/dataset-registry-private.git
cd dataset-registry-private
dvc pull -j 100 ILSVRC.dvc
Output:
(sorry for the video in this format, was the fastest way to capture it while it was running)
https://www.dropbox.com/s/m821sgc1flk770o/dvc-pull-going-crazy.mov?dl=0
Should be fixed by combining pbars. @casperdcl
-j 100 would be the issue (nested bars scrolling off the screen). I should be able to come up with a temp patch for now. P.S. what happened with the old method? No scrolling but lots of flickering and very slow?
@casperdcl what's the point? Let's go for the actual fix, i.e. combining into a single pbar.
The point would be I can downgrade a p1 to p2/p3 today, but don't think I'll have time for a full fix soon.
@casperdcl p1 is not p0 (fix asap), p1 can wait. TBH, I see combining threads as top priority in pbars (maybe even UI as a whole) now since it's the highest value, so no point doing something else instead/in the meantime unless it makes dvc unusable.
Totally agree with @Suor .
Btw dvc push (without any explicit multithread args) also has this issue.
The problem appears when the number of bars is greater than the number of rows in the terminal.
We use threads, and the Tqdm instances are created inside them. So when the number of jobs (threads) is greater than the number of rows in the terminal, Tqdm draws the bars on top of each other (self.pos >= ROWS), but there are also problems with internals.
Hi @casperdcl, I know that you are the author of the Tqdm library (awesome tool :+1: :1st_place_medal: ), could you take a look at my PR and share your thoughts please?
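The overflow condition mentioned above (self.pos >= ROWS) can be illustrated with a small sketch. This is not DVC or Tqdm code: the function name and the one-bar-per-job model are assumptions made for illustration, and only the standard library is used to read the terminal height.

```python
import shutil


def overflowing_positions(n_jobs, rows=None):
    """Return the bar positions that would not fit on the screen.

    n_jobs: number of worker threads, assuming one bar per thread
            (a simplified, hypothetical model of the situation).
    rows:   terminal height; detected via shutil when not given.
    """
    if rows is None:
        rows = shutil.get_terminal_size().lines
    # Positions are 0-based, so position `rows` is the first one off-screen.
    return [pos for pos in range(n_jobs) if pos >= rows]


if __name__ == "__main__":
    # With -j 100 on a 24-row terminal, 76 bars have no row of their own.
    print(len(overflowing_positions(100, rows=24)))
```

Under this model, any job count above the terminal height leaves some bars without a row, which matches the overlapping output reported here.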
Another approach is to limit the bars to just one and show accumulated progress. It requires reworking a few Tqdm calls in dvc pull, dvc push, dvc add (when adding >1Gb files), and providing a wrapper that can retrieve updates through a Queue and draw a single bar. I like that approach, but it will affect a lot of places in the system.
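A rough sketch of the single-bar idea, assuming workers report progress through a queue instead of drawing their own bars. All names here (transfer, draw_single_bar, run, the render callback) are hypothetical and not part of DVC; the workers only simulate a transfer, and the drawing is left to a callback so the sketch stays standard-library only.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

_DONE = object()  # sentinel telling the drawer thread to stop


def transfer(name, size, updates, chunk=4):
    """Hypothetical worker: pretends to move `size` bytes in chunks."""
    sent = 0
    while sent < size:
        step = min(chunk, size - sent)
        sent += step
        updates.put(step)  # report progress instead of drawing a bar


def draw_single_bar(updates, total, render):
    """Drain the queue and render one accumulated progress value."""
    done = 0
    while True:
        item = updates.get()
        if item is _DONE:
            break
        done += item
        render(done, total)
    return done


def run(files, jobs=100, render=lambda done, total: None):
    """Run all transfers with `jobs` threads and a single progress consumer."""
    updates = queue.Queue()
    total = sum(files.values())
    result = []
    drawer = threading.Thread(
        target=lambda: result.append(draw_single_bar(updates, total, render))
    )
    drawer.start()
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        for name, size in files.items():
            pool.submit(transfer, name, size, updates)
    # Leaving the `with` block waits for every worker to finish.
    updates.put(_DONE)
    drawer.join()
    return result[0]
```

Only the drawer thread touches the display, so the number of worker threads no longer affects how many rows are needed. In a real integration the render callback would presumably forward each increment to one shared progress bar.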
I should ask - are there any operations where a user would ever benefit from having more than about 5 threads? Surely I/O is a bottleneck for anything more? Maybe we can just set an upper limit on the number of threads?
@casperdcl 5 is not a limit for many applications. Downloading millions of tiny files from s3 benefits from dozens of workers, for example.
@casperdcl limiting the number of threads because of the progress bar is a very weird approach, feels like we are not solving the right issue here :smile:
not because of the progress bar, hah. I'm just asking if we can avoid solving the progress bar problem by solving a different problem.
Apparently not. :)