We are introducing DVC in our company and were quite happy until we started using it on a large project containing a few hundred thousand files totaling approximately 300 GB.
We use S3 as storage.
When someone from our team ran dvc pull on this project, it consumed the entire internet bandwidth of our office.
We tried to mitigate the issue by limiting the number of concurrent jobs to 1 (option -j 1) but it was not enough.
Our IT Ops team told us that DVC had opened hundreds of concurrent connections to download files from our S3 bucket, which explains why it was able to consume most of the bandwidth.
Is there another option besides --jobs that we should set to limit the number of parallel connections?
Is there some existing workaround for this situation?
@pommedeterresautee I was not able to reproduce it.
could you please run dvc version?
also, when you run dvc pull -j 1 and it starts downloading, how many progress bars do you see?
DVC version: 0.86.2
Python version: 3.6.9
Platform: Linux-5.3.0-40-generic-x86_64-with-Ubuntu-19.10-eoan
Binary: False
Package: snap
Cache: reflink - not supported, hardlink - supported, symlink - supported
I see quite a lot of progress bars, too many for my terminal, which scrolls the output like crazy:
dvc pull -j 1

@iterative/engineering someone who is using Linux, could you please check really quickly that -j is being propagated properly?
@pommedeterresautee could you check whether you have anything in your DVC config files related to the number of jobs? They are .dvc/config and .dvc/config.local.
Had a quick look; not sure if this is the issue but repo.fetch._fetch_external() doesn't get a jobs argument.
@casperdcl great catch! I just realized that it pulls (fetches) from external repos! So yes, it looks like we definitely need to pass -j down to it.
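To illustrate the kind of bug being described, here is a minimal sketch, not DVC's actual code. All names in it (download_all, fetch_buggy, fetch_fixed, DEFAULT_JOBS, ConcurrencyTracker) are hypothetical, and the default of 16 workers is an assumption made only for the demo. It shows how a jobs limit is silently lost when one code path does not forward it to the helper that starts the downloads.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

DEFAULT_JOBS = 16  # hypothetical default, chosen only for this demo


class ConcurrencyTracker:
    """Records the peak number of simultaneous simulated downloads."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        return self

    def __exit__(self, *exc):
        with self._lock:
            self.active -= 1


def download_all(files, tracker, jobs=None):
    """Simulate downloading files with at most `jobs` workers."""

    def download(_name):
        with tracker:
            time.sleep(0.02)  # stand-in for a network transfer

    # Falls back to the default when no limit is given.
    with ThreadPoolExecutor(max_workers=jobs or DEFAULT_JOBS) as pool:
        list(pool.map(download, files))


def fetch_buggy(files, external_files, tracker, jobs=None):
    download_all(files, tracker, jobs=jobs)
    # Bug: the limit is not forwarded, so this path uses DEFAULT_JOBS.
    download_all(external_files, tracker)


def fetch_fixed(files, external_files, tracker, jobs=None):
    download_all(files, tracker, jobs=jobs)
    # Fix: forward the same limit to the external path.
    download_all(external_files, tracker, jobs=jobs)


if __name__ == "__main__":
    external = [f"ext{i}" for i in range(8)]

    buggy = ConcurrencyTracker()
    fetch_buggy(["a", "b"], external, buggy, jobs=1)
    print("peak without forwarding jobs:", buggy.peak)

    fixed = ConcurrencyTracker()
    fetch_fixed(["a", "b"], external, fixed, jobs=1)
    print("peak with jobs forwarded:", fixed.peak)
```

With jobs=1 forwarded everywhere, the peak stays at 1. In the buggy variant the external path ignores the limit, which would match the symptom of many progress bars despite -j 1.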
@efiop it looks like the reason for this is clear, can we prioritize and add this?
great catch!
Never underestimate the debugging power of debian on a phone :)