dvc status and dvc pull are failing with 11.5 GB of data

Created on 30 Jul 2018 · 3 Comments · Source: iterative/dvc

I was trying to add a directory that contains 5 subdirectories, each with its own files.
The files are zip archives, text files, and CSVs.

dvc add worked fine, but dvc push got stuck while pushing the data to Amazon AWS.
My internet connection might be an issue, since the files are large.
My download and upload speeds are as follows:
Download: 8.62 Mbps
Upload: 23.11 Mbps
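
As a rough sanity check on those speeds, the sketch below estimates the best-case transfer time for 11.5 GB. It assumes decimal gigabytes, a fully sustained link, and no protocol overhead; `transfer_seconds` is a hypothetical helper, not part of dvc, and real transfers will be slower.

```python
def transfer_seconds(size_gb: float, speed_mbps: float) -> float:
    """Ideal transfer time in seconds, ignoring overhead and throttling."""
    size_megabits = size_gb * 1000 * 8  # decimal GB -> megabits (assumption)
    return size_megabits / speed_mbps


# Figures taken from the report: 11.5 GB, 23.11 Mbps up, 8.62 Mbps down.
upload_min = transfer_seconds(11.5, 23.11) / 60
download_min = transfer_seconds(11.5, 8.62) / 60

print(f"push: ~{upload_min:.0f} min, pull: ~{download_min:.0f} min")
# prints "push: ~66 min, pull: ~178 min"
```

So even under ideal conditions a push of this size would take about an hour, which is easy to mistake for a hang when no progress output is shown.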

The regular progress bar that I saw with small files did not appear in the terminal, nor did any other kind of notification.

I could not run dvc remove, since it removes the data as well, and I have no way to tell whether my dvc push succeeded.

bug

All 3 comments

Hi @analystanand !

dvc push on a directory with a 12G file works for me as expected. I suppose you killed the process after a while? How long was it hanging before you killed it? That being said, I am able to reproduce the dvc status issue of not showing the proper status when dvc push didn't fully complete (i.e. was killed before finishing the upload). I'm already preparing a patch.

EDIT: Actually, dvc status -c works for me as expected too. Did you use dvc status or dvc status -c? Note that dvc status is for local status and dvc status -c is for cloud status, so dvc status -c is the one to use in combination with dvc push and dvc pull.

Thanks,
Ruslan
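
To make the distinction above concrete, here is a minimal sketch that runs both status commands from Python. The only dvc invocations used are the two named in the comment (`dvc status` and `dvc status -c`); `run_dvc` is a hypothetical wrapper, and the snippet assumes it is run from inside a dvc repository with a remote configured.

```python
import shutil
import subprocess

LOCAL_STATUS = ["dvc", "status"]        # local status
CLOUD_STATUS = ["dvc", "status", "-c"]  # cloud status; pair with push/pull


def run_dvc(cmd):
    """Run a command if its executable is on PATH; return its output or None."""
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout + result.stderr


for cmd in (LOCAL_STATUS, CLOUD_STATUS):
    output = run_dvc(cmd)
    shown = output if output is not None else "dvc not found on PATH"
    print(" ".join(cmd), "->", shown)
```

Checking the cloud status after an interrupted dvc push is the way to see whether anything still needs uploading, rather than relying on the local status.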

Yes, it started uploading. I could not figure out exactly when it started.

Does it wait for compressing or something?

So ideally, it's a workflow that we have to optimise, e.g. zip the dataset files and then add them to DVC so as to reduce file size.

And does dvc status -c work for you as expected?

Does it wait for compressing or something?

No, it doesn't wait for compressing. It is actually establishing a connection with S3 and figuring out which files it needs to upload. Looks like we should add an additional message to keep users informed if the operation takes a bit longer than usual. I'll send a patch for that soon (created https://github.com/iterative/dvc/issues/957 to track it).

So ideally, it's a workflow that we have to optimise, e.g. zip the dataset files and then add them to DVC so as to reduce file size.

Yes, dvc currently doesn't alter your data in any way and transfers it as is. Zipping files yourself could greatly improve the speed of push/pull. We have been thinking about optimizing this on dvc's side, but haven't gotten around to implementing anything for it yet. We will surely take a closer look at it in the future.

Thanks,
Ruslan
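
The zip-before-adding workflow discussed above could look roughly like the sketch below. It uses only the Python standard library; the directory name, archive name, and sample file are made up for illustration, and `archive_dir` and `dir_size` are hypothetical helpers. The resulting archive would then be tracked with dvc add in place of the raw directory.

```python
import os
import shutil
import tempfile


def archive_dir(src_dir: str, archive_base: str) -> str:
    """Zip the contents of src_dir into archive_base + '.zip'; return its path."""
    return shutil.make_archive(archive_base, "zip", root_dir=src_dir)


def dir_size(path: str) -> int:
    """Total size in bytes of all files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


with tempfile.TemporaryDirectory() as tmp:
    # Illustrative dataset: one highly repetitive CSV file.
    data_dir = os.path.join(tmp, "data")
    os.makedirs(data_dir)
    with open(os.path.join(data_dir, "sample.csv"), "w") as f:
        f.write("id,value\n" + "1,hello\n" * 50_000)

    # Write the archive next to the directory, not inside it.
    archive_path = archive_dir(data_dir, os.path.join(tmp, "data_archive"))
    raw_bytes = dir_size(data_dir)
    zipped_bytes = os.path.getsize(archive_path)
    print(f"raw: {raw_bytes} bytes, zipped: {zipped_bytes} bytes")
```

The gain depends heavily on the data: text and CSV files usually compress well, while files that are already zipped will shrink very little.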
