I was trying to add a directory that contains 5 subdirectories, each with its own files.
The files are zips, text files, and CSVs.
dvc add worked fine, but it got stuck while pushing the data to Amazon AWS.
My internet connection might be the issue, as the files are large.
My download and upload speeds are as follows:
Download: 8.62 Mbps
Upload: 23.11 Mbps
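As a rough sanity check on why a large push can look stuck, the ideal transfer time is the data size in bits divided by the line rate. This is a minimal sketch that assumes the measured upload rate is sustained for the whole transfer and ignores protocol overhead, retries, and any time spent before the upload starts; `upload_seconds` is a hypothetical helper, not part of DVC.

```python
def upload_seconds(size_bytes: int, mbps: float) -> float:
    """Idealized transfer time: convert bytes to bits, divide by the line rate in bits/s."""
    return size_bytes * 8 / (mbps * 1_000_000)


if __name__ == "__main__":
    # Hypothetical 1 GB (10**9 bytes) of data at the reported 23.11 Mbps upload speed.
    secs = upload_seconds(10**9, 23.11)
    print(f"{secs:.0f} s (~{secs / 60:.1f} min)")  # 346 s (~5.8 min)
```

At the same rate, a file of roughly 12 GB, like the one mentioned later in this thread, would take on the order of an hour even under these ideal assumptions.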
The regular progress bar that I saw with small files, or any other kind of notification, is not shown in the terminal.
I can't run dvc remove, as it removes the data as well, and I have no way to find out whether my dvc push succeeded.
Hi @analystanand!
dvc push on a directory with a 12G file works as expected for me. I suppose you killed the process after a while? How long was it hanging before you killed it? That being said, I am able to reproduce an issue where dvc status does not show the proper status when dvc push didn't fully complete (i.e. was killed before the upload finished). Already preparing a patch.
EDIT: Actually, dvc status -c works as expected for me too. Did you use dvc status or dvc status -c? Note that dvc status is for local status and dvc status -c is for cloud status, so you should use the latter in combination with dvc push and dvc pull.
Thanks,
Ruslan
Yes, it started uploading. I could not figure out exactly when it started.
Does it wait for compression or something?
So ideally, it's a workflow we have to optimize, e.g. zip the dataset files first and then add them to DVC, so as to reduce file size.
And does dvc status -c work for you as expected?
Does it wait for compression or something?
No, it doesn't wait for compression. It actually establishes a connection with S3 and figures out which files it needs to upload. Looks like we should add an additional message to keep users informed when the operation takes longer than usual. I'll send a patch for that soon (created https://github.com/iterative/dvc/issues/957 to track it).
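Conceptually, figuring out what to upload means comparing the content hashes of local files against the hashes the remote already holds; a cloud status check needs the same comparison. The sketch below illustrates only that idea and is not DVC's implementation. It assumes the remote's hashes are already available as a Python set (in practice they would have to be listed from S3, which is the slow, silent step described above), and `md5_of` and `files_to_push` are hypothetical helpers. MD5 is used because DVC's cache is keyed by MD5 checksums.

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files never have to fit in memory."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def files_to_push(local_dir: Path, remote_hashes: set[str]) -> list[Path]:
    """Return files under local_dir whose content hash is not yet on the remote."""
    return sorted(
        p for p in local_dir.rglob("*")
        if p.is_file() and md5_of(p) not in remote_hashes
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.csv").write_text("x,y\n1,2\n")
        (root / "b.txt").write_text("hello\n")
        already_uploaded = {md5_of(root / "a.csv")}  # pretend a.csv is already on the remote
        print([p.name for p in files_to_push(root, already_uploaded)])  # ['b.txt']
```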
So ideally, it's a workflow we have to optimize, e.g. zip the dataset files first and then add them to DVC, so as to reduce file size.
Yes, dvc currently doesn't alter your data in any way and transfers it as is. Zipping files yourself could greatly improve push/pull speed. We have been thinking about optimizing this on dvc's side but haven't actually implemented anything yet. We will surely take a closer look at it in the future.
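A minimal sketch of that zip-before-add workflow, using only Python's standard library (`shutil.make_archive`, which writes deflate-compressed zips when zlib is available). `dir_size` and `zip_dataset` are hypothetical helper names; the resulting archive would then be tracked with dvc add in place of the raw directory. How much this helps depends on the data: text and CSV usually compress well, while files that are already zipped will barely shrink.

```python
import shutil
import tempfile
from pathlib import Path


def dir_size(path: Path) -> int:
    """Total size in bytes of all regular files under path."""
    return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())


def zip_dataset(src_dir: Path, archive_base: Path) -> Path:
    """Compress src_dir into <archive_base>.zip and return the archive path.

    archive_base should live outside src_dir so the archive does not include itself.
    """
    return Path(shutil.make_archive(str(archive_base), "zip", root_dir=src_dir))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "dataset"
        src.mkdir()
        # Repetitive CSV content compresses well; already-zipped files would not.
        (src / "data.csv").write_bytes(b"id,value\n" + b"1,0.5\n" * 100_000)
        archive = zip_dataset(src, Path(tmp) / "dataset")  # writes <tmp>/dataset.zip
        print(f"raw: {dir_size(src)} bytes, zipped: {archive.stat().st_size} bytes")
```

One trade-off to keep in mind: because DVC stores content by hash, changing any single file inside the archive produces a new archive, so the whole zip is pushed again rather than just the changed file.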
Thanks,
Ruslan