Traceback (most recent call last):
File "/Users/ophir/anaconda3/envs/p2/bin/dvc", line 11, in <module>
sys.exit(main())
File "/Users/ophir/anaconda3/envs/p2/lib/python2.7/site-packages/dvc/main.py", line 63, in main
Runtime.run(CmdDataSync)
File "/Users/ophir/anaconda3/envs/p2/lib/python2.7/site-packages/dvc/runtime.py", line 41, in run
sys.exit(instance.run())
File "/Users/ophir/anaconda3/envs/p2/lib/python2.7/site-packages/dvc/command/data_sync.py", line 47, in run
pool.map(cloud.sync, targets)
File "/Users/ophir/anaconda3/envs/p2/lib/python2.7/multiprocessing/pool.py", line 251, in map
return self.map_async(func, iterable, chunksize).get()
File "/Users/ophir/anaconda3/envs/p2/lib/python2.7/multiprocessing/pool.py", line 567, in get
raise self._value
socket.error: [Errno 32] Broken pipe
P.S.
Is syncing the file manually using the aws cli a viable workaround, or are there other things done during the sync (updating a status file or something similar)?
Hi!
Thank you for reporting this to us. Could you please share some additional info? This is dvc 0.8.5 from pip, right? How big of a file are we talking about?
There is additional info (i.e. a hash) being used during the sync, so using the aws cli directly in a naive way probably wouldn't work =(.
it's indeed 0.8.5 from pip
the file is ~8GB
copying the file didn't work, even though I copied it from the cache directory (where the hash is part of the file name)
Hm... I just tried to sync 20G files and it worked just fine. @ophiry could you please try upstream dvc? I.e. clone it somewhere and do "pip uninstall dvc && ./build_package.sh && pip install dist/dvc-0.8.5.tar.gz".
@efiop it looks like a concurrency issue, not a file size issue.
Closed by 059fcc0
Package version: 0.8.6
@ophiry please update the package: pip install -U dvc
Btw... thank you for reporting the bug.
@dmpetrov Were you able to reproduce?
Yes, it was easy to reproduce on Mac. The issue was related to a zero-size file (data/empty).
I guess Mac OS has a special version of some library (multiprocessing probably).
Oh, those Macs... Thanks for the info.
still have the same problem after upgrading to 0.8.6
(1/10): [ ] 0% data.mdb_ded3948489cTraceback (most recent call last):
File "/Users/ophir/anaconda3/envs/p2/bin/dvc", line 11, in <module>
sys.exit(main())
File "/Users/ophir/anaconda3/envs/p2/lib/python2.7/site-packages/dvc/main.py", line 64, in main
Runtime.run(CmdDataSync)
File "/Users/ophir/anaconda3/envs/p2/lib/python2.7/site-packages/dvc/runtime.py", line 41, in run
sys.exit(instance.run())
File "/Users/ophir/anaconda3/envs/p2/lib/python2.7/site-packages/dvc/command/data_sync.py", line 45, in run
map_progress(cloud.sync, targets, self.parsed_args.jobs)
File "/Users/ophir/anaconda3/envs/p2/lib/python2.7/site-packages/dvc/utils.py", line 67, in map_progress
p.map(func, targets)
File "/Users/ophir/anaconda3/envs/p2/lib/python2.7/multiprocessing/pool.py", line 251, in map
return self.map_async(func, iterable, chunksize).get()
File "/Users/ophir/anaconda3/envs/p2/lib/python2.7/multiprocessing/pool.py", line 567, in get
raise self._value
socket.error: [Errno 32] Broken pipe
Wow.
Thank you for letting us know. Reopening...
@ophiry As we've discovered with @dmpetrov, the issue looks to be related to the network connection being lost when your laptop goes into standby mode while syncing big files. I just merged a temporary fix https://github.com/dataversioncontrol/dvc/pull/109 that will at least notify us if something goes wrong. A proper fix would be to implement partial download/upload: huge files are not unusual in dvc scenarios, and it would be great if we could continue a download/upload from where we left off: https://github.com/dataversioncontrol/dvc/issues/108.
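For reference, a rough sketch of the resume idea with boto 2 (the S3 library dvc used at the time). This is only an illustration, not dvc's actual code; the function name and the paths are placeholders:

```python
# Illustrative sketch only (not dvc's implementation): resume an interrupted
# S3 download by requesting just the byte range that is still missing locally.
import os

from boto.s3.connection import S3Connection


def resume_download(bucket_name, key_name, local_path):
    key = S3Connection().get_bucket(bucket_name).get_key(key_name)

    offset = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    if offset >= key.size:
        return  # already fully downloaded

    with open(local_path, 'ab') as fobj:
        # The Range header makes S3 send only the remaining bytes.
        key.get_file(fobj, headers={'Range': 'bytes=%d-' % offset})
```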
@ophiry could you please try out new dvc once again so we could confirm that the issue is indeed caused by lost network connection? Thank you.
this is the latest log (installed from master)
Checksum miss-match. Re-uploading is required.
.cache/quality_model/training/rdb.lmdb/data.mdb_ded3948489c: 0.0B transferred out of 8.6GB
(1/10): [ ] 0% data.mdb_ded3948489c.cache/quality_model/training/rdb.lmdb/data.mdb_ded3948489c: 0.0B transferred out of 8.6GB
(1/10): [ ] 0% data.mdb_ded3948489c.cache/quality_model/training/rdb.lmdb/data.mdb_ded3948489c: 0.0B transferred out of 8.6GB
(1/10): [ ] 0% data.mdb_ded3948489c.cache/quality_model/training/rdb.lmdb/data.mdb_ded3948489c: 0.0B transferred out of 8.6GB
(1/10): [ ] 0% data.mdb_ded3948489c.cache/quality_model/training/rdb.lmdb/data.mdb_ded3948489c: 0.0B transferred out of 8.6GB
(1/10): [ ] 0% data.mdb_ded3948489c.cache/quality_model/training/rdb.lmdb/data.mdb_ded3948489c: 0.0B transferred out of 8.6GB
(1/10): [ ] 0% data.mdb_ded3948489c.cache/quality_model/training/rdb.lmdb/data.mdb_ded3948489c: 0.0B transferred out of 8.6GB
(1/10): [ ] 0% data.mdb_ded3948489cFailed to upload ".cache/quality_model/training/rdb.lmdb/data.mdb_ded3948489c": [Errno 32] Broken pipe
@ophiry could you please clarify: did your laptop fall into sleep mode during the download? It looks like the issue happens after sleep mode (most likely when the network is off for 5+ minutes).
If not - when did this issue happen: right after dvc sync ..., or 10-20 minutes later?
it wasn't during sleep, it was a few minutes after the sync started.
the strange thing is that, according to the progress bar, no data was transferred at all
communicating with s3 through the aws cli worked
Hi @ophiry ! Sorry for such a long delay, I'm back on this issue again.
Unfortunately, I'm still not able to reproduce it, but after a bit of googling I found https://github.com/boto/boto/issues/621, which sounds very similar. However, we actually do explicitly specify the host when creating S3Connection, so that bug should not occur. That said, I see that we construct the host as 's3.%s' and not as 's3-%s' (notice the '-' instead of the '.'), which doesn't always formally apply: as you can see in http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region, us-west-2 is only listed with s3-us-west-2 and not s3.us-west-2, though both hosts are ping-able. What is your s3 region in dvc.conf?
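To illustrate the difference (a sketch only; the region value is just an example, not taken from dvc's config):

```python
# Sketch of the two host forms mentioned above; boto's S3Connection accepts an
# explicit host. The region value here is just an example.
from boto.s3.connection import S3Connection

region = 'us-west-2'
conn_dot = S3Connection(host='s3.%s.amazonaws.com' % region)    # 's3.<region>' form
conn_dash = S3Connection(host='s3-%s.amazonaws.com' % region)   # 's3-<region>' form
```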
the region is us-east-1, which is the region of the bucket
not sure it's relevant, but the issue is with an lmdb file
Sorry again for such a delay. I managed to reproduce this issue (only on Mac; other platforms work fine) with files that are >5GB: tested on 8GB and 4GB files, the former reproduced the issue and the latter uploaded just fine. That makes sense, as aws actually mentions this limitation in their docs, but the strange part is that on Linux this limit doesn't seem to cause any problem. So this issue should be fixed by https://github.com/dataversioncontrol/dvc/issues/163. I'm working on implementing it right now and hope to deliver it within 24h =). Thank you for your patience.
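For background, S3 rejects single PUT uploads above 5GB, so files that large have to go through the multipart upload API. A rough boto 2 sketch of the idea (illustrative only, not the actual dvc fix; the function name and part size are arbitrary):

```python
# Illustrative multipart upload with boto 2 (not the actual dvc code).
# Files above 5GB cannot be sent as a single PUT, so they are split into parts.
import math
import os

from boto.s3.connection import S3Connection

PART_SIZE = 50 * 1024 * 1024  # 50MB parts (S3 requires at least 5MB per part)


def multipart_upload(bucket_name, key_name, local_path):
    bucket = S3Connection().get_bucket(bucket_name)
    multipart = bucket.initiate_multipart_upload(key_name)
    try:
        size = os.path.getsize(local_path)
        n_parts = int(math.ceil(size / float(PART_SIZE)))
        with open(local_path, 'rb') as fobj:
            for part_num in range(1, n_parts + 1):
                bytes_left = size - (part_num - 1) * PART_SIZE
                # Reads `size=` bytes from the file's current offset.
                multipart.upload_part_from_file(fobj, part_num,
                                                size=min(PART_SIZE, bytes_left))
        multipart.complete_upload()
    except Exception:
        multipart.cancel_upload()
        raise
```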
@ophiry I merged #178. Tested it on my mac, everything seems fine now. Could you confirm that it works for you too? Also note that you can now use 'dvc push data/DATA' command for pushing data to the cloud ;) . Feel free to reopen this issue if anything is still wrong.
Oh, actually, just a second. Seems like I broke it.
Looks like md5 got screwed. Reverted #178 . Reopening this issue for now.
do you mean that the etag no longer contains the md5 of the full file?
A possible workaround is to store the md5 in git (in the state file) when importing a file, and use this value as the "ground truth" for md5
do you mean that the etag no longer contains the md5 of the full file?
Yes, precisely, multipart uploads add a giant hassle with the md5.
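To spell out the hassle: for a multipart upload, S3's ETag is not the md5 of the whole file but the md5 of the concatenated per-part md5 digests with a '-<part count>' suffix, so it can only be reproduced locally if you know the exact part size that was used. A small sketch (my illustration, not dvc code):

```python
# Sketch: reproduce a multipart ETag locally, given the part size used for upload.
import hashlib


def multipart_etag(local_path, part_size):
    digests = []
    with open(local_path, 'rb') as fobj:
        for chunk in iter(lambda: fobj.read(part_size), b''):
            digests.append(hashlib.md5(chunk).digest())
    # e.g. '9b2cf535f27731c974343645a3985328-3' for a 3-part upload;
    # note this differs from the plain md5 of the whole file.
    return '%s-%d' % (hashlib.md5(b''.join(digests)).hexdigest(), len(digests))
```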
A possible workaround is to store the md5 in git (in the state file) when importing a file, and use this value as the "ground truth" for md5
That's a brilliant idea! Thank you! I will implement it shortly.
Actually, the problem with storing the md5 in the state file is that even though it will help us determine whether the local data has changed, we will still have to download the data from the cloud to verify it, because getting the md5 of a multi-part object stored in the cloud is still a hassle.
A better solution would be to store the original md5 in the object's metadata when uploading the file, so we have easy access to it without actually having to download the full file. Ok, I will try this out and will get back soon =)
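A minimal sketch of that idea with boto 2 (illustrative only; the 'dvc-md5' metadata key, the bucket name, and the paths are made up for this example):

```python
# Sketch: store the original md5 as user metadata at upload time, then read it
# back later without downloading the object. The 'dvc-md5' metadata key, bucket
# name and paths are hypothetical.
import hashlib

from boto.s3.connection import S3Connection


def file_md5(path):
    md5 = hashlib.md5()
    with open(path, 'rb') as fobj:
        for chunk in iter(lambda: fobj.read(1024 * 1024), b''):
            md5.update(chunk)
    return md5.hexdigest()


bucket = S3Connection().get_bucket('my-bucket')

# Upload: attach the md5 as metadata before sending the data. For multipart
# uploads the same dict can be passed via initiate_multipart_upload(metadata=...).
key = bucket.new_key('.cache/data.mdb_ded3948489c')
key.set_metadata('dvc-md5', file_md5('data.mdb'))
key.set_contents_from_filename('data.mdb')

# Later: verify against the remote md5 without downloading the whole file.
remote_md5 = bucket.get_key('.cache/data.mdb_ded3948489c').get_metadata('dvc-md5')
```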
Thank you!
Right, storing md5 in metadata worked out great. We should be set now.
Feel free to reopen this issue if anything is still wrong.