Currently the remote MD5 value is checked for both the _different_ and the _missing_ case only after the file download has finished.
When the remote MD5 is missing, the download can run for hours and only then fail.
Request:
Would it be possible to pre-check that the remote MD5 value exists and fail fast, before the download starts? This would save time.
MD5 check for --check-md5=FailIfDifferentOrMissing:
https://github.com/Azure/azure-storage-azcopy/blob/d4e856e572f7f63521329703d18d9f96bac21613/ste/md5Comparer.go#L64-L65
Repro:
azcopy.exe cp "https://lilablobssc.blob.core.windows.net/snapshotserengeti-v-2-0/SnapshotSerengeti_S10_v2_0_part2.zip" "F:\datasets\SnapshotSerengeti\raw\" --check-md5=FailIfDifferentOrMissing --overwrite=false
File is 166GB and public (src).
Error: _(only in log; should be raised to user level)_
000 : no MD5 was stored in the Blob/File service against this file. So the downloaded data cannot be MD5-validated. This application is currently configured to treat missing MD5 hashes as errors. When Checking MD5 hash. X-Ms-Request-Id:
Whoof, yeah, hours is pretty rough for this kind of a thing. I think we could perform that early during transfer init. I'll pop open a PR for that here in a moment.
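For illustration, a fail-fast check of this kind might look roughly like the Go sketch below. It is not AzCopy's actual code: `MD5Policy`, `ErrMissingMD5`, and `PrecheckRemoteMD5` are hypothetical names, and the sketch assumes the remote MD5 (if any) has already been read from the blob's properties before any data is transferred.

```go
package main

import "errors"

// MD5Policy is a hypothetical stand-in for the values accepted by --check-md5.
// It is not the type AzCopy itself uses.
type MD5Policy int

const (
	NoCheck MD5Policy = iota
	LogOnly
	FailIfDifferent
	FailIfDifferentOrMissing
)

// ErrMissingMD5 (hypothetical) signals that the remote file has no stored MD5
// while the policy treats a missing hash as an error.
var ErrMissingMD5 = errors.New("no MD5 is stored against the remote file; not starting the download")

// PrecheckRemoteMD5 (hypothetical) is meant to run during transfer init,
// before any bytes are downloaded. remoteMD5 is whatever hash the service
// returned with the file's properties; nil or empty means none was stored.
func PrecheckRemoteMD5(policy MD5Policy, remoteMD5 []byte) error {
	if policy == FailIfDifferentOrMissing && len(remoteMD5) == 0 {
		return ErrMissingMD5
	}
	// Any other combination cannot be decided up front: a "different" hash
	// can only be detected after the data has been downloaded and hashed.
	return nil
}
```

With this shape, a caller would invoke `PrecheckRemoteMD5` once per file right after fetching its properties and abort that transfer on a non-nil error, so the _missing_ case fails in seconds while the _different_ case is still checked after the download.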
Closing since this work has now been done.