Azure-storage-azcopy: Add --check-md5 NoCheck to [S2S] Copy

Created on 18 Jun 2019 · 6 Comments · Source: Azure/azure-storage-azcopy

Which version of the AzCopy was used?

10

Which platform are you using? (ex: Windows, Mac, Linux)

Windows

What command did you run?

azcopy cp ...

What problem was encountered?

Invalid Md5

It seems that AzCopy v10 performs MD5 validation because of the new copy behavior with PutBlockFromURL. AzCopy v8 does not do this validation and is able to copy the MD5 even when it is not a base64-encoded 128-bit value.

Can you add a --check-md5 optional parameter to the cp behavior?

Thanks,
Xin

feature request

All 6 comments

Hi @xyh1, could you please clarify the question?

I assume you are doing a blob-to-blob copy, right? And there's an invalid stored MD5 on the source blob?

Hi @zezha-msft

Yes, a customer uploaded a bunch of blobs with azcopy / storage explorer, and then some of the MD5s were modified such that they no longer adhered to our storage MD5 format (a base64-encoded 128-bit value).

They were able to copy some of those "invalid" MD5 blobs from one storage account to another using storage explorer, because storage explorer uses the CopyBlob API instead of PutBlockFromURL. No MD5 validation is done on a CopyBlob operation; everything is copied from the source blob to the destination.

azcopy v10 introduced a new copy behavior that differs from azcopy v8 and storage explorer: it uses PutBlockFromURL instead of CopyBlob. This makes copies faster, but there is no simple way to use azcopy v10 to make a direct copy with all of the same properties while still taking advantage of the increased copy speed.

The customer also said that they would be okay with an MD5 "nocheck" behavior: not copying the existing Content-MD5 from the source, and having the destination MD5 recalculated instead. But as it is right now, there's no way for them to copy with azcopy v10 without modifying the source.

This probably necessitates a change to the PutBlockfromURL REST API as there doesn't seem to be a way to disable the Md5 check.

We can discuss more in person when I'm back in the office if you want.

Thanks,
Xin

I doubt that the issue is in PutBlockFromURL. I suspect it occurs at the end, when we call PutBlockList. We would need to figure out what the desired behavior is and what the flag should be called. 'check' is not the right concept, because nothing is actually complaining that file content doesn't match the MD5. It's complaining that the MD5 is not in the accepted format.
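
To make the distinction above concrete, here is a minimal sketch of a format-only check, assuming "accepted format" means what is described earlier in this thread: a base64 string that decodes to exactly 128 bits. The function name `is_well_formed_content_md5` is made up for illustration and is not part of AzCopy or the Storage SDK. Note that it never looks at blob content, so it cannot tell whether the MD5 actually matches the data.

```python
import base64
import binascii
import hashlib

MD5_DIGEST_BYTES = 16  # 128 bits


def is_well_formed_content_md5(value: str) -> bool:
    """Return True if `value` is strict base64 that decodes to exactly 16 bytes.

    Hypothetical helper: this checks only the format of the stored value,
    not whether it matches any blob content.
    """
    try:
        raw = base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        # Not valid base64 (bad characters, bad padding, or non-ASCII input).
        return False
    return len(raw) == MD5_DIGEST_BYTES


def content_md5_for(data: bytes) -> str:
    """Build a well-formed Content-MD5 string for `data`."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


if __name__ == "__main__":
    print(is_well_formed_content_md5(content_md5_for(b"hello")))  # well-formed
    print(is_well_formed_content_md5(hashlib.md5(b"hello").hexdigest()))  # hex, not base64 of 16 bytes
```

A hex digest is a useful negative case: it is made of valid base64 characters, but it decodes to 24 bytes rather than 16, so it fails the length check.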

@JohnRusk

Yes, you're right, it should be coming from PutBlockList when committing the blocks.
I think one way to handle this could be similar to this copy option:

  --s2s-handle-invalid-metadata string   specifies how invalid metadata keys are handled. AvailabeOptions: ExcludeIfInvalid, FailIfInvalid, RenameIfInvalid. (default "ExcludeIfInvalid")

There could probably be an option to handle invalid system properties.
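
As a rough sketch of what such an option might do for Content-MD5, assuming it mirrors two of the choices of --s2s-handle-invalid-metadata: the policy names and the `resolve_content_md5` function below are hypothetical and do not exist in AzCopy. The idea is to decide, before the blocks are committed, whether a malformed source MD5 is dropped or fails the transfer.

```python
import base64
import binascii
from enum import Enum
from typing import Optional


class InvalidMd5Policy(Enum):
    # Hypothetical option values, modeled on --s2s-handle-invalid-metadata.
    EXCLUDE_IF_INVALID = "ExcludeIfInvalid"
    FAIL_IF_INVALID = "FailIfInvalid"


def _is_well_formed(value: str) -> bool:
    """True if `value` is strict base64 that decodes to exactly 16 bytes."""
    try:
        return len(base64.b64decode(value, validate=True)) == 16
    except (binascii.Error, ValueError):
        return False


def resolve_content_md5(source_md5: Optional[str],
                        policy: InvalidMd5Policy) -> Optional[str]:
    """Pick the Content-MD5 to send with the destination blob.

    Returns the source value when it is absent or well-formed. For a
    malformed value, either raises (FailIfInvalid) or returns None so the
    property is simply not copied (ExcludeIfInvalid).
    """
    if source_md5 is None or _is_well_formed(source_md5):
        return source_md5
    if policy is InvalidMd5Policy.FAIL_IF_INVALID:
        raise ValueError(f"source Content-MD5 is not well-formed: {source_md5!r}")
    return None


if __name__ == "__main__":
    # A malformed source value is dropped under ExcludeIfInvalid.
    print(resolve_content_md5("bogus", InvalidMd5Policy.EXCLUDE_IF_INVALID))
```

Dropping the property matches what the customer said they would accept: the invalid source value is not carried over, and the destination is left without it instead of the copy failing.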

@xyh1 This is on our backlog, but not scheduled for work in the next 4 months. Pls email me if you want to discuss further.

To be clear, for anyone reading this: the feature already exists for downloads. --check-md5 NoCheck works just fine for downloads. This feature request is only for service-to-service copies (e.g. container to container or account to account).
