As the title says, we currently do not support external dependencies and outputs from Azure. So dvc import-url and friends don't work.
It should be trivial to implement. DVC uses the ETag to ensure that it's the same file, so we just need to implement the following methods in dvc/remote/azure.py, similar to the ones in dvc/remote/s3.py:
https://github.com/iterative/dvc/blob/a1fe6c6f44777463876ad24ee0d162173999f9d3/dvc/remote/s3.py#L78-L79
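A minimal sketch of what an Azure ETag lookup might look like. It assumes the v12 azure-storage-blob SDK, where BlobServiceClient.get_blob_client(container, blob) returns a client whose get_blob_properties() result has an etag attribute. The helper names get_etag and split_azure_url, and the azure://<container>/<blob> URL layout, are hypothetical and not necessarily what dvc/remote/s3.py or the final Azure remote use. The service client is passed in so the logic can be exercised without Azure credentials.

```python
from urllib.parse import urlparse


def split_azure_url(url):
    """Split a hypothetical azure://<container>/<blob path> URL."""
    parsed = urlparse(url)
    if parsed.scheme != "azure":
        raise ValueError(f"not an azure:// URL: {url}")
    container = parsed.netloc
    blob_name = parsed.path.lstrip("/")
    if not container or not blob_name:
        raise ValueError(f"expected azure://<container>/<blob>: {url}")
    return container, blob_name


def get_etag(service_client, url):
    """Return the ETag of the blob at `url` (hypothetical helper).

    `service_client` is assumed to behave like the v12 SDK's
    azure.storage.blob.BlobServiceClient.
    """
    container, blob_name = split_azure_url(url)
    blob_client = service_client.get_blob_client(container=container, blob=blob_name)
    return blob_client.get_blob_properties().etag
```

With the real SDK this would be called as get_etag(BlobServiceClient.from_connection_string(conn_str), "azure://mycontainer/data/file.csv"). Injecting the client keeps URL parsing and property access testable on their own.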
From Ruslan's reply on Discord: https://discordapp.com/channels/485586884165107732/485596304961962003/691327678271193160
Then:
Add DependencyAzure and OutputAzure classes (refer to DependencyS3 and OutputS3 for an example). Add OutputAzure to OUTS, OUTS_MAP and CHECKSUMS_SCHEMA in dvc/output/__init__.py.
DependencyAzure also needs to be added to DEPS and DEPS_MAP in dvc/dependency/__init__.py.
P.S. I could be missing a few things though. :(
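The registration steps above presumably rely on scheme-based dispatch: each class reports which paths it supports, a list decides the matching order, and a map allows lookup by scheme. The sketch below is a simplified, hypothetical illustration of that idea. Only the names DEPS, DEPS_MAP, DependencyS3 and DependencyAzure come from the steps above; the class bodies, supported() and get_dependency() are stand-ins, not DVC's actual code.

```python
from urllib.parse import urlparse


class DependencyBase:
    """Hypothetical base class for a dependency backed by a path or URL."""

    SCHEME = None

    def __init__(self, path):
        self.path = path

    @classmethod
    def supported(cls, path):
        # A class handles a path when the URL scheme matches its SCHEME.
        return urlparse(path).scheme == cls.SCHEME


class DependencyS3(DependencyBase):
    SCHEME = "s3"


class DependencyAzure(DependencyBase):
    # New class: mirrors DependencyS3; in this sketch only the scheme differs.
    SCHEME = "azure"


class DependencyLOCAL(DependencyBase):
    # Plain paths have an empty scheme, so this acts as the local fallback.
    SCHEME = ""


# Order matters: the first class whose supported() matches wins.
DEPS = [DependencyS3, DependencyAzure, DependencyLOCAL]
# Lookup by scheme, e.g. when the remote type is already known.
DEPS_MAP = {cls.SCHEME: cls for cls in DEPS}


def get_dependency(path):
    """Instantiate the first dependency class that supports `path`."""
    for cls in DEPS:
        if cls.supported(path):
            return cls(path)
    raise ValueError(f"unsupported dependency: {path}")
```

Forgetting to add the new class to either structure is what would make azure:// paths fail, which is why both DEPS and DEPS_MAP (and the OUTS counterparts) need updating.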
For the purposes of completeness and easy reference (this was also discussed on Discord and in the relevant PR):
From the official API docs:
In version 2012-02-12 and newer, Put Blob sets a block blob's MD5 hash value even when the Put Blob request doesn't include an MD5 header.
If I am reading that correctly, it means that every blob will always have the Content-MD5 property set on upload. Additionally, the PUT request fails if the MD5 specified in the request doesn't match the computed one, so no blob can ever have an incorrect MD5. If the header is omitted, the MD5 is calculated anyway.
Nevertheless, uploading a 4GB file through the Azure web UI resulted in a blob without a Content-MD5 property, so apparently it does not always work as documented. I have submitted a request to Azure support asking for clarification, explaining that ideally a correct Content-MD5 property would always be set. I'll post an update when I get a response.
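Two details matter if Content-MD5 is used as a checksum. First, the value is the base64 encoding of the raw 16-byte MD5 digest (RFC 1864), not the hex string DVC uses for local files. Second, as the 4GB test shows, it may be missing. A minimal sketch, assuming the property is available as a base64 string and the ETag as a plain string (deliberately SDK-agnostic). The function names and the ("md5" or "etag", value) return shape are hypothetical, and falling back to the ETag is one possible design choice, not what DVC implements.

```python
import base64
import hashlib
from typing import Optional, Tuple


def content_md5_header(data: bytes) -> str:
    """Compute a Content-MD5 value: base64 of the raw MD5 digest (RFC 1864)."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def content_md5_to_hex(content_md5: str) -> str:
    """Convert a base64 Content-MD5 value into a hex MD5 string."""
    raw = base64.b64decode(content_md5, validate=True)
    if len(raw) != 16:
        raise ValueError("Content-MD5 must decode to a 16-byte MD5 digest")
    return raw.hex()


def blob_checksum(content_md5: Optional[str], etag: str) -> Tuple[str, str]:
    """Pick a checksum for a blob (hypothetical helper).

    Returns ("md5", hex_digest) when Content-MD5 is present, otherwise
    ("etag", etag), so callers never compare an MD5 against an ETag.
    """
    if content_md5:
        return "md5", content_md5_to_hex(content_md5)
    # No Content-MD5 (as with the large web-UI upload): fall back to the ETag.
    return "etag", etag
```

For example, the Content-MD5 value for the bytes hello is XUFAKrxLKna5cZ2REBfFkg==, which converts to the familiar hex digest 5d41402abc4b2a76b9719d911017c592. Tagging the checksum kind keeps a blob that lost its Content-MD5 from being mistaken for a changed file.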