Either I cannot figure out how to do it, or az storage blob download does not take advantage of sparse files.
Downloading a 30 GB VM OS disk snapshot that contains only about 1.7 GB of data takes more than 60 minutes, while azcopy downloads the same file in 7 minutes.
The only reference to something like this in the documentation is --max-connections:
--max-connections : ...
...
This may also be useful if many blobs are expected to be empty
as an extra request is required for empty blobs if
max_connections is greater than 1. Default: 2.
However, setting max-connections to 1 does not seem to make a difference to using the default of 2.
Tried in two different environments. On macOS:
$ echo $SHELL
zsh
$ brew install azure-cli
...
$ az --version
azure-cli (2.0.28)
...
Python location '/usr/local/opt/python/bin/python3.6'
Extensions directory '/Users/d064064/.azure/cliextensions'
Python (Darwin) 3.6.4 (default, Mar 1 2018, 18:36:50)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)]
Also using the Docker image (on Linux):
docker run --rm -it microsoft/azure-cli:2.0.28 bash
@marvinthepa thanks for pointing this out, we are aware of the current limitation.
The cli leverages the storage sdk: https://github.com/Azure/azure-storage-python
This feature should be implemented on the sdk level so python devs, as well as the CLI, can make use of it.
@seguler
I believe it does. @zezha-msft to confirm
@seguler We have the sparse file optimization for upload but not for download. For download, we are currently treating all blob types as equal and simply downloading everything. Perhaps we can add this item to our backlog.
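To illustrate what a sparse-aware download could look like, here is a minimal sketch, not the SDK's or AzCopy's actual implementation. It assumes the caller already has the list of populated byte ranges (in the legacy azure-storage-python SDK, I believe PageBlobService.get_page_ranges returns these for page blobs, but treat that as an assumption) and a way to fetch a single range. The names download_sparse, fetch, and valid_ranges are hypothetical and exist only for this example.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical callable: fetch(start, end) returns the bytes of the
# inclusive byte range [start, end] of the remote blob.
FetchRange = Callable[[int, int], bytes]


def download_sparse(
    path: str,
    blob_size: int,
    valid_ranges: Iterable[Tuple[int, int]],
    fetch: FetchRange,
    chunk_size: int = 4 * 1024 * 1024,
) -> int:
    """Write only the populated ranges of a blob to a local file.

    Returns the number of bytes actually fetched.
    """
    fetched = 0
    with open(path, "wb") as f:
        # Set the final length up front. Regions that are never written
        # read back as zeros; on filesystems with sparse-file support
        # they typically take no disk space.
        f.truncate(blob_size)
        for start, end in valid_ranges:
            offset = start
            while offset <= end:
                # Request at most chunk_size bytes per call.
                last = min(offset + chunk_size - 1, end)
                data = fetch(offset, last)
                f.seek(offset)
                f.write(data)
                fetched += len(data)
                offset = last + 1
    return fetched
```

With this approach the transfer cost scales with the populated ranges (about 1.7 GB in the snapshot described above) rather than with the full 30 GB blob size. Parallel range requests and retries are left out to keep the sketch short.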
@williexu I have added this item in our backlog.
+1 on sparse enabled downloads using az cli - the only packaged alternative appears to be to use AzCopy, which is not portable to Linux/Mac agents. Want to use this in VSTS.
Edit: AzCopy is available on Linux - it was in Azure Automation that it wasn't readily suitable to consume.
This issue will be solved in the new AzCopy V10. The related issue is here.
@zezha-msft - Please note that AzCopy is NOT available under Azure Automation: even if I manually download AzCopy within Azure Automation, it does not allow running arbitrary binaries.
It would be preferable for az storage blob download to handle sparse file downloads efficiently (similarly to AzCopy).
Hi @iyerusad, thanks for the clarification! I see that it's still necessary to provide this functionality in the Python SDK.
I've logged this item to be included in the next generation of the Storage SDK: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/storage/azure-storage-blob.
It will be part of the GA criteria.