az feedback auto-generates most of the information requested below, as of CLI version 2.0.62.
Describe the bug
When az storage blob download-batch is called with a pattern that begins with a literal prefix, list_blobs is called on the container root with no delimiter, effectively listing the entire contents of the container before downloading the requested files. For containers with many blobs this is prohibitively slow: we have a container with hundreds of thousands (if not millions) of blobs, and az takes 6 hours just to complete the listing, even when --pattern places the wildcard many levels down the directory structure. For comparison, the exact same files, container layout, and batch download operation in AWS (using the aws CLI) take about 2 seconds.
From what I can tell, although collect_blobs is called with a (correct) pattern arg here: https://github.com/Azure/azure-cli/blob/f6376c366ac364cb61ccc33ef04b86a18b09af2d/src/command_modules/azure-cli-storage/azure/cli/command_modules/storage/operations/blob.py#L165, collect_blobs does not use the pattern to intelligently limit the scope of its listing, as seen here: https://github.com/Azure/azure-cli/blob/f6376c366ac364cb61ccc33ef04b86a18b09af2d/src/command_modules/azure-cli-storage/azure/cli/command_modules/storage/util.py#L24. Similarly, without a delimiter the listing is recursive, i.e. it enumerates every blob in the whole container.
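For illustration, here is a minimal sketch (not the CLI's actual code) of how the literal part of the pattern could be turned into a server-side prefix; derive_prefix is a hypothetical helper, and the list_blobs(container_name, prefix=...) signature assumes the older blob SDK the CLI used at the time:

```python
from fnmatch import fnmatch

def derive_prefix(pattern):
    # The literal part of the glob pattern before the first wildcard
    # character is a prefix the service can filter on server-side.
    for i, ch in enumerate(pattern):
        if ch in '*?[':
            return pattern[:i]
    return pattern

def collect_blobs(client, container_name, pattern=None):
    prefix = derive_prefix(pattern) if pattern else None
    # Passing prefix= lets the service skip everything outside the
    # prefix instead of enumerating the entire container client-side.
    for blob in client.list_blobs(container_name, prefix=prefix):
        if pattern is None or fnmatch(blob.name, pattern):
            yield blob.name
```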
To Reproduce
Run "az storage blob download-batch" on a container with many files.
Expected behavior
Intelligent scoping: either allow the "-s" argument of az storage blob download-batch to be a path within a container (not just the container name), so the list query can start from there, or derive a server-side prefix from a long --pattern so the query executes more efficiently (as sketched above).
Environment summary
Install Method (e.g. pip, interactive script, apt-get, Docker, MSI, edge build) / CLI version (az --version) / OS version / Shell Type (e.g. bash, cmd.exe, Bash on Windows)
pip installs on CentOS and Homebrew installs on OS X.
Hi @rickle-msft - I submitted this ticket months ago and there has been no activity since. Any chance you can help get this ticket at least triaged? This is a serious usability issue.
Thanks,
Jacob
@jacobw125 I am sorry that you have experienced such a delay in responsiveness. I do not work on this project, but I have sent a message to the team that does, alerting them to your request. If you do not get a response from them soon, please feel free to reach out again and I will do my best to escalate.
Thanks Rick, appreciate it!
@Juliehzl, can you please take a look?
Sorry for the late response; we are in a transition process. We are actually working on something like what you expect: a new command, az storage copy -s source-path -d destination-path, that performs upload, download, and copy between different storage accounts more efficiently. Both source-path and destination-path can use wildcards to match files and directories. I think this command can be released on the next release day. If you have other suggestions, please feel free to let me know. 😊
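For illustration, such an invocation might look like this (account, container, and path names are placeholders):

```bash
az storage copy \
    -s "https://myaccount.blob.core.windows.net/mycontainer/conf/*.xml" \
    -d ./conf
```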
Experiencing this issue too.
A container with over 20K files, when specifying a pattern under a specific path, takes ~2.5 minutes to find and download the files, and in some cases almost 5 minutes.
Example
time az storage blob download-batch -s "$CONF_CONTAINER" -d "$CONF_HOME/" --pattern "$CONF_DIR/*.xml"
real 2m19.133s
@Juliehzl any progress on the az storage copy command?
The az storage copy command is already released. You can try it with the latest azure-cli.
Unfortunately az storage copy does not handle all the authentication methods necessary to use it in scripting (connection strings, for example). Can this still be addressed within az storage blob download-batch?
Stumbled on this issue too...
Seeing this issue has been left unattended: if anyone stumbles on this and was previously using the --connection-string parameter of download-batch and now needs the az storage copy command to have something working, it seems one can use the AZURE_STORAGE_CONNECTION_STRING envvar. In the end that doesn't work, though; one needs to create a SAS token to get a similar behaviour.
@Timost connection strings can contain SAS tokens, can't they?
Hmm, I'm not sure. I guess they can, but I don't know how you would pass them to az storage copy.
My use case is automatic asset download on container startup. What I ended up doing to use az storage copy was to create a SAS token and use that instead of the download-batch connection string, roughly as sketched below.
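A sketch of that approach; account, container, and path names are placeholders, and the expiry line assumes GNU date:

```bash
# Generate a short-lived read+list SAS for the container.
expiry=$(date -u -d "1 hour" '+%Y-%m-%dT%H:%MZ')
sas=$(az storage container generate-sas \
        --account-name myaccount \
        --name mycontainer \
        --permissions rl \
        --expiry "$expiry" \
        --output tsv)
# Append the SAS to the source URL instead of using a connection string.
az storage copy \
    -s "https://myaccount.blob.core.windows.net/mycontainer/assets/*?$sas" \
    -d ./assets
```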
This is still a problem:
https://github.com/Azure/azure-cli/blob/df3943f539d29596abd03dd00b28e06b4eace0e5/src/azure-cli/azure/cli/command_modules/storage/util.py#L32
@Juliehzl, could you reopen this issue? Suggesting to "just use another command" isn't a proper solution IMHO.