Azure-storage-azcopy: Feature: Support batch jobs [ie lists of files]

Created on 4 Feb 2019 · 19 Comments · Source: Azure/azure-storage-azcopy

Hi there, I'd like to request that AzCopy add a 'batch job' feature. The scenario here is that I have a large quantity of individual copies that need to be made, but I'd like to run only one instance of AzCopy which treats the group as a single job.

Proposed approach:
.\azcopy batch <batch path>

Where <batch path> points to a text file. The text file would contain the exact same parameters that would be used if executing azcopy directly, one command per line. For example, if I have three files I want to copy as one job, the file could contain:

cp "C:\file1.txt" "https://account.blob.core.windows.net/container/file1.txt" --content-type "text/plain"
cp "C:\file2.xml" "https://account.blob.core.windows.net/container/file2.xml" --content-type "application/xml"
cp "C:\file3.bin" "https://account.blob.core.windows.net/container/file3.bin"

The three files would be copied in parallel, as one job.

In my real-life scenario I have ~2M select files that I need to copy and ideally treat as one job.

Thanks!

feature request

All 19 comments

Hi @jcarnahan, thanks for reaching out!

To understand your scenario better, is there any pattern in your 2M files? How are you selecting them?

I'm asking this because AzCopy has pattern matching available, so you could easily select files/directories matching a given pattern, just like the copy command on Linux. Please refer to ./azcopy cp --help to see examples.

The pattern matching feature will not help with my scenario. The files I would like to copy were all created with a random GUID in their name, and reside in the same container + virtual directory as ~10M other files that also use random GUIDs in their names.

Hi @jcarnahan, is it possible to just upload the root folder? It sounds like all your files (with GUIDs as name) are stored together, right?

If I'm understanding your scenario correctly, you are generating the files with GUIDs, and you want to upload them as they become available, right? Maybe that's why you couldn't upload the root folder?

Let me add a bit more detail - my storage account has one container, and inside that container I have ~12M blobs, all of which use the same naming pattern, which is a unique GUID. There is a specific set of 2M blobs that I want to copy via AzCopy. My ask is for some way to point AzCopy at a file that lists all 2M files I want to copy, and have AzCopy copy them all as one job. I only want to make a copy of this set of blobs once, and ideally without invoking AzCopy 2M times.

@jcarnahan thanks for the clarification, now I understand. I'll discuss with the team and get back to you.

One possible workaround.

If azcopy supports symlinks (resolving symlinks to real paths), then we could have a directory with just symlinks to the files you want to copy -- this would serve as a hacky batch. You can create these symlinks with Bash, PowerShell, or any other tool of your choosing. @jcarnahan it looks like you are on Windows, but last I checked there are symlinks there too.

And actually, that is a good question: @zezha-msft how does azcopy treat symlinks?

Hi @ppanyukov, great suggestion!

On the copy command, there is indeed a flag --follow-symlinks which should accomplish what you are suggesting.
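
The symlink workaround suggested above could be sketched roughly as follows. This is only an illustration, not anything from the AzCopy project: the helper name `build_symlink_batch` and the directory layout are made up for the example, it assumes the selected files have unique base names (as GUID-named files would), and on Windows creating symlinks may require elevated privileges or Developer Mode.

```python
import os
from pathlib import Path


def build_symlink_batch(selected_files, batch_dir):
    """Create one symlink per selected file inside batch_dir.

    Hypothetical helper for illustration; assumes base names are unique.
    Returns the list of link paths that now exist in batch_dir.
    """
    batch_dir = Path(batch_dir)
    batch_dir.mkdir(parents=True, exist_ok=True)
    links = []
    for src in selected_files:
        src = Path(src).resolve()
        link = batch_dir / src.name
        # Skip links left over from a previous run instead of failing.
        if not (link.is_symlink() or link.exists()):
            os.symlink(src, link)
        links.append(link)
    return links
```

The resulting directory could then be passed as the source of a recursive copy together with the --follow-symlinks flag mentioned above. Whether AzCopy then uploads the link targets as expected is something to verify on a small sample first.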

I can visualize how that might work for an upload scenario, but how would that work in my scenario, where I want to download a large # of select blobs from my blob storage account?

Hello,

Do you have an ETA for this feature?

I have the same problem. I have a large number of files in one Azure Blob container and I need to copy them to another. I can't use wildcards, and I can't use a directory copy.

On one side I have files stored using a hash as their name, such as 0f/14/487ef487a4874154, and on the other side I need to resolve them to a full path like /some/directory/file.ext

It would be great to be able to supply the list of files to copy in a file (or via STDIN), with a source/destination pair on each line, and then let all the work be done on the server side.

We now have the ability to supply a list of files, but not the ability to rename them. See https://github.com/Azure/azure-storage-azcopy/wiki/Listing-specific-files-to-transfer

I'm going to close this issue now, because the basic support is there now (i.e. for a list of files, for both upload and download). Right now, the list-of-files parameter is only documented at the URL listed in my comment above. In release 10.4, it will be included in the publicly documented list of parameters, as shown by "azcopy copy --help"

@socolin if you'd like renaming as a feature, please log that as a separate issue, thanks.
@jcarnahan if you need to specify specific content types per file, rather than letting AzCopy figure out the type for each one, then please also log that as a separate issue, thanks.

Hi @JohnRusk ,

Quick question/reminder. On macOS with brew I get version 10.7.0, but the CLI help still doesn't mention this option (at least I haven't found it). Is this intentional?

$ azcopy --version
azcopy version 10.7.0

$ azcopy copy --help |grep -i list-of-files

Yes, it's intentional. See https://github.com/Azure/azure-storage-azcopy/wiki/Listing-specific-files-to-transfer

(BTW, I'm not actually working on AzCopy any more, but got the notification because you @-mentioned me)

Does this support copying a list of files from blob storage? Or only a list of files off the local hard disk?

Hi @micktion, it supports all source types.

Hi @zezha-msft, great. What would the azcopy command look like? And what would the contents of the file be if you wanted to copy a list of blobs from a blob container in one Azure account to another Azure blob storage account, using Shared Access Signatures for both the source and destination? Do you include the full URL for each blob plus the SAS in the batch file? Are there any examples of this around?

@micktion the flag is explained here: https://github.com/Azure/azure-storage-azcopy/wiki/Listing-specific-files-to-transfer

The command would look like:
AzCopy copy "https://mysourceaccount.blob.core.windows.net/mycontainer?[SAS]" "https://mydestinationaccount.blob.core.windows.net/mycontainer?[SAS]" --list-of-files fullPathToYourTextFile

The file list should contain only the blob names, one per line.

Please give it a try and let us know if you have any more questions.
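
As a rough sketch of the approach described in the reply above, the list file and the command line could be prepared like this. The helper names `write_list_file` and `build_copy_command` are invented for the example, the URLs are placeholders, and the only AzCopy option used is the --list-of-files flag from the reply. The sketch only builds the argument list; it does not run AzCopy.

```python
from pathlib import Path


def write_list_file(blob_names, list_path):
    """Write one blob name per line, skipping blanks and duplicates.

    Hypothetical helper; returns the names actually written, in order.
    """
    seen = set()
    lines = []
    for name in blob_names:
        name = name.strip()
        if name and name not in seen:
            seen.add(name)
            lines.append(name)
    text = "".join(f"{n}\n" for n in lines)
    Path(list_path).write_text(text, encoding="utf-8")
    return lines


def build_copy_command(source_url, dest_url, list_path):
    """Return the argument list for the copy command without executing it."""
    return ["azcopy", "copy", source_url, dest_url,
            "--list-of-files", str(list_path)]
```

The returned list could be handed to `subprocess.run(cmd, check=True)` once the container URLs and SAS tokens are filled in. Note that the names in the list are relative to the source container URL, so the SAS goes on the two container URLs rather than on each line of the file.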

@zezha-msft thanks
