Azure-storage-azcopy: azcopy sync --exclude-path not working

Created on 17 Dec 2019  ·  26 Comments  ·  Source: Azure/azure-storage-azcopy

Which version of the AzCopy was used?

azcopy 10.3.3

Which platform are you using? (ex: Windows, Mac, Linux)

Windows

What command did you run?

azcopy sync "V:\Folder1" "https://storageaccount.blob.core.windows.net/folder1/?sv=2019-02-02&ss=bfqt&srt=sco&sp=rwdlacup&se=2020-01-01T16:07:51Z&st=2019-01-01T08:07:51Z&spr=https&sig=SIG" --exclude-path="f1" --exclude-path="f2 f3" --exclude-path="f4" --recursive=true --cap-mbps=60 --delete-destination=true --log-level=DEBUG

What problem was encountered?

Folders "f1", "f2 f3", "f4" were not excluded from scanning and syncing.

How can we reproduce the problem in the simplest way?

Run the command above.

Have you found a mitigation/solution?

No.

usability and error messages

All 26 comments

This also does not work:
azcopy sync "V:\Folder1" "https://storageaccount.blob.core.windows.net/folder1/?sv=2019-02-02&ss=bfqt&srt=sco&sp=rwdlacup&se=2020-01-01T16:07:51Z&st=2019-01-01T08:07:51Z&spr=https&sig=SIG" --exclude-path="f1;f2 f3;f4"

Nor does this:
azcopy sync "V:\Folder1" "https://storageaccount.blob.core.windows.net/folder1/?sv=2019-02-02&ss=bfqt&srt=sco&sp=rwdlacup&se=2020-01-01T16:07:51Z&st=2019-01-01T08:07:51Z&spr=https&sig=SIG" --exclude-path=f1

Or this:
azcopy sync "V:\Folder1" "https://storageaccount.blob.core.windows.net/folder1/?sv=2019-02-02&ss=bfqt&srt=sco&sp=rwdlacup&se=2020-01-01T16:07:51Z&st=2019-01-01T08:07:51Z&spr=https&sig=SIG" --exclude-path="f1"

Local structure:
"V:\Folder1"
"V:\Folder1\f1"
"V:\Folder1\f1..."
"V:\Folder1\f2 f3"
"V:\Folder1\f2 f3..."
"V:\Folder1\f4"
"V:\Folder1\f4"...

Always scans complete tree structure. Strange, this is. ¯_(ツ)_/¯

@zezha-msft Did you see this one?

Hi @matevzg, sorry for the delayed reply.

Unfortunately, I wasn't able to repro this on my end; the exclude-path flag is working as expected.

Could you please clarify the observed behavior? Were the excluded folders still getting synced? They always get scanned, but they shouldn't be replicated to the destination.

Hi @matevzg,

I can see a syntax issue.
Could you please try the syntax below:
"https://storageaccount.blob.core.windows.net/folder1/?sv=2019-02-02&ss=bfqt&srt=sco&sp=rwdlacup&se=2020-01-01T16:07:51Z&st=2019-01-01T08:07:51Z&spr=https&sig=SIG" --exclude-path f1;f2 f3;f4

Command Executed on Linux:
azcopy copy "/mnt/" "https://prodneuazcopyst.blob.core.windows.net/xxxxx/[SAS]" --recursive=true --follow-symlinks=false --exclude-path /mnt/.snapshot/

Still scanning the .snapshot folder.


INFO: Scanning...
INFO: Skipping over symlink at /mnt/.snapshot/HANAEQCPOCSNAP-031920201401/interfaces/SAPCD/download_migration because --follow-symlinks is false
INFO: Skipping over symlink at /mnt/.snapshot/HANAEQCPOCSNAP-031920201401/interfaces/SAPCD/software/51050882_export/51050882_EXP1_part1.exe because --follow-symlinks is false

INFO: Skipping over symlink at /mnt/.snapshot/HANAEQCPOCSNAP-031920201401/interfaces/SAPCD/software/51050882_export/51050882_EXP1_part2.rar because --follow-symlinks is false

I think it will work as desired if you remove the trailing / from the exclude parameter. (Maybe we should automatically remove those).

I think that what you've used is being interpreted by the tool as "don't scan any directories _inside_ the snapshot folder".

Tried both --exclude-path "/mnt/.snapshot" and --exclude-path=/mnt/.snapshot. Neither works.
The exclude-path flag is not working in azcopy.

Thanks for the test results @ramuadapa

Depending on whether we can reproduce this, and how it gets triaged if we do, we _might_ be able to fix this in release 10.4. Maybe...

@ramuadapa just a thought, but could you try not having the root folder on that path?

ex. --exclude-path=.snapshot

IIRC we check for a prefix on a _relative_ path for exclude path.

(that being said though, I've seen users do both ways, so this could arguably be a usability complaint)
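If the flag does expect a path relative to the source root, as suggested above, the value can be derived from an absolute path rather than typed by hand. The helper below is a minimal sketch of that conversion; `to_relative_exclude` is a hypothetical name, not part of AzCopy, and the choice of forward slashes in the output is an assumption based on the examples in the flag's help text.

```python
from pathlib import PurePosixPath, PureWindowsPath


def to_relative_exclude(source_root, excluded, windows=False):
    """Return `excluded` expressed relative to `source_root`.

    Trailing separators are dropped and the result uses forward slashes.
    Raises ValueError if `excluded` is not located under `source_root`.
    """
    path_cls = PureWindowsPath if windows else PurePosixPath
    return path_cls(excluded).relative_to(path_cls(source_root)).as_posix()


# Paths taken from the commands posted in this thread.
print(to_relative_exclude("/mnt/", "/mnt/.snapshot/"))
print(to_relative_exclude(r"V:\Folder1", r"V:\Folder1\f2 f3", windows=True))
```

The returned string is what would be passed as the --exclude-path value. Whether AzCopy then honors it is exactly what this thread is trying to establish.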

Upvote for the relative & absolute path - I just saw this at a customer as well.

@ramuadapa just a thought, but could you try not having the root folder on that path?

ex. --exclude-path=.snapshot

IIRC we check for a prefix on a _relative_ path for exclude path.

I had already tried this before posting the update; with a relative path we still see the issue.

@adreed-msft, @zezha-msft , @nakulkar-msft Any thoughts?

./azcopy sync "I:\final\1001\1001001" "container?sv=tocken" --delete-destination=true --include-pattern=".dwg;.pdf" --exclude-path="1/Obsolete;1/Quality;1/Quote;2;3;4"

--exclude-path string Exclude these paths when copying. This option does not support wildcard characters (*). Checks relative path prefix (for example: myFolder;myFolder/subDirName/file.pdf). When used in combination with account traversal, paths do not include the container name.

Just curious if there's been any headway on this issue. I'm running into this same problem but I'm using the copy mode instead of sync.

My source is "H:\BackupRoot\SiteBackup" and I want to exclude "H:\BackupRoot\SiteBackup\SQLBackup"

I've tried the following combinations with no luck:
--exclude-path="SQLBackup"
--exclude-path="H:\BackupRoot\SiteBackup\SQLBackup\database.mdf"
--exclude-path="SiteBackup\SQLBackup\database.mdf"
--exclude-path="SiteBackup\SQLBackup"

No errors, and I can see the flag being interpreted in the Job-Command line of the log.
Using AzCopy 10.4.3 x64 on Windows.

I am using AzCopy v10.5.0 on Windows x64 and it is still not working. I tried both full paths and prefixes with no luck.

We are planning to use AzCopy in a production environment and this feature is urgently needed. I would be grateful if you could fix this issue.

Hi @berguner, can you post the command you ran, and the AzCopy log file? Please make sure you redact any SAS tokens used in the command.

I am trying to exclude a subfolder called "Thumbnail_Images" and I tried:

azcopy.exe sync $relative_folder_path "$blob_container/$folder_name/$sas" --recursive --exclude-path $run_name/Thumbnail_Images
azcopy.exe sync $relative_folder_path "$blob_container/$folder_name$sas" --recursive --exclude-path $run_name/Thumbnail_Images
azcopy.exe sync $relative_folder_path "$blob_container/$folder_name/$sas" --recursive --exclude-path $run_name\Thumbnail_Images
azcopy.exe sync $relative_folder_path "$blob_container/$folder_name$sas" --recursive --exclude-path $run_name\Thumbnail_Images
azcopy.exe sync $relative_folder_path "$blob_container/$folder_name/$sas" --recursive --exclude-path Thumbnail_Images
azcopy.exe sync $relative_folder_path "$blob_container/$folder_name$sas" --recursive --exclude-path Thumbnail_Images
azcopy.exe sync $full_folder_path "$blob_container/$folder_name/$sas" --recursive --exclude-path $run_name/Thumbnail_Images
azcopy.exe sync $full_folder_path "$blob_container/$folder_name$sas" --recursive --exclude-path $run_name/Thumbnail_Images
azcopy.exe sync $full_folder_path "$blob_container/$folder_name/$sas" --recursive --exclude-path $run_name\Thumbnail_Images
azcopy.exe sync $full_folder_path "$blob_container/$folder_name$sas" --recursive --exclude-path $run_name\Thumbnail_Images
azcopy.exe sync $full_folder_path "$blob_container/$folder_name/$sas" --recursive --exclude-path Thumbnail_Images
azcopy.exe sync $full_folder_path "$blob_container/$folder_name$sas" --recursive --exclude-path Thumbnail_Images

None of the above worked. The log files don't show much because the "Thumbnail_Images" files were already uploaded and there is nothing to sync. I can tell from the number of files being scanned that "Thumbnail_Images" is still being scanned on both the source and the destination.

I am using the cp command below for the time being, but it is not ideal because it only compares the timestamps. In the logs of the cp command, I can see that the number of scanned files doesn't include the files in the "Thumbnail_Images" folder.
azcopy.exe cp $full_folder_path "$blob_container/$sas" --recursive --overwrite isSourceNewer --exclude-path Thumbnail_Images

@berguner The exclude-path flag uses a relative path, and I'd expect 'azcopy cp _src dst_ --recursive --exclude-path Thumbnail_Images' to work. Can you verify through the AzCopy logs that it is not enclosed in quotes when passed to AzCopy, as in here:
_Job-Command copy /home/nakulkar https://myaccount.blob.core.windows.net/container?SAS --exclude-path="NoQuotesHere" --recursive_
The AzCopy logs are in $HOME/.azcopy. I'll have a look after you post the logs here.
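To check what AzCopy actually received, the Job-Command line can be pulled out of the logs. The snippet below is a rough sketch: it assumes the log files in $HOME/.azcopy carry a `.log` extension and that the relevant line contains the literal text `Job-Command`, as in the example above. The function name is made up for illustration.

```python
from pathlib import Path


def job_command_lines(log_dir):
    """Yield (file name, line) for every log line that mentions Job-Command."""
    root = Path(log_dir)
    if not root.is_dir():
        return  # nothing to scan if the log folder does not exist
    for log_file in sorted(root.glob("*.log")):
        with open(log_file, encoding="utf-8", errors="replace") as handle:
            for line in handle:
                if "Job-Command" in line:
                    yield log_file.name, line.rstrip("\n")


if __name__ == "__main__":
    # Default location mentioned in this thread; adjust if your logs live elsewhere.
    for name, line in job_command_lines(Path.home() / ".azcopy"):
        print(f"{name}: {line}")
```

Remember to redact any SAS token before pasting the output into an issue.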

azcopy version 10.9.0

Same issue here. I tried a relative path and a folder name. All the excluded folders/files are still replicated. My structure in the container is as follows:

/site1/App_Data/ClientDependency
/site2/App_Data/ClientDependency
...

I would like to exclude all "App_Data/ClientDependency" folders.

azcopy sync 'source' 'destination' --recursive --exclude-path='App_Data/ClientDependency' > doesn't work
azcopy sync 'source' 'destination' --recursive --exclude-path='/App_Data/ClientDependency' > doesn't work
azcopy sync 'source' 'destination' --recursive --exclude-path='ClientDependency' > doesn't work
azcopy sync 'source' 'destination' --recursive --exclude-path='site1/App_Data/ClientDependency' > works

It looks like the exclude-path must be specified starting from root.
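The four results above are consistent with a plain prefix comparison against each file's path relative to the source. The sketch below models that behavior; it is an assumption about how the check works, not AzCopy's actual code, and `is_excluded` and the file name `cache.xml` are made up for illustration.

```python
def is_excluded(relative_path, exclude_paths):
    """Model: a file is skipped when its relative path starts with any entry.

    `exclude_paths` is a semicolon-separated string, like the flag value.
    """
    normalized = relative_path.replace("\\", "/")
    for entry in exclude_paths.split(";"):
        entry = entry.replace("\\", "/")
        if entry and normalized.startswith(entry):
            return True
    return False


# Hypothetical file inside one of the folders from the structure above.
file_path = "site1/App_Data/ClientDependency/cache.xml"
for rule in [
    "App_Data/ClientDependency",
    "/App_Data/ClientDependency",
    "ClientDependency",
    "site1/App_Data/ClientDependency",
]:
    print(f"{rule!r} -> excluded: {is_excluded(file_path, rule)}")
```

Under this model only the last rule matches, because it is the only one that begins at the source root.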

  1. The exclude-path flag expects a path relative to the source.

Source: site
Destination: site

Folder structure:
site/site1/App_Data/ClientDependency
site/site2/App_Data/ClientDependency
site/site3/App_Data/ClientDependency

Try this command:
./azcopy sync Source "https://StorageaccountName.blob.core.windows.net/containerName/site/?sv=A....D" --put-md5 --recursive --exclude-path 'site1/App_Data/ClientDependency;site2/App_Data/ClientDependency'

This syncs only the site3 folder to the Azure container.

Thank you for your answer. Yes, with a relative path it works fine. My issue is that I have an unknown number of sites and I would like to define a single exclude-path rule that excludes a folder for all sites.

Wildcards are not supported, exclude-pattern applies only to files, and --list-of-files is not supported on sync, so I guess my only option is to build a PowerShell script that goes through the structure and calls azcopy sync on each folder I want to sync.
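One possible alternative to calling azcopy once per folder is to walk the source first and generate the semicolon-separated list for a single --exclude-path value. The sketch below does this in Python rather than PowerShell; `build_exclude_path` is a hypothetical helper, and it assumes the local source tree mirrors the structure being synced.

```python
import os


def build_exclude_path(source_root, folder_suffix):
    """Return a ';'-joined list of every directory under `source_root`
    whose relative path ends with `folder_suffix` (e.g. App_Data/ClientDependency).
    """
    suffix_parts = folder_suffix.strip("/").split("/")
    matches = []
    for current, dirs, _files in os.walk(source_root):
        rel = os.path.relpath(current, source_root)
        parts = [] if rel == "." else rel.split(os.sep)
        if parts[-len(suffix_parts):] == suffix_parts:
            matches.append("/".join(parts))
            dirs[:] = []  # do not descend into a folder that is already excluded
    return ";".join(sorted(matches))


if __name__ == "__main__":
    # "site" is the source folder from the example above; adjust as needed.
    print(build_exclude_path("site", "App_Data/ClientDependency"))
```

The printed string can then be passed as the --exclude-path value in one sync call. With a very large number of sites the resulting command line may become too long for the shell, in which case batching would be needed.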

To clarify, exclude-path works on relative paths under the given source. And exclude-pattern is for file names only. We'll try to clarify the docs to avoid this confusion.

--exclude-path string Exclude these paths when copying. This option does not support wildcard characters (*). Checks relative path prefix (for example: myFolder;myFolder/subDirName/file.pdf). When used in combination with account traversal, paths do not include the container name.

--exclude-pattern string Exclude these files when copying. This option supports wildcard characters (*).

@gfaessler we understand that there's not enough flexibility here to accommodate scenarios like yours. We were thinking that providing some kind of include-regex and exclude-regex might help; it would be applied to the entire relative path (under the source root) of each file. Please let us know if you have any feedback about that idea. Thanks.

@zezha-msft Providing include/exclude path regex would definitely be a useful feature to handle this kind of scenario. Otherwise supporting wildcard characters in exclude-path would also do the job in my scenario.
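To make the proposal concrete, here is a small sketch of how a regex applied to the whole relative path could cover the multi-site scenario. It only illustrates the idea discussed above: the matching semantics are an assumption, `filter_by_regex` is a made-up helper, and the file names are invented.

```python
import re


def filter_by_regex(relative_paths, exclude_regex):
    """Keep only the paths that do NOT match `exclude_regex` anywhere."""
    pattern = re.compile(exclude_regex)
    return [path for path in relative_paths if not pattern.search(path)]


# Hypothetical files under the site folders described earlier.
paths = [
    "site1/App_Data/ClientDependency/cache.xml",
    "site2/App_Data/ClientDependency/map.json",
    "site2/App_Data/settings.config",
    "site3/index.html",
]

# One rule covers every site, whatever the top-level folder is called.
kept = filter_by_regex(paths, r"(^|/)App_Data/ClientDependency(/|$)")
for path in kept:
    print(path)
```

A single pattern anchored on folder boundaries excludes the folder for every site, which is what the prefix-based flag cannot express.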

I just tried using AzCopy because Storage Explorer wasn't flexible enough. After 30 minutes of struggling to make exclude-path work, I ended up here. My scenario: I have a complex hierarchy several layers deep. At the deepest levels there is a collection of folders, and I want to exclude one of those folders (they share a common name) from a sync from the blob to a local drive. I'd hate to have to specify each and every folder explicitly. Something simpler, like the glob syntax ( /Folder/ ) or just skipping any folder that matches the string in the exclude path, would be perfect (and eventually exposing that in Storage Explorer too).
