Azure-storage-azcopy: AzCopy missing files

Created on 5 Jun 2019 · 7 comments · Source: Azure/azure-storage-azcopy

Which version of the AzCopy was used?

Note: The version is visible when running AzCopy without any argument

AzCopy 10.1.2

Which platform are you using? (ex: Windows, Mac, Linux)

Windows Server 2012 R2 Version 6.3 Build 9600

What command did you run?

Note: Please remove the SAS to avoid exposing your credentials. If you cannot remember the exact command, please retrieve it from the beginning of the log file.

.\azcopy.exe copy "F:\Folder\Archive\Company\*" "https://mystorrageaccount.blob.core.windows.net/frx?sv=..." --recursive --log-level DEBUG

What problem was encountered?

Some files are not copied to the storage account, and the DEBUG log makes no mention of the missing files.

Properties of such a file:
File not available in the storage account

The same folder also contains subfolders whose files are uploaded correctly. That seems to be unrelated, though: some folders contain only files, and still only some of those files are uploaded.

What strikes me as odd is that, on first inspection, only files with a size on disk of 8 KB seem to be affected.

Deduplication is indeed turned on for this drive:

[Screenshot: Deduplication]

How can we reproduce the problem in the simplest way?

I'm not sure, but I think it might have something to do with Windows File Services Deduplication: when a file exists on the drive multiple times, the pointers are not followed.
(This is a hunch.)

Have you found a mitigation/solution?

No


All 7 comments

Thanks @TiZon for such a detailed description.

Building on your hunch: deduplicated files are stored as reparse points, and symbolic links are another type of reparse point. My hunch is that AzCopy may also be identifying deduplicated files as symlinks. I'm not sure of the details of how we identify symlinks, so I don't know for sure whether that's true, but you could test it. Do you have a small reproducible test case, e.g. a directory that you can upload to a test container and in which you know some files are in that 8 KB state? If so, I'd suggest this:

  1. Upload it to a test container and confirm that, as per your hunch, some files are missing at the destination.
  2. Run AzCopy again, this time adding --follow-symlinks to the command line.
    If that works, it confirms both your hunch and mine.

Hey @JohnRusk

Thanks for getting back to me. Regarding your points:

  1. Confirmed: some files are missing at the destination.
  2. When using --follow-symlinks, I get a huge number of these errors:
INFO: error evaluating the symlink path

It looks like this error is generated in two places in the code:
https://github.com/Azure/azure-storage-azcopy/blob/4fb1b2f85fa098e989331b7e390f067424be0bf1/cmd/copyUploadEnumerator.go#L246

and

https://github.com/Azure/azure-storage-azcopy/blob/4fb1b2f85fa098e989331b7e390f067424be0bf1/cmd/copyUploadEnumerator.go#L378

Both go through util.evaluateSymlinkPath, where I think this is the interesting part:

https://github.com/Azure/azure-storage-azcopy/blob/4fb1b2f85fa098e989331b7e390f067424be0bf1/cmd/copyUtil.go#L269

What I did see was an interesting comment there:

// Network drives are not evaluated using the api "filepath.EvalSymlinks" since it returns error for the network drives.
// So readlink api is used to evaluate the symlinks.

So, as a workaround, I tried accessing the folder as a file share; this resulted in the same problem.

From what I can tell, either the file is incorrectly labeled as a symlink and should follow a different approach, or this is in fact a symlink but filepath.EvalSymlinks is unable to retrieve the correct file for some unknown reason.

Also, it looks like Azure File Explorer does handle this case correctly (without using AzCopy in the background).

I'm now tracing with Procmon and Process Explorer to see what comes up.

PS: Nothing is written in the DEBUG log about this currently.

Thanks @TiZon. That's further evidence that something is misclassifying your reparse points (for dedup) as if they are the other kind of reparse point (used for symlinks). The difficulty of fixing this will depend on whether the misclassification is happening in AzCopy code, or in the underlying Go SDK. (AzCopy v10 is written in Go). I'll add this issue to our backlog for investigation.

Thanks @JohnRusk

Something I discovered that might point towards the problem being in AzCopy is the following:

Folder Layout:

F:\Folder
F:\Folder\RegularFile.txt
F:\Folder\DeduppedFile.txt
F:\Folder\SubFolder\RegularFile.txt
F:\Folder\SubFolder\DeduppedFile.txt

Possible commands:

  1. .\azcopy.exe copy "F:\Folder" "https://storageaccount.blob.core.windows.net/folder?sv=..." --recursive --log-level DEBUG
    This copies all regular files and folders; deduplicated files are not touched. The following files are in the storage account:
    • F:\Folder
    • F:\Folder\RegularFile.txt
    • F:\Folder\SubFolder\RegularFile.txt
  2. .\azcopy.exe copy "F:\Folder\*" "https://storageaccount.blob.core.windows.net/folder?sv=..." --recursive --log-level DEBUG
    This copies the contents of 'Folder', including the deduplicated files at the first level (but not those in SubFolder). The following files are in the storage account:
    • F:\Folder\RegularFile.txt
    • F:\Folder\DeduppedFile.txt
    • F:\Folder\SubFolder\RegularFile.txt

(Since we added the *, it makes sense that F:\Folder itself is no longer there.)

  3. .\azcopy.exe copy "F:\Folder" "https://storageaccount.blob.core.windows.net/folder?sv=..." --recursive --log-level DEBUG --follow-symlinks
    This generates the following line once for every deduplicated file:

INFO: error evaluating the symlink path

The following files are in the storage account:

    • F:\Folder
    • F:\Folder\RegularFile.txt
    • F:\Folder\SubFolder\RegularFile.txt

Adding * after the folder path also copies the deduplicated files in that folder, but not those in the levels below (SubFolder, in this case).

This leads me to believe there is a difference in the way files are categorised between the wildcard search (*) and files found through --follow-symlinks.

I hope that helps you guys in figuring this out.

This means a workaround is possible. I'll try to write a PowerShell script that does the folder enumeration, but that will have to wait until I've had some sleep. 😉

@TiZon It looks like your theory about symlinks is totally correct. We have traced it to a problem where the underlying Go SDK was misclassifying de-duped files as symlinks. The bug has been fixed in the latest version of the Go SDK, so we will upgrade to that for the next release, and based on my testing today, that will fix it. We're aiming for a release around the end of this month.

If you can't wait that long, you can build AzCopy from source with the Go 1.12.6 SDK, and that should give you a working solution.
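Before building from source, it may help to confirm that the local Go toolchain is new enough. Running `go version` shows this by eye; the small sketch below just makes the check explicit. It assumes the SDK version mentioned above is Go 1.12.6 and that later releases keep the fix; atLeast is a hypothetical helper.

```go
package main

import (
	"fmt"
	"runtime"
	"strconv"
	"strings"
)

// atLeast reports whether a Go version string such as "go1.12.6" or
// "go1.13" is at least major.minor.patch. Strings it cannot parse
// (e.g. "devel ..." or release candidates) return false.
func atLeast(v string, major, minor, patch int) bool {
	fields := strings.Fields(v) // drop suffixes like " X:boringcrypto"
	if len(fields) == 0 || !strings.HasPrefix(fields[0], "go") {
		return false
	}
	parts := strings.Split(strings.TrimPrefix(fields[0], "go"), ".")
	want := []int{major, minor, patch}
	for i := 0; i < 3; i++ {
		got := 0 // a missing component (e.g. "go1.13") counts as 0
		if i < len(parts) {
			n, err := strconv.Atoi(parts[i])
			if err != nil {
				return false
			}
			got = n
		}
		if got != want[i] {
			return got > want[i]
		}
	}
	return true
}

func main() {
	v := runtime.Version()
	fmt.Printf("%s is Go 1.12.6 or later: %v\n", v, atLeast(v, 1, 12, 6))
}
```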

Great news! Thanks for that!

Fixed and released in 10.2.1
