AzCopy 10.1.2
Windows Server 2012 R2 Version 6.3 Build 9600
.\azcopy.exe copy "F:\Folder\Archive\Company\*" "https://mystorrageaccount.blob.core.windows.net/frx?sv=..." --recursive --log-level DEBUG
Some files are not copied to the storage account. The Debug log makes no mention of the missing files.
Properties of such a file:

The same folder contains subfolders whose files are uploaded correctly. That seems unrelated, though: some folders contain only files, and only some of those files are uploaded.
What does strike me as odd is that, on first inspection, only files with a size on disk of 8 KB appear to be affected.
It is true that Deduplication is turned on for this drive:

I'm unsure, but I think it might have something to do with Windows File Services Deduplication: if a file is on the drive multiple times, the pointers are not followed.
(This is a hunch)
No
Thanks @TiZon for such a detailed description.
Building on your hunch: deduplicated files are stored as reparse points. One type of reparse point is a symbolic link. My hunch is that deduplicated files may also be getting identified by AzCopy as symlinks. I'm not sure of the details of how we identify symlinks, so I don't know for sure if that's true. But you could test it. Do you have a small reproducible test case? E.g. a directory that you can upload to a test container and for which you know some files are in that 8 KB state? If so, I'd suggest this:
Hey @JohnRusk
Thanks for getting back to me. As per your points,
With --follow-symlinks I get a huge number of these errors: INFO: error evaluating the symlink path
It looks like this error is generated in two places in the code:
https://github.com/Azure/azure-storage-azcopy/blob/4fb1b2f85fa098e989331b7e390f067424be0bf1/cmd/copyUploadEnumerator.go#L246
and
Both go to util.evaluateSymlinkPath, where I think the interesting part is this:
What I did see was an interesting comment there:
// Network drives are not evaluated using the api "filepath.EvalSymlinks" since it returns error for the network drives.
// So readlink api is used to evaluate the symlinks.
So, as a workaround, I tried approaching the folder as a file share; this resulted in the same problem.
From what I can tell, either the file is incorrectly labeled as a symlink and should follow a different approach, or this is in fact a symlink but filepath.EvalSymlinks is unable to retrieve the correct file for some unknown reason.
As an extra, it looks like Azure File Explorer does handle this use case correctly (without using AZCopy in the background).
I'm now tracing with procmon and process explorer to see what comes up.
PS: Nothing is written in the DEBUG log about this currently.
Thanks @TiZon. That's further evidence that something is misclassifying your reparse points (for dedup) as if they are the other kind of reparse point (used for symlinks). The difficulty of fixing this will depend on whether the misclassification is happening in AzCopy code, or in the underlying Go SDK. (AzCopy v10 is written in Go). I'll add this issue to our backlog for investigation.
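To make the distinction between the two kinds of reparse point concrete, here is a minimal Go sketch. The attribute and tag values are the documented Windows constants; the functions naiveIsSymlink and isSymlink are hypothetical illustrations of attribute-only versus tag-aware classification, and are not the actual AzCopy or Go SDK code.

```go
package main

import "fmt"

// FILE_ATTRIBUTE_REPARSE_POINT: set on every reparse point, whatever its kind.
const fileAttributeReparsePoint uint32 = 0x400

// Reparse tag values as defined in the Windows SDK headers.
const (
	tagMountPoint uint32 = 0xA0000003 // IO_REPARSE_TAG_MOUNT_POINT (junction)
	tagSymlink    uint32 = 0xA000000C // IO_REPARSE_TAG_SYMLINK
	tagDedup      uint32 = 0x80000013 // IO_REPARSE_TAG_DEDUP
)

// naiveIsSymlink looks only at the file attributes, so every reparse point,
// including a deduplicated file, is treated as a link. (Hypothetical.)
func naiveIsSymlink(attrs uint32) bool {
	return attrs&fileAttributeReparsePoint != 0
}

// isSymlink also inspects the reparse tag and accepts only the link-like
// kinds. (Hypothetical.)
func isSymlink(attrs, tag uint32) bool {
	if attrs&fileAttributeReparsePoint == 0 {
		return false
	}
	return tag == tagSymlink || tag == tagMountPoint
}

func main() {
	// A reparse point that also carries FILE_ATTRIBUTE_ARCHIVE (0x20).
	attrs := fileAttributeReparsePoint | 0x20

	fmt.Println(naiveIsSymlink(attrs), isSymlink(attrs, tagDedup))   // true false
	fmt.Println(naiveIsSymlink(attrs), isSymlink(attrs, tagSymlink)) // true true
}
```

With the attribute-only check, a deduplicated file is indistinguishable from a symbolic link, which would be consistent with the "error evaluating the symlink path" messages seen above.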
Thanks @JohnRusk
Something I discovered that might point towards the problem being in AzCopy is the following:
Folder Layout:
F:\Folder
F:\Folder\RegularFile.txt
F:\Folder\DeduppedFile.txt
F:\Folder\SubFolder\RegularFile.txt
F:\Folder\SubFolder\DeduppedFile.txt
Possible commands:
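.\azcopy.exe copy "F:\Folder" "https://storageaccount.blob.core.windows.net/folder?sv=..." --recursive --log-level DEBUG
This copies all regular files and folders; deduplicated files are not touched.
.\azcopy.exe copy "F:\Folder\*" "https://storageaccount.blob.core.windows.net/folder?sv=..." --recursive --log-level DEBUG
This copies the contents of 'Folder' completely, including the deduplicated files on the first level.
(Since we added the star, it's logical F:\Folder is no longer there)
.\azcopy.exe copy "F:\Folder" "https://storageaccount.blob.core.windows.net/folder?sv=..." --recursive --log-level DEBUG --follow-symlinks
For every deduplicated file, this generates the following line once: INFO: error evaluating the symlink path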
The following files are in the storage account:
Adding the * after the folder path also copies the deduplicated files in that folder, but not those in the levels below (SubFolder in this case).
This leads me to believe there is a difference in the way files are categorised between wildcard search (*) and files found through --follow-symlinks.
I hope that helps you guys in figuring this out.
This means a workaround is possible. I'll try to write a PowerShell script that does the folder enumeration. But that's for when I've had some sleep. 😉
@TiZon It looks like your theory about symlinks is totally correct. We have traced it to a problem where the underlying Go SDK was misclassifying deduplicated files as symlinks. The bug has been fixed in the latest version of the Go SDK, so we will upgrade to that for the next release and, based on my testing today, that will fix it. We're aiming for a release around the end of this month.
If you can't wait that long, you can build AzCopy from source with the Go 1.12.6 SDK, and that should give you a working solution.
Great news! Thanks for that!
Fixed and released in 10.2.1