Windowsserverdocs: Robocopy corrupting deduplicated volume?

Created on 11 Aug 2018 · 13 Comments · Source: MicrosoftDocs/windowsserverdocs

The documents state:

Running Robocopy with Data Deduplication is not recommended because certain Robocopy commands can corrupt the Chunk Store. The Chunk Store is stored in the System Volume Information folder for a volume. If the folder is deleted, the optimized files (reparse points) that are copied from the source volume become corrupted because the data chunks are not copied to the destination volume.

The statement that using a well-known file copy tool will corrupt the chunk store is quite concerning. The details are unclear and do not sufficiently inform the user what is not allowed or what would cause corruption.

Questions

  1. Is the problem being referenced simply a deletion of the folder System Volume Information in the root of a volume?
  2. Is there something special about robocopy (vs. other commands such as rd /s /q System Volume Information) that makes robocopy specifically dangerous?
  3. What are examples of the "dangerous" commands, with explanations of the danger?

Strawman Examples Of "Dangerous" commands

  • robocopy D:\ E:\ /E -- because this includes the System Volume Information, if both volumes are deduplicated, [[something bad happens]]??

All 13 comments

Hi @henrygab,

Robocopy has a design bug that causes it to delete SVI on the source-side of a copy operation. This has been the case for many years (at least in 2012, 2012 R2, and 2016, though likely earlier). This has been fixed in Windows Server, version 1803, and will be available in Windows Server 2019 as well.

There is nothing special about robocopy in this case; any software or command that deletes SVI will corrupt deduplicated files because that's where the file's content is located.

@dawnwood please make a call whether this issue can be closed or should remain open until Windows Server 2019 ships.

Thanks,

Will Gries
Program Manager, Data Deduplication

Can you help me find more information on when Robocopy causes this?

Specifically, since you indicate it deletes the source System Volume Information, it sounds like this will only occur when the command line options are set to modify the source. For example, using the /MOV or /MOVE command-line option.

Based on the scant details so far, it also sounds like this could occur when using /MIR or /PURGE, where the destination System Volume Information gets corrupted.

So, if robocopy is kept at least one directory away from the root directory (and thus never sees SVI), does the bug with Robocopy + data deduplication ever manifest? If not, that's a LOT easier to understand and avoid.

Thanks for helping me understand this, regardless of the outcome!
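
One way to act on the "stay at least one directory away from the root" idea is to check the source and destination paths before launching Robocopy. The following is only a sketch: the function name `is_volume_root` is hypothetical, and the assumption that a non-root path never exposes System Volume Information to Robocopy is the commenter's hypothesis, not something confirmed in this thread.

```python
from pathlib import PureWindowsPath


def is_volume_root(path: str) -> bool:
    """Return True if `path` is just a drive or share root, e.g. 'D:\\' or 'D:'.

    A bare 'D:' technically means "current directory on drive D", but the
    KB examples quoted in this thread use it for whole-volume copies, so it
    is treated as a root here to stay on the cautious side (an assumption).
    """
    p = PureWindowsPath(path)
    # The anchor is drive + root ('D:\\' or 'D:'); a path that equals its
    # own anchor has no directory components below the volume root.
    return bool(p.anchor) and p == PureWindowsPath(p.anchor)


if __name__ == "__main__":
    for candidate in ("D:\\", "D:", "D:\\Data", "Data\\sub"):
        print(candidate, is_volume_root(candidate))
```

A wrapper script could refuse to run, or at least add an exclusion for System Volume Information, whenever either path is reported as a volume root.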

@wmgries Can you help @henrygab with this question?

Would also love an answer to this question.
Generally, out of habit, I throw this switch into my Robocopies: /XD "$RECYCLE.BIN" "System Volume Information"
Would that be enough to avoid potential corruption?

Maybe include "/XA:H" too, system files should be hidden, but you may miss some other stuff :-/

I think it's possible that the

  • "Recovery" dir (!) and maybe the files:
  • hiberfil.sys
  • pagefile.sys
  • swapfile.sys

could cause problems as well. I think Robocopy should get a switch /DoNotCOPYwinOSfiles :-) and just not copy these files by default. Or the help file/documentation should be improved and include warnings.

@wmgries
Are you sure you meant the source drive? That makes no sense when copying. When moving, of course, but with /E?

@wmgries , have you had a chance to determine the appropriate clarification here?

It seems difficult for multiple people to understand how the source drive would be affected (unless using /MOV).
If this was intending to indicate that the target drive could be corrupted, e.g., by including System Volume Information, then I think all is good.
If there was a third potential cause, please let the world know … I rely on robocopy too much to not know....
Thanks!

It's been a while. Robocopy damages the source directory/source device if you copy something, right?
Or was it just when moving files? Or with /MIR?

It's just mind-boggling that a backup tool destroys the source, and for the devs it's:

There is nothing special about Robocopy in this case; any software or command that deletes SVI will corrupt deduplicated files because that's where the file's content is located.

Does only /MIR destroy the source (!) device, or /E too?

Just saw this, as I'm considering Dedup for my backup server. I am thinking robocopy is safe, as long as you are not touching the SVI. Any further news on this?

8 months later and this is still open... It would be great to get more clarification on what can and cannot be done with robocopy on a deduped hard drive. Leaving it so vague has the potential to cause major issues for users.

I am one of the lucky ones! This killed my data in 2016:

Options: *.* /S /E /DCOPY:DA /COPY:DAT /PURGE /MIR /B /NP /XJ /R:0 /W:30 

The source was not deduplicated, the destination was deduplicated: All copied files on destination report their correct size but they yield 0 bytes of data when read.

I still keep an external hard drive (destination) with corrupted chunks as a keepsake of my ongoing commitment to never use data deduplication on important data and only do GUI-based copy/paste for moving data to and from those volumes.

Since the reparse point data format is undocumented (when I last checked, I found only an academic paper about forensic analysis of the 2012 version, and it did not go deep enough to be useful), there is no hope of rebuilding the files even if the chunk data was copied in System Volume Information ....

I recovered most of it from previous backups but some data is still unique in that drive...

If I had a way to invoke the deduplication function manually by passing all of the required data, it would be useful for my case.
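
The symptom described above (files that report their correct size but yield 0 bytes when read) can be checked for after a copy. The sketch below walks a directory tree and flags any file whose readable byte count differs from its reported size. The function names are hypothetical, and whether a corrupted deduplicated file returns 0 bytes or raises an error is only known from this one report, so both cases are treated as failures.

```python
import os


def read_matches_size(path: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if reading `path` yields exactly as many bytes as it reports."""
    try:
        expected = os.path.getsize(path)
        total = 0
        with open(path, "rb") as handle:
            while True:
                chunk = handle.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
    except OSError:
        # An unreadable file counts as a failure as well.
        return False
    return total == expected


def find_unreadable(root: str) -> list:
    """Return the paths under `root` whose content cannot be fully read."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not read_matches_size(full):
                bad.append(full)
    return bad


if __name__ == "__main__":
    import sys

    for path in find_unreadable(sys.argv[1] if len(sys.argv) > 1 else "."):
        print("unreadable or truncated:", path)
```

This only detects the problem; it does not repair anything, and it reads every file in full, so it can take a long time on a large backup volume.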

I think the issue only happens if you use /MOVE or /MIR on the root of a dedup'd volume. That way it tries to move System Volume Information too, breaking all pointers in the process. I'd still appreciate it if @wmgries could provide more clarity. It's been over a year since this was raised...

I could not understand this bug without this nice article:
https://support.microsoft.com/en-us/help/2834834/fsrm-and-data-deduplication-may-be-adversely-affected-when-you-use-rob

I will copy it here because Microsoft always changes/breaks links:

When you use the Robocopy utility together with the /MIR option in Windows Server 2012, Robocopy mirrors the source directory to the destination directory. First, all contents in the destination directory that do not exist in the source directory are deleted. Then, all files that the user can access are copied from the source directory to the destination directory. To copy the files and folders that the user cannot access, you should use the /B or /ZB option of Robocopy.

When you use the /MIR option to copy a whole volume, the following features may be adversely affected:

  • File Server Resource Manager (FSRM)

FSRM stores quotas, file screens, and other configuration information in the System Volume Information folder. If the folder is deleted, quotas, file screens, and other configuration information will not be enforced on the destination volume.

  • Data Deduplication

Data Deduplication keeps the common chunk store in the System Volume Information folder. If the folder is deleted, the optimized files (reparse points) that are copied from the source volume become corrupted because the data chunks are not copied to the destination volume.

Moreover, issues occur if the source volume does not have Data Deduplication enabled while the destination volume does, or vice versa. The following are some examples:

Note: In the sample commands, P: is a volume that does not have Data Deduplication enabled, and M: is a volume that has Data Deduplication enabled.

  1. You execute the following command:
    robocopy P: M: /MIR
    The result is that M:\System Volume Information is deleted. Therefore, the deduplicated files on M: are corrupted.
  2. You execute the following command:
    robocopy P: M: /MIR /ZB
    The result is that M:\System Volume Information\Dedup is deleted. Therefore, the deduplicated files on M: are corrupted.
  3. You execute the following command:
    robocopy M: P: /MIR /ZB
    The result is that all deduplication metadata is copied to the P:\System Volume Information\Dedup folder. Because the chunk store IDs on both volumes are the same, problems may occur in future migrations.

To work around the problems in these examples, use the /XD option to exclude the System Volume Information folder from the scope of the command. For example, the following command excludes the System Volume Information folder:

    robocopy P: M: /MIR [/ZB] /XD "System Volume Information"

Last Updated: Jun 5, 2013
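
For scripted backups, the /XD workaround from the article can be applied automatically so it is never forgotten. The following is a minimal sketch, assuming Robocopy is launched from Python with an argument list. The helper name `build_robocopy_args` and its parameters are hypothetical; only the /MIR, /ZB, and /XD switches come from the article above.

```python
from typing import List, Sequence

# Folder the KB article says to exclude from whole-volume mirrors.
DEFAULT_EXCLUDES = ("System Volume Information",)


def build_robocopy_args(
    source: str,
    dest: str,
    options: Sequence[str] = ("/MIR",),
    extra_excludes: Sequence[str] = (),
) -> List[str]:
    """Build a robocopy argument list that always ends with an /XD exclusion."""
    if any(opt.upper() == "/XD" for opt in options):
        # Keep every exclusion in one place so the default cannot be dropped.
        raise ValueError("pass directory exclusions via extra_excludes, not /XD")
    args = ["robocopy", source, dest, *options]
    # /XD takes the directory names that follow it, so it goes last.
    args.append("/XD")
    args.extend(DEFAULT_EXCLUDES)
    args.extend(extra_excludes)
    return args


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to
    # subprocess.run() on a Windows host to actually execute it.
    print(build_robocopy_args("P:", "M:", ["/MIR", "/ZB"]))
```

Passing the arguments as a list lets the process launcher quote "System Volume Information" correctly, which avoids hand-written quoting mistakes around the embedded spaces.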
