Dvc: Hardlinks not correctly detected on SMB network share (NTFS?)

Created on 7 Jan 2020 · 24 Comments · Source: iterative/dvc

Using DVC version 0.80, with repo and cache both on the same network drive.
Windows setup.

I'm testing the unprotect command, but I assume other commands relying on the same check will also exhibit strange failures.

PS Q:\dvc-test> dvc --version
0.80.0
PS Q:\dvc-test> cat .dvc\config
[cache]
dir = 'Q:\DVC cache (test)'
type = "reflink,symlink,hardlink,copy"
protected = true
PS Q:\dvc-test> dir


    Directory: Q:\dvc-test


Mode                LastWriteTime         Length Name
----                -------------         ------ ----
d-----       06/01/2020     16:58                .dvc
-a----       06/01/2020     14:57             24 .gitignore
-a----       06/01/2020     14:57            155 big_file.dvc
-ar---       27/12/2019     16:42      134217728 big_file

PS Q:\dvc-test> fsutil.exe hardlink list big_file
Error:  The request is not supported.

As you can see, I checked out my files from git at 14:57 today; dvc checkout created a protected link to the original 128 MB file created 10 days ago, rather than a copy (as requested), but Windows can't tell me that it is a link. Trust me, it is 😉

PS Q:\dvc-test> dvc unprotect big_file
PS Q:\dvc-test> dir


    Directory: Q:\dvc-test


Mode                LastWriteTime         Length Name
----                -------------         ------ ----
d-----       06/01/2020     16:58                .dvc
-a----       06/01/2020     14:57             24 .gitignore
-a----       06/01/2020     14:57            155 big_file.dvc
-a----       27/12/2019     16:42      134217728 big_file

Strange - the unprotect operation was very fast, and the creation date of the file is still last year rather than today.

PS Q:\dvc-test> echo Hello >> big_file
PS Q:\dvc-test> dvc status
WARNING: corrupted cache file '..\DVC cache (test)\1d\cca63ad430e16fa12716d1a9bb3a6c'.
big_file_3.dvc:
     changed outs:
         not in cache:       big_file_3

Sure enough, modifying big_file corrupts the cache, since I was not modifying a copy.
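For readers unfamiliar with why this corrupts the cache, here is a minimal sketch of the mechanism. The file names are hypothetical stand-ins for a cache entry and a workspace file; this is not DVC code. A hardlink is a second name for the same data, so appending through one name changes what the other name reads.

```python
import os
import tempfile


def append_through_link(workdir):
    """Append to a hard-linked 'workspace' file and return the 'cache' file's bytes."""
    cache_file = os.path.join(workdir, "cache_entry")  # hypothetical cache file
    workspace_file = os.path.join(workdir, "big_file")  # hypothetical workspace file
    with open(cache_file, "wb") as fobj:
        fobj.write(b"original")
    # Second name for the same data; no bytes are copied.
    os.link(cache_file, workspace_file)
    with open(workspace_file, "ab") as fobj:
        fobj.write(b" Hello")
    # Reading through the first name shows the appended bytes as well.
    with open(cache_file, "rb") as fobj:
        return fobj.read()


with tempfile.TemporaryDirectory() as tmp:
    cache_bytes = append_through_link(tmp)
    print(cache_bytes)
```

This is why unprotect has to replace the link with a real copy before the file is edited: if the link is not recognized as a link, that copy never happens.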

awaiting response bug enhancement

All 24 comments

@rxxg I am unable to reproduce the issue.
Could you run the following script?

rmdir /s repo
mkdir repo
pushd repo
git init --quiet
dvc init -q
dvc config cache.type hardlink
dvc config cache.protected true
git commit -am "init dvc"
fsutil file createnew data 10485760
dvc add data
git add .gitignore data.dvc
git commit -am "add data"
fsutil hardlink list data
dvc unprotect data
fsutil hardlink list data
echo hello >> data
dvc status
popd

Does the status display corrupted cache WARNING?

[EDIT]
Also, can I ask you to provide the output of the dvc version command? (Note that it's without --.)

[EDIT2]
Sorry, I forgot it's an NFS drive; let me try to reproduce that again.

@rxxg Also, as a temporary workaround you can change cache type to copy (dvc config cache.type copy) and use dvc checkout --relink big_file.dvc.

@pared Thanks for looking at this. Yes, the use of a network drive (repo and cache) is essential.

(I had been using the copy cache type, but it is very slow for our use case, since the copy that DVC does involves reading from and then writing to the network drive in 16 KB chunks. A native Windows copy takes on the order of 3 seconds for a 128 MB file, versus 30 seconds for shutil.copyfileobj. There may be a separate bug report or PR for this.)
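As a rough illustration of the chunk-size point (not DVC's actual copy routine), shutil.copyfileobj accepts a length argument that sets the buffer size; in Python 3.7 the default is 16 KiB. The helper name and the 8 MiB value below are assumptions chosen for the example, and whether a larger buffer closes the gap on a given network share would need measuring.

```python
import os
import shutil
import tempfile


def copy_with_buffer(src, dst, buffer_size=8 * 1024 * 1024):
    """Copy src to dst, reading and writing buffer_size bytes at a time."""
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        # `length` is the chunk size used for each read/write pair.
        shutil.copyfileobj(fsrc, fdst, length=buffer_size)


with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "source.bin")
    target = os.path.join(tmp, "target.bin")
    with open(source, "wb") as fobj:
        fobj.write(os.urandom(1024 * 1024))  # 1 MiB of sample data
    copy_with_buffer(source, target)
    print(os.path.getsize(target))
```

Fewer, larger reads and writes mean fewer round trips to the server, which is the likely reason a native copy is faster here.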

For the record:

Q:\repo> git init --quiet
Q:\repo> dvc init -q
Q:\repo> dvc config cache.type hardlink
WARNING: You have changed the 'cache.type' option. This doesn't update any existing workspace file links, but it can be done with:
             dvc checkout --relink
Q:\repo> dvc config cache.protected true
Q:\repo> git commit -am "init dvc"
[master (root-commit) 7e8ba77] init dvc
 2 files changed, 12 insertions(+)
 create mode 100644 .dvc/.gitignore
 create mode 100644 .dvc/config
Q:\repo> fsutil file createnew data 10485760
File Q:\repo\data is created
Q:\repo> dvc add data
100% Add|██████████|1.00/1.00 [00:01<00:00,  1.48s/file]

To track the changes with git, run:

     git add .gitignore data.dvc
Q:\repo> git add .gitignore data.dvc
Q:\repo> git commit -am "add data"
[master d7e862f] add data
 2 files changed, 8 insertions(+)
 create mode 100644 .gitignore
 create mode 100644 data.dvc
Q:\repo> fsutil hardlink list data
Error:  The request is not supported.
Q:\repo> dvc unprotect data
Q:\repo> fsutil hardlink list data
Error:  The request is not supported.
Q:\repo> echo hello >> data
Q:\repo> dvc status
WARNING: corrupted cache file '.dvc\cache\f1\c9645dbc14efddc7d8a322685f26eb'.
data.dvc:
     changed outs:
         not in cache:       data
Q:\repo> dvc version
DVC version: 0.80.0
Python version: 3.7.1
Platform: Windows-10-10.0.16299-SP0
Binary: False
Package: pip
Cache: reflink - False, hardlink - True, symlink - False

@rxxg Ok, thank you very much, I am trying to reproduce it on my machine.

@rxxg Could you please install psutil with pip install psutil and then run dvc version again and show us the output?

Side note for us: we need to improve the way dvc version tests for link types by doing an additional check on the created links, e.g. create a hardlink and then do a sanity check with System.is_hardlink.
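A minimal sketch of such a probe, assuming only the standard library: the function name hardlink_is_verifiable is hypothetical and this is not the code DVC ended up with. It reports hardlink support only when the link can be created and also detected afterwards. Note that on Windows, os.stat gets its link count from the same file information that may be incomplete on a network share; there the probe would simply return False, which is the safe answer.

```python
import os
import tempfile


def hardlink_is_verifiable(directory):
    """Create a throwaway hardlink in `directory` and check that the OS reports it."""
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        src = os.path.join(tmp, "src")
        dst = os.path.join(tmp, "dst")
        with open(src, "wb") as fobj:
            fobj.write(b"probe")
        try:
            os.link(src, dst)
        except OSError:
            return False  # links cannot be created at all
        # Creation succeeded; now confirm the link is detectable afterwards.
        return os.stat(dst).st_nlink > 1 and os.path.samefile(src, dst)


with tempfile.TemporaryDirectory() as workdir:
    print("hardlink usable and verifiable:", hardlink_is_verifiable(workdir))
```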

Sure.

DVC version: 0.80.0
Python version: 3.7.1
Platform: Windows-10-10.0.16299-SP0
Binary: False
Package: pip
Cache: reflink - False, hardlink - True, symlink - False
Filesystem type (cache directory): ('NTFS', 'Q:\\')
Filesystem type (workspace): ('NTFS', 'Q:\\')

@rxxg I see that it is reporting NTFS, but you were saying you are on NFS. Was it a typo or am I missing something?

@rxxg Btw, is it your work machine or your personal one? We've seen something similar on NTFS in https://github.com/iterative/dvc/issues/2831 , but weren't able to find the cause for such a strange FS behavior at that time.

Sorry, typo 😳 Windows NTFS network share
It's my work machine, so I have zero control over the servers and even finding out info about the hardware/network protocol is hard work.
I had locking failures which came from the same cause (#2944) but things are fine since the change to the locking system.

@rxxg Thanks for clarifying! Makes more sense now. Btw, I suppose you don't have WSL enabled either, right? That would explain why fsutil doesn't work for you. That won't explain the original issue though, so we are still researching...

The issue might be caused by us using GetFileInformationByHandle, which could return incomplete data https://stackoverflow.com/questions/3523271/get-windows-hardlink-count-without-getfileinformationbyhandle. Looks like FindFirstFileNameW and FindNextFileNameW are the alternatives. And ansible is actually using it as well https://github.com/ansible/ansible/blob/105f60cf480572fb5547794cda1f9a05559ae636/lib/ansible/module_utils/powershell/Ansible.ModuleUtils.LinkUtil.psm1#L230 .

fsutil does work as expected on my local drive. I don't know what WSL is I'm afraid.

So we need to make our is_hardlink https://github.com/iterative/dvc/blob/0.80.0/dvc/system.py#L235 use FindFirstFileNameW and FindNextFileNameW to count hardlinks instead of relying on nNumberOfLinks. And then give you the dev version to check if it works for you. 🙂
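To make the idea concrete, here is a hedged sketch of counting names through the Win32 calls directly via ctypes. It is not the actual patch, and the function name count_hardlinks is hypothetical. Outside Windows it falls back to st_nlink so the example still runs. A filesystem that does not implement these calls would surface as an OSError from ctypes.WinError.

```python
import ctypes
import os
import sys
import tempfile

ERROR_HANDLE_EOF = 38  # returned by FindNextFileNameW when no names remain


def count_hardlinks(path):
    """Count the names that refer to `path` (st_nlink outside Windows)."""
    if sys.platform != "win32":
        return os.stat(path).st_nlink

    from ctypes import wintypes

    k32 = ctypes.WinDLL("kernel32", use_last_error=True)
    k32.FindFirstFileNameW.argtypes = [
        wintypes.LPCWSTR, wintypes.DWORD,
        ctypes.POINTER(wintypes.DWORD), wintypes.LPWSTR,
    ]
    k32.FindFirstFileNameW.restype = wintypes.HANDLE
    k32.FindNextFileNameW.argtypes = [
        wintypes.HANDLE, ctypes.POINTER(wintypes.DWORD), wintypes.LPWSTR,
    ]
    k32.FindNextFileNameW.restype = wintypes.BOOL
    k32.FindClose.argtypes = [wintypes.HANDLE]
    k32.FindClose.restype = wintypes.BOOL

    invalid_handle = wintypes.HANDLE(-1).value
    size = wintypes.DWORD(32768)  # large enough for any Windows path
    buf = ctypes.create_unicode_buffer(size.value)

    handle = k32.FindFirstFileNameW(
        os.path.abspath(path), 0, ctypes.byref(size), buf
    )
    if handle is None or handle == invalid_handle:
        raise ctypes.WinError(ctypes.get_last_error())

    count = 1  # the first call already returned one name
    try:
        while True:
            size.value = len(buf)  # reset: the call overwrites it
            if k32.FindNextFileNameW(handle, ctypes.byref(size), buf):
                count += 1
                continue
            error = ctypes.get_last_error()
            if error == ERROR_HANDLE_EOF:
                return count
            raise ctypes.WinError(error)
    finally:
        k32.FindClose(handle)


with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "data")
    with open(first, "wb") as fobj:
        fobj.write(b"x")
    os.link(first, os.path.join(tmp, "data_link"))
    print(count_hardlinks(first))
```

An is_hardlink check built on this would then be count_hardlinks(path) > 1.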

@rxxg Created a POC patch for it. Please run

pip uninstall -y dvc
pip install git+https://github.com/efiop/dvc@3080

to install it and then run

dvc version

and share its output.

Bad news 😞

(dvc-3080) PS Q:\dvc-test> dvc -v version
ERROR: unexpected error - (50, 'FindFileNames', 'The request is not supported.')


Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!

@rxxg Could you show dvc version -v (in that particular order), please?

Oops, sorry.

(dvc-3080) PS Q:\dvc-test> dvc version -v
ERROR: unexpected error - (50, 'FindFileNames', 'The request is not supported.')
------------------------------------------------------------
Traceback (most recent call last):
  File "c:\users\rxg\dvc-3080\lib\site-packages\dvc\main.py", line 48, in main
    ret = cmd.run()
  File "c:\users\rxg\dvc-3080\lib\site-packages\dvc\command\version.py", line 46, in run
    "Cache: {}".format(self.get_linktype_support_info(repo))
  File "c:\users\rxg\dvc-3080\lib\site-packages\dvc\command\version.py", line 103, in get_linktype_support_info
    link(src, dst)
  File "c:\users\rxg\dvc-3080\lib\site-packages\dvc\system.py", line 48, in hardlink
    assert System.is_hardlink(link_name)
  File "c:\users\rxg\dvc-3080\lib\site-packages\dvc\system.py", line 250, in is_hardlink
    return System._count_hardlinks(path) > 1
  File "c:\users\rxg\dvc-3080\lib\site-packages\dvc\system.py", line 241, in _count_hardlinks
    return len(FindFileNames(path))
pywintypes.error: (50, 'FindFileNames', 'The request is not supported.')
------------------------------------------------------------


Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!

Thanks @rxxg! Interesting. Btw, do you know how the network share is set up? I'm not really a Windows guy, and Google didn't help much 🙁 Is there a central server that you are connected to? If so, my only explanation right now is that it is running something old that doesn't support FindFirstFileNameW.

Looks like we have run out of options here: the filesystem is returning incomplete data, and the alternative ways of counting links are not supported. Another option that might work for you is enabling symlink support on your machine and using dvc config cache.type symlink.

I don't have many details about the network server, sorry. Windows tells me that there is a cluster running NTFS + DFS, but I don't know what's on the other side.

My biggest concern at this point is that DVC is detecting that hardlinks are available (which they are, kind of) and trying to use them, but then failing to detect that the links have been correctly created. So if there are no other options for checking links DVC should refuse to try and create them and fallback to the next cache type?

I will try symlinks next.

[EDIT]
So under Windows, symlinks require special workstation configuration which means it's a non-starter for me unfortunately.

> My biggest concern at this point is that DVC is detecting that hardlinks are available (which they are, kind of) and trying to use them, but then failing to detect that the links have been correctly created. So if there are no other options for checking links DVC should refuse to try and create them and fallback to the next cache type?

Yes, I will update my PR to do precisely that. It currently uses simple asserts, but it should raise a proper exception instead. Thanks for the reminder! 🙂
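The behaviour discussed here could look roughly like the following sketch. All names (LinkVerificationError, LINKERS, link_with_fallback) are hypothetical and this is not the code from the PR. Each link type is tried in the configured order, a link that cannot be confirmed raises a dedicated exception instead of tripping an assert, and the next type is tried.

```python
import os
import shutil
import tempfile


class LinkVerificationError(Exception):
    """Raised when a link was created but cannot be confirmed afterwards."""


def _hardlink(src, dst):
    os.link(src, dst)
    if os.stat(dst).st_nlink < 2:
        os.unlink(dst)  # do not leave an unverifiable link behind
        raise LinkVerificationError("hardlink not detectable: " + dst)


def _symlink(src, dst):
    os.symlink(src, dst)
    if not os.path.islink(dst):
        os.unlink(dst)
        raise LinkVerificationError("symlink not detectable: " + dst)


# Link types this sketch knows about; anything else (e.g. reflink) is skipped.
LINKERS = {"hardlink": _hardlink, "symlink": _symlink, "copy": shutil.copyfile}


def link_with_fallback(src, dst, link_types):
    """Try each link type in order and return the name of the one that worked."""
    for name in link_types:
        linker = LINKERS.get(name)
        if linker is None:
            continue
        try:
            linker(src, dst)
            return name
        except (OSError, LinkVerificationError):
            continue  # fall back to the next configured type
    raise LinkVerificationError("no usable link type among: " + ", ".join(link_types))


with tempfile.TemporaryDirectory() as tmp:
    cache_file = os.path.join(tmp, "cache_entry")
    with open(cache_file, "wb") as fobj:
        fobj.write(b"data")
    used = link_with_fallback(
        cache_file, os.path.join(tmp, "data"), ["reflink", "hardlink", "copy"]
    )
    print("linked using:", used)
```

Protected (read-only) files would need extra handling before the cleanup os.unlink on Windows; that is left out to keep the sketch short.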

> So under Windows, symlinks require special workstation configuration which means it's a non-starter for me unfortunately.

Have you tried installing our exe? Or do you have very limited rights on your machine?

> So under Windows, symlinks require special workstation configuration which means it's a non-starter for me unfortunately.
>
> Have you tried installing our exe? Or do you have very limited rights on your machine?

I'm working on a utility which combines git + DVC in one UI for our particular workflow, to be internally redistributed as one package. My user base doesn't have admin rights on their machines. (I'm assuming the dvc exe needs admin to set up symlinks?)

> Sorry, typo 😳 Windows NTFS network share

I've edited the issue description. I'm not 100% convinced that there actually is an NTFS system on the other side of the network: Windows seems to always report NTFS in the UI, even when there is something else (like an OSX share).

> My user base doesn't have admin rights on their machines. (I'm assuming the dvc exe needs admin to set up symlinks?)

Yes, but I think there is a way to do that without those rights (https://www.google.com/search?q=windows+symlink+without+admin), though I haven't tried it myself 🙁 Or, well, you could ask your admin to enable those for your machines 🙂
