Dvc: add: empty files add broken when cache mode is hardlinks

Created on 24 Feb 2020  路  2Comments  路  Source: iterative/dvc

DVC Version

DVC version: 0.86.5+f67314.mod
Python version: 3.7.6
Platform: Darwin-18.2.0-x86_64-i386-64bit
Binary: False
Package: None
Cache: reflink - supported, hardlink - supported, symlink - supported
Filesystem type (cache directory): ('apfs', '/dev/disk1s1')
Filesystem type (workspace): ('apfs', '/dev/disk1s1')

Reproduce

dvc config cache.type hardlink
touch foo
dvc add foo

outputs:

ERROR: no possible cache types left to try out.
bug p0-critical

Most helpful comment

Found the root cause of this. This happens because of the optimization that we do if the file is empty (i.e. 0 bytes size). We never create a hardlink for the file with 0 bytes size and therefore it fails when we try to verify if the hardlink was created.

https://github.com/iterative/dvc/blob/a9bc65ee1f0446de766db59ad1b149de064c5360/dvc/remote/local.py#L170-L182

I think, is_hardlink() function should be aware of this optimization as well. I'll make a fix tomorrow.

P.S. This only failed if the file was empty (not even a newline). So:

echo > foo && dvc add foo # works because of extra-newline character added by `echo`
touch bar && dvc add bar # does not work as the file is empty

Also, subsequent runs would not fail because before this error is thrown out, the file gets cached (however, deletes file from workspace), i.e.

dvc init --no-scm
dvc config cache.type hardlink
touch foo && dvc add foo  # fails and deletes foo from workspace
touch foo && dvc add foo  # passes

All 2 comments

Found the root cause of this. This happens because of the optimization that we do if the file is empty (i.e. 0 bytes size). We never create a hardlink for the file with 0 bytes size and therefore it fails when we try to verify if the hardlink was created.

https://github.com/iterative/dvc/blob/a9bc65ee1f0446de766db59ad1b149de064c5360/dvc/remote/local.py#L170-L182

I think, is_hardlink() function should be aware of this optimization as well. I'll make a fix tomorrow.

P.S. This only failed if the file was empty (not even a newline). So:

echo > foo && dvc add foo # works because of extra-newline character added by `echo`
touch bar && dvc add bar # does not work as the file is empty

Also, subsequent runs would not fail because before this error is thrown out, the file gets cached (however, deletes file from workspace), i.e.

dvc init --no-scm
dvc config cache.type hardlink
touch foo && dvc add foo  # fails and deletes foo from workspace
touch foo && dvc add foo  # passes

@skshetry Great investigation! :pray:

Was this page helpful?
0 / 5 - 0 ratings