After running dvc add foo.dat, the file is _deleted_, and then dvc raises a "No cache types left to try" error when trying to symlink from the cache back to the local directory. With the copy cache type, everything works fine. I understand the symlink creation failing, but I don't understand why that error should result in the loss of data!
DVC version: 1.8.1 (exe)
Platform: Python 3.7.9 on Windows-10-10.0.17763-SP0
Supports: All remotes
Cache types: symlink
Cache directory: NTFS on Z:\
Workspace directory: NTFS on D:\
Repo: dvc, git
So in this case what's probably happening is that technically the data is not lost since the files are properly moved into the cache (on your SMB share). However, due to our cache.save behavior we end up making it look like a loss of data because of the symlink failure (and we don't provide any useful information for recovering the file from cache).
The issue is that in cache.save the move from workspace to cache is atomic on its own, but the combined "move + link" step is not: if the link fails, the file stays in the cache and nothing is left in the workspace.
I think what we should probably be doing here (when the cache file is created, but only the link fails) is to consider the add successful and create the .dvc file, and then raise a troubleshooting message suggesting something along the lines of "try configuring a different link type and then rerun dvc checkout ...".
Alternatively, we would need to adjust cache.save to be properly atomic - so if the link step fails, we would need to copy the file from cache back into the original workspace location.
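To make the second option concrete, here is a rough sketch of what a "move + link" step with rollback could look like. This is not DVC's actual cache.save code; save_with_rollback, LinkError and the link parameter are hypothetical names used only for illustration, and only standard library calls are assumed.

```python
import os
import shutil


class LinkError(OSError):
    """Hypothetical error: the file is safe, but the workspace link failed."""


def save_with_rollback(workspace_path, cache_path, link=os.symlink):
    """Move a file into the cache and link it back into the workspace.

    If linking fails, copy the cached file back to its original location
    so the user never ends up with an empty workspace.
    """
    os.makedirs(os.path.dirname(cache_path), exist_ok=True)
    # Step 1: move the data into the cache.
    shutil.move(workspace_path, cache_path)
    try:
        # Step 2: link from the cache back into the workspace.
        link(cache_path, workspace_path)
    except OSError as exc:
        # Clean up anything a half-finished link attempt may have left behind.
        if os.path.lexists(workspace_path):
            os.remove(workspace_path)
        # Roll back: restore a plain copy of the data in the workspace.
        shutil.copyfile(cache_path, workspace_path)
        raise LinkError(
            f"could not link '{workspace_path}' to the cache; "
            "the file was restored as a copy, try a different link type"
        ) from exc
```

The caller could catch LinkError and either treat the add as failed (the workspace is unchanged from the user's point of view) or treat it as successful and print the troubleshooting hint. Keeping the cached copy after the rollback is a design choice in this sketch, not something the thread settles.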
@efiop thoughts?
Technically no loss of data, agreed, but data that you can't access is still data loss from the user's perspective!