Dvc: ssh: support reflink/hardlink/symlink

Created on 22 Feb 2019 · 5 Comments · Source: iterative/dvc

Hi.

I often work with data files of several GBs, and in my DVC setup I use SSH for all of my data files and the remote cache.
As I understand from these Discord messages, DVC's implementation of the remote SSH cache does not support the use of reflinks, hardlinks, or symlinks.

Thus, I propose a feature request: enhance the implementation of the remote SSH cache to use file links instead of the current copying of data files to the cache.

c13-half-a-week feature request p1-important

All 5 comments

Note: SFTP supports symlinks: http://docs.paramiko.org/en/2.6/api/sftp.html#paramiko.sftp_si.SFTPServerInterface.symlink . There is no support for reflinks or hardlinks so far, so for those we would have to either run cp through the CLI or launch our own helper Python script on the remote. Symlinks are a great start though, so we should use them for now, especially since they are as fast as everything else and don't require much effort from our side.

  • [ ] Implement symlink first (as a work in progress)
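
For illustration, the "cp through CLI" route mentioned above could be sketched roughly as follows. This is not DVC's actual implementation: `build_link_command` is a hypothetical helper, and it assumes the remote host has GNU coreutils (`cp --reflink` is a GNU extension). The resulting string could then be run on the remote, for example through paramiko's `SSHClient.exec_command`.

```python
import shlex


def build_link_command(link_type: str, src: str, dst: str) -> str:
    """Return a shell command that links (or copies) src to dst.

    Hypothetical helper for illustration only; assumes GNU coreutils
    on the machine where the command will be executed.
    """
    # Quote both paths so spaces and shell metacharacters are safe.
    src_q, dst_q = shlex.quote(src), shlex.quote(dst)
    if link_type == "reflink":
        # "always" makes cp fail instead of silently doing a full copy.
        return f"cp --reflink=always {src_q} {dst_q}"
    if link_type == "hardlink":
        return f"ln {src_q} {dst_q}"
    if link_type == "symlink":
        return f"ln -s {src_q} {dst_q}"
    if link_type == "copy":
        return f"cp {src_q} {dst_q}"
    raise ValueError(f"unsupported link type: {link_type!r}")
```

A caller could try the link types in order of preference and fall back to the next one when the remote command exits with a non-zero status.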

Hi @mroutis and @efiop. Thank you for this feature. It seems to work out of the box after adding dvc remote modify <remote> type symlink. Nicely done. 👍
But is it correct that SSH symlinks are currently not supported for folders? 🙂

@PeterFogh DVC does not use symlinks for the folders themselves; it just creates them as regular directories as needed, since there is no overhead and that is more compatible with the way we store the cache. So with type symlink, you'll see regular directories with symlinks to the cache in them.
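
A minimal local sketch of the layout described above, assuming a hypothetical mapping from workspace-relative paths to cache file paths (`checkout_dir` and its arguments are illustrative names, not DVC's API): directories are created as real directories, and only the files inside them are symlinks into the cache.

```python
import os


def checkout_dir(entries, dest):
    """Create regular directories under dest and symlink each file to the cache.

    entries: dict mapping a relative file path to the path of its cache file
    (a hypothetical structure used only for this sketch).
    """
    for rel_path, cache_path in entries.items():
        target = os.path.join(dest, rel_path)
        # Parent folders are ordinary directories, never symlinks.
        os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
        # Replace a stale file or link so re-running the checkout is safe.
        if os.path.lexists(target):
            os.remove(target)
        os.symlink(cache_path, target)
```

After such a checkout, `os.path.isdir` is true and `os.path.islink` is false for each folder, while `os.path.islink` is true for each file.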

@efiop, thanks for the clarification. My concern was with the files in a folder, which initially did not seem to change to symlinks, but after running my pipeline again, I see that all the files in the folder are stored as symlinks. It is awesome 👍
