Hi.
I often work with data files of several GBs, and in my DVC setup I use SSH for all of my data files and remote cache.
As I understand from this Discord message, DVC's implementation of the remote SSH cache does not support reflinks, hardlinks, or symlinks.
Thus, I propose a feature request: enhance the implementation of the remote SSH cache to use file links instead of the current copying of data files to the cache.
Note: SFTP supports symlinks http://docs.paramiko.org/en/2.6/api/sftp.html#paramiko.sftp_si.SFTPServerInterface.symlink . But it has no reflinks or hardlinks so far, so for those we would have to either use cp through the CLI or maybe launch our own helper Python script on the remote and use it. Symlinks are a great start though, so we should use them for now, especially since symlinks are as fast as everything else and don't require much effort from our side.
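To make the two options above concrete, here is a minimal sketch, not DVC's actual implementation. It assumes `sftp` is a `paramiko.SFTPClient` (whose `symlink(source, dest)` creates `dest` as a link to `source`) and that the remote host has GNU coreutils for the cp/ln fallback. The function names and the example paths are hypothetical.

```python
import shlex


def link_via_sftp(sftp, cache_path, workspace_path):
    """Create a symlink at workspace_path that points at cache_path.

    `sftp` is assumed to be a paramiko.SFTPClient, or any object that
    exposes the same symlink(source, dest) method.
    """
    sftp.symlink(cache_path, workspace_path)


def remote_link_command(link_type, cache_path, workspace_path):
    """Build a shell command for link types that SFTP cannot create.

    Assumption: GNU coreutils is available on the remote host.
    """
    # Quote both paths so spaces and shell metacharacters are safe.
    src = shlex.quote(cache_path)
    dst = shlex.quote(workspace_path)
    if link_type == "reflink":
        return f"cp --reflink=always {src} {dst}"
    if link_type == "hardlink":
        return f"ln {src} {dst}"
    raise ValueError(f"unsupported link type: {link_type}")
```

With a real connection, the command string could be passed to `paramiko.SSHClient.exec_command`, while the symlink case needs only the SFTP channel and no shell on the remote.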
Hi @mroutis and @efiop. Thank you for this feature. It seems to work out of the box after adding dvc remote modify <remote> type symlink. Nicely done. 👍
But is it correct that SSH symlink is currently not supported for folders? 🙂
@PeterFogh DVC does not use symlinks for the folders themselves; it just creates them as needed, since there is no overhead and that fits better with the way we store the cache. So with type symlink, you'll see regular directories with symlinks to cache in them.
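The resulting layout can be illustrated with a small local sketch. This is not DVC's code: the function name, the `entries` mapping, and the flat cache layout are all hypothetical, and it uses the local filesystem rather than SSH to keep the example runnable.

```python
import os


def checkout_dir_with_symlinks(entries, cache_dir, target_dir):
    """Create regular directories and put symlinks to cached files in them.

    `entries` maps a relative file path to a cache file name; this mapping
    and the flat cache layout are assumptions made for illustration.
    """
    for rel_path, cache_name in entries.items():
        dest = os.path.join(target_dir, rel_path)
        # Directories are created as ordinary directories, never as links.
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        # Only the files themselves become symlinks into the cache.
        os.symlink(os.path.join(cache_dir, cache_name), dest)
```

After such a checkout, `os.path.islink` is true for each file but false for the directories that contain them.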
@efiop, thanks for the clarification. My concern was with the files in a folder, which initially did not seem to change to symlinks, but after running my pipeline again, I see that all the files in the folder are stored as symlinks. It is awesome 👍