Dvc: question: symlinks vs. hardlinks

Created on 3 Sep 2019 · 4 Comments · Source: iterative/dvc

From this feature table it seems clear to me that symlinks are better than hardlinks (because they also work across partition boundaries):

Then the question is: why not drop hardlinks and support only symlinks? From the users' point of view this means fewer options to choose from and less confusion. I believe it may also simplify the code slightly, because there is one fewer case to consider and care about.


All 4 comments

@dashohoxha Hardlinks are not prone to stat/lstat or exists/lexists errors (a hardlink is just another directory entry for the same file, so there is no separate link to follow or not follow). Hardlinks are supported by default on Windows, while symlinks need special permissions. Besides, I suppose some filesystems or their mount options might have more or less trouble with one or the other, so it is nice to have both to choose from. That marginally complicates the choice for new users, but we don't use either of them by default, so new users are not going to notice. Hardlinks and symlinks are for power users.
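To illustrate the stat/lstat and exists/lexists point, here is a minimal sketch using only the Python standard library. It assumes a POSIX system where `os.symlink` needs no special privileges, and the file names are hypothetical. It deletes the original name of a file and then checks what is left: the hardlink still resolves to the data, while the symlink is left dangling, so the "follow the link" and "look at the link itself" calls disagree about it.

```python
import os
import tempfile


def _stat_ok(stat_fn, path: str) -> bool:
    """Return True if stat_fn(path) succeeds, False if the target is missing."""
    try:
        stat_fn(path)
        return True
    except FileNotFoundError:
        return False


def dangling_link_demo(workdir: str) -> dict:
    """Create a file plus a hardlink and a symlink to it, delete the original
    name, and report how exists/lexists and stat/lstat see each link."""
    original = os.path.join(workdir, "data.bin")  # hypothetical file name
    hard = os.path.join(workdir, "data.hardlink")
    soft = os.path.join(workdir, "data.symlink")

    with open(original, "wb") as f:
        f.write(b"payload")

    os.link(original, hard)     # second directory entry for the same inode
    os.symlink(original, soft)  # separate file that only stores a path

    os.remove(original)  # drop the original name

    with open(hard, "rb") as f:
        hard_content = f.read()

    return {
        # A hardlink has no target to resolve: it is still a regular file.
        "hard_exists": os.path.exists(hard),
        "hard_lexists": os.path.lexists(hard),
        "hard_content": hard_content,
        # exists() follows the symlink and fails; lexists() checks the link itself.
        "soft_exists": os.path.exists(soft),
        "soft_lexists": os.path.lexists(soft),
        "soft_is_link": os.path.islink(soft),
        # stat() follows the link and raises; lstat() describes the link.
        "soft_stat_ok": _stat_ok(os.stat, soft),
        "soft_lstat_ok": _stat_ok(os.lstat, soft),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for key, value in dangling_link_demo(d).items():
            print(f"{key}: {value}")
```

Code that handles symlinks has to pick the right variant of each call every time; with hardlinks the question never comes up.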

@efiop Thanks for the explanation.

Hardlinks are supported by default on Windows, while symlinks need special permissions.

I'll take your word for it because I am not familiar with Windows :)

I suppose some filesystems or their mount options might have more or less trouble with one or the other

On Linux, hardlinks cannot span different mounted filesystems (by definition), while symlinks can. The type of filesystem does not matter. This article explains the details: https://hackernoon.com/reflinks-vs-symlinks-vs-hard-links-and-how-they-can-help-machine-learning-projects-wz2ej3xa7
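The cross-filesystem limitation can be seen directly from Python. The sketch below assumes a POSIX system; `same_filesystem` and `link_preferring_hardlink` are hypothetical helper names, not DVC functions. It compares device IDs (`st_dev`) as a quick check, then tries a hardlink and falls back to a symlink when the kernel refuses with `EXDEV`. The `st_dev` check is only a necessary condition: per the Linux `link(2)` man page, `link()` also fails across different mounts of the same filesystem (for example, bind mounts), so the link attempt itself is the real test.

```python
import errno
import os
import tempfile


def same_filesystem(src: str, dst: str) -> bool:
    """True if src and the directory that will hold dst share a device ID.
    Necessary for a hardlink, but not sufficient (see bind mounts)."""
    dst_dir = os.path.dirname(os.path.abspath(dst))
    return os.stat(src).st_dev == os.stat(dst_dir).st_dev


def link_preferring_hardlink(src: str, dst: str) -> str:
    """Try a hardlink first; on EXDEV (cross-device link) fall back to a
    symlink. Returns the kind of link that was created."""
    try:
        os.link(src, dst)
        return "hardlink"
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise  # other failures (permissions, existing dst): don't mask them
    # A symlink only stores a path, so it can point across mount points.
    os.symlink(os.path.abspath(src), dst)
    return "symlink"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "model.pkl")  # hypothetical file name
        with open(src, "wb") as f:
            f.write(b"weights")
        dst = os.path.join(d, "model.link")
        print(same_filesystem(src, dst))           # True: same directory
        print(link_preferring_hardlink(src, dst))  # hardlink: same filesystem
```

Pointing `dst` at a directory on another mount would take the `EXDEV` branch and produce a symlink instead.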

but we don't use any of those by default, so it is not like new users are going to notice. Hardlinks and symlinks are for power users.

That's true, and it is a good reason not to worry about the hardlink/symlink choice.

@dashohoxha

I'll take your word for it because I am not familiar with Windows :)

Yeah, we even have a special script to enable that, https://github.com/iterative/dvc/blob/master/scripts/innosetup/addSymLinkPermissions.ps1, for our Windows binary installer.

On Linux, hardlinks cannot span different mounted filesystems (by definition), while symlinks can. The type of filesystem does not matter. This article explains the details: https://hackernoon.com/reflinks-vs-symlinks-vs-hard-links-and-how-they-can-help-machine-learning-projects-wz2ej3xa7

I'm aware of that, and that is why, in the external cache dir scenario (e.g. you are working on an SSD but want your .dvc/cache to be on an HDD), we tell users to use symlinks. I'm just saying that I'm worried there might be some filesystems, or particular mount options, that have problems with symlinks within the same filesystem. That being said, except for Windows, I don't remember any particular scenario where that would break, but we've seen some weird proprietary NASes that behave really oddly, so it seems OK to at least have the option to switch from one link type to another.
:slightly_smiling_face:
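Keeping several link types and letting the user choose can be sketched as an ordered fallback list, similar in spirit to DVC's `cache.type` setting, which accepts a comma-separated list of link types. This is a minimal sketch, not DVC's implementation: `LINKERS` and `link_from_cache` are hypothetical names, reflinks are omitted because the standard library has no portable call for them, and an existing destination is treated as a hard error so that the `copy` fallback never silently overwrites a file.

```python
import os
import shutil
import tempfile
from typing import Callable, Dict, List, Sequence


def _hardlink(src: str, dst: str) -> None:
    os.link(src, dst)


def _symlink(src: str, dst: str) -> None:
    os.symlink(os.path.abspath(src), dst)


def _copy(src: str, dst: str) -> None:
    shutil.copyfile(src, dst)


# Hypothetical registry of link strategies, keyed by user-facing name.
LINKERS: Dict[str, Callable[[str, str], None]] = {
    "hardlink": _hardlink,
    "symlink": _symlink,
    "copy": _copy,
}


def link_from_cache(src: str, dst: str, link_types: Sequence[str]) -> str:
    """Try each configured link type in order; return the first that works."""
    unknown = [t for t in link_types if t not in LINKERS]
    if unknown:
        raise ValueError(f"unsupported link types: {unknown}")

    errors: List[str] = []
    for name in link_types:
        try:
            LINKERS[name](src, dst)
        except FileExistsError:
            raise  # never fall through to "copy" and overwrite an existing file
        except OSError as exc:
            # e.g. EXDEV for a cross-device hardlink, or a missing privilege
            # for symlinks on Windows: remember the error, try the next type.
            errors.append(f"{name}: {exc}")
            continue
        return name
    raise OSError("no link type worked: " + "; ".join(errors))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        cache_obj = os.path.join(d, "cache_obj")  # hypothetical cache file
        with open(cache_obj, "w") as f:
            f.write("data")
        # Prefer links, with a plain copy as the last resort.
        print(link_from_cache(cache_obj, os.path.join(d, "workspace_file"),
                              ["hardlink", "symlink", "copy"]))
```

With this shape, dropping one link type would only remove one registry entry, which matches the "slight" simplification suggested in the question. The cost of keeping it is equally small, and it leaves an escape hatch for the odd filesystems mentioned above.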

@dashohoxha I'll close this issue, as it seems to be resolved. Please feel free to reopen :slightly_smiling_face:
