The goal of this feature would be to preserve the permissions of files added to dvc. In our particular use case it is useful to add a directory of scripts, binaries and config to dvc. When we pull or get those files, they have lost their original permissions, requiring us to restore them manually or with something like ACLs.
This feature would add a new attribute in the .dvc file.
eg: filemode
Or something more OS agnostic.
This feature would add a new --track-permissions flag:
dvc add
Later, when a user pulls or gets files/dirs with dvc, the permissions are restored.
If track-permissions is not used, files are treated like they are now.
@mgkwill, it might be tricky for a file inside a DVC-tracked directory (we might need to add to .dir file meaning we'll have this information in two places). For a dvc-tracked file, it might be trivial to add the support.
> we might need to add to .dir file meaning we'll have this information in two places
not exactly related, but it might be a good opportunity to include file sizes finally into dir files?
@mgkwill Do you mean simple user execution permission (like git) or something else as well?
@skshetry @shcheklein about sizes and metadata, there was an idea of having metadata stored separately from the regular cache files, so it is easier to manage ACL for it without giving access to the data contents. Just for the record.
Update: we can't really store perms for files in datasets right now, as .dir cache format doesn't currently allow for expansion. We'll be changing that in 2.0 very soon. But we could totally start with something like:
$ dvc add binary
...
$ cat binary.dvc
outs:
- md5: 9152ab3f172b3e23fd6b1a3ab0e1d150
  path: foo
  mode: 0775
so we would add mode to the generated dvcfile and will use it during checkout.
There are some questions as to whether to store the whole mode or limit ourselves to some subset, and we also need to remember that this will need nice error handling, but overall it should still be a good start.
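For illustration, here is a minimal sketch of the record-then-restore idea described above. It is not DVC's implementation: `record_mode` and `restore_mode` are hypothetical helper names, and the mode is kept as an octal string so it does not depend on how a YAML parser reads a leading zero. Only `os.stat`, `os.chmod` and `stat.S_IMODE` from the standard library are used.

```python
import os
import stat
import tempfile


def record_mode(path):
    """Return the permission bits of `path` as an octal string such as '0775'.

    Hypothetical helper: roughly what would be written to the `mode` field.
    """
    return format(stat.S_IMODE(os.stat(path).st_mode), "04o")


def restore_mode(path, mode):
    """Apply a previously recorded octal string to `path` (e.g. on checkout)."""
    os.chmod(path, int(mode, 8))


# Small demonstration on a temporary file (POSIX semantics assumed).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "binary")
    open(path, "w").close()
    os.chmod(path, 0o775)

    saved = record_mode(path)      # value that would be stored at `dvc add` time
    os.chmod(path, 0o600)          # simulate a checkout that lost the mode
    restore_mode(path, saved)      # re-apply the stored mode
    restored = stat.S_IMODE(os.stat(path).st_mode)
```

On Windows the restored value would generally differ, since `os.chmod` only honours the read-only flag there.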
I am picking up this issue.
@efiop I've created an initial draft PR for this issue: https://github.com/iterative/dvc/pull/5036 It's far from ready for review. I'm following your recent work to add size into .dvc files from https://github.com/iterative/dvc/commit/e1b82c5222930c55886ca16a48c3a223d05b4af0
A few questions:
- md5: d41d8cd98f00b204e9800998ecf8427e
  size: 0
  mode: '0o100644'
  path: binary
Is mode OK in such Python 3 octal notation, or should 0o better be removed?
When chmod is applied on file checkout on Windows, the mode probably cannot be set, as mentioned in https://docs.python.org/3/library/os.html#os.chmod
> Note: Although Windows supports chmod(), you can only set the file's read-only flag with it (via the stat.S_IWRITE and stat.S_IREAD constants or a corresponding integer value). All other bits are ignored.
Should dvc output some message in this case?
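One possible way to decide when a message is needed, sketched under assumptions: apply the mode, read it back, and warn only if the permission bits did not stick. `chmod_with_check` is a hypothetical name, not an existing DVC function, and the warning text is made up for the example.

```python
import logging
import os
import stat
import tempfile

logger = logging.getLogger(__name__)


def chmod_with_check(path, mode):
    """Apply `mode` to `path`; return True if the permission bits now match.

    Hypothetical helper. On platforms where chmod ignores most bits
    (e.g. Windows), the read-back value differs and a warning is logged.
    """
    os.chmod(path, mode)
    actual = stat.S_IMODE(os.stat(path).st_mode)
    if actual != mode:
        logger.warning(
            "could not fully apply mode %o to '%s' (got %o)", mode, path, actual
        )
        return False
    return True


# Demonstration on a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "binary")
    open(path, "w").close()
    applied = chmod_with_check(path, 0o640)
```

Reading the mode back keeps the check platform-neutral: no `os.name` test is needed, and the message only appears when something was actually lost.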
Hi @dudarev ! Thanks for looking into this! I've left a few comments in the PR itself.
> Is mode OK in such Python 3 octal notation, or should 0o better be removed?
No strong opinion here, I imagine that we might end up supporting multiple formats, but any vanilla one will do as a default. Btw, maybe we should limit it to permission bits only? The S_IFREG part is not really useful for our purposes, so far we only care about permissions and, as you've noted in your windows comment, even that is kinda limited for windows. :slightly_frowning_face:
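To make the "permission bits only" suggestion concrete, this short sketch splits the `0o100644` value from the example above into its file-type part and its permission part using the standard `stat` module. The variable names are only for illustration.

```python
import stat

raw_mode = 0o100644                      # full st_mode, as in the example above

file_type = stat.S_IFMT(raw_mode)        # file-type bits (S_IFREG for a regular file)
permissions = stat.S_IMODE(raw_mode)     # permission bits only

is_regular = file_type == stat.S_IFREG
as_octal_string = format(permissions, "04o")   # compact form without the '0o' prefix
```

Storing only `stat.S_IMODE(...)` drops the `S_IFREG` part, which carries no information that needs restoring on checkout.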
> When chmod is applied on file checkout on Windows, the mode probably cannot be set, as mentioned in
> Should dvc output some message in this case?
Great point! Another option is to mimic what git does and just track the user exec bit (maybe even rename the mode). Seems like that should suffice for @mgkwill too. WDYT?
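A rough sketch of the git-like alternative, assuming only the user exec bit is tracked (git itself records regular files as either 100644 or 100755). `is_user_executable` and `set_user_executable` are hypothetical names, not part of DVC; all other permission bits are left as they are.

```python
import os
import stat
import tempfile


def is_user_executable(path):
    """Return True if the owner exec bit is set on `path`."""
    return bool(os.stat(path).st_mode & stat.S_IXUSR)


def set_user_executable(path, executable):
    """Set or clear only the owner exec bit, keeping the other bits untouched."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if executable:
        mode |= stat.S_IXUSR
    else:
        mode &= ~stat.S_IXUSR
    os.chmod(path, mode)


# Demonstration on a temporary file (POSIX semantics assumed).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "script.sh")
    open(path, "w").close()
    os.chmod(path, 0o640)

    before = is_user_executable(path)
    set_user_executable(path, True)
    after = is_user_executable(path)
    final_mode = stat.S_IMODE(os.stat(path).st_mode)
```

With this approach the stored value could be a single boolean rather than a full mode, which also sidesteps the notation question.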