Say I have the following structure:
script.py
module/submodule/foo.py
Then:
dvc run script.py -d module -o bar.npy
If I modify foo.py, dvc repro does not notice. The reason is that neither the inode nor the mtime of the module directory changes when foo.py is modified.
For now it is possible to add all the files in module as dependencies, but it would surely be better to have a way to handle folders recursively.
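The mtime point can be checked directly, and it suggests the fix: hash a directory's contents recursively instead of trusting the directory's own metadata. The sketch below is a minimal illustration of that idea, not DVC's actual code. `dir_md5` and `demo` are hypothetical names, and the hashing scheme (MD5 over the sorted relative paths plus each file's MD5) is just one reasonable choice.

```python
import hashlib
import os
import tempfile
import time


def dir_md5(root: str) -> str:
    """Hash every file under `root`, recursively, into a single digest.

    Relative paths are sorted so the result does not depend on os.walk order,
    and each path is hashed together with its contents so renames count too.
    """
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            entries.append((os.path.relpath(full, root), full))

    digest = hashlib.md5()
    for rel, full in sorted(entries):
        digest.update(rel.encode("utf-8"))
        with open(full, "rb") as fh:
            digest.update(hashlib.md5(fh.read()).digest())
    return digest.hexdigest()


def demo() -> tuple[bool, bool]:
    """Recreate module/submodule/foo.py, edit foo.py, compare mtime vs. hash."""
    with tempfile.TemporaryDirectory() as tmp:
        module = os.path.join(tmp, "module")
        foo = os.path.join(module, "submodule", "foo.py")
        os.makedirs(os.path.dirname(foo))
        with open(foo, "w") as fh:
            fh.write("x = 1\n")

        mtime_before = os.stat(module).st_mtime_ns
        hash_before = dir_md5(module)

        time.sleep(0.01)  # leave room for a timestamp change to show up
        with open(foo, "w") as fh:  # edit the nested file in place
            fh.write("x = 2\n")

        mtime_unchanged = os.stat(module).st_mtime_ns == mtime_before
        hash_changed = dir_md5(module) != hash_before
        return mtime_unchanged, hash_changed


if __name__ == "__main__":
    # On typical POSIX filesystems: (True, True), i.e. the top-level mtime
    # stays the same while the recursive hash notices the edit.
    print(demo())
```

Editing a file's contents only touches that file's own metadata; a directory's mtime changes when entries are added, removed or renamed in it, and even then only for the immediate parent. That is why a content hash over the whole tree is the more reliable signal.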
Hi @tdeboissiere !
Great point! Preparing a patch right now.
Thanks,
Ruslan
Hi @tdeboissiere !
The patch is merged, I'm releasing 0.18.6 with it right now. Should be ready in an hour or so. I'll let you know when it is out. Thank you so much for the feedback!
@tdeboissiere 0.18.6 is out :slightly_smiling_face: Please feel free to upgrade and give it a try.
Thanks,
Ruslan
Lightning fast as usual; the fix seems to work well!
Two observations:
The same problem seems to exist for outputs. If I have dvc run -d XXX -o output_folder, then dvc repro output_folder.dvc will not reproduce the steps if something within output_folder is changed.
Say I have a pipeline with two stages, and each stage has the same -d folder dependency. In that case, if I change a file in folder which would only be used in the second stage, dvc repro will reproduce the first stage as well because of the -d folder dependency. Solving this problem would require DVC to know in advance all the files which are used at a given stage, which is probably a bit tricky and best left to the user...
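The first observation amounts to saying that a stage's recorded state should cover its outputs as well as its dependencies, and that a directory output should be checked file by file. Below is a minimal sketch of that rule, assuming an in-memory workspace (a dict from file path to contents) in place of the real filesystem. `Workspace`, `path_md5`, `Stage`, `record` and `is_changed` are hypothetical names, not DVC's API.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical in-memory stand-in for the filesystem: file path -> contents.
Workspace = dict[str, bytes]


def path_md5(ws: Workspace, path: str) -> str:
    """Checksum a tracked path: a single file, or every file beneath a directory."""
    digest = hashlib.md5()
    prefix = path.rstrip("/") + "/"
    for name in sorted(ws):
        if name == path or name.startswith(prefix):
            digest.update(name.encode("utf-8"))
            digest.update(hashlib.md5(ws[name]).digest())
    return digest.hexdigest()


@dataclass
class Stage:
    deps: list[str]
    outs: list[str]
    recorded: dict[str, str] = field(default_factory=dict)

    def record(self, ws: Workspace) -> None:
        """Remember checksums of deps and outs, as if the command had just run."""
        self.recorded = {p: path_md5(ws, p) for p in self.deps + self.outs}

    def is_changed(self, ws: Workspace) -> bool:
        """The stage needs reproducing if any dep *or* out no longer matches."""
        return any(path_md5(ws, p) != h for p, h in self.recorded.items())


ws: Workspace = {"XXX": b"input", "output_folder/a.npy": b"result"}
stage = Stage(deps=["XXX"], outs=["output_folder"])
stage.record(ws)
print(stage.is_changed(ws))  # False: nothing has been touched yet

ws["output_folder/a.npy"] = b"edited by hand"
print(stage.is_changed(ws))  # True: a file inside the output directory changed
```

Because the checksum of output_folder covers every file beneath it, a hand edit inside the directory marks the stage as changed, which is the behaviour the report expected from dvc repro output_folder.dvc.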
Hi @tdeboissiere !
> The same problem seems to exist for outputs. If I have dvc run -d XXX -o output_folder, then dvc repro output_folder.dvc will not reproduce the steps if something within output_folder is changed.
Are you experiencing it even with the newer version?
> Say I have a pipeline with two stages, and each stage has the same -d folder dependency. In that case, if I change a file in folder which would only be used in the second stage, dvc repro will reproduce the first stage as well because of the -d folder dependency. Solving this problem would require DVC to know in advance all the files which are used at a given stage, which is probably a bit tricky and best left to the user...
Yes, this is something that should be handled by the user. DVC can only know that you are using something if you explicitly specify it with -d. The scenario you've described can be worked around by splitting the files in the folder appropriately and then specifying the separate and common parts as dependencies of your pipeline stages, i.e. something like:
$ dvc run -d common_dir -d dir_1 ...
$ dvc run -d common_dir -d dir_2 ...
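Why the split layout helps can be seen by modelling each stage as its list of -d paths and asking which stages cover a changed file. This is a simplified sketch with hypothetical names (`is_under`, `stages_to_rerun`); it ignores downstream propagation, so a stage that consumes another stage's outputs would still be rerun after that stage.

```python
def is_under(path: str, dep: str) -> bool:
    """True if `path` is the dependency itself or lives somewhere inside it."""
    dep = dep.rstrip("/")
    return path == dep or path.startswith(dep + "/")


def stages_to_rerun(stages: dict[str, list[str]], changed: str) -> list[str]:
    """Names of stages with at least one -d dependency covering `changed`."""
    return [
        name
        for name, deps in stages.items()
        if any(is_under(changed, d) for d in deps)
    ]


# Both stages depend on the whole folder: any edit in it reruns both.
shared = {"stage1": ["folder"], "stage2": ["folder"]}
print(stages_to_rerun(shared, "folder/only_for_stage2.csv"))  # ['stage1', 'stage2']

# Split layout: common_dir plus one private directory per stage.
split = {"stage1": ["common_dir", "dir_1"], "stage2": ["common_dir", "dir_2"]}
print(stages_to_rerun(split, "dir_2/only_for_stage2.csv"))  # ['stage2']
print(stages_to_rerun(split, "common_dir/shared.csv"))  # ['stage1', 'stage2']
```

Matching on a trailing "/" keeps a path such as dir_10/x from being treated as part of dir_1.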
Thanks,
Ruslan
Hm, I'll investigate that ASAP, seems like a bug. Thank you for reporting the issue!
Reopening the issue to track the progress on the outputs bug.
@tdeboissiere Got it. Was able to reproduce. Preparing a patch right now and will release 0.18.8 in an hour or so. Thank you!
@tdeboissiere 0.18.8 is out. Please feel free to upgrade and see if the issue persists (I've added a test, but it never hurts to confirm that the original issue is indeed fixed :slightly_smiling_face:).
Thanks,
Ruslan
It does work in the use case that caused me to raise the issue, thanks!
Glad it worked for you! One more question:
> Thanks for the tip for my second point! Rather than splitting subdirectories, I think it is simpler if I externally enforce the same git status for all my dependencies at all stages.
Could you please elaborate on what you meant by the git status?
Thanks,
Ruslan
Ah, got it. Thank you for clarifying!