Why doesn't DVC support subdirectories in stage file names? In some cases, users would prefer to keep all dvc files in a separate dir or in the same dir as the outputs. It is not convenient to use the --cwd option all the time and manually redefine paths relative to the outputs.
The implementation should be straightforward - just redefine all inputs and outputs relative to this dir and store them in the dvc file.
$ dvc run -f data/process.dvc -d data/file.txt -o data/out.txt python myprocess.py
Error: failed to run command - stage file name 'data/process.dvc' should not contain subdirectories. Use '-c|--cwd' to change location of the stage file.
This would be great. Another pain point with the current system is that I need to copy the OAuth credentials for our cloud data source from the project root into the data/ directory to make this work. Otherwise my job fails because it can't authenticate.
Or better yet, have the .dvc file by default end up alongside the file it's describing. So
$ dvc run -d data/file.txt -o data/out.txt python myprocess.py
would put both out.txt and out.txt.dvc into data/ by default. This works great for my use case :)
@sursh do you mean that the --cwd data/ option forces you to copy your credentials into data/?
Creating the dvc file in the data/ dir is a good point. Another option might be a single dvc file (like Dvcfile) for multiple stages, which is not implemented yet (#1341). Which option would you prefer as the default behavior?
The implementation should be straightforward - just redefine all inputs and outputs relative to this dir and store them in the dvc file.
The implementation should be straightforward - just redefine all inputs and outputs relative to this dir and store in dvc file.
That won't work, because you also need to redefine paths in your command/script itself. Hence the --cwd option that actually runs your command in that directory, so paths in -d/-o match the ones that you would provide for your cmd. E.g. dvc run -c dir -d input -o output cmd input output.
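A minimal Python sketch of the rebasing idea from the original proposal, assuming the stage file lives at data/process.dvc as in the example at the top of the thread. Rewriting the -d/-o paths relative to the stage file's directory is easy, but the command string is left untouched, so if the command then ran from data/ its own paths (here the script itself) would no longer resolve. The helper name rebase_paths is hypothetical and not part of DVC; posixpath is used so the result is the same on every platform.

```python
import posixpath


def rebase_paths(stage_file, paths):
    """Hypothetical helper: express project-root-relative paths relative
    to the directory that contains the stage file."""
    stage_dir = posixpath.dirname(stage_file) or "."
    return [posixpath.relpath(p, start=stage_dir) for p in paths]


stage_file = "data/process.dvc"
deps = rebase_paths(stage_file, ["data/file.txt"])  # ['file.txt']
outs = rebase_paths(stage_file, ["data/out.txt"])   # ['out.txt']
cmd = "python myprocess.py"  # not rewritten: still assumes the project root as cwd

# Seen from data/, the script would have to be referenced as ../myprocess.py,
# which is exactly the rewrite the rebasing step does not perform on cmd.
print(deps, outs, rebase_paths(stage_file, ["myprocess.py"]))
```

This is why the --cwd option runs the command itself in the target directory instead of only rewriting -d/-o.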
What we really need is cwd: support in our dvc files, so that dvc run would save the directory that cmd is run from, and dvc repro could then use it instead of relying on the dvc file's location as the cwd. The task for that is here: https://github.com/iterative/dvc/issues/1092
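A rough Python sketch of the proposed behaviour, assuming a stage is loaded as a plain dict and that a recorded cwd is stored relative to the project root. The field names and the function repro_cwd are hypothetical illustrations, not DVC's actual stage file format or API: if the stage records a cwd, repro uses it; otherwise it falls back to the stage file's directory, as described above.

```python
import posixpath


def repro_cwd(repo_root, stage_file, stage):
    """Hypothetical: choose the directory a stage's cmd should run in."""
    if "cwd" in stage:
        # Proposed: use the directory recorded by `dvc run` (assumed root-relative).
        return posixpath.normpath(posixpath.join(repo_root, stage["cwd"]))
    # Fallback: the stage file's own location acts as the cwd.
    return posixpath.normpath(
        posixpath.join(repo_root, posixpath.dirname(stage_file))
    )


stage = {
    "cmd": "python myprocess.py",
    "cwd": ".",  # hypothetical field saved at `dvc run` time
    "deps": ["data/file.txt"],
    "outs": ["data/out.txt"],
}
print(repro_cwd("/repo", "data/process.dvc", stage))
```

With the cwd recorded as the project root, a script that opens a root-relative file (such as the .httr-oauth credentials mentioned in this thread) would keep working even though the stage file sits in data/.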
That won't work, because you also need to redefine paths
You are right, it is not as straightforward as it looks. First, we need to implement cwd support in dvc files.
Related #1092
Yes, a Dvcfile by default, with the ability to redefine the output file via -f, could be a good option, especially with a multi-command Dvcfile. However, that is planned for 1.0 and might take time to release.
But it might be worth implementing -f with subdirectories and cwd-in-dvc-file now, since that is compatible with 1.0.
To clarify the credentials question: my script needs OAuth credentials, which are stored in .httr-oauth in the project root. When I run dvc run with a -f data/ argument, my script can no longer find .httr-oauth because it's not in /data. So I had to copy .httr-oauth into data/ to get the script to work.
Hi @sursh !
We've released this feature in 0.30.0. Please feel free to upgrade and give it a try :slightly_smiling_face: Would love to hear about your experience with it!
Thanks,
Ruslan