DVC: Overlapping outs paths

Created on 19 Dec 2019 · 4 Comments · Source: iterative/dvc

I'm getting quite a confusing error when trying to call dvc run:

ERROR: failed to run command - Paths for outs:                          
'data'('data.dvc')
'data/prepared/prepared_data.csv'('data_preparation.dvc')
overlap. To avoid unpredictable behaviour, rerun command with non overlapping outs paths.


Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!

Here are the steps to reproduce the issue:

  1. Create a new directory myproj, then call git init and dvc init inside of it
  2. Create a new directory data and put some data in it. The structure of my data directory is as follows:
data
|___ raw
        |___ dir1
                |___ file1
                ...
                |___ fileN
        |___ dir2
                |___ file1
                ...
                |___ fileN
  3. Run dvc add data to track the directory using dvc
  4. Create a new file called myscript.py within the myproj directory which does some preprocessing of the data
  5. Run the following command:
dvc run -d data/raw -d myscript.py -o data/prepared/prepared_data.csv -f data_preparation.dvc python3 myscript.py
  6. Bump into the error message that I've mentioned

DVC version (i.e. dvc --version): 0.77.3

Platform: KDE neon 5.17

Method of installation: DEB(Linux)


All 4 comments

Hi @anferico ! This is the expected behavior. When you dvc add data, DVC tracks that whole directory as one entity, so you can't use another DVC-file to output anything inside of it, as that would break reproducibility and checkout in general.
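
To make the notion of "overlapping outs" concrete, here is a minimal sketch of such a check. It is an illustration only, not DVC's actual implementation; the helper name overlaps is hypothetical.

```python
from pathlib import PurePosixPath


def overlaps(a: str, b: str) -> bool:
    """Return True if the two paths are equal or one contains the other."""
    pa, pb = PurePosixPath(a), PurePosixPath(b)
    return pa == pb or pa in pb.parents or pb in pa.parents


# 'data' (tracked by data.dvc) contains the new output, so the two overlap.
print(overlaps("data", "data/prepared/prepared_data.csv"))
# 'data/raw' and the new output are siblings under 'data', so they do not.
print(overlaps("data/raw", "data/prepared/prepared_data.csv"))
```

Under this reading, tracking data/raw instead of data removes the overlap, because neither path is an ancestor of the other.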

Hi @anferico ! Good question! As @efiop mentioned, there is indeed nothing wrong with DVC's behavior. I think there is some confusion in your workflow (and maybe we don't communicate it well enough somewhere in our docs). In your case, what you want is to run:

dvc add data/raw

and then:

dvc run -d data/raw -d myscript.py -o data/prepared/prepared_data.csv -f data_preparation.dvc python3 myscript.py

(you may need to create the data/prepared directory first)

It might be that you actually can do this:

dvc run -d data/raw -d myscript.py -o data/prepared -f data_preparation.dvc python3 myscript.py

Thank you @efiop and @shcheklein, it is actually me who hasn't read the documentation thoroughly.
Concerning the second solution pointed out by @shcheklein, if I specified -o data/prepared, I wouldn't be able to output any other file inside data/prepared through subsequent calls to dvc run, correct?

@anferico no worries :) glad that it helped.

Yes, if you do -o data/prepared then it expects your script to write the _full content_ of that directory every time you run it. In fact, it'll remove the previous version to ensure that old and new outputs are not mixed and the result is actually reproducible.

But to some extent it's the same as with -o data/prepared/prepared_data.csv - every time you run dvc run it'll be removing prepared_data.csv (it's not data loss since it is saved in cache anyway) and expecting your script to write it again.
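
As a sketch of what that means for the script: the original myscript.py was never shown in this thread, so the version below is purely hypothetical. It assumes the script only needs to write a CSV listing the files under data/raw. The point is that it recreates its output directory and rewrites the output file from scratch on every run, without relying on anything left over from a previous run.

```python
import csv
from pathlib import Path


def prepare(raw_dir: str, out_file: str) -> int:
    """Write a CSV listing every file under raw_dir; return the row count."""
    raw = Path(raw_dir)
    out = Path(out_file)
    # The output may have been removed before the run, so recreate its directory.
    out.parent.mkdir(parents=True, exist_ok=True)
    rows = sorted(
        (p.relative_to(raw).as_posix(), p.stat().st_size)
        for p in raw.rglob("*")
        if p.is_file()
    )
    # Mode "w" truncates, so the file is always written in full.
    with out.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["path", "size_bytes"])
        writer.writerows(rows)
    return len(rows)


if __name__ == "__main__":
    # Paths taken from the dvc run command in this thread.
    if Path("data/raw").is_dir():
        prepare("data/raw", "data/prepared/prepared_data.csv")
```

With -o data/prepared the same idea applies to the whole directory: every file that should exist in data/prepared has to be written by the script on each run.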

Does that work for you, or do you have different requirements in mind?

(I'm closing this since it's not an issue, but let's keep the discussion going)
