Dvc: repro -f should still honour dependencies

Created on 31 Mar 2020 · 10 Comments · Source: iterative/dvc

DVC 0.91.0, installed with pipx on Ubuntu 18.04 with Python 3.6.9

The repro command seems to ignore dependency order when multiple targets are specified. For example, if I'm doing transfer learning from model A to model B, then training stage B depends on A:

01-prep-a            01-prep-b
    *    ****            *
    *        ****        *
    *            ****    *
02-train-a ********* 02-train-b

When reproducing just the models, I would like to run:

dvc repro -f --downstream 02-train-*

But sometimes, this causes 02-train-B to run before 02-train-A. Perhaps it depends on the order that results when the shell expands the wildcard.

I think inter-stage dependencies should be resolved by DVC when multiple targets are given.
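A minimal sketch of the requested behaviour, not DVC's actual code: whatever order the targets arrive in, sort them so that each stage comes after the stages it depends on. The `DEPS` map and the `order_targets` helper are hypothetical names, and the edges in `DEPS` are an assumed reading of the diagram above.

```python
# Hypothetical dependency map: stage -> stages it depends on.
# The edges are an assumption for illustration, not DVC's internal graph.
DEPS = {
    "01-prep-A": [],
    "01-prep-B": [],
    "02-train-A": ["01-prep-A"],
    "02-train-B": ["01-prep-B", "02-train-A"],
}


def order_targets(targets, deps):
    """Return the targets sorted so dependencies always come first.

    Assumes the graph is acyclic and every target is a key in `deps`.
    """
    order = []   # full topological order, dependencies first
    seen = set()

    def visit(stage):
        if stage in seen:
            return
        seen.add(stage)
        for dep in deps.get(stage, []):
            visit(dep)
        order.append(stage)  # appended only after all of its dependencies

    for stage in sorted(deps):
        visit(stage)

    rank = {stage: i for i, stage in enumerate(order)}
    return sorted(set(targets), key=rank.__getitem__)


# The order given on the command line no longer matters.
ordered = order_targets(["02-train-B", "02-train-A"], DEPS)
print(ordered)
```

With this ordering step, `02-train-A` is placed before `02-train-B` regardless of how the shell expanded the wildcard.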

feature request help wanted p2-medium

All 10 comments

Hi @z0u !

So A and B somehow depend on each other, right? In that case, it seems it won't make a difference in the -f case, as all of them and their dependencies will be reproduced anyway. Or do you mean that it would be nice to avoid reproducing common deps multiple times in that scenario?

Hi @efiop ,

Sorry for not being clear about the dependencies. Yep, B depends on A. So what I expect to happen is: A always executes before B.

|Command|Effect|
|---|---|
|dvc repro|Runs 01-prep-A, 01-prep-B, 02-train-A, 02-train-B|
|dvc repro -f 02-train-A|Runs 01-prep-A, 01-prep-B, 02-train-A|
|dvc repro -f --downstream 02-train-A|Runs 02-train-A, 02-train-B|
|dvc repro -f --downstream 02-train-B 02-train-A|Runs 02-train-A, 02-train-B|
|dvc repro -sf 02-train-*|Runs 02-train-A, 02-train-B|

Got it @z0u . So to implement it we will need to pass all targets in https://github.com/iterative/dvc/blob/0.91.3/dvc/command/repro.py#L41 and then properly collect them and put them into the common stages to run in https://github.com/iterative/dvc/blob/0.91.3/dvc/repo/reproduce.py#L53 . Unfortunately we will not be able to get to implementing this right away ourselves, but we would definitely help out if anyone else is interested in contributing a PR :slightly_smiling_face:
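The idea described above (collect all targets first, then build one common list of stages to run) could be sketched roughly as follows for the `--downstream` case. This is an illustration under assumptions, not the code in `dvc/repo/reproduce.py`: `DEPS`, `downstream_plan`, and the edges of the graph are all hypothetical.

```python
from collections import deque

# Hypothetical dependency map: stage -> stages it depends on (assumed edges).
DEPS = {
    "01-prep-A": [],
    "01-prep-B": [],
    "02-train-A": ["01-prep-A"],
    "02-train-B": ["01-prep-B", "02-train-A"],
}


def downstream_plan(targets, deps):
    """Return one ordered run list covering all targets and their downstream stages.

    Assumes an acyclic graph in which each dependency is listed once per stage.
    """
    # Invert the map: stage -> stages that consume its outputs.
    consumers = {stage: [] for stage in deps}
    for stage, parents in deps.items():
        for parent in parents:
            consumers[parent].append(stage)

    # Collect every target plus everything reachable downstream of it.
    selected = set()
    queue = deque(targets)
    while queue:
        stage = queue.popleft()
        if stage not in selected:
            selected.add(stage)
            queue.extend(consumers[stage])

    # Kahn's algorithm restricted to the selected stages:
    # a stage becomes ready once all of its selected dependencies have run.
    pending = {s: sum(p in selected for p in deps[s]) for s in selected}
    ready = sorted(s for s, count in pending.items() if count == 0)
    plan = []
    while ready:
        stage = ready.pop(0)
        plan.append(stage)
        for child in sorted(consumers[stage]):
            if child in selected:
                pending[child] -= 1
                if pending[child] == 0:
                    ready.append(child)
    return plan


plan = downstream_plan(["02-train-B", "02-train-A"], DEPS)
print(plan)
```

Because the targets are merged into a single set before ordering, each stage appears once in the plan, and the order of the command-line arguments has no effect.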

I don't think the issue is reproducible in the latest releases of dvc (at least in commit 0275a0d8).

If the stages are stored in dvc.yaml, then the output of dvc repro -f --downstream 02-train-* is:

ERROR: "Stage '02-train-*' not found inside 'dvc.yaml' file"

If the stages are stored the old-fashioned way (i.e. dvc files with no dvc.yaml file), then the output of dvc repro -f --downstream 02-train-* is:

ERROR: 'dvc.yaml' does not exist.

P.S.: it seems that it is impossible to use wildcards with dvc repro in the latest releases. Not sure if it is a bug though.

@nik123

dvc itself doesn't evaluate the wildcards and never did - your shell does that and then passes it to dvc. What shell do you use? And do you have our completion scripts installed? If so, could you check that you have the latest ones?

@efiop , I use the standard Ubuntu interpreter (which is bash, AFAIK). bash --version gives me: 4.4.20(1)-release (x86_64-pc-linux-gnu)

I've reinstalled the latest version (1.1.1) via pip (pip install dvc) and reran dvc repro -f --downstream 02-train-*. Now I get the same error with the dvc.yaml file, but the error has disappeared with old-fashioned dvc files.

@nik123, why are you moving back-and-forth with different formats?

Also, when you are using 02-train-*, there are no .dvc files, so it's complaining that no file exists matching that glob. dvc repro cannot understand globs right now. When you had .dvc files, the glob was being interpreted by your shell and the expanded version was being sent to dvc. Now that you don't have any files matching the glob, it is sent as-is to dvc.
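The shell behaviour described above can be imitated in a few lines. This is only an approximation of bash's default globbing (no `nullglob` or `failglob`, sorting by code point rather than locale), and the stage file names `02-train-A.dvc` and `02-train-B.dvc` are assumed for illustration.

```python
import fnmatch
import os
import tempfile


def bash_like_expand(pattern, directory):
    """Approximate bash's default: sorted matching names, or the literal pattern."""
    names = sorted(fnmatch.filter(os.listdir(directory), pattern))
    return names if names else [pattern]


# Old layout: one .dvc file per stage, so the glob matches real files.
with tempfile.TemporaryDirectory() as old_layout:
    for name in ("02-train-A.dvc", "02-train-B.dvc"):
        open(os.path.join(old_layout, name), "w").close()
    with_dvc_files = bash_like_expand("02-train-*", old_layout)

# New layout: only dvc.yaml exists, so nothing matches and the
# pattern reaches the program unchanged.
with tempfile.TemporaryDirectory() as new_layout:
    open(os.path.join(new_layout, "dvc.yaml"), "w").close()
    with_dvc_yaml = bash_like_expand("02-train-*", new_layout)

print(with_dvc_files)
print(with_dvc_yaml)
```

In the second case the program receives the literal string `02-train-*` as its target, which is consistent with the "Stage '02-train-*' not found" error quoted earlier in the thread.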

> @nik123, why are you moving back-and-forth with different formats?

Once I failed to reproduce the issue with new dvc.yaml format, I've tried to reproduce it with dvc-files. That's the only reason.

> Also, when you are using 02-train-*, there are no .dvc files, so it's complaining that no file exists matching that glob. dvc repro cannot understand globs right now. When you had .dvc files, the glob was being interpreted by your shell and the expanded version was being sent to dvc. Now that you don't have any files matching the glob, it is sent as-is to dvc.

@skshetry I've checked it with debugger and it seems you are absolutely right. Thanks for the help!

P.S.: I don't know what the plans are for old format support, but it will probably be dropped at some point. That means dvc repro -f --downstream 02-train-* probably won't be a valid command, which raises the question of whether this issue should be fixed at all.

> P.S.: I don't know what the plans are for old format support, but it will probably be dropped at some point. That means dvc repro -f --downstream 02-train-* probably won't be a valid command, which raises the question of whether this issue should be fixed at all.

@nik123 It will be valid with our completion scripts, we just didn't get to working on that feature yet. https://github.com/iterative/dvc/issues/3743

Hi! Thanks for continuing to look into this, but I think the current discussion misses the point of this issue. The wildcard was just a concise way to describe the problem. The point is: 02-train-B should not run before 02-train-A if both are specified. The corresponding example in the new syntax is (I think; untested):

dvc repro -f --downstream 02-train-B 02-train-A

Which should execute 02-train-A and then 02-train-B.
