When I run dvc status I can see all the stages that "need" to be reproduced, and I would like to reproduce (some of) them with a single command. Ideally, it would look something like this:
$ dvc repro -A -b -i # (-A for all, -i for interactive, -b for batch)
Stage A changed, should it be reproduced? (Y/N) N
OK, not reproducing A.
Stage B changed, should it be reproduced? (Y/N) Y
OK, B will be reproduced.
Stage C changed, should it be reproduced? (Y/N) Y
OK, C will be reproduced.
No more stages, would you like to change your answers? (Y/N) N
OK, Starting to reproduce:
Stage B....
Stage C....
The idea is that I do not want to babysit the computer while it runs through its steps. I want to be able to reproduce all the metric files, but I might not want to retrain the model, so I need a way of stating upfront which stages should be processed and which should not.
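To make the request concrete, here is a minimal sketch of the "answer everything first, run later" prompt loop from the mock-up above. It is not DVC code: `choose_stages` and its `ask` parameter are hypothetical names, and the list of changed stages is assumed to come from something like dvc status.

```python
from typing import Callable, Iterable, List


def choose_stages(changed: Iterable[str],
                  ask: Callable[[str], str] = input) -> List[str]:
    """Ask about every changed stage up front; return the ones to reproduce.

    Hypothetical helper: `ask` defaults to input() so that it can be
    replaced with canned answers when no terminal is available.
    """
    stages = list(changed)  # keep a copy so a second round can iterate again
    while True:
        selected = []
        for stage in stages:
            answer = ask(f"Stage {stage} changed, should it be reproduced? (Y/N) ")
            if answer.strip().upper() == "Y":
                selected.append(stage)
        again = ask("No more stages, would you like to change your answers? (Y/N) ")
        if again.strip().upper() != "Y":
            return selected
```

Only after this function returns would the selected stages be reproduced, so nobody has to sit at the keyboard while the pipeline runs.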
What priority do you think this is @efiop @shcheklein ? I would guess p3.
@jorgeorpinel Thanks for the heads up! 🙂
@yfarjoun Thinking about this again, I need to point out that we don't really know in advance whether something has changed. For example, say you have a.dvc -> b.dvc, where b.dvc has a (the output of a.dvc) as a dependency. If a doesn't change when a.dvc is reproduced, then b.dvc won't be reproduced; if a does change, then b.dvc will be. This is not a big problem, as we could simply ask about b.dvc as well, but we should keep it in mind.
Right, good point. I guess the assumption could be that once a step is reproduced, its dependents will need to be reproduced too, so the system should ask about them. If the answer is yes but the results then show no reason to reproduce, that stage should be skipped.
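A rough sketch of that rule, assuming the pipeline is available as a mapping from each stage to its direct dependents. The names `stages_to_ask` and `should_run` are made up for illustration and are not part of DVC.

```python
from collections import deque
from typing import Dict, List, Set


def stages_to_ask(changed: List[str],
                  dependents: Dict[str, List[str]]) -> List[str]:
    """Return the changed stages plus everything downstream of them, each once."""
    seen: Set[str] = set()
    ordered: List[str] = []
    queue = deque(changed)
    while queue:
        stage = queue.popleft()
        if stage in seen:
            continue
        seen.add(stage)
        ordered.append(stage)
        # Dependents might become stale once this stage is reproduced.
        queue.extend(dependents.get(stage, []))
    return ordered


def should_run(stage: str, approved: Set[str], stale: Set[str]) -> bool:
    """Run a stage only if the user approved it AND it is stale at run time."""
    return stage in approved and stage in stale
```

With a.dvc -> b.dvc, the user would be asked about both stages up front, and b.dvc would still be skipped at run time if reproducing a.dvc left a unchanged.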
@yfarjoun @efiop can it be done with dvc lock and/or other commands/options like --downstream, etc?
@shcheklein Not sure what you mean.
@efiop I meant that there are options that help you isolate the repro scope. I understand that it's probably not the same as being able to arbitrarily specify which stages to reproduce and which not, but I was thinking maybe it's enough in most cases? @yfarjoun could you provide a real-life example or two, please? Just to better understand how it can be solved.
In theory, I think that it can be done with dvc lock and --downstream, but the problem is that it requires a level of understanding of the pipeline that can only be gained after running in dry-run mode or something like that. The point is that if I were to run dvc repro --downstream --dry-run (assuming there is such a thing), look at the resulting list of steps, dvc lock the ones I do not want reprocessed, rerun dvc repro --downstream without --dry-run, and then dvc unlock the steps I locked earlier, I'd get more or less what I am looking for, albeit with a lot more manual steps.
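The manual workaround described above could be scripted roughly as follows. This is only a sketch: it uses the commands named in this thread (dvc lock, dvc repro --downstream, dvc unlock), the `repro_with_locks` wrapper and the stage name in the usage note are hypothetical, and the dry-run step is left out because its exact flag is not confirmed here.

```python
import subprocess
from typing import Callable, List, Sequence


def _run(cmd: List[str]) -> None:
    """Execute one command and raise if it exits with a non-zero status."""
    subprocess.run(cmd, check=True)


def repro_with_locks(skip: Sequence[str],
                     run: Callable[[List[str]], None] = _run) -> None:
    """Lock the stages to skip, reproduce, then unlock them again."""
    locked: List[str] = []
    try:
        for stage in skip:
            run(["dvc", "lock", stage])
            locked.append(stage)
        run(["dvc", "repro", "--downstream"])
    finally:
        # Unlock even if repro fails, so no stage is left locked by accident.
        for stage in locked:
            run(["dvc", "unlock", stage])
```

Calling repro_with_locks(["train.dvc"]) would skip a (hypothetical) training stage while everything else is reproduced. Deciding which stages to put in `skip` is still the part that needs the dry-run listing.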