Dvc: Allow pipeline to create pipeline

Created on 15 Jul 2020 · 4 Comments · Source: iterative/dvc

As #3549 says, "A run of a pipeline is like a function call". By that analogy, we currently cannot call a function from another function: f(g(x)) doesn't make sense in the current DVC model. As discussed in #331, allowing pipelines to be created dynamically would enable very flexible workflows.

Some potential applications:

Incremental processing of dynamically generated files #331
Reconfigurable pipelines #1462
Build matrix #1018
Hyperparameters tuning #2799

A possible interface could be:

dvc run could specify a YAML file to use instead of dvc.yaml.
We allow that file to be the output of another pipeline step.
Then dvc run -n "create_pipe" -o generated_pipe.yml generate_pipe.sh would create the generated_pipe.yml file, and dvc repro -c generated_pipe.yml would execute all the dynamically defined steps.
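
To make the proposal concrete, here is a minimal Python sketch of what the generator step might look like. It stands in for generate_pipe.sh; the helper names (render_pipeline, build_matrix), the train.py script, and the data and model file names are all hypothetical. It only emits the cmd, deps and outs keys of a stage, and dvc repro -c remains a proposed flag here, not an existing one.

```python
from typing import Dict, List


def render_pipeline(stages: Dict[str, Dict[str, object]]) -> str:
    """Render a stages mapping as dvc.yaml-style text (cmd, deps, outs only)."""
    lines = ["stages:"]
    for name, spec in stages.items():
        lines.append(f"  {name}:")
        lines.append(f"    cmd: {spec['cmd']}")
        for key in ("deps", "outs"):
            items = spec.get(key, [])
            if items:
                lines.append(f"    {key}:")
                lines.extend(f"      - {item}" for item in items)
    return "\n".join(lines) + "\n"


def build_matrix(learning_rates: List[float]) -> Dict[str, Dict[str, object]]:
    """Create one hypothetical training stage per learning rate."""
    stages: Dict[str, Dict[str, object]] = {}
    for lr in learning_rates:
        tag = str(lr).replace(".", "_")  # e.g. 0.01 -> "0_01"
        stages[f"train_lr_{tag}"] = {
            # train.py and data.csv are placeholder names for illustration.
            "cmd": f"python train.py --lr {lr} --out model_{tag}.pkl",
            "deps": ["train.py", "data.csv"],
            "outs": [f"model_{tag}.pkl"],
        }
    return stages


if __name__ == "__main__":
    # Write the generated pipeline file named in the proposal above.
    with open("generated_pipe.yml", "w") as fh:
        fh.write(render_pipeline(build_matrix([0.1, 0.01])))
```

The file written here would be the -o output of the "create_pipe" stage, so the generated stages change whenever the generator or its inputs change.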

feature request p3-nice-to-have research

MatthieuBizien

👀2 ❤1

Most helpful comment

Without the generation overhead, we could introduce a depends_on key in dvc.yaml, but it might not be straightforward,
as we build the pipeline from the relationship between outputs and dependencies, so explicit edges could easily create a cyclic graph.

stages:
  stage1:
    ...
  stage2:
    depends_on: stage1
    ...
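
As a rough illustration of the cyclic-graph concern (a sketch, not how DVC resolves its graph): assume stage edges come from matching one stage's outs to another stage's deps, and that a hypothetical depends_on key adds explicit edges on top. The function name stage_order and the sample file names are made up for this example; graphlib is in the Python standard library from 3.9.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List


def stage_order(stages: Dict[str, dict]) -> List[str]:
    """Return one valid execution order, or raise CycleError on a cycle."""
    # Map each output path to the stage that produces it.
    producer = {
        out: name
        for name, spec in stages.items()
        for out in spec.get("outs", [])
    }
    graph = {}
    for name, spec in stages.items():
        # Implicit edges: a dependency produced by another stage.
        preds = {producer[d] for d in spec.get("deps", []) if d in producer}
        # Explicit edges from the hypothetical depends_on key
        # (accepts a single stage name or a list of names).
        explicit = spec.get("depends_on", [])
        if isinstance(explicit, str):
            explicit = [explicit]
        preds.update(explicit)
        graph[name] = preds
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    # stage1 needs report.txt (made by stage2), yet stage2 depends_on stage1.
    cyclic = {
        "stage1": {"deps": ["report.txt"], "outs": ["data.csv"]},
        "stage2": {"outs": ["report.txt"], "depends_on": "stage1"},
    }
    try:
        stage_order(cyclic)
    except CycleError as err:
        print("cycle detected:", err.args[1])
```

In the cyclic case, the file-based edge and the explicit edge point in opposite directions, which is the kind of conflict a depends_on key would have to reject.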

skshetry on 16 Jul 2020

👍2

All 4 comments

The first implementation is coming in #4734. The next step is #331.

Let's close this issue and move all the discussions to "umbrella" issues #3633 & #331.

dmpetrov on 19 Oct 2020

👍1

Thanks for #4734, it addresses one of the major limitations of DVC! But this issue is about something different and more powerful, so I don't think it should be closed.

MatthieuBizien on 19 Oct 2020

@MatthieuBizien Have you considered the alternative approach that @skshetry suggested? For now we are not sure if we will ever allow dynamic generation like you've suggested, as there are alternatives like parametrization or generating the whole pipeline programmatically.