As #3549 says, "A run of a pipeline is like a function call". Following that analogy, we currently can't call a function from another function: f(g(x)) doesn't make any sense in the current DVC model. As discussed in #331, supporting this would enable very flexible workflows by allowing pipelines to be created dynamically.
Some potential applications:
A possible interface could be:
- dvc run could specify a YAML file to use, instead of dvc.yaml.
- dvc run -n "create_pipe" -o generated_pipe.yml generate_pipe.sh would create the generated_pipe.yml file, and dvc repro -c generated_pipe.yml would execute all the dynamically defined steps.

Without the generation overhead, we could introduce a depends_on key in dvc.yaml, but it might not be straightforward, as we build the pipeline from the relationships between outputs and dependencies, which might easily create a cyclic graph.
stages:
  stage1:
    ...
  stage2:
    depends_on: stage1
    ...
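To make the cycle concern concrete, here is a minimal sketch (not DVC's actual implementation) of how explicit depends_on edges could be merged with the edges implied by outputs and dependencies, and then checked for cycles. The depends_on key is the hypothetical one proposed above; build_graph and execution_order are illustrative names, and the stage dicts are assumed to mirror the deps/outs lists of dvc.yaml.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+


def build_graph(stages):
    """Map each stage name to the set of stages that must run before it."""
    # Record which stage produces each output path.
    producers = {
        out: name
        for name, spec in stages.items()
        for out in spec.get("outs", [])
    }
    graph = {}
    for name, spec in stages.items():
        upstream = set()
        # Implicit edges: a dependency that is another stage's output.
        for dep in spec.get("deps", []):
            if dep in producers and producers[dep] != name:
                upstream.add(producers[dep])
        # Explicit edges from the hypothetical depends_on key.
        explicit = spec.get("depends_on", [])
        if isinstance(explicit, str):
            explicit = [explicit]
        upstream.update(explicit)
        graph[name] = upstream
    return graph


def execution_order(stages):
    """Return a valid run order, or raise ValueError if the graph is cyclic."""
    try:
        return list(TopologicalSorter(build_graph(stages)).static_order())
    except CycleError as exc:
        # exc.args[1] holds the list of nodes forming the cycle.
        raise ValueError(f"cyclic pipeline: {exc.args[1]}") from exc
```

With this sketch, a stage2 that depends_on stage1 is ordered after it, while a stage1 that also consumes one of stage2's outputs is rejected, which is the kind of conflict between the two edge sources that the comment warns about.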
The first implementation is coming in #4734. The next step is #331.
Let's close this issue and move all the discussions to "umbrella" issues #3633 & #331.
Thanks for #4734, it addresses one of the major limitations of DVC! But this issue is about something different and more powerful, so I don't think it should be closed.
@MatthieuBizien Have you considered the alternative approach that @skshetry suggested? For now we are not sure if we will ever allow dynamic generation like you've suggested, as there are alternatives like parametrization or generating the whole pipeline programmatically.
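For reference, the "generate the whole pipeline programmatically" alternative could look roughly like the sketch below. It builds one stage per dataset and renders a dvc.yaml-style document using the real cmd, deps and outs stage fields. Everything else is an assumption made for illustration: the train.py script, the data/ and models/ paths, and the hand-rolled renderer, which does no YAML quoting or escaping (a real script would more likely use a YAML library).

```python
def make_stages(datasets):
    """Build one training stage per dataset (script and paths are hypothetical)."""
    stages = {}
    for ds in datasets:
        stages[f"train_{ds}"] = {
            "cmd": f"python train.py data/{ds}.csv models/{ds}.pkl",
            "deps": ["train.py", f"data/{ds}.csv"],
            "outs": [f"models/{ds}.pkl"],
        }
    return stages


def render_dvc_yaml(stages):
    """Render stages as dvc.yaml text; assumes values need no YAML quoting."""
    lines = ["stages:"]
    for name, spec in stages.items():
        lines.append(f"  {name}:")
        lines.append(f"    cmd: {spec['cmd']}")
        for key in ("deps", "outs"):
            items = spec.get(key, [])
            if items:
                lines.append(f"    {key}:")
                lines.extend(f"      - {item}" for item in items)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the generated pipeline; redirect to dvc.yaml to use it.
    print(render_dvc_yaml(make_stages(["a", "b"])), end="")
```

The generated text can be written to dvc.yaml and run with dvc repro. The trade-off is that the generator runs outside DVC, so regenerating the file when the dataset list changes remains the user's responsibility.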