Dvc: Allow pipeline to create pipeline

Created on 15 Jul 2020 · 4 Comments · Source: iterative/dvc

As #3549 says, "A run of a pipeline is like a function call". Following that analogy, we currently cannot call one function from another: f(g(x)) has no meaning in the current DVC model. As discussed in #331, allowing pipelines to be created dynamically would enable very flexible workflows.

Some potential applications:

  • Incremental processing of dynamically generated files #331
  • Reconfigurable pipelines #1462
  • Build matrix #1018
  • Hyperparameters tuning #2799

A possible interface could be:

  • dvc run could specify a YAML file to use instead of dvc.yaml
  • We allow that file to be the output of another pipeline step
  • Then dvc run -n "create_pipe" -o generated_pipe.yml generate_pipe.sh would create the generated_pipe.yml file, and dvc repro -c generated_pipe.yml would execute all the dynamically defined steps.
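
To make the generation step concrete, here is a minimal Python sketch of what a generator such as generate_pipe.sh might do: emit one stage per input file into generated_pipe.yml. The raw/ and processed/ directories, the process.py script, and the process_<name> stage names are assumptions made up for illustration; only the stages / cmd / deps / outs layout comes from the dvc.yaml format. The YAML is rendered by hand to avoid a third-party dependency, and the dvc repro -c flag above remains a proposal, not an existing option.

```python
from pathlib import Path


def build_stages(input_files):
    """Return a mapping of stage name -> stage definition, one stage per input file."""
    stages = {}
    for path in sorted(input_files):
        name = f"process_{Path(path).stem}"        # hypothetical naming scheme
        out = f"processed/{Path(path).name}"       # hypothetical output location
        stages[name] = {
            "cmd": f"python process.py {path} {out}",  # process.py is hypothetical
            "deps": [path],
            "outs": [out],
        }
    return stages


def render_yaml(stages):
    """Render the stages as YAML text (simple paths only, no quoting or escaping)."""
    lines = ["stages:"]
    for name, stage in stages.items():
        lines.append(f"  {name}:")
        lines.append(f"    cmd: {stage['cmd']}")
        for key in ("deps", "outs"):
            lines.append(f"    {key}:")
            for item in stage[key]:
                lines.append(f"      - {item}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Discover whatever files exist right now and write the generated pipeline.
    files = [p.as_posix() for p in Path("raw").glob("*.csv")]
    Path("generated_pipe.yml").write_text(render_yaml(build_stages(files)))
```

Because the stage list depends on which files exist when the generator runs, the resulting pipeline cannot be written down ahead of time, which is the case this proposal targets.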
Labels: feature request, p3-nice-to-have, research

All 4 comments

Without the generation overhead, we could introduce a depends_on key in dvc.yaml, but it might not be straightforward:
we build the pipeline from the relationships between outputs and dependencies, so an explicit key could easily create a cyclic graph.

stages:
  stage1:
    ...
  stage2:
    depends_on: stage1
    ...
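
The cyclic-graph concern can be illustrated with a small Python sketch using the standard-library graphlib module (Python 3.9+). It treats depends_on as the only source of edges, which is an assumption made for brevity; in practice the edges inferred from outputs and dependencies would be added to the same graph before sorting. The execution_order function is a hypothetical helper, not part of DVC.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(stages):
    """Return stage names in an order that respects depends_on.

    Raises ValueError if the declared dependencies form a cycle.
    """
    graph = {}
    for name, spec in stages.items():
        deps = spec.get("depends_on", [])
        if isinstance(deps, str):   # accept a single stage name, as in the snippet above
            deps = [deps]
        graph[name] = set(deps)     # node -> set of stages that must run first
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # args[1] holds the list of nodes that make up the detected cycle
        raise ValueError(f"cyclic stage dependencies: {exc.args[1]}") from exc
```

With the two stages from the snippet above, execution_order({"stage1": {}, "stage2": {"depends_on": "stage1"}}) yields stage1 before stage2, while two stages that depend on each other raise ValueError.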

The first implementation is coming in #4734. The next step is #331.

Let's close this issue and move all the discussions to "umbrella" issues #3633 & #331.

Thanks for #4734, it addresses one of the major limitations of DVC! But this issue is about something different and more powerful, so I don't think it should be closed.

@MatthieuBizien Have you considered the alternative approach that @skshetry suggested? For now we are not sure if we will ever allow dynamic generation like you've suggested, as there are alternatives like parametrization or generating the whole pipeline programmatically.
