Airflow: Execute SubDAG tasks as part of parent DAG

Created on 2 Apr 2020  路  15Comments  路  Source: apache/airflow

Description

Currently, the SubDagOperator launches a completely different DAG and then monitors it as a separate entity. This has lead to all sorts of edge case (e.g. when workers have different executors than the scheduler).

This Issue suggests that we instead merge all tasks from the subdag into the original DAG with an extra "original_dag" label. this means at the UI level we can still condense all subdag tasks into a single task, but a single scheduler will still perform all task launching.

Use case / motivation

We want SubDag tasks to act exactly the same as non-subdag tasks

Related Issues

Roadmap feature

Most helpful comment

@xinbinhuang SGTM! I'm gonna play around with a Draft PR just to see what I can jerry-rig, but it will be more based around "what CAN we do" than "what WILL we do"

All 15 comments

@xinbinhuang

+1

I think it's totally reasonable to say that SubDag separation should be more of a UI feature than an execution feature, so I think if we can identify which tasks are "subdag" tasks, we can condense them in the UI.

@kaxil does this mess with DAG serialization? Would this subdag task adding happen at DAG parsing time?

Yeah, totally agree. I don't think the added complexity with nested dags gives more values than errors.

I actually have a more simplistic thought. Instead of returning a DAG from the dag factory, we can just ask the dag factory to return a list of tasks. While the SubDagOperator (this become more like a SubTasksOperator in this case) can be just used to add an annotation to that group of tasks that will be later used to render the DAG graph in the UI.

Yeah, totally agree. I don't think the added complexity with nested dags gives more values than errors.

I actually have a more simplistic thought. Instead of returning a DAG from the dag factory, we can just ask the dag factory to return a list of tasks. While the SubDagOperator (this become more like a SubTasksOperator in this case) can be just used to add an annotation to that group of tasks that will be later used to render the DAG graph in the UI.

I like the idea.

@kaxil does this mess with DAG serialization? Would this subdag task adding happen at DAG parsing time?

No that should work fine

@kaxil @dimberman Do we need an AIP for this? This seems to change the whole behavior of the SubDagOperator or even renaming/removing it.

@kaxil @dimberman Do we need an AIP for this? This seems to change the whole behavior of the SubDagOperator or even renaming/removing it.

Yes, please. This definitely needs voting and some design discussion. Can you please create an AIP and start a discussion thread once you have that AIP doc ready.

@kaxil @dimberman Do we need an AIP for this? This seems to change the whole behavior of the SubDagOperator or even renaming/removing it.

Yes, please. This definitely needs voting and some design discussion. Can you please create an AIP and start a discussion thread once you have that AIP doc ready.

Sounds good. I will draft an AIP within this week and open a thread for discussion.

@xinbinhuang SGTM! I'm gonna play around with a Draft PR just to see what I can jerry-rig, but it will be more based around "what CAN we do" than "what WILL we do"

@kaxil I think I need permission to create an AIP at Confluence. Can you give me permission?

Done

@kaxil @turbaszek @dimberman I found this PR https://github.com/apache/airflow/pull/5498 when I was working on the AIP draft. It fixed the SubDagOperator by using the scheduler instead of backfill, so SubDags will use the same executor as the parent dag. I guess this solves the biggest problem of SubDagOperator.

It seems to me that the problem has been fixed? I wonder if it is still useful for the suggested change and proceed to finish the AIP draft?

It seems to me that the problem has been fixed? I wonder if it is still useful for the suggested change and proceed to finish the AIP draft?

I think I will proceed with finishing the AIP and then let the community decide. I think it makes more sense to treat subdag as in subgraph to graph rather than a vertex as right now.

It seems to me that the problem has been fixed? I wonder if it is still useful for the suggested change and proceed to finish the AIP draft?

I think I will proceed with finishing the AIP and then let the community decide. I think it makes more sense to treat subdag as in subgraph to graph rather than a vertex as right now.

Yes, please do 馃憤

Was this page helpful?
0 / 5 - 0 ratings

Related issues

jdavidheiser picture jdavidheiser  路  3Comments

JavierLopezT picture JavierLopezT  路  4Comments

mik-laj picture mik-laj  路  4Comments

zacwellmer picture zacwellmer  路  4Comments

darwinyip picture darwinyip  路  3Comments