Description
Currently, the SubDagOperator launches a completely different DAG and then monitors it as a separate entity. This has lead to all sorts of edge case (e.g. when workers have different executors than the scheduler).
This Issue suggests that we instead merge all tasks from the subdag into the original DAG with an extra "original_dag" label. this means at the UI level we can still condense all subdag tasks into a single task, but a single scheduler will still perform all task launching.
Use case / motivation
We want SubDag tasks to act exactly the same as non-subdag tasks
Related Issues
@xinbinhuang
+1
I think it's totally reasonable to say that SubDag separation should be more of a UI feature than an execution feature, so I think if we can identify which tasks are "subdag" tasks, we can condense them in the UI.
@kaxil does this mess with DAG serialization? Would this subdag task adding happen at DAG parsing time?
Yeah, totally agree. I don't think the added complexity with nested dags gives more values than errors.
I actually have a more simplistic thought. Instead of returning a DAG from the dag factory, we can just ask the dag factory to return a list of tasks. While the SubDagOperator (this become more like a SubTasksOperator in this case) can be just used to add an annotation to that group of tasks that will be later used to render the DAG graph in the UI.
Yeah, totally agree. I don't think the added complexity with nested dags gives more values than errors.
I actually have a more simplistic thought. Instead of returning a DAG from the
dag factory, we can just ask thedag factoryto return a list of tasks. While theSubDagOperator(this become more like aSubTasksOperatorin this case) can be just used to add an annotation to that group of tasks that will be later used to render the DAG graph in the UI.
I like the idea.
@kaxil does this mess with DAG serialization? Would this subdag task adding happen at DAG parsing time?
No that should work fine
@kaxil @dimberman Do we need an AIP for this? This seems to change the whole behavior of the SubDagOperator or even renaming/removing it.
@kaxil @dimberman Do we need an AIP for this? This seems to change the whole behavior of the
SubDagOperatoror even renaming/removing it.
Yes, please. This definitely needs voting and some design discussion. Can you please create an AIP and start a discussion thread once you have that AIP doc ready.
@kaxil @dimberman Do we need an AIP for this? This seems to change the whole behavior of the
SubDagOperatoror even renaming/removing it.Yes, please. This definitely needs voting and some design discussion. Can you please create an AIP and start a discussion thread once you have that AIP doc ready.
Sounds good. I will draft an AIP within this week and open a thread for discussion.
@xinbinhuang SGTM! I'm gonna play around with a Draft PR just to see what I can jerry-rig, but it will be more based around "what CAN we do" than "what WILL we do"
@kaxil I think I need permission to create an AIP at Confluence. Can you give me permission?
Done
@kaxil @turbaszek @dimberman I found this PR https://github.com/apache/airflow/pull/5498 when I was working on the AIP draft. It fixed the SubDagOperator by using the scheduler instead of backfill, so SubDags will use the same executor as the parent dag. I guess this solves the biggest problem of SubDagOperator.
It seems to me that the problem has been fixed? I wonder if it is still useful for the suggested change and proceed to finish the AIP draft?
It seems to me that the problem has been fixed? I wonder if it is still useful for the suggested change and proceed to finish the AIP draft?
I think I will proceed with finishing the AIP and then let the community decide. I think it makes more sense to treat subdag as in subgraph to graph rather than a vertex as right now.
It seems to me that the problem has been fixed? I wonder if it is still useful for the suggested change and proceed to finish the AIP draft?
I think I will proceed with finishing the AIP and then let the community decide. I think it makes more sense to treat
subdagas insubgraph to graphrather than a vertex as right now.
Yes, please do 馃憤
Closing in favour of https://github.com/apache/airflow/pull/10153
Most helpful comment
@xinbinhuang SGTM! I'm gonna play around with a Draft PR just to see what I can jerry-rig, but it will be more based around "what CAN we do" than "what WILL we do"