Airflow: Code reuse with Dags

Created on 29 Sep 2015 · 8 Comments · Source: apache/airflow

I'd like to clone a dag multiple times to replicate the logic for multiple similar markets/sources.

I drive most of my dag configuration by pulling JSON Variables to parametrize values, so having configuration for each market isn't a problem. My problem is cloning the dag and its tasks without having two separate scripts containing the same code (albeit with different dag names).

Is this possible, or does it go against the design pattern of Airflow dags?

All 8 comments

from copy import deepcopy
new_dag = deepcopy(dag)

It's probably cleaner to write a function or a class that generates dag objects based on input parameters.
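A minimal sketch of that generator-function idea, under some assumptions. `SketchDag` is a stand-in class so the snippet runs without Airflow installed; with Airflow you would build a real `DAG(dag_id=..., default_args=...)` instead. `build_market_dag`, `MARKET_CONFIG`, the `owner` key and the task names are hypothetical and only illustrate the shape.

```python
from dataclasses import dataclass, field


@dataclass
class SketchDag:
    """Stand-in for Airflow's DAG so this sketch runs without Airflow."""
    dag_id: str
    default_args: dict = field(default_factory=dict)
    tasks: list = field(default_factory=list)


# Hypothetical per-market settings; in the question these come from JSON Variables.
MARKET_CONFIG = {
    "a": {"owner": "team_a"},
    "b": {"owner": "team_b"},
}


def build_market_dag(market, config):
    """Return one dag-like object whose id and arguments depend on the market."""
    dag = SketchDag(dag_id="my_dag_" + market, default_args=dict(config))
    # Tasks would be attached here; these names are illustrative only.
    dag.tasks.extend(["extract_" + market, "load_" + market])
    return dag


# One object per market, all produced by the same code path.
market_dags = [
    build_market_dag(market, config)
    for market, config in sorted(MARKET_CONFIG.items())
]
```

Building the objects is only half the job: as the rest of this thread discusses, each one still has to be visible to the DagBag.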

Thanks for the pointer. Out of interest, should this work too?

markets = ['a', 'b', 'c']

for dag_market in markets:
    dag = DAG(dag_id='my_dag_'+dag_market, default_args=global_args)

    # tasks here...

If I use this logic and push each dag into a list, I can see all 3 dags with three unique dag ids (when printing the list after the loop). However, they don't all get put into the DagBag. I'm guessing this is some Python shallow-copy business that I've not yet come across, being primarily from a Java background.

I think I'll go down your route of a parent dag generator function.

@martingrayson I'm not sure but I think it's because each dag object gets overwritten by the next one in the loop, so when the script is finished only the last dag is available for import. I think @mistercrunch has an idiom for this which goes something like:

for dag_market in markets:
    dag_id = 'my_dag_' + dag_market
    globals()[dag_id] = DAG(...)

but I haven't done this myself so I'm not 100% sure.

For folks generating multiple dags in a script, maybe a DagContainer object (which is just a list subclass) would be nice... then in addition to any DAG objects found in a script, anything in any DagContainers is also imported. Not sure how many people are actually doing this but I know it's come up a few times.

Thanks @jlowin I'll give this a bash tomorrow.

Yes, this should work. The DagBag crawls through the folder, importing Python modules and inspecting their global namespace for dag objects.
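To make the difference concrete, here is a small simulation of that namespace scan. It is a sketch, not Airflow's actual loader: `SketchDag` stands in for `DAG`, `collect_dags` is a hypothetical helper that mimics "inspect the module's globals for dag objects", and `types.ModuleType` plays the role of an imported dag script.

```python
import types


class SketchDag:
    """Stand-in for Airflow's DAG so this sketch runs without Airflow."""

    def __init__(self, dag_id):
        self.dag_id = dag_id


def collect_dags(module):
    """Return the sorted ids of every SketchDag found in the module's namespace."""
    return sorted(
        value.dag_id
        for value in vars(module).values()
        if isinstance(value, SketchDag)
    )


markets = ["a", "b", "c"]

# Case 1: the loop rebinds the single name `dag`, so only the last object survives.
overwritten = types.ModuleType("overwritten_example")
for market in markets:
    overwritten.dag = SketchDag("my_dag_" + market)

# Case 2: each dag gets its own global name. vars(module) is the same dict
# that globals() returns when called inside that module.
registered = types.ModuleType("registered_example")
for market in markets:
    dag_id = "my_dag_" + market
    vars(registered)[dag_id] = SketchDag(dag_id)

found_when_overwritten = collect_dags(overwritten)
found_when_registered = collect_dags(registered)
```

In the first case the scan finds a single dag, the one for the last market; in the second it finds all three. Objects that live only inside a list are not themselves dag objects in the namespace, which matches what was observed above.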

@jlowin, I agree that this is magical and that we should have an alternative to use an explicit method to add a dag to the dagbag. I'll add a note.

Thanks guys, this works perfectly for me at the moment. I'll keep an eye on the roadmap and switch to a DagContainer style object if you decide that's the way to go.

I guess this could also help to group dags into logical areas if you're able to name a dag container. That could make the UI pretty by giving the ability to hide a whole group of dags.
