Airflow: max_active_runs = 1 can still create multiple active execution runs

Created on 24 Jul 2020 · 6 Comments · Source: apache/airflow

Apache Airflow version: 1.10.11, localExecutor

What happened:

I have max_active_runs = 1 in my dag file (which consists of multiple tasks) and I manually triggered the dag. While that first execution was still running, a second execution began at its scheduled time.

I should note that the second execution is initially queued. It's only when the dag's 1st execution moves to the next task that the second execution actually starts.

My dag definition is below. The dag just contains tasks using PythonOperator.

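from datetime import timedelta

from airflow import DAG

# default_args is defined elsewhere in the dag file (not shown in the report).
dag = DAG(
    'dag1',
    default_args=default_args,
    description='xyz',
    schedule_interval=timedelta(hours=1),  # one scheduled execution per hour
    catchup=False,
    max_active_runs=1,  # expectation: at most one execution running at a time
)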

What you expected to happen:

Only one execution should run. A second execution should be queued but not begin executing.

How to reproduce it:
In my scenario:

  1. Manually trigger a dag with multiple tasks (Execution1), and have task1 run past the start of the next scheduled execution. For example, if the schedule interval is 1 hour, have task1 take longer than 1 hour so that the second execution (Execution2) gets queued.
  2. When task1 of Execution1 finishes, and just before task2 starts, the second execution (Execution2, which is already queued) begins running.

Anything else we need to know:
I _think_ the second execution begins in between the task1 and task2 of execution1. I think there's a few second delay there and maybe that's when Airflow thinks there's no dag execution? That's just a guess.

Btw, this can have potentially disastrous effects (errors, incomplete data without errors, etc)

bug

All 6 comments

Thanks for opening your first issue here! Be sure to follow the issue template!

The problem is that we don't have a state that describes DAG Runs that are saved but not yet running. All DAG Runs start in the running state. If we want to fix this bug, we have to add a new DAG Run state.
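
To make the point about the missing state concrete, here is a toy model in plain Python. It is not Airflow code: ToyScheduler, ToyDagRun, RunState and the QUEUED state are all hypothetical names made up for this sketch, and the real scheduler logic is more involved. It only illustrates the idea that if every run is saved as running, a limit of 1 cannot hold back a second run, whereas a separate queued state gives the scheduler somewhere to park it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class RunState(Enum):
    QUEUED = "queued"    # hypothetical new state: saved but not yet allowed to run
    RUNNING = "running"
    SUCCESS = "success"


@dataclass
class ToyDagRun:
    run_id: str
    state: RunState


class ToyScheduler:
    """Toy stand-in for a scheduler; an assumption-laden sketch, not Airflow."""

    def __init__(self, max_active_runs: int, use_queued_state: bool) -> None:
        self.max_active_runs = max_active_runs
        self.use_queued_state = use_queued_state
        self.runs: List[ToyDagRun] = []

    def create_run(self, run_id: str) -> ToyDagRun:
        # Without a queued state, every new run is saved as RUNNING right away.
        initial = RunState.QUEUED if self.use_queued_state else RunState.RUNNING
        run = ToyDagRun(run_id, initial)
        self.runs.append(run)
        return run

    def running(self) -> List[ToyDagRun]:
        return [r for r in self.runs if r.state is RunState.RUNNING]

    def promote_queued(self) -> None:
        # Promote queued runs, oldest first, only while there is spare capacity.
        for run in self.runs:
            if len(self.running()) >= self.max_active_runs:
                break
            if run.state is RunState.QUEUED:
                run.state = RunState.RUNNING

    def finish(self, run: ToyDagRun) -> None:
        run.state = RunState.SUCCESS


def active_after_two_triggers(use_queued_state: bool) -> List[str]:
    """Create a manual run and a scheduled run, then list the running ones."""
    scheduler = ToyScheduler(max_active_runs=1, use_queued_state=use_queued_state)
    scheduler.create_run("manual")
    scheduler.create_run("scheduled")
    scheduler.promote_queued()
    return [r.run_id for r in scheduler.running()]
```

In this sketch, active_after_two_triggers(False) reports both runs as running, while active_after_two_triggers(True) reports only the manual one; the scheduled run stays queued until finish() is called on the first run and promote_queued() runs again.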

I am running into the exact same issue.

The same issue here

Would someone be able to test if this specific case still happens on Airflow 2.0.0alpha1? (A few things about how we create DagRuns changed, so this _might_ have been fixed, but I didn't specifically set out to fix this.)

Read the reproduction steps, and this bit sounds bang on:

I think the second execution begins in between the task1 and task2 of execution1. I think there's a few second delay there and maybe that's when Airflow thinks there's no dag execution? That's just a guess.

Yes, looking at the code, that sounds right. The same logic is still used in 2.0.0alpha1, so this hasn't changed.
