Apache Airflow version: 1.10.13
What happened:
After upgrading to v1.10.13 we noticed that tasks in some of our DAGs were not being scheduled. After a bit of investigation we discovered that commenting out 'depends_on_past': True made the issue go away.
What you expected to happen:
We think the issue might be related to this change, which was introduced in 1.10.13:
[AIRFLOW-3607] Only query DB once per DAG run for TriggerRuleDep (#4751)
How to reproduce it:
from airflow import models
from airflow.operators.dummy_operator import DummyOperator
from datetime import datetime, timedelta

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2018, 10, 31),
    'depends_on_past': True,
    'retries': 3,
    'retry_delay': timedelta(minutes=5)
}

dag_name = 'my-test-dag'

with models.DAG(dag_name,
                default_args=default_args,
                schedule_interval='0 0 * * *',
                catchup=False,
                max_active_runs=5,
                ) as dag:
    test = DummyOperator(
        task_id='test'
    )
On Master, this was fixed by https://github.com/apache/airflow/pull/7402 & further optimised by https://github.com/apache/airflow/pull/7503
in 1.10.13 -- this was clubbed by the following 2 commits:
I will investigate this further
@nathadfield has confirmed the issue does not exist on 2.0.0b3
Since v1.10.13 we have also noticed that, for some DAGs, the tasks are not being scheduled. They stay in a None state forever. Nothing shows up in the scheduler logs (DEBUG level). Running the tasks manually works fine, though. In our case, most of the DAGs do have depends_on_past set to true, but not all of them, it seems. So maybe this is something different. I will try to investigate further and share any relevant info.
Can you check if the other DAGs (not using depends_on_past but are still stuck) have task_concurrency set? @mthoretton
I did not check absolutely all the DAGs we have but yes, the "broken" DAGs have either depends_on_past or task_concurrency set. I was not aware of the 2 issues you mentioned above. I will have a look, but it definitely looks related.
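For anyone who wants to check all their DAGs rather than a sample, here is a minimal sketch of a scan for tasks that set either option. The helper name find_affected_tasks is hypothetical (it is not part of Airflow), and it only assumes each DAG object exposes dag_id and tasks, and each task exposes task_id, depends_on_past and task_concurrency. The example uses stand-in objects so it runs without an Airflow installation.

```python
from types import SimpleNamespace


def find_affected_tasks(dags):
    """Return (dag_id, task_id, reason) for tasks that set either option.

    Hypothetical helper: it relies only on attribute names, not on Airflow.
    """
    affected = []
    for dag in dags:
        for task in dag.tasks:
            reasons = []
            if getattr(task, "depends_on_past", False):
                reasons.append("depends_on_past")
            if getattr(task, "task_concurrency", None) is not None:
                reasons.append("task_concurrency")
            if reasons:
                affected.append((dag.dag_id, task.task_id, ", ".join(reasons)))
    return affected


if __name__ == "__main__":
    # Stand-in objects for illustration only; these are not real Airflow DAGs.
    example_dag = SimpleNamespace(
        dag_id="example",
        tasks=[
            SimpleNamespace(task_id="a", depends_on_past=True, task_concurrency=None),
            SimpleNamespace(task_id="b", depends_on_past=False, task_concurrency=10),
            SimpleNamespace(task_id="c", depends_on_past=False, task_concurrency=None),
        ],
    )
    for row in find_affected_tasks([example_dag]):
        print(row)
```

On a real 1.10.x deployment the same function could presumably be fed the loaded DAGs, e.g. airflow.models.DagBag().dags.values(), but that usage is an assumption and has not been tested against this bug.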
I can confirm the bug. I was able to reproduce it for tasks that set task_concurrency or depends_on_past, using LocalExecutor and the following DAG:
from airflow import models
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.bash_operator import BashOperator
from datetime import datetime, timedelta

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2018, 10, 31),
    'retries': 3,
    'retry_delay': timedelta(minutes=5)
}

dag_name = 'dag-bugcheck'

with models.DAG(dag_name,
                default_args=default_args,
                schedule_interval='0 0 * * *',
                catchup=False,
                max_active_runs=5,
                ) as dag:
    test1 = DummyOperator(
        task_id='test1',
        task_concurrency=10,
    )
    test2 = BashOperator(
        task_id='test2',
        bash_command='echo hi',
        depends_on_past=True,
    )
    test3 = BashOperator(
        task_id='test3',
        bash_command='echo hi',
    )
https://github.com/apache/airflow/pull/12663 should fix it @nathadfield @mthoretton
Nice one @kaxil! Will this force the need for a 1.10.14 then?
Yup, indeed. I hope to get it out by early next week
Closed by #12663