Charts: stable/airflow Airflow not initializing due to gunicorn error

Created on 11 Sep 2019 · 10 comments · Source: helm/charts

Describe the bug
The Airflow web server does not initialize in minikube due to a gunicorn error.

Version of Helm and Kubernetes:
Helm: v2.14.3
Kubernetes: v1.15.2
Minikube: v1.3.1

Which chart: stable/airflow

What happened: All Airflow pods are ready except for the airflow-web pod

What you expected to happen: Airflow web server should start

How to reproduce it (as minimally and precisely as possible):
Run:
helm install --namespace "airflow" --name "airflow" stable/airflow

kubectl get pods -n airflow 
NAME                                 READY   STATUS             RESTARTS   AGE
airflow-flower-595f6659f-tm2r5       1/1     Running            0          121m
airflow-postgresql-bdcb64f8d-b62js   1/1     Running            0          121m
airflow-redis-master-0               1/1     Running            0          121m
airflow-scheduler-5744c766b7-mltm9   1/1     Running            0          121m
airflow-web-c9cdcb8f5-ljx4x          0/1     CrashLoopBackOff   16         121m
airflow-worker-0                     1/1     Running            0          121m

Anything else we need to know:
The airflow-web logs are as follows:

kubectl logs -f airflow-web-c9cdcb8f5-ljx4x -n airflow      
Wed Sep 11 15:13:04 UTC 2019 - waiting for Postgres... 1/20
Wed Sep 11 15:13:12 UTC 2019 - waiting for Redis... 1/20
waiting 60s...
executing webserver...
[2019-09-11 15:14:17,681] {{settings.py:213}} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=1
/usr/local/lib/python3.7/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
  """)
[2019-09-11 15:14:17,899] {{__init__.py:51}} INFO - Using executor CeleryExecutor
  ____________       _____________
 ____    |__( )_________  __/__  /________      __
____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
 _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
[2019-09-11 15:14:18,556] {{dagbag.py:90}} INFO - Filling up the DagBag from /usr/local/airflow/dags
Running the Gunicorn Server with:
Workers: 4 sync
Host: 0.0.0.0:8080
Timeout: 120
Logfiles: - -
=================================================================            
[2019-09-11 15:14:19,695] {{settings.py:213}} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=22
/usr/local/lib/python3.7/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
  """)
[2019-09-11 15:14:19 +0000] [22] [INFO] Starting gunicorn 19.9.0
[2019-09-11 15:14:19 +0000] [22] [INFO] Listening at: http://0.0.0.0:8080 (22)
[2019-09-11 15:14:19 +0000] [22] [INFO] Using worker: sync
[2019-09-11 15:14:19 +0000] [26] [INFO] Booting worker with pid: 26
[2019-09-11 15:14:19 +0000] [27] [INFO] Booting worker with pid: 27
[2019-09-11 15:14:19 +0000] [28] [INFO] Booting worker with pid: 28
[2019-09-11 15:14:20 +0000] [29] [INFO] Booting worker with pid: 29
[2019-09-11 15:14:20,541] {{__init__.py:51}} INFO - Using executor CeleryExecutor
[2019-09-11 15:14:20,549] {{__init__.py:51}} INFO - Using executor CeleryExecutor
[2019-09-11 15:14:20,634] {{__init__.py:51}} INFO - Using executor CeleryExecutor
[2019-09-11 15:14:20,847] {{__init__.py:51}} INFO - Using executor CeleryExecutor
[2019-09-11 15:14:28,737] {{cli.py:825}} ERROR - [0 / 0] some workers seem to have died and gunicorn did not restart them as expected
[2019-09-11 15:14:31,987] {{dagbag.py:90}} INFO - Filling up the DagBag from /usr/local/airflow/dags
[2019-09-11 15:14:31,987] {{dagbag.py:90}} INFO - Filling up the DagBag from /usr/local/airflow/dags
[2019-09-11 15:14:31,990] {{dagbag.py:90}} INFO - Filling up the DagBag from /usr/local/airflow/dags
[2019-09-11 15:14:32,014] {{dagbag.py:90}} INFO - Filling up the DagBag from /usr/local/airflow/dags
[2019-09-11 15:14:36 +0000] [28] [INFO] Parent changed, shutting down: <Worker 28>
[2019-09-11 15:14:36 +0000] [28] [INFO] Worker exiting (pid: 28)
[2019-09-11 15:14:36 +0000] [26] [INFO] Parent changed, shutting down: <Worker 26>
[2019-09-11 15:14:36 +0000] [26] [INFO] Worker exiting (pid: 26)
[2019-09-11 15:14:36 +0000] [29] [INFO] Parent changed, shutting down: <Worker 29>
[2019-09-11 15:14:36 +0000] [29] [INFO] Worker exiting (pid: 29)
[2019-09-11 15:14:37 +0000] [27] [INFO] Parent changed, shutting down: <Worker 27>
[2019-09-11 15:14:37 +0000] [27] [INFO] Worker exiting (pid: 27)
[2019-09-11 15:16:39,097] {{cli.py:832}} ERROR - No response from gunicorn master within 120 seconds
[2019-09-11 15:16:39,101] {{cli.py:833}} ERROR - Shutting down webserver

All 10 comments

UPDATE: in debug mode, I can see the following error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airflow/models/dagbag.py", line 202, in process_file
    m = imp.load_source(mod_name, filepath)
  File "/usr/local/lib/python3.7/imp.py", line 171, in load_source
    module = _load(spec)
  File "<frozen importlib._bootstrap>", line 696, in _load
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/usr/local/lib/python3.7/site-packages/airflow/example_dags/example_subdag_operator.py", line 47, in <module>
    dag=dag,
  File "/usr/local/lib/python3.7/site-packages/airflow/utils/db.py", line 74, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/airflow/utils/decorators.py", line 98, in wrapper
    result = func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/airflow/operators/subdag_operator.py", line 77, in __init__
    .filter(Pool.pool == self.pool)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3232, in first
    ret = list(self[0:1])
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3018, in __getitem__
    return list(res)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3334, in __iter__
    return self._execute_and_instances(context)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3356, in _execute_and_instances
    querycontext, self._connection_from_session, close_with_result=True
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3371, in _get_bind_args
    mapper=self._bind_mapper(), clause=querycontext.statement, **kw
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3349, in _connection_from_session
    conn = self.session.connection(**kw)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 1124, in connection
    execution_options=execution_options,
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 1130, in _connection_for_bind
    engine, execution_options
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 431, in _connection_for_bind
    conn = bind._contextual_connect()
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 2239, in _contextual_connect
    self._wrap_pool_connect(self.pool.connect, None),
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 2279, in _wrap_pool_connect
    e, dialect, self
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1544, in _handle_dbapi_exception_noconnection
    util.raise_from_cause(sqlalchemy_exception, exc_info)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 398, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 152, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 2275, in _wrap_pool_connect
    return fn()
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 363, in connect
    return _ConnectionFairy._checkout(self)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 760, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 492, in checkout
    rec = pool._do_get()
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/pool/impl.py", line 139, in _do_get
    self._dec_overflow()
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 153, in reraise
    raise value
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/pool/impl.py", line 136, in _do_get
    return self._create_connection()
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 308, in _create_connection
    return _ConnectionRecord(self)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 437, in __init__
    self.__connect(first_connect_check=True)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 639, in __connect
    connection = pool._invoke_creator(self)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/strategies.py", line 114, in connect
    return dialect.connect(*cargs, **cparams)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 453, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/usr/local/lib/python3.7/site-packages/psycopg2/__init__.py", line 130, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) could not translate host name "airflow-postgresql" to address: Temporary failure in name resolution
kubectl get services -n airflow                                                                      
NAME                     TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
airflow-flower           ClusterIP   10.100.175.94   <none>        5555/TCP         13m
airflow-postgresql       ClusterIP   10.108.31.138   <none>        5432/TCP         13m
airflow-redis-headless   ClusterIP   None            <none>        6379/TCP         13m
airflow-redis-master     ClusterIP   10.106.131.90   <none>        6379/TCP         13m
airflow-web              NodePort    10.99.124.244   <none>        8080:32584/TCP   13m
airflow-worker           ClusterIP   None            <none>        8793/TCP         13m
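The service exists, so the name should normally resolve through cluster DNS. A quick way to confirm the resolution failure from inside the cluster (a generic debugging sketch, not a command from this thread) is:

kubectl run dns-test --rm -it --image=busybox:1.28 --restart=Never -n airflow -- nslookup airflow-postgresql
# if this fails with "can't resolve 'airflow-postgresql'" even though the ClusterIP above
# (10.108.31.138) exists, the problem is the cluster's DNS, not the chart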

The issue was with minikube, not Helm.

@estefaniarabadan What was the issue and how did you fix it? I think I am running into the same problem.

The issue was that minikube wasn't able to resolve the hostname airflow-postgresql. I fixed it by:

  1. Getting the IP the airflow-postgresql pod was running on (by describing the pod)
  2. Editing the airflow-web deployment to point at that IP (see the sketch below)
  3. Restarting airflow-web
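Something along these lines should work (a rough sketch rather than the exact commands I ran; POSTGRES_HOST is the env var a commenter below points at, and the IP is whatever your postgres pod reports):

kubectl describe pod airflow-postgresql-bdcb64f8d-b62js -n airflow | grep IP   # step 1: grab the pod IP
kubectl set env deployment/airflow-web -n airflow POSTGRES_HOST=10.108.31.138  # step 2: point the web deployment at it (example IP)
kubectl rollout status deployment/airflow-web -n airflow                       # step 3: the env change rolls the pods; wait for it

Note that pinning a pod IP is fragile (it changes whenever the pod is rescheduled), so this is a workaround rather than a fix.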

Hope this helps

@estefaniarabadan - Can you please provide more details on the steps we need to take to resolve this issue? Thanks for your help on this.

@estefaniarabadan could you please describe how the airflow-web deployment needed to be updated? 🙏

@estefaniarabadan - Can you please share how the airflow-web deployment should be updated? The only way I see is to add the env var 'POSTGRES_HOST' (its value should be the IP of the postgres pod), but for some reason that also does not work for me.

Do we have a solution to this yet? I am also facing the same issue. It is necessary to verify locally (on minikube) first before deploying on EKS/GKE, etc.

@nuarc @milan-usermind I can't remember exactly, but I recall resolving this issue by double-checking the airflow image (for the Python version), the k8s version, and the memory allocated to minikube, e.g. minikube start --memory=12288 --cpus=2 --kubernetes-version=v1.11.10. Hope this helps.
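Since the underlying error was name resolution, it is also worth confirming that cluster DNS itself is healthy before redeploying (a generic check; k8s-app=kube-dns is the label minikube's CoreDNS/kube-dns pods normally carry):

kubectl get pods -n kube-system -l k8s-app=kube-dns          # should show Running DNS pods
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=20    # look for crash loops or upstream errors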

I tried launching minikube with the above config; however, the issue still persists: the webserver is stuck waiting for Postgres.

(venv) (base) $ kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
postgres-5f5879b484-n2nb2    1/1     Running   0          14s
webserver-5cd9f9447b-b88mq   1/1     Running   0          14s
(venv) (base) $ kubectl logs -f webserver-5cd9f9447b-b88mq 
Wed 01 Apr 2020 05:22:22 PM UTC - waiting for Postgres... 1/20
Wed 01 Apr 2020 05:22:27 PM UTC - waiting for Postgres... 2/20
Wed 01 Apr 2020 05:22:32 PM UTC - waiting for Postgres... 3/20
Wed 01 Apr 2020 05:22:37 PM UTC - waiting for Postgres... 4/20
Wed 01 Apr 2020 05:22:42 PM UTC - waiting for Postgres... 5/20
Wed 01 Apr 2020 05:22:47 PM UTC - waiting for Postgres... 6/20
Wed 01 Apr 2020 05:22:52 PM UTC - waiting for Postgres... 7/20
Wed 01 Apr 2020 05:22:57 PM UTC - waiting for Postgres... 8/20
Wed 01 Apr 2020 05:23:02 PM UTC - waiting for Postgres... 9/20
Wed 01 Apr 2020 05:23:07 PM UTC - waiting for Postgres... 10/20
Wed 01 Apr 2020 05:23:12 PM UTC - waiting for Postgres... 11/20
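When the entrypoint loops on "waiting for Postgres" like this, it helps to separate a DNS failure from a Service/selector mismatch (a hypothetical check; "postgres" is assumed to be the Service name, matching the deployment above):

kubectl get svc,endpoints postgres   # an empty ENDPOINTS column usually means the Service selector matches no pods
kubectl run dns-test --rm -it --image=busybox:1.28 --restart=Never -- nslookup postgres   # does in-cluster DNS resolve it?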
