**Describe the bug**

Airflow web does not initialize due to a gunicorn error in minikube.

**Version of Helm and Kubernetes:**

- Helm: v2.14.3
- Kubernetes: v1.15.2
- Minikube: v1.3.1

**Which chart:** stable/airflow

**What happened:** All Airflow pods are ready except for the airflow-web pod.

**What you expected to happen:** The Airflow web server should start.
**How to reproduce it (as minimally and precisely as possible):**

Run:

```shell
helm install --namespace "airflow" --name "airflow" stable/airflow
```

```
$ kubectl get pods -n airflow
NAME                                 READY   STATUS             RESTARTS   AGE
airflow-flower-595f6659f-tm2r5       1/1     Running            0          121m
airflow-postgresql-bdcb64f8d-b62js   1/1     Running            0          121m
airflow-redis-master-0               1/1     Running            0          121m
airflow-scheduler-5744c766b7-mltm9   1/1     Running            0          121m
airflow-web-c9cdcb8f5-ljx4x          0/1     CrashLoopBackOff   16         121m
airflow-worker-0                     1/1     Running            0          121m
```
**Anything else we need to know:**

The airflow-web logs are the following:
```
$ kubectl logs -f airflow-web-c9cdcb8f5-ljx4x -n airflow
Wed Sep 11 15:13:04 UTC 2019 - waiting for Postgres... 1/20
Wed Sep 11 15:13:12 UTC 2019 - waiting for Redis... 1/20
waiting 60s...
executing webserver...
[2019-09-11 15:14:17,681] {{settings.py:213}} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=1
/usr/local/lib/python3.7/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
  """)
[2019-09-11 15:14:17,899] {{__init__.py:51}} INFO - Using executor CeleryExecutor
  ____________       _____________
 ____    |__( )_________  __/__  /________      __
____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
 _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
[2019-09-11 15:14:18,556] {{dagbag.py:90}} INFO - Filling up the DagBag from /usr/local/airflow/dags
Running the Gunicorn Server with:
Workers: 4 sync
Host: 0.0.0.0:8080
Timeout: 120
Logfiles: - -
=================================================================
[2019-09-11 15:14:19,695] {{settings.py:213}} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=22
/usr/local/lib/python3.7/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
  """)
[2019-09-11 15:14:19 +0000] [22] [INFO] Starting gunicorn 19.9.0
[2019-09-11 15:14:19 +0000] [22] [INFO] Listening at: http://0.0.0.0:8080 (22)
[2019-09-11 15:14:19 +0000] [22] [INFO] Using worker: sync
[2019-09-11 15:14:19 +0000] [26] [INFO] Booting worker with pid: 26
[2019-09-11 15:14:19 +0000] [27] [INFO] Booting worker with pid: 27
[2019-09-11 15:14:19 +0000] [28] [INFO] Booting worker with pid: 28
[2019-09-11 15:14:20 +0000] [29] [INFO] Booting worker with pid: 29
[2019-09-11 15:14:20,541] {{__init__.py:51}} INFO - Using executor CeleryExecutor
[2019-09-11 15:14:20,549] {{__init__.py:51}} INFO - Using executor CeleryExecutor
[2019-09-11 15:14:20,634] {{__init__.py:51}} INFO - Using executor CeleryExecutor
[2019-09-11 15:14:20,847] {{__init__.py:51}} INFO - Using executor CeleryExecutor
[2019-09-11 15:14:28,737] {{cli.py:825}} ERROR - [0 / 0] some workers seem to have died and gunicorn did not restart them as expected
[2019-09-11 15:14:31,987] {{dagbag.py:90}} INFO - Filling up the DagBag from /usr/local/airflow/dags
[2019-09-11 15:14:31,987] {{dagbag.py:90}} INFO - Filling up the DagBag from /usr/local/airflow/dags
[2019-09-11 15:14:31,990] {{dagbag.py:90}} INFO - Filling up the DagBag from /usr/local/airflow/dags
[2019-09-11 15:14:32,014] {{dagbag.py:90}} INFO - Filling up the DagBag from /usr/local/airflow/dags
[2019-09-11 15:14:36 +0000] [28] [INFO] Parent changed, shutting down: <Worker 28>
[2019-09-11 15:14:36 +0000] [28] [INFO] Worker exiting (pid: 28)
[2019-09-11 15:14:36 +0000] [26] [INFO] Parent changed, shutting down: <Worker 26>
[2019-09-11 15:14:36 +0000] [26] [INFO] Worker exiting (pid: 26)
[2019-09-11 15:14:36 +0000] [29] [INFO] Parent changed, shutting down: <Worker 29>
[2019-09-11 15:14:36 +0000] [29] [INFO] Worker exiting (pid: 29)
[2019-09-11 15:14:37 +0000] [27] [INFO] Parent changed, shutting down: <Worker 27>
[2019-09-11 15:14:37 +0000] [27] [INFO] Worker exiting (pid: 27)
[2019-09-11 15:16:39,097] {{cli.py:832}} ERROR - No response from gunicorn master within 120 seconds
[2019-09-11 15:16:39,101] {{cli.py:833}} ERROR - Shutting down webserver
```
**UPDATE:** on the debug node, I can see the following error:
```
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airflow/models/dagbag.py", line 202, in process_file
    m = imp.load_source(mod_name, filepath)
  File "/usr/local/lib/python3.7/imp.py", line 171, in load_source
    module = _load(spec)
  File "<frozen importlib._bootstrap>", line 696, in _load
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/usr/local/lib/python3.7/site-packages/airflow/example_dags/example_subdag_operator.py", line 47, in <module>
    dag=dag,
  File "/usr/local/lib/python3.7/site-packages/airflow/utils/db.py", line 74, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/airflow/utils/decorators.py", line 98, in wrapper
    result = func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/airflow/operators/subdag_operator.py", line 77, in __init__
    .filter(Pool.pool == self.pool)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3232, in first
    ret = list(self[0:1])
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3018, in __getitem__
    return list(res)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3334, in __iter__
    return self._execute_and_instances(context)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3356, in _execute_and_instances
    querycontext, self._connection_from_session, close_with_result=True
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3371, in _get_bind_args
    mapper=self._bind_mapper(), clause=querycontext.statement, **kw
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3349, in _connection_from_session
    conn = self.session.connection(**kw)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 1124, in connection
    execution_options=execution_options,
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 1130, in _connection_for_bind
    engine, execution_options
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 431, in _connection_for_bind
    conn = bind._contextual_connect()
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 2239, in _contextual_connect
    self._wrap_pool_connect(self.pool.connect, None),
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 2279, in _wrap_pool_connect
    e, dialect, self
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1544, in _handle_dbapi_exception_noconnection
    util.raise_from_cause(sqlalchemy_exception, exc_info)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 398, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 152, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 2275, in _wrap_pool_connect
    return fn()
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 363, in connect
    return _ConnectionFairy._checkout(self)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 760, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 492, in checkout
    rec = pool._do_get()
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/pool/impl.py", line 139, in _do_get
    self._dec_overflow()
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 153, in reraise
    raise value
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/pool/impl.py", line 136, in _do_get
    return self._create_connection()
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 308, in _create_connection
    return _ConnectionRecord(self)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 437, in __init__
    self.__connect(first_connect_check=True)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 639, in __connect
    connection = pool._invoke_creator(self)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/strategies.py", line 114, in connect
    return dialect.connect(*cargs, **cparams)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 453, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/usr/local/lib/python3.7/site-packages/psycopg2/__init__.py", line 130, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) could not translate host name "airflow-postgresql" to address: Temporary failure in name resolution
```
```
$ kubectl get services -n airflow
NAME                     TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
airflow-flower           ClusterIP   10.100.175.94   <none>        5555/TCP         13m
airflow-postgresql       ClusterIP   10.108.31.138   <none>        5432/TCP         13m
airflow-redis-headless   ClusterIP   None            <none>        6379/TCP         13m
airflow-redis-master     ClusterIP   10.106.131.90   <none>        6379/TCP         13m
airflow-web              NodePort    10.99.124.244   <none>        8080:32584/TCP   13m
airflow-worker           ClusterIP   None            <none>        8793/TCP         13m
```
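Since the `airflow-postgresql` ClusterIP service clearly exists, the "could not translate host name" error points at cluster DNS rather than the chart itself. One way to check is to resolve the service name from a throwaway pod in the same namespace; this is a sketch, with the names taken from the service list above and `busybox:1.31` as an assumed image tag:

```shell
# Launch a short-lived pod in the airflow namespace and try to resolve
# the same hostname the webserver uses; this should print the service's
# ClusterIP (10.108.31.138 in the output above).
kubectl run dns-test --rm -it --restart=Never \
  --image=busybox:1.31 -n airflow -- \
  nslookup airflow-postgresql

# Try the fully-qualified name too, in case the pod's DNS search
# domains are misconfigured.
kubectl run dns-test-fqdn --rm -it --restart=Never \
  --image=busybox:1.31 -n airflow -- \
  nslookup airflow-postgresql.airflow.svc.cluster.local
```

If the fully-qualified name resolves but the short name does not, the pod's `/etc/resolv.conf` (search domains) is the likely culprit; if neither resolves, the cluster DNS service itself is broken.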
The issue was with minikube, not with the Helm chart.
@estefaniarabadan What was the issue and how did you fix it? I think I am running into the same problem.
The issue was that minikube wasn't able to resolve the hostname `airflow-postgresql`. I fixed it by:
Hope this helps
@estefaniarabadan - Can you please provide more details on the steps we need to take to resolve this issue? Thanks for your help on this.
@estefaniarabadan could you please describe how the airflow-web deployment needed to be updated? 🙏
@estefaniarabadan - Can you please share how the airflow-web deployment should be updated? The only way I see is to add the env var `POSTGRES_HOST` (its value should be the IP of the postgres pod), but for some reason that also does not work for me.
Do we have a solution to this yet? I am also facing the same issue. It is necessary to verify locally first (minikube) before deploying on EKS/GKE, etc.
@nuarc @milan-usermind I can't remember exactly, but I recall resolving this issue by double-checking the airflow image (for the Python version), the k8s version, and the memory allocated to minikube, e.g. `minikube start --memory=12288 --cpus=2 --kubernetes-version=v1.11.10`. Hope this helps.
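Memory pressure is a plausible cause because "Temporary failure in name resolution" usually means the cluster DNS pods themselves are crash-looping or unscheduled. Before re-creating the cluster, it may be worth confirming the DNS addon is healthy; a sketch, assuming a default minikube setup where CoreDNS (or kube-dns on older versions) carries the `k8s-app=kube-dns` label:

```shell
# The cluster DNS pods run in kube-system; they must be Running and
# Ready for service names like airflow-postgresql to resolve.
kubectl get pods -n kube-system -l k8s-app=kube-dns

# If those pods are crash-looping, their logs usually say why
# (often OOM kills or upstream resolver problems).
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=50

# minikube can also report overall cluster/component health.
minikube status
```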
I tried launching minikube with the above config; however, the issue still persists: the webserver is stuck waiting for Postgres.
```
(venv) (base) $ kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
postgres-5f5879b484-n2nb2    1/1     Running   0          14s
webserver-5cd9f9447b-b88mq   1/1     Running   0          14s
```

```
(venv) (base) $ kubectl logs -f webserver-5cd9f9447b-b88mq
Wed 01 Apr 2020 05:22:22 PM UTC - waiting for Postgres... 1/20
Wed 01 Apr 2020 05:22:27 PM UTC - waiting for Postgres... 2/20
Wed 01 Apr 2020 05:22:32 PM UTC - waiting for Postgres... 3/20
Wed 01 Apr 2020 05:22:37 PM UTC - waiting for Postgres... 4/20
Wed 01 Apr 2020 05:22:42 PM UTC - waiting for Postgres... 5/20
Wed 01 Apr 2020 05:22:47 PM UTC - waiting for Postgres... 6/20
Wed 01 Apr 2020 05:22:52 PM UTC - waiting for Postgres... 7/20
Wed 01 Apr 2020 05:22:57 PM UTC - waiting for Postgres... 8/20
Wed 01 Apr 2020 05:23:02 PM UTC - waiting for Postgres... 9/20
Wed 01 Apr 2020 05:23:07 PM UTC - waiting for Postgres... 10/20
Wed 01 Apr 2020 05:23:12 PM UTC - waiting for Postgres... 11/20
```