Airflow: WebUI does not have access to pod logs

Created on 20 Oct 2020 · 9 comments · Source: apache/airflow

Apache Airflow version: v2.0.0a1 (latest master)

Environment:

  • OS: Ubuntu 18.04.4 LTS
  • Kubernetes: v1.19.3
  • Docker: v19.03.12
  • Helm: v3.3.4

What happened:

Trying to get a task log via the task instance list (http://localhost:8080/taskinstance/list/) yields an error saying that the ServiceAccount airflow-webserver does not have permission to list pods/log:

*** Trying to get logs (last 100 lines) from worker pod  ***

*** Unable to fetch logs from worker pod  ***
(403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'Date': 'Tue, 20 Oct 2020 16:36:31 GMT', 'Content-Length': '296'})
HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods is forbidden: User \\"system:serviceaccount:airflow:airflow-webserver\\" cannot list resource \\"pods/log\\" in API group \\"\\" in the namespace \\"airflow\\"","reason":"Forbidden","details":{"kind":"pods"},"code":403}\n'
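The username in the 403 body follows the Kubernetes naming scheme for ServiceAccounts, system:serviceaccount:<namespace>:<name>, so it identifies exactly which account lacks the permission. The following Python sketch pulls that identity out of the response body quoted above and builds a kubectl auth can-i command. That command uses impersonation (--as) to ask the API server whether the account may perform the denied request. The helpers parse_forbidden and can_i_command are hypothetical and written only for illustration.

```python
import json
import re

# Response body copied verbatim from the 403 error above.
BODY = b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods is forbidden: User \\"system:serviceaccount:airflow:airflow-webserver\\" cannot list resource \\"pods/log\\" in API group \\"\\" in the namespace \\"airflow\\"","reason":"Forbidden","details":{"kind":"pods"},"code":403}\n'

# Kubernetes authenticates ServiceAccounts as "system:serviceaccount:<namespace>:<name>".
DENIED = re.compile(
    r'User "system:serviceaccount:(?P<namespace>[^:"]+):(?P<name>[^:"]+)" '
    r'cannot (?P<verb>\w+) resource "(?P<resource>[^"]+)"'
)


def parse_forbidden(body: bytes) -> dict:
    """Return namespace, name, verb and resource from a 403 Status body (hypothetical helper)."""
    status = json.loads(body)
    if status.get("code") != 403:
        raise ValueError("not a Forbidden status")
    match = DENIED.search(status["message"])
    if match is None:
        raise ValueError("message does not name a ServiceAccount")
    return match.groupdict()


def can_i_command(namespace: str, name: str, verb: str, resource: str) -> list:
    """Build a `kubectl auth can-i` call that impersonates the ServiceAccount (hypothetical helper)."""
    base, _, sub = resource.partition("/")  # "pods/log" -> ("pods", "log")
    argv = [
        "kubectl", "auth", "can-i", verb, base,
        f"--as=system:serviceaccount:{namespace}:{name}",
        f"--namespace={namespace}",
    ]
    if sub:
        argv.append(f"--subresource={sub}")
    return argv


denied = parse_forbidden(BODY)
print(" ".join(can_i_command(**denied)))
```

When run as a cluster admin (impersonation requires the impersonate permission), the printed command should answer no before the RBAC change and yes after it.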

How to reproduce it:

I created a Kubernetes cluster using kubeadm and added Flannel as the pod network. Afterwards I built the Airflow production image via Breeze, then deployed it to the cluster via Helm (mounting DAGs from an externally populated PVC):

$~ ./breeze build-image --production-image
$~ helm install airflow . \
    --namespace airflow \
    --set dags.persistence.enabled=true \
    --set dags.persistence.existingClaim=my-hostPath-claim \
    --set dags.gitSync.enabled=false \
    --set uid=1000 \
    --set gid=1000 \
    --set executor=KubernetesExecutor \
    --set images.airflow.tag=master-python3.6
$~ kubectl get pods -n airflow
NAME                                 READY   STATUS    RESTARTS   AGE
airflow-postgresql-0                 1/1     Running   0          75m
airflow-scheduler-6df9cf9855-4xzd4   2/2     Running   0          75m
airflow-statsd-5556dc96bc-zdtjp      1/1     Running   0          75m
airflow-webserver-dc8c746b7-9wqlh    1/1     Running   0          75m

I triggered a simple DAG. Also posting it here for completeness.


DAG file

from airflow import DAG
from datetime import timedelta, datetime
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    'simple_dag',
    default_args={
        'owner': 'airflow',
        'depends_on_past': False,
        'retries': 0,
        'start_date': datetime(1970, 1, 1),
        'retry_delay': timedelta(seconds=30),
    },
    description='',
    schedule_interval=None,
    catchup=False,
)

t1 = BashOperator(
    task_id='task1',
    bash_command='echo 1',
    dag=dag,
)

Possible solution:

Checking airflow/chart/templates/rbac/pod-launcher-rolebinding.yaml, I can verify that the airflow-webserver ServiceAccount is not bound to airflow-pod-launcher-role, so it lacks the needed permissions (as stated in the error). I also think airflow/chart/templates/rbac/pod-launcher-role.yaml needs the "list" verb for the "pods/log" resource. Applying these changes gets rid of this error but produces a different one. Should I add these changes to the chart templates anyway?
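Both proposed changes matter because RBAC authorizes a request only if two conditions hold. First, a rule in the bound Role must match the request's API group, resource and verb; subresources are written as pods/log and are not covered by a rule on pods. Second, a binding must name the requesting account. The sketch below is a simplified model of the rule check, not the API server's implementation. It treats a bare "*" as a wildcard and ignores resourceNames, nonResourceURLs and the "*/subresource" form. The before/after rule sets are assumptions modelled on the proposal above, not the chart's actual template.

```python
from typing import Iterable, Mapping, Sequence

Rule = Mapping[str, Sequence[str]]


def _matches(allowed: Sequence[str], value: str) -> bool:
    # A bare "*" in a rule field matches any value.
    return "*" in allowed or value in allowed


def rule_allows(rules: Iterable[Rule], api_group: str, resource: str, verb: str) -> bool:
    """Simplified RBAC check: does any rule grant `verb` on `resource`?

    `resource` uses the "pods/log" form for subresources, so a rule on
    "pods" alone does not grant access to "pods/log".
    """
    return any(
        _matches(rule.get("apiGroups", []), api_group)
        and _matches(rule.get("resources", []), resource)
        and _matches(rule.get("verbs", []), verb)
        for rule in rules
    )


# Hypothetical rule sets, before and after adding "list" on pods/log.
before = [
    {"apiGroups": [""], "resources": ["pods"], "verbs": ["create", "get", "delete", "list", "watch"]},
    {"apiGroups": [""], "resources": ["pods/log"], "verbs": ["get"]},
]
after = before[:1] + [{"apiGroups": [""], "resources": ["pods/log"], "verbs": ["get", "list"]}]

print(rule_allows(before, "", "pods/log", "list"))  # False
print(rule_allows(after, "", "pods/log", "list"))   # True
```

This only models the Role side. The RoleBinding still has to list airflow-webserver (or the webserver has to run as an account that is listed) for the rule to apply to it.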

*** Trying to get logs (last 100 lines) from worker pod  ***

*** Unable to fetch logs from worker pod  ***
(400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Tue, 20 Oct 2020 16:29:32 GMT', 'Content-Length': '136'})
HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"name must be provided","reason":"BadRequest","code":400}\n'
Labels: helm-chart, bug

All 9 comments

@msumit Can you help with it? I see that you added this feature to Airflow, but you probably forgot about the documentation that describes the required permissions.

@FloChehab Can you look at it also?

Sure, I'll have a look at this; give me 12h :)

@mik-laj So, I can confirm / reproduce the issue.
I guess it's the webserver that is trying to fetch the logs and not the scheduler.

From what I can see in the chart:

So everything works "as expected" and the chart would need to be updated a bit.

Thanks for checking this out. As described I edited the helm chart to grant the permissions to airflow-webserver which solved the permissions issue for fetching the logs but led to the other error described.

Sorry, I forgot to read the second part of the description. Judging by the second error, I guess there might be a bug in Airflow itself (I would have expected a 404 if the pod had been deleted, but here it's a 400 stating that the pod name is missing from the request?)
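The log header in the issue does show an empty pod name ("from worker pod  ***"). If the request reaches the API server without a name, the server most likely interprets the nameless get as a list, which would also explain why the first error mentioned the list verb, and then rejects it with "name must be provided". A hypothetical guard like the one below would avoid sending such a request. Here read_log is an injected stand-in for a real call such as the Kubernetes Python client's CoreV1Api.read_namespaced_pod_log, so the sketch runs without a cluster. It is not Airflow's actual log-reading code.

```python
from typing import Callable, Optional

# read_log(name, namespace, tail_lines) -> str; assumed to wrap a real
# Kubernetes client call, which is not shown here.
ReadLog = Callable[[str, str, int], str]


def fetch_pod_log_tail(read_log: ReadLog, pod_name: Optional[str],
                       namespace: str, tail_lines: int = 100) -> str:
    """Return the last `tail_lines` log lines of a pod (hypothetical helper).

    Refuses to call the API with an empty pod name, which the API server
    would reject with 400 "name must be provided".
    """
    if not pod_name:
        return "*** Worker pod name is unknown, not querying the Kubernetes API ***"
    return read_log(pod_name, namespace, tail_lines)


def fake_read_log(name: str, namespace: str, tail_lines: int) -> str:
    # Stand-in for the real API call, used only to exercise the guard.
    return f"{tail_lines} lines from {namespace}/{name}"


print(fetch_pod_log_tail(fake_read_log, "", "airflow"))
print(fetch_pod_log_tail(fake_read_log, "pod-a", "airflow"))
```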

Some more info on this:
I think #11729 should fix the issue for access to the pod logs (I'm not a Kubernetes expert though).

The other error I mentioned (the 400 when trying to fetch the logs) was related to setting a different uid/gid in the helm install. The worker pods were launched with that UID, so they did not run as the default airflow user. I got this log from the worker pod:

$~ kubectl logs simpledagtask1-12f123a06db04f9684628ff0dedd96cb -n airflow
Traceback (most recent call last):
  File "/home/airflow/.local/bin/airflow", line 5, in <module>
    from airflow.__main__ import main
ModuleNotFoundError: No module named 'airflow'

After removing --set uid=1000 from the helm install, I could launch worker pods and read their logs.
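The traceback shows the airflow entry point under /home/airflow/.local, which points to a per-user (pip --user style) install. One plausible mechanism, which is an assumption the thread does not confirm, is that Python only adds the user site-packages directory of the current user's home to sys.path. A process running under another UID with a different HOME would then not find the airflow package. The sketch below demonstrates that standard-library behaviour by asking a child interpreter for its user site directory under two made-up home directories. The helper user_site_for_home is hypothetical.

```python
import os
import subprocess
import sys


def user_site_for_home(home: str) -> str:
    """Return the user site-packages path a fresh interpreter uses when HOME=home."""
    env = dict(os.environ, HOME=home)
    env.pop("PYTHONUSERBASE", None)  # would otherwise override the HOME-based default
    result = subprocess.run(
        [sys.executable, "-c", "import site; print(site.getusersitepackages())"],
        env=env, stdout=subprocess.PIPE, universal_newlines=True, check=True,
    )
    return result.stdout.strip()


# Hypothetical homes: the image's airflow user vs. a home belonging to another UID.
for home in ("/home/airflow", "/home/other"):
    print(home, "->", user_site_for_home(home))
```

If that assumption holds, a pod started with a UID whose HOME differs from /home/airflow would look for packages in a different directory and could fail with the same ModuleNotFoundError.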

@grepthat we all learn the hard way "once" that we must set the uid / gid to be consistent with the user in the docker image haha. If you want to play with that, the airflow image can be parametrized in that regard.

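The webserver and scheduler pods should run with a ServiceAccount that has RBAC permissions on the Kubernetes cluster to read pod logs. In each Deployment:

spec.template.spec.serviceAccountName: airflow

together with the ServiceAccount, Role and RoleBinding:
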
apiVersion: v1
kind: ServiceAccount
metadata:
  name: airflow
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: airflow
rules:
- apiGroups: [""]
  resources: [pods]
  verbs: [create, get, delete, list, watch]
- apiGroups: [""]
  resources: [pods/log]
  verbs: [get, list]
- apiGroups: [""]
  resources: [pods/exec]
  verbs: [create, get]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: airflow
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: airflow
subjects:
- kind: ServiceAccount
  name: airflow
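A RoleBinding only helps pods that actually run as one of its subjects. As a simplified model (not the API server's code), the function below checks whether a binding's subjects cover a given ServiceAccount username. A ServiceAccount subject without a namespace, as in the manifest above, refers to the binding's own namespace. The binding dict mirrors the YAML above. The airflow namespace is an assumption taken from the helm install in the issue, and binding_covers is a hypothetical helper.

```python
from typing import Any, Dict

SA_PREFIX = "system:serviceaccount:"


def binding_covers(binding: Dict[str, Any], binding_namespace: str, username: str) -> bool:
    """Simplified check: does a RoleBinding name this ServiceAccount user?

    Only ServiceAccount subjects are handled; User and Group subjects are skipped.
    """
    if not username.startswith(SA_PREFIX):
        return False
    sa_namespace, _, sa_name = username[len(SA_PREFIX):].partition(":")
    for subject in binding.get("subjects", []):
        if subject.get("kind") != "ServiceAccount":
            continue
        # An empty subject namespace defaults to the RoleBinding's namespace.
        namespace = subject.get("namespace") or binding_namespace
        if (namespace, subject.get("name")) == (sa_namespace, sa_name):
            return True
    return False


# Mirrors the RoleBinding above, assumed to be applied in the "airflow" namespace.
binding = {
    "kind": "RoleBinding",
    "metadata": {"name": "airflow"},
    "roleRef": {"apiGroup": "rbac.authorization.k8s.io", "kind": "Role", "name": "airflow"},
    "subjects": [{"kind": "ServiceAccount", "name": "airflow"}],
}

print(binding_covers(binding, "airflow", "system:serviceaccount:airflow:airflow"))
print(binding_covers(binding, "airflow", "system:serviceaccount:airflow:airflow-webserver"))
```

The second call returns False. In this model the webserver keeps getting 403 until its pod runs as the airflow ServiceAccount or the binding also lists airflow-webserver.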