Airflow: Kubernetes config dags_volume_subpath breaks PVC in helm chart

Created on 30 Aug 2020  ·  5Comments  ·  Source: apache/airflow

Apache Airflow version: 1.10.12, master

**Kubernetes version: v1.17.9-eks-4c6976 (server)/ v.1.18.6 (client)

Environment:

  • Cloud provider or hardware configuration: EKS
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

What happened:

Current logic of setting dags_volume_subpath is broken for the following use case:

Dag loaded from PVC but gitSync disabled.

I am using the chart from apache/airflow master thusly:

helm install airflow chart --namespace airflow-dev \
--set dags.persistence.enabled=true \
--set dags.persistence.existingClaim=airflow-dag-pvc \
--set dags.gitSync.enabled=false

For the longest time, even with a vanilla install, the workers kept dying. Tailing the logs clued me in that workers were not able to find the dag. I verified from the scheduler that dags were present (it showed up in the UI etc)

Further debugging (looking at the worker pod config) was the main clue ... here is the volumemount

    - mountPath: /opt/airflow/dags
      name: airflow-dags
      readOnly: true
      subPath: repo/tests/dags

Why/who would add repo/tests/dags as a subpath?? 🤦‍♂️

Finally found the problem logic here:
https://github.com/apache/airflow/blob/9b2efc6dcc298e3df4d1365fe809ea1dc0697b3b/chart/values.yaml#L556

Note the implied connection between dags.persistence.enabled and dags.gitSync! This looks like some leftover code from when gitsync and external dag went hand in hand.

Ideally, an user should be able to use a PVC _without_ using git sync

What you expected to happen:
I should be able to use a PVC without gitsync logic messing up my mount path

See above.

How to reproduce it:
I am kinda surprised that this has not bitten anyone yet. I'd like to think my example is essentially a hello world of helm chart with an dag from a PVC.

Anything else we need to know:

This is the patch that worked for me. Seems fairly reasonable -- only muck with dags_volume_subpath if gitSync is enabled.

Even better would be to audit other code and clearly separate out the different competing use case

  1. use PVC but no gitsync
  2. use PVC with gitsync
  3. use gitsync without PVC
diff --git a/chart/values.yaml b/chart/values.yaml
index 00832b435..8f6506cd4 100644
--- a/chart/values.yaml
+++ b/chart/values.yaml
@@ -550,10 +550,10 @@ config:
     delete_worker_pods: 'True'
     run_as_user: '{{ .Values.uid }}'
     fs_group: '{{ .Values.gid }}'
     dags_volume_claim: '{{- if .Values.dags.persistence.enabled }}{{ include "airflow_dags_volume_claim" . }}{{ end }}'
-    dags_volume_subpath: '{{- if .Values.dags.persistence.enabled }}{{.Values.dags.gitSync.dest }}/{{ .Values.dags.gitSync.subPath }}{{ end }}'
+    dags_volume_subpath: '{{- if .Values.dags.gitSync.enabled }}{{.Values.dags.gitSync.dest }}/{{ .Values.dags.gitSync.subPath }}{{ end }}'
     git_repo: '{{- if and .Values.dags.gitSync.enabled (not .Values.dags.persistence.enabled) }}{{ .Values.dags.gitSync.repo }}{{ end }}'
     git_branch: '{{ .Values.dags.gitSync.branch }}'
     git_sync_rev: '{{ .Values.dags.gitSync.rev }}'
helm-chart bug

Most helpful comment

facing the same issue

All 5 comments

facing the same issue

@dimberman can you have a look or ask someone else from our team to check it out?
Noy also posted some details in Slack https://apache-airflow.slack.com/archives/CCQ7EGB1P/p1601396841221900

Hi @aimran-adroll @NoyMiranFyber Are you using the helm chart on master? We recently removed a lot of that code and I'd be interested if this is still a problem (this PR is where we made the change)

https://github.com/apache/airflow/commit/56bd9b7d6b494251fa728ff6a7eb06d6d7eeb2c8#diff-f18d9ef27be2f36a73a018900706456eL556 here is the line where that value mentioned was deleted.

@dimberman on master in files/pod-template-file.yaml this problematic piece of code still exists:

{{- if or .Values.dags.gitSync.enabled .Values.dags.persistence.enabled }}
        - mountPath: {{ include "airflow_dags_mount_path" . }}
          name: airflow-dags
          readOnly: true
{{- if .Values.dags.persistence.enabled }}
          subPath: {{.Values.dags.gitSync.dest }}/{{ .Values.dags.gitSync.subPath }}

I'm facing the issue with using the default repo configurations and just turning on the git-sync as mentioned here.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

Labbs picture Labbs  ·  4Comments

JavierLopezT picture JavierLopezT  ·  4Comments

jdavidheiser picture jdavidheiser  ·  3Comments

ephraimbuddy picture ephraimbuddy  ·  3Comments

turbaszek picture turbaszek  ·  3Comments