Charts: [stable/airflow] initContainers / Git-sync with K8sExecutor: dag_id not found

Created on 2 Jun 2020 · 9 Comments · Source: helm/charts

Describe the bug
When either the initContainer parameter or the Git-sync parameter is set to true, the repo with the DAGs is cloned correctly, but the DAGs are not present in the worker pods.

Version of Helm and Kubernetes:
Helm 3
Kubernetes 1.16

Which chart:
stable/airflow

What happened:
airflow.exceptions.AirflowException: dag_id could not be found: parallel_dag. Either the dag did not exist or it failed to parse
In the pod executing the task, the /dags folder is empty.

What you expected to happen:
The folder shouldn't be empty

How to reproduce it (as minimally and precisely as possible):

values:
    dags:
        git:
            url: "https://github.com/marclamberti/airflow-dags"
            ref: "master"
        initContainer:
            enabled: true

Anything else we need to know:
Not yet tested, but I think the following env vars should be set as well:
AIRFLOW__KUBERNETES__GIT_REPO: "{{ .Values.dags.git.url }}"
AIRFLOW__KUBERNETES__GIT_BRANCH: "{{ .Values.dags.git.ref }}"
AIRFLOW__KUBERNETES__GIT_DAGS_FOLDER_MOUNT_POINT: "{{ .Values.dags.path }}"
when initContainer or GitSync is enabled.

@thesuperzapper I'll keep you posted :)

lifecycle/stale

All 9 comments

Actually, the DAGs are cloned correctly, but they are not directly in the dags folder. They end up in:
dags/repo/my_dags.py

All right, I found the issue.
When git-sync is turned on, the parameter dags_in_image must be set to false, and the following parameters must then be filled in:

AIRFLOW__KUBERNETES__GIT_BRANCH
AIRFLOW__KUBERNETES__GIT_DAGS_FOLDER_MOUNT_POINT

where AIRFLOW__KUBERNETES__GIT_DAGS_FOLDER_MOUNT_POINT is basically the same as CORE_DAGS_FOLDER.
In addition, AIRFLOW__KUBERNETES__DAGS_VOLUME_SUBPATH should be set to "repo/", since git_sync_dest defaults to "repo"; otherwise the DAGs won't be found.
Hope it helps.
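
For readers on the stable/airflow chart, the settings described in this comment could look roughly like the values.yaml fragment below. This is an untested sketch, not the chart's documented configuration. It assumes that the chart passes the keys under airflow.config to the pods as environment variables, and that the DAGs folder is /opt/airflow/dags. The repository URL and branch are the ones from the issue description, and GIT_REPO is taken from the issue's own suggestion rather than from this comment. Adjust all of these to your setup.

```yaml
airflow:
  config:
    # DAGs are not baked into the worker image, so the workers have to fetch them.
    AIRFLOW__KUBERNETES__DAGS_IN_IMAGE: "False"
    # Repository and branch for the worker pods (values from the issue description).
    AIRFLOW__KUBERNETES__GIT_REPO: "https://github.com/marclamberti/airflow-dags"
    AIRFLOW__KUBERNETES__GIT_BRANCH: "master"
    # Assumption: same path as the DAGs folder used by the scheduler (CORE_DAGS_FOLDER).
    AIRFLOW__KUBERNETES__GIT_DAGS_FOLDER_MOUNT_POINT: "/opt/airflow/dags"
    # git_sync_dest defaults to "repo", so the DAG files sit one level below the mount point.
    AIRFLOW__KUBERNETES__DAGS_VOLUME_SUBPATH: "repo/"
```

If git_sync_dest is changed from its default, the subpath value would presumably need to change with it.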

@marclamberti, I think we should add some better documentation for running in KubernetesExecutor mode.

The dags.git.gitSync values won't automatically set AIRFLOW__KUBERNETES__XXXXXXX. I chose not to set them because some values are structured differently in Airflow's git-sync (notably how the secrets are stored).

However, I am planning to move this chart to the kubernetes/git-sync container (which is what Airflow uses), so that should fix this, but it will also force users to change their git secrets.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

This issue is being automatically closed due to inactivity.

@marclamberti @thesuperzapper I am facing the same issue. After setting the values listed further down for airflow.cfg in values.yaml, I get the error below:

[2020-08-24 12:55:18,345] {__init__.py:51} INFO - Using executor LocalExecutor
[2020-08-24 12:55:18,346] {dagbag.py:396} INFO - Filling up the DagBag from /opt/airflow/dags/repo/test_dag.py
Traceback (most recent call last):
  File "/home/airflow/.local/bin/airflow", line 37, in <module>
    args.func(args)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/utils/cli.py", line 75, in wrapper
    return f(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/bin/cli.py", line 523, in run
    dag = get_dag(args)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/bin/cli.py", line 149, in get_dag
    'parse.'.format(args.dag_id))
airflow.exceptions.AirflowException: dag_id could not be found: Test. Either the dag did not exist or it failed to parse.

Here are the values that I am setting for airflow.cfg:

        AIRFLOW__KUBERNETES__WORKER_CONTAINER_REPOSITORY: "apache/airflow"
        AIRFLOW__KUBERNETES__WORKER_CONTAINER_TAG: "1.10.10-python3.6"
        AIRFLOW__KUBERNETES__NAMESPACE: "airflow-etl"
        AIRFLOW__KUBERNETES__KUBE_CLIENT_REQUEST_ARGS: '{\"_request_timeout\":[60,60]}'
        AIRFLOW__KUBERNETES__GIT_REPO: "[email protected]:************/airflow-etl.git"
        AIRFLOW__KUBERNETES__GIT_BRANCH: "master"
        AIRFLOW__KUBERNETES__GIT_DAGS_FOLDER_MOUNT_POINT: "/opt/airflow/dags"
        AIRFLOW__KUBERNETES__DAGS_VOLUME_SUBPATH: "repo/"
        AIRFLOW__KUBERNETES__DELETE_WORKER_PODS: "False"
        AIRFLOW__KUBERNETES__DELETE_WORKER_PODS_ON_FAILURE: "False"
        AIRFLOW__KUBERNETES__GIT_SSH_KEY_SECRET_NAME: "airflow-secrets"
        AIRFLOW__KUBERNETES__GIT_SSH_KNOWN_HOSTS_CONFIGMAP_NAME: "airflow-git-knownhosts"
        AIRFLOW__KUBERNETES__GIT_SYNC_ROOT: "/dags"
        AIRFLOW__KUBERNETES__DAGS_IN_IMAGE: "False"
        AIRFLOW__KUBERNETES__RUN_AS_USER: "50000"

@hussainsaify I have the same issue. Did you manage to resolve it?

@nguyenkien1402 or @hussainsaify, can you please raise an issue on the new repo if you are still having problems: https://github.com/airflow-helm/charts/tree/main/charts/airflow

@thesuperzapper thanks, I will
