Airflow: DAG's parameter access_control is not refreshing in the UI

Created on 28 Apr 2020 · 15 Comments · Source: apache/airflow

Apache Airflow version: 1.10.9

Environment:

  • Cloud provider or hardware configuration: N/A
  • OS (e.g. from /etc/os-release): Debian GNU/Linux 9 (stretch)
  • Kernel (e.g. uname -a): Linux 4cc8ac3c2cfb 5.4.0-26-generic #30-Ubuntu SMP Mon Apr 20 16:58:30 UTC 2020 x86_64 GNU/Linux
  • Install tools: N/A

What happened:

When I update a DAG's access_control parameter, the change is not reflected in the UI at roles/list/. I have to trigger a DAG refresh manually in the UI to get the change. (I tried it many times and waited 10+ minutes.)

What you expected to happen:

I assume the change should be picked up automatically, as for any other DAG parameter.

How to reproduce it:

Create a new DAG, for example with the parameter access_control={'Public': ['can_dag_read']}.

bug

All 15 comments

Thanks for opening your first issue here! Be sure to follow the issue template!

I see this issue too. Is there any update on this ticket? The workaround I am currently using is to run the airflow sync_perm command as a cron job every 5 minutes.

@iam432 Can I ask how you use that command in the cron job? We use Airflow with Docker, and I'm able to run airflow sync_perm inside the container; it works and the permissions are updated. However, when I schedule that command (* * * * * /usr/local/bin/airflow sync_perm) in cron, it doesn't work and I receive the following error:

[parameters: ('2020-05-06 14:20:02.886571', None, None, 'cli_sync_perm', None, 'airflow', '{"host_name": "857656ec757f", "full_command": "[\'/usr/local/bin/airflow\', \'sync_perm\']"}')]
(Background on this error at: http://sqlalche.me/e/e3q8)
The sync_perm command only works for rbac UI.
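
A possible explanation for the error above, which this thread does not confirm, is that cron starts jobs with a minimal environment, so Airflow settings that the container passes as environment variables (for example the metadata DB connection or the RBAC flag) never reach the airflow process. The following is a minimal sketch, assuming Python 3.5+, of a wrapper that passes those settings explicitly. The helper names, the AIRFLOW_HOME path, and the values are hypothetical; AIRFLOW__WEBSERVER__RBAC and AIRFLOW__CORE__SQL_ALCHEMY_CONN follow Airflow 1.10's AIRFLOW__{SECTION}__{KEY} naming for configuration overrides.

```python
import os
import subprocess


def build_sync_env(base_env, airflow_home, extra=None):
    """Return a copy of base_env plus the Airflow settings cron may be missing."""
    env = dict(base_env)
    env["AIRFLOW_HOME"] = airflow_home
    # Assumption: `extra` mirrors whatever the container's main process uses,
    # e.g. AIRFLOW__WEBSERVER__RBAC or AIRFLOW__CORE__SQL_ALCHEMY_CONN.
    env.update(extra or {})
    return env


def build_sync_command(airflow_bin="/usr/local/bin/airflow"):
    """Command line for the permission sync (Airflow 1.10.x CLI)."""
    return [airflow_bin, "sync_perm"]


def run_sync_perm(airflow_home, extra_env=None,
                  airflow_bin="/usr/local/bin/airflow"):
    """Run `airflow sync_perm` with an explicit environment.

    Returns (exit_code, combined stdout/stderr) so the caller can log it.
    """
    env = build_sync_env(os.environ, airflow_home, extra_env)
    result = subprocess.run(
        build_sync_command(airflow_bin),
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.returncode, result.stdout
```

A small script scheduled from cron could call run_sync_perm("/usr/local/airflow", {"AIRFLOW__WEBSERVER__RBAC": "True"}) and write the returned output to a log file, which would at least show whether the job sees the same configuration as an interactive shell in the container.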

@vdusek: I created a shell script for the sync (there are other things in the script) and scheduled cron to run it every 5 minutes, as below.

[airflow@hostname ~]$ cat /data/airflow/sync.sh
#!/bin/bash
airflow sync_perm
[airflow@hostname ~]$

[airflow@hostname ~]$ crontab -l
*/5 * * * * sh /data/airflow/sync.sh
[airflow@hostname ~]$

@vdusek: I forgot to say that RBAC is enabled in my environment.

@vdusek: It looks like my solution is not working. Did you find a solution that works for you? Please let me know if you have any workaround.

@iam432 I had trouble setting up cron to run sync_perm as you suggested. I didn't find any other solution, so I let it be. When there's a need for a refresh, we do it manually in the UI. Hope it's going to be fixed soon.

Thank you for the response. I was able to automate this by creating a DAG and running it on a schedule of every 2 minutes. That solved it, but it is only a workaround.

Can you explain what you mean by "When there's a need for a refresh, we do it manually in the UI"?

As an admin, I can see the DAGs after a refresh, but users with limited access cannot. Which refresh in the UI do you mean?

@iam432 Let me answer your question on behalf of @vdusek

We noticed that if you click the DAG refresh button on the right side of the main screen (in the Links column), access rights are updated for that particular DAG. However, you have to be an admin.
This is not a usable workaround, since every time a new DAG is added, someone with Admin rights would have to go into the UI and press the _refresh_ button.

For now we have had to disable DAG-level access control, but it's something we need to resolve in the mid-term.

This is still an issue, and at least two more people have run into it (per discussion in Airflow Slack). @feng-tao @astahlman per the comment on https://github.com/apache/airflow/blob/0eb5020fda46a6eceaa3652846598cb4ba34493b/airflow/www/security.py#L524, this seems like it should be a relatively simple fix to call sync_perm_for_dag for every dag when the rest of the permissions are refreshed. Does that sound reasonable? If so, I can submit a PR.

@jdavidheiser I haven't been pinged in Airflow for a while :) If I recall correctly (~2 years ago), we first need this commit (https://github.com/apache/airflow/pull/4642/files) to be able to use access_control in the Airflow version.

At the time I thought about allowing a call to sync_perm_for_dag to work around this issue (Pinterest did it this way). But the issue here is that RBAC lives in the webserver / Flask-AppBuilder, while the scheduler does the DAG parsing. Calling a webserver function within the scheduler kind of violates the contract: ideally it should call a DB function instead of a webserver internal (the scheduler shouldn't be aware of webserver internals and shouldn't instantiate a webserver object). But the challenge is that all the RBAC DB models live in RBAC instead of Airflow, if I recall.

I think the ultimate solution is to have an API gateway (if we don't have that, a webserver API is fine) that the scheduler can call to update permissions during DAG parsing. Our current workaround is to use the CLI to update that part periodically (cc @astahlman). I haven't looked at the latest Airflow code for a while, so I will defer to @ashb or others to comment on whether there are alternative approaches.

Any update on this?

Any news on this issue?

cc @ashb @kaxil

Could someone try out alpha 2 and let us know if this is still a problem?
