Airflow: Unclear documentation for the delegate_to parameter

Created on 21 Jun 2020  路  12Comments  路  Source: apache/airflow

Hello,

Most GCP operators accept the delegate_to parameter. It is not super well documented and it can be confusing to users exactly when they would use this as opposed to another conn id.

A little more documentation on the following page would be helpful.
https://airflow.readthedocs.io/en/latest/howto/connection/gcp.html

You can get extra points for adding information about Impersonated credentials
. This feature is not supported by Airflow, but it is related (ticket).

Are you wondering how to start contributing to this project? Start by reading our contributor guide

Best regards,
Kamil

CC: @jaketf

docs good first issue bug Google

All 12 comments

@olchas] Do you want to work on it? It looks like this is a task for you.

@mik-laj sure. Could you assign me to this issue, please?

@olchas We first need to ask a basic question. Shouldn't we abandon support for this feature? In what cases does it work? In what cases does it not work?

@mik-laj I am still looking into it. The name suggests that domain-wide delegation only makes sense for G Suite applications (so in terms of Airflow it should only be applied to GoogleDriveHook and GSheetsHook), but this article calls it a legacy branding and tells that it applies to Cloud Identity as well.

I am also still uncertain about how the two impersonation mechanisms can/should work together. As far as I can tell, domain-wide delegation is supposed to be used to impersonate user account using service account, while direct impersonation can be used to impersonate service account using either another service account or user account.

So, I can see two scenarios:

  1. You start with a service/user account that you use to directly impersonate some service account, that is then used to perform domain-wide delegation on some user.
  2. You start by performing domain-wide delegation on some user, and then use this user to impersonate some service account.

However, Credentials class from google.auth.impersonated_credentials module does not have with_subject method, so apparently it is impossible to use directly impersonated account to perform domain-wide delegation of authority, which renders first scenario impossible. On the other hand, it seems you can specify the delegate for source credentials and then use these credentials for direct impersonation as in scenario 2, but I did not have a chance to test it.

@jaketf, @amithmathew, do you perhaps have more insight on the topic?

Taking a look at the code now it seems we have this common GoogleBaseHook used by hooks for gsuite and cloud. This delegate_to seems not really not useful for cloud, and I don't think the scenario 2 of delegating to a human user to impersonate a service account is an advisable pattern / one worth supporting in airflow core. I think delegate_to should be removed / deprecated from the Google Cloud Hooks / Operators to avoid confusion.

To play devil's advocate: There may be use cases where users expect delegate_to to attribute API calls (e.g. a BQ Query) to the delegated human user. Again, I don't think I'd recommend this as an auditing posture as anyone could throw [email protected] into the delegate_to and bootstrap my IAM permissions. IMO This seems like something we shouldn't support.

delegate_to would be used to impersonate a user account using a (specific) service account - It does look like delegate_to may be used on the GSuite side of things.

@jaketf, @amithmathew so it would seem that the approach could be to:

  1. disallow simultaneous use of delegate_to and impersonation_chain (new argument used for direct impersonation) in GoogleBaseHook by raising an exception if both arguments are provided
  2. remove delegate_to from most Google operators and hooks, leaving it only in G Suite operators and hooks

WDYT?
CC: @turbaszek

Sounds like great plan.

One small problem though - we will have to deprecate the delegate_to parameter rather than remove them initially. In 2.1 we might be able to remove them,

@olchas I'm in favour of 2nd approach. Regarding the deprecation - first I would check if current implementation works. If it does not then I see no reason for deprecation - let us just remove it.

@olchas I'm in favour of 2nd approach. Regarding the deprecation - first I would check if current implementation works. If it does not then I see no reason for deprecation - let us just remove it.

Agree with @turbaszek. I thought about deprecation in the sense that it will not fail when someone actually adds the "delegate_to" parameter. I am afraid there might be cases that someone already added this parameter and we do not want their DAGs to fail (even if this parameter was superfluous).

I believe we have **kwargs in most such operators but it's worth checking.

I agree remove delegate_to, make sure it is an ignored kwarg rather than breaking peoples DAGs if passed.

Was this page helpful?
0 / 5 - 0 ratings