Pipelines: Add authentication with ServiceAccountToken

Created on 15 Feb 2021 · 14 Comments · Source: kubeflow/pipelines

Problem Statement

Clients in various namespaces (e.g., Notebooks) need to access the Pipelines API. However, there is currently no way for these clients to authenticate to the Pipelines API:
https://github.com/kubeflow/pipelines/issues/4440
https://github.com/kubeflow/pipelines/issues/4733
In-cluster clients need a way to authenticate to the KFP API Server.

Proposed Solution

The correct way to do this is by using audience-scoped ServiceAccountTokens. In Arrikto's Kubeflow distribution, we have been successfully using this method for a long time, in numerous customer environments. We want to upstream this solution so the whole community can benefit as well, since we see this is an issue many users bump into.
Changes need to happen in 2 places:

  • API Server, which needs to support authentication with ServiceAccountToken.
  • KFP Client, to better support this authentication method.

/assign @yanniszark
cc @Bobgy

All 14 comments

Thank you for the proposal! I'd love to see it upstreamed too. It's a common request in #4440.

Hello!

I'll provide an update here as I'll be pushing a PR covering the backend part very soon.

As mentioned in the first comment, we are adding a new authentication method: authentication using ServiceAccountTokens.
For this, we need the clients to put ServiceAccountTokens in requests and the backend (KFP API server) to retrieve them and authenticate the requests.

How will this ServiceAccountToken find its way into the requests?

  1. The client finds a proper ServiceAccountToken (more on this later on)
  2. It adds an Authorization: Bearer <token> header in all requests
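Step 2 amounts to a single header. A minimal sketch using only the Python standard library, assuming the token has already been read into a string; `build_authenticated_request` is a hypothetical helper, and the URL is an assumed in-cluster address of the KFP API server, not something this thread specifies:

```python
import urllib.request

# Assumed in-cluster address of the KFP API server; adjust for your deployment.
KFP_API_URL = "http://ml-pipeline.kubeflow:8888/apis/v1beta1/pipelines"


def build_authenticated_request(url: str, token: str) -> urllib.request.Request:
    """Return a GET request carrying the ServiceAccountToken as a bearer token."""
    request = urllib.request.Request(url, method="GET")
    # Token files often end with a newline, so strip it before building the header.
    request.add_header("Authorization", f"Bearer {token.strip()}")
    return request
```

Inside a pod, `urllib.request.urlopen(build_authenticated_request(KFP_API_URL, token))` would send it; here the request is only constructed.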

What does the authentication cycle of the backend look like?

  1. We will extend the authentication mechanisms of the KFP API server with one more authenticator [and we will make the available authenticators extendable]
  2. Every request will pass through the available authenticators in turn (currently, the Kubeflow-UserID header and the ServiceAccountToken) until one succeeds.
    If one succeeds, authentication succeeds.
    Otherwise, if all authenticators fail, the request is considered unauthenticated.
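The authenticator chain is a first-success loop. The KFP API server is written in Go; the Python below only illustrates the control flow, with hypothetical names (`authenticate`, `Unauthenticated`, the two authenticator functions) and header names assumed to be lower-cased:

```python
from typing import Callable, Dict, List, Optional

# An authenticator inspects request headers and returns a user name,
# or None if it cannot authenticate the request.
Authenticator = Callable[[Dict[str, str]], Optional[str]]


class Unauthenticated(Exception):
    """Raised when no authenticator accepts the request."""


def authenticate(headers: Dict[str, str], authenticators: List[Authenticator]) -> str:
    # Try each authenticator in order; the first success wins.
    for authenticator in authenticators:
        user = authenticator(headers)
        if user is not None:
            return user
    # Every authenticator failed, so the request is unauthenticated.
    raise Unauthenticated("no authenticator accepted the request")


def kubeflow_userid_authenticator(headers: Dict[str, str]) -> Optional[str]:
    # Hypothetical stand-in for the existing header-based method.
    return headers.get("kubeflow-userid")


def bearer_token_authenticator(verify_token: Callable[[str], Optional[str]]) -> Authenticator:
    # Wraps a token verifier (e.g. a TokenReview call) as an authenticator.
    def _authenticate(headers: Dict[str, str]) -> Optional[str]:
        value = headers.get("authorization", "")
        if not value.startswith("Bearer "):
            return None
        return verify_token(value[len("Bearer "):])

    return _authenticate
```

Adding a new method then means appending one more function to the list, which is one way to keep the set of authenticators extendable.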

How does the ServiceAccountToken authenticator work?

  1. The KFP API server creates a TokenReview using the ServiceAccountToken retrieved from the request's bearer token header and an expected audience (for the KFP case, this can be ml-pipeline)
  2. Kubernetes responds (with the TokenReviewStatus) whether the token is associated with a known user and which audiences it is valid for
  3. The KFP API server verifies that ml-pipeline is in the audience specified in the Kubernetes response
  4. The KFP API server considers the request authenticated and assumes the user specified by Kubernetes in its response
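Steps 1 to 4 can be sketched as two pure functions: one builds the TokenReview object, the other interprets the status Kubernetes returns. This is an illustration, not the server's code; actually submitting the review (for example with the official Kubernetes Python client's `AuthenticationV1Api.create_token_review`) is left out so the logic runs offline, and the function names are hypothetical:

```python
from typing import Any, Dict, Optional

EXPECTED_AUDIENCE = "ml-pipeline"


def build_token_review(token: str, audience: str = EXPECTED_AUDIENCE) -> Dict[str, Any]:
    # Step 1: a TokenReview carrying the bearer token and the expected audience.
    return {
        "apiVersion": "authentication.k8s.io/v1",
        "kind": "TokenReview",
        "spec": {"token": token, "audiences": [audience]},
    }


def user_from_token_review_status(
    status: Dict[str, Any], audience: str = EXPECTED_AUDIENCE
) -> Optional[str]:
    # Step 2: Kubernetes reports whether the token belongs to a known user.
    if not status.get("authenticated"):
        return None
    # Step 3: the expected audience must appear in the response.
    if audience not in status.get("audiences", []):
        return None
    # Step 4: assume the identity Kubernetes resolved for the token.
    return status.get("user", {}).get("username")
```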

How does the client find a ServiceAccountToken to use?

Kubernetes has built-in ways to project tokens with a specific audience for the ServiceAccount of a pod.
Each container of a pod mounts the token the same way it mounts any other volume (a projected volume).
The kubelet generates a token and stores it in a file. Then, to retrieve the token, it's just a matter of reading this file.

The KFP client should have a seamless way to

  1. retrieve the path where the token is mounted,
  2. read it, and
  3. use it in request headers.

The token has an expiration time; however, the kubelet makes sure to refresh this token before it expires.
So, finally, the client should re-read the token periodically.
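A minimal sketch of the read-and-re-read behavior, assuming a hypothetical `ProjectedTokenReader` helper and an arbitrary 60-second re-read interval; the only requirement is that the interval stays well below the token's lifetime:

```python
import time
from pathlib import Path
from typing import Optional


class ProjectedTokenReader:
    """Reads a projected ServiceAccountToken, re-reading it at most every `max_age` seconds."""

    def __init__(self, path: str, max_age: float = 60.0) -> None:
        self._path = Path(path)
        self._max_age = max_age
        self._token: Optional[str] = None
        self._read_at = 0.0

    def token(self) -> str:
        now = time.monotonic()
        # Re-read the file once the cached copy is older than max_age, because
        # the kubelet rewrites the file with a fresh token before expiry.
        if self._token is None or now - self._read_at >= self._max_age:
            self._token = self._path.read_text().strip()
            self._read_at = now
        return self._token
```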

This last part is also relevant to the discussion of https://github.com/kubeflow/pipelines/issues/4683

@elikatsis @yanniszark If any help is needed so this can be added for 1.3 please let me know. I think a large portion of the community has been waiting for this a while now and it would be great to have it included in 1.3.

/assign @elikatsis

@DavidSpek thanks for volunteering!
We actually have the code ready for a PR, and we've extensively tested it in our deployments.
I believe it would be really helpful if you had the time to test the PRs (backend & client)!

Before I open the client PR, I'll present some implementation details (we've described an overview in the comment above).

As mentioned in https://github.com/kubeflow/pipelines/issues/4683#issuecomment-719652792, we want to have a generic way to provide credentials to the client. We will be using a TokenCredentials abstract class for this and we will be making use of a very interesting built-in Kubernetes Configuration functionality: auth_settings. [Obviously, we use a Configuration in our client ([source](https://github.com/kubeflow/pipelines/blob/1577bdb41913613f6268366b6e6e20fdfddde693/sdk/python/kfp/_client.py#L131)).]

Requirements

  1. We want some credentials to find their way into request headers and, more specifically, into the Authorization: Bearer <token> header.
  2. Also, we need a way to refresh the token before making an API call (as mentioned in the comment above, when projecting service account tokens for pods, the kubelet refreshes them every now and then, so a client needs to read the token often)

Information about the Kubernetes Configuration object

  1. Depending on its attributes, a Kubernetes Configuration may hold some BearerToken authentication settings (source)
  2. Before making an API call, the client updates the request using these settings (source). Based on these settings, it may populate the request with:

    • cookies,

    • headers, or

    • queries.

[Expanding (1)] As shown in this source, by setting Configuration.api_key["authorization"] we can add a BearerToken auth setting, which:

  1. adds a header to the request (source),
  2. names the header authorization (source), and
  3. retrieves the header value using the get_api_key_with_prefix() method (source)

[Expanding (3)] The get_api_key_with_prefix() method (source):

  1. Eventually returns self.api_key["some-key"], prefixed with self.api_key_prefix["some-key"] if that is set
  2. Note that before doing any of this, it executes the refresh_api_key_hook() method if it is defined :exclamation:

[Expanding (2)] The refresh_api_key_hook() method runs before every request. And, as its name suggests, it's a neat way to refresh the api keys!
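To make the mechanism concrete, here is a simplified stand-in for the generated Configuration class that reproduces only the behavior described above (`api_key`, `api_key_prefix`, `refresh_api_key_hook`, `get_api_key_with_prefix()`, `auth_settings()`). It is an illustration, not the real class; `MiniConfiguration` and `apply_auth` are hypothetical names, and the real implementation may differ in details:

```python
from typing import Callable, Dict, Optional


class MiniConfiguration:
    """Simplified stand-in for an OpenAPI-generated Configuration (not the real class)."""

    def __init__(self) -> None:
        self.api_key: Dict[str, str] = {}
        self.api_key_prefix: Dict[str, str] = {}
        self.refresh_api_key_hook: Optional[Callable[["MiniConfiguration"], None]] = None

    def get_api_key_with_prefix(self, identifier: str) -> Optional[str]:
        # The hook runs first, so it can replace the key before it is used.
        if self.refresh_api_key_hook is not None:
            self.refresh_api_key_hook(self)
        key = self.api_key.get(identifier)
        if key is None:
            return None
        prefix = self.api_key_prefix.get(identifier)
        return f"{prefix} {key}" if prefix else key

    def auth_settings(self) -> Dict[str, Dict[str, Optional[str]]]:
        # The bearer setting only exists once an "authorization" key is set.
        settings: Dict[str, Dict[str, Optional[str]]] = {}
        if "authorization" in self.api_key:
            settings["BearerToken"] = {
                "in": "header",
                "key": "authorization",
                "value": self.get_api_key_with_prefix("authorization"),
            }
        return settings


def apply_auth(config: MiniConfiguration, headers: Dict[str, str]) -> None:
    # Before each API call, header-type settings are copied into the request.
    for setting in config.auth_settings().values():
        if setting["in"] == "header" and setting["value"]:
            headers[setting["key"]] = setting["value"]
```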

Conclusions

To sum up, what we need to do is:

  1. populate our config.api_key["authorization"] = token,
  2. populate our config.api_key_prefix["authorization"] = "Bearer", and
  3. provide our config.refresh_api_key_hook with a function that updates config.api_key["authorization"].
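The three steps could look like this. `TokenCredentials` is the abstract class named earlier in this thread, but its method bodies here are assumptions, `configure_bearer_auth` is a hypothetical helper, and `config` is duck-typed so that any object with `api_key`, `api_key_prefix` and `refresh_api_key_hook` attributes works:

```python
import abc


class TokenCredentials(abc.ABC):
    """Sketch of an abstract credentials class; subclasses only supply get_token()."""

    @abc.abstractmethod
    def get_token(self) -> str:
        ...

    def refresh_api_key_hook(self, config) -> None:
        # Called by the Configuration before every request (step 3).
        config.api_key["authorization"] = self.get_token()


def configure_bearer_auth(config, credentials: TokenCredentials) -> None:
    # Step 1: seed the key so the BearerToken auth setting is present.
    config.api_key["authorization"] = credentials.get_token()
    # Step 2: make the header value read "Bearer <token>".
    config.api_key_prefix["authorization"] = "Bearer"
    # Step 3: re-fetch the token right before each API call.
    config.refresh_api_key_hook = credentials.refresh_api_key_hook
```

The hook mutates the configuration rather than returning the token, which is what lets get_api_key_with_prefix() pick up the new value on the very next request.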

So, for this case (authentication with ServiceAccountTokens), we need to

  1. Read the contents of a specific file in the container's file system (projected service account tokens are essentially volumes mounted on pods). This file holds the token.
  2. Use, as the refresh_api_key_hook, a method that reads this file and stores its contents in config.api_key["authorization"].

Design decisions

  1. We will create a subclass of TokenCredentials named ServiceAccountTokenVolumeCredentials
  2. The class constructor will be expecting a path pointing to the file where the token is stored
  3. If the user doesn't provide a path, the constructor will look for an environment setting: the value of the environment variable ML_PIPELINE_SA_TOKEN_PATH
  4. If the user doesn't provide a path and the environment variable is not set, the constructor will fall back to reading the path /var/run/secrets/ml-pipeline/token
  5. The Client constructor will accept a credentials argument and configure its API client accordingly
  6. If no credentials are provided and the client detects it is running inside a pod, it will attempt to use a ServiceAccountTokenVolumeCredentials.
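A sketch of these design decisions, assuming the `TokenCredentials` interface reduces to a single `get_token()` method (a minimal stand-in is defined so the block runs on its own). Detecting a pod via the `KUBERNETES_SERVICE_HOST` environment variable is an assumption of this sketch, a common heuristic rather than documented SDK behavior, and `default_credentials` is a hypothetical helper:

```python
import abc
import os
from typing import Optional

DEFAULT_TOKEN_PATH = "/var/run/secrets/ml-pipeline/token"
TOKEN_PATH_ENV = "ML_PIPELINE_SA_TOKEN_PATH"


class TokenCredentials(abc.ABC):
    # Minimal stand-in for the abstract base class, so this block runs on its own.
    @abc.abstractmethod
    def get_token(self) -> str: ...


class ServiceAccountTokenVolumeCredentials(TokenCredentials):
    def __init__(self, path: Optional[str] = None) -> None:
        # Decisions 2-4: explicit path, then the env variable, then the default path.
        self._path = path or os.environ.get(TOKEN_PATH_ENV) or DEFAULT_TOKEN_PATH

    @property
    def path(self) -> str:
        return self._path

    def get_token(self) -> str:
        # Read on every call, because the kubelet rotates the projected token.
        with open(self._path) as f:
            return f.read().strip()


def running_in_pod() -> bool:
    # Assumption: Kubernetes sets KUBERNETES_SERVICE_HOST in pod containers.
    return "KUBERNETES_SERVICE_HOST" in os.environ


def default_credentials(
    credentials: Optional[TokenCredentials] = None,
) -> Optional[TokenCredentials]:
    # Decisions 5-6: prefer explicit credentials, else use the SA token inside a pod.
    if credentials is not None:
        return credentials
    if running_in_pod():
        return ServiceAccountTokenVolumeCredentials()
    return None
```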

How to set up the pod to authenticate against KFP

We (Arrikto) have been using a PodDefault that configures the pod to authenticate against KFP based on the aforementioned design.
Here is the PodDefault; it describes everything we need to add to the pod definition:

```yaml
apiVersion: kubeflow.org/v1alpha1
kind: PodDefault
metadata:
  name: access-ml-pipeline
spec:
  desc: Allow access to Kubeflow Pipelines
  selector:
    matchLabels:
      access-ml-pipeline: "true"
  volumeMounts:
  - mountPath: /var/run/secrets/ml-pipeline
    name: volume-ml-pipeline-token
    readOnly: true
  volumes:
  - name: volume-ml-pipeline-token
    projected:
      sources:
      - serviceAccountToken:
          path: token
          expirationSeconds: 7200
          audience: ml-pipeline
  env:
  - name: ML_PIPELINE_SA_TOKEN_PATH
    value: /var/run/secrets/ml-pipeline/token  # this is dependent on the volume mount path and SAT path
```

@elikatsis Thanks for the detailed post. I will look at it more closely tomorrow and do my best to help test the PRs.

@Bobgy, @DavidSpek I've opened two PRs :tada:

  1. Backend: #5286
  2. Client: #5287

Hi @elikatsis! Thank you for the detailed design and PRs!
I think these are absolutely great work and I'll start looking at them right now.

However, despite that, I'm a little concerned that the design was only made public 5 days before Kubeflow 1.3 feature cut date -- March 15th. I think we agreed early on the rough direction, that was a good heads up, but it's not possible to discuss this fairly complex feature design thoroughly within 5 days. If we commit to shipping this in KF 1.3, we can only rush to a decision.

Besides that, an important dependency on PodDefault (important in terms of making the zero-config default a better experience) was only revealed 3 days before the feature cut date, which I especially worry about.

@Bobgy thanks for putting time on this!

> I'm a little concerned that the design was only made public 5 days before Kubeflow 1.3

Your concerns are totally valid and understandable. We agree it is very close to the first RC and this may be a bit pressing.

> I think we agreed early on the rough direction, that was a good heads up, but it's not possible to discuss this fairly complex feature design thoroughly within 5 days

Indeed, this is an advanced feature. However, we had already discussed most of the changes during the joint talk you had with @yanniszark.

That's why we expect the backend changes to be unsurprising.
As far as the client is concerned, the change is relatively small and fully backwards compatible. In fact, it doesn't affect existing users at all.

Note that all of the changes are extensions to existing functionality and are not removing or changing any old behavior.

> Besides that, an important dependency on PodDefault (important in terms of making the zero-config default a better experience) was only revealed 3 days before the feature cut date, which I especially worry about.

We agree, but it's not necessary to solve the zero-config issue before the RC. We can still use the alternative of _some_ config, if we want.

To sum up: yes, we are very close to the RC (also take into consideration that the release cut was pushed back one week), but let's do our best and see if we can make it! Many users rely on it. If we don't make it, it's ok!

@elikatsis thanks, I just noticed the RC cut delay; I'm glad we get some more breathing room for this feature.

> We agree, but it's not necessary to solve the zero-config issue before the RC. We can still use the alternative of some config, if we want.

Makes sense, so I'd frame the discussion around the things we agree on. Would you mind splitting your PR into smaller ones, so that we can approve the ones we fully agree on right now first for the RC? (For clarification, I don't mean to ask you to split right now, but rather, if during review we see parts that everyone agrees on, we can split them out for a quick merge.)

Also, I have very good context on the backend part from previous discussions with Yannis, so I think we can get it merged.

The only part I have concerns about is the user-facing interface for adding service account tokens. What do you think about letting the KFP API server inject a projected service account token into every KFP pod? I don't think that raises the security risk (because service account tokens are already available there), nor is there a chance of breaking existing components. Pro: we do not need PodDefault there, so there is one less dependency.

e.g. I guess we can configure https://argoproj.github.io/argo-workflows/default-workflow-specs/ with a global podSpecPatch like https://github.com/argoproj/argo-workflows/blob/master/examples/pod-spec-patch-wf-tmpl.yaml to get this behavior easily.
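As a sketch of that idea, the volume, mount and environment variable from the PodDefault above could be expressed as a podSpecPatch. This assumes Argo names the user container `main` and merges the patch into the generated pod spec by container name; how a global default is wired into the workflow controller should be checked against the Argo docs linked above. `ml_pipeline_token_patch` is a hypothetical helper:

```python
import json

TOKEN_DIR = "/var/run/secrets/ml-pipeline"  # mount path taken from the PodDefault above
VOLUME_NAME = "volume-ml-pipeline-token"


def ml_pipeline_token_patch(expiration_seconds: int = 7200, audience: str = "ml-pipeline") -> str:
    """Build a podSpecPatch string projecting an audience-scoped token into the main container."""
    patch = {
        "volumes": [{
            "name": VOLUME_NAME,
            "projected": {"sources": [{"serviceAccountToken": {
                "path": "token",
                "expirationSeconds": expiration_seconds,
                "audience": audience,
            }}]},
        }],
        "containers": [{
            # Assumption: Argo's user container is named "main".
            "name": "main",
            "volumeMounts": [{"name": VOLUME_NAME, "mountPath": TOKEN_DIR, "readOnly": True}],
            "env": [{"name": "ML_PIPELINE_SA_TOKEN_PATH", "value": f"{TOKEN_DIR}/token"}],
        }],
    }
    return json.dumps(patch)
```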

For clarification, I'm prioritizing reviewing the backend PR, because it's a blocker of release. The SDK PR can be released after Kubeflow release, because users can easily upgrade the SDK at any time, and there's very little coupling to the server.

I had totally missed these comments :scream:

> would you mind splitting your PR into smaller ones, so that we can approve the ones we fully agree on right now first for the RC?

We've merged the backend now. I hope you are fine with this and were not waiting for me to split some commits. Next time, feel free to explicitly ask for things like that during the review!

> What do you think about letting the KFP API server inject a projected service account token into every KFP pod?
> e.g. I guess we can configure https://argoproj.github.io/argo-workflows/default-workflow-specs/ with a global podSpecPatch like https://github.com/argoproj/argo-workflows/blob/master/examples/pod-spec-patch-wf-tmpl.yaml to get this behavior easily.

These sound like very good ideas. However, maybe we want an explicit way to declare something like "allow _this_ pod to have access to KFP, but not _this_ one".

We will iterate on these ideas internally and come back to it!

@elikatsis Does it make sense to integrate the PodDefault you shared above with the notebook controller to make the user experience more seamless? I believe this would be the best way to solve this long-standing issue.

> I had totally missed these comments :scream:
>
> > would you mind splitting your PR into smaller ones, so that we can approve the ones we fully agree on right now first for the RC?
>
> We've merged the backend now. I hope you are fine with this and were not waiting for me to split some commits. Next time, feel free to explicitly ask for things like that during the review!

No worries, the backend PR LGTM. I was mostly talking about concerns for the sdk PR.

> > What do you think about letting the KFP API server inject a projected service account token into every KFP pod?
> > e.g. I guess we can configure https://argoproj.github.io/argo-workflows/default-workflow-specs/ with a global podSpecPatch like https://github.com/argoproj/argo-workflows/blob/master/examples/pod-spec-patch-wf-tmpl.yaml to get this behavior easily.
>
> These sound like very good ideas. However, maybe we want an explicit way to declare something like "allow _this_ pod to have access to KFP, but not _this_ one".
>
> We will iterate on these ideas internally and come back to it!

I'd prefer adhering to the standard RBAC model: each Pod has access to a service account, and we add RBAC rules to control what each service account can do. I worry that adding a way to choose which pods can access the KFP API introduces an unnecessary abstraction layer.
