Pipelines: [Multi user] How do we release KFP multi user in Kubeflow?

Created on 29 Apr 2020 · 19Comments · Source: kubeflow/pipelines

Part of https://github.com/kubeflow/pipelines/issues/1223

/cc @jlewi @chensun @IronPan

Options:

As a fork based on kubeflow 1.0.2
Merge to kubeflow 1.0 branch and release as 1.0.3
Wait for kubeflow 1.1 (any ETA?)

Note, it's only for gcp-iap deployment initially.

arerelease cumulti-user statutriaged

Source

Bobgy

Most helpful comment

Do you mean we agreed to use environment variables to make the headers programmable?

@jlewi
Yes, exactly, that was bad phrasing on my part.
Since in v0.6 we agreed to make them configurable (through envvars, cli args or config files), I would expect new applications integrating with Kubeflow's multi-tenancy to follow that approach.

@yanniszark @jlewi Can you share what options for other apps look like for configuring headers? I'm not sure they have enough visibility for other developers.

Absolutely!
Here are some examples:

@Bobgy I agree that expectations of Web Apps in terms of auth should be clear.
To help with making the concepts more clear and streamline implementations of future components, I have started a design doc that gathers all the current knowledge that is spread through the code in one place:
https://docs.google.com/document/d/11Xi-I2OqJvUuy_Zg0NskMF9GmworRDJhlu68poDgK5c/edit#bookmark=id.d3wqgeictsp4
I expect we will also discuss it further in Tuesday's Kubeflow community meeting.

The KUBEFLOW_USERID_HEADER and KUBEFLOW_USERID_PREFIX options would be a great start to get Pipelines working in all platforms.

yanniszark on 8 May 2020

👍4

All 19 comments

@jlewi I think we want a way to release it soon (maybe as experimental first).

So we can also go with release as an experimental fork from kubeflow 1.0.2 and wait for either 1.0.3 or 1.1.

What do you think?

Bobgy on 29 Apr 2020

@Bobgy Why do we need a fork? If its on master can't people just deploy from master?

jlewi on 29 Apr 2020

@Bobgy is all the relevant code/fixes on master?

What is the path forward for turning on ISTIO in the kubeflow namespace? kubeflow/kfctl#296

Does the KFP manifest(https://github.com/kubeflow/manifests/tree/master/pipeline) need to be updated with multiuser changes?

jlewi on 29 Apr 2020

@jlewi I verified using namespace resource works (I guess I misconfigured sth when I tried it before), so we can just use kfctl 1.0.2.

All relevant code/fixes are on KFP master and my forked manifest branch: https://github.com/Bobgy/manifests/pull/3/files.

(it can be deployed normally following https://www.kubeflow.org/docs/gke/deploy/deploy-cli/ and use CONFIG_URI="https://raw.githubusercontent.com/Bobgy/manifests/kfp-multi-user/kfdef/kfctl_gcp_iap.yaml" instead

Bobgy on 30 Apr 2020

I prefer that, in this release, only KFP is kind of experimental using the multi user mode, but all other components are stable. Therefore, I think KF 1.0.2 might be a good candidate as a base.

What do you think?

in the mean time, I can try to merge forked changes to Kubeflow master

Bobgy on 30 Apr 2020

@Bobgy my suggestion would be to get things working on master. I don't know that we want to big changes onto release branches. Either way though getting it checked in working on master is a precursor so lets do that first.

jlewi on 1 May 2020

I think we'd want to release a relatively stable version soon for community to try out KFP multi user mode in a forked branch.

In the mean time, I can try to start sending PRs to master.

Bobgy on 1 May 2020

Would back-porting this onto 1.0 branch be consistent with semantic versioning?
If we backport this onto 1.0 do we risk blocking our ability to release minor patch fixes to 1.0?

jlewi on 4 May 2020

I see, that makes sense to me.

Shall we

target 1.1 release like before
provide a quick try-out version based on forking 1.0.2 unofficially
?

Bobgy on 5 May 2020

My suggestion would be to get it working on master. Once we have it working on master we can either add a git tag or possibly create a new release branch. I'd probably recommend we bump the minor version; e.g 1.1.0-alpha1.

jlewi on 6 May 2020

@jlewi Sounds good.

After rethinking over it, I agree there's not much value added if the early access fork is based on 1.0.2, because KFP itself is also experimental.

I'll try to merge to kubeflow master first

Bobgy on 6 May 2020

@Bobgy +1
It would be good to get the multiuser support integrated into the GCP blueprint for blueprint users. That should be fairly straightforward. Blueprints should be easier to upgrade so it will be easier to roll out changes to customers using the blueprints.

I think the only changes would be

Add pipelines to the stacks kustomization file
- Which is related to kubeflow/manifests#1048 but not necessarily blocked on it.
Replace the namespaces package in the blueprint (this [file])(https://github.com/kubeflow/gcp-blueprints/blob/master/kubeflow/instance/kustomize/namespaces/namespaces.yaml) with the one you added to kubeflow/manifests so the blueprint is in sync

Per: kubeflow/gcp-blueprints#5 I'm setting up autodeployments for gcp blueprints. That should make it easy for us to verify and ultimately add tests to verify it is all working.

jlewi on 7 May 2020

@jlewi Thanks for the suggestion! Because I have limited capacity working on these, I will get these changes merged to kubeflow/manifest master first before trying out the gcp blue print.

Bobgy on 8 May 2020

@Bobgy thanks a lot for confirming.
It seems that the user-identity parsing code is already present in a few places and probably more in the future.

While we (and other users) can test multi-user pipelines on their on-prem installations, it's very difficult to dig through every part of the code and make sure the headers are configurable.
The best time to incorporate these options is when the code is first written.

Is there a plan to make the headers configurable like the rest of Kubeflow's web apps? (Jupyter Web App, CentralDashboard, etc).
That would allow non-GCP users of Kubeflow to use multi-user pipelines.

@jlewi we had similar issues with web apps being GCP-only in v0.6 and we contributed changes throughout the code after the fact. After v0.6, we had agreed to use the established headers, so we wouldn't have to do it again. How can we make sure that new code is written in a way that it doesn't have an artificial dependence on GCP?

yanniszark on 8 May 2020

we had agreed to use the established headers
Do you mean we agreed to use environment variables to make the headers programmable? I recall discussing 2 options

Using ISTIO to transform the headers to some standard values
Making the headers programmatic for each application.

I thought we mostly went with the second option?

@yanniszark I think the best way would be to come with ways to make it easy for application developers to follow best practices. Ideally, that would mean finding some way to handle it centrally so people don't have to think about it or creating reusable libraries.

Tests would also help. If application developers write tests for the feature (e.g. multiuser pipelines) and we have CI setup for different platforms then that test would fail on other platforms and provide appropriate signal.

Finally documenting it and promoting it is also helpful.

jlewi on 8 May 2020

Multi user is a very complex feature, I don't feel there's any problem if we prototype with just one platform first. It was good enough we agreed on the high-level design in the beginning and guaranteed it should work with all platforms. Now, it's pretty trivial to port the feature.

There are only been two places we are using it: https://github.com/kubeflow/pipelines/search?q=x-goog-authenticated-user-email&unscoped_q=x-goog-authenticated-user-email

@yanniszark @jlewi Can you share what options for other apps look like for configuring headers? I'm not sure they have enough visibility for other developers.

Bobgy on 8 May 2020

Do you mean we agreed to use environment variables to make the headers programmable?

@yanniszark @jlewi Can you share what options for other apps look like for configuring headers? I'm not sure they have enough visibility for other developers.

Absolutely!
Here are some examples:

The KUBEFLOW_USERID_HEADER and KUBEFLOW_USERID_PREFIX options would be a great start to get Pipelines working in all platforms.

yanniszark on 8 May 2020

👍4

/close
As we have reached agreement, further progress will be tracked in https://github.com/kubeflow/pipelines/issues/3693

Bobgy on 13 May 2020

@Bobgy: Closing this issue.

In response to this:

/close
As we have reached agreement, further progress will be tracked in https://github.com/kubeflow/pipelines/issues/3693

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.