Pipelines: support separate pipeline for each namespace

Created on 10 Jul 2020 · 22 Comments · Source: kubeflow/pipelines

EDIT from @Bobgy: This issue got 47 upvotes when requesting user feedback: https://github.com/kubeflow/pipelines/issues/1223#issuecomment-656508302.

Proposed change

Currently, in version 1.0.2, a Kubeflow deployment with kfctl_k8s_istio shares pipelines across all namespaces defined in user profiles. Since multi-user separation is already supported for notebook servers, it is natural to support the same per-namespace separation for pipelines.

Alternative options

NA

Who would use this feature?

A lot of enterprises will benefit from this feature as allocating different namespaces to different teams is a common practice in Kubernetes and resources in existing namespaces can be effectively used.

Suggest a solution

NA

kind/feature

All 22 comments

FYI, in the upcoming KF 1.1 release, pipeline runs are already in user namespaces, but the static pipeline YAML files are not separated. Do you want this additionally?

There is more clarification in https://github.com/kubeflow/pipelines/issues/1223#issuecomment-656507073.

If you agree with the proposal in https://github.com/kubeflow/pipelines/issues/1223#issuecomment-656508302, can you rephrase your description to focus only on the pipeline resource for this issue?

I think the pipeline YAML files should also be separated, for end-to-end separation. Thanks. I have already upvoted in #1223.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

/frozen

Sorry, I'm kind of oversubscribed because of extra Kubeflow duties recently, so I may not be able to take this issue soon.

Leaving my previous thoughts about this:

For use-case context: some users want full separation of pipelines, and some want no separation. The distinction is mostly organization culture and I think both requests are valid.

So I think an MVP UX that satisfies both is:

  1. pipelines in DB should have a namespace column

    • if namespace == '' (empty string), then it's in the shared space

    • if namespace is not empty, it's in that namespace

  2. all requests to get pipelines can specify namespace='' to query only shared pipelines
  3. or specify namespace='XXX' to find pipelines in a namespace + shared pipelines
  4. We need to add a config option that disables shared pipelines for organizations that definitely do not want shared pipelines
  5. We need to add a KFP UI option to switch between uploading a pipeline to shared space or current namespace.
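The query semantics in items 1 to 4 can be sketched as follows. This is a minimal illustration, not the KFP backend: it uses an in-memory SQLite table in place of the real database, and the names `make_db`, `list_pipelines`, and `shared_enabled` are hypothetical.

```python
import sqlite3

SHARED = ""  # an empty namespace marks the shared space (item 1)


def make_db():
    """Create an in-memory table with a namespace column (stand-in schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pipelines ("
        "name TEXT NOT NULL, namespace TEXT NOT NULL DEFAULT '')"
    )
    return conn


def list_pipelines(conn, namespace, shared_enabled=True):
    """Return the pipeline names visible when querying `namespace`.

    namespace == ''  -> shared pipelines only (item 2)
    namespace == 'X' -> pipelines in X plus shared pipelines (item 3)
    shared_enabled=False models the config option from item 4.
    """
    if namespace == SHARED and not shared_enabled:
        return []
    wanted = {namespace}
    if shared_enabled:
        wanted.add(SHARED)
    placeholders = ",".join("?" for _ in wanted)
    rows = conn.execute(
        f"SELECT name FROM pipelines WHERE namespace IN ({placeholders}) "
        "ORDER BY name",
        sorted(wanted),
    )
    return [row[0] for row in rows]
```

With shared pipelines disabled, a namespaced query returns only that namespace's pipelines, so the same function covers both the "all shared" and "all separated" setups.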

Benefits:

  1. The same backend logic seems able to support both the 'all shared' and the 'all separated' use cases.
  2. There's no DB migration needed; this will be a seamless transition from older versions of KFP.
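One way to read the second benefit is that existing rows need no data backfill: if the new column defaults to the empty string, every pipeline created before the upgrade lands in the shared space automatically. The sketch below illustrates that reading with SQLite as a stand-in; the table layout and the `add_namespace_column` helper are assumptions, not the actual KFP schema or upgrade path.

```python
import sqlite3


def add_namespace_column(conn):
    """Add the namespace column; existing rows pick up the '' default."""
    conn.execute(
        "ALTER TABLE pipelines ADD COLUMN namespace TEXT NOT NULL DEFAULT ''"
    )


# Simulate a database created by an older version, without namespaces.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pipelines (name TEXT NOT NULL)")
conn.execute("INSERT INTO pipelines (name) VALUES ('legacy-pipeline')")

add_namespace_column(conn)

# The pre-existing pipeline now reads back with namespace == '' (shared).
rows = conn.execute("SELECT name, namespace FROM pipelines").fetchall()
```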

Probably there are cases I haven't fully thought through. I'd suggest anyone who's willing to push this issue forward to do the following:

  1. try to discuss and converge on a set of CUJs (critical user journeys) that work for everyone. Note that we should probably start with MVPs that address immediate concerns, while making sure we can head in a direction where all valid use cases can be supported.
  2. after consensus on the UX, discuss with maintainers a rough design of the implementation
  3. contribute PRs to implement this feature in smaller self-contained steps

/cc @yanniszark about this, and /cc @chensun, who designed multi-user backend separation for other KFP resources

@Bobgy @chensun

We need to add a KFP UI option to switch between uploading a pipeline to shared space or current namespace.

Do we want users to upload their pipelines to the shared space? Should they use Manage Contributors instead, as we did for experiments and runs?

There's no DB migration needed, this will be a seamless transition from older versions of KFP.

The pipeline schema doesn't have a namespace concept yet. Seamless migration here means implicit migration, right? For all previous pipelines, we can treat them as having an empty namespace.

@Bobgy @chensun

We need to add a KFP UI option to switch between uploading a pipeline to shared space or current namespace.

Do we want users to upload their pipelines to the shared space? Should they use Manage Contributors instead, as we did for experiments and runs?

Some customers I worked with prefer this shared space, though. A problem with Manage Contributors is that they want to take a pipeline from the shared space and run it in their own namespace. This will not be possible by adding contributors.

(It might be a good idea to allow specifying pipelines in other users' namespaces, with the KFP backend checking for that permission as it does for contributors. Is that what you are proposing?)

There's no DB migration needed, this will be a seamless transition from older versions of KFP.

The pipeline schema doesn't have a namespace concept yet. Seamless migration here means implicit migration, right? For all previous pipelines, we can treat them as having an empty namespace.

If we do not introduce the shared pipeline concept, then we either need to figure out which pipeline belongs to which namespace or make all past pipelines inaccessible after an upgrade. This is fairly tricky to deal with.

A problem with Manage Contributors is that they want to take a pipeline from the shared space and run it in their own namespace. This will not be possible by adding contributors.

Got it, they would like to "fork" the pipeline into their own namespace. Manage Contributors can only share pipelines between specific users; it cannot publish them to everyone. I think this makes sense.

to allow specifying pipelines in other users' namespace
It might be hard to deal with pipeline discovery (current KFP doesn't have granular control). Users may have to provide the pipeline name in the other user's namespace.

I think what you propose is a simple and better way to support isolated and shared pipelines.

then we either need to figure out which pipeline belongs to which namespace

I notice the UI actually determines the namespace from the central dashboard: https://github.com/kubeflow/pipelines/blob/935a9b5ba5057bc9801fee87ed17c03c2907ec85/frontend/src/lib/KubeflowClient.tsx#L23-L33.

Do you think it's better to figure out the username and the corresponding namespaces via headers and KFAM?
The backend can parse the user's email and check KFAM to figure out which namespace this profile owns and which namespaces other users share with the current user. KFP resources can then be filtered by these namespaces.

With this we can decouple the central dashboard from KFP. I think multi-user KFP could then be used separately from the central dashboard (it would probably only need KFAM, profiles, and Istio).
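The header-based flow described above could look roughly like this. It is a sketch under assumptions: the header name `kubeflow-userid` and its optional prefix are taken as configurable defaults, and the KFAM lookup is replaced by a plain list of hypothetical `(user, namespace)` pairs rather than a real KFAM call.

```python
def user_from_headers(headers, header_name="kubeflow-userid", prefix=""):
    """Extract the user identity from request headers (case-insensitive).

    The header name and prefix are assumptions; deployments may differ.
    """
    lowered = {key.lower(): value for key, value in headers.items()}
    value = lowered.get(header_name.lower())
    if not value:
        return None
    if prefix and value.startswith(prefix):
        return value[len(prefix):]
    return value


def accessible_namespaces(user, bindings):
    """Return namespaces the user owns or has been granted access to.

    `bindings` is a hypothetical stand-in for what a KFAM query would return.
    """
    return sorted({namespace for bound_user, namespace in bindings if bound_user == user})


def filter_by_namespaces(resources, namespaces):
    """Keep only the resources whose namespace is in the allowed set."""
    allowed = set(namespaces)
    return [resource for resource in resources if resource["namespace"] in allowed]
```

Resolving namespaces in the backend this way means the UI no longer has to pass the selected namespace from the central dashboard for filtering to work.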

Regarding namespace selector, I think the topic was brought up before too, if we plan to let KFP support multi user mode without central dashboard, we can build a namespace selector similar to the one in centraldashboard. It won't be so much effort on the UI side (it was designed to be very decoupled in UI code from the beginning).

Regarding KFAM, we have decided to deprecate it. @elikatsis sent out a PR a few days ago: https://github.com/kubeflow/pipelines/pull/4723. Let's move related discussion to that PR, and keep the discussion here focused on separating pipeline resources.

@Bobgy

  1. pipelines in DB should have a namespace column

Shall we have Namespace in the pipeline_versions schema? For example, I see Description is in pipelines but not in pipeline_versions. I'm not sure what the criteria are here, but it seems users are not allowed to edit or change pipelines once uploaded, right?

  1. We need to add a config option that disables shared pipelines for organizations that definitely do not want shared pipelines

A config option should allow organizations to enable or disable the possibility of making pipelines public. Where do you think this config can be set (DB, configMap, config.json)?

@Bobgy

  1. pipelines in DB should have a namespace column

Shall we have Namespace in the pipeline_versions schema? For example, I see Description is in pipelines but not in pipeline_versions. I'm not sure what the criteria are here, but it seems users are not allowed to edit or change pipelines once uploaded, right?

Versions are sub-resources of pipelines, so they should be in the same namespace as the pipeline.

Description was more of an unfinished migration to version API, ideally we should have description field on versions too.

  1. We need to add a config option that disables shared pipelines for organizations that definitely do not want shared pipelines

A config option should allow organizations to enable or disable the possibility of making pipelines public. Where do you think this config can be set (DB, configMap, config.json)?

You should use viper to get this config like other configs in API server, so that it will be configurable via config map, config.json or env var directly.
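The idea of one option that can come from a config file or an environment variable can be sketched as below. This is not viper (which is a Go library) and not the API server's code; it only mimics the usual precedence where an environment variable overrides a config file value. The key name `SHARED_PIPELINES_ENABLED` is hypothetical.

```python
KEY = "SHARED_PIPELINES_ENABLED"  # hypothetical option name


def shared_pipelines_enabled(file_config, env, default=True):
    """Resolve the option: env var first, then config file, then default.

    file_config: dict as loaded from a config.json-style file
    env:         mapping of environment variables (e.g. os.environ)
    """
    raw = env.get(KEY)
    if raw is None:
        raw = file_config.get(KEY)
    if raw is None:
        return default
    if isinstance(raw, bool):
        return raw
    # Environment variables are strings, so accept common truthy spellings.
    return str(raw).strip().lower() in ("1", "true", "yes", "on")
```

In a real process this would be called with the parsed config file and `os.environ`; keeping both as parameters makes the precedence easy to test.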

@Jeffwan @Bobgy

We completed the implementation using KFAM (for now). From a UI standpoint, we only added a checkbox to indicate whether the pipeline is "Shared" or not. Pipeline versions of course cannot be shared, so if you select "create new pipeline..." this checkbox will not be there.

Most of the changes were in the Backend. You can see the implementation details on this commit: https://github.com/arllanos/pipelines/commit/2c88722f6f05b67acc16c2b4e7bc54ff91c3f36c

We have not implemented an environment variable to disable "Pipeline Sharing". However, this shouldn't be too hard: I think it should suffice to disable the button altogether, and pipelines will be namespaced by default.

Next week I'll switch the authentication to SubjectAccessReview and make the PR.

One thing to note is that "Shared Pipelines" are still deletable by other users in the system. I'm not sure what the ideal scenario should be here.

@Bobgy @maganaluis thanks for your efforts on this. Some comments:

or specify namespace='XXX' to find pipelines in a namespace + shared pipelines

@Bobgy I think it would be best to not return mixed results here. Specifying namespace='XXX' should only return pipelines from namespace 'XXX'.

We need to add a config option that disables shared pipelines for organizations that definitely do not want shared pipelines

@Bobgy let me propose something different. Instead of turning off features, why not protect them with authorization? For example, we can have two kinds (in RBAC only, not in the MySQL DB), Pipelines and ClusterPipelines (namespace=="", name inspired from Role and ClusterRole). If an organization doesn't want to use one or the other, they simply need to not give their users permissions for one or the other.
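To make the authorization idea concrete, the sketch below builds the body of a Kubernetes SubjectAccessReview for either kind. The `authorization.k8s.io/v1` structure is the standard one; the API group `pipelines.kubeflow.org`, the resource name `clusterpipelines`, and the helper `build_access_review` are assumptions made for illustration.

```python
def build_access_review(user, verb, namespace):
    """Build a SubjectAccessReview body for a pipeline request.

    A non-empty namespace checks the namespaced kind; an empty namespace
    checks the hypothetical cluster-wide kind (the shared space).
    """
    resource = "pipelines" if namespace else "clusterpipelines"
    return {
        "apiVersion": "authorization.k8s.io/v1",
        "kind": "SubjectAccessReview",
        "spec": {
            "user": user,
            "resourceAttributes": {
                "group": "pipelines.kubeflow.org",  # assumed API group
                "resource": resource,
                "verb": verb,
                "namespace": namespace,
            },
        },
    }
```

Under this scheme, an organization that does not want shared pipelines would simply never grant any verb on the cluster-wide resource, and no feature switch is needed.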

@Bobgy @maganaluis thanks for your efforts on this. Some comments:

or specify namespace='XXX' to find pipelines in a namespace + shared pipelines

@Bobgy I think it would be best to not return mixed results here. Specifying namespace='XXX' should only return pipelines from namespace 'XXX'.

You are right. Agree with that. So there needs to be some UI change to allow finding pipelines from either shared or namespaced ones.

We need to add a config option that disables shared pipelines for organizations that definitely do not want shared pipelines

@Bobgy let me propose something different. Instead of turning off features, why not protect them with authorization? For example, we can have two kinds (in RBAC only, not in the MySQL DB), Pipelines and ClusterPipelines (namespace=="", name inspired from Role and ClusterRole). If an organization doesn't want to use one or the other, they simply need to not give their users permissions for one or the other.

I guess it depends more on whether the shared pipelines will be a long term UX we will maintain. If it is, then I agree with your suggestion, that sounds like a clean solution. Do we have enough evidence it will be?

I think there are still quite a few questions we need clear answers on.
@maganaluis would you be willing to draft a rough design doc for this issue?

Hmm, maybe we can start with your PR description for discussion and see if everyone can agree on that.

Note that, regarding the above discussion, my personal opinion is that the MVP PR does not need to implement any of the ClusterPipeline RBAC work or the "pipeline sharing" switch. We can start from the minimum and enhance based on further requests.

One thing to note is that "Shared Pipelines" are still deletable by other users in the system. I'm not sure what the ideal scenario should be here.

It's backward compatible behavior, so I don't think we need to worry about that right now.

We still need UI work to finish this.

Remaining work item tracked in https://github.com/kubeflow/pipelines/issues/5084
