Pipelines: Multi-User support for Kubeflow Pipelines

Created on 25 Apr 2019  ·  67 Comments  ·  Source: kubeflow/pipelines

[April/6/2020]
Latest design is in https://docs.google.com/document/d/1R9bj1uI0As6umCTZ2mv_6_tjgFshIKxkSt00QLYjNV4/edit?ts=5e4d8fbb#heading=h.5s8rbufek1ax

Areas we are working on:

Release

Areas related to integration with Kubeflow

=============== original description

Some users have expressed interest in isolation between the cluster admin and the cluster user: the cluster admin deploys Kubeflow Pipelines as part of Kubeflow in the cluster;
the cluster user can use Kubeflow Pipelines functionality without being able to access the control plane.

Here are the steps to support this functionality.

  1. Provision the control plane in one namespace, and launch Argo workflow instances in another

    • provision the control plane in the kubeflow namespace, and Argo jobs in namespace FOO (parameterization)

    • the API server should update the incoming workflow definition to namespace FOO. Sample code where the API server modifies the workflow

  2. Currently all workflows run under a ClusterRole, pipeline-runner (definition), which is specified during compilation (link). Workflows should instead run under a Role.

    • change pipeline-runner to a Role, and specify the namespace during deployment (expose it as a deployment parameter)

    • the API server should update the incoming workflow definition to use the pipeline-runner Role.

  3. Cluster users can access the UI through an IAP/SimpleAuth endpoint, instead of port-forwarding.
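
The two API-server rewrites in steps 1 and 2 could look roughly like the sketch below. It is written in Python for illustration only and is not the API server's actual code: the function name `retarget_workflow` and the sample manifest are hypothetical, and the only Argo Workflow fields assumed are `metadata.namespace` and `spec.serviceAccountName`.

```python
import copy


def retarget_workflow(workflow: dict, namespace: str,
                      service_account: str = "pipeline-runner") -> dict:
    """Return a copy of a workflow manifest pinned to a namespace and service account.

    Hypothetical helper: it only illustrates the rewrite described in the steps above.
    """
    patched = copy.deepcopy(workflow)  # leave the submitted manifest untouched
    patched.setdefault("metadata", {})["namespace"] = namespace
    patched.setdefault("spec", {})["serviceAccountName"] = service_account
    return patched


# Illustrative manifest; the values are made up for this example.
submitted = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "Workflow",
    "metadata": {"generateName": "example-"},
    "spec": {"entrypoint": "main"},
}
patched = retarget_workflow(submitted, namespace="FOO")
print(patched["metadata"]["namespace"], patched["spec"]["serviceAccountName"])
```

Copying the manifest before patching keeps the user's submitted definition available for logging or auditing.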
Labels: area/backend, area/frontend, area/wide-impact, help wanted, kind/feature, priority/p1, status/triaged

All 67 comments

Ideally this should be implemented in a way that gets Kubeflow Pipelines closer to supporting multi-user, e.g. launching workflows in an arbitrary namespace.

What's the priority of this?

How does this align with the broader plans in Kubeflow to support multiple users?

This is not yet being prioritized, although I think it deserves a high priority.

In addition to admin/user isolation, here is a list of items needed to achieve full multi-user support for KFP:

  1. Every user (or group of users) will have a dedicated namespace, plus a service account, role, and role binding in that namespace. These resources should be created by the Kubeflow Profile CRD.
  2. With IAP integration, the incoming request contains the user's email. The Pipeline API server should authorize the email with the Kubernetes API by doing a user impersonation check.

    • When creating a job/run, the job/run should be created in the user's namespace and run by the service account in that namespace. The Argo CRD or scheduled workflow CRD should be able to control resources across all namespaces.

    • When creating any resource, the API server needs to add an additional column to the resource table to log the user's identity or namespace or both, so it can filter the resources in Get/List calls.

    • For Get/List of resources, the API server needs to filter the resources based on the user's privileges.
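
The namespace column and the List filtering in the last two bullets can be sketched as below. This is a minimal in-memory illustration, assuming a hypothetical `RunRecord` row type and a caller-supplied set of namespaces the user may read; how that set is obtained (for example from the authorization check in item 2) is left out.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class RunRecord:
    """One row of a hypothetical run table with the extra namespace column."""
    run_id: str
    namespace: str
    created_by: str


def list_runs(rows: Iterable[RunRecord],
              allowed_namespaces: Iterable[str]) -> List[RunRecord]:
    """Return only the rows whose namespace the caller is allowed to read."""
    allowed = set(allowed_namespaces)
    return [row for row in rows if row.namespace in allowed]


# Made-up sample data for illustration.
rows = [
    RunRecord("run-1", "team-a", "alice@example.com"),
    RunRecord("run-2", "team-b", "bob@example.com"),
    RunRecord("run-3", "team-a", "carol@example.com"),
]
visible = list_runs(rows, allowed_namespaces=["team-a"])
print([row.run_id for row in visible])
```

In a real database the same idea would be a `WHERE namespace IN (...)` clause rather than a Python filter.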

@jessiezcc Any update on this work? Do you think this is something that will get done in Q3 and thus be part of 0.7?

This work is not currently scheduled for Q3.

Some customers have expressed interest in having ACLs for the API, e.g. restricting the API for deleting resources to admins.

/cc @krishnadurai

/cc @songole

Hi @IronPan.
We (Arrikto) have been exploring this problem for the past month and we generally agree with your overview of the steps required to have multi-user functionality in pipelines.

I'm assigning this to myself; we have made good progress and we should have initial support for multi-user pipelines in v0.7.

/assign @yanniszark

@yanniszark just curious, is there a design or plan for what this functionality might look like in 0.7 that you could share? We have been eagerly awaiting multi-user support in KFP and would love to review and give any feedback (assuming you'd want some).

Would it work if one would simply change the Client object so that the namespace can optionally be provided at instantiation?

https://github.com/kubeflow/pipelines/blob/2f7d55b98ca04a4b74983d7732f5ad1ee6e74f72/sdk/python/kfp/_client.py#L74

I'm not sure what happens with the generated APIs, but it looks like it would work (assuming the cluster config is OK for the namespace). Sadly, I don't understand enough of this.

Having the option to choose the namespace where the pipeline should run would be a good start. There would be at least some separation, and it would be easier to manage resources and cost for multiple teams. Ideally, the Client would be instantiated with the namespace that is chosen in the UI.

@yanniszark Is there any work in progress on this that could be shared?

We will have some design doc ready in the following weeks and reviews and feedback are very much welcome then.
Thanks

Hi @danielnorberg!
Thanks for your interest in multi-user pipelines.
We have actually made a lot of progress and presented a demo at the Kubeflow Community Meeting.
Our design has been reviewed and validated by many end-users and we are working with the Pipelines team to iron out all the details. A design doc will soon follow.

Slides: https://docs.google.com/presentation/d/1fj0YM4LdToYY8cWSFUViTn_1t63Twm70QwG0cV0CX1Q
A video recording will soon be available.

@yanniszark Waiting for this feature. We are currently not able to run pipelines from a Jupyter notebook, because the pipeline exists in the kubeflow namespace and not in the notebook's. We need to copy everything in the notebook's PVC to the pipeline's PVC for the pipeline to mount and use it. Also, of the ways of creating components suggested in the reference, only lightweight components work for on-premise users, as the others require the staging_gcs_path parameter.

Hi all,

We are in the process of iterating upon a design doc for multi-user pipelines, a much requested feature.

After our (Arrikto) initial demo of a PoC for multi-user pipelines to the community meeting, back in October, we were asked by the pipelines team and the rest of the Kubeflow community, to describe our design and implementation. You can find it here:

Multi-User design doc with demo/slides:
https://docs.google.com/document/d/18X6vKCddRARwGR8MfGHE1RkkIIDzLdZvSlx38ZAjW4U/edit?usp=sharing

We will also present this design at the next Pipelines community meeting, which should be on Wednesday, 11th of December 2019, at 10AM PT.
Meeting Notes are here:
https://docs.google.com/document/d/1cHAdK1FoGEbuQ-Rl6adBDL5W2YpDiUbnMLIwmoXBoAU

Community contributions in reviewing the two current docs will help us a lot in merging them into the final design document.

Looking forward to your comments

@IronPan @Bobgy @gaoning777 can you link to your design doc?

Hi, all
We have reviewed the multi-user design during the kubeflow pipeline community meeting on 11/27/2019.

Here is the Multi-user design doc from the Kubeflow Pipeline team:
https://docs.google.com/document/d/12ikhUKAb3KhbO9AR6JUk_UX_D9pf7nFWfHAyLDv2BB0/edit?usp=sharing

/assign @chensun

@gaoning777: GitHub didn't allow me to assign the following users: chensun.

/unassign @gaoning777

/assign @chensun

Question: Can you please clarify the usage of KFAM?
As per the Arrikto reference, SubjectAccessReview is planned to be used. I am not clear about the interaction between KFAM and this. Thanks

@nrchakradhar kfam is the API service for Kubeflow multi-user support. The KFP team's multi-user design uses it as the source of truth for user authorization rules.

However, we have two different designs now.
The arrikto reference is a different design of multi user support. Hopefully, we can merge both designs, but that hasn't happened.

Hey @yanniszark, thanks for your efforts, I'm very much looking forward to this feature.

Currently we are using Kale to deploy pipelines, where the attached workspace PV is used for passing data between components. However, we failed to mount the data using existing volumes:

This step is in Pending state with this message: Unschedulable: persistentvolumeclaim "workspace-notebook" not found

I assume this is because the pipeline runs in the kubeflow namespace while my PV workspace-notebook belongs to a user namespace, so it cannot be found. Enabling multi-user for pipelines will definitely help us a lot in this case. ☺️

I assume this is because the pipeline runs in the kubeflow namespace while my PV workspace-notebook belongs to a user namespace, so it cannot be found.

@Felihong you are correct, the Pipeline cannot find the PVC because it's not in the same namespace.
Indeed, multi-user Pipelines would solve your issue.

kfam is the API service for Kubeflow multi-user support. The KFP team's multi-user design uses it as the source of truth for user authorization rules.

@Bobgy KFAM is not expected to be an abstraction on top of Kubeflow to use for authorization.
We are moving away from that practice in Kubeflow, as can be seen from the transition of the Jupyter Web App to use SubjectAccessReview.
cc @jlewi
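
For readers unfamiliar with SubjectAccessReview: the check asks the Kubernetes API server whether a given user may perform a verb on a resource in a namespace. Below is a minimal sketch of the request body only, in Python. The helper name `build_subject_access_review` and the example user, namespace and resource are illustrative assumptions, not values from this thread, and the HTTP call that submits the object is not shown.

```python
def build_subject_access_review(user: str, namespace: str, verb: str,
                                group: str, resource: str) -> dict:
    """Build a SubjectAccessReview object (authorization.k8s.io/v1) as a plain dict.

    Hypothetical helper for illustration; it does not contact a cluster.
    """
    return {
        "apiVersion": "authorization.k8s.io/v1",
        "kind": "SubjectAccessReview",
        "spec": {
            "user": user,
            "resourceAttributes": {
                "namespace": namespace,
                "verb": verb,
                "group": group,
                "resource": resource,
            },
        },
    }


# Illustrative values only.
review = build_subject_access_review(
    user="alice@example.com",
    namespace="team-a",
    verb="create",
    group="argoproj.io",
    resource="workflows",
)
print(review["spec"]["resourceAttributes"])
```

When such an object is created through the Kubernetes API, the response carries a `status.allowed` boolean that a service can use to accept or reject the request.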

At Arrikto, we have been exploring and designing this feature since October and we are very excited to see the big user interest around this issue.

I have added an agenda item for the Pipelines community meeting on February 5th, to discuss and do a status update on the current design.
We have also updated the design doc with a user journey section, as requested by @jessiezcc.
https://docs.google.com/document/d/18X6vKCddRARwGR8MfGHE1RkkIIDzLdZvSlx38ZAjW4U/edit#heading=h.3ckxvbum5d4f

@yanniszark Continuing on what @Felihong said about notebooks and PVCs: since KFServing lets me segregate models into namespaces, I have the model stored in a PVC which the inference server can load in the same namespace. If I then have a pipeline in the kubeflow namespace, I am prevented from loading that pretrained model for retraining (e.g. retrain(old_model, new_data) => new_model), correct? Multi-user pipelines would presumably solve that as well, where a "user" corresponds to the KFServing namespaces. (The alternative is to use object storage instead of a PVC.)

Any update on this work in Kubeflow 1.0?

@Mddct I'm working on preparing changes in KF 1.0 to be ready for KFP multi user support.
Work will be tracked in https://github.com/kubeflow/pipelines/issues/3241

The changes to support multi-user are pretty substantial (e.g. turning on Istio in the kubeflow namespace). So these should probably be targeted to a minor release (e.g. 1.1) and not be slated for a patch release.

I think we are targeting 1.1 for Q2, so June.

@jlewi while I understand the formal release can be part of the 1.1 rollout, based on discussion with @gaoning777 it looked like we would have an early version of the code in master soon, which was projected to be March.

@gaoning777 are we still on track vis a vis that?

Hi @animeshsingh, @gaoning777 has decided to take on a new adventure; @chensun and I are actively working on multi-user support now.

We are mostly on track: main functionality for the backend and UI will likely be on target for the end of March, but tensorboard, visualization, etc. might take longer.

The instructions for Phase 1 of the multi-user work can be found at
https://drive.google.com/file/d/1aqiBrYzTJQ9dUrjOjB2OWfTBD6MKrbt6/view

This is still at an early stage and the API might be subject to incompatible changes. But please feel free to give it a try; feedback is appreciated.

Removed from KFP 1.0 project, because this will be released separately in Kubeflow.

@Bobgy What are the remaining kustomize changes needed to make multiuser KFP available on master?

@Bobgy @jlewi We are looking to have this feature before taking Kubeflow into production. I think it was labeled as high risk for 1.1, and there were not many users asking for it. Let me know if we need to present our case so it can make it into the release.

The implementation for us is quite large, and we'll have about 100-200 users just for the initial implementation. Without this feature it will not be possible to have multiple teams sharing a production cluster. We are using Istio + Dex to remain cloud agnostic.

@maganaluis Thanks for bringing it up here!

I've finished other work items in the KF 1.1 integration list in https://github.com/kubeflow/pipelines/projects/5.
Currently WIP on multi user mode for gcp + iap manifest. Getting it ready for GCP isn't very risky now.

However, the istio + dex manifest is maintained by Arrikto. @yanniszark @jbottum do you have any plans you can share for supporting istio + dex with KFP multi-user mode in addition to MiniKF?

I'd recommend presenting your case in the Kubeflow Pipelines community meeting so that different groups take notice.

We have deployed Kubeflow on Istio. Currently we can set up multiple user namespaces, but the pipelines are shared, and data in the user namespace and the kubeflow namespace is isolated. I manually mount the two PVs to the same path. Ideally, the user namespace should also work for pipeline separation; then data sharing would be supported by default.

We have deployed Kubeflow on Istio. Currently we can set up multiple user namespaces, but the pipelines are shared, and data in the user namespace and the kubeflow namespace is isolated. I manually mount the two PVs to the same path. Ideally, the user namespace should also work for pipeline separation; then data sharing would be supported by default.

Can you create separate issues for them?
Separating pipelines seems to be a common request; we can consider adding it.

Which data do you mean? Data in minio? They are shared though.

Another voice in favor of this, we were hoping that with multi-tenancy enabled it would be possible for a user's pipeline runs to occur in their namespace: for us this is vital for auditing and billing purposes, and is more important than being able to segregate the pipelines themselves. Is this coming down the line? Is it planned for a particular release? Is there any beta etc. available?

@jackwhelpton you can take a look at

Multi user mode early access is released with doc:
Instructions doc - KFP multi-user instructions for GCP: https://docs.google.com/document/d/1Ws4X1oNlaczhESNuEanZxbF-cnSfO78B1rBHWOkIAzo/
This is shared with the kubeflow-discuss@ Google group.

These are the user instructions we shared for early access to multi-user mode. Letting users' pipeline runs occur in their own namespace is already supported.

The current plan is to release with KF 1.1; most code changes are already merged in the kubeflow repo, so some of you can try it soon if interested.

@Bobgy, thank you for the reply. I just opened a new issue.
The data sharing is not closely related to Kubeflow itself. I am using the Kale extension to automate compiling and running the Kubeflow pipeline. The data of the notebook server can't be passed to the Kubeflow pipeline directly, because the notebook server is in the user profile namespace while the pipeline run is in the kubeflow namespace. I solved this problem by connecting two PVCs in two namespaces manually. I think that if pipelines are supported separately within different user namespaces, then the E2E multi-user isolation is complete and the data is shared naturally, because the notebook and the pipeline are both in the user namespace.

Cross posting for clarification https://github.com/kubeflow/pipelines/issues/4197#issuecomment-656458724:

EDIT: described features below will be released with Kubeflow 1.1. You can use these instructions for preview on GCP. It's NOT RELEASED YET.
Installation for Kubeflow 1.1 rc on GCP: https://github.com/kubeflow/gcp-blueprints/tree/v1.1-branch
KFP Multi User instructions: https://docs.google.com/document/d/1Ws4X1oNlaczhESNuEanZxbF-cnSfO78B1rBHWOkIAzo/edit?usp=sharing

pipeline runs are already designed to run in user namespaces.
The only resource in KFP core system that is not namespace separated (as of today) is static pipeline yaml files you upload to the server. They will remain public to anyone in the cluster. Users can try to launch any pipelines in their own namespaces.

For details about which resources and which services support namespace separation, please read this early access user instruction: https://docs.google.com/document/d/1Ws4X1oNlaczhESNuEanZxbF-cnSfO78B1rBHWOkIAzo/.

A quick list of things for which we won't support multi-user separation in the upcoming KF 1.1 release:

  • pipeline resources (the static yaml/tar files you upload)
  • minio artifact storage
  • MLMD

If your organization would prefer pipeline resources separated by namespace, please upvote here. We can consider adding the support if there is enough user interest.

EDIT: enough reactions collected, the issue is tracked in https://github.com/kubeflow/pipelines/issues/4197 with priority

@Bobgy it should be a feature which can be enabled: if users want to "promote" their pipeline resource to be public, it's allowed. Otherwise it stays in their namespace by default.

@Bobgy it should be a feature which can be enabled: if users want to "promote" their pipeline resource to be public, it's allowed. Otherwise it stays in their namespace by default.

Yes, I agree; if we decide to implement it, we'll make it configurable.

Will upvote. Thanks!

Just working my way through the documentation, thanks for pointing me in that direction. It seems geared around using kfp.Client to execute pipelines; what's the corresponding vision when executing through the UI? I was hoping that pipelines would execute in a namespace based on what's selected in the top drop-down, is that the idea?

@jackwhelpton Yes, the feature you described is already there. It is not mentioned in the doc just because it works seamlessly.

@Bobgy re minio artifact store not being supported in KF 1.1 release, does that mean that a pipeline running in my namespace still writes to a shared artifact store? For example, anything my pipeline writes implicitly (eg: data written when piping results between steps in a pipeline like consumer_op(producer_task.output)) is accessible to anyone who can look inside that artifact store?

@ca-scribner That's right.
Current suggested workaround is to only pass urls through minio, let components read/write GCS/S3 directly and manage permission there if you care about data separation.
(If you use TFX, that's already the case.)

Alternatively, I think MinIO supports multi-tenancy natively: https://docs.min.io/docs/multi-tenant-minio-deployment-guide.html. We'd welcome contributions on how that can be integrated with KFP multi-user mode.
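
The URL-passing workaround suggested above can be sketched as follows. This is only an illustration of the pattern: a local temporary directory stands in for a GCS/S3 bucket with its own permissions, and the `producer` and `consumer` functions are hypothetical plain Python functions, not KFP components.

```python
import tempfile
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def producer(storage_root: Path, run_id: str) -> str:
    """Write the real payload to user-controlled storage and return only its URI."""
    target = storage_root / run_id / "data.txt"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("payload kept out of the shared artifact store")
    return target.as_uri()  # only this short string would pass between steps


def consumer(uri: str) -> str:
    """Resolve the URI and read the payload directly from storage."""
    path = Path(url2pathname(urlparse(uri).path))
    return path.read_text()


# A temporary directory stands in for a bucket the user controls.
storage_root = Path(tempfile.mkdtemp())
uri = producer(storage_root, "run-1")
result = consumer(uri)
print(uri.startswith("file://"), result)
```

With this pattern the shared artifact store only ever sees the URI, so access to the data itself is governed by the storage system's permissions.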

@Bobgy ok we lose kfp's helpful automatic piping of real data, but the data is still secure. Only meaningful downside I think is that everyone has to teach their components how to talk to their blob storage rather than offloading it to reusable blob-put/blob-get components. That's a fair compromise.

You're right about minio multi-tenancy (I work in one atm). I'll ask around for ideas.

@ca-scribner I think the MinIO "multi-tenant" setup is slightly different from what we're doing; I think we're using OPA or Istio magic or something to provide every namespace with a private bucket on a single tenant (we do have minimal vs. premium tenants, but that's different). I think the term "tenant" is a bit overloaded here.

@jackwhelpton Yes, the feature you described is already there. It is not mentioned in the doc just because it works seamlessly.

Hi @Bobgy, we're hoping to get more clarification on multi-tenancy and the expected behavior. When you say "seamlessly", does that mean kubeflow will natively assign new experiments to the user's namespace as long as the headers are passed correctly, or do we need to add more components to our pipeline configuration to get the experiments to run under the user's namespace?

The reason I'm asking this is we're currently seeing the following msg in our [ ml-pipeline-scheduledworkflow ] logs:
time="2020-07-21T06:34:19Z" level=info msg="Processing object (inception-v3-transfer-hq5zv): object has no owner." Workflow=inception-v3-transfer-hq5zv

@RoyerRamirez Yes, experiments will be assigned to user's namespace (the namespace you selected in Kubeflow dashboard). Actions will be authorized by user's header.

The reason I'm asking this is we're currently seeing the following msg in our [ ml-pipeline-scheduledworkflow ] logs:
time="2020-07-21T06:34:19Z" level=info msg="Processing object (inception-v3-transfer-hq5zv): object has no owner." Workflow=inception-v3-transfer-hq5zv

Can you open a separate issue describing how you deployed and what problems you encountered?

@Bobgy

A quick list of things for which we won't support multi-user separation in the upcoming KF 1.1 release:

  • pipeline resources (the static yaml/tar files you upload)
  • minio artifact storage
  • MLMD

Any plans for MLMD?
Are you talking about aggregation, i.e. we only read the artifacts/executions that belong to visible KFP resources in the user's namespace?
Or native isolation on the MLMD side? I think the MLMD schema currently doesn't provide any concept of users.

@Jeffwan Yes, your understanding is correct.
So far I'm not aware of any plan for MLMD multi-user separation.

/cc @neuromage @dushyanthsc
Is there anything you can share about this?

@Jeffwan @Bobgy Based on the initial documents that Karl shared as part of the Model Management group, MLMD was going to support a "Project" context, or at least the ability to create such a context. This project context could be tied to the user's Profile and provide the necessary isolation for metadata.

https://docs.google.com/presentation/d/1HiLIOm-ij0vdS_kEIQSAeICNsGSOl946qhT69WTgK5k/edit#slide=id.g8dfffc9b8a_0_37

@maganaluis Hmm. It seems to remove context and bring in project, product, workflow. Has this proposal been reviewed by the MLMD team? I feel like this is a big schema change, and some projects like TFX would need to buy into the proposal, which may take some time. At the same time, as a short-term solution, we can group artifacts/executions by the user's pipeline runs as @Bobgy originally proposed. Currently, I think only KFP uses the metadata service, so it's fairly safe to do it this way.

@maganaluis I think @karlschriek's doc is just a proposal, so it might change. In my discussions with @neuromage we were talking about using labels to group metadata. So "project", "experiment", etc. might just be user-defined labels. As such they probably wouldn't be closely tied to multi-user support.

@Jeffwan Yes, your understanding is correct.
So far I'm not aware of any plan for MLMD multi-user separation.

/cc @neuromage @dushyanthsc
Is there anything you can share about this?

Hi, we have no current plans to add multi-user support directly in MLMD at this point in time. As you point out, there is no support for users in the MLMD schemas right now unfortunately. It would be worth exploring the use-cases for multi-user MLMD to figure out the right approach as well.

KFP multi-user shipped in KF 1.1.
I suggest closing this issue and opening up more actionable, scoped issues for further improvements.

/close

@jlewi: Closing this issue.
