Part of https://github.com/kubeflow/pipelines/issues/1223
Support configuring separate artifact repository for each namespace, so that artifact repository can have namespace separation too.
This is currently not supported in KFP multi user support as explained in https://github.com/kubeflow/pipelines/issues/1223#issuecomment-656507073.
Use your thumbs up to show how important this issue is for you.
We'll most likely need upstream enhancement from argo in https://github.com/argoproj/argo/issues/3307.
Last year I've added the ArtifactRepositoryRef feature to Argo to support this goal.
The idea is to make it possible to specify the artifact repository information by referencing a ConfigMap.
See https://github.com/argoproj/argo/pull/1350 and https://github.com/argoproj/argo/pull/1821
With both features in place it might even be unnecessary to make any SDK-side changes.
Each namespace can have a different artifact repository ConfigMap with the same name. Then Argo will choose one based on the namespace.
The backend can add the artifactRepositoryRef field when submitting the run.
I'm not sure argoproj/argo#3307 is relevant since it's about explicit input artifacts, a feature not used by KFP.
If anyone would like to help this issue, confirming artifactRepositoryRef usage and whether it supports all the features we want can be a step forward
In case of S3, would it make sense to perform the auth for minio using dex? Because otherwise someone has to generate access/secret keys for each namespace -> https://github.com/minio/minio/blob/master/docs/sts/dex.md
@Bobgy we can pick this up as this will be a requirement for us to go multi-tenant. I think this makes sense as the administrator of the platform would be provisioning the namespaces and with that can provision the scoped configMap.
I suspect we want to edit the workflow spec more upstream to here: https://github.com/kubeflow/pipelines/blob/89e42105bd941e1d918d0bff9ba0c4f80b106ed4/backend/src/crd/controller/scheduledworkflow/client/workflow_client.go#L138
Any suggestions on the best place to support the change to the workflow spec?
edit: Suppose writing methods to add this new object to the workflow spec could be added here: https://github.com/kubeflow/pipelines/blob/89e42105bd941e1d918d0bff9ba0c4f80b106ed4/backend/src/crd/controller/scheduledworkflow/util/scheduled_workflow.go#L165
Hi @marcjimz, thank you for offering help.
You can look at
https://github.com/kubeflow/pipelines/blob/135bfbb9f2945a7e841d5891ee0f5862c75aa3b7/backend/src/apiserver/resource/resource_manager.go#L317
and
https://github.com/kubeflow/pipelines/blob/135bfbb9f2945a7e841d5891ee0f5862c75aa3b7/backend/src/apiserver/resource/resource_manager.go#L645
The two methods are responsible for preparing argo workflow yaml before submitting to K8s api, we might be able to inject the artifact repository ref there.
Read documentation for argo in https://argoproj.github.io/argo/configure-artifact-repository/,
Hi @Bobgy,
I have been testing the artifactRepositoryRef feature and also tested locally initial pipeline code change as to inject the artifactRepositoryRef as suggested above.
Before creating a PR I would like to bring up the following points to get your feedback:
artifact-repositories it will be the default config map for artifact repositories in the namespace. apiVersion: v1
kind: ConfigMap
metadata:
# if you want to use this config map by default - name it "artifact-repositories"
name: artifact-repositories
annotations:
# if you want to use a specific key, put that's key into this annotation
workflows.argoproj.io/default-artifact-repository: mlpipeline-repository
data:
mlpipeline-repository: |
s3:
bucket: my-bucket
keyPrefix: artifacts
endpoint: minio-service.kubeflow:9000
insecure: true
accessKeySecret:
name: mlpipeline-minio-artifact
key: accesskey
secretKeySecret:
name: mlpipeline-minio-artifact
key: secretkey
We can either rely on the default config map name (artifact-repositories) or we can have separate config map like mlpipeline-minio-artifact. I would be using the default artifact-repositories config map but you may have reasons to use a separate config map instead. What is you feedback here?
Awesome progress,
@Bobgy, feel free to take a look into #5017.
Note this new functionality requires the following:
We could have them created as part of profile creation and tackle this in a separate PR.
I'm really glad that this feature is pushed! I wonder if the standalone version will be able to use this feature. Any thoughts about that?
I'm really glad that this feature is pushed! I wonder if the standalone version will be able to use this feature. Any thoughts about that?
Good point @amitripshtos regarding standalone version. I'll to test this particular scenario. Depending on the result, extra changes might be needed.
@Bobgy / @Ark-kun,
as mentioned above, when submitting workflows with the ArtifactRepositoryRef field, two things have to exist for each namespace: (1) the referenced config map artifact-repositories and (2) the bucket that corresponds to the namespace.
Looking forward to getting your advise about these two points.
Content of the config map should be user defined, so it's not suitable for the profile controller.
My gut feeling is that, we should make this ref configurable with API request. It's similar to a service account.
Maybe we can define a user configurable namespace default in some way.
Anyway, I'd suggest thinking about the MVP of this feature, so we do not expand responsibility of a single pr too much. There can always be follow ups to add the additional features
Most helpful comment
Last year I've added the
ArtifactRepositoryReffeature to Argo to support this goal.The idea is to make it possible to specify the artifact repository information by referencing a ConfigMap.
See https://github.com/argoproj/argo/pull/1350 and https://github.com/argoproj/argo/pull/1821
With both features in place it might even be unnecessary to make any SDK-side changes.
Each namespace can have a different artifact repository ConfigMap with the same name. Then Argo will choose one based on the namespace.
The backend can add the artifactRepositoryRef field when submitting the run.
I'm not sure argoproj/argo#3307 is relevant since it's about explicit input artifacts, a feature not used by KFP.