Pipelines: Azure ADLSGen2 Support as a Kubeflow Pipelines Artifact Store

Created on 19 Nov 2020 · 8 Comments · Source: kubeflow/pipelines

Currently, Kubeflow Pipelines supports only MinIO, GCS, and S3 for storing and retrieving artifacts.

The ask here is quite simple: Kubeflow Pipelines should also support reading from and writing to Microsoft's ADLSGen2. However, given the current multi-tenancy features, the artifacts themselves should also be isolated in different "Containers", and access should be authenticated using Azure Active Directory.

A similar effort was already done by AWS. https://github.com/kubeflow/pipelines/issues/3405

kind/feature

All 8 comments

@maganaluis Using object storage directly instead of MinIO gives some benefits, like policy control and better reliability. The tricky thing is that we assume the pluggable object storage is MinIO/S3 protocol compatible. Otherwise, a lot of places need to be changed, and that will be hard to maintain.
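To make the S3-compatibility assumption above concrete, here is a minimal sketch, not the actual Kubeflow Pipelines code. It assumes the pipeline code only ever needs two bucket/key operations. The names `S3LikeStore`, `InMemoryS3`, `ContainerBackedAdapter`, and `store_artifact` are all hypothetical, and both backends are in-memory stand-ins so the example runs without any cloud access.

```python
from typing import Dict, Protocol, Tuple


class S3LikeStore(Protocol):
    """Hypothetical minimal surface the calling code is assumed to rely on."""

    def put_object(self, bucket: str, key: str, data: bytes) -> None: ...

    def get_object(self, bucket: str, key: str) -> bytes: ...


class InMemoryS3:
    """Stand-in for MinIO/S3: objects live in a dict keyed by (bucket, key)."""

    def __init__(self) -> None:
        self._objects: Dict[Tuple[str, str], bytes] = {}

    def put_object(self, bucket: str, key: str, data: bytes) -> None:
        self._objects[(bucket, key)] = data

    def get_object(self, bucket: str, key: str) -> bytes:
        return self._objects[(bucket, key)]


class ContainerBackedAdapter:
    """Stand-in for a non-S3 backend exposed through the same interface.

    A real adapter would call a storage SDK; here each "container" is a dict.
    """

    def __init__(self) -> None:
        self._containers: Dict[str, Dict[str, bytes]] = {}

    def put_object(self, bucket: str, key: str, data: bytes) -> None:
        # The S3-style "bucket" is mapped one-to-one onto a storage container.
        self._containers.setdefault(bucket, {})[key] = data

    def get_object(self, bucket: str, key: str) -> bytes:
        return self._containers[bucket][key]


def store_artifact(store: S3LikeStore, bucket: str, run_id: str,
                   name: str, data: bytes) -> str:
    """Write an artifact under a per-run prefix and return its object key."""
    key = f"artifacts/{run_id}/{name}"
    store.put_object(bucket, key, data)
    return key
```

As long as every caller goes through a narrow interface like this, a new backend only needs an adapter. The maintenance concern applies where code talks to the S3 client directly, because each of those call sites would have to change.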

@Jeffwan Thank you. It seems platform owners should be responsible for adding support here. We'll explore other options in the meantime, but I'll leave the issue open to see if there is other interest from the community.

@maganaluis have you tried replacing the MinIO that Kubeflow ships with? You can deploy this:
https://docs.min.io/docs/minio-gateway-for-azure.html

ADLSGen2 supports a Blob Storage interface (with some limitations that I usually don't encounter; see https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-supported-blob-storage-features).

As discussed at https://www.kubeflow.org/docs/pipelines/multi-user/#resources-without-isolation the upstream Kubeflow project does not provide for multi-user isolation for Minio artifact storage.
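For illustration, the gateway replacement suggested above could be described roughly as follows. This is a sketch under assumptions, not a tested Kubeflow manifest. It builds a Kubernetes container spec as a plain Python dict, using the `gateway azure` command and the `MINIO_ACCESS_KEY` / `MINIO_SECRET_KEY` variables (storage account name and key) from the MinIO Azure gateway guide of that time. The function name, the secret name `azure-storage-credentials`, and its keys are hypothetical. Newer MinIO releases dropped gateway mode, so an image from a release that still includes it would need to be pinned.

```python
from typing import Any, Dict


def azure_gateway_container(
    secret_name: str = "azure-storage-credentials",  # hypothetical secret name
) -> Dict[str, Any]:
    """Build a container spec (plain dict) for MinIO in Azure gateway mode."""

    def secret_ref(key: str) -> Dict[str, Any]:
        # Read the value from a Kubernetes Secret instead of inlining it.
        return {"secretKeyRef": {"name": secret_name, "key": key}}

    return {
        "name": "minio",
        # Assumption: pin a tag from a release that still ships gateway mode.
        "image": "minio/minio",
        "args": ["gateway", "azure"],
        "env": [
            # In gateway mode the access key is the storage account name ...
            {"name": "MINIO_ACCESS_KEY", "valueFrom": secret_ref("account-name")},
            # ... and the secret key is the storage account key.
            {"name": "MINIO_SECRET_KEY", "valueFrom": secret_ref("account-key")},
        ],
        "ports": [{"containerPort": 9000}],
    }
```

Dumping the result with `json.dumps(azure_gateway_container(), indent=2)` gives a fragment that could be adapted into the MinIO deployment. This keeps the S3-compatible endpoint that Kubeflow Pipelines expects, but it does not by itself add multi-user isolation.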

/assign berndverst

@berndverst Yeah, I did a small POC with MinIO STS and an Azure Gateway, which effectively solved for multi-tenancy and integration with Azure. The problem with this approach is that you need to set up your own etcd database, and we don't want to get into managing it. Perhaps CosmosDB could be used here, but that's still in preview.

https://docs.min.io/docs/minio-sts-quickstart-guide.html

@maganaluis would love to see whatever POC you created - in whatever state it is. Could help to inspire the right people to improve the integration between services :)

Hi @maganaluis, maybe we could try using the manifest with OIDC auth and configure the blob storage similar to this link, so that we don't have to maintain an etcd database while still having OIDC support.

@yilun-msft just be aware that in upstream, OIDC is limited to Istio 1.3.X (or maybe 1.4.X) at the moment. There is upstream community work ongoing to support newer Istio versions with OIDC.
