@jlewi what do you think about moving the kubeflow jobs over to https://github.com/GoogleCloudPlatform/oss-test-infra/tree/master/prow/prowjobs and prow.gflocks.com?
@fejta That seems fine with me.
What else would need to change? For example:
Do testgrids move?
Do we use a different GCS bucket for prow artifacts?
@scottilee is this something you could help with?
@fejta How urgent is this on your end?
Not urgent.
Testgrid can stay the same.
Are you still not using pod utils? If so then yes, you'll upload to a different bucket (maybe prow specifies where to upload it?)
@chases2 we should probably set up GCP/oss-test-infra to work like istio -- where we can annotate prowjobs there and have them show up in this testgrid.
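For reference, a rough sketch of what annotating a prowjob for testgrid could look like. It assumes the `testgrid-dashboards` and `testgrid-tab-name` job annotations; the dashboard name, tab name, and job name below are made up for illustration.

```python
import json


def with_testgrid_annotations(job, dashboards, tab_name):
    """Return a copy of a job entry with testgrid annotations added."""
    annotated = dict(job)
    annotations = dict(job.get("annotations", {}))
    # testgrid-dashboards takes a comma-separated list of dashboard names.
    annotations["testgrid-dashboards"] = ", ".join(dashboards)
    annotations["testgrid-tab-name"] = tab_name
    annotated["annotations"] = annotations
    return annotated


# Hypothetical job and dashboard names, for illustration only.
job = {"name": "kubeflow-presubmit", "always_run": True}
annotated = with_testgrid_annotations(
    job, dashboards=["example-kubeflow-dashboard"], tab_name="presubmit"
)
print(json.dumps(annotated, indent=2))
```

The original job entry is left untouched, so the helper can be run over existing job definitions without side effects.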
@fejta Correct, we manually upload our files to GCS right now, but we could probably switch to pod_utils.
@fejta Can you share some more info (e.g., a link to a ticket or document with an explanation, if available) on the reason for the move from prow.k8s.io to prow.gflocks.com?
Also, would it just be creating a "kubeflow" folder in https://github.com/GoogleCloudPlatform/oss-test-infra/tree/master/prow/prowjobs and moving the files in https://github.com/kubernetes/test-infra/tree/master/config/jobs/kubeflow to there?
Lastly, I'm not familiar with pod-utils. I'm assuming it's this https://github.com/kubernetes/test-infra/tree/master/prow/pod-utils? Any more info anywhere so I can read up on it?
why the move
prow.k8s.io is for kubernetes (or at least CNCF projects)
prow.gflocks.com is for public google projects.
Eventually we want to migrate all non-CNCF projects out of prow.k8s.io
And yes, ideally it involves:
a) creating a GKE cluster to run jobs (provides you with isolation from other jobs)
b) configuring prow to schedule kubeflow jobs into that cluster
b1) also moving any secrets, configmaps, etc that jobs use
c) moving jobs to that prow instance
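For reference, steps (b) and (c) above might look roughly like this in terms of job config. This is only a sketch: it assumes the job files parse into the usual presubmits/postsubmits/periodics shape, and the cluster alias `kubeflow-build` and the job names are made up for illustration. The only Prow-specific piece is the per-job `cluster` field.

```python
import json

# Hypothetical build-cluster alias; the real value is whatever name the Prow
# instance registers for the Kubeflow cluster.
BUILD_CLUSTER = "kubeflow-build"


def route_to_cluster(job_config, cluster):
    """Return a copy of job_config with every job pinned to the given cluster."""
    routed = json.loads(json.dumps(job_config))  # deep copy via JSON round trip
    # Presubmits and postsubmits are maps of "org/repo" -> list of jobs.
    for kind in ("presubmits", "postsubmits"):
        for jobs in routed.get(kind, {}).values():
            for job in jobs:
                job["cluster"] = cluster
    # Periodics are a flat list of jobs.
    for job in routed.get("periodics", []):
        job["cluster"] = cluster
    return routed


# Hypothetical job definitions, for illustration only.
example = {
    "presubmits": {
        "kubeflow/kubeflow": [
            {"name": "kubeflow-presubmit", "always_run": True},
        ]
    },
    "periodics": [{"name": "kubeflow-periodic", "interval": "8h"}],
}

routed = route_to_cluster(example, BUILD_CLUSTER)
print(json.dumps(routed, indent=2))
```

Secrets and configmaps from step (b1) are not covered here; they have to exist in the target cluster before any routed job can use them.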
pod-utils
Let's not worry about this for now, see https://github.com/kubernetes/test-infra/blob/master/prow/jobs.md#pod-utilities for more detail.
Test containers should no longer need to check out repos and/or upload results to GCS. Sidecar containers will do this for you.
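For anyone reading along, here is a minimal sketch of what a job that opts in to pod utilities might look like, assuming the `decorate` job field described in the linked doc. The job name, image, and command are hypothetical placeholders, not Kubeflow's actual config.

```python
import json


def decorated_presubmit(name, image, command):
    """Build a presubmit entry that opts in to pod utilities via decorate."""
    return {
        "name": name,
        "decorate": True,  # Prow adds the clone, upload, and sidecar containers
        "always_run": True,
        "spec": {
            "containers": [
                # The test container only runs the tests; it does not
                # check out the repo or upload artifacts itself.
                {"image": image, "command": command},
            ]
        },
    }


job = decorated_presubmit(
    name="kubeflow-unit-test",                 # hypothetical job name
    image="gcr.io/example/test-image:latest",  # hypothetical image
    command=["make", "test"],                  # hypothetical test entry point
)

# JSON is also readable as YAML, so the printed document has the same
# shape as a YAML job file.
print(json.dumps({"presubmits": {"kubeflow/kubeflow": [job]}}, indent=2))
```

With this shape the test image no longer needs its own checkout or GCS upload logic, which is the point made above.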
@fejta I apologize for the delay on this. I started a PR at https://github.com/GoogleCloudPlatform/oss-test-infra/pull/93, which is probably wrong 🙄 but it's a start! Let me know what's missing...
@scottilee Kubeflow already has a Kubernetes cluster in project kubeflow-ci which we use for testing purposes. So I believe with the new approach the goal would be to have prow schedule the jobs for Kubeflow on that cluster. I'm not sure what we need to do to make that happen. I suspect we need to install some CRs and other infra on our test cluster.
Given that we are getting close to 0.7, we might want to be careful not to make any infra changes that could keep us from releasing on time.
@jlewi Can I either get access to the kubeflow-ci project, or can you create the test-pods namespace and generate the cluster values according to the directions here: https://github.com/GoogleCloudPlatform/oss-test-infra/pull/93#issuecomment-535367394?
If you need access to the CI cluster please join this group.
https://github.com/kubeflow/internal-acls/blob/master/ci-team.members.txt
Let's proceed cautiously in terms of moving our prow infrastructure because we are getting ready to do a release and don't want to disrupt our test infra.
@scottilee opened up kubeflow/testing#475 to track changes to our test infra. I will run mkbuild-cluster as soon as I can.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
Both https://github.com/kubernetes/test-infra/pull/16898 and https://github.com/kubernetes/test-infra/pull/16906 are merged. Can we close this issue?
/lifecycle frozen
I believe the first part was merged:
kubernetes/test-infra#16898
Here's the doc @clarketm put together: https://docs.google.com/document/d/17sA-rRBe30bM034nL353CgrXETfy2vMe2m_g_bB_ILY/edit#
Per the doc, I believe Kubeflow is now using its own build cluster, i.e., the prow jobs run inside a Kubeflow-owned cluster.
So I believe the next part of the migration is to move from the CNCF/kubernetes prow control plane to the GCP/kubernetes control plane.
/assign @Bobgy
I'll try to push forward the move to GCP/oss-test-infra, so that Kubeflow maintainers can be added as approvers.
I have coordinated with the GCP OSS prow team and will start the migration this week.
/cc @chaodaiG @jlewi @chensun
I'll keep a progress log here.
Add @google-oss-robot as kubeflow org admin: https://github.com/kubeflow/internal-acls/pull/418
UPDATE: there's a permission issue on the GCP OSS prow side.
We are blocked until that is resolved.