Cluster-api: MachinePool experiment API group should be under `cluster.x-k8s.io`

Created on 30 Jul 2020 · 13 comments · Source: kubernetes-sigs/cluster-api

MachinePool is currently an experiment, and the API group we originally picked was `exp.cluster.x-k8s.io`. Given that the intent is to move MachinePool to the core API group in the future, we should rewrite the experiment to use `cluster.x-k8s.io`, or whatever the final group should be.

/kind cleanup
/milestone v0.4.0

area/api kind/api-change kind/cleanup


All 13 comments

/assign @mytunguyen
/lifecycle active

@rudoi: GitHub didn't allow me to assign the following users: mytunguyen.

Note that only kubernetes-sigs members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @mytunguyen
/lifecycle active

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

yeah yeah CI bot, we know :(

@mytunguyen You should open an issue in k8s-sigs to get into the org; I'm happy to sponsor as well.

One thing to note: this would be for v1alpha4, and we haven't opened the main branch yet. If you want to get a head start, please do :), just a warning that it might have to be rebased later.

/area api

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

/remove-lifecycle stale

/kind api-change

/assign

@vincepri any thoughts on how to do this without completely breaking existing machine pool users? If we change the group now, existing machine pools won't be able to upgrade to new versions. Is that acceptable given that the feature was in exp until now?

Yeah, better to do this now rather than later. We need to document the transition. There isn't any good way to do this without breaking users in one way or another. We could have a controller that looks for the old resources and recreates them, although that might cause the infrastructure to be deleted and recreated, unless we have some way to make it seamless, which we might be able to do with MachinePool, given that machines are "virtual".

One way we could approach this problem:

  • Have two reconcilers, one for the old group and one for the new one.
  • Have a webhook that disallows create (and maybe update as well?) operations on the old API group. We should return a nice message that the group has changed.
  • When the old group reconciler sees a resource, it creates a new one in the new group (if it doesn't exist), with the paused annotation.
  • The old resource should be forcibly deleted by removing any finalizer.
  • Remove the paused annotation from the new resource.
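The rewrite step in the plan above could look something like the following sketch. It operates on a decoded manifest as a plain map; the `cluster.x-k8s.io/paused` annotation key and group names are taken from this thread, and a real implementation would of course go through the API server (or clusterctl) rather than in-memory maps:

```go
package main

import (
	"fmt"
	"strings"
)

const (
	oldGroup         = "exp.cluster.x-k8s.io" // group picked for the experiment
	newGroup         = "cluster.x-k8s.io"     // assumed final core group
	pausedAnnotation = "cluster.x-k8s.io/paused"
)

// migrateManifest sketches step 3: given an old-group object, return a copy
// in the new group with the paused annotation set, so the new-group
// reconciler does not act until the migration is complete. Finalizers belong
// to the old object, so the migrated copy starts without them.
func migrateManifest(obj map[string]interface{}) map[string]interface{} {
	out := make(map[string]interface{}, len(obj))
	for k, v := range obj {
		out[k] = v
	}

	// Rewrite apiVersion, preserving the version suffix (e.g. v1alpha3).
	if apiVersion, ok := obj["apiVersion"].(string); ok {
		if parts := strings.SplitN(apiVersion, "/", 2); len(parts) == 2 && parts[0] == oldGroup {
			out["apiVersion"] = newGroup + "/" + parts[1]
		}
	}

	// Copy metadata, add the paused annotation, and drop finalizers.
	meta := map[string]interface{}{}
	if m, ok := obj["metadata"].(map[string]interface{}); ok {
		for k, v := range m {
			meta[k] = v
		}
	}
	annotations := map[string]interface{}{}
	if a, ok := meta["annotations"].(map[string]interface{}); ok {
		for k, v := range a {
			annotations[k] = v
		}
	}
	annotations[pausedAnnotation] = "true"
	meta["annotations"] = annotations
	delete(meta, "finalizers")
	out["metadata"] = meta
	return out
}

func main() {
	old := map[string]interface{}{
		"apiVersion": "exp.cluster.x-k8s.io/v1alpha3",
		"kind":       "MachinePool",
		"metadata": map[string]interface{}{
			"name":       "mp-0",
			"finalizers": []interface{}{"machinepool.cluster.x-k8s.io"},
		},
	}
	migrated := migrateManifest(old)
	fmt.Println(migrated["apiVersion"]) // cluster.x-k8s.io/v1alpha3
}
```

Step 5 (removing the paused annotation) would then be an ordinary update once the old resource is gone.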

We could leverage clusterctl to provide a custom migration step to re-write the resources safely rather than a controller, especially since it should be short lived.

Longer term, we could also potentially leverage the management cluster operator (?? can't remember what naming was decided upon) to perform these types of actions.

I do think it would be good to make sure that we are publishing the old CRD info and have webhook validation to reject requests with a friendly message.

IMO, a reconciler to migrate the CRDs is a bit overkill for an exp/ feature in an alpha API. We would be introducing more complexity and more code to maintain. I'm leaning towards ripping off the band-aid now and avoiding making the same mistake in the future with other exp APIs.

We could leverage clusterctl to provide a custom migration step to re-write the resources safely rather than a controller, especially since it should be short lived.

That seems like a reasonable compromise to me, especially if any users speak up and tell us they would suffer from a breaking change on MachinePools.

I do think it would be good to make sure that we are publishing the old CRD info and have webhook validation to reject requests with a friendly message.

Definitely.
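The webhook validation being agreed on here could reduce to a very small check; a sketch, with the rejection message wording purely illustrative (a real webhook would return this via an admission response):

```go
package main

import "fmt"

// validateGroup sketches the admission check: requests against the old
// experimental group are rejected with a friendly message pointing users at
// the new group. Group names are taken from the discussion above.
func validateGroup(group string) (allowed bool, message string) {
	const (
		oldGroup = "exp.cluster.x-k8s.io"
		newGroup = "cluster.x-k8s.io"
	)
	if group == oldGroup {
		return false, fmt.Sprintf(
			"the %s API group has moved to %s; please re-create this resource in the new group",
			oldGroup, newGroup)
	}
	return true, ""
}

func main() {
	ok, msg := validateGroup("exp.cluster.x-k8s.io")
	fmt.Println(ok, msg)
}
```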

/cc @devigned
