What steps did you take and what happened:
- With cluster api v0.3.2, create a cluster with a single control plane machine.
- Upgrade cluster api to v0.3.5.
- A rolling upgrade started for the control plane machine.

What did you expect to happen:
No machine upgrades are initiated by minor version CAPI upgrades.

The same behaviour is observed in an HA control plane by @rudoi.
/kind bug
@sedefsavas: The label(s) kind/ cannot be applied, because the repository doesn't have them
@sedefsavas are you working on this?
/milestone v0.3.x
/assign
v0.3.3 triggers the rollout, via the change in https://github.com/kubernetes-sigs/cluster-api/commit/dcd3d51881552722234c3443d9d597f5e5c3db98#diff-f30ad9aecb671548155d73c5840fe19bR88
The mdutil.Hash func uses spew to produce a string serialisation of the object and then hashes it. The addition of UseExperimentalRetryJoin to the API caused that serialisation, and therefore the hash, to change.
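To make that concrete, here is a minimal, self-contained sketch of the failure mode (invented struct names, not the actual mdutil code): spew serialises every field, zero-valued or not, so merely adding a field to the type changes the hash even though the user's intent did not change.

```go
package main

import (
	"fmt"
	"hash/fnv"

	"github.com/davecgh/go-spew/spew"
)

// Hypothetical stand-ins for the spec type before and after
// UseExperimentalRetryJoin was added to the API.
type specBefore struct {
	ClusterName string
}

type specAfter struct {
	ClusterName              string
	UseExperimentalRetryJoin bool // new field, left at its zero value
}

// deepHash mimics the spew-based approach: serialise the whole object,
// zero-valued fields included, and hash the resulting string.
func deepHash(obj interface{}) uint32 {
	hasher := fnv.New32a()
	printer := spew.ConfigState{Indent: " ", SortKeys: true, DisableMethods: true, SpewKeys: true}
	printer.Fprintf(hasher, "%#v", obj)
	return hasher.Sum32()
}

func main() {
	// Same user intent, different hashes: the second serialisation
	// contains the extra (unset) field.
	fmt.Println(deepHash(specBefore{ClusterName: "cool"}))
	fmt.Println(deepHash(specAfter{ClusterName: "cool"}))
}
```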
A number of options:
Re "no additions", we maybe could get around this by only introducing optional pointer fields, but that feels like a hacky workaround, especially if we don't really want a new field to be optional.
I'm not sure about the spec serialization - if you're using kubectl apply, you end up with the intent serialized in an annotation, as well as the server-side apply managedFields stuff. Would it make sense to add a 3rd potential serialization? How do you envision the semantic comparisons working?
Seems today we're using spew to output the object, which would include any omitempty and optional fields as well. Is there any reason we can't use json encoding instead and respect optional fields?
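For comparison, a sketch of what JSON encoding would buy us, assuming new optional fields are pointers with omitempty (field names invented): an unset field simply never appears in the serialisation, so purely additive API changes leave the hash input unchanged.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical spec: the optional field is a pointer with omitempty, so it
// only shows up in the output when the user actually sets it.
type spec struct {
	ClusterName              string `json:"clusterName"`
	UseExperimentalRetryJoin *bool  `json:"useExperimentalRetryJoin,omitempty"`
}

func main() {
	unset, _ := json.Marshal(spec{ClusterName: "cool"})
	retry := true
	explicit, _ := json.Marshal(spec{ClusterName: "cool", UseExperimentalRetryJoin: &retry})
	fmt.Println(string(unset))    // {"clusterName":"cool"}: stable across additive API changes
	fmt.Println(string(explicit)) // includes the field only because it was set
}
```

Switching the hash input from spew to JSON would still change every existing hash once, so we'd need a one-time migration either way.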
It'd be super useful to have a test that hashes a basic configuration, making sure it's stable.
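Something like this golden-value test could do it (a sketch; the fixture struct, helper, and golden number are stand-ins):

```go
package mdutil_test

import (
	"hash/fnv"
	"testing"

	"github.com/davecgh/go-spew/spew"
)

// basicSpec stands in for a minimal, known-good configuration fixture.
type basicSpec struct {
	ClusterName string
}

func hashOf(obj interface{}) uint32 {
	hasher := fnv.New32a()
	printer := spew.ConfigState{Indent: " ", SortKeys: true, DisableMethods: true, SpewKeys: true}
	printer.Fprintf(hasher, "%#v", obj)
	return hasher.Sum32()
}

func TestHashStability(t *testing.T) {
	// Golden value recorded once (hypothetical number). If an API change
	// alters the serialisation, this test fails and forces a deliberate
	// decision, e.g. a hash migration, instead of a silent rolling upgrade.
	const golden = uint32(2502614424)
	if got := hashOf(basicSpec{ClusterName: "cool"}); got != golden {
		t.Fatalf("hash of basic configuration changed: got %d, want %d", got, golden)
	}
}
```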
> Would it make sense to add a 3rd potential serialization? How do you envision the semantic comparisons working?
Not sure, was just putting it out there.
Was surprised to see mdutil using spew. @rudoi and I were also wondering about switching over to JSON. We'd need to handle a hash migration of some sort?
Semantic comparisons are how k/k detects whether a ReplicaSet matches a Deployment (presumably to avoid just this issue). Hashes are a nice convenience, but by design they change in unpredictable ways whenever the input changes even slightly.
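For reference, the k/k deployment controller's check is roughly this shape (paraphrased from EqualIgnoreHash in pkg/controller/deployment/util, not CAPI code):

```go
import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	apiequality "k8s.io/apimachinery/pkg/api/equality"
)

// equalIgnoreHash strips the pod-template-hash label from copies of both
// templates, then compares them semantically, so a ReplicaSet still matches
// its Deployment even when the hash label would differ.
func equalIgnoreHash(t1, t2 *corev1.PodTemplateSpec) bool {
	t1, t2 = t1.DeepCopy(), t2.DeepCopy()
	delete(t1.Labels, appsv1.DefaultDeploymentUniqueLabelKey)
	delete(t2.Labels, appsv1.DefaultDeploymentUniqueLabelKey)
	return apiequality.Semantic.DeepEqual(t1, t2)
}
```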
For the adoption work, I introduced some semantic comparison logic in order to lay the groundwork for the "semantic" approach (see: https://github.com/kubernetes-sigs/cluster-api/blob/master/bootstrap/kubeadm/api/equality/semantic.go#L28). I intended to follow that work up by changing the hash to be advisory instead of authoritative, maybe something to the effect of:
```go
func SemanticEquals(a, b bootstrapv1.KubeadmConfigSpec, cluster *clusterv1.Cluster) bool {
	// Merge in both directions so that defaulted fields cancel out,
	// then compare the results.
	aprime, bprime := SemanticMerge(b, a, cluster), SemanticMerge(a, b, cluster)
	return reflect.DeepEqual(aprime, bprime)
}
```
?
There are some missing unit tests here, especially since, the way the KubeadmConfig stuff works today, it's likely that changes to the bootstrap controller would need to be mirrored in the SemanticMerge function (or we'd need another, more explicit way to separate user intent from the instanced configs). A hypothetical shape is sketched below.
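To illustrate the mirroring concern, a purely hypothetical shape for SemanticMerge (the real logic lives in the semantic.go linked above; the merge rules below are invented and incomplete): copy controller-defaulted sections from one spec into the other wherever the user left them unset, so defaulting alone can never make two specs compare unequal.

```go
import (
	clusterv1 "sigs.k8s.io/cluster-api/api/v1alpha3"
	bootstrapv1 "sigs.k8s.io/cluster-api/bootstrap/kubeadm/api/v1alpha3"
)

// SemanticMerge (sketch): every section the bootstrap controller may
// default has to be handled here, which is exactly the mirroring burden
// described above.
func SemanticMerge(into, from bootstrapv1.KubeadmConfigSpec, cluster *clusterv1.Cluster) bootstrapv1.KubeadmConfigSpec {
	merged := *into.DeepCopy()
	if merged.ClusterConfiguration == nil {
		merged.ClusterConfiguration = from.ClusterConfiguration
	}
	if merged.InitConfiguration == nil {
		merged.InitConfiguration = from.InitConfiguration
	}
	if merged.JoinConfiguration == nil {
		merged.JoinConfiguration = from.JoinConfiguration
	}
	return merged
}
```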
/priority critical-urgent
/milestone v0.3.7
/unassign
I think other folks are already working on this.
@sedefsavas I don't think anyone else is assigned to this issue. We should explore the semantic-equality approach @sethp-nr proposed in a POC pull request.
/assign
/lifecycle active
I applied the latest kubeadm-bootstrap-controller and kubeadm-control-plane-controller images to the management cluster during a Velero backup and restore, but a KCP upgrade was still triggered, with logs like:
I0617 13:12:58.613527 1 controller.go:261] controllers/KubeadmControlPlane "msg"="Upgrading Control Plane" "cluster"="cool" "kubeadmControlPlane"="cool-control-plane" "namespace"="default"
The newly created control plane node has a different hash (kubeadm.controlplane.cluster.x-k8s.io/hash: "2478258161") from the original one (kubeadm.controlplane.cluster.x-k8s.io/hash: "2502614424").
The upgrade should be avoided during Velero backup/restore; otherwise it creates a new control plane node. Currently this blocks the Velero backup/restore test of the management cluster.
@wjun Thanks for the report. @sedefsavas is actively working on this issue; we'll have a fix in v0.3.7 (due in a couple of weeks).