Cluster-api: A different name for the MachineHealthCheck MaxUnhealthy field may be more accurate and align better with other Kubernetes APIs

Created on 11 Mar 2020 · 8 Comments · Source: kubernetes-sigs/cluster-api

If I understand correctly, the MaxUnhealthy value represents the percentage of Nodes that can be NotReady at any point in time.

For example, if MaxUnhealthy is set to 10%, and 1 of 3 Nodes is NotReady, then 33% of the Nodes are unhealthy. Therefore, the MachineHealthCheck controller will _not_ remediate (delete and re-create) the Machine corresponding to the NotReady Node.

[@JoelSpeed please LMK if this is wrong and I'll update]
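The threshold check described above can be sketched roughly as follows. This is an illustrative approximation, not the controller's actual code; the function name is hypothetical, and the real field also accepts absolute integers, which this sketch omits:

```python
import math

def remediation_allowed(max_unhealthy_percent, expected_machines, unhealthy_machines):
    """Return True if the MHC may remediate, per the description above."""
    # A percentage is rounded down to a whole number of Machines.
    max_allowed = math.floor(expected_machines * max_unhealthy_percent / 100)
    return unhealthy_machines <= max_allowed

# 10% of 3 Machines allows 0 unhealthy, so 1 NotReady Node blocks remediation.
print(remediation_allowed(10, 3, 1))  # False
```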

I find the "unhealthy" part confusing. A Machine that's being remediated (deleted, re-created) counts toward this threshold, but after it's deleted it is technically just unavailable, not unhealthy. I'm still thinking of more accurate names.

Some other comments shared in the 03/11/2020 meeting:

I agree it's a questionable name. StopRemediatingAbove? Compare with Deployment, where maxUnavailable really means the opposite of what maxUnhealthy does here.
-- @bboreham

AIUI the idea is that when we exceed this threshold we assume that our logic is faulty or that we're only going to make things worse. k-c-m has --unhealthy-zone-threshold which I think is a more similar concept
-- @justinsb

area/api kind/api-change lifecycle/frozen

All 8 comments

@dlipovetsky Your explanation is correct! The idea behind this is, as @justinsb mentioned, to prevent the health-checking logic from making anything worse.

Users should choose a level of disruption they are comfortable with, and if there is more disruption than that, the MHC will stop doing anything. I see it as analogous to the maxUnavailable field on a PodDisruptionBudget.

If there are 5 expected Pods, maxUnavailable is 30%, and 1 is currently unavailable, then something trying to evict another Pod would violate maxUnavailable, so the eviction is blocked.
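For reference, such a budget might look like the following manifest. This is an illustrative sketch; the name and label selector are hypothetical:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-pdb
spec:
  maxUnavailable: 30%
  selector:
    matchLabels:
      app: example
```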

With an MHC, if there are 5 expected Machines, maxUnhealthy is 30%, and 1 is currently unhealthy (whether or not it has been remediated), then when the MHC detects a second unhealthy Machine the threshold is exceeded and it stops performing further remediations.
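The corresponding MachineHealthCheck might be written roughly like this (a hedged sketch against the v1alpha3 API current at the time of this thread; the names, labels, and timeout values are hypothetical):

```yaml
apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachineHealthCheck
metadata:
  name: example-mhc
spec:
  clusterName: example-cluster
  maxUnhealthy: 30%
  selector:
    matchLabels:
      cluster.x-k8s.io/cluster-name: example-cluster
  unhealthyConditions:
    - type: Ready
      status: "False"
      timeout: 300s
```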

The difference between the two is that a PDB can prevent the threshold from being exceeded, whereas the MHC simply stops working if outside influences change the number of unhealthy Machines.

I hope some of this adds clarity and helps us choose a better name.

@bboreham could you elaborate on your point mentioned above? I was under the impression this was actually similar to how deployments work rather than the opposite. If the maxUnavailable field is violated, does the deployment rollout not pause?

You're right, a Deployment rolling update will pause if maxUnavailable is unexpectedly exceeded, but more commonly it is used to control the active behaviour: at the beginning of a rollout, it will scale down the old ReplicaSet by N Pods and scale up the new one to create N new Pods.

If a pod were to suddenly disappear mid-rollout then the underlying ReplicaSets would replace it, so I think this is quite different to a situation where all remediation is halted beyond a certain point.

Does anyone have any ideas on how I should seek feedback to try and come up with a better name for this field?

Maybe bring it up at the next upstream meeting + mailing list discussion?

This is a change we'll have to delay to v1alpha4, so we should have some time :)

My two cents:

I think maxUnavailable is accurate enough. However, because we're used to seeing the NotReady condition in kubectl get nodes, and because I think the condition name and semantics are unlikely to change in the future, I think maxNotReady is also a good choice.

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

/lifecycle frozen

/area api
/kind api-change
