Cluster-api: Add a ~Waiting phase for Machines to indicate that a Node isn't ready

Created on 25 Oct 2019 · 12 Comments · Source: kubernetes-sigs/cluster-api

User Story

As an operator, I would like to know if a Node linked to a Machine is not ready after it has been provisioned.

Detailed Description

Follow-up issue to https://github.com/kubernetes-sigs/cluster-api/issues/1622. During reconciliation, the Machine controller should check whether the Node specified in NodeRef is not in a Ready state and, if so, flip the Status.Phase field to a Waiting phase.
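As a rough illustration of the check described above, here is a minimal sketch in Go. It is not the actual controller code: the `Node`, `NodeCondition`, and `Machine` types are simplified, hypothetical stand-ins for the real Kubernetes and Cluster API types, the `nodes` map stands in for a lookup against the workload cluster, and `NodeNotReady` is only a placeholder for whatever phase name is eventually chosen. Treating `Running` as the phase for a Ready Node is also an assumption.

```go
package main

import "fmt"

// NodeCondition, Node and Machine are simplified, hypothetical stand-ins
// for the real Kubernetes and Cluster API types.
type NodeCondition struct {
	Type   string // e.g. "Ready"
	Status string // "True", "False" or "Unknown"
}

type Node struct {
	Name       string
	Conditions []NodeCondition
}

type Machine struct {
	NodeRef string // name of the linked Node; empty until a Node is linked
	Phase   string
}

const (
	PhaseRunning      = "Running"      // assumed phase for a Machine whose Node is Ready
	PhaseNodeNotReady = "NodeNotReady" // placeholder name, still under discussion
)

// isNodeReady reports whether the Node has a Ready condition with status "True".
// A Node without a Ready condition is treated as not ready.
func isNodeReady(n *Node) bool {
	for _, c := range n.Conditions {
		if c.Type == "Ready" {
			return c.Status == "True"
		}
	}
	return false
}

// reconcilePhase sets the Machine phase from the state of its linked Node.
// Machines without a NodeRef are left untouched.
func reconcilePhase(m *Machine, nodes map[string]*Node) {
	if m.NodeRef == "" {
		return
	}
	node, found := nodes[m.NodeRef]
	if !found || !isNodeReady(node) {
		m.Phase = PhaseNodeNotReady
		return
	}
	m.Phase = PhaseRunning
}

func main() {
	nodes := map[string]*Node{
		"node-a": {Name: "node-a", Conditions: []NodeCondition{{Type: "Ready", Status: "False"}}},
	}
	m := &Machine{NodeRef: "node-a", Phase: "Provisioned"}

	reconcilePhase(m, nodes)
	fmt.Println("phase while Node is not ready:", m.Phase)

	nodes["node-a"].Conditions[0].Status = "True"
	reconcilePhase(m, nodes)
	fmt.Println("phase once Node is ready:", m.Phase)
}
```

In this sketch a missing Node is treated the same as a Node that is not Ready; whether those two cases should map to the same phase is a design choice the issue leaves open.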

/kind feature
/priority important-longterm
/milestone v0.3.0

kind/feature priority/important-longterm

Source

vincepri

All 12 comments

Bike-shedding on the name: I think something like NotReady, Unavailable, or Offline is more indicative of the status than Waiting is.

ncdc on 25 Oct 2019

👍1

Updated the title to make it clear that the name is something we need to figure out. I'm OK with anything meaningful :)

vincepri on 25 Oct 2019

Maybe clarify that it's after the machine has successfully bootstrapped (instead of just provisioned)?

NodeNotReady?

ncdc on 25 Oct 2019

This is potentially a stopgap until we have a better proposal around status conditions. In the meantime, looking for feedback!

ncdc on 25 Oct 2019

❤1

Tagging a bunch of people for 👀. Would ❤️ comments, and feel free to tag more people (or ignore, that's ok too! 😄).

@CecileRobertMichon @juan-lee @justinsb @akutz @michaelgugino @pablochacin @detiber @liztio @chuckha @noamran @amy @rudoi @rsmitty @andrewsykim @moshloop

ncdc on 25 Oct 2019

Status.Phase: chillin'

but seriously, offline was the one I liked the most and could not think of a more appropriate, succinct name. I was thinking about something like UnavailableButRecoverable...but that's terrible 😬

chuckha on 25 Oct 2019

What would we consider Ready here? Would we consider the Node being fully Ready (which would require CNI to be deployed)? Would we want to exclude that since it is out of scope for our purposes? Would we want to be able to distinguish between the two and introduce two new phases?

detiber on 25 Oct 2019

I would prefer to have only one state that covers both, if possible. Once we settle on how to shape conditions, that would cover others as well

vincepri on 25 Oct 2019

> I would prefer to have only one state that covers both, if possible. Once we settle on how to shape conditions, that would cover others as well

I make a distinction between the two because whatever is used to determine AlmostReady (defined as ready, just missing CNI to be fully Ready), would be similar to what we'd want to use for health checks related to the Node resource when it comes to initializing, scaling, or upgrading of a control plane machine. But this is less useful to end users who would be more interested in the full Ready state of a Node.

detiber on 25 Oct 2019

👍1

I'm generally against syncing status from node to machines. Machines are a mechanism for creating nodes, not much else.

Once a machine has a nodeRef, it's done. What happens from there is the node's problem. It doesn't make sense to re-invent node statuses or describe those conditions on a machine.

If for some reason you feel you absolutely have to indicate node status on a machine, then I would just do a blind copy of node conditions, and not invent any new ones.
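For comparison, the "blind copy" alternative could look roughly like the following Go sketch. The types are simplified, hypothetical stand-ins rather than the real Kubernetes or Cluster API types, and the `NodeConditions` field on the Machine status is an assumption made for illustration only; no such field is described in this thread.

```go
package main

import "fmt"

// NodeCondition and Node are simplified, hypothetical stand-ins for the
// real Kubernetes types.
type NodeCondition struct {
	Type   string
	Status string
	Reason string
}

type Node struct {
	Conditions []NodeCondition
}

// MachineStatus carries a hypothetical NodeConditions field that mirrors
// the Node's conditions verbatim, without inventing any new ones.
type MachineStatus struct {
	NodeConditions []NodeCondition
}

// copyNodeConditions copies the Node's conditions onto the Machine status
// as-is. It copies into a fresh slice so that later changes to the Node do
// not silently alter the Machine status.
func copyNodeConditions(n *Node, s *MachineStatus) {
	s.NodeConditions = append([]NodeCondition(nil), n.Conditions...)
}

func main() {
	node := &Node{Conditions: []NodeCondition{
		{Type: "Ready", Status: "False", Reason: "KubeletNotReady"},
		{Type: "MemoryPressure", Status: "False", Reason: "KubeletHasSufficientMemory"},
	}}
	var status MachineStatus

	copyNodeConditions(node, &status)
	for _, c := range status.NodeConditions {
		fmt.Printf("%s=%s (%s)\n", c.Type, c.Status, c.Reason)
	}
}
```

The point of this variant is that the Machine makes no judgment of its own about Node health; it only mirrors what the Node already reports.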

michaelgugino on 31 Oct 2019

Closing this in favor of #1658
/close

ncdc on 6 Dec 2019

@ncdc: Closing this issue.