/kind feature
Describe the solution you'd like
When I create a MachineDeployment, it is hard to see its state: whether it is still resizing, has reached the expected replica count, or has run into an error while reconciling. It would be nice to associate the MachineDeployment with some sort of state such as "RESIZING", "READY", "ERROR", etc. Also, it is currently not easy to get an error reason explaining why reconciling failed. Can we add an errorReason field to indicate why scaling failed?
Anything else you would like to add:
It would be similar to the request to add a status for Cluster: https://github.com/kubernetes-sigs/cluster-api/issues/820
A MachineDeployment is ready if spec.replicas == status.updatedReplicas == status.availableReplicas - does that sound accurate?
It's resizing if spec.replicas != status.updatedReplicas or spec.replicas != status.availableReplicas?
Can you use the two calculations above to determine ready & resizing?
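The two checks above can be sketched in Go. This is a minimal illustration of the proposed arithmetic, not the actual cluster-api types (`replicaCounts` is a hypothetical trimmed-down struct):

```go
package main

import "fmt"

// replicaCounts is a hypothetical trimmed-down view of the counters
// discussed above; the real MachineDeploymentStatus carries more fields.
type replicaCounts struct {
	SpecReplicas      int32
	UpdatedReplicas   int32
	AvailableReplicas int32
}

// ready: spec.replicas == status.updatedReplicas == status.availableReplicas
func ready(c replicaCounts) bool {
	return c.SpecReplicas == c.UpdatedReplicas && c.SpecReplicas == c.AvailableReplicas
}

// resizing: spec.replicas differs from updatedReplicas or availableReplicas
func resizing(c replicaCounts) bool {
	return !ready(c)
}

func main() {
	done := replicaCounts{SpecReplicas: 3, UpdatedReplicas: 3, AvailableReplicas: 3}
	mid := replicaCounts{SpecReplicas: 3, UpdatedReplicas: 3, AvailableReplicas: 2}
	fmt.Println(ready(done), resizing(done)) // true false
	fmt.Println(ready(mid), resizing(mid))   // false true
}
```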
It's in an error state if something is going wrong during reconciliation.
I think distinguishing between resizing-all-is-well and resizing-but-encountering-persistent-errors is the hard part. I'm not sure the best way to 1) detect this, and 2) store & display this information. Does anyone have any suggestions?
/priority important-soon
Note to self: check out how k/k Deployments handle resize failures & status
/cc @rudoi – you've spent some time staring at MachineDeployments. Want to join us?
We haven't done any work on this and if it's still needed, we can consider it for the next minor release.
/unassign
/milestone Next
@ncdc would it make sense to do something similar to the Cluster phase status field, where the machine_deployment controller updates that field based on the replica numbers as you explained above and then shows it as additional columns in the kubectl get output?
@nader-ziada we could consider doing something like that. Would you be willing to write up the details in a comment here?
I think the approach could be the following:
- Add a `phase` field to `MachineDeploymentStatus`
- The machinedeployment phase reconcile method would watch the MachineSet in each cycle and update accordingly; maybe get the status calculated here: https://github.com/kubernetes-sigs/cluster-api/blob/a6b5ba419bad1af4ec9f294720d5c61eada297be/controllers/machineset_status.go#L39
- Would need to investigate further to get more details

I could work on putting together a PR if the general approach makes sense. Thanks
This seems reasonable. Before you do a PR, would you be able to flesh out the details around how you calculate the phase, more specifically?
Yeah, sure. I'll update with a more detailed comment explaining how to calculate the phase. I'll do a quick spike to make sure what I'm saying makes sense, but will update soon.
/assign
You can find an example in this proposal https://github.com/kubernetes-sigs/cluster-api/tree/master/docs/proposals on how we defined phases and each requirement
Thanks @vincepri
Add a new field in the MachineDeploymentStatus called Phase with the following values:

- `Provisioning` when `status.Replicas > status.ReadyReplicas`
- `Running` when `status.Replicas == status.ReadyReplicas`
- `Failed` or `Unknown`

Questions:

- Resizing would be covered by Provisioning; do we need to differentiate between these two cases?
- Should there be a Pending phase when the MachineDeployment is first created?
- Would ScalingUp and ScalingDown be better than Provisioning?
I don't feel strongly on having Pending vs not having it.
I would prefer not to have Pending. Doesn’t really add much value.
We can do scaling up and down by also checking UpdatedReplicas: if Replicas < UpdatedReplicas then ScalingUp, otherwise ScalingDown.
In addition to checking the ready replicas as well
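The rule proposed above can be sketched as a small helper. This is only an illustration of that comparison; the function name and the `Stable` case are assumptions, and a real implementation would also consult ReadyReplicas as noted:

```go
package main

import "fmt"

// scalingDirection sketches the rule proposed above: compare Replicas
// against UpdatedReplicas to decide which way the deployment is moving.
// Hypothetical helper; "Stable" for the equal case is an assumption.
func scalingDirection(replicas, updatedReplicas int32) string {
	switch {
	case replicas == updatedReplicas:
		return "Stable"
	case replicas < updatedReplicas:
		return "ScalingUp"
	default:
		return "ScalingDown"
	}
}

func main() {
	fmt.Println(scalingDirection(2, 3)) // ScalingUp
	fmt.Println(scalingDirection(3, 2)) // ScalingDown
}
```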
Some of this is already handled by kubectl and kubectl describe for resources that have a scale subresource.
I'm not sure we need dedicated "states" or "phases" to show scaling up or down. I think the bigger concern is around bubbling up any anomalies encountered when scaling up or down.
My fear is that we introduce a field that is intended to just be used for friendly display and not meant to be used by external tooling for accurate "state", but ends up being used that way anyway.
The `kubectl describe machinedeployment` output will show all the replica information:

```
...
Status:
  Observed Generation:    2
  Replicas:               2
  Unavailable Replicas:   2
  Updated Replicas:       2
```
But having a status field that says when the deployment is done scaling might be easier to automate against. Is there more context on why this would not be accurate?
@detiber we have status.phase for machines and clusters, and we have agreed these fields are for human consumption. People certainly could write automation against these fields (we can't stop that from happening), but we can augment our documentation on these fields to indicate they exist to provide a user-friendly visual status and nothing more. Couldn't we do the same thing here?
Should I go ahead with this? Or is it still under consideration?
Let's give @detiber a chance to reply to my last comment, then we'll see 😄
@ncdc I'm willing to give it a go.
Great, thanks Jason!
Thanks all, I will work on that and hopefully have something to show soon