/kind feature
Describe the solution you'd like
When I create a MachineDeployment, it is hard to see its state: whether it is still resizing, has reached the expected replica count, or has run into an error while reconciling. It would be nice to associate the MachineDeployment with some sort of state such as "RESIZING", "READY", "ERROR", etc. Also, it is currently not easy to get an error reason explaining why reconciling failed. Can we add an errorReason field to indicate why scaling failed?
Anything else you would like to add:
It would be similar to the request to add a status for Cluster: https://github.com/kubernetes-sigs/cluster-api/issues/820
A MachineDeployment is ready if spec.replicas == status.updatedReplicas == status.availableReplicas - does that sound accurate?
It's resizing if spec.replicas != status.updatedReplicas or spec.replicas != status.availableReplicas?
Can you use the two calculations above to determine ready & resizing?
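The two checks above can be sketched in Go. This is a minimal illustration of the proposed arithmetic, not the actual cluster-api types (`replicaCounts` is a hypothetical trimmed-down struct):

```go
package main

import "fmt"

// replicaCounts is a hypothetical trimmed-down view of the counters
// discussed above; the real MachineDeploymentStatus carries more fields.
type replicaCounts struct {
	SpecReplicas      int32
	UpdatedReplicas   int32
	AvailableReplicas int32
}

// ready: spec.replicas == status.updatedReplicas == status.availableReplicas
func ready(c replicaCounts) bool {
	return c.SpecReplicas == c.UpdatedReplicas && c.SpecReplicas == c.AvailableReplicas
}

// resizing: spec.replicas differs from updatedReplicas or availableReplicas
func resizing(c replicaCounts) bool {
	return !ready(c)
}

func main() {
	done := replicaCounts{SpecReplicas: 3, UpdatedReplicas: 3, AvailableReplicas: 3}
	mid := replicaCounts{SpecReplicas: 3, UpdatedReplicas: 3, AvailableReplicas: 2}
	fmt.Println(ready(done), resizing(done)) // true false
	fmt.Println(ready(mid), resizing(mid))   // false true
}
```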
It's in an error state if something is going wrong during reconciliation.
I think distinguishing between resizing-all-is-well and resizing-but-encountering-persistent-errors is the hard part. I'm not sure the best way to 1) detect this, and 2) store & display this information. Does anyone have any suggestions?
/priority important-soon
Note to self: check out how k/k Deployments handle resize failures & status
/cc @rudoi – you've spent some time staring at MachineDeployments. Want to join us?
We haven't done any work on this and if it's still needed, we can consider it for the next minor release.
/unassign
/milestone Next
@ncdc would it make sense to do something similar to the Cluster phase status field, where the machine_deployment controller updates that field based on the replica numbers as you explained above and then shows it as additional columns in the kubectl get output?
@nader-ziada we could consider doing something like that. Would you be willing to write up the details in a comment here?
I think the approach could be the following:
- Add a `phase` field to `MachineDeploymentStatus`
- The machinedeployment phase reconcile method would watch the MachineSet in each cycle and update accordingly; maybe get the status calculated here: https://github.com/kubernetes-sigs/cluster-api/blob/a6b5ba419bad1af4ec9f294720d5c61eada297be/controllers/machineset_status.go#L39
- Would need to investigate further to get more details

I could work on putting together a PR if the general approach makes sense. Thanks
This seems reasonable. Before you do a PR, would you be able to flesh out the details around how you calculate the phase, more specifically?
Yeah, sure. I'll update with a more detailed comment explaining how to calculate the phase. I'll do a quick spike to make sure what I'm saying makes sense, but will update soon.
/assign
You can find an example in this proposal https://github.com/kubernetes-sigs/cluster-api/tree/master/docs/proposals on how we defined phases and each requirement
Thanks @vincepri
Add a new field in the MachineDeploymentStatus called Phase with the following values:

- `Provisioning` when `status.Replicas > status.ReadyReplicas`
- `Running` when `status.Replicas == status.ReadyReplicas`
- `Failed` or `Unknown`

Questions:

- Resizing would be covered by Provisioning; do we need to differentiate between these two cases?
- Should there be a Pending phase when the MachineDeployment is first created?
- Would ScalingUp and ScalingDown be better than Provisioning?
I don't feel strongly on having Pending vs not having it.
I would prefer not to have Pending. Doesn’t really add much value.
We can do scaling up and down by also checking UpdatedReplicas: if Replicas < UpdatedReplicas then ScalingUp, otherwise ScalingDown.
In addition to checking the ready replicas as well
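The rule proposed above can be sketched as a small helper. This is only an illustration of that comparison; the function name and the `Stable` case are assumptions, and a real implementation would also consult ReadyReplicas as noted:

```go
package main

import "fmt"

// scalingDirection sketches the rule proposed above: compare Replicas
// against UpdatedReplicas to decide which way the deployment is moving.
// Hypothetical helper; "Stable" for the equal case is an assumption.
func scalingDirection(replicas, updatedReplicas int32) string {
	switch {
	case replicas == updatedReplicas:
		return "Stable"
	case replicas < updatedReplicas:
		return "ScalingUp"
	default:
		return "ScalingDown"
	}
}

func main() {
	fmt.Println(scalingDirection(2, 3)) // ScalingUp
	fmt.Println(scalingDirection(3, 2)) // ScalingDown
}
```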
Some of this is already handled by kubectl and kubectl describe for resources that have a scale subresource.
I'm not sure we need dedicated "states" or "phases" to show scaling up or down. I think the bigger concern is around bubbling up any anomalies encountered when scaling up or down.
My fear is that we introduce a field that is intended to just be used for friendly display and not meant to be used by external tooling for accurate "state", but ends up being used that way anyway.
The `kubectl describe machinedeployment` output will show all the replica information:

```
...
Status:
  Observed Generation:    2
  Replicas:               2
  Unavailable Replicas:   2
  Updated Replicas:       2
```
But having a status field that says when the deployment is done scaling might be easier to automate against. Is there more context on why this would not be accurate?
@detiber we have status.phase for machines and clusters, and we have agreed these fields are for human consumption. People certainly could write automation against these fields (we can't stop that from happening), but we can augment our documentation on these fields to indicate they exist to provide a user-friendly visual status and nothing more. Couldn't we do the same thing here?
Should I go ahead with this? Or is it still under consideration?
Let's give @detiber a chance to reply to my last comment, then we'll see 😄
@ncdc I'm willing to give it a go.
Great, thanks Jason!
Thanks all, I will work on that and hopefully have something to show soon