Kops: rolling-update very slow

Created on 23 Oct 2018 · 12 comments · Source: kubernetes/kops

As I understand it the current behaviour for rolling-update is:

    for node in stale_nodes:
        drain(node)
        validate_stable()
        delete(node)

With the ASG taking care of spawning the new nodes.

This is very slow, particularly when pod disruption budgets are not being violated. (For example, imagine we are configured so that the smallest scale-down of any workload still has capacity for 2 disruptions without issues, and scales up with load: then kops could safely introduce 1 disruption at any time, and potentially many more whenever the disruption budgets haven't been exhausted.)
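
To make that concrete, the headroom is already exposed on each PodDisruptionBudget as status.disruptionsAllowed. A rough sketch (not kops code; just the official Python client, assuming a cluster recent enough to serve policy/v1) that prints how much disruption is currently tolerated:

    from kubernetes import client, config

    config.load_kube_config()
    policy = client.PolicyV1Api()

    # Each PDB reports how many of its pods may be evicted right now.
    for pdb in policy.list_pod_disruption_budget_for_all_namespaces().items:
        allowed = pdb.status.disruptions_allowed if pdb.status else 0
        print(f"{pdb.metadata.namespace}/{pdb.metadata.name}: "
              f"{allowed} disruption(s) currently allowed")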

Now, you might say 'slow doesn't matter', but ops folk have to pay attention during this process, and when it exceeds an attention window - say an hour - that becomes a human-factors problem.

I'm not sure what the right behaviour should be, but something like the following would be pretty much ideal for many of our use cases:

    asg.count = 2 * len(stale_nodes)
    wait for *a* node to be up
    map(stale_nodes, cordon)
    while True:
        if drain_nonblocking(stale_nodes):
            break
    map(stale_nodes, delete)

where drain_nonblocking is something like this:

    done = True
    for node in stale_nodes:
        for pod in pods_on(node):
            if ((standalone(pod) or
                 statefulset_highest_unmoved(pod) or
                 daemonset_or_job(pod) or
                 deployment_above_disruption_budget(pod)) and
                pod_can_be_rescheduled(pod)):
                delete(pod)
            else:
                done = False
    return done

The idea is to induce as much disruption as the cluster is defined to tolerate, as rapidly as possible.
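
One way to get drain_nonblocking's PDB awareness for free is the Eviction API, which is enforced server-side: an eviction that would violate a budget is rejected with HTTP 429, so the loop simply retries those pods on the next pass. A rough, hypothetical sketch with the Python client (drain_nonblocking and stale_nodes mirror the pseudocode above; assumes a client new enough to ship V1Eviction):

    from kubernetes import client, config
    from kubernetes.client.rest import ApiException

    config.load_kube_config()
    core = client.CoreV1Api()

    def drain_nonblocking(stale_nodes):
        """One eviction pass over the stale nodes; True once nothing was refused."""
        done = True
        for node in stale_nodes:
            pods = core.list_pod_for_all_namespaces(
                field_selector=f"spec.nodeName={node}").items
            for pod in pods:
                # DaemonSet pods would just be recreated on the same node; skip them.
                owners = pod.metadata.owner_references or []
                if any(o.kind == "DaemonSet" for o in owners):
                    continue
                eviction = client.V1Eviction(metadata=client.V1ObjectMeta(
                    name=pod.metadata.name, namespace=pod.metadata.namespace))
                try:
                    core.create_namespaced_pod_eviction(
                        name=pod.metadata.name,
                        namespace=pod.metadata.namespace,
                        body=eviction)
                except ApiException as e:
                    if e.status == 429:  # refused right now: a PDB has no headroom left
                        done = False
                    else:
                        raise
        return done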

An obvious extension to this approach is a canary-style, incrementally more aggressive rollout: first one node, then two, then four, eight, sixteen, and so on, until the entire cluster is being done.
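
The batch sizes for that ramp-up are just powers of two capped by what's left; a tiny illustrative helper (names are made up):

    def rollout_batches(total_stale):
        """Yield batch sizes 1, 2, 4, 8, ... until every stale node is covered."""
        remaining, size = total_stale, 1
        while remaining > 0:
            batch = min(size, remaining)
            yield batch
            remaining -= batch
            size *= 2

    # e.g. list(rollout_batches(20)) == [1, 2, 4, 8, 5]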

kops 1.9.0 :).

Let me know if this would be considered for merging, we might see about putting something together.

lifecycle/rotten


All 12 comments

For me, an ideal solution would be for kops to understand what my critical services are, and how many instances/pods of them I can tolerate being taken out of service. As long as those critical services remain available with a minimum number of instances, kops can recycle as many nodes as possible. PDBs can be used for this, though I think it goes deeper than just plain pod disruption budgets: kops could check the fleet of nodes beforehand, understand which ones can be recycled immediately, and then efficiently recycle the remaining ones without violating a PDB. As we all know, kops can hang on recycling a node that hosts pods whose removal would violate a PDB. Making that decision intelligently beforehand can avoid this.
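
A rough sketch of what that pre-check could look like (purely illustrative, simplified to matchLabels selectors, using the official Python client rather than anything kops ships today):

    from kubernetes import client, config

    config.load_kube_config()
    core, policy = client.CoreV1Api(), client.PolicyV1Api()

    def safe_to_recycle_now(node_name):
        """True if no pod on this node is covered by a PDB with zero disruptions left."""
        blocked = [pdb for pdb in policy.list_pod_disruption_budget_for_all_namespaces().items
                   if pdb.status and pdb.status.disruptions_allowed == 0]
        pods = core.list_pod_for_all_namespaces(
            field_selector=f"spec.nodeName={node_name}").items
        for pod in pods:
            labels = pod.metadata.labels or {}
            for pdb in blocked:
                if pdb.metadata.namespace != pod.metadata.namespace:
                    continue
                selector = (pdb.spec.selector.match_labels or {}) if pdb.spec.selector else {}
                if selector and all(labels.get(k) == v for k, v in selector.items()):
                    return False
        return True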

In the meantime, we usually run more than one upgrade command against a cluster, picking different instance groups for each while the rolling deployment runs. That helps somewhat.

Sounds like we're saying much the same thing; the key question for me is how much room for experimentation kops will permit - no point putting experimental code forward if it's not broadly interesting to the maintainers :)

@rbtcollins I totally agree with the point that rolling-update is super slow.

I am not sure you have the same thing in mind, but another possibility is to temporarily double the number of nodes in the auto-scaling group and then... put it back to normal.
The auto-scaling group removes the oldest nodes by default, as far as I know.
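
For reference, the resize step itself is a single call against the ASG; a hedged boto3 sketch (the group name here is a placeholder for the instance group's ASG, and MaxSize may also need raising first via update_auto_scaling_group):

    import boto3

    asg = boto3.client("autoscaling")
    group = "nodes.example-cluster.example.com"  # placeholder ASG name

    current = asg.describe_auto_scaling_groups(
        AutoScalingGroupNames=[group])["AutoScalingGroups"][0]["DesiredCapacity"]

    # Temporarily double the group; later, set DesiredCapacity back to `current`.
    asg.set_desired_capacity(AutoScalingGroupName=group,
                             DesiredCapacity=current * 2,
                             HonorCooldown=False)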

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

/remove-lifecycle stale

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

/remove-lifecycle stale

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

