Autoscaler: scale-down-delay-after-delete parameter doesn't work properly

Created on 30 Sep 2020 · 12 Comments · Source: kubernetes/autoscaler

I believe this started around the time of the switch to the new scale-down processor.

The parameter goes into effect and puts scaledown in cooldown mode based on the lastScaleDownDeleteTime.

https://github.com/kubernetes/autoscaler/blob/774390efc80625466794131418b95092a5e63741/cluster-autoscaler/core/static_autoscaler.go#L499

However, lastScaleDownDeleteTime is only updated when the scale-down status result is ScaleDownNodeDeleted:

https://github.com/kubernetes/autoscaler/blob/774390efc80625466794131418b95092a5e63741/cluster-autoscaler/core/static_autoscaler.go#L532-L535

And it seems that with the switch to async deletions, it is no longer set:

https://github.com/kubernetes/autoscaler/blob/30719059bf89199a983fde72c3e061561d280672/cluster-autoscaler/core/scale_down.go#L943-L964
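To make the failure mode concrete, here is a minimal Go sketch of the behaviour described above. It is not the autoscaler's code: the `autoscaler` struct, `inCooldown`, `recordResult`, and `cooldownAfter` are hypothetical stand-ins, and only the names `lastScaleDownDeleteTime`, `ScaleDownNodeDeleted`, and `ScaleDownNodeDeleteStarted` come from this issue.

```go
package main

import (
	"fmt"
	"time"
)

// ScaleDownResult is a simplified stand-in for the scale-down status result.
type ScaleDownResult int

const (
	ScaleDownNodeDeleted ScaleDownResult = iota + 1
	ScaleDownNodeDeleteStarted
)

// autoscaler is a hypothetical, stripped-down model of the loop state.
type autoscaler struct {
	lastScaleDownDeleteTime   time.Time
	scaleDownDelayAfterDelete time.Duration
}

// inCooldown reports whether scale-down should be skipped at time now.
func (a *autoscaler) inCooldown(now time.Time) bool {
	return a.lastScaleDownDeleteTime.Add(a.scaleDownDelayAfterDelete).After(now)
}

// recordResult updates the timestamp only for ScaleDownNodeDeleted,
// which is the behaviour this issue reports.
func (a *autoscaler) recordResult(result ScaleDownResult, now time.Time) {
	if result == ScaleDownNodeDeleted {
		a.lastScaleDownDeleteTime = now
	}
}

// cooldownAfter records result and checks the cooldown one minute later,
// with an assumed delay of ten minutes.
func cooldownAfter(result ScaleDownResult) bool {
	start := time.Date(2020, 9, 30, 12, 0, 0, 0, time.UTC)
	a := &autoscaler{scaleDownDelayAfterDelete: 10 * time.Minute}
	a.recordResult(result, start)
	return a.inCooldown(start.Add(time.Minute))
}

func main() {
	fmt.Println(cooldownAfter(ScaleDownNodeDeleted))       // cooldown applies
	fmt.Println(cooldownAfter(ScaleDownNodeDeleteStarted)) // cooldown is skipped
}
```

In this model, a deletion that only reports ScaleDownNodeDeleteStarted never moves the timestamp, so the delay never kicks in.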

/kind bug

marwanad

All 12 comments

@MaciekPytel @towca

marwanad on 30 Sep 2020

Should we be returning ScaleDownNodeDeleted instead of ScaleDownNodeDeleteStarted? The latter doesn't seem to be used anywhere, so that looks like the most straightforward fix.

marwanad on 30 Sep 2020

Thanks for pointing this out, Marwan! I'd stick to using ScaleDownNodeDeleteStarted, though, since it better conveys what actually happens. I'll send out a fix shortly.
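One way to read that suggestion, sketched in Go: keep returning ScaleDownNodeDeleteStarted and have the caller treat it as a result that resets the delay timer. This is an illustration under assumptions, not necessarily how the eventual fix was implemented; `startsCooldown` and `nextDeleteTime` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"time"
)

// ScaleDownResult is a simplified stand-in for the scale-down status result.
type ScaleDownResult int

const (
	ScaleDownNodeDeleted ScaleDownResult = iota + 1
	ScaleDownNodeDeleteStarted
)

// startsCooldown decides which results should reset the
// scale-down-delay-after-delete timer (hypothetical helper).
func startsCooldown(result ScaleDownResult) bool {
	switch result {
	case ScaleDownNodeDeleted, ScaleDownNodeDeleteStarted:
		return true
	default:
		return false
	}
}

// nextDeleteTime returns the value lastScaleDownDeleteTime should take
// after a loop iteration that produced result at time now.
func nextDeleteTime(prev time.Time, result ScaleDownResult, now time.Time) time.Time {
	if startsCooldown(result) {
		return now
	}
	return prev
}

func main() {
	now := time.Date(2020, 10, 1, 9, 0, 0, 0, time.UTC)
	var last time.Time
	last = nextDeleteTime(last, ScaleDownNodeDeleteStarted, now)
	fmt.Println(last.Equal(now)) // the timer is reset when an async deletion starts
}
```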

towca on 1 Oct 2020

👍1

The bug should be fixed by #3570.

towca on 1 Oct 2020

🎉2

Thanks @towca and @marwanad for finding and fixing this issue! Any idea when (or if) this will make it into a patch release for versions 1.16.x/1.17.x?

ryaneorth on 8 Oct 2020

@MaciekPytel - maybe you know the answer to the above question?

ryaneorth on 9 Oct 2020

@ryaneorth we can prepare the cherry-pick PRs. I'm guessing we'll have one set of patch releases before K8s 1.20 and another around 1.20.

marwanad on 9 Oct 2020

Sounds great, thanks @marwanad. I'm happy to perform the cherry-picks if you'd like. Let me know!

ryaneorth on 9 Oct 2020

@ryaneorth that would be great, thanks!

marwanad on 9 Oct 2020

Done! See #3597, #3598, #3599, and #3600.

ryaneorth on 9 Oct 2020

All of the above cherry-picks are complete. @MaciekPytel - do you have any information as to when the next patches will be released?

ryaneorth on 19 Oct 2020

@ryaneorth keep an 👀 on https://github.com/kubernetes/autoscaler/issues/3611

marwanad on 19 Oct 2020

👀1 👍1
