Out of the box, kops creates an ASG on AWS for the nodes instance group, with a min and max number of instances. But it doesn't add any Scaling Policies, so the ASG will never grow above the min unless someone changes it manually.
Does kops have the ability to define Scaling Policies? I couldn't find any documentation on defining these, so perhaps we just need documentation. But if it's not currently supported, then this is a feature request.
Scaling is managed by the cluster-autoscaler addon instead of ASG policies: https://github.com/kubernetes/kops/tree/master/addons/cluster-autoscaler
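For anyone landing here, the addon essentially comes down to pointing cluster-autoscaler at the ASG via its `--nodes` flag. A rough sketch of the relevant container spec follows; the image tag, cluster name, and the 2:10 bounds are all illustrative placeholders, not values from this thread:

```yaml
# Illustrative fragment of the cluster-autoscaler Deployment on AWS.
# "nodes.k8s.example.com" and the 2:10 bounds are placeholders.
containers:
  - name: cluster-autoscaler
    image: k8s.gcr.io/cluster-autoscaler:v1.2.2
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws
      # Format is MIN:MAX:ASG-NAME, where ASG-NAME is the group kops created
      # for the instance group.
      - --nodes=2:10:nodes.k8s.example.com
```

Note that these min/max bounds live in the addon manifest, separately from the kops instance group config, which is the duplication discussed below.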
Thanks for the info. I was unaware of the cluster-autoscaler add-on. After looking at it, it appears that you must manually set several cluster specific parameters, which are already maintained in the kops config on S3. It doesn't seem logical that we should be maintaining a min and max size of the ASG and the cluster-autoscaler add-on separately. As such, I think that kops should install and manage the configuration of this add-on by default.
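For context, the min and max that kops already tracks live on the InstanceGroup object in the state store. Something like the following is what `kops get ig nodes -o yaml` returns (values here are illustrative):

```yaml
# Illustrative InstanceGroup spec as stored by kops; values are placeholders.
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  name: nodes
spec:
  role: Node
  machineType: t2.medium
  minSize: 2
  maxSize: 10
```

The overlap with the addon's `--nodes=MIN:MAX:ASG` flag is exactly the duplicated configuration being objected to here.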
@ese is right - we don't install the AWS autoscaling policies because the autoscaler is the more "correct" solution. The reason is that the kubernetes scheduler avoids overcommitting nodes, so you might never see excess CPU utilization even if your cluster had hundreds of unschedulable pods.
I agree that kops should integrate the autoscaler. I think we should first make a few changes in the autoscaler so that it picks up our ASGs automatically. I'm hoping it can be a simple addon at that point, and then I don't particularly mind whether we install it automatically or not. My inclination up until now has been not to, unless it is critical to bring-up. But I also want to make that distinction go away by making it easy to add addons to a kops installation (most of the machinery is there to do that already, to support things like the weave addon).
(I changed the title to incorporate the cluster-autoscaler option - hope that is OK!)
Sure, changing the title makes total sense. And it sounds like the plan is to build an easy way to include certain important addons, so that would certainly satisfy my request.
In terms of whether kops should deploy the cluster-autoscaler addon by default, I think this comes down to style. The main README.md states "kops lets you deploy production-grade, highly available, Kubernetes clusters from the command line." Personally, I would hope that it does this by default, out of the box, without the need to flip levers to make it happen. But I can also see why others might want just a bare-minimum cluster created. Addons such as Heapster and the Dashboard aren't required, so I can see why they would be off by default.
If it's not included, we should make sure that the documentation is crystal clear that this needs to be added in all production deployments to enable autoscaling.
Is there any update on this issue? I'm on k8s 1.6.0.
What was the final word on this? Is the add-on getting installed by default with kops? It seems not. I would put in my vote to have it installed by default, with a create-cluster parameter available to set max and min nodes.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
This particular feature would be nice so that we're able to scale down our cluster outside of business hours in our test environments. I don't think cluster-autoscaler supports this.
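For that scheduled scale-down use case, plain AWS scheduled actions on the ASG work independently of cluster-autoscaler. A sketch of the idea follows; the ASG name, action names, sizes, and cron schedules are all placeholders:

```shell
# Sketch: scale the nodes ASG to zero on weekday evenings and back up each
# morning using AWS scheduled actions (all names and times are placeholders).
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name nodes.k8s.example.com \
  --scheduled-action-name scale-down-evenings \
  --recurrence "0 19 * * MON-FRI" \
  --min-size 0 --max-size 0 --desired-capacity 0

aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name nodes.k8s.example.com \
  --scheduled-action-name scale-up-mornings \
  --recurrence "0 7 * * MON-FRI" \
  --min-size 2 --max-size 10 --desired-capacity 2
```

One caveat: if cluster-autoscaler is also running against the same ASG, its configured min/max bounds would need to permit the scaled-down size, or the two mechanisms will fight each other.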
Seems a bit misleading to provide the option of maxSize and minSize without actually doing any scaling.
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.
In response to this:
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.