We are currently trying to build a kops cluster with auto-scaling using the cluster autoscaler https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md
The autoscaler documentation recommends using a separate ASG per AZ for the scaling to work correctly (in-depth reasoning: https://github.com/kubernetes/contrib/pull/1552#r75532949)
While this can be achieved in kops by manually creating an instance group for each AZ, it seems not to be fully supported (e.g. rolling-update rolls all IGs in parallel instead of serially) and is quite manual.
Is one ASG per AZ a pattern kops should use / support by default, or is there a different / better way to achieve k8s-driven cluster auto-scaling on AWS?
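For reference, a per-AZ instance group in kops looks roughly like this (a sketch; the cluster name, machine type and zone are placeholders, not taken from this thread):

```yaml
# Hypothetical per-AZ instance group; one of these would exist for each AZ
# (e.g. nodes-eu-west-1b, nodes-eu-west-1c, each with its own subnet).
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: my-cluster.example.com
  name: nodes-eu-west-1a
spec:
  role: Node
  machineType: m4.large
  minSize: 1
  maxSize: 10
  subnets:
  - eu-west-1a   # pin the group to a single AZ
```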
You still need help on this?
Yes, I would like to know what the recommended approach is for cluster autoscaling: one instance group spanning all AZs, or one IG per AZ?
@johanneswuerbach actually a couple of things come into play here, not just autoscaling :)
I have read that the autoscaler works with multiple ASGs. @andrewsykim and @justinsb would know better though. @andrewsykim do you know of documentation that would help?
re: multiple ASGs https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler/cloudprovider/aws#multiple-asg-setup
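For context, the multiple-ASG setup in that doc boils down to passing one `--nodes` flag per ASG to the autoscaler. A sketch of the container args (the ASG names are hypothetical):

```yaml
# cluster-autoscaler args for one ASG per AZ (names are placeholders)
command:
  - ./cluster-autoscaler
  - --v=4
  - --cloud-provider=aws
  - --nodes=1:10:nodes-eu-west-1a.my-cluster.example.com
  - --nodes=1:10:nodes-eu-west-1b.my-cluster.example.com
  - --nodes=1:10:nodes-eu-west-1c.my-cluster.example.com
```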
I have also seen there is https://github.com/hjacobs/kube-aws-autoscaler and it mentions a few pain points with the official autoscaler. Is this potentially a better fit than the official autoscaler? Just getting started with kubernetes/kops and autoscaling that respects AZs is something we would need as well.
Never tried it so can't comment. What do you mean by "respects AZs"? Because the official autoscaler works with multiple AZs (one ASG per zone), it's just not too smart about it. You can try autoscaling a single ASG spanning multiple AZs as well, but then we're leaving the AZ choice for each instance up to AWS, and kubernetes will not schedule (as far as I'm aware) pods to nodes based on their AZ
> kubernetes will not schedule (as far as I'm aware) pods to nodes based on their AZ
As far as I know and what the docs say, k8s is AZ aware and tries to spread pods by default https://kubernetes.io/docs/admin/multiple-zones/#functionality
Cool, then I don't see why 1 multi-AZ ASG can't work with cluster autoscaler 🤔, I think when cluster-autoscaler on AWS was first implemented the scheduler was not AZ aware (or we just didn't know it was).
https://github.com/kubernetes/contrib/pull/1552#r75532949 has the best summary why 1 ASG with multiple AZs isn't ideal.
Yeah, that's a great summary. 1 ASG with multiple AZs is especially not ideal if the scheduler will ever block pods from running when the desired AZ is not available. But from experience I don't think it ever will (unless you specify some taint or a custom scheduler), so it seems like it won't be a problem anymore. I think there's only one way to confirm this 😛: have you already tried 1 ASG with multiple AZs?
We are not at that stage yet but will try it soon. I have noticed that since 0.6 of the autoscaler there is a new flag --balance-similar-node-groups that is meant for multi-AZ setups. See: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#im-running-cluster-with-nodes-in-multiple-zones-for-ha-purposes-is-that-supported-by-cluster-autoscaler But it sounds to me like it is talking about multiple node groups, i.e. ASGs if I understand that text correctly, so it may be of no use for a single ASG.
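For reference, enabling that option is just one extra argument on the autoscaler (a sketch; the flag name comes from the FAQ linked above, the ASG names are made up):

```yaml
command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  - --balance-similar-node-groups=true
  - --nodes=1:10:nodes-eu-west-1a.my-cluster.example.com   # hypothetical ASG
  - --nodes=1:10:nodes-eu-west-1b.my-cluster.example.com   # hypothetical ASG
```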
@chrislovecnm I am just reading your comment
If you are using PVC / EBS storage I highly recommend one kops ig per zone. This is to ensure that you have failover availability since EBS volumes do not span zones.
How does one kops IG per zone allow for failover when using EBS volumes? I have a StatefulSet that writes certain data to the file system using a PVC. Failover would be great to have, but I'm not sure how to achieve it. The other thing I was wondering: if the EBS volume gets created in one AZ and the pod gets rescheduled in another AZ, how does that work? Can the EBS volume just be attached to a node in the other AZ, or will the pod only be scheduled on that one node? Sorry for being a bit off topic.
Edit: Ah, found part of the answer here: https://kubernetes.io/docs/admin/multiple-zones/#volume-affinity. The pod will always be scheduled in the same zone as its volume, but I'm still not sure how having separate IGs per zone allows for failover.
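As I understand it, the mechanism behind that volume affinity is the zone label on the PV: dynamically provisioned EBS volumes get it automatically, and the scheduler only places the Pod on nodes carrying the same zone label. A sketch (all names and IDs here are made up):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-0123abcd              # placeholder name
  labels:
    # added by the provisioner; constrains Pod scheduling to this zone
    failure-domain.beta.kubernetes.io/zone: eu-west-1a
    failure-domain.beta.kubernetes.io/region: eu-west-1
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  awsElasticBlockStore:
    volumeID: vol-0123456789abcdef0   # placeholder volume ID
    fsType: ext4
```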
We have been using 3 separate ASGs (IGs) for 3 AZs successfully with the autoscaler for a while now. While I'm not 100% sure what @andrewsykim meant by his comment, having a separate ASG per AZ ensures that the autoscaler can scale the ASG in the correct AZ, so that pending Pod(s) can actually mount the PVs they claimed (EBS volumes are only attachable within the AZ they were created in).
With the balance-similar-node-groups option, instances are distributed okay-ish across all AZs, and with https://github.com/kubernetes/autoscaler/pull/198 (not yet released) AZ failures should also be handled in a controlled fashion.
To ensure failover of Pods with PVCs, each ASG should have a min node size of at least 2, as otherwise the claims are not mountable until a new instance in the required AZ has started.
So overall, separate ASGs per AZ is the setup I would recommend, as it ensures the autoscaler fully controls the instance distribution across AZs instead of AWS, which has no awareness of your cluster's needs and might actually cause harm, like https://github.com/kubernetes/kops/issues/3319
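Putting that together, each per-AZ IG would be pinned to a single subnet with a minimum size of 2 (a sketch; names and sizes are examples, not from this thread):

```yaml
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  name: nodes-eu-west-1a   # repeat per AZ: nodes-eu-west-1b, nodes-eu-west-1c
spec:
  role: Node
  minSize: 2               # >= 2 so a PVC-bound Pod can fail over in-zone
  maxSize: 10
  subnets:
  - eu-west-1a             # a single zone only
```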
Closing since we have a solution here.
Being in October 2018 and in 1.12, what is the general feeling around 1 ASG vs 1 ASG per AZ?
I have the feeling that now, we don't need this complexity of 1 ASG per AZ.
But maybe there are still arguments for it?
Thanks for your insights. (And sorry to dig such an old post)
1 ASG per AZ is still the way to go, because otherwise there is no way for the autoscaler to spawn an instance in a specific AZ.
Generally you need a specific AZ once you start using EBS-backed PersistentVolumes, as they can't move between AZs, so the AZ of the instance is important.