Managed Kubernetes worker nodes will allow you to provision, scale, and update groups of EC2 worker nodes through EKS.
This feature fulfills https://github.com/aws/containers-roadmap/issues/57
As a quick note, can we make sure that this will interact/play nicely with cluster-autoscaler? If we can get managed, autoscaling worker nodes, this would be amazing.
Along with draining nodes in an upgrade situation.
Shouldn't this one be solved as part of implementing Fargate for EKS? https://github.com/aws/containers-roadmap/issues/32
You could use Virtual Kubelet
Fargate for EKS is a different thing @dnobre, with Fargate there are no worker nodes to manage. So this issue about managing worker nodes is not relevant to Fargate.
@tabern being able to support cluster-autoscaler is important to people. At the moment it expects to manipulate ASGs through that API. But if you add an EKS API, then please contribute a patch or fork to cluster-autoscaler so it can use the new API.
If you provide your own autoscaling instead, it has to be aware of the cluster workload and all ASGs; you'd need a k8s service or daemonset to provide custom metrics to the ASG, and, when there are many ASGs, some way to choose which one to scale up/down next, as cluster-autoscaler does.
cluster-autoscaler has to use some tricks to scale to/from zero nodes in an ASG, because it doesn't know what a node would look like when there are none. An improvement the EKS API could provide would be to expose what a node would be (instance type, AZ, node labels, node taints, tags) when the node group is scaled to zero (a sketch of the current tag-based workaround follows below).
cluster-autoscaler also has trouble when scaling up multi-AZ ASGs, because it can't specify which AZ the new node will be in (e.g. when the un-scheduled workload is AZ-specific). The ability to specify an AZ when scaling up an EKS node group would be great.
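For anyone hitting the scale-from-zero limitation today, cluster-autoscaler can already read node-template hints from ASG tags; a rough CloudFormation sketch (the launch configuration, subnet parameter, cluster name, label, and taint below are illustrative placeholders):

```yaml
NodeGroupASG:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    MinSize: "0"
    MaxSize: "10"
    VPCZoneIdentifier: !Ref NodeSubnets              # hypothetical List<AWS::EC2::Subnet::Id> parameter
    LaunchConfigurationName: !Ref NodeLaunchConfig   # hypothetical launch configuration
    Tags:
      # Opt this group into auto-discovery by cluster-autoscaler
      - Key: k8s.io/cluster-autoscaler/enabled
        Value: "true"
        PropagateAtLaunch: true
      - Key: k8s.io/cluster-autoscaler/my-cluster
        Value: owned
        PropagateAtLaunch: true
      # Tell cluster-autoscaler what a node would look like while the group sits at zero
      - Key: k8s.io/cluster-autoscaler/node-template/label/workload
        Value: batch
        PropagateAtLaunch: true
      - Key: k8s.io/cluster-autoscaler/node-template/taint/dedicated
        Value: batch:NoSchedule
        PropagateAtLaunch: true
```

This only covers labels and taints, though; having an EKS node group API expose instance type, AZ, and tags natively, as described above, would be much cleaner.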
@whereisaaron interesting, that's exactly what I would consider Fargate for EKS to be. Since it was never released two years ago, its implementation is pretty hypothetical, but theoretically I would expect an endpoint for your kubeconfig and the ability to deploy via kubectl apply or helm install.
I wouldn't expect things like "task" definitions because that's what Fargate for ECS already is.
How would "managed worker nodes" be any different?
Granted, the fact that "Fargate for EKS" was never released means we are all just spitballing here.
With Fargate, whether ECS Fargate or EKS Fargate, there are no worker nodes. That's why you use a Fargate solution: so you do not have to manage worker nodes. So this issue has no overlap with a Fargate product.
@cdenneen not sure I understand, but what you describe sounds correct, just like EKS endpoint, except no (real) worker nodes, just a virtual-kubelet running as a sidecar to the Pod. The virtual-kubelet and Pod run who-knows-where, because the instances they run on are not our problem with a Fargate solution.
Any updates on this issue?
@groodt coming soon.... we'll be sure to update when there are updates to share!
Will this feature add the capability to create worker node groups from the AWS console (UI)?
@ejlp12 yes.
@tabern will there be an option to add a userdata script or otherwise modify the instances?
I am curious about logging aggregation as well for managed workers. Any details on how we can aggregate logs as part of this feature?
@lilley2412 not at launch, but we plan to add this in the future.
@pfremm yes. You'll be able to use EC2 Autoscaling for reporting group-level metrics. Since managed nodes are standard EC2 instances that run in your account, you will be able to implement any log forwarding/aggregation tooling that you are using today, such as FluentBit/S3 and Fluentd/CloudWatch.
@tabern will this support windows worker nodes?
Who manages security patches or addresses CVEs on these managed worker nodes? Will this still fall under the "Security in the Cloud" customer responsibility?
Released GA 11/18 👍
Can we have a link to the docs?
Hi! The documentation is deploying now. It should be available shortly, and I'll update with a link here when it is.
We're excited to announce that Amazon EKS Managed Node Groups are now generally available!
With Amazon EKS managed node groups you don’t need to separately provision or connect the EC2 instances that provide compute capacity to run your Kubernetes applications. You can create, update, or terminate nodes for your cluster with a single command. Nodes run using the latest EKS-optimized AMIs in your AWS account while node updates and terminations gracefully drain nodes to ensure your applications stay available.
Today, EKS managed node groups are available for new Amazon EKS clusters running Kubernetes version 1.14 with platform version eks.3. You can also update clusters (1.13 or lower) to version 1.14 to take advantage of this feature. Support for existing version 1.14 clusters is coming soon.
Learn more
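For those using eksctl, a minimal config sketch that creates a cluster with a managed node group looks roughly like this (cluster name, region, and sizes are illustrative; assumes an eksctl release with managed node group support):

```yaml
# cluster.yaml -- apply with `eksctl create cluster -f cluster.yaml`
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster        # illustrative
  region: us-west-2       # illustrative
managedNodeGroups:
  - name: managed-ng-1
    instanceType: m5.large
    minSize: 2
    maxSize: 4
    desiredCapacity: 2
    labels:
      role: workers
```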
@tabern congrats on the release!
Is CF support in a future release or is doco just pending updates?
Ready to use this but can't use without CF support :/
@tabern and the entire EKS team, thanks for working hard on this; it's a very good step in the right direction. I am assuming I can still run my user data (bootstrap). Also, as @robgott mentioned, CF needs to support it. And since someone mentioned cluster-autoscaler, I'm assuming that should continue to work. We have the problem of "how do I keep a node hot to provision additional pods" and were thinking of using https://github.com/helm/charts/tree/master/stable/cluster-overprovisioner.
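For anyone curious, the overprovisioner boils down to low-priority placeholder pods that real workloads preempt, which in turn makes cluster-autoscaler add a node; a rough sketch of that pattern (names and resource requests are illustrative):

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -1                     # lower than the default (0), so these pods are preempted first
globalDefault: false
description: "Placeholder pods that keep spare capacity warm."
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
  namespace: kube-system
spec:
  replicas: 2                 # roughly how much headroom to keep
  selector:
    matchLabels:
      run: overprovisioning
  template:
    metadata:
      labels:
        run: overprovisioning
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: pause
          image: k8s.gcr.io/pause:3.1
          resources:
            requests:
              cpu: "1"        # sized to reserve the headroom you want per replica
              memory: 500Mi
```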
I'm not seeing anything in the docs regarding user data. Is this available right now with the managed worker nodes?
Does something need to be done to enable this on existing clusters?
Latest EKS 1.14

Also, as @nxf5025 mentions, doesn't look like any ability to pass in userdata or kubelet flags?
Also, will there be support for spot instances?
Thanks all! We're pretty excited to introduce this new feature.
@robgott @pc-rshetty CloudFormation support for managed node groups is there today, it's just that the documentation is taking a bit longer to publish than we had originally expected.
Specifically, EKS managed node groups introduce a new resource type "AWS::EKS::Nodegroup" and an update to the existing resource type "AWS::EKS::Cluster" to add ClusterSecurityGroupId in CloudFormation. The documentation updates for these changes will be published by 11/21.
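For example, a managed node group can be declared in a template roughly like this (the role, subnets, and sizes below are illustrative placeholders):

```yaml
Resources:
  ManagedNodeGroup:
    Type: AWS::EKS::Nodegroup
    Properties:
      ClusterName: !Ref EksCluster            # hypothetical AWS::EKS::Cluster resource
      NodeRole: !GetAtt NodeInstanceRole.Arn  # hypothetical IAM role with the worker node policies
      Subnets:
        - !Ref SubnetA                        # hypothetical subnets
        - !Ref SubnetB
      InstanceTypes:
        - m5.large
      AmiType: AL2_x86_64
      ScalingConfig:
        MinSize: 2
        DesiredSize: 2
        MaxSize: 4
```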
@pc-rshetty Cluster Autoscaler should continue to work just like it does today. The biggest change from our end is that we tag every node for auto discovery by cluster autoscaler. Overprovisioner should work. Seems like a helm chart that basically implements the method described here?
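For reference, the hookup on the cluster-autoscaler side is the usual tag-based auto-discovery flag; the relevant fragment of its Deployment spec looks roughly like this (cluster name and image tag are illustrative):

```yaml
containers:
  - name: cluster-autoscaler
    image: k8s.gcr.io/cluster-autoscaler:v1.14.7   # tag illustrative; match your cluster's minor version
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws
      - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
```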
@nxf5025 @MarcusNoble today you cannot pass this to managed node groups. However! we're planning to add this in the future as part of support for EC2 Launch Templates https://github.com/aws/containers-roadmap/issues/585
Yes, we also will be working on spot support - tracking in https://github.com/aws/containers-roadmap/issues/583
The other feature we're currently tracking on the roadmap is Windows Support (https://github.com/aws/containers-roadmap/issues/584) but feel free to add more if there are important features you think we should be looking at.
Are managed Ubuntu node groups also being worked on or should that be added to the roadmap? That was mentioned in the blog post when comparing EKS API with eksctl, it's a feature we need.
In addition to spot instances, being able to utilise mixed instances policy, as per https://github.com/kubernetes/autoscaler/pull/1886, i.e. t3.large and t3a.large or m5.large and m5d.large etc. This is to increase the probability of a successful instance fulfilment. We are currently using this functionality to good effect and would need to have the same ability with managed worker nodes, along with the ability to specify userdata.
In the UI, this could simply be represented by being able to select multiple instance types and, preferably, to sort them in order of preference, which is how launch template mixed instances policy and overrides currently work.
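In CloudFormation terms, the self-managed equivalent we use today looks roughly like this (the launch template and subnet references are hypothetical placeholders):

```yaml
MixedInstanceNodeASG:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    MinSize: "1"
    MaxSize: "10"
    VPCZoneIdentifier: !Ref NodeSubnets              # hypothetical subnet list parameter
    MixedInstancesPolicy:
      LaunchTemplate:
        LaunchTemplateSpecification:
          LaunchTemplateId: !Ref NodeLaunchTemplate  # hypothetical launch template
          Version: !GetAtt NodeLaunchTemplate.LatestVersionNumber
        Overrides:                                   # candidate instance types
          - InstanceType: t3.large
          - InstanceType: t3a.large
      InstancesDistribution:
        OnDemandBaseCapacity: 0
        OnDemandPercentageAboveBaseCapacity: 0       # 100% Spot above the base
        SpotAllocationStrategy: lowest-price
```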
Just one question: can this feature utilise spot instances? I could not find it in the documentation.
@omerfsen we're tracking spot support in https://github.com/aws/containers-roadmap/issues/583
@drewhemm we're considering mixed instance groups to be part of spot support; agreed that without them spot will be difficult to use properly.
@JanneEN that's a good call out, we'd love to make this happen. Thanks for adding as https://github.com/aws/containers-roadmap/issues/588
CloudFormation documentation for EKS Managed Node Groups is now published - https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-eks-nodegroup.html
@tabern Doesn't look like docs have been updated still. That link redirects me to: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/Welcome.html
Interesting. The link worked when I clicked it 12 hours ago...
It's working for me now! :+1:
@tabern How are rolling updates supposed to work? Draining nodes actually works, but apparently leads to downtime.
Let's say I have an existing node group and want to rotate the nodes. To do this (manually), I would replace the node group by creating a new one, waiting for it to become available, and then deleting the old one afterwards. When doing this, I can see that the nodes get drained before the instances get terminated. However, the running pods are more or less terminated simultaneously, which leads to downtime.
In terraform, the mechanism is basically the same, leading to the same result.
Am I doing something wrong?
Edit:
I can also see this behavior when just scaling an existing node group (e.g. by scaling from 3 nodes to 6 and back from 6 to 3).
2nd edit:
Downtime in this case means that I can see some failing requests.
@splieth look into pod disruption budgets; that's what you need to keep all pods from terminating at once.
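A minimal sketch of such a budget (the PDB API is still policy/v1beta1 on Kubernetes 1.14; the app name and threshold are placeholders):

```yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2          # never evict below 2 ready pods during a drain
  selector:
    matchLabels:
      app: my-app
```

With a budget in place, the drain evicts pods gradually instead of all at once, as long as the deployment has enough replicas to satisfy it.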
When will we see support for existing 1.14 clusters? My clusters are currently stuck on platform version eks.2.
@groodt I don't think there's much hope there. You could just create a new cluster and move your workloads there.
Hopefully everyone realizes that you're not supposed to build pet clusters either.
I don't see why it couldn't support existing clusters. A little more involved maybe, but a cluster can have multiple ASGs associated with it, so the new managed nodes could be brought up alongside the existing self-managed ones, and the self-managed ones removed once the new nodes are stable.
This isn't a new Kubernetes version. Presumably it's just some additional process running in the control plane that is aware of the ASGs, and that's it.
I saw this in the original announcement:
Today, EKS managed node groups are available for new Amazon EKS clusters running Kubernetes version 1.14 with platform version eks.3. You can also update clusters (1.13 or lower) to version 1.14 to take advantage of this feature. Support for existing version 1.14 clusters is coming soon.
So presumably they do plan to upgrade existing clusters, I'm curious on the timelines. If it's too long, sure I can create a new cluster and migrate workloads easily enough, but it's still annoying to do without downtime.
My 1.14 clusters are still stuck in platform version eks.2. The newest platform version is eks.7 - seems that the rollout of new platform versions for existing clusters is really slow.
While it's fair to expect new clusters for new Kubernetes versions, it seems a bit excessive to create new clusters for new platform versions.
What expectations can we have on the timeline for updates of platform version for existing clusters?
FWIW, my clusters weren’t updating either, but when I updated all my workers to a newer AMI ahead of the control plane version, all my control planes updated within 48 hours. Coincidence?
Been trying out managed worker nodes and, unless I am missing something, I have no ability to see kubelet-related logs unless I provision with an SSH key?
@pfremm I didn't find another method apart from deploying the SSM agent as a DaemonSet and accessing the logs via SSM rather than SSH. But IMHO that's the better option, since no SSH key needs to be shared.
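A very rough sketch of that DaemonSet approach, not a hardened setup, assuming the public amazon/amazon-ssm-agent image and that the node IAM role already has the AmazonSSMManagedInstanceCore policy attached:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ssm-agent
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: ssm-agent
  template:
    metadata:
      labels:
        name: ssm-agent
    spec:
      hostNetwork: true              # let the agent register using the node's instance identity
      containers:
        - name: ssm-agent
          image: amazon/amazon-ssm-agent:latest
          securityContext:
            privileged: true
          volumeMounts:
            - name: host-var-log     # expose node logs to SSM sessions (read-only)
              mountPath: /host/var/log
              readOnly: true
      volumes:
        - name: host-var-log
          hostPath:
            path: /var/log
```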
@pfremm I suggest you set up container insights to ship logs and metrics into CloudWatch Logs, that worked great for me.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/deploy-container-insights-EKS.html
All logs from pods, the kubelet, and kube-proxy are then shipped and viewable in CloudWatch. You can ship them further into Elasticsearch as well, so that's also an option if you don't like CloudWatch queries.
@tabern will there be an option to add a userdata script or otherwise modify the instances?
Is there any update on this?
I couldn't find any reference in https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-eks-nodegroup.html
This is quite critical for setting up nodes for different purposes.
Hi @pdonorio, that feature request is being tracked in this issue #596
Out of curiosity, does the current managed node group setup specify any kind of flags for --kube-reserved and friends? If so, what are the values based on? I know we'll be able to control those values once #596 has been addressed, but I'm wondering if managed nodegroups would be usable for us today.
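For context, the flags in question map onto kubelet configuration like the following (values are purely illustrative, not a claim about what managed node groups actually set):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
kubeReserved:               # capacity held back for the kubelet, container runtime, etc.
  cpu: 100m
  memory: 300Mi
  ephemeral-storage: 1Gi
systemReserved:             # capacity held back for OS daemons
  cpu: 100m
  memory: 200Mi
evictionHard:
  memory.available: 200Mi
```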
Is there any support for setting taints on a nodegroup?
Is there any support for customizing the bootstrap script that is run on the nodegroup instances?