Tell us about your request
Right now we can use on-demand instances in a managed node worker group. However I see no reference in the documentation to using spot instances or a spot fleet. Ideally, I would like to be able to use spot instances for my batch workloads.
Which service(s) is this request for?
EKS
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
I want to run batch workloads cost efficiently. We mostly use spot instances for this. Without this feature I can't take advantage of the nice managed draining and upgrading support of the managed worker node groups.
Are you currently working around this issue?
Creating our own autoscaling groups and manually doing a rolling upgrade using kubectl cordon and drain commands.
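For readers unfamiliar with this workaround, here is a minimal sketch of the manual rolling step, written as a Python helper that only builds the `kubectl` command lines. The node names are hypothetical and the helper itself is an illustration, not something the reporter posted.

```python
from typing import List


def rolling_drain_commands(nodes: List[str]) -> List[str]:
    """Return the kubectl commands to cordon and then drain each node, in order."""
    commands = []
    for node in nodes:
        # Mark the node unschedulable so no new pods are placed on it.
        commands.append(f"kubectl cordon {node}")
        # Evict the running pods; DaemonSet-managed pods are ignored.
        commands.append(f"kubectl drain {node} --ignore-daemonsets")
    return commands


if __name__ == "__main__":
    # Hypothetical node names, for illustration only.
    for cmd in rolling_drain_commands(["node-a", "node-b"]):
        print(cmd)
```

Each node would be replaced by a fresh instance from the autoscaling group before moving on to the next one, which is the part managed node groups automate.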
Additional context
No
Attachments
None
Update 12/1 – this feature is now available
Thanks for adding this! We're working on this feature and it's been part of our plan for managed nodes from the start.
Question: Would you expect to provision a spot node group with a single instance type or multiple instance types?
@tabern Thanks for the quick answer!
As to your question, I would expect to be able to specify multiple instance types. At the moment we use a launch template to specify multiple instance types and let it choose automatically. Providing the same support would be great. The reason we choose multiple instance types is resiliency: if one of those instance types is not available, we can automatically switch to another.
How about the new Fargate Spot option on EKS?
I assume you are also talking about this announcement. However, this just means you're running pods on Fargate orchestrated by EKS, which is still quite expensive compared to running actual nodes and does not integrate with typical k8s tooling such as EFK, Prometheus+Grafana, nginx ingress, cert manager, etc.
I would expect to be able to specify multiple instance types. At the moment we use launch template to specify multiple instance types and let it choose automatically.
@stijndehaes would you expect to add any priority to these instance types or is random sufficient (ie: let cluster autoscaler scale up and we'll hit eventual capacity)? If we did not support multiple instance types per group would it be painful to need to create multiple node groups, some of which were scaled to 0 (and could be scaled up as needed) or would this create undue complexity?
How about the new Fargate Spot option on EKS?
@gjmveloso - that's on our roadmap, tracked as https://github.com/aws/containers-roadmap/issues/622
which is still quite expensive compared to running actual nodes
@gertjangaillet - The cost of Fargate tends to be dependent on cluster utilization. If you're getting very high utilization, Fargate is more expensive than nodes. However, if you typically run with low cluster utilization (50% or much less is very common), Fargate is more efficient. We're also bringing _Savings Plan_ to EKS/Fargate (https://github.com/aws/containers-roadmap/issues/616) which is another great way to lower costs.
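The utilization argument can be made concrete with a little arithmetic. The sketch below is a simplified model that assumes nodes are billed for their whole capacity while Fargate is billed only for what pods request. The prices are hypothetical placeholders, not real AWS pricing, and the model ignores memory pricing and per-pod overhead.

```python
def effective_node_cost(node_price_per_vcpu_hour: float, utilization: float) -> float:
    """Cost per vCPU-hour actually used when only `utilization` of the node is busy."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return node_price_per_vcpu_hour / utilization


def break_even_utilization(node_price_per_vcpu_hour: float,
                           fargate_price_per_vcpu_hour: float) -> float:
    """Utilization below which Fargate is cheaper than nodes in this simplified model."""
    return node_price_per_vcpu_hour / fargate_price_per_vcpu_hour


if __name__ == "__main__":
    # Hypothetical prices per vCPU-hour, chosen only to illustrate the calculation.
    node_price, fargate_price = 0.03, 0.05
    print(break_even_utilization(node_price, fargate_price))
    print(effective_node_cost(node_price, 0.25))
```

In this model, a cluster running below the break-even utilization pays more per used vCPU-hour on nodes than it would on Fargate, and the reverse holds above it.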
@tabern to start with, random would be sufficient. However, I would be most interested in the option to launch the cheapest instance type. I mostly use a couple of different instance types that have roughly the same cpu/memory. For example: m5.xlarge, m5a.xlarge, m5d.xlarge. This ensures that all jobs land on instances with roughly the same power available to them. Also, this used to be very important for the kubernetes cluster autoscaler because it uses one of the nodes as a template to see if a new pod would fit that node. I am not sure if this is still the case though (but I guess it is).
We would be interested in specifying a 'Capacity-Optimised' allocation strategy. We have seen instability in Spot using 'Lowest-price': we lost instances within a given AZ, got them back again, and then lost them again whenever that instance type in that AZ was near exhaustion. We have therefore moved to diversified pools of instances matching the same capacity requirements, with a Capacity-Optimised strategy, i.e. we are willing to give up the cheapest Spot price in exchange for stability.
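The trade-off between the two strategies can be illustrated with a toy model. This sketch assumes each Spot pool (instance type plus AZ) has a price and a spare-capacity figure; both numbers are made up, and EC2 does not expose spare capacity this way, so treat it as an intuition aid rather than how the service decides.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SpotPool:
    instance_type: str
    availability_zone: str
    price: float          # hypothetical hourly Spot price
    spare_capacity: int   # hypothetical number of spare instances in the pool


def lowest_price(pools: List[SpotPool]) -> SpotPool:
    """Pick the cheapest pool, regardless of how close it is to exhaustion."""
    return min(pools, key=lambda p: p.price)


def capacity_optimized(pools: List[SpotPool]) -> SpotPool:
    """Pick the pool with the most spare capacity, regardless of price."""
    return max(pools, key=lambda p: p.spare_capacity)


if __name__ == "__main__":
    pools = [
        SpotPool("m5.xlarge", "az-a", price=0.060, spare_capacity=2),
        SpotPool("m5a.xlarge", "az-a", price=0.065, spare_capacity=40),
        SpotPool("m5d.xlarge", "az-b", price=0.070, spare_capacity=25),
    ]
    print(lowest_price(pools).instance_type)
    print(capacity_optimized(pools).instance_type)
```

The cheapest pool here is also the one nearest exhaustion, which matches the lose-regain-lose pattern described above; the capacity-based choice costs slightly more but is less likely to be interrupted.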
When can we expect this?
Is there an ETA to release this feature? We're interested in migrating from Kops to EKS Managed, but not having Spot Instances is going to increase all of our pre-environments costs, which is a no-go.
This would be a really important feature, hope it comes soon :)
+1 to this, essential feature IMO.
Is there an ETA to release this feature? We're interested in migrating from Kops to EKS Managed, but not having Spot Instances is going to increase all of our pre-environments costs, which is a no-go.
I'd suggest checking out this workshop https://ec2spotworkshops.com/using_ec2_spot_instances_with_eks.html and this blog post https://itnext.io/the-definitive-guide-to-running-ec2-spot-instances-as-kubernetes-worker-nodes-68ef2095e767.
Agreed that this is an essential EKS offering especially to support development pipelines.
Make it so!
Any update on this item?
Any update please?
Hi, is there any news on this topic?
This would be pretty useful given widespread industry mandates to cut costs right now .....
Any updates? It would help us keep our costs low on the EKS cluster which we plan to use for our CI builds.
@tabern after moving to EKS on reserved instances we are now thinking of leveraging spot instances.
We do use cluster autoscaler in production today.
If we have 2 node groups, I would like to give the "spot" worker node group precedence (over reserved) so that it expands first, and fall back to reserved instances only if that is not successful.
To make this happen, I understand I would have to implement something like this https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/expander/priority/readme.md
This, in my opinion, can also work with SpotAllocationStrategy set to capacity-optimized.
So I think implementing a priority-based expander is important.
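To illustrate the idea, here is a small simulation of priority-based selection in Python. It assumes a mapping from integer priority to a list of name regexes where the highest number wins, which is how the linked readme describes the expander's configuration; the node group names and the function itself are hypothetical, and this is not the autoscaler's code.

```python
import re
from typing import Dict, List


def pick_by_priority(priorities: Dict[int, List[str]], node_groups: List[str]) -> List[str]:
    """Return the eligible node groups that match the highest priority with any match."""
    for priority in sorted(priorities, reverse=True):
        patterns = [re.compile(p) for p in priorities[priority]]
        matches = [g for g in node_groups if any(p.search(g) for p in patterns)]
        if matches:
            return matches
    # No pattern matched any eligible group.
    return []


if __name__ == "__main__":
    # Hypothetical configuration: prefer spot groups, fall back to reserved ones.
    priorities = {50: [".*spot.*"], 10: [".*reserved.*"]}
    print(pick_by_priority(priorities, ["workers-spot", "workers-reserved"]))
    # If the spot group is currently not eligible (e.g. a failed scale-up), the
    # reserved group is chosen instead.
    print(pick_by_priority(priorities, ["workers-reserved"]))
```

The `node_groups` argument stands for the groups that are currently eligible for scale-up, so the fallback to reserved instances happens once the spot group drops out of that list.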
@casey-robertson 👍. Any update regarding the support of spot instances on managed node group ?
Question: Would you expect to provision a spot node group with vCPU/Mem based inputs and let EKS select the list of instance types?
If that was an additional (optional) feature it might be interesting. But having control over the exact types of instances is more important.
@dchelupati I think having control over the instance type is what people expect from this feature. One scenario would be giving more power to your CI while optimising cost; another would be increasing the power of your cluster based on the number of preview environments. You probably want to pick the exact instance type you need.
@antiqe Thanks for the feedback. I understand why you want to pick the instance type based on the workload and desired performance of the cluster. However, in order to get the most benefit from EC2 Spot instances, we recommend the best practices of instance and AZ flexibility. For example, if you need c5.large as the preferred instance type for your CI environment, we can create a Spot node group with c5.large and c4.large so you are flexible across instance types. If you prefer, we can even add m4 and m5 to further increase flexibility. If the node group does it on your behalf with a preferred instance type input instead of vCPU and memory, would that work?
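As a rough sketch of what "preferred instance type plus flexibility" could look like, the helper below keeps the preferred type first and adds the same size from other families. The function name is hypothetical, and it does not check that the resulting types exist or have matching vCPU and memory.

```python
from typing import List


def diversify(preferred: str, extra_families: List[str]) -> List[str]:
    """Keep the preferred type first, then add the same size from other families."""
    family, size = preferred.split(".", 1)
    result = [preferred]
    for fam in extra_families:
        candidate = f"{fam}.{size}"
        # Skip the preferred family itself and any repeated family.
        if fam != family and candidate not in result:
            result.append(candidate)
    return result


if __name__ == "__main__":
    print(diversify("c5.large", ["c4", "m4", "m5"]))
```

Keeping the preferred type at the head of the list leaves room for a priority-aware allocation later, while the extra families provide the flexibility recommended above.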
For example, if you need c5.large as preferred instance type for your CI environment, we can create a Spot node group with c5.large and c4.large
That works for me, especially since it's best practice. I personally don't need to care about instance type for CPU and memory, but for GPU instances the instance type is very important. There can be big price and performance differences between different types of GPUs.
@dchelupati Thanks a lot for the reply and the explanation regarding the good practice. I agree with @cep21 regarding GPU instances, but for the other scenarios that I have in mind that works for me too, and it sounds like a pretty good first version.
Re: the AWS EKS Workshop mentioned above, a brand new section was added 2 weeks ago covering how Ocean from Spot.io helps simplify EKS infra management while fully leveraging the cost-savings of spot instances. Here is direct link to that section: https://eksworkshop.com/beginner/190_ocean/
@ZevOps You can use spot instances with the worker node approach (the approach people used before AWS introduced node groups), meaning you handle the initialisation and scaling on your own (user_data, scaling group, ...). Let's call it the not fully managed way. But the purpose of this issue is to be able to use spot instances with managed node groups directly.
https://github.com/terraform-aws-modules/terraform-aws-eks/issues/831#issuecomment-613046677
ETA on this feature ?
The AWS team is working on it according to the public roadmap; hope this feature will be available soon 😄
Any updates?
If this gets implemented, it might also be nice to additionally have node lifecycle (e.g., on-demand versus spot) added as a node label
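To show why such a label would be handy, here is a minimal sketch that builds a Pod manifest pinned to one lifecycle through a `nodeSelector`. The label key `node-lifecycle` and its values are hypothetical placeholders, since the thread does not name a key.

```python
from typing import Any, Dict


def pod_pinned_to_lifecycle(name: str, image: str, lifecycle: str,
                            label_key: str = "node-lifecycle") -> Dict[str, Any]:
    """Build a Pod manifest that only schedules onto nodes carrying the given label."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # label_key is a hypothetical label name used for illustration.
            "nodeSelector": {label_key: lifecycle},
            "containers": [{"name": name, "image": image}],
        },
    }


if __name__ == "__main__":
    print(pod_pinned_to_lifecycle("batch-job", "busybox", "spot"))
```

With a label like this, interruption-tolerant batch pods could target spot nodes while everything else stays on on-demand capacity.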
@tabern As of now we install Antivirus on the worker nodes with the help of a custom script included in User Data Section of the launch template for our self managed nodegroup. It will be nice to have an option to modify user data section for the underlying launch template while editing node group config in the case of AWS Managed Node groups as well.
Was also looking for this functionality when spinning up an EKS cluster to run a Flink batch compute workload.
https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/deployment/kubernetes.html
Seems it's already there. :)
https://aws.amazon.com/blogs/compute/cost-optimization-and-resilience-eks-with-spot-instances/
Maybe the UI part is still in-progress? Anyway, we can have a try on the managed spot instances with EKS. Thanks team!
@igrowheart The API documentation doesn't say anything about "spot" instances when creating a managed node group: https://docs.aws.amazon.com/eks/latest/APIReference/API_CreateNodegroup.html. I think that blog describes how to launch spot instances through a "regular" "manually" created launch template + autoscaling group.
If you read through the blog I posted and try it out, you will see that it describes a way to use spot instance under managed worker node group, which is the specific request mentioned in this ticket. The control is passed via eksctl instead of other AWS APIs. Let me know if I missed anything here. :)
This doc also mentions eksctl's support for spot instances. It was updated in May for the first time.
https://eksctl.io/usage/spot-instances/
For those who voted down on my comments:
We share the same interest: we are all waiting for the most wanted feature on EKS and want those features to benefit our work or apps in the future. I'm just guessing at the direction the product team is taking. If you think the comment is not in the right direction, please just leave a comment. Do not downvote like a kid. :)
@igrowheart Yes spot instances can be launched in Managed Node Groups by tweaking the underlying ASG/Launch Template manually. We did implement this in our clusters. But while doing an upgrade of K8s cluster and worker nodes to newer versions we did notice that the underlying ASGs/Launch Templates were reset to using On-Demand Nodes. Also the custom user data section we configured in underlying launch template has completely been ignored in managed worker nodes setup after the upgrade.
@igrowheart my bad. But let's try to not get emotive here and not flood this issue with unnecessary comments.
I downvoted your comment because just like @yourilefers pointed out, currently there's no support for spot instances through the managed node groups official API. What eksctl does is to use cloudformation templates to provision what AWS itself calls 'self managed nodes'. Also in the article you mentioned it's pretty clear, managed nodes for the on demand group and self managed for the spot pool.
And even though it is possible to work around this by changing the auto scaling groups manually (like @ktumu0225 mentioned), this does not solve this issue, which is about being able to provision spot instances through managed nodes officially. That would also enable other tools like terraform to use this feature.
@igrowheart Yes spot instances can be launched in Managed Node Groups by tweaking the underlying ASG/Launch Template manually. We did implement this in our clusters. But while doing an upgrade of K8s cluster and worker nodes to newer versions we did notice that the underlying ASGs/Launch Templates were reset to using On-Demand Nodes. Also the custom user data section we configured in underlying launch template has completely been ignored in managed worker nodes setup after the upgrade.
Didn't realize the upgrade will break this and the custom user data part. Thanks for the insights!
However, this will help a lot on the dev&test environments.
For production, we need to wait for the general availability of this feature.
@Dudssource never mind.
Seems I got too thrilled after I found that eksctl can support spot instances :)
Thanks for your time explaining the details. Let's wait for the general availability of this feature.
@igrowheart Yes spot instances can be launched in Managed Node Groups by tweaking the underlying ASG/Launch Template manually. We did implement this in our clusters. But while doing an upgrade of K8s cluster and worker nodes to newer versions we did notice that the underlying ASGs/Launch Templates were reset to using On-Demand Nodes. Also the custom user data section we configured in underlying launch template has completely been ignored in managed worker nodes setup after the upgrade.
We also did the same thing. We are using Cluster Autoscaler as well. Did you use their recommended settings for instance types or did you configure your own instance types with mixed spot and on-demand in the launch templates?
@tabern Do you have any news from the AWS team regarding the use of Spot Instances with Managed Node Groups? Thanks in advance.
Any terraform terraform-aws-modules/eks/aws user here, know when it can become a feature for node_groups?
Any terraform terraform-aws-modules/eks/aws user here, know when it can become a feature for node_groups?
I wouldn't count on it becoming available until it's part of the official AWS EKS API. See https://github.com/aws/containers-roadmap/issues/583#issuecomment-657274758.
Alright, this has been open for almost a year now. Is there any progress?
If this feature is not available soon (within a month), I will be forced to give up on managed node groups and that would be a shame.
Is there a roadmap for the AWS EKS API? Is there an Amazon rep who can speak to this?
@treksler We are actively working on it and appreciate the patience.
Is there any ETA @treksler ?
you mean @rtripat
I saw the status changed to 'Coming Soon'. So I'm expecting this during the re:invent. :)
EKS managed node groups now provide native support for EC2 Spot Instances.
When you create a managed node group, simply set the capacity type to SPOT and then select one or more EC2 instance types that meet your resource requirements. Managed node groups provision and manage Spot nodes based on the latest Spot best practices. In particular, they enhance your node group's availability by enabling the capacity-optimized allocation strategy and Capacity Rebalancing on all Amazon EC2 Auto Scaling groups they manage.
Learn more
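As a rough illustration of the announcement, the sketch below builds the request parameters for a Spot managed node group in Python. The parameter names follow the EKS CreateNodegroup API as exposed by boto3; the cluster name, role ARN, subnets, and sizes are hypothetical placeholders, and the helper function is not part of any SDK.

```python
from typing import Any, Dict, List


def spot_nodegroup_request(cluster: str, nodegroup: str, node_role_arn: str,
                           subnets: List[str], instance_types: List[str],
                           min_size: int = 0, max_size: int = 3,
                           desired_size: int = 1) -> Dict[str, Any]:
    """Build keyword arguments for an EKS CreateNodegroup call using Spot capacity."""
    if not instance_types:
        raise ValueError("at least one instance type is required")
    return {
        "clusterName": cluster,
        "nodegroupName": nodegroup,
        "capacityType": "SPOT",                  # the other allowed value is ON_DEMAND
        "instanceTypes": list(instance_types),   # several types give Spot more pools to use
        "scalingConfig": {"minSize": min_size, "maxSize": max_size,
                          "desiredSize": desired_size},
        "subnets": list(subnets),
        "nodeRole": node_role_arn,
    }


if __name__ == "__main__":
    # All identifiers below are hypothetical placeholders.
    request = spot_nodegroup_request(
        "demo-cluster", "batch-spot",
        "arn:aws:iam::111122223333:role/demo-node-role",
        ["subnet-aaa", "subnet-bbb"],
        ["m5.xlarge", "m5a.xlarge", "m5d.xlarge"],
    )
    print(request)
    # With credentials configured, this could be sent with:
    #   boto3.client("eks").create_nodegroup(**request)
```

The instance types in the usage example are the similarly sized ones mentioned earlier in the thread.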
Is this supported in the config as well?
Any terraform terraform-aws-modules/eks/aws user here, know when it can become a feature for node_groups?
I wouldn't count on it becoming available until it's part of the official AWS EKS API. See #583 (comment).
@amazingandyyy This is available in PR form here: https://github.com/terraform-aws-modules/terraform-aws-eks/pull/1129
Tried this today. The price is even higher than on-demand's one. What's the purpose?
I must admit that I mixed up the "price" and "max price" options. I apologize.
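For anyone hitting the same confusion: the max price is a ceiling, not what you are billed. A minimal sketch of that distinction, with hypothetical prices:

```python
from typing import Optional


def hourly_charge(current_spot_price: float, max_price: float) -> Optional[float]:
    """Return the hourly charge, or None when the Spot price exceeds the ceiling."""
    if current_spot_price > max_price:
        # The request is not fulfilled while the Spot price is above the max price.
        return None
    # You pay the current Spot price, not the max price you set.
    return current_spot_price


if __name__ == "__main__":
    # Hypothetical prices: a max price set high does not raise the bill.
    print(hourly_charge(current_spot_price=0.06, max_price=0.20))
```

So a max price at or above the on-demand rate does not mean the instances cost that much.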