Tell us about your request
What do you want us to build?
Currently, when you use DAEMON tasks, you can get into situations where the task cannot be scheduled because there isn't enough CPU/memory available on the instance. This is a critical problem when you want to run exactly one DAEMON task per host for things like log aggregation, the Datadog agent, etc.
I think we need something like DAEMON_MEMORY_RESERVATION_MB/DAEMON_CPU_RESERVATION that we can populate to reserve this space so that ECS can still schedule these tasks.
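To make the idea concrete, here is a minimal sketch of how such a reservation could change placement on a single instance. It is only an illustration under assumed numbers: the `Instance`, `Task`, `fits`, and `place` names are hypothetical, and the real ECS scheduler is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    cpu_units: int   # total CPU units on the instance
    memory_mb: int   # total memory on the instance, in MB


@dataclass
class Task:
    cpu_units: int
    memory_mb: int


def fits(instance, used_cpu, used_mem, task, reserved_cpu=0, reserved_mem=0):
    """Return True if the task fits in what is left after usage and reservation."""
    free_cpu = instance.cpu_units - reserved_cpu - used_cpu
    free_mem = instance.memory_mb - reserved_mem - used_mem
    return task.cpu_units <= free_cpu and task.memory_mb <= free_mem


def place(instance, tasks, daemon, reserve_for_daemon):
    """Place regular tasks first, then try the daemon.

    Returns (number of regular tasks placed, whether the daemon fit).
    reserve_for_daemon stands in for the proposed (hypothetical)
    DAEMON_CPU_RESERVATION / DAEMON_MEMORY_RESERVATION_MB settings.
    """
    reserved_cpu = daemon.cpu_units if reserve_for_daemon else 0
    reserved_mem = daemon.memory_mb if reserve_for_daemon else 0
    used_cpu = used_mem = placed = 0
    for task in tasks:
        if fits(instance, used_cpu, used_mem, task, reserved_cpu, reserved_mem):
            used_cpu += task.cpu_units
            used_mem += task.memory_mb
            placed += 1
    # The daemon may use the reserved space, so no reservation applies to it.
    daemon_placed = fits(instance, used_cpu, used_mem, daemon)
    return placed, daemon_placed


# Assumed example sizes, not taken from any real cluster.
instance = Instance(cpu_units=2048, memory_mb=4096)
regular_tasks = [Task(cpu_units=512, memory_mb=1024) for _ in range(4)]
daemon = Task(cpu_units=256, memory_mb=512)

without_reservation = place(instance, regular_tasks, daemon, reserve_for_daemon=False)
with_reservation = place(instance, regular_tasks, daemon, reserve_for_daemon=True)
```

With these assumed sizes, the unreserved case fills the instance with regular tasks and leaves no room for the daemon, while the reserved case places one fewer regular task and the daemon still fits.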
Which service(s) is this request for?
This could be Fargate, ECS, EKS, ECR
ECS
Are you currently working around this issue?
How are you currently solving this problem?
The only workaround I'm aware of is to ensure your instances have plenty of memory/CPU, which forces you to over-provision your cluster and costs more.
We are facing this issue as well. When scaling up, a bunch of tasks get scheduled before the daemons and fill up the instance, so there isn't enough CPU or memory left for the daemon.
We use a daemon for log parsing / forwarding, so this is quite a big issue for us.
We are currently working on daemon scheduler enhancements that will resolve the issue as defined above:
All customers will get the enhancements out of the box:
Please feel free to provide feedback on the GitHub Issue here. Hope this helps!
The above sounds great, @pavneeta!
@pavneeta
Is there any way to add a feature like this:
What do you think about it?
@kapralVV I suspect this starts to dive into the territory of https://github.com/aws/containers-roadmap/issues/105 ?
+1 to thinking about what happens when a new daemon task is added to an existing cluster - that seems to be the main case not handled in @pavneeta's update above. This sounds great though!
Edit: If moving tasks off of the instance to make room is difficult, I wonder whether ECS could instead mark an instance as unhealthy when it doesn't have enough resources to run the daemon, drain it, launch a new instance with the new resource reservations, and reschedule the tasks there.
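As a rough sketch of the first step of that idea, the selection of instances to drain could look like the following. The `instances_to_drain` helper and the example numbers are hypothetical; actually draining the selected instances would be a separate step (for example via the ECS UpdateContainerInstancesState API with status DRAINING), which is not shown here.

```python
from typing import Dict, List, Tuple


def instances_to_drain(
    remaining: Dict[str, Tuple[int, int]],
    daemon_cpu: int,
    daemon_memory_mb: int,
) -> List[str]:
    """Return IDs of instances that cannot fit the new daemon task.

    remaining maps an instance ID to its (free CPU units, free memory in MB).
    An instance is selected if either resource is below what the daemon needs.
    """
    return sorted(
        instance_id
        for instance_id, (free_cpu, free_mem) in remaining.items()
        if free_cpu < daemon_cpu or free_mem < daemon_memory_mb
    )


# Assumed example data, not taken from any real cluster.
remaining_resources = {
    "i-aaa": (1024, 2048),  # enough of both
    "i-bbb": (128, 2048),   # not enough CPU
    "i-ccc": (1024, 256),   # not enough memory
}
to_drain = instances_to_drain(remaining_resources, daemon_cpu=256, daemon_memory_mb=512)
```

Here `to_drain` contains the two instances that are short on CPU or memory; those would be the candidates to drain and replace with instances launched with the new reservation.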