Containers-roadmap: [ECS] [request]: Daemon tasks need reserved memory/CPU space in order to schedule

Created on 29 Sep 2019 · 6 comments · Source: aws/containers-roadmap

Tell us about your request
What do you want us to build?
Currently, when you use DAEMON tasks, you can get into situations where a daemon task cannot be scheduled because there isn't enough CPU/memory left on the instance. That is a serious problem, because running exactly one DAEMON task per host is critical for things like log aggregation, the Datadog agent, etc.

I think we need something like DAEMON_MEMORY_RESERVATION_MB/DAEMON_CPU_RESERVATION that we can populate to reserve this space so that ECS can still schedule these tasks.

Which service(s) is this request for?
This could be Fargate, ECS, EKS, ECR
ECS

Are you currently working around this issue?
How are you currently solving this problem?
The only workaround I'm aware of is to make sure your instances have plenty of spare memory/CPU, which means over-provisioning the cluster and costs us more.

ECS Proposed

brentryan

All 6 comments

We are facing this issue as well. When scaling up, a bunch of tasks get scheduled before the daemons and fill up the instance, leaving too little CPU or memory for the daemon.

We use a daemon for log parsing / forwarding, so this is quite a big issue for us.

zBart on 23 Jul 2020

We are currently working on daemon scheduler enhancements that will resolve the issue described above.
All customers will get the enhancements out of the box:

ECS will ensure that daemon tasks are the first tasks placed on new ECS container instances, so that monitoring and security agents are launched before the application containers on that instance.
ECS will also reserve the CPU, memory and ENI resources defined for the daemon task on the instance. This ensures that if the daemon fails to launch, or during a daemon service update, another task launch does not ‘steal’ the daemon task's resources and prevent it from running successfully.

Please feel free to provide feedback on the GitHub issue here. Hope this helps!
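To make the reservation behaviour concrete, here is a toy Python model of a single container instance. It is only an illustration under assumptions, not how the ECS scheduler is implemented: the class and function names are made up, resources are reduced to CPU units and MiB of memory (ENIs are left out), and the numbers are arbitrary. It compares filling an instance with replica tasks with and without holding back the daemon's CPU/memory.

```python
from dataclasses import dataclass, field

# Toy model for illustration only; not the ECS scheduler or any AWS API.


@dataclass(frozen=True)
class Resources:
    cpu: int     # CPU units (ECS task definitions use 1024 units per vCPU)
    memory: int  # MiB

    def __add__(self, other: "Resources") -> "Resources":
        return Resources(self.cpu + other.cpu, self.memory + other.memory)

    def __sub__(self, other: "Resources") -> "Resources":
        return Resources(self.cpu - other.cpu, self.memory - other.memory)

    def fits_in(self, available: "Resources") -> bool:
        return self.cpu <= available.cpu and self.memory <= available.memory


@dataclass
class Instance:
    capacity: Resources
    # Held back for the daemon whether or not the daemon task is currently running
    # (e.g. it failed to launch or is mid-update), so replicas cannot "steal" it.
    daemon_reservation: Resources
    replicas: list = field(default_factory=list)

    def free_for_replicas(self) -> Resources:
        used = Resources(0, 0)
        for task in self.replicas:
            used = used + task
        return self.capacity - self.daemon_reservation - used

    def try_place_replica(self, task: Resources) -> bool:
        if task.fits_in(self.free_for_replicas()):
            self.replicas.append(task)
            return True
        return False


def fill_instance(reserve_for_daemon: bool) -> tuple:
    """Try to place 5 replicas, then report whether the daemon could still run."""
    daemon = Resources(cpu=256, memory=512)
    inst = Instance(
        capacity=Resources(cpu=2048, memory=4096),
        daemon_reservation=daemon if reserve_for_daemon else Resources(0, 0),
    )
    replica = Resources(cpu=512, memory=1024)
    placed = [inst.try_place_replica(replica) for _ in range(5)]
    # Capacity left for the daemon = its reservation plus whatever replicas did not use.
    left_for_daemon = inst.free_for_replicas() + inst.daemon_reservation
    return placed, daemon.fits_in(left_for_daemon)


if __name__ == "__main__":
    print(fill_instance(reserve_for_daemon=False))  # 4 replicas placed, daemon no longer fits
    print(fill_instance(reserve_for_daemon=True))   # 3 replicas placed, daemon still fits
```

With the reservation held back, the fourth replica is refused and the daemon can still start (or restart after a failure or update) on that instance; without it, the replicas use up the whole instance and the daemon no longer fits.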

pavneeta on 2 Nov 2020

The above sounds great, @pavneeta!

CpuID on 2 Nov 2020

@pavneeta
Is there any way to add a feature like this:

ECS would move replica tasks from the instance to another one when a new daemon task is added and there are not enough resources to run it. Moving them would free up the resources the daemon task needs to work correctly.

What do you think about it?
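For illustration, a minimal Python sketch of how such a "move replicas to make room" step might pick which replica tasks to move off an instance so a newly added daemon fits. Everything here is hypothetical (the `replicas_to_move` name, the (CPU units, MiB) tuples, the largest-first rule); ECS does not expose such an API, and largest-first is a simple greedy heuristic that does not guarantee moving the fewest tasks.

```python
from typing import List, Optional, Tuple

# Hypothetical helper for illustration; not an ECS API.
Res = Tuple[int, int]  # (CPU units, memory in MiB)


def replicas_to_move(free: Res, daemon: Res, replicas: List[Res]) -> Optional[List[int]]:
    """Return indices of replicas to move so `daemon` fits, or None if impossible.

    Greedy heuristic: move the largest replicas first (by CPU, then memory).
    This is not guaranteed to move the fewest tasks.
    """
    cpu, mem = free
    chosen: List[int] = []
    if daemon[0] <= cpu and daemon[1] <= mem:
        return chosen  # daemon already fits, nothing to move
    order = sorted(range(len(replicas)), key=lambda i: replicas[i], reverse=True)
    for i in order:
        chosen.append(i)
        cpu += replicas[i][0]
        mem += replicas[i][1]
        if daemon[0] <= cpu and daemon[1] <= mem:
            return chosen
    return None  # even an empty instance would be too small for the daemon


if __name__ == "__main__":
    # Instance is full; the new daemon needs 256 CPU units / 512 MiB.
    print(replicas_to_move((0, 0), (256, 512), [(512, 1024), (256, 256), (1024, 2048)]))  # → [2]
```

The moved tasks would still need capacity elsewhere in the cluster; a `None` result means the instance cannot host the daemon at all, however its replicas are shuffled.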

kapralVV on 2 Nov 2020

@kapralVV I suspect this starts to dive into the territory of https://github.com/aws/containers-roadmap/issues/105 ?

CpuID on 2 Nov 2020

+1 for thinking about what happens when a new daemon task is added to an existing cluster; that seems to be the main case not handled in @pavneeta's update above. This sounds great though!

Edit: If moving tasks off the instance to make room is difficult, I wonder whether ECS could mark an instance as unhealthy when there aren't enough resources to run the daemon, drain it, launch a new instance with the new resource reservations, and reschedule there.