Tell us about your request
A typical use case of the DAEMON scheduling strategy is to run daemons that monitor containers, collect logs, etc. Imagine the following scenario:
Kubernetes solves this issue by not evicting pods managed by the DaemonSet controller when a node is drained. Similar functionality is desired in ECS. Otherwise, in this scenario, logs can potentially be lost.
Which service(s) is this request for?
ECS
Are you currently working around this issue?
We run logstash/fluentd outside of ECS so that it remains running at all times.
+1 to this, but please put it behind some kind of flag at the task level.
Most of our Daemon tasks fall under this requirement: they're metrics/logging/etc containers so they have to be running all the time on every instance and would benefit from not being stopped after a drain.
However we also have clusters dedicated to running some heavyweight applications, where we want only one task per instance. Daemon tasks are great for this, we scale on the EC2 instances instead of ECS and rely on ECS to manage everything else. But we want those containers stopped after a drain, leaving only the metrics/logging/etc ones running.
Yes... +1, but with the flag at the task level.
Even just killing REPLICA tasks before DAEMON tasks would be nice.
Would also be great if this is applied to scheduled tasks (https://docs.aws.amazon.com/AmazonECS/latest/developerguide/scheduling_tasks.html). Thanks!
+1, with a flag to say whether they keep running during DRAINING or not.
When working on the draining state, the following issue might be worth considering for implementation at the same time:
[ecs] - de-register from Cloud Map / R53 when instance is draining
https://github.com/aws/containers-roadmap/issues/473
This is also a problem for DAEMON tasks that are connected to a load balancer. I would expect the ECS scheduler to respect the draining timeout (deregistration delay) set on the task's target group.
What happens is that ECS tries to:
All at the same time:
2019-11-06 11:38:54 +0100  service tracker has reached a steady state.
2019-11-06 11:38:44 +0100  service tracker has begun draining connections on 1 tasks.  (event 658253d0-09ba-417c-9d09-278ab036e37b)
2019-11-06 11:38:44 +0100  service tracker deregistered 1 targets in target-group X  (event cfc8396e-c0d1-4aea-abd3-77f7495a63e5)
2019-11-06 11:38:44 +0100  service tracker has stopped 1 running tasks: task 0f9375327d974fadb36b14203669ed55.  (event 55806ae5-4492-4b9b-bc8f-128b7a758e33)
2019-11-06 11:38:44 +0100  (daemon service tracker) task 0f9375327d974fadb36b14203669ed55 no longer satisfies placement constraints.  (event ab25021d-1e9d-4dd4-b456-15bda44a248b)
2019-11-06 11:38:44 +0100  (daemon service tracker) updated desired count to 2.  (event 85b57279-59d1-402b-a106-348ef82cf3f4)
We worked around this issue for now by blocking the SIGTERM that is sent to the DAEMON container. Then we are able to drain connections.
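The commenter does not show their exact setup. As a minimal sketch, assuming the daemon process is written in Python, "blocking" SIGTERM can be done by installing a handler that records the stop request and keeps serving for a fixed drain window instead of exiting. This only buys time up to the container's stop timeout: ECS sends SIGTERM first and SIGKILL once that timeout expires (30 seconds by default), and SIGKILL cannot be caught. `DRAIN_SECONDS`, `on_sigterm`, `install_handler` and `drain` are illustrative names, not part of any ECS API.

```python
import signal
import threading
import time

# Hypothetical drain window. It must stay below the container's stop timeout,
# because ECS sends SIGKILL when that timeout expires and SIGKILL cannot be caught.
DRAIN_SECONDS = 20.0

shutdown_requested = threading.Event()


def on_sigterm(signum, frame):
    """Record the stop request instead of terminating immediately."""
    shutdown_requested.set()


def install_handler() -> None:
    # Replace the default SIGTERM action (terminate the process) with our handler.
    signal.signal(signal.SIGTERM, on_sigterm)


def drain(drain_seconds: float = DRAIN_SECONDS, poll: float = 0.05) -> float:
    """Wait for SIGTERM, keep running for drain_seconds, return the time spent draining."""
    shutdown_requested.wait()  # returns once the handler has fired
    started = time.monotonic()
    while time.monotonic() - started < drain_seconds:
        time.sleep(poll)  # placeholder: keep handling in-flight connections here
    return time.monotonic() - started
```

In a real daemon, `install_handler()` would be called at startup, the serving loop would run in a worker thread, and the main thread would call `drain()` before exiting; the container's `stopTimeout` in the task definition would need to be raised above `DRAIN_SECONDS`.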
Hi @xose, we are currently in the design phase of this feature to improve the Daemon Service Scheduler for ECS. Could you please help us understand the first use case a bit more? Do you want the Daemon tasks to never be killed on the instance, even after it has been drained of the application tasks? Are you constantly gathering logs/metrics on empty EC2 instances as well? If so, could you share the reason?
Based on the feedback here, my understanding is that the Daemon service needs to satisfy the conditions below:
1. ECS should ensure that there is one daemon task per instance.
2. It should be the first task to be scheduled on any given instance.
3. It should be the last task to be killed when an instance is drained/stopped.
If the ECS Daemon service satisfies these conditions, does it solve your use case?
Would love to get some feedback from everyone on this issue. Thanks.
For my use case where we have logging, x-ray and other metric collecting daemon services, this would be perfect. As long as these are the last containers to stop, then we won't be losing monitoring as the containers are drained.
I would be fine with 1 and 2 as stated.
3 is good, or it would also be fine (maybe preferred) to leave the tasks running even after draining. This would allow them to pick up non-container logs/metrics during shutdown.
I worked around it by using dependsOn in the container definition.
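The comment does not say exactly how `dependsOn` was used. One plausible reading, sketched below in Python as a plain task-definition document rather than an API call, is to run the log shipper as a sidecar in the same task as the application and make the application container depend on it. Per the ECS task definition documentation, a `dependsOn` dependency defined for startup is reversed at shutdown, so the sidecar starts first and stops last. This only orders containers within one task; it does not order a DAEMON service relative to separate REPLICA services. The container names and images are hypothetical placeholders, and `startup_order`/`shutdown_order` are illustrative helpers, not ECS functions.

```python
import json
from graphlib import TopologicalSorter

# Hypothetical names and images; only the field names (family, containerDefinitions,
# name, image, essential, memory, dependsOn, containerName, condition) come from
# the ECS task definition format.
task_definition = {
    "family": "app-with-log-router",
    "containerDefinitions": [
        {
            "name": "log-router",
            "image": "example/log-router:latest",
            "essential": True,
            "memory": 128,
        },
        {
            "name": "app",
            "image": "example/app:latest",
            "essential": True,
            "memory": 512,
            # "app" starts only after "log-router" has started; ECS reverses
            # this dependency on shutdown, so "log-router" stops after "app".
            "dependsOn": [{"containerName": "log-router", "condition": "START"}],
        },
    ],
}


def startup_order(task_def: dict) -> list[str]:
    """Order containers so that every dependency comes before its dependents."""
    graph = {
        c["name"]: {d["containerName"] for d in c.get("dependsOn", [])}
        for c in task_def["containerDefinitions"]
    }
    return list(TopologicalSorter(graph).static_order())


def shutdown_order(task_def: dict) -> list[str]:
    """Reverse of the startup order, mirroring the documented shutdown reversal."""
    return list(reversed(startup_order(task_def)))


if __name__ == "__main__":
    print(json.dumps(task_definition, indent=2))
    print("start:", startup_order(task_definition))
    print("stop: ", shutdown_order(task_definition))
```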
Hi @rothgar, could you please help me understand this more? If you have drained the host EC2 instance, then it has no replica tasks running, so why would you want to run a Daemon task on it? Wouldn't you prefer that the instance be scaled in?
Sometimes instances are drained but left running for troubleshooting or testing. The daemon tasks are used for logging and metrics from the host no matter if other ECS tasks are running on it or not.
@pavneeta Could you please provide an update? Will there be an AWS native solution to this problem soon?
@pavneeta are you able to provide an update if this is coming shortly on the roadmap...?
@pavneeta Any update on this?
Just hit this again today :(
Hi @pavneeta
Some extra feedback for you re your questions, yes to all points 1/2/3.
One consideration will be if you have multiple DAEMON services, if an order of startup/shutdown needs to be considered. For example, you have something like Consul Agent (for service discovery) + Filebeat (for log aggregation) running on each node, and you want to ensure they both stay up until all non-DAEMON ECS tasks are terminated, then take down Consul Agent, then take down Filebeat (or vice versa).
I think just taking down all the DAEMON services together at the end is likely fine as a first pass of this feature, though; it's hard to decide which services should stay running longer than others after the non-DAEMON tasks are terminated.
Hi Everyone,
Thank you so much for all the feedback regarding this issue.
We are currently working on a feature (coming soon) for the Daemon scheduler that will be available out of the box for all customers:
ECS will now ensure that Daemon tasks are the last tasks to drain from an instance, which will allow the monitoring daemons to pick up trailing logs or metrics. This should help resolve the issue as defined above. I understand that there was also an ask to not drain/stop the daemon at all; however, we believe that certain customers use Daemon tasks for application containers as well, hence the decision to gracefully shut them down, but only after the replica (application) tasks have been stopped.
Hope this helps!
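The announced drain behavior can be modelled as two phases: every REPLICA task on the instance is stopped first, and only then are the DAEMON tasks stopped. The Python sketch below is a toy model, not ECS code; `Task`, `drain_phases` and `stop_sequence` are hypothetical names, and the only ECS terms used are the service scheduling strategies `REPLICA` and `DAEMON`. As confirmed later in the thread, no ordering is applied among the daemon tasks themselves, so the model keeps them in their listed order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    task_id: str
    scheduling_strategy: str  # "REPLICA" or "DAEMON": the strategy of the owning service


def drain_phases(tasks: list[Task]) -> tuple[list[str], list[str]]:
    """Split an instance's tasks into (stopped first, stopped last).

    Replica tasks go in the first phase; daemon tasks are stopped only after
    all replicas are gone. No dependency order is applied among daemons.
    """
    replicas = [t.task_id for t in tasks if t.scheduling_strategy == "REPLICA"]
    daemons = [t.task_id for t in tasks if t.scheduling_strategy == "DAEMON"]
    return replicas, daemons


def stop_sequence(tasks: list[Task]) -> list[str]:
    """Flatten both phases into one stop order."""
    first, last = drain_phases(tasks)
    return first + last


if __name__ == "__main__":
    on_instance = [
        Task("fluentd-1", "DAEMON"),
        Task("web-1", "REPLICA"),
        Task("xray-1", "DAEMON"),
        Task("worker-1", "REPLICA"),
    ]
    print(stop_sequence(on_instance))
```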
Amazing, can't wait! Thank you, @pavneeta!
Hi @pavneeta ,
This is great news indeed, thanks for sharing. I do have one question though - I don't believe there is currently a mechanism to specify dependencies between Daemon services? Can you explain a bit about what the termination ordering would be _within the daemon services running_ on each ECS Container Instance?
Many thanks,
Edd
Hi Everyone, this feature has been shipped. All customers will see this behavior by default, with no opt-in required.
@eddgrant, ECS will not follow a specific draining order between multiple Daemon services. At this time there is no way for you to define dependencies between daemon services. Can you please share more information about your use case for that?
Hi @pavneeta
Regarding this statement...
I understand that there was also an ask to not drain/stop the daemon at all, however we believe that certain customers use Daemon tasks for application containers as well, hence the decision to gracefully shut them down but only after the replica (application) tasks have been stopped.
Is that the end of it, or is there anything on the roadmap to fulfil the requirement?
We are trying to move as many of our "agent" type services as possible off the host and into privileged daemon containers. For example, running fluentd in a container with /var/log mounted as a volume, rather than running fluentd on the host.
This is impossible if you're always going to kill the daemons, though, as we'll stop {DOING X} on whatever host is in draining mode. This could be logs, host metrics, security scanning, etc.
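The pattern described here (fluentd in a privileged DAEMON container with the host's /var/log mounted) can be sketched in Python as plain dictionaries mirroring an ECS task definition and a CreateService request. The cluster, family, service name and image are hypothetical placeholders; the field names (`volumes`, `host.sourcePath`, `mountPoints`, `privileged`, `schedulingStrategy`) follow the ECS formats. `undeclared_volumes` is an illustrative helper that checks every mount references a declared volume. Under the drain behavior announced in this thread, this task would still be stopped once the replica tasks on a draining instance are gone, which is the gap this comment raises.

```python
import json

# Hypothetical names and image; field names follow the ECS task definition
# and CreateService request formats.
fluentd_task_definition = {
    "family": "fluentd-daemon",
    "volumes": [
        # Bind mount of the host's /var/log directory into the task.
        {"name": "host-logs", "host": {"sourcePath": "/var/log"}},
    ],
    "containerDefinitions": [
        {
            "name": "fluentd",
            "image": "example/fluentd:latest",
            "essential": True,
            "privileged": True,
            "memory": 256,
            "mountPoints": [
                {"sourceVolume": "host-logs", "containerPath": "/var/log", "readOnly": True}
            ],
        }
    ],
}

fluentd_service = {
    "cluster": "example-cluster",
    "serviceName": "fluentd",
    "taskDefinition": "fluentd-daemon",
    "schedulingStrategy": "DAEMON",  # one task per container instance
}


def undeclared_volumes(task_def: dict) -> list[str]:
    """Return mountPoints sourceVolume names that no volume in the task declares."""
    declared = {v["name"] for v in task_def.get("volumes", [])}
    return [
        m["sourceVolume"]
        for c in task_def["containerDefinitions"]
        for m in c.get("mountPoints", [])
        if m["sourceVolume"] not in declared
    ]


if __name__ == "__main__":
    print(json.dumps({"taskDefinition": fluentd_task_definition,
                      "service": fluentd_service}, indent=2))
```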
Hi Wimnat, can you please unpack that a little more for me? If all replica tasks have been shut down and the instance is in draining mode, why would you not want the daemon tasks to be stopped before the instance is terminated? Is it to collect host-level metrics/logs?
The reason we did not want to build that in was that some customers use the Daemon scheduler type to run applications, so eliminating draining altogether is risky for them.
Yes... I'm one such user of Daemon tasks that run applications. Draining means draining, so all tasks should be removed when the draining process is requested. I don't mind that the daemon tasks are the last ones to go down, but they must be removed at some point in the lifecycle.
I would suggest that if a feature like that is needed, a new RFE should be submitted to request a new mode besides draining. Call it userapps_off or something like that.