We're experiencing an issue where linked containers in a task aren't terminated in the "correct" order. We expected consumer containers to be terminated before the containers they link to.
In our case, a local redis cache is terminated before the consuming web application is. This in turn leads to exceptions in the web application.
Can you tell me whether this is expected behaviour or whether we can request the termination order to be changed? I've tried to dig into the agent's code a bit and think that I've found out that a task's containers aren't terminated in any particular order right now.
@seiffert thanks for pointing this out. You are right that termination order of containers within a task is non-deterministic. Note that it is deterministic when starting containers based on links and volumes (https://github.com/aws/amazon-ecs-agent/blob/master/agent/engine/dependencygraph/graph.go).
We will look into this and update the issue.
-kiran
Any updates on this?
@kiranmeduri Also interested in updates on the topic as some of our containers fail because their linked containers are stopped first.
Still no updates? Our workflows are crashing as well because caching containers are stopped before the applications exit gracefully.
Thanks for checking in on this. I wanted to let you know that we on the ECS team are aware of this issue, and that it is under active consideration. +1's and additional details on use cases are always appreciated and will help inform our work moving forward.
We are looking at addressing this along with startup order in https://github.com/aws/containers-roadmap/issues/123
I opened PR https://github.com/aws/amazon-ecs-agent/pull/1809.
If this PR is acceptable, I will complete all testing and request review as soon as possible.
We link our application container with a fluentd container. However, the fluentd container always stops before the application container does, so some data posted to fluentd is lost.
I believe https://github.com/aws/amazon-ecs-agent/pull/1809 resolves the issue, so I would appreciate it if you could take a look at the pull request and post a comment, for example "This change looks good, so please complete all tests." or "We won't merge this pull request because it conflicts with https://github.com/aws/containers-roadmap/issues/123".
Closing this issue because you can now control startup and termination order of containers in the task definition: https://aws.amazon.com/about-aws/whats-new/2019/03/amazon-ecs-introduces-enhanced-container-dependency-management/