Containers-roadmap: ECS Service reaches steady state before container health checks pass

Created on 26 Jul 2019 · 7 Comments · Source: aws/containers-roadmap

Summary

The ECS service reaches steady state before the container health checks pass. As a result, an ECS update deployed via CloudFormation is reported as successful, but the tasks then go into a restart loop once the container health check fails, leaving the whole ECS service unstable.

Description

When we perform an update to ECS via CloudFormation, CloudFormation waits for the "steady state" event ("Service xxxxxxxx has reached a steady state.") to mark the update successful. I can see the ECS service reach the steady state and then the health checks fail. The CloudFormation deployment therefore succeeds while the ECS service is left unstable.

Expected Behavior

The ECS service should reach steady state only after the container health checks pass.

Observed Behavior

The ECS service reaches steady state before the container health checks pass.

Environment Details

Supporting Log Snippets

ECS

mailjunze

👍9

All 7 comments

@mailjunze

Could you please mail me the following information at shugy at amazon dot com to help debug this issue --
1) Which version of Agent and Docker are you running?
2) How often do you observe this error?
3) Date/time of when the issue occurred, and ECS agent's configuration and debug level log files following instructions here: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-logs-collector.html

shubham2892 on 8 Aug 2019

We haven't received the requested information, so I will close the issue for now. Feel free to reopen it when you have it ready.

yumex93 on 16 Aug 2019

Is there an easier way to check if the ECS service has reached the "steady" (not "stable") state other than constantly querying the last events and looking for the word "steady" in there?

robertoandrade on 26 Sep 2019
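One option that avoids parsing event messages is to look at the service's deployment and task counts instead. The sketch below is only an approximation: it mirrors the condition used by boto3's `services_stable` waiter (one remaining deployment, running count equal to desired count), which is not necessarily identical to the "steady state" event asked about above. The helper names `service_looks_stable` and `wait_until_stable` are made up for illustration.

```python
from typing import Any, Dict


def service_looks_stable(service: Dict[str, Any]) -> bool:
    """Check one entry of a describe_services response.

    Assumption: "stable" here means exactly one deployment is left and
    runningCount equals desiredCount. This is a client-side approximation,
    not the service's own definition of "steady state".
    """
    deployments = service.get("deployments", [])
    return (
        len(deployments) == 1
        and service.get("runningCount") == service.get("desiredCount")
    )


def wait_until_stable(cluster: str, service_name: str) -> None:
    """Block until the boto3 'services_stable' waiter succeeds or times out."""
    import boto3  # imported lazily so the pure helper above has no dependency

    ecs = boto3.client("ecs")
    waiter = ecs.get_waiter("services_stable")
    waiter.wait(cluster=cluster, services=[service_name])
```

The waiter polls `describe_services` on your behalf and raises an error if the service does not settle within its retry budget, so it can replace a hand-written loop over the event list in most scripts.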

We are seeing this issue and it's affecting our services. Is there a way to escalate?

schmohlio on 31 Jan 2020

We will be releasing a fix for this soon. With this change, all container health checks must be in a HEALTHY status before a Service or TaskSet can reach STEADY_STATE. This fix will apply to services using ECS/Rolling deployment controllers (Steady State event), external deployment controllers (TaskSet stabilityStatus), and CodeDeploy Blue/Green (deployments will now wait for container health checks to pass before flipping traffic). Services not using container health checks will see no change in behavior. We will update this ticket once the change has been released.

DireCorgi on 16 Sep 2020
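For anyone who wants to verify the same condition from the client side, for example as an extra gate after a deployment, here is a minimal sketch. It is not the service-side implementation of the fix, which is not shown in this thread. It assumes the `healthStatus` field returned by `describe_tasks`, and it is only meaningful for services that define container health checks, since tasks without them report `UNKNOWN` as far as I understand. The helper names `all_tasks_healthy` and `service_tasks_healthy` are hypothetical.

```python
from typing import Any, Dict, List


def all_tasks_healthy(tasks: List[Dict[str, Any]]) -> bool:
    """Return True only if there is at least one task and every task is HEALTHY.

    `tasks` is the "tasks" list from a describe_tasks response. Tasks that
    report UNHEALTHY or UNKNOWN make the check fail.
    """
    return bool(tasks) and all(t.get("healthStatus") == "HEALTHY" for t in tasks)


def service_tasks_healthy(cluster: str, service_name: str) -> bool:
    """Fetch the service's tasks with boto3 and apply the check above.

    Simplification: only the first page of list_tasks results is inspected.
    """
    import boto3  # imported lazily so the pure helper above has no dependency

    ecs = boto3.client("ecs")
    arns = ecs.list_tasks(cluster=cluster, serviceName=service_name)["taskArns"]
    if not arns:
        return False
    tasks = ecs.describe_tasks(cluster=cluster, tasks=arns)["tasks"]
    return all_tasks_healthy(tasks)
```

An empty task list is treated as not healthy on purpose, so a service that has not started any tasks yet does not pass the gate.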

👍2

The bugfix has been deployed in all regions. Feel free to reply or reopen this issue if the problem persists.

DireCorgi on 2 Oct 2020

🎉2

@DireCorgi does this fix only apply to stack updates? I ask because I am able to create an ECS service with a failing container health check via CloudFormation. The stack and service are CREATE_COMPLETE, but the service just keeps restarting the container with the failing health check.