Containers-roadmap: ECS Agent is not restarted unhealthy containers for Dockerfile healthcheck

Created on 16 Jan 2019 · 7Comments · Source: aws/containers-roadmap

Summary

ECS Agent is not restarted unhealthy containers for Dockerfile healthcheck

Description

According to an article Amazon ECS Supports Container Health Checks and Task Health Management you have announced that Amazon ECS integrates with Docker container health checks to monitor the health of each container using HEALTHCHECK.
So we have ECS cluster with several services (i.e. logger). We added healthcheck for our logger as self hosted service and mapped to port (i.e. 5050), so dockerfile looks like this:

FROM microsoft/dotnet:2.1-runtime
WORKDIR ***
COPY . .
HEALTHCHECK CMD curl --fail http://localhost:5050/ || exit 1
ENTRYPOINT dotnet /***.dll

But next we read ECS documentation about Task Definition Parameters and Task and observed the following:

The Amazon ECS container agent does not monitor or report on Docker health checks that are embedded in a container image (such as those specified in a parent image or from the image's Dockerfile) and not specified in the container definition. Health check parameters that are specified in a container definition override any Docker health checks that exist in the container image.

Next I'm trying to play with docker containers and stopped one of unhealthy. So container restarted(but not removed) successfully and set unhealthy status again for new one (please note, ECS task was not restarted).

The questions are:

Will the healthcheck work if we move it to CloudFormation stack template?
Do you or will you in future support healthcheck specified specified in a parent image or from the image's Dockerfile?

Expected Behavior

Unhealthy docker container or ECS task should be restarted

Observed Behavior

Unhealthy docker container and ECS task are running

Environment Details

docker info:

Containers: 8
 Running: 7
 Paused: 0
 Stopped: 1
Images: 4
Server Version: 18.06.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
runc version: 69663f0bd4b60df09991c08812a60108003fa340
init version: fec3683
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-1049-aws
Operating System: Ubuntu 16.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.801GiB
Name: ***
ID: ***
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support

curl http://localhost:51678/v1/metadata
{"Cluster":"fnd-Cluster-***-cluster","ContainerInstanceArn":"arn:aws:ecs:eu-west-1:***:container-instance/0a263365-c9ad-40e0-b500-b169913bc3e1","Version":"Amazon ECS Agent - v1.24.0 (8b5e1863)"}

ECS Proposed

Source

dsharkou

👍1

Most helpful comment

I have issue similar to @dsharkov described.

this is something that we still need to evaluate. so i understand your use case - what does a baked in healthcheck definition offer that isn't covered the task def health check?

First of all, it reduce code duplication when moving existing docker containers to ecs . why should i add another healthcheck if i already have one in Dockerfile.
Also TaskDefinition is platform agnostic. I can create Task with linux or windows containers using the same task template, but Healthcheck instruction is platform dependent, for linux i usually use curl , for windows invoke-webrequest.

dimmy-timmy on 16 Jan 2019

👍11

All 7 comments

Will the healthcheck work if we move it to CloudFormation stack template?

hrm looks like it is supported in cloudformation for the container definition which includes healthcheck.

Do you or will you in future support healthcheck specified specified in a parent image or from the image's Dockerfile?

this is something that we still need to evaluate. so i understand your use case - what does a baked in healthcheck definition offer that isn't covered the task def health check?

adnxn on 16 Jan 2019

I have issue similar to @dsharkov described.

this is something that we still need to evaluate. so i understand your use case - what does a baked in healthcheck definition offer that isn't covered the task def health check?

dimmy-timmy on 16 Jan 2019

👍11

@dimmy-timmy +1, I also prefered default Docker healthcheck

dsharkou on 17 Jan 2019

👍1

Container healthchecks are the developers way of communicating how to determine if the container is alive/healthy; no sense duplicating that in the task definition when its already bound to the image.

jhmartin on 14 Aug 2019

Also TaskDefinition is platform agnostic. I can create Task with linux or windows containers using the same task template, but Healthcheck instruction is platform dependent, for linux i usually use curl , for windows invoke-webrequest.

I think this is a little confusing - the HEALTHCHECK as specified in the Dockerfile is executed within the namespace of the container (so if you use e.g. curl on Linux you need to have curl available within the container).

I can see the attractiveness of a container which can report its health, although really if the container is unhappy and you can tell from within the container, why not just exit? It's likely to be killed anyway.

It's worth noting that HEALTHCHECK is a docker proprietary feature and not part of the OCI image spec so is ignored by e.g. k8s.

rhowe on 9 Oct 2019

Someone have any tips for how to write a healthcheck for the TaskDefinition that checks the status of the docker status?

McDoit on 21 Mar 2020

@McDoit Ended up using default Docker healthcheck directive (Hoping ECS will honor it some day) followed by a cron to kill unhealthy instances.

* * * * * sh /usr/local/bin/ec2_docker_kill_unhealthy.sh

```#!/bin/bash

START_TIME=$SECONDS

/usr/bin/docker ps --filter "health=unhealthy" --format "table {{.ID}}" | tail -n +2 | xargs -n1 /usr/bin/docker kill

if [ $? -eq 0 ]
then
KILLED_STATUS="$(/usr/bin/docker ps -a --filter "status=exited" --format "table {{.ID}}\t{{.Names}}\t{{.Status}}\t{{.RunningFor}}")"
ELAPSED_TIME=$(($SECONDS - $START_TIME))
MSG="Docker unhealthy kill [hostname -f] successfully in $ELAPSED_TIME secs\n```$KILLED_STATUS```"
ALERT_CHANNEL_URL="https://hooks.slack.com/services/XXXX"

curl -X POST -H 'Content-type: application/json' --data '{"text":"'"$MSG"'"}' "$ALERT_CHANNEL_URL"
fi
```