During a service update, some containers that should be stopped (and that ECS reports as stopped) keep running.
We noticed that certain containers are not stopped during regular ECS deployments (new task definitions containing image changes).
To narrow down what could fail, I reproduced the problem with a forced service update using:
aws ecs update-service --cluster <cluster> --service <service> --force-new-deployment
This service shouldn't allow concurrent containers anyway :
Minimum healthy percent = 0%
Maximum healthy percent = 100%
Desired count = 1
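To make the expectation concrete, here is a minimal sketch of how these three settings bound the number of running tasks during a deployment. It assumes the documented ECS rounding (minimum rounded up, maximum rounded down); the function and parameter names are hypothetical and only mirror the settings listed above.

```python
import math


def deployment_bounds(desired_count, minimum_healthy_percent, maximum_percent):
    """Return (min_running, max_running) task counts allowed during a deployment."""
    # Assumption: the lower bound is rounded up and the upper bound is rounded down.
    min_running = math.ceil(desired_count * minimum_healthy_percent / 100)
    max_running = math.floor(desired_count * maximum_percent / 100)
    return min_running, max_running


# Settings from this report: desired count 1, minimum 0%, maximum 100%.
print(deployment_bounds(1, 0, 100))
```

With these values the upper bound is one task, so the old task has to be stopped before its replacement starts, and two copies of the same container should never run side by side.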
Containers that are part of a stopped task should always be stopped.
Certain containers are sometimes not stopped (in around 1 in 5 service updates) and survive future service updates.
They are seen as stopped by ECS :
"KnownExitCode": null,
"KnownStatus": "STOPPED",
"SentStatus": "STOPPED",
I've gone through ECS and docker logs and did not manage to pinpoint the source of this issue.
AMI : Amazon ECS-Optimized Amazon Linux AMI 2018.03.l
ECS agent version 1.17.3
Cluster of one EC2 instance, but this also occurs on multi-instance clusters.
The ECS instance was rebooted, and docker system prune --all was run minutes before the described occurrence.
Docker info :
Containers: 131
Running: 25
Paused: 0
Stopped: 106
Images: 13
Server Version: 17.12.1-ce
Storage Driver: devicemapper
Pool Name: docker-docker--pool
Pool Blocksize: 524.3kB
Base Device Size: 10.74GB
Backing Filesystem: ext4
Udev Sync Supported: true
Data Space Used: 9.06GB
Data Space Total: 23.33GB
Data Space Available: 14.27GB
Metadata Space Used: 9.212MB
Metadata Space Total: 33.55MB
Metadata Space Available: 24.34MB
Thin Pool Minimum Free Space: 2.333GB
Deferred Removal Enabled: true
Deferred Deletion Enabled: true
Deferred Deleted Device Count: 0
Library Version: 1.02.135-RHEL7 (2016-11-16)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9b55aab90508bd389d7654c4baf173a981477d55
runc version: 9f9c96235cc97674e935002fc3d78361b696a69e
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 4.14.33-51.37.amzn1.x86_64
Operating System: Amazon Linux AMI 2018.03
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.792GiB
Name: ip-172-20-0-12
ID: UB62:W2YM:DYBG:BBT3:BDO7:VMEH:3XC5:REFC:FF7X:MV33:VLNH:MKIN
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
I've collected all logs using https://github.com/awslabs/ecs-logs-collector, but it proves time consuming to anonymize those logs. Please don't hesitate to ask for more details or logs.
Only df56811ff7a should be running (docker ps output):
df56811ff7acd478f1f887a2b3e78b5e8c1f28bee9dc6d6347c04c169ac53907 <repo>.amazonaws.com/php:PR-1884-1-stage-cron "/usr/local/bin/entrypoint.sh /usr/bin/php /var/www/example/symfony mails:task --env=stage" 3 minutes ago Up 3 minutes 9000/tcp ecs-example-stage-cron-1-179-example-stage-mailstask-a6a1dcf1f3b681be8101
257e5551a32366def005dd45979d0f19cdb30f09f6302af19a1c7b29238e0dae <repo>.amazonaws.com/php:PR-1884-1-stage-cron "/usr/local/bin/entrypoint.sh /usr/bin/php /var/www/example/symfony mails:task --env=stage" 4 minutes ago Up 4 minutes 9000/tcp ecs-example-stage-cron-1-179-example-stage-mailstask-dea2dccf9294c0bc1c00
f54fa2bbe2a84e3af73a8c917772f458f54ae7fefe5c6c095977faf7ca00837c <repo>.amazonaws.com/php:PR-1884-1-stage-cron "/usr/local/bin/entrypoint.sh /usr/bin/php /var/www/example/symfony mails:task --env=stage" 5 minutes ago Up 5 minutes 9000/tcp ecs-example-stage-cron-1-179-example-stage-mailstask-9c9d93aee2a0b1dba201
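A quick way to spot such leftovers is to group running container names after dropping their trailing suffix. The sketch below assumes the naming pattern visible in the listing above (a common prefix followed by a final hexadecimal segment); the helper name is hypothetical.

```python
import re
from collections import defaultdict

# Assumption: the last dash-separated segment of the name is a random hex suffix.
SUFFIX = re.compile(r"-[0-9a-f]+$")


def find_duplicates(names):
    """Group container names by prefix and keep prefixes seen more than once."""
    groups = defaultdict(list)
    for name in names:
        groups[SUFFIX.sub("", name)].append(name)
    return {prefix: found for prefix, found in groups.items() if len(found) > 1}


# Names taken from the listing above.
names = [
    "ecs-example-stage-cron-1-179-example-stage-mailstask-a6a1dcf1f3b681be8101",
    "ecs-example-stage-cron-1-179-example-stage-mailstask-dea2dccf9294c0bc1c00",
    "ecs-example-stage-cron-1-179-example-stage-mailstask-9c9d93aee2a0b1dba201",
]
duplicates = find_duplicates(names)
for prefix, found in duplicates.items():
    print(prefix, len(found))
```

With a desired count of 1, any prefix that appears more than once points at containers that outlived their task.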
ECS data for the task containing container 257e5551a323 :
{
"Arn": "arn:aws:ecs:eu-central-1:123456890123:task/da8ea8c0-a080-44cd-9a32-6f850b797da3",
"Containers": [
{
"ApplyingError": null,
"Command": [
"/usr/bin/php",
"/var/www/website/symfony",
"mails:task",
"--env=stage"
],
"Cpu": 0,
"EntryPoint": null,
"Essential": true,
"Image": "repo.amazonaws.com/php:PR-1884-1-stage-cron",
"ImageID": "sha256:0255089b72be4ea548854ea22a742b9228abd7d1f64b23b6307df3f8feadd2bb",
"IsInternal": "NORMAL",
"KnownExitCode": null,
"KnownPortBindings": null,
"KnownStatus": "STOPPED",
"Links": null,
"LogsAuthStrategy": "",
"Memory": 150,
"Name": "env-stage-mails_task",
"RunDependencies": null,
"SentStatus": "STOPPED",
"TransitionDependencySet": {
"ContainerDependencies": null
},
"desiredStatus": "STOPPED",
"dockerConfig": {
"config": "{}",
"hostConfig": "{\"DnsSearch\":[\"int.stage.example.com\"],\"LogConfig\":{\"Type\":\"awslogs\",\"Config\":{\"awslogs-group\":\"env-stage-workers\",\"awslogs-stream\":\"php/env-stage-mails_task/da8ea8c0-a080-44cd-9a32-6f850b797da3\",\"awslogs-region\":\"eu-central-1\"}},\"MemoryReservation\":157286400,\"CapAdd\":[],\"CapDrop\":[]}",
"version": "1.21"
},
"environment": {
"PHP_MEMORY_LIMIT": "128M",
"TZ": "Europe/Paris"
},
"metadataFileUpdated": false,
"mountPoints": [
{
"containerPath": "/var/www/website/web/uploads",
"readOnly": false,
"sourceVolume": "example-uploads"
}
],
"overrides": {
"command": null
},
"portMappings": [],
"registryAuthentication": {
"ecrAuthData": {
"endpointOverride": "",
"region": "eu-central-1",
"registryId": "123456890123",
"useExecutionRole": false
},
"type": "ecr"
},
"volumesFrom": []
},
{
"ApplyingError": null,
"Command": [
"/usr/bin/php",
"/var/www/website/symfony",
"utils:purgeNotification",
"--env=stage"
],
"Cpu": 0,
"EntryPoint": null,
"Essential": true,
"Image": "repo.amazonaws.com/php:PR-1884-1-stage-cron",
"ImageID": "sha256:0255089b72be4ea548854ea22a742b9228abd7d1f64b23b6307df3f8feadd2bb",
"IsInternal": "NORMAL",
"KnownExitCode": 137,
"KnownPortBindings": null,
"KnownStatus": "STOPPED",
"Links": null,
"LogsAuthStrategy": "",
"Memory": 150,
"Name": "env-stage-utils_purgeNotification",
"RunDependencies": null,
"SentStatus": "STOPPED",
"TransitionDependencySet": {
"ContainerDependencies": null
},
"desiredStatus": "STOPPED",
"dockerConfig": {
"config": "{}",
"hostConfig": "{\"DnsSearch\":[\"int.stage.example.com\"],\"LogConfig\":{\"Type\":\"awslogs\",\"Config\":{\"awslogs-group\":\"env-stage-workers\",\"awslogs-stream\":\"php/env-stage-utils_purgeNotification/da8ea8c0-a080-44cd-9a32-6f850b797da3\",\"awslogs-region\":\"eu-central-1\"}},\"MemoryReservation\":157286400,\"CapAdd\":[],\"CapDrop\":[]}",
"version": "1.21"
},
"environment": {
"PHP_MEMORY_LIMIT": "128M",
"TZ": "Europe/Paris"
},
"metadataFileUpdated": false,
"mountPoints": [
{
"containerPath": "/var/www/website/web/uploads",
"readOnly": false,
"sourceVolume": "example-uploads"
}
],
"overrides": {
"command": null
},
"portMappings": [],
"registryAuthentication": {
"ecrAuthData": {
"endpointOverride": "",
"region": "eu-central-1",
"registryId": "123456890123",
"useExecutionRole": false
},
"type": "ecr"
},
"volumesFrom": []
},
{
"ApplyingError": null,
"Command": [
"/usr/bin/php",
"/var/www/website/symfony",
"mails:dailyWorkshopRecap",
"--env=stage"
],
"Cpu": 0,
"EntryPoint": null,
"Essential": true,
"Image": "repo.amazonaws.com/php:PR-1884-1-stage-cron",
"ImageID": "sha256:0255089b72be4ea548854ea22a742b9228abd7d1f64b23b6307df3f8feadd2bb",
"IsInternal": "NORMAL",
"KnownExitCode": null,
"KnownPortBindings": null,
"KnownStatus": "STOPPED",
"Links": null,
"LogsAuthStrategy": "",
"Memory": 150,
"Name": "env-stage-mails_dailyWorkshopRecap",
"RunDependencies": null,
"SentStatus": "STOPPED",
"TransitionDependencySet": {
"ContainerDependencies": null
},
"desiredStatus": "STOPPED",
"dockerConfig": {
"config": "{}",
"hostConfig": "{\"DnsSearch\":[\"int.stage.example.com\"],\"LogConfig\":{\"Type\":\"awslogs\",\"Config\":{\"awslogs-group\":\"env-stage-workers\",\"awslogs-stream\":\"php/env-stage-mails_dailyWorkshopRecap/da8ea8c0-a080-44cd-9a32-6f850b797da3\",\"awslogs-region\":\"eu-central-1\"}},\"MemoryReservation\":157286400,\"CapAdd\":[],\"CapDrop\":[]}",
"version": "1.21"
},
"environment": {
"PHP_MEMORY_LIMIT": "128M",
"TZ": "Europe/Paris"
},
"metadataFileUpdated": false,
"mountPoints": [
{
"containerPath": "/var/www/website/web/uploads",
"readOnly": false,
"sourceVolume": "example-uploads"
}
],
"overrides": {
"command": null
},
"portMappings": [],
"registryAuthentication": {
"ecrAuthData": {
"endpointOverride": "",
"region": "eu-central-1",
"registryId": "123456890123",
"useExecutionRole": false
},
"type": "ecr"
},
"volumesFrom": []
},
{
"ApplyingError": null,
"Command": [
"/usr/bin/php",
"/var/www/website/symfony",
"mails:weeklyTicketingRecap",
"--env=stage"
],
"Cpu": 0,
"EntryPoint": null,
"Essential": true,
"Image": "repo.amazonaws.com/php:PR-1884-1-stage-cron",
"ImageID": "sha256:0255089b72be4ea548854ea22a742b9228abd7d1f64b23b6307df3f8feadd2bb",
"IsInternal": "NORMAL",
"KnownExitCode": 137,
"KnownPortBindings": null,
"KnownStatus": "STOPPED",
"Links": null,
"LogsAuthStrategy": "",
"Memory": 150,
"Name": "env-stage-mails_weeklyTicketingRecap",
"RunDependencies": null,
"SentStatus": "STOPPED",
"TransitionDependencySet": {
"ContainerDependencies": null
},
"desiredStatus": "STOPPED",
"dockerConfig": {
"config": "{}",
"hostConfig": "{\"DnsSearch\":[\"int.stage.example.com\"],\"LogConfig\":{\"Type\":\"awslogs\",\"Config\":{\"awslogs-group\":\"env-stage-workers\",\"awslogs-stream\":\"php/env-stage-mails_weeklyTicketingRecap/da8ea8c0-a080-44cd-9a32-6f850b797da3\",\"awslogs-region\":\"eu-central-1\"}},\"MemoryReservation\":157286400,\"CapAdd\":[],\"CapDrop\":[]}",
"version": "1.21"
},
"environment": {
"PHP_MEMORY_LIMIT": "128M",
"TZ": "Europe/Paris"
},
"metadataFileUpdated": false,
"mountPoints": [
{
"containerPath": "/var/www/website/web/uploads",
"readOnly": false,
"sourceVolume": "example-uploads"
}
],
"overrides": {
"command": null
},
"portMappings": [],
"registryAuthentication": {
"ecrAuthData": {
"endpointOverride": "",
"region": "eu-central-1",
"registryId": "123456890123",
"useExecutionRole": false
},
"type": "ecr"
},
"volumesFrom": []
},
{
"ApplyingError": null,
"Command": [
"/usr/bin/php",
"/var/www/website/symfony",
"utils:manageAssoAccount",
"--env=stage"
],
"Cpu": 0,
"EntryPoint": null,
"Essential": true,
"Image": "repo.amazonaws.com/php:PR-1884-1-stage-cron",
"ImageID": "sha256:0255089b72be4ea548854ea22a742b9228abd7d1f64b23b6307df3f8feadd2bb",
"IsInternal": "NORMAL",
"KnownExitCode": null,
"KnownPortBindings": null,
"KnownStatus": "STOPPED",
"Links": null,
"LogsAuthStrategy": "",
"Memory": 150,
"Name": "env-stage-utils_manageAssoAccount",
"RunDependencies": null,
"SentStatus": "STOPPED",
"TransitionDependencySet": {
"ContainerDependencies": null
},
"desiredStatus": "STOPPED",
"dockerConfig": {
"config": "{}",
"hostConfig": "{\"DnsSearch\":[\"int.stage.example.com\"],\"LogConfig\":{\"Type\":\"awslogs\",\"Config\":{\"awslogs-group\":\"env-stage-workers\",\"awslogs-stream\":\"php/env-stage-utils_manageAssoAccount/da8ea8c0-a080-44cd-9a32-6f850b797da3\",\"awslogs-region\":\"eu-central-1\"}},\"MemoryReservation\":157286400,\"CapAdd\":[],\"CapDrop\":[]}",
"version": "1.21"
},
"environment": {
"PHP_MEMORY_LIMIT": "128M",
"TZ": "Europe/Paris"
},
"metadataFileUpdated": false,
"mountPoints": [
{
"containerPath": "/var/www/website/web/uploads",
"readOnly": false,
"sourceVolume": "example-uploads"
}
],
"overrides": {
"command": null
},
"portMappings": [],
"registryAuthentication": {
"ecrAuthData": {
"endpointOverride": "",
"region": "eu-central-1",
"registryId": "123456890123",
"useExecutionRole": false
},
"type": "ecr"
},
"volumesFrom": []
},
{
"ApplyingError": null,
"Command": [
"/usr/bin/php",
"/var/www/website/symfony",
"mails:profileNotification",
"--env=stage"
],
"Cpu": 0,
"EntryPoint": null,
"Essential": true,
"Image": "repo.amazonaws.com/php:PR-1884-1-stage-cron",
"ImageID": "sha256:0255089b72be4ea548854ea22a742b9228abd7d1f64b23b6307df3f8feadd2bb",
"IsInternal": "NORMAL",
"KnownExitCode": 0,
"KnownPortBindings": null,
"KnownStatus": "STOPPED",
"Links": null,
"LogsAuthStrategy": "",
"Memory": 150,
"Name": "env-stage-mails_profileNotification",
"RunDependencies": null,
"SentStatus": "STOPPED",
"TransitionDependencySet": {
"ContainerDependencies": null
},
"desiredStatus": "STOPPED",
"dockerConfig": {
"config": "{}",
"hostConfig": "{\"DnsSearch\":[\"int.stage.example.com\"],\"LogConfig\":{\"Type\":\"awslogs\",\"Config\":{\"awslogs-group\":\"env-stage-workers\",\"awslogs-stream\":\"php/env-stage-mails_profileNotification/da8ea8c0-a080-44cd-9a32-6f850b797da3\",\"awslogs-region\":\"eu-central-1\"}},\"MemoryReservation\":157286400,\"CapAdd\":[],\"CapDrop\":[]}",
"version": "1.21"
},
"environment": {
"PHP_MEMORY_LIMIT": "128M",
"TZ": "Europe/Paris"
},
"metadataFileUpdated": false,
"mountPoints": [
{
"containerPath": "/var/www/website/web/uploads",
"readOnly": false,
"sourceVolume": "example-uploads"
}
],
"overrides": {
"command": null
},
"portMappings": [],
"registryAuthentication": {
"ecrAuthData": {
"endpointOverride": "",
"region": "eu-central-1",
"registryId": "123456890123",
"useExecutionRole": false
},
"type": "ecr"
},
"volumesFrom": []
}
],
"DesiredStatus": "STOPPED",
"ENI": null,
"ExecutionStoppedAt": "2018-05-21T13:26:15.969730897Z",
"Family": "env-stage-cron-1",
"KnownStatus": "STOPPED",
"KnownTime": "2018-05-21T13:26:46.556684068Z",
"MemoryCPULimitsEnabled": true,
"PullStartedAt": "2018-05-21T13:26:13.842619731Z",
"PullStoppedAt": "2018-05-21T13:26:14.100596311Z",
"SentStatus": "STOPPED",
"StartSequenceNumber": 1746,
"StopSequenceNumber": 0,
"Version": "179",
"executionCredentialsID": "",
"volumes": [
{
"host": {
"sourcePath": "/var/example-uploads"
},
"name": "example-uploads"
}
]
}
Container info for 257e5551a323:
[
{
"Id": "257e5551a32366def005dd45979d0f19cdb30f09f6302af19a1c7b29238e0dae",
"Created": "2018-05-21T13:26:14.087419271Z",
"Path": "/usr/local/bin/entrypoint.sh",
"Args": [
"/usr/bin/php",
"/var/www/example/symfony",
"mails:task",
"--env=stage"
],
"State": {
"Status": "running",
"Running": true,
"Paused": false,
"Restarting": false,
"OOMKilled": false,
"Dead": false,
"Pid": 31682,
"ExitCode": 0,
"Error": "",
"StartedAt": "2018-05-21T13:26:16.284080656Z",
"FinishedAt": "0001-01-01T00:00:00Z"
},
"Image": "sha256:0255089b72be4ea548854ea22a742b9228abd7d1f64b23b6307df3f8feadd2bb",
"ResolvConfPath": "/var/lib/docker/containers/257e5551a32366def005dd45979d0f19cdb30f09f6302af19a1c7b29238e0dae/resolv.conf",
"HostnamePath": "/var/lib/docker/containers/257e5551a32366def005dd45979d0f19cdb30f09f6302af19a1c7b29238e0dae/hostname",
"HostsPath": "/var/lib/docker/containers/257e5551a32366def005dd45979d0f19cdb30f09f6302af19a1c7b29238e0dae/hosts",
"LogPath": "",
"Name": "/ecs-example-stage-cron-1-179-example-stage-mailstask-dea2dccf9294c0bc1c00",
"RestartCount": 0,
"Driver": "devicemapper",
"Platform": "linux",
"MountLabel": "",
"ProcessLabel": "",
"AppArmorProfile": "",
"ExecIDs": null,
"HostConfig": {
"Binds": [
"/var/example-uploads:/var/www/example/web/uploads"
],
"ContainerIDFile": "",
"LogConfig": {
"Type": "awslogs",
"Config": {
"awslogs-group": "example-stage-workers",
"awslogs-region": "eu-central-1",
"awslogs-stream": "php/example-stage-mails_task/da8ea8c0-a080-44cd-9a32-6f850b797da3"
}
},
"NetworkMode": "default",
"PortBindings": null,
"RestartPolicy": {
"Name": "",
"MaximumRetryCount": 0
},
"AutoRemove": false,
"VolumeDriver": "",
"VolumesFrom": null,
"CapAdd": null,
"CapDrop": null,
"Dns": null,
"DnsOptions": null,
"DnsSearch": [
"int.stage.example.com"
],
"ExtraHosts": null,
"GroupAdd": null,
"IpcMode": "shareable",
"Cgroup": "",
"Links": null,
"OomScoreAdj": 0,
"PidMode": "",
"Privileged": false,
"PublishAllPorts": false,
"ReadonlyRootfs": false,
"SecurityOpt": null,
"UTSMode": "",
"UsernsMode": "",
"ShmSize": 67108864,
"Runtime": "runc",
"ConsoleSize": [
0,
0
],
"Isolation": "",
"CpuShares": 2,
"Memory": 157286400,
"NanoCpus": 0,
"CgroupParent": "/ecs/da8ea8c0-a080-44cd-9a32-6f850b797da3",
"BlkioWeight": 0,
"BlkioWeightDevice": null,
"BlkioDeviceReadBps": null,
"BlkioDeviceWriteBps": null,
"BlkioDeviceReadIOps": null,
"BlkioDeviceWriteIOps": null,
"CpuPeriod": 0,
"CpuQuota": 0,
"CpuRealtimePeriod": 0,
"CpuRealtimeRuntime": 0,
"CpusetCpus": "",
"CpusetMems": "",
"Devices": null,
"DeviceCgroupRules": null,
"DiskQuota": 0,
"KernelMemory": 0,
"MemoryReservation": 157286400,
"MemorySwap": 314572800,
"MemorySwappiness": 0,
"OomKillDisable": false,
"PidsLimit": 0,
"Ulimits": [
{
"Name": "nofile",
"Hard": 4096,
"Soft": 1024
}
],
"CpuCount": 0,
"CpuPercent": 0,
"IOMaximumIOps": 0,
"IOMaximumBandwidth": 0
},
"GraphDriver": {
"Data": {
"DeviceId": "6798",
"DeviceName": "docker-202:1-263286-a55538d948074e8082593e29d08ea4d2d57bd81fe3efa500d2f2d4a54c080ad8",
"DeviceSize": "10737418240"
},
"Name": "devicemapper"
},
"Mounts": [
{
"Type": "bind",
"Source": "/var/example-uploads",
"Destination": "/var/www/example/web/uploads",
"Mode": "",
"RW": true,
"Propagation": "rprivate"
},
{
"Type": "volume",
"Name": "ef69e07b38ac24fb9d371db44261cb259f8627b99a0bcbb1976a039c1c31deb1",
"Source": "/var/lib/docker/volumes/ef69e07b38ac24fb9d371db44261cb259f8627b99a0bcbb1976a039c1c31deb1/_data",
"Destination": "/var/composer-cache",
"Driver": "local",
"Mode": "",
"RW": true,
"Propagation": ""
}
],
"Config": {
"Hostname": "257e5551a323",
"Domainname": "",
"User": "",
"AttachStdin": false,
"AttachStdout": false,
"AttachStderr": false,
"ExposedPorts": {
"9000/tcp": {}
},
"Tty": false,
"OpenStdin": false,
"StdinOnce": false,
"Env": [
"PHP_MEMORY_LIMIT=128M",
"TZ=Europe/Paris",
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
],
"Cmd": [
"/usr/bin/php",
"/var/www/example/symfony",
"mails:task",
"--env=stage"
],
"Image": "124766415242.dkr.ecr.eu-central-1.amazonaws.com/php:PR-1884-1-stage-cron",
"Volumes": {
"/var/composer-cache": {},
"/var/www/example/web/uploads": {}
},
"WorkingDir": "/var/www/example",
"Entrypoint": [
"/usr/local/bin/entrypoint.sh"
],
"OnBuild": null,
"Labels": {
"com.amazonaws.ecs.cluster": "example-stage",
"com.amazonaws.ecs.container-name": "example-stage-mails_task",
"com.amazonaws.ecs.task-arn": "arn:aws:ecs:eu-central-1:124766415242:task/da8ea8c0-a080-44cd-9a32-6f850b797da3",
"com.amazonaws.ecs.task-definition-family": "example-stage-cron-1",
"com.amazonaws.ecs.task-definition-version": "179"
}
},
"NetworkSettings": {
"Bridge": "",
"SandboxID": "c9a25205efbd78c301295f7d950e30468b4b80259c6e99703ec947299fe23612",
"HairpinMode": false,
"LinkLocalIPv6Address": "",
"LinkLocalIPv6PrefixLen": 0,
"Ports": {
"9000/tcp": null
},
"SandboxKey": "/var/run/docker/netns/c9a25205efbd",
"SecondaryIPAddresses": null,
"SecondaryIPv6Addresses": null,
"EndpointID": "ee47c3b433f27926a17b9c9c4c222f00dfa14a3614c337b71c9f1f3b18c704d1",
"Gateway": "172.17.0.1",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"IPAddress": "172.17.0.7",
"IPPrefixLen": 16,
"IPv6Gateway": "",
"MacAddress": "02:42:ac:11:00:07",
"Networks": {
"bridge": {
"IPAMConfig": null,
"Links": null,
"Aliases": null,
"NetworkID": "dc279972e1950c8f86f380f434f0eb73663c9ab5f40809db776d361833b95f20",
"EndpointID": "ee47c3b433f27926a17b9c9c4c222f00dfa14a3614c337b71c9f1f3b18c704d1",
"Gateway": "172.17.0.1",
"IPAddress": "172.17.0.7",
"IPPrefixLen": 16,
"IPv6Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"MacAddress": "02:42:ac:11:00:07",
"DriverOpts": null
}
}
}
}
]
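The mismatch between the two views can be checked mechanically. The following is a minimal sketch, not part of any ECS tooling: it takes task entries shaped like the ECS data above and container entries shaped like the docker inspect output above, and reports containers that Docker says are running although their task is known as STOPPED. Tasks are matched on the task ID (the part after "task/") rather than the full ARN, because the account IDs in the anonymized dumps differ. The function names are hypothetical.

```python
def task_id(arn):
    """Return the trailing task ID of a task ARN."""
    return arn.rsplit("/", 1)[-1]


def orphaned_containers(agent_tasks, inspected):
    """Short IDs of running containers whose task is recorded as STOPPED."""
    stopped = {task_id(t["Arn"]) for t in agent_tasks
               if t.get("KnownStatus") == "STOPPED"}
    orphans = []
    for container in inspected:
        labels = (container.get("Config") or {}).get("Labels") or {}
        arn = labels.get("com.amazonaws.ecs.task-arn")
        running = (container.get("State") or {}).get("Running", False)
        if arn and running and task_id(arn) in stopped:
            orphans.append(container["Id"][:12])
    return orphans


# Trimmed copies of the two dumps above.
agent_tasks = [{
    "Arn": "arn:aws:ecs:eu-central-1:123456890123:task/da8ea8c0-a080-44cd-9a32-6f850b797da3",
    "KnownStatus": "STOPPED",
}]
inspected = [{
    "Id": "257e5551a32366def005dd45979d0f19cdb30f09f6302af19a1c7b29238e0dae",
    "State": {"Running": True},
    "Config": {"Labels": {
        "com.amazonaws.ecs.task-arn":
            "arn:aws:ecs:eu-central-1:124766415242:task/da8ea8c0-a080-44cd-9a32-6f850b797da3",
    }},
}]
print(orphaned_containers(agent_tasks, inspected))
```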
@gileri I tried to reproduce this issue with a service that has desired count 1, minimumHealthyPercent=50 and maximumHealthyPercent=100, but wasn't able to. It would be helpful if you could share the service name, the maximumHealthyPercent and minimumHealthyPercent values, and the agent logs from an instance that experienced this issue. You can send them to me: penyin (at) amazon.com
Thanks,
Peng
Sorry, I formatted the code blocks incorrectly; they are now fixed and include:
Minimum healthy percent = 0%
Maximum healthy percent = 100%
Desired count = 1
I removed sensitive information from the ecs-agent logs, which are downloadable here.
The service name is example-stage-cron-1.
Hi @gileri,
Sorry for my late response.
I investigated the logs and found the root cause: the task is started and then stopped immediately. The agent sets the container to stopped, and then Docker sends a change event to the agent indicating that the container is running. The agent is supposed to stop the container again in this case, which is handled by this goroutine. However, the goroutine only handles this case once, for some reason.
I will mark it as a bug.
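As an illustration of the expected behavior only (this is not the agent's code, and every name in it is hypothetical), a handler for Docker events should issue a stop each time a container is reported as running while its desired status is STOPPED, not just the first time:

```python
class FakeDocker:
    """Hypothetical stand-in for the Docker daemon, used only for this sketch."""

    def __init__(self):
        self.stop_calls = []

    def stop(self, container_id):
        self.stop_calls.append(container_id)


def reconcile(docker, desired_status, container_id, reported_status):
    """Handle one Docker event; return True if a stop was issued."""
    if desired_status == "STOPPED" and reported_status == "RUNNING":
        # Docker reports the container as up although a stop was requested:
        # stop it again, however many times this has already happened.
        docker.stop(container_id)
        return True
    return False


docker = FakeDocker()
# Two late "running" events for the same container must lead to two stops.
for event in ["RUNNING", "RUNNING"]:
    reconcile(docker, "STOPPED", "257e5551a323", event)
print(len(docker.stop_calls))
```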
Thanks,
Haikuo
Thank you @haikuoliu for the analysis! I'm not familiar with Go or the ECS code, but I can certainly provide additional debug logs or run tests.
@gileri
I think the logs that you provided are enough; the bug seems clear from them, and we will let you know when we fix it.
I saw in the logs that the containers in your task get stopped very quickly after starting, which triggers the bug that prevents a container from being stopped. Avoiding this situation would be a mitigation.
Thanks for bringing this to our attention!
Closing this issue; the fix is included in the latest release.