amazon-ecs-agent: Automated image cleanup is not working

Created on 12 Oct 2016 · 9 comments · Source: aws/amazon-ecs-agent

We use the latest version of the ECS agent (1.13.0) on the latest ECS-optimized AMI, but our images are not cleaned up on the server.
For example, at the time of writing, image 4d270ab01a37 was created 6 hours ago:

[ec2-user@ip-10-1-0-111 ~]$ docker images | grep 4d270ab01a37
<none>        <none>              4d270ab01a37        6 hours ago         166 MB

The ECS agent log does not contain much useful information:

[ec2-user@ip-10-1-0-111 ~]$ cat /var/log/ecs/ecs-agent.log.2016* | grep 4d270ab01a37
2016-10-12T08:16:12Z [INFO] Adding image name- 685238703098.dkr.ecr.us-east-1.amazonaws.com/academy/code:latest to Image state- sha256:4d270ab01a37680b1b9a6b8a557acf97309248c0c7dedc70fe8bf510b847d00c
2016-10-12T08:16:12Z [INFO] Updating container reference code in Image State - sha256:4d270ab01a37680b1b9a6b8a557acf97309248c0c7dedc70fe8bf510b847d00c
2016-10-12T08:16:12Z [INFO] Updating container reference code in Image State - sha256:4d270ab01a37680b1b9a6b8a557acf97309248c0c7dedc70fe8bf510b847d00c
2016-10-12T08:16:18Z [INFO] Updating container reference code in Image State - sha256:4d270ab01a37680b1b9a6b8a557acf97309248c0c7dedc70fe8bf510b847d00c
2016-10-12T08:16:18Z [INFO] Updating container reference code in Image State - sha256:4d270ab01a37680b1b9a6b8a557acf97309248c0c7dedc70fe8bf510b847d00c
2016-10-12T08:16:19Z [INFO] Updating container reference code in Image State - sha256:4d270ab01a37680b1b9a6b8a557acf97309248c0c7dedc70fe8bf510b847d00c
2016-10-12T08:16:19Z [INFO] Updating container reference code in Image State - sha256:4d270ab01a37680b1b9a6b8a557acf97309248c0c7dedc70fe8bf510b847d00c
2016-10-12T08:16:23Z [INFO] Updating container reference code in Image State - sha256:4d270ab01a37680b1b9a6b8a557acf97309248c0c7dedc70fe8bf510b847d00c
2016-10-12T08:17:36Z [INFO] Updating container reference code in Image State - sha256:4d270ab01a37680b1b9a6b8a557acf97309248c0c7dedc70fe8bf510b847d00c
2016-10-12T08:20:27Z [INFO] Updating container reference code in Image State - sha256:4d270ab01a37680b1b9a6b8a557acf97309248c0c7dedc70fe8bf510b847d00c

and there are no records related to this image ID in the Docker log.

Can you see any reason why this is happening?

kind/bug


All 9 comments

@GeyseR Image cleanup is based on the time the image was pulled, not the time it was created. As described in the documentation, an image is removed during a cleanup cycle only if it is not used by any container and it was pulled more than ECS_IMAGE_MINIMUM_CLEANUP_AGE ago (default: 1 hour). The agent runs this cleanup periodically, at the interval set by ECS_IMAGE_CLEANUP_INTERVAL (default: 30 minutes).
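The eligibility rule above can be sketched as a small predicate. This is a minimal, hypothetical model: `ImageState` and `is_eligible` are illustrative names, not the agent's API (the agent itself is written in Go). It only encodes the two conditions stated in the reply: no container references the image, and the image was pulled more than the minimum cleanup age ago.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


# Hypothetical, simplified model of the per-image state the agent tracks;
# NOT the agent's real data structure.
@dataclass
class ImageState:
    image_id: str
    pulled_at: datetime
    # Containers (running or stopped) that still reference this image.
    container_refs: set = field(default_factory=set)


def is_eligible(state: ImageState, now: datetime,
                min_cleanup_age: timedelta = timedelta(hours=1)) -> bool:
    """Eligible only if no container uses the image and it was pulled more
    than ECS_IMAGE_MINIMUM_CLEANUP_AGE ago (measured from pull time,
    not from the image's creation time)."""
    unused = not state.container_refs
    old_enough = now - state.pulled_at > min_cleanup_age
    return unused and old_enough
```

With the timestamps from the log, an image pulled at 08:16:12 is still not eligible six hours later as long as the `code` container references it; once that reference is gone, it becomes eligible at the next cleanup cycle.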

From the logs you provided, the image was pulled at 2016-10-12T08:16:12Z, and it seems the containers using this image have not been removed, so the image will not be removed either. The agent removes the containers of stopped tasks based on the environment variable ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION.
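Because container removal and image cleanup run on separate timers, an image can outlive its last task by several hours. The sketch below does that arithmetic under stated simplifications: containers are assumed to be removed exactly `task_cleanup_wait` after the task stops, and cleanup cycles are assumed to tick every `cleanup_interval` from `first_cycle_at`. The 3-hour default for ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION is taken from the agent README and should be checked for your agent version; the function name is hypothetical.

```python
from datetime import datetime, timedelta


def earliest_image_removal(pulled_at: datetime,
                           task_stopped_at: datetime,
                           first_cycle_at: datetime,
                           task_cleanup_wait: timedelta = timedelta(hours=3),
                           min_cleanup_age: timedelta = timedelta(hours=1),
                           cleanup_interval: timedelta = timedelta(minutes=30)) -> datetime:
    """Return the first cleanup cycle strictly after both conditions hold:
    the task's containers are gone and the image is old enough.

    Simplified model, not taken from the agent source."""
    ready_at = max(task_stopped_at + task_cleanup_wait,
                   pulled_at + min_cleanup_age)
    cycle = first_cycle_at
    # Advance cycle by cycle until we are past the moment the image qualifies.
    while cycle <= ready_at:
        cycle += cleanup_interval
    return cycle
```

For example, an image pulled at 08:16:12 whose task stopped at 09:00:00, with cleanup cycles at 08:25:50, 08:55:50 and so on, would not be removed before the 12:25:50 cycle, because its container lingers until 12:00:00.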

The documentation describes all the environment variable settings for image cleanup in detail. Please reach out to us if you have any other concerns.

Hi, @richardpen!

seems the containers using this image isn't removed, so the image won't be removed

We have a lot of dangling images:

[ec2-user@ip-10-1-0-111 ~]$ docker images -f dangling=true | wc -l
44

and most of them are not used by any container, running or stopped.
For example, 4d270ab01a37 is still here, but not used:

[ec2-user@ip-10-1-0-111 ~]$ docker ps -a | grep 4d270ab01a37

So the logic described in the docs for removing obsolete images still seems broken for us.
Maybe I can help you solve this issue in some way?
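Grepping `docker ps -a` for a short ID depends on how the IMAGE column is displayed. A more direct check is to compare full image IDs: the sketch below lists dangling images and removes from that list every image ID that some container (running or stopped) was created from, using only standard docker CLI commands (`docker images -q --no-trunc -f dangling=true`, `docker ps -a -q`, `docker inspect --format '{{.Image}}'`). The helper names are hypothetical, and the query function must run as a user allowed to talk to the Docker daemon (for example via `sudo`).

```python
import subprocess


def docker_lines(*args):
    """Run a docker CLI command and return its non-empty output lines."""
    out = subprocess.run(["docker", *args], check=True,
                         capture_output=True, text=True).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


def unreferenced_images(dangling_ids, container_image_ids):
    """Return dangling image IDs that no container (running or stopped) uses."""
    used = set(container_image_ids)
    return sorted(set(dangling_ids) - used)


def find_unreferenced_dangling_images():
    """Query the local Docker daemon (requires the docker CLI)."""
    # Full sha256:... IDs of dangling (untagged) images.
    dangling = docker_lines("images", "-q", "--no-trunc", "-f", "dangling=true")
    # IDs of all containers, including stopped ones.
    containers = docker_lines("ps", "-a", "-q")
    # `docker inspect` prints each container's full image ID (sha256:...).
    image_ids = (docker_lines("inspect", "--format", "{{.Image}}", *containers)
                 if containers else [])
    return unreferenced_images(dangling, image_ids)
```

Calling `find_unreferenced_dangling_images()` on the instance gives the images that, by the documented rule, should eventually be picked up by cleanup; anything missing from that list is still pinned by a container.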

@GeyseR Image cleanup depends on several parameters. To help me understand what's happening in the agent, could you provide the following information?

  • ecs-agent logs in /var/log/ecs/
  • docker inspect ecs-agent

If you're not comfortable publishing them here, please send them to me at penyin (at) amazon.com.
Thanks

@GeyseR Thanks for providing the logs. After looking at them, we found a scenario in which image cleanup does not work: tasks that are updated to an image with the same name but a different image ID. We are working on a fix and will let you know when there is an update.
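To make that scenario concrete, here is a minimal, hypothetical model of what happens on the instance when a task is updated to a new image pushed under the same `name:tag`: Docker moves the tag to the new image ID, and the old ID is left untagged (shown as `<none>`). It only illustrates the situation described above; it is not the agent's code and does not reproduce the actual fix.

```python
class LocalImages:
    """Hypothetical model of tags on a Docker host (not the agent's code)."""

    def __init__(self):
        # image ID -> set of repo:tag names currently pointing at it
        self.names = {}

    def pull(self, name, image_id):
        # A tag can point at only one image: any older image holding
        # `name` loses it when a different ID is pulled under that name.
        for other_id, names in self.names.items():
            if other_id != image_id:
                names.discard(name)
        self.names.setdefault(image_id, set()).add(name)

    def untagged(self):
        # IDs that `docker images` would show with <none> as the tag.
        return sorted(i for i, n in self.names.items() if not n)
```

After pulling `academy/code:latest` as `sha256:old` and then again as `sha256:new`, only the new ID carries the name; the old ID is still a separate image on disk that cleanup has to track by its ID.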

@GeyseR We released v1.13.1 today; this issue should be fixed in that version. I'm closing this issue now; feel free to reopen it if you still run into this problem.

Thanks!

Hello @richardpen

I seem to have the same problem.
When I push an image with the same tag, the old image is still kept in ECS and its imageTag is replaced with 'null'.

If I would like to clean up those images with a 'null' imageTag in ECS, do I have to install 'amazon-ecs-agent' on my client side?
Is there some way to do this on the ECS server side?

Thanks very much.

Hong

@yanhongwang Please correct me if I'm wrong, but I think you want to delete the "null" images in ECR, right? I'd suggest taking a look at Amazon ECR lifecycle policies, where you can use the tag status to manage untagged images.
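A minimal sketch of such a lifecycle policy, built in Python: a single rule that selects images with `tagStatus` `untagged` and expires them a given number of days after they were pushed. The 14-day value and the function name are example choices, not recommendations. Note that the policy affects images stored in the ECR repository, not copies already pulled onto container instances.

```python
import json


def untagged_expiry_policy(days: int = 14) -> str:
    """Build an ECR lifecycle policy that expires untagged images
    `days` days after they were pushed (example value)."""
    policy = {
        "rules": [
            {
                "rulePriority": 1,
                "description": f"Expire untagged images older than {days} days",
                "selection": {
                    "tagStatus": "untagged",
                    "countType": "sinceImagePushed",
                    "countUnit": "days",
                    "countNumber": days,
                },
                "action": {"type": "expire"},
            }
        ]
    }
    return json.dumps(policy, indent=2)


# Save the output to policy.json and apply it, for example with:
#   aws ecr put-lifecycle-policy --repository-name <repo> \
#       --lifecycle-policy-text file://policy.json
```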

Thanks,
Peng

Hello @richardpen,

This still appears to be an issue for me. I believe what @yanhongwang was referring to was not the "null" image in ECR, but rather the "null" image that appears on the EC2 instance supporting the ECS cluster. Below is the output from an SSH session on an EC2 instance in my ECS cluster. You can see three separate image deploys. All of the images come from the same repository and share the same tag, stage. When I do a new deploy, the tag on the previous deploy's image changes to <none> on the EC2 instance. All of my containers are running the most recent image, which is tagged stage below.

[ec2-user@ip-000-00-00-000 ~]$ sudo docker images
REPOSITORY                                                   TAG                 IMAGE ID            CREATED             SIZE
000000000000.dkr.ecr.us-east-2.amazonaws.com/test-testtest   stage               5d6d3a8c6679        18 minutes ago      1.8GB
000000000000.dkr.ecr.us-east-2.amazonaws.com/test-testtest   <none>              fa7865768de3        2 hours ago         1.8GB
000000000000.dkr.ecr.us-east-2.amazonaws.com/test-testtest   <none>              530ef5a1efa3        2 days ago          1.8GB
amazon/amazon-ecs-agent                                      latest              e8693c7b1f0c        3 months ago        58.7MB

I have also included the ECS agent log output from the EC2 instance. It states that there are no eligible images for deletion, even though I believe the Docker images with TAG <none> should be eligible, since no containers are using them.

[ec2-user@ip-000-00-00-000 ~]$ cat /var/log/ecs/ecs-agent.log | grep eligible
level=info time=2021-03-22T19:25:50Z msg="Begin building map of eligible unused images for deletion" module=docker_image_manager.go
level=info time=2021-03-22T19:25:50Z msg="No eligible images for deletion for this cleanup cycle" module=docker_image_manager.go
level=info time=2021-03-22T19:25:50Z msg="End of eligible images for deletion: No more eligible images for deletion; Still have 2 image states being managed" module=docker_image_manager.go
level=info time=2021-03-22T19:55:50Z msg="Begin building map of eligible unused images for deletion" module=docker_image_manager.go
level=info time=2021-03-22T19:55:50Z msg="No eligible images for deletion for this cleanup cycle" module=docker_image_manager.go
level=info time=2021-03-22T19:55:50Z msg="End of eligible images for deletion: No more eligible images for deletion; Still have 3 image states being managed" module=docker_image_manager.go
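When debugging, it helps to track how many image states the agent says it is managing per cycle and compare that with `docker images`. The agent log lines above are in `key=value` form with a quoted `msg`, so they can be split with `shlex`. This sketch assumes the message wording shown in the excerpt ("Still have N image states being managed"); other agent versions may word it differently, and the function names are hypothetical.

```python
import re
import shlex


def parse_logfmt(line: str) -> dict:
    """Split a key=value log line; double-quoted values may contain spaces."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


MANAGED = re.compile(r"Still have (\d+) image states being managed")


def managed_image_counts(lines):
    """Return (timestamp, count) for each end-of-cleanup-cycle message."""
    result = []
    for line in lines:
        fields = parse_logfmt(line)
        match = MANAGED.search(fields.get("msg", ""))
        if match:
            result.append((fields.get("time"), int(match.group(1))))
    return result
```

Fed the excerpt above, it yields a count of 2 at 19:25:50 and 3 at 19:55:50, which can be set against the three `test-testtest` images on the instance.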

My ECS dashboard shows that I am running agent version 1.50.2 and Docker version 19.03.13-ce, so I believe the fix discussed earlier in this thread should be present in the agent I am running.

I have also included my /etc/ecs/ecs.config file below.

ECS_CLUSTER=XXXX-XXXXXXXX-XXXXX
ECS_RESERVED_MEMORY=256
ECS_DISABLE_IMAGE_CLEANUP=false
EOF
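Since this config sets only ECS_DISABLE_IMAGE_CLEANUP, every other cleanup setting falls back to its default. The sketch below overlays a config file's `KEY=VALUE` lines onto those defaults so the effective values are explicit. The 30m and 1h defaults come from this thread; the `false` default for ECS_DISABLE_IMAGE_CLEANUP and the 3h default for ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION are taken from the agent README and should be verified for your version. The parser simply skips lines without `=` (such as the trailing `EOF`); it does not reflect how the agent itself reads the file.

```python
# Assumed defaults (see lead-in); verify against your agent version's README.
CLEANUP_DEFAULTS = {
    "ECS_DISABLE_IMAGE_CLEANUP": "false",
    "ECS_IMAGE_CLEANUP_INTERVAL": "30m",
    "ECS_IMAGE_MINIMUM_CLEANUP_AGE": "1h",
    "ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION": "3h",
}


def effective_cleanup_settings(config_text: str) -> dict:
    """Overlay cleanup-related KEY=VALUE lines from ecs.config onto defaults.

    Blank lines, comments, and lines without '=' are skipped; keys not
    related to cleanup (e.g. ECS_CLUSTER) are ignored."""
    settings = dict(CLEANUP_DEFAULTS)
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key in settings:
            settings[key] = value
    return settings
```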

Any thoughts or suggestions on how to debug this further would be greatly appreciated. While debugging, I have been reading these docs as well as these docs.

_EDIT: 2021-06-08
This problem ended up being related to Docker containers. The instances I was running this on had dangling Docker containers which, even though they were stopped, kept the automated ECS cleanup process from deleting the untagged Docker images. If you have dangling Docker containers, make sure to delete them, and the ECS automated cleanup process should take care of the images._
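As the edit notes, a stopped container still counts as a reference to its image. The sketch below finds which images are pinned by non-running containers, using the JSON that `docker inspect` returns for containers (each object has `Id`, `Image` with the full `sha256:` ID, and `State.Running`). Function names are hypothetical. Those containers can then be removed with `docker rm <id>`; `docker container prune` removes all stopped containers at once, including any that ECS does not manage, so review the list first.

```python
import json
import subprocess


def images_held_by_stopped(containers):
    """Map image ID -> IDs of non-running containers that still reference it.

    `containers` has the shape of `docker inspect` output for containers."""
    held = {}
    for c in containers:
        if not c["State"]["Running"]:
            held.setdefault(c["Image"], []).append(c["Id"])
    return held


def inspect_all_containers():
    """Return `docker inspect` data for every container, running or stopped."""
    ids = subprocess.run(["docker", "ps", "-a", "-q"], check=True,
                         capture_output=True, text=True).stdout.split()
    if not ids:
        return []
    out = subprocess.run(["docker", "inspect", *ids], check=True,
                         capture_output=True, text=True).stdout
    return json.loads(out)
```

On the instance, `images_held_by_stopped(inspect_all_containers())` shows which `<none>` images are blocked and by which containers.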
