Amazon-ecs-agent: Image Removal

Created on 18 Jun 2015 · 32 Comments · Source: aws/amazon-ecs-agent

Howdy,

Currently the ECS Agent does not remove old, unused Docker images. Over time, this can fill up the disk on the container instance.

kind/enhancement

All 32 comments

@JasonSwindle if it helps this is what we're using for now https://github.com/meltwater/docker-cleanup. The main problem is you can't schedule containers like this 1:1 with each host, so we've been adding a bunch with docker run but otherwise it works pretty well

EDIT: It does work pretty well, but it has a bit of a gotcha: when you restart dockerd, it may remove a container or two before they start, which is not really ideal.

+1 for this feature. Old unused containers should be cleaned up when new ones are being used. The previous one should be preserved for rapid rollback.

+1 for image removal

+1

+1 exactly what I wondered about today!

+1, but minor (docker-cleanup and friends are a good workaround)

+1, the issue is much more apparent on smaller instances like the RancherOS AMI with 8 GB of space. We tried using tutumcloud/cleanup at first, but it ended up breaking something with Docker itself. meltwater/docker-cleanup works like a charm.

+1, cleanup should be an option for ecs agent

+1

+1

This Task Definition should help in the clean-up process.

ECS should upgrade Docker to version 1.10, which doesn't rely on intermediary layers being present. We are seeing every intermediary layer treated as a dependency, so we can't delete old images. A 1 GB deployment eventually fills up all 20 GB of storage, making deployments impossible.
Agent version 1.8.1
Docker version 1.9.1

+1, this is really needed.

+1 please, please go for it!

+1 This just bit me in a production environment: we ran out of logical volume space and dmeventd went nuts, eating loads of CPU for days on end. Manually clearing out old unused images fixed the issue. I need this to happen automatically in the agent!

Spoke with AWS support. Their suggested approach for now is to run ecs-agent version 1.8.1 and add ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION=5m, either in /etc/ecs/ecs.config or as a -e option when you start the agent with docker run,

and then have a crontab job on the host, something like:
0 0 * * * docker images -q | xargs -l --no-run-if-empty docker rmi

which will remove any images not being used by running containers every day at 12am. You can set it to run every 5m to follow a hyperactive cleanup task if you want.

For reference,
ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION specifies how long the agent waits after a task stops before cleaning up the "dead" task's containers. The default is 3 hours and the minimum respected value is 1m; anything less is not possible.

I didn't bother setting this, because even with the 3 hour default my stale images won't build up too much between runs of the 24hr cron job.
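For anyone who does want to set the value in /etc/ecs/ecs.config, here is a minimal sketch of a helper that writes a single KEY=VALUE line into that file. The function name set_ecs_option is hypothetical (it is not part of the agent or any AWS tooling), and it assumes the file uses plain KEY=VALUE lines.

```shell
# Hypothetical helper: set KEY=VALUE in an ECS agent config file.
# Any existing assignment for KEY is replaced; other lines are kept.
set_ecs_option() {
    config_file=$1
    key=$2
    value=$3
    touch "$config_file"
    tmp=$(mktemp)
    # Keep every line except an earlier assignment of the same key.
    grep -v "^${key}=" "$config_file" > "$tmp" || true
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    cat "$tmp" > "$config_file"
    rm -f "$tmp"
}

# Example (needs root on a real instance):
# set_ecs_option /etc/ecs/ecs.config ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION 5m
```

The agent most likely only reads this file when it starts, so a restart of the agent is probably needed before a new value takes effect.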

@ashramsey Here's the detailed blog post regarding the same.

The approach @ashramsey suggested is great, thanks for that. :smile:

I only have one complaint... running docker images -q | xargs --no-run-if-empty docker rmi removes all the REPOSITORY and TAG information from my images. Is there a way to prevent that from happening?

Before...

[ec2-user@ip-10-0-16-139 ~]$ docker images
REPOSITORY                     TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
wpsinc/novus-acp-laravel       v154                6a0b28834462        2 hours ago         521.9 MB
wpsinc/novus-api-laravel       v873                c3a30f0e4cfc        24 hours ago        558.9 MB
wpsinc/ad-map-laravel          v34                 2dc593b48671        45 hours ago        519.8 MB
wpsinc/alchemy-laravel         v724                49bf5abfe00e        46 hours ago        552.1 MB
wpsinc/hdtwin-laravel          v100                0e0f0752ed23        47 hours ago        533.6 MB
wpsinc/gmaxhelmet-legacy       v23                 e33f091fc1eb        8 days ago          404.9 MB
wpsinc/subsidiaries-drupal     v200                2cfe5ec45292        10 days ago         647.8 MB
wpsinc/calypso-laravel         v87                 08800bf41a49        10 days ago         526.4 MB
wpsinc/gmaxhelmet-apache-php   latest              5ed9e4e3d425        10 days ago         541.5 MB
wpsinc/hdtwin-php-fpm          latest              3f6db539e3b0        2 weeks ago         518.7 MB

Then I run...

[ec2-user@ip-10-0-16-139 ~]$ docker images -q | xargs --no-run-if-empty docker rmi

This works great and removes any unused images. The problem is the images that are used by running containers have their info stripped from them.

After...

[ec2-user@ip-10-0-16-139 ~]$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
<none>              <none>              6a0b28834462        2 hours ago         521.9 MB
<none>              <none>              c3a30f0e4cfc        24 hours ago        558.9 MB
<none>              <none>              2dc593b48671        45 hours ago        519.8 MB
<none>              <none>              49bf5abfe00e        46 hours ago        552.1 MB
<none>              <none>              0e0f0752ed23        47 hours ago        533.6 MB
<none>              <none>              e33f091fc1eb        8 days ago          404.9 MB
<none>              <none>              2cfe5ec45292        10 days ago         647.8 MB
<none>              <none>              08800bf41a49        10 days ago         526.4 MB
<none>              <none>              5ed9e4e3d425        10 days ago         541.5 MB
<none>              <none>              3f6db539e3b0        2 weeks ago         518.7 MB

Of course, the ideal solution would be if there were ECS settings we could configure to remove old unused Docker images at a specified frequency or certain rules we could set up. :+1:
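One possible way around the stripped REPOSITORY and TAG information is to never pass in-use images to docker rmi in the first place. The sketch below is an assumption-laden workaround, not something from AWS support: it lists all image IDs, lists the image IDs that existing containers were created from, and only removes the difference. The function names are hypothetical, and it assumes docker images --no-trunc and docker inspect print image IDs in the same format on your Docker version.

```shell
# Print the IDs listed in file $1 (all images) that are absent
# from file $2 (images referenced by containers), one per line.
unused_image_ids() {
    all_ids_file=$1
    used_ids_file=$2
    # -v invert, -x whole-line match, -F fixed strings, -f patterns from file.
    grep -vxF -f "$used_ids_file" "$all_ids_file" || true
}

# Remove only images that no container (running or stopped) was created from.
cleanup_unused_images() {
    all_ids=$(mktemp)
    used_ids=$(mktemp)
    docker images -q --no-trunc | sort -u > "$all_ids"
    # .Image is the full ID of the image a container was created from.
    docker ps -a -q \
        | xargs --no-run-if-empty docker inspect --format '{{.Image}}' \
        | sort -u > "$used_ids"
    unused_image_ids "$all_ids" "$used_ids" \
        | xargs --no-run-if-empty docker rmi
    rm -f "$all_ids" "$used_ids"
}
```

Calling cleanup_unused_images from the cron job instead of the one-liner should leave the tags on in-use images alone, since those IDs never reach docker rmi.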

+1

+1

+1

Automated Image cleanup in ECS instances

We propose to address this issue by adding support in the ECS Agent for periodic cleanup of images on Container Instances. This alleviates the pain of having to manually clean up container images using the docker rmi command. Additionally, the ECS_IMAGE_CLEANUP_ENABLED flag can be used to disable the automatic image cleanup in the ECS Agent.

Frequency of Cleanup

The time interval for image cleanup will be based on the value of the existing ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION environment variable (default: 3 hours) and removeContainerTimeout (5 minutes). With the defaults, the interval works out to 3 hours + 5 minutes + an additional wait of 5 minutes + jitter.

When ECS Agent removes containers as part of task cleanup, it also runs image cleanup in the background to remove unused images.
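The arithmetic above can be sketched as follows. This is only an illustration of the stated sum, not the agent's actual code: the function names are hypothetical, jitter is left out because its range is not given here, and only simple single-unit durations such as 3h, 5m or 90s are handled.

```shell
# Convert a duration such as 3h, 5m or 90s to whole seconds.
duration_to_seconds() {
    value=${1%[hms]}
    unit=${1#"$value"}
    case $unit in
        h) echo $((value * 3600)) ;;
        m) echo $((value * 60)) ;;
        s) echo "$value" ;;
        *) echo "unsupported duration: $1" >&2; return 1 ;;
    esac
}

# Base image cleanup delay in seconds, before jitter:
# task cleanup wait + removeContainerTimeout (5m) + additional wait (5m).
image_cleanup_base_delay() {
    task_wait=$(duration_to_seconds "${1:-3h}") || return 1
    remove_container_timeout=$(duration_to_seconds 5m)
    extra_wait=$(duration_to_seconds 5m)
    echo $((task_wait + remove_container_timeout + extra_wait))
}
```

With the default of 3 hours this gives 11400 seconds (3 hours 10 minutes); with ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION=5m it gives 900 seconds (15 minutes), in both cases before jitter is added.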

Choosing Images for Deletion

Docker images that do not have any containers referencing them are considered candidates for removal.

To avoid removing images that have just been pulled, a minimumAge time factor (default: 1 hour) is attached to each image; until that time has elapsed, the image cannot be removed from the instance.
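The two conditions can be expressed as a small check. This is a sketch under stated assumptions rather than the agent's implementation: the function name and its arguments (a count of referencing containers, the pull time and the current time as epoch seconds) are hypothetical.

```shell
# is_removal_candidate CONTAINER_REFS PULLED_AT NOW [MIN_AGE_SECONDS]
# Succeeds only if no container references the image and the image
# is at least MIN_AGE_SECONDS old (default 3600, i.e. 1 hour).
is_removal_candidate() {
    container_refs=$1
    pulled_at=$2
    now=$3
    min_age=${4:-3600}
    [ "$container_refs" -eq 0 ] && [ $((now - pulled_at)) -ge "$min_age" ]
}
```

For example, an unreferenced image pulled 30 minutes ago is not yet a candidate, while the same image two hours after the pull is.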

Order of Deletion

An LRU (Least Recently Used) policy is applied to determine the order of image deletion. That is, the time when an image was last referenced by a container determines the order of its removal: the image that has gone unused the longest is removed first.
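As an illustration of the LRU ordering, the sketch below sorts candidate images by their last-used time. The input format (one "IMAGE_ID LAST_USED_EPOCH" pair per line), the function name and the sample IDs are all hypothetical; how the agent actually stores this state is not described here.

```shell
# Read "IMAGE_ID LAST_USED_EPOCH" lines on stdin and print the image IDs
# in deletion order: least recently used first.
lru_order() {
    sort -k2,2n | awk '{ print $1 }'
}

# Example with made-up IDs and timestamps:
# printf 'img-a 300\nimg-b 100\nimg-c 200\n' | lru_order
```

In the example, img-b was last used earliest, so it would be the first one removed.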

Will there be a way to omit images from removal? More ephemeral tasks like build images would, I believe, be good candidates for omitting from this cleanup.

@alexwen We plan to remove only the unused images that have been pulled by the ECS Agent in the Container Instance. If your build images are not pulled by the agent, they are not removed.

This is a bit of an annoyance, since cleaning up images has to be done manually once in a while. The ECS agent should simply track the image IDs it has pulled and delete them when they are no longer used (service update, task def change), without having to resort to arbitrary timeouts.

The automated image clean up feature has been released in v1.13.0.

@richardpen this is great, is there any needed configuration to enable this?

@hopperd Sorry, I forgot to link the doc. The document here details all the configuration for image cleanup in the ECS Agent.

Short answer, the automated image cleanup is enabled by default.

@richardpen excellent, thank you very much!

Awesome! Does the new agent remove unwanted ‘dangling’ volumes as well by any chance?

I woke up this morning to some of our infrastructure down. One of the engineers did a code push which caused ECS to re-deploy, and our servers ran out of space. The issue was images taking up the entire drive. I am running ECS Agent 1.17.1. Any ideas why this happened?

Hi @Littlejd1997, can you please create a new issue with more details (as per the issue template, which you'll see when you go to create a new issue) for this? We'd like to avoid conflating multiple issues under a single 'github issue'.

Thanks,
Anirudh
