Howdy,
Currently the ECS Agent does not remove old, unused Docker images. Over time, this can cause the instance to run out of disk space.
@JasonSwindle, if it helps, this is what we're using for now: https://github.com/meltwater/docker-cleanup. The main problem is that you can't schedule containers like this 1:1 with each host, so we've been adding a bunch with docker run, but otherwise it works pretty well.
EDIT: It does work pretty well, but it has a gotcha: when you restart dockerd, it may remove a container or two before they start, which is not ideal.
+1 for this feature. Old unused containers should be cleaned up once new ones are in use. The previous one should be preserved for rapid rollback.
+1 for image removal
+1
+1, exactly what I wondered about today!
+1, but minor (docker-cleanup and friends are a good workaround)
+1. The issue is much more apparent on smaller instances, like the RancherOS AMI with 8 GB of space. We tried tutumcloud/cleanup at first, but it ended up breaking something in Docker itself. meltwater/docker-cleanup works like a charm.
+1, cleanup should be an option in the ECS Agent.
+1
+1
This Task Definition should help in the clean-up process.
ECS should upgrade Docker to version 1.10, which doesn't rely on intermediate layers being present. We are seeing every intermediate layer become a dependency, so we can't delete old images. As a result, a 1 GB deployment eventually fills all 20 GB of storage, making further deployments impossible.
Agent version 1.8.1
Docker version 1.9.1
+1, this is really needed.
+1 please, please go for it!
+1. This just bit me in a production environment: we ran out of logical volume space and dmeventd went nuts, eating loads of CPU for days on end. Manually clearing out old unused images fixed the issue, but I need this to happen automatically in the agent!
Spoke with AWS Support. Their suggested approach for now is to run ecs-agent version 1.8.1,
set ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION=5m either in /etc/ecs/ecs.config or with a -e option when you start the ecs-agent with docker run,
and then add a crontab job on the host, something like:
0 0 * * * docker images -q | xargs -l --no-run-if-empty docker rmi
This removes any images not being used by running containers every day at midnight. You can run it every 5 minutes instead to keep pace with the more aggressive 5m task cleanup if you want.
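The two steps above can be scripted. The sketch below is a minimal version, assuming the /etc/ecs/ecs.config location used by the ECS-optimized AMI; the helper names set_ecs_config_var and cleanup_cron_line are hypothetical, not part of the agent or Docker.

```shell
# set_ecs_config_var FILE KEY VALUE  (hypothetical helper)
# Sets KEY=VALUE in an ecs.config-style file, replacing any existing KEY=
# line so repeated runs do not pile up duplicate entries.
# KEY is used as a grep regex, so it should be a plain variable name.
set_ecs_config_var() {
  local file=$1 key=$2 value=$3 tmp
  touch "$file" || return 1
  tmp=$(mktemp) || return 1
  # Keep every line except an existing KEY=... entry (grep exits 1 when
  # no lines are left, which is fine here).
  grep -v "^${key}=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Copy back with cat rather than mv so the file keeps its permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# cleanup_cron_line SCHEDULE  (hypothetical helper)
# Prints a crontab entry running the image cleanup one-liner on SCHEDULE,
# e.g. "0 0 * * *" (daily at midnight) or "*/5 * * * *" (every 5 minutes).
cleanup_cron_line() {
  printf '%s docker images -q | xargs -l --no-run-if-empty docker rmi\n' "$1"
}
```

Example usage, as root on the instance:
set_ecs_config_var /etc/ecs/ecs.config ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION 5m
(crontab -l 2>/dev/null; cleanup_cron_line '0 0 * * *') | crontab -
Replacing the value in place, rather than appending with echo >>, keeps repeated runs from leaving duplicate entries. The agent reads ecs.config when it starts, so restart it for the new value to take effect.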
For reference:
ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION specifies how long the agent waits before cleaning up "dead" (stopped) tasks. The default is 3 hours and the minimum respected value is 1m; anything shorter is not possible.
So I didn't bother setting this, because with the 3-hour default, stale images still won't build up much between runs of the 24-hour cron job.
@ashramsey Here's a detailed blog post on the same approach.
The approach @ashramsey suggested is great, thanks for that. :smile:
I only have one complaint: running docker images -q | xargs --no-run-if-empty docker rmi removes all the REPOSITORY and TAG information from my images. Is there a way to prevent that?
[ec2-user@ip-10-0-16-139 ~]$ docker images
REPOSITORY                     TAG      IMAGE ID       CREATED        VIRTUAL SIZE
wpsinc/novus-acp-laravel       v154     6a0b28834462   2 hours ago    521.9 MB
wpsinc/novus-api-laravel       v873     c3a30f0e4cfc   24 hours ago   558.9 MB
wpsinc/ad-map-laravel          v34      2dc593b48671   45 hours ago   519.8 MB
wpsinc/alchemy-laravel         v724     49bf5abfe00e   46 hours ago   552.1 MB
wpsinc/hdtwin-laravel          v100     0e0f0752ed23   47 hours ago   533.6 MB
wpsinc/gmaxhelmet-legacy       v23      e33f091fc1eb   8 days ago     404.9 MB
wpsinc/subsidiaries-drupal     v200     2cfe5ec45292   10 days ago    647.8 MB
wpsinc/calypso-laravel         v87      08800bf41a49   10 days ago    526.4 MB
wpsinc/gmaxhelmet-apache-php   latest   5ed9e4e3d425   10 days ago    541.5 MB
wpsinc/hdtwin-php-fpm          latest   3f6db539e3b0   2 weeks ago    518.7 MB
[ec2-user@ip-10-0-16-139 ~]$ docker images -q | xargs --no-run-if-empty docker rmi
This works great and removes any unused images. The problem is that images used by running containers have their REPOSITORY and TAG information stripped:
[ec2-user@ip-10-0-16-139 ~]$ docker images
REPOSITORY                     TAG      IMAGE ID       CREATED        VIRTUAL SIZE
<none>                         <none>   6a0b28834462   2 hours ago    521.9 MB
<none>                         <none>   c3a30f0e4cfc   24 hours ago   558.9 MB
<none>                         <none>   2dc593b48671   45 hours ago   519.8 MB
<none>                         <none>   49bf5abfe00e   46 hours ago   552.1 MB
<none>                         <none>   0e0f0752ed23   47 hours ago   533.6 MB
<none>                         <none>   e33f091fc1eb   8 days ago     404.9 MB
<none>                         <none>   2cfe5ec45292   10 days ago    647.8 MB
<none>                         <none>   08800bf41a49   10 days ago    526.4 MB
<none>                         <none>   5ed9e4e3d425   10 days ago    541.5 MB
<none>                         <none>   3f6db539e3b0   2 weeks ago    518.7 MB
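One way around the stripped tags is to never hand an in-use image to docker rmi at all. A sketch, assuming GNU xargs (as in the one-liner above) and that docker inspect's {{.Image}} field on a container returns the same full ID format as docker images -q --no-trunc; the function names filter_unused and remove_unused_images are hypothetical.

```shell
# filter_unused ALL_IDS_FILE IN_USE_IDS_FILE  (hypothetical helper)
# Prints the IDs in ALL_IDS_FILE that do not appear, as a whole line,
# in IN_USE_IDS_FILE. An empty IN_USE file means every ID is printed.
filter_unused() {
  grep -vxF -f "$2" "$1"
}

# remove_unused_images  (hypothetical helper)
# Removes only images that no container, running or stopped, references,
# so images backing live containers are never passed to docker rmi.
remove_unused_images() {
  local all in_use
  all=$(mktemp) || return 1
  in_use=$(mktemp) || { rm -f "$all"; return 1; }
  # Full image IDs; sort -u collapses IDs that are listed once per tag.
  docker images -q --no-trunc | sort -u > "$all"
  # Full image ID behind every container, including stopped ones.
  docker ps -aq --no-trunc \
    | xargs --no-run-if-empty docker inspect --format '{{.Image}}' \
    | sort -u > "$in_use"
  filter_unused "$all" "$in_use" | xargs --no-run-if-empty docker rmi
  rm -f "$all" "$in_use"
}
```

Images referenced by stopped containers are kept too, so an image becomes eligible once the agent's task cleanup (ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION) has removed the containers that used it. As with the original one-liner, an unused image that carries several tags may be refused by docker rmi when removed by ID. To use this from cron, put the functions in a script that ends with a call to remove_unused_images and point the crontab entry at that script.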
Of course, the ideal solution would be if there were ECS settings we could configure to remove old unused Docker images at a specified frequency or certain rules we could set up. :+1:
+1
+1
+1
We propose to address this issue by adding support in the ECS Agent to perform periodic cleanup of images on Container Instances. This alleviates the pain of having to manually clean up container images using the docker rmi command. Additionally, the ECS_IMAGE_CLEANUP_ENABLED flag can be used to disable automatic image cleanup in the ECS Agent.
The time interval for image cleanup will be based on the value of the existing ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION environment variable (default: 3 hours) and removeContainerTimeout (5 minutes). With the defaults, this is 3 hours + 5 minutes + an additional wait of 5 minutes + jitter.
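With the defaults, that delay works out to 3 h + 5 min + 5 min = 3 h 10 min, or 11,400 seconds, before jitter. The arithmetic can be checked with a small sketch; to_seconds and cleanup_delay_seconds are hypothetical helpers that accept only single-unit durations such as 3h, 5m or 90s (not compound ones like 1h30m), and the jitter is an optional argument because the proposal does not say how large it is.

```shell
# to_seconds DURATION  (hypothetical helper)
# Converts a single-unit duration with an h, m or s suffix and a plain
# decimal number (e.g. "3h", "5m", "90s") to seconds.
to_seconds() {
  local d=$1 n=${1%?}
  # Everything before the unit suffix must be a non-empty run of digits.
  case $n in
    ''|*[!0-9]*) echo "unsupported duration: $d" >&2; return 1 ;;
  esac
  case $d in
    *h) echo $(( n * 3600 )) ;;
    *m) echo $(( n * 60 )) ;;
    *s) echo "$n" ;;
    *)  echo "unsupported duration: $d" >&2; return 1 ;;
  esac
}

# cleanup_delay_seconds TASK_WAIT REMOVE_TIMEOUT EXTRA_WAIT [JITTER]
# (hypothetical helper) Sums the component durations, in seconds.
cleanup_delay_seconds() {
  local total=0 d s
  for d in "$@"; do
    s=$(to_seconds "$d") || return 1
    total=$(( total + s ))
  done
  echo "$total"
}
```

cleanup_delay_seconds 3h 5m 5m prints 11400; adding a hypothetical 30s of jitter (cleanup_delay_seconds 3h 5m 5m 30s) prints 11430.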
When ECS Agent removes containers as part of task cleanup, it also runs image cleanup in the background to remove unused images.
Docker images that are not referenced by any container are considered candidates for removal.
To avoid removing images that have just been pulled, each image has a minimumAge (default: 1 hour); it cannot be removed from the instance before reaching that age.
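The two selection rules above amount to a simple filter. A sketch, assuming a hypothetical one-line-per-image record of IMAGE_ID PULLED_AT REF_COUNT (pull time in epoch seconds, number of containers referencing the image); the agent keeps its own internal state, so this input format and the cleanup_candidates name are illustrative only.

```shell
# cleanup_candidates NOW MIN_AGE_SECONDS  (hypothetical helper)
# Reads "IMAGE_ID PULLED_AT REF_COUNT" lines on stdin and prints the IDs
# of images that no container references ($3 == 0) and that were pulled
# at least MIN_AGE_SECONDS before NOW.
cleanup_candidates() {
  awk -v now="$1" -v min_age="$2" '$3 == 0 && now - $2 >= min_age { print $1 }'
}
```

For example, printf 'img-a 1000 0\nimg-b 9000 0\nimg-c 1000 2\n' | cleanup_candidates 10000 3600 prints only img-a: img-b was pulled too recently and img-c is still referenced.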
A Least Recently Used (LRU) policy determines the order of deletion: the time an image was last referenced by a container determines when it is removed, with the least recently used images removed first.
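The LRU ordering can be sketched the same way, assuming a hypothetical IMAGE_ID LAST_USED_AT record (epoch seconds of the last container reference). The optional LIMIT argument is also an assumption, since the proposal does not say how many images are removed per run.

```shell
# lru_order [LIMIT]  (hypothetical helper)
# Reads "IMAGE_ID LAST_USED_AT" lines on stdin and prints the IDs with the
# least recently used first. LIMIT, if given and non-zero, caps how many
# IDs are printed.
lru_order() {
  sort -k2,2n | awk -v limit="${1:-0}" 'limit == 0 || NR <= limit { print $1 }'
}
```

With img-a last used at 300, img-b at 100 and img-c at 200, lru_order prints img-b, img-c, img-a; lru_order 1 prints only img-b.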
Will there be a way to exclude images from removal? Images for more ephemeral tasks, like build images, would be good candidates for exclusion.
@alexwen We plan to remove only the unused images that have been pulled by the ECS Agent in the Container Instance. If your build images are not pulled by the agent, they are not removed.
This is a bit of an annoyance, since images have to be cleaned up manually once in a while. The ECS agent should simply track the image IDs it has pulled and delete them once they are no longer used (service update, task definition change), without having to resort to arbitrary timeouts.
The automated image clean up feature has been released in v1.13.0.
@richardpen this is great, is there any configuration needed to enable this?
@hopperd Sorry, I forgot to link the doc; the documentation here details all the configuration options for image cleanup in the ECS Agent.
Short answer, the automated image cleanup is enabled by default.
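For anyone who does want to tune or disable it: the settings go in /etc/ecs/ecs.config like the other agent variables. The names below are the ones documented in the amazon-ecs-agent README (the proposal above called the switch ECS_IMAGE_CLEANUP_ENABLED, but the documented variable is ECS_DISABLE_IMAGE_CLEANUP). The values are examples only; check the documentation for your agent version's defaults and allowed ranges, and keep comments on their own lines rather than after a value.

```shell
# /etc/ecs/ecs.config (example values)
# How often the agent looks for images to remove.
ECS_IMAGE_CLEANUP_INTERVAL=30m
# Minimum time between pulling an image and it becoming eligible for
# cleanup (the proposal's minimumAge).
ECS_IMAGE_MINIMUM_CLEANUP_AGE=1h
# Upper bound on the number of images removed per cleanup run.
ECS_NUM_IMAGES_DELETE_PER_CYCLE=5
# Set to true to turn automated image cleanup off entirely.
ECS_DISABLE_IMAGE_CLEANUP=false
```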
@richardpen excellent, thank you very much!
Awesome! Does the new agent remove unwanted ‘dangling’ volumes as well by any chance?
I woke up this morning to find some of our infrastructure down. One of the engineers did a code push, which caused ECS to redeploy, and our servers ran out of space. The issue was images taking up the entire drive. I am running ECS Agent 1.17.1. Any ideas why this happened?
Hi @Littlejd1997, can you please create a new issue with more details (as per the issue template, which you'll see when you go to create a new issue)? We'd like to avoid conflating multiple issues under a single GitHub issue.
Thanks,
Anirudh