Automatic Docker image cleanup does not work when a container not managed by ECS uses an image under ECS control.
In our setup, we often run ad-hoc application containers manually on the production server, based on our application image (it contains source code, production settings for connecting to the DB, cache, etc.). Sometimes the connection to the server hangs up, and such containers can stay running for a long time.
From my understanding, the agent looks for images that are not referenced by containers managed by ECS, finds those images, and fails with the following error when it tries to remove them:
/var/log/ecs/ecs-agent.log.2019-04-08-23:402:2019-04-08T23:41:53Z [ERROR] Error removing Image sha256:<managed image ID> - Error response from daemon: conflict: unable to delete d8e7da55cb7c (cannot be forced) - image is being used by running container <ad-hoc container ID>
Expected behavior: the managed image should be skipped during cleanup.
Observed behavior: the managed image is not skipped during cleanup, and other stale images are not cleared because of this.
Let me know if you need any additional details.
Hi @GeyseR,
Thanks for reporting the issue. This seems like a bug in the agent, and I think the correct behavior is to continue deleting other stale images when we fail to delete a particular image. I will mark this as a bug, and we will work on implementing a fix for this.
This has been fixed as part of an unrelated change (#2023).
Keep in mind that you may still see log messages for images that could not be deleted. There is no way for us to skip these images without first trying to delete them.
They will not, however, count against the number of images that are allowed to be deleted per cleanup cycle. The number of images that can get deleted per cycle is configurable using the ECS_NUM_IMAGES_DELETE_PER_CYCLE parameter (see https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-agent-config.html) and defaults to 5.
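For readers who want to picture the behavior described above, here is a minimal sketch in Python. It is not the agent's actual code (the agent is written in Go), and the names `cleanup_cycle`, `remove_image`, `ImageInUseError`, and `fake_remove` are hypothetical, introduced only for illustration. It assumes that a failed removal is logged and skipped, and that only successful removals count toward the per-cycle limit.

```python
from typing import Callable, Iterable, List


class ImageInUseError(Exception):
    """Hypothetical error: the image is still used by a running container."""


def cleanup_cycle(
    candidates: Iterable[str],
    remove_image: Callable[[str], None],
    max_deletes: int = 5,  # mirrors the documented default of ECS_NUM_IMAGES_DELETE_PER_CYCLE
) -> List[str]:
    """Try to remove stale images, continuing past images that cannot be removed.

    Returns the IDs of the images that were actually deleted.
    """
    deleted: List[str] = []
    for image_id in candidates:
        if len(deleted) >= max_deletes:
            break  # per-cycle limit reached; failed attempts never count here
        try:
            remove_image(image_id)
        except ImageInUseError as err:
            # The only way to learn an image is in use is to attempt the removal,
            # so the failure is logged and the loop moves on to the next image.
            print(f"[ERROR] Error removing Image {image_id} - {err}")
            continue
        deleted.append(image_id)
    return deleted


# Hypothetical stand-in for the Docker daemon: one image is held by an
# ad-hoc container that is not managed by ECS.
IN_USE = {"sha256:app"}


def fake_remove(image_id: str) -> None:
    if image_id in IN_USE:
        raise ImageInUseError("image is being used by running container")
```

Calling `cleanup_cycle(["sha256:app", "sha256:old1", "sha256:old2"], fake_remove, max_deletes=2)` logs one error for `sha256:app` and still deletes both stale images, because the failed attempt does not use up a slot in the cycle.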
Hi @sparrc, I've checked the latest ECS-agent version and it worked really well for the described case.
Thanks!