Amazon-ecs-agent: /proc/self/cgroup naming convention changed from default "docker" to "ecs"

Created on 30 Nov 2017 · 12 Comments · Source: aws/amazon-ecs-agent

Summary

The default cgroup prefix for a Docker container is "docker", followed by the container ID (SHA). Several libraries determine the container ID by parsing /proc/self/cgroup and expect "docker/container_id". docker-gen, which I use, is one of them: after I updated to the latest ECS agent, my proxy could no longer be configured, because it relies on determining the container ID from Docker's default values. https://github.com/jwilder/docker-gen/blob/master/context.go#L172

I could not find any specific reason why the new "ecs/task-arn" prefix was introduced in commit f09f0f56f21696cc, and I would like it set back to the default.

Expected Behavior

9:perf_event:/docker/container-sha
8:memory:/docker/container-sha
7:hugetlb:/docker/container-sha
6:freezer:/docker/container-sha
5:devices:/docker/container-sha
4:cpuset:/docker/container-sha
3:cpuacct:/docker/container-sha
2:cpu:/docker/container-sha
1:blkio:/docker/container-sha

Observed Behavior

9:perf_event:/ecs/task-arn/container-sha
8:memory:/ecs/task-arn/container-sha
7:hugetlb:/ecs/task-arn/container-sha
6:freezer:/ecs/task-arn/container-sha
5:devices:/ecs/task-arn/container-sha
4:cpuset:/ecs/task-arn/container-sha
3:cpuacct:/ecs/task-arn/container-sha
2:cpu:/ecs/task-arn/container-sha
1:blkio:/ecs/task-arn/container-sha
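
To illustrate the difference between the two layouts above, here is a minimal sketch of a parser that does not depend on the "docker" prefix. It is not the code used by docker-gen or any other library mentioned in this thread. It assumes only that a full container ID is 64 lowercase hex characters and that it appears as the last path segment; the function name is hypothetical.

```python
import re

# Assumption: a full Docker container ID is 64 lowercase hex characters.
CONTAINER_ID_RE = re.compile(r"[0-9a-f]{64}")


def container_id_from_cgroup(cgroup_text):
    """Return the first 64-hex ID found as a final path segment, or None."""
    for line in cgroup_text.splitlines():
        # Each line has the form "hierarchy-id:controllers:path".
        parts = line.strip().split(":", 2)
        if len(parts) != 3:
            continue
        # Look only at the last segment, whatever prefix comes before it.
        last_segment = parts[2].rsplit("/", 1)[-1]
        if CONTAINER_ID_RE.fullmatch(last_segment):
            return last_segment
    return None
```

Inside a container, this would be called with the contents of /proc/self/cgroup. Because only the last segment is inspected, both "/docker/<id>" and "/ecs/<task>/<id>" yield the same ID. It still relies on the cgroup path as an implementation detail, so treat it as a stopgap.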

Environment Details

ecs agent 1.16.0

Supporting Log Snippets

Labels: kind/bug · os/linux · scope/ECS Agent · workaround available

Source

sidewaysgravity

👍6

Most helpful comment

Just throwing my two cents in here, but we have just encountered an issue as a result of the change referenced by @sidewaysgravity above.

We are running Jenkins inside a Docker container on an ECS cluster and this change is preventing our build jobs from running any Docker builds. Specifically, this line in an essential plugin is now rendered defunct:

https://github.com/jenkinsci/docker-workflow-plugin/blob/master/src/main/java/org/jenkinsci/plugins/docker/workflow/client/DockerClient.java#L311

Because of this change, Jenkins now believes that it's no longer running inside a container, and that in turn alters the behaviour of how the job is launched which ultimately causes it to fail.

I take your point, @samuelkarp, that the output of /proc/self/cgroup is an implementation detail rather than part of a concrete interface. However, a search for techniques to detect whether an application is running in a Docker container, and to fetch a running container's ID from within that container, shows that this technique is by far the most commonly used.

I think it would be useful to re-open this issue and continue the discussion around this change. I worry that it was implemented without full consideration of its consequences for existing containers that depend on the (fairly standard) output of /proc/self/cgroup, especially since there currently doesn't appear to be any other equally clean or ubiquitous way of doing the same thing.

danielgrant on 11 Dec 2017

👍6

All 12 comments

Hi @sidewaysgravity, thanks for opening this issue.

The cgroup prefix was changed to include the task ARN so that containers in the same task have a common cgroup prefix and cgroup limits can be set for a task together (cgroups are hierarchical in nature, and this enables limits to be set in a higher path and have it apply to all of the cgroups under that common root). This is used in the new task size parameters in the task definition. The prefix ahead of the task ARN was changed from docker to ecs to avoid colliding with containers that were started outside of ECS.
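
As a rough illustration of the hierarchy described above (not agent code), the sketch below groups cgroup paths by their parent. The paths and the function name are hypothetical. The point is only that containers of one task share a parent cgroup, which is where a task-level limit could be attached.

```python
import posixpath
from collections import defaultdict


def group_by_parent(cgroup_paths):
    """Map each parent cgroup path to the leaf names directly under it."""
    groups = defaultdict(list)
    for path in cgroup_paths:
        parent, leaf = posixpath.split(path)
        groups[parent].append(leaf)
    return dict(groups)


# Hypothetical paths: two containers in one ECS task, one plain Docker container.
paths = [
    "/ecs/task-1/container-a",
    "/ecs/task-1/container-b",
    "/docker/container-c",
]
groups = group_by_parent(paths)
```

Here both task containers end up under "/ecs/task-1", while the container started outside ECS stays under "/docker", so the two sets cannot collide.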

> Several libraries determine the container id by parsing /proc/self/cgroup and expect "docker/container_id".

I would generally recommend against this approach. Having the container ID in the cgroup path (at all) and having docker as the cgroup path prefix is an implementation detail of Docker rather than something defined in a concrete interface. This is similar to how (by default) Docker will set the hostname of a container to be the container ID, but again that's just an implementation detail and not part of the formal interface.

Container introspection in Docker is still a bit of an open question. You can find a discussion about that in https://github.com/moby/moby/pull/26331.

We recently added a supported interface in ECS for discovering information about your task from within a container. This will be a stable interface and is not an implementation detail. The container metadata feature is described in the ECS documentation.
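
A minimal sketch of reading the container ID through that metadata feature could look like the following. The names are assumptions rather than anything stated in this thread: it assumes the agent injects an ECS_CONTAINER_METADATA_FILE environment variable pointing at a JSON file, and that the file contains a "ContainerID" key. Check the ECS documentation for the exact names and for how to enable the feature on the agent.

```python
import json
import os


def container_id_from_metadata(environ=os.environ):
    """Return the container ID from the ECS metadata file, or None."""
    # Assumption: this variable is set when container metadata is enabled.
    path = environ.get("ECS_CONTAINER_METADATA_FILE")
    if not path:
        return None
    try:
        with open(path) as f:
            metadata = json.load(f)
    except (OSError, ValueError):
        # Missing, unreadable, or not-yet-complete file.
        return None
    if not isinstance(metadata, dict):
        return None
    # Assumption: the ID is stored under the "ContainerID" key.
    return metadata.get("ContainerID")
```

The function returns None rather than raising, so callers can fall back to another detection method when the feature is disabled or the file has not been written yet.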

samuelkarp on 30 Nov 2017

@samuelkarp Thanks for the quick reply and links. Hopefully the Docker introspection question can be resolved sooner rather than later. I have opened an issue on docker-gen (https://github.com/jwilder/docker-gen/issues/263) to discuss a possible solution on their end.

sidewaysgravity on 30 Nov 2017

@danielgrant I closed the issue, but I can open it back up to continue the discussion. I agree with your point that /proc/self/cgroup does seem to be the most consistent way to determine whether you are running inside a container.

sidewaysgravity on 13 Dec 2017

@sidewaysgravity @danielgrant Thanks for providing those examples!

If you aren't using the task size (task-level cpu and memory limits), you can disable the ecs hierarchy by setting ECS_ENABLE_TASK_CPU_MEM_LIMIT=false. That should cause docker to revert to using the docker hierarchy.

petderek on 13 Dec 2017

👍5

@petderek thanks for the tip, I've set ECS_ENABLE_TASK_CPU_MEM_LIMIT to false as per your suggestion and I can confirm that this does indeed revert the /proc/self/cgroup output to use the "normal" Docker output.

This workaround should suffice for now for the use case I highlighted above, but obviously this is going to continue to cause problems for people who might want the "normal" Docker /proc/self/cgroup output and the new task size parameters.

danielgrant on 14 Dec 2017

👍1

> ...this is going to continue to cause problems for people who might want the "normal" Docker /proc/self/cgroup output and the new task size parameters.

@danielgrant Precisely -- and that won't have a clear cut solution because the task size feature requires using the custom cgroup hierarchy. We will need to look at the solutions that @samuelkarp pointed out in that scenario.

petderek on 14 Dec 2017

Thanks @petderek, that worked for me as well.

sidewaysgravity on 15 Dec 2017

@petderek: Could you kindly elaborate on setting ECS_ENABLE_TASK_CPU_MEM_LIMIT=false?
Where exactly do I need to set it?

I tried setting it in the AWS console as an environment variable for the jwilder/nginx-proxy container. The variable does show up inside the container, but it has no effect ;-):

[ec2-user@ip-172-22-9-162 ~]$ docker system prune --force
Total reclaimed space: 0B

[ec2-user@ip-172-22-9-162 ~]$ proxy_cont_name="$(docker ps -a | grep 'jwilder/nginx-proxy' | sed 's/.* //g')"

[ec2-user@ip-172-22-9-162 ~]$ echo "${proxy_cont_name}"
ecs-Proxy-4-Proxy-80e9898abb89df849101

[ec2-user@ip-172-22-9-162 ~]$ docker exec "${proxy_cont_name}" env | sort
DOCKER_GEN_VERSION=0.7.3
DOCKER_HOST=unix:///tmp/docker.sock
ECS_ENABLE_TASK_CPU_MEM_LIMIT=false         # <-- env var setting is active
HOME=/root
HOSTNAME=7e9139716d9e
NGINX_VERSION=1.13.8-1~stretch
NJS_VERSION=1.13.8.0.1.15-1~stretch
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

[ec2-user@ip-172-22-9-162 ~]$ docker exec "${proxy_cont_name}" cat /proc/self/cgroup | sort
1:blkio:/ecs/eb9d3d0c-8936-42d7-80d8-f82b2f1a629e/7e9139716d9e5d762d22f9f877b87d1be8b1449ac912c025a984750c5dbff157
2:cpu:/ecs/eb9d3d0c-8936-42d7-80d8-f82b2f1a629e/7e9139716d9e5d762d22f9f877b87d1be8b1449ac912c025a984750c5dbff157
3:cpuacct:/ecs/eb9d3d0c-8936-42d7-80d8-f82b2f1a629e/7e9139716d9e5d762d22f9f877b87d1be8b1449ac912c025a984750c5dbff157
    ...

--> paths start with /ecs :-(

From https://github.com/aws/amazon-ecs-agent#environment-variables I get the impression I need to set the env var not in the nginx proxy container, but in the AWS agent container. This container doesn't seem to be exposed for user modification in the AWS console, so I can't set the env var in the same way as above. Also, none of the usual shell tools are available, so I can't inspect that container in the same way as above.

Do I need to run my own Amazon Linux AMI?
Is it possible at all to set environment variables when using the 'standard' AWS ECS?

Update: I found the answer myself; adding it here for posterity and for anyone in the same situation:
Add the env var to /etc/ecs/ecs.config on the EC2 instance that acts as the Docker container host; this instance is usually created by AWS as part of the ECS cluster setup:

[ec2-user@ip-172-22-9-162 ~]$ cat /etc/ecs/ecs.config 
ECS_CLUSTER=<cluster name set when creating the cluster in AWS console>
ECS_ENABLE_TASK_CPU_MEM_LIMIT=false

--> restart the nginx proxy container (i.e. the ECS task) using the AWS console

[ec2-user@ip-172-22-9-162 ~]$ proxy_cont_name="$(docker ps -a | grep 'jwilder/nginx-proxy' | sed 's/.* //g')"
[ec2-user@ip-172-22-9-162 ~]$ docker exec "${proxy_cont_name}" cat /proc/self/cgroup | sort
1:blkio:/docker/a4d00c9dd675d67f866c786181419e1b44832d4696780152e61afd44a3e02856
2:cpu:/docker/a4d00c9dd675d67f866c786181419e1b44832d4696780152e61afd44a3e02856
3:cpuacct:/docker/a4d00c9dd675d67f866c786181419e1b44832d4696780152e61afd44a3e02856
    ...

--> paths start with /docker :-)

20180214 Amendment: A reboot of the EC2 host seems to be required for the change to take effect; restarting the amazon-ecs-agent container might suffice.

z00m1n on 14 Jan 2018

👍2

Making this change does not require a reboot. Running stop ecs; start ecs will reload your settings from ecs.config.

jniedrauer on 1 Jul 2018

👍1

Closing this issue, since we currently do not have plans to change the custom cgroup paths when the task limits feature is in use.

The workaround is to explicitly disable the feature with ECS_ENABLE_TASK_CPU_MEM_LIMIT=false in ecs.config and restart the ECS agent.

adnxn on 15 Oct 2019

I think I have now run into this issue of inconsistent, non-standardized cgroup naming conventions.
Our scenario is as follows: we are using the DataDog Tracer Library for JavaScript [1], which relies on the fairly popular container-info package [2][3].

The container-info package reads /proc/${pid}/cgroup and parses the lines using the following regexes: [4]

const uuidSource = '[0-9a-f]{8}[-_][0-9a-f]{4}[-_][0-9a-f]{4}[-_][0-9a-f]{4}[-_][0-9a-f]{12}'
const containerSource = '[0-9a-f]{64}'

I noticed that our container running on AWS Fargate reports the following path: /ecs/7f57e3e838b74301abd12f863c7c093a/7f57e3e838b74301abd12f863c7c093a-3653333083.

Apparently, Fargate uses a path which matches the following regex: [0-9a-f]{32}-[0-9]{10}
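
As a sketch of what a more tolerant matcher might look like, the Python below combines the two patterns quoted above with the Fargate-style shape. This is not the container-info implementation. The Fargate pattern is inferred from the single path reported in this comment, so whether it holds in general is exactly the open question here. The function name is hypothetical.

```python
import re

# The first two patterns are the ones quoted from container-info above.
UUID_SOURCE = r"[0-9a-f]{8}[-_][0-9a-f]{4}[-_][0-9a-f]{4}[-_][0-9a-f]{4}[-_][0-9a-f]{12}"
CONTAINER_SOURCE = r"[0-9a-f]{64}"
# Assumption: shape inferred from the one Fargate path reported above.
FARGATE_SOURCE = r"[0-9a-f]{32}-[0-9]{10}"

SEGMENT_RE = re.compile("|".join([UUID_SOURCE, CONTAINER_SOURCE, FARGATE_SOURCE]))


def container_segment(cgroup_path):
    """Return the last path segment if it fully matches a known shape, else None."""
    segment = cgroup_path.rsplit("/", 1)[-1]
    return segment if SEGMENT_RE.fullmatch(segment) else None
```

Matching the whole last segment, rather than searching anywhere in the path, avoids picking up the task ID that precedes the container segment.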

Q: Is this Fargate-specific, or was the cgroup entry changed for all containers running on ECS?

[1] https://github.com/DataDog/dd-trace-js
[2] https://www.npmjs.com/package/container-info
[3] https://github.com/Qard/container-info/
[4] https://github.com/Qard/container-info/blob/master/index.js#L5

@sidewaysgravity @samuelkarp

MartinLoeper on 17 Nov 2020
