I'm trying to investigate using the recently released task networking for our ECS clusters but I'm not running Amazon Linux on our ECS clusters (standardised with Ubuntu across everything instead).
This seems to not work by design although the docs weren't especially clear here, suggesting at first that you just needed ECS agent 1.15 for this to work.
On an Ubuntu 16.04 host running Docker 17.09.0-ce and ECS agent 1.15.1, scheduling an awsvpc networking task originally produced the following error in the ECS console:
service test-cni was unable to place a task because no container instance met all of its requirements. The closest matching container-instance 9d04bbf1-b1e9-499b-9f5f-ebb7a1af0c92 is missing an attribute required by your task.
After reading the docs a little bit more and seeing that there is also a requirement for the ecs-init package, and talking to support to confirm this requirement, I took a look at what the ecs-init package does and found that it bind mounts a few extra volumes, adds some Linux capabilities and sets ECS_ENABLE_TASK_ENI=true in the ECS config.
I changed my systemd unit file to:
[Unit]
Description=ECS Agent
Requires=docker.service
After=docker.service cloud-final.service
[Service]
Restart=always
ExecStart=/usr/bin/docker run --name ecs-agent \
--privileged \
--restart=on-failure:10 \
--volume=/var/run:/var/run \
--volume=/var/log/ecs/:/log \
--volume=/var/lib/ecs/data:/data \
--volume=/etc/ecs:/etc/ecs \
--volume=/proc:/host/proc:ro \
--volume=/var/lib/ecs/dhclient:/var/lib/ecs/dhclient \
--volume=/lib64:/lib64:ro \
--volume=/sbin:/sbin:ro \
--cap-add=NET_ADMIN \
--cap-add=SYS_ADMIN \
--net=host \
--env-file=/etc/ecs/ecs.config \
amazon/amazon-ecs-agent:latest
ExecStop=/usr/bin/docker rm -f ecs-agent
[Install]
WantedBy=default.target
and added the ECS_ENABLE_TASK_ENI=true to the ECS config but the ECS agent Docker image then panics with [CRITICAL] Unable to initialize Task ENI dependencies: agent is not started with an init system.
Looking at the source shows an explanation for why it throws: https://github.com/aws/amazon-ecs-agent/blob/c5c0f37ddabf848beb8ad25f0f5f5ffd5bb39740/agent/app/agent_unix.go#L50-L57
Is there a good way to get this to work without needing the ecs-init package? Or do I need to wait for the ecs-init repo to add systemd unit files and enable support for Suse/Ubuntu on task networking (it's currently not built for those distros)?
Task based networking would be a really nice addition but right now, with it restricted to just Amazon Linux (and having to manage these instances) it's not workable for us. Expanding it to cover Ubuntu (and also not just the ancient LTS using upstart) or not having to manage the instances at all (a la GKE/AKE) would be great.
Hey @tomelliff,
We don't officially support Task Networking on Ubuntu, but there may be some things we can try to get it working for you.
Is there a good way to get this to work without needing the ecs-init package?
The Getpid() check is just ensuring that agent isn't running with pid 1. You should be able to accomplish the same using the --init flag on docker run.
Additionally, @nmeyerhans has been testing task networking with a systemd unit file on Debian. I'd recommend looking over that file as well, although we haven't tested it yet on Ubuntu.
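As a rough illustration of the check described above, the sketch below mirrors the condition in shell. The function name `started_under_init` is hypothetical and not part of the agent; it only encodes the rule quoted in this thread, that a process running as PID 1 is treated as not having been started by an init system.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the ECS agent): succeeds when the given
# PID is not 1, i.e. some other process is acting as init. Defaults to the
# PID of the current shell.
started_under_init() {
  local pid="${1:-$$}"
  [ "$pid" -ne 1 ]
}

# To see the difference on a Docker host, compare the PID of the container's
# main process with and without --init:
#   docker run --rm alpine sh -c 'echo $$'
#   docker run --rm --init alpine sh -c 'echo $$'
```

Without `--init` the container's command is itself PID 1. With `--init` Docker starts a small init process as PID 1 and runs the command as its child, which is the situation the agent's check expects.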
...not having to manage the instances at all (a la GKE/AKE) would be great.
Take a look at Fargate, which launched during re:invent. Let us know if that fits your use case!
Fargate sounds great (would need to have a longer look at pricing though) but with the announcement of EKS at the same time I'm probably more inclined to move to Kubernetes at some point but might play with both before I make the switch.
That systemd file looks a bit better than the one I cobbled together although I'm a bit surprised at the setting of environment variables in the systemd unit file rather than allowing those to be set from the env-file (that is also used after being mounted into the ecs-agent container). From quick testing it looks like --env overrides variables provided in --env-file regardless of order:
$ echo FOO=BAZ > /tmp/foo.env
$ echo FOO2=BAZ >> /tmp/foo.env
$ docker run --rm -e FOO=BAR --env-file /tmp/foo.env alpine printenv
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=4e8491f42e72
FOO=BAR
FOO2=BAZ
HOME=/root
@nmeyerhans was there a reasoning behind these at all?
@tomelliff Here's the rationale behind the handling of environment variables in the unit file:
This is obviously a change in behavior from the golang ecs-init package, and part of why we're not recommending the use of the unit file more broadly yet.
@petderek I am a bit confused here. Linux AMIs are supported according to https://github.com/aws/amazon-ecs-agent but not ENIs? ENIs are fundamental for a properly-networked container solution, and having a hardened EC2 instance is fundamental for any production box. I'd expect that any Linux instance can easily have ENIs so that we customers can choose the hardening of our choice. @petderek Could you shed some light? Did I misunderstand something?
Hi @BrunoCarrier @tomelliff, attaching ENIs to instances is definitely supported on all instance types launched into a VPC. To configure an ENI for containers/tasks, ECS agent depends on tools such as dhclient and on some container capabilities such as the --init flag and the SYS_ADMIN and NET_ADMIN capabilities that are provided via Docker. ECS init is a convenient way for the ECS agent to be bootstrapped with all of these so that tasks that require ENIs do not fail during initialization because of missing dependencies/configurations.
When we released this feature last year, we added support in ECS init for doing this on the Amazon Linux distribution. The work needed to enable this support for other distros is on our roadmap.
Having said that, I just started ECS agent using @nmeyerhans's unit file and was able to successfully start a task in 'awsvpc' mode. I'm pasting the command for reference as well:
$ cat /etc/ecs/ecs.config
ECS_CLUSTER=ubuntu-task-eni
$ docker run --name ecs-agent \
--init \
--restart=on-failure:10 \
--volume=/var/run:/var/run \
--volume=/var/log/ecs/:/log \
--volume=/var/lib/ecs/data:/data \
--volume=/etc/ecs:/etc/ecs \
--volume=/sbin:/sbin \
--volume=/lib:/lib \
--volume=/lib64:/lib64 \
--volume=/usr/lib:/usr/lib \
--volume=/proc:/host/proc \
--volume=/sys/fs/cgroup:/sys/fs/cgroup \
--volume=/var/lib/ecs/dhclient:/var/lib/dhclient \
--net=host \
--env ECS_LOGFILE=/log/ecs-agent.log \
--env ECS_DATADIR=/data \
--env ECS_UPDATES_ENABLED=false \
--env ECS_AVAILABLE_LOGGING_DRIVERS='["json-file","syslog","awslogs"]' \
--env ECS_ENABLE_TASK_IAM_ROLE=true \
--env ECS_ENABLE_TASK_IAM_ROLE_NETWORK_HOST=true \
--env ECS_UPDATES_ENABLED=true \
--env ECS_ENABLE_TASK_ENI=true \
--env-file=/etc/ecs/ecs.config \
--cap-add=sys_admin \
--cap-add=net_admin \
-d \
amazon/amazon-ecs-agent:latest
I have created an issue in the ECS init repo for the same as well: https://github.com/aws/amazon-ecs-init/issues/150
Just to follow up on what @aaithal posted, a key part is making sure all dependency paths for dhclient are also mounted. If you're not sure for your system use ldd /sbin/dhclient to get the list. You may see the following line if the dependencies aren't satisfied (fork/exec /sbin/dhclient: no such file or directory)
2018-02-21T21:17:13Z [ERROR] Set up pause container namespace failed, err: setupContainerNamespace engine: unable to setup device 'eth1' in namespace '/host/proc/13393/ns/net': engine dhclient: unable to start dhclient for ipv4 address; command: dhclient [-q -lf /var/lib/dhclient/ns-eth1-dhclient4.leases -pf /var/run/ns-eth1-dhclient4.pid eth1]; output: : fork/exec /sbin/dhclient: no such file or directory, task: nlb-test:11
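A minimal sketch of the `ldd` advice above: it turns `ldd`-style output into one `--volume` flag per directory that holds a library dhclient links against. The function name `ldd_to_volume_flags` is hypothetical, and mounting each directory read-only at the same path inside the agent container is an assumption, not something the agent documents.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ldd` output on stdin and print a --volume flag
# for every distinct directory containing a listed shared library.
ldd_to_volume_flags() {
  # ldd lines look like:
  #   libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x...)
  #   /lib64/ld-linux-x86-64.so.2 (0x...)
  # Keep only fields that are absolute paths and strip the file name.
  awk '{
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^\//) {
        path = $i
        sub(/\/[^\/]*$/, "", path)
        print path
      }
    }
  }' | LC_ALL=C sort -u |
    while read -r dir; do
      # Assumption: same path inside the container, mounted read-only.
      printf -- '--volume=%s:%s:ro\n' "$dir" "$dir"
    done
}

# Usage on the container instance:
#   ldd /sbin/dhclient | ldd_to_volume_flags
```

The printed flags can then be compared against the `--volume` options already passed to `docker run` to spot a directory that is not yet mounted.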
@stlava yes! dhclient's libs definitely need to be mounted. I'm curious to know if you found additional lib paths apart from these that you had to mount:
--volume=/lib:/lib \
--volume=/lib64:/lib64 \
Thanks,
Anirudh
@aaithal I'm not sure if this is important for ecs-agent/go but I found that I also needed /bin because when I was testing I was able to run docker exec -it <ecs task> /sbin/dhclient -h successfully from an amazon AMI but was not able to do that from ubuntu 16 AMI. I had to add /bin for it to work. I haven't tested if this matters for when dhclient is called from within the go code.
I'll just leave a note here in case someone finds it useful.
I was able to get awsvpc networking working on CoreOS Linux; however, it does involve some hacks. ecs-agent requires the dhclient binary, which is not available in the ecs-agent container image, so the host's dhclient gets mounted using a Docker volume. This works fine on Linux distributions that ship with dhclient, but CoreOS is not one of them (at least I could not find it). The solution is to build a custom ecs-agent container on top of the alpine image and install dhclient using apk. Further, Docker's multi-stage build can be used to COPY binaries from the official ecs-agent image into the customized one (Dockerfile example below). Afterwards, the custom ecs-agent image can be used in the systemd unit.
Perhaps I'm missing something important here, if so please let me know.
# Dockerfile
FROM amazon/amazon-ecs-agent:latest as aws-ecs-agent
FROM alpine:latest
RUN apk add --update --no-cache dhclient
COPY --from=aws-ecs-agent /agent /agent
COPY --from=aws-ecs-agent /images/amazon-ecs-pause.tar /images/amazon-ecs-pause.tar
COPY --from=aws-ecs-agent /amazon-ecs-cni-plugins /amazon-ecs-cni-plugins
COPY --from=aws-ecs-agent /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt
EXPOSE 51678 51679
ENTRYPOINT ["/agent"]
actually I'm getting this issue in Amazon Linux (edit: [1]) as well, and it isn't telling me what I'm missing: I have a cluster of 4 instances, and only one of them seems to accept awsvpc tasks, and they're supposed to be identical (launched from the same cloudformation template, etc.)
We only found it when we couldn't update a service because AWS was refusing to place the task (the one machine where it works didn't have enough memory, but others did, and I sat scratching my head for a while). Also, for some reason starting a service from the same task definition worked (I'm pretty sure it got placed on another instance).
It would be really useful to have informative errors instead of "something is missing".
[1] the ECS-optimized AMI, or more precisely one derived from it because we need it encrypted - just copied the snapshot, encrypted, made private image from it and added yum -y upgrade to the user data to get it up to date on start
@aaithal Thanks for the instructions! Everything looks straightforward except the /var/lib/ecs/dhclient part. I do not know what should be in there. On my Ubuntu14 machine, nothing is there before/after running the ECS agent. I'd guess that what needs to be there is NOT contained in this repository https://github.com/aws/amazon-ecs-cni-plugins/tree/master/plugins/eni
I've had a look at http://manpages.ubuntu.com/manpages/trusty/man8/dhclient.8.html, but I can't connect the dots between that and /var/lib/ecs/dhclient
Could you give me some pointers? That'd be really appreciated!
Could you give me some pointers? That'd be really appreciated!
@BrunoCarrier: Agent v1.21.0 includes an updated version of the https://github.com/aws/amazon-ecs-cni-plugins that drops the dhclient dependency https://github.com/aws/amazon-ecs-cni-plugins/pull/83
I was able to run awsvpc tasks by following instructions described in the comment https://github.com/aws/amazon-ecs-agent/issues/1083#issuecomment-367737490. The docker run options related to the dhclient dependency can also be removed.
@tomelliff: Have you tried running on ubuntu with a newer version of the agent? I suggest trying agent v1.21.0
closing issue for now. please see comment above, otherwise reopen issue if required.
additionally, https://github.com/aws/amazon-ecs-agent/pull/1501/files needs to be updated and https://github.com/aws/amazon-ecs-init/issues/209 needs to be addressed.
@adnxn I'm trying to spawn awsvpc tasks on a CoreOS system running v1.22.0. For testing, I'm using the docker run syntax noted above. The agent starts, etc, but ECS doesn't even try to schedule any tasks on the instance. Nothing in the ECS agent logs (DEBUG enabled) and nothing in the ECS events. Any ideas here?
I assume you added the cluster name to the ecs agent config?
Does it register and show on the ECS console?
Yes - it's joined to the cluster. I was able to create a new service and have it scheduled there, but services that already existed on the cluster don't seem to want to run there (still no logs of any failures, etc).
Ah yeah.. ECS won't rebalance running tasks.
There's already a ticket about it
Have a link to the ticket? Won't rebalance existing tasks even if no other container instances are available to schedule the task? This seems odd. The only remaining container instance doesn't have any available ENIs for the task. Is this ticket/issue that you are talking about specific to awsvpc?
Thanks! The problem was actually that the tasks were configured for awslogs but the ECS agent was not configured to support awslogs. I would have expected the agent to log something that told me this, but it did not.
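Since the agent did not log this, a quick check of the agent's env file can help. This is only a sketch: the function name `ecs_config_has_log_driver` is hypothetical, and it inspects only the file passed to it, so a value set with `--env` on `docker run` would not be seen.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the env file lists the named driver in
# ECS_AVAILABLE_LOGGING_DRIVERS (the last assignment in the file is checked).
ecs_config_has_log_driver() {
  local config_file="$1" driver="$2"
  grep -E '^ECS_AVAILABLE_LOGGING_DRIVERS=' "$config_file" | tail -n 1 |
    grep -qF "\"$driver\""
}

# Usage on the container instance:
#   ecs_config_has_log_driver /etc/ecs/ecs.config awslogs \
#     || echo "awslogs is not listed in /etc/ecs/ecs.config"
```

If the variable is absent or does not name the driver, adding `awslogs` to the JSON list, as in the `docker run` command earlier in the thread, is the corresponding change.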
The new problem is that the networking stack in CoreOS mangles the interface names upon boot. I think this is likely out of the scope of this specific GH issue, but odd anyway. Upon rebooting, the interface naming scheme changes, which causes the IPs to be assigned to the incorrect interfaces, making the host unreachable on its main interface, eth0.
core@ip-10-0-91-197 ~ $ route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
default ip-10-0-64-1.ec 0.0.0.0 UG 1024 0 0 eth2
default ip-10-0-64-1.ec 0.0.0.0 UG 1024 0 0 eth0
default ip-10-0-64-1.ec 0.0.0.0 UG 1024 0 0 eth1
ip-10-0-64-0.ec 0.0.0.0 255.255.224.0 U 0 0 0 eth2
ip-10-0-64-0.ec 0.0.0.0 255.255.224.0 U 0 0 0 eth0
ip-10-0-64-0.ec 0.0.0.0 255.255.224.0 U 0 0 0 eth1
ip-10-0-64-1.ec 0.0.0.0 255.255.255.255 UH 1024 0 0 eth2
ip-10-0-64-1.ec 0.0.0.0 255.255.255.255 UH 1024 0 0 eth0
ip-10-0-64-1.ec 0.0.0.0 255.255.255.255 UH 1024 0 0 eth1
core@ip-10-0-91-197 ~ $ ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9001
inet 10.0.91.197 netmask 255.255.224.0 broadcast 10.0.95.255
inet6 fe80::10c7:49ff:fe49:5c6 prefixlen 64 scopeid 0x20<link>
ether 12:c7:49:49:05:c6 txqueuelen 1000 (Ethernet)
RX packets 41 bytes 3388 (3.3 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 50 bytes 12347 (12.0 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9001
inet 10.0.74.43 netmask 255.255.224.0 broadcast 10.0.95.255
inet6 fe80::102d:e9ff:febd:387a prefixlen 64 scopeid 0x20<link>
ether 12:2d:e9:bd:38:7a txqueuelen 1000 (Ethernet)
RX packets 25 bytes 2116 (2.0 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 24 bytes 2700 (2.6 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9001
inet 10.0.86.225 netmask 255.255.224.0 broadcast 10.0.95.255
inet6 fe80::10b8:aaff:fe5d:4806 prefixlen 64 scopeid 0x20<link>
ether 12:b8:aa:5d:48:06 txqueuelen 1000 (Ethernet)
RX packets 168 bytes 24051 (23.4 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 163 bytes 20325 (19.8 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 16 bytes 1400 (1.3 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 16 bytes 1400 (1.3 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
@martinssipenko did you see this same problem?
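To make the symptom in the `route` output above easier to spot, the sketch below lists every interface that carries a default route. The function name `default_route_ifaces` is hypothetical, and it assumes the non-numeric `route` format shown above, where the destination column reads `default` and the interface is the last column.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `route` output on stdin and print each interface
# that has a default route, one per line, without duplicates.
default_route_ifaces() {
  awk '$1 == "default" { print $NF }' | LC_ALL=C sort -u
}

# Usage on the host:
#   route | default_route_ifaces
```

More than one line of output means several interfaces compete for the default route, which matches the listing pasted above.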