Amazon-ecs-agent: ECS agent can't register instance to ecs.eu-west-1.amazonaws.com

Created on 27 Jan 2016 · 14 Comments · Source: aws/amazon-ecs-agent

With the latest ECS-optimized AMI (ami-13f84d60) in eu-west-1, the ECS agent cannot register the instance. It waits for 20 seconds, times out and exits. It is then relaunched by ecs-init and the same thing happens again and again.

$ tail -f /var/log/ecs/ecs-agent.log.2016-01-27-10
2016-01-27T10:13:25Z [INFO] Starting Agent: Amazon ECS Agent - v1.7.1 (007985c)
2016-01-27T10:13:25Z [INFO] Loading configuration
2016-01-27T10:13:25Z [INFO] Checkpointing is enabled. Attempting to load state
2016-01-27T10:13:25Z [INFO] Loading state! module="statemanager"
2016-01-27T10:13:25Z [INFO] Detected Docker versions [1.17 1.18 1.19 1.20]
2016-01-27T10:13:25Z [INFO] Registering Instance with ECS
2016-01-27T10:13:45Z [ERROR] Could not register module="api client" err="RequestError: send request failed
caused by: Post https://ecs.eu-west-1.amazonaws.com/: net/http: request canceled while waiting for connection"
2016-01-27T10:13:45Z [ERROR] Error registering: RequestError: send request failed
caused by: Post https://ecs.eu-west-1.amazonaws.com/: net/http: request canceled while waiting for connection
2016-01-27T10:13:46Z [INFO] Starting Agent: Amazon ECS Agent - v1.7.1 (007985c)
2016-01-27T10:13:46Z [INFO] Loading configuration
2016-01-27T10:13:46Z [INFO] Checkpointing is enabled. Attempting to load state
2016-01-27T10:13:46Z [INFO] Loading state! module="statemanager"
2016-01-27T10:13:46Z [INFO] Detected Docker versions [1.17 1.18 1.19 1.20]
2016-01-27T10:13:46Z [INFO] Registering Instance with ECS

$ tail -f /var/log/ecs/ecs-init.log.2016-01-27-10
2016-01-27T10:13:45Z [INFO] Removing existing agent container ID: 73152e83cbe27745fa2e1f8ca1a9a60f62458bf7ddc67c47fcee803f82a9a93c
2016-01-27T10:13:45Z [INFO] Starting Amazon EC2 Container Service Agent
2016-01-27T10:14:07Z [INFO] Agent exited with code 1
2016-01-27T10:14:07Z [INFO] Container name: /ecs-agent
2016-01-27T10:14:07Z [INFO] Removing existing agent container ID: 6984bd3711bfc56989369ad63460cc19fb4d6bb60da117200752fd02ec98e276
2016-01-27T10:14:07Z [INFO] Starting Amazon EC2 Container Service Agent
2016-01-27T10:14:29Z [INFO] Agent exited with code 1
2016-01-27T10:14:29Z [INFO] Container name: /ecs-agent
2016-01-27T10:14:29Z [INFO] Removing existing agent container ID: bade70df4d946b2be47b613562595111dfc92720a3e0d54710cfcf21f12b177b
2016-01-27T10:14:29Z [INFO] Starting Amazon EC2 Container Service Agent

$ docker version
Client:
Version: 1.9.1
API version: 1.21
Go version: go1.4.2
Git commit: a34a1d5/1.9.1
Built:
OS/Arch: linux/amd64

Server:
Version: 1.9.1
API version: 1.21
Go version: go1.4.2
Git commit: a34a1d5/1.9.1
Built:
OS/Arch: linux/amd64

The cluster exists (visible in the ECS console) and /etc/ecs/ecs.config is correctly set:
$ aws ecs describe-clusters --clusters microservices
{
  "clusters": [
    {
      "status": "ACTIVE",
      "clusterName": "microservices",
      "registeredContainerInstancesCount": 0,
      "pendingTasksCount": 0,
      "runningTasksCount": 0,
      "activeServicesCount": 0,
      "clusterArn": "arn:aws:ecs:eu-west-1:613904931467:cluster/microservices"
    }
  ],
  "failures": []
}

[ec2-user@ip-172-31-7-9 ~]$ cat /etc/ecs/ecs.config
ECS_CLUSTER=microservices

Instances are running in the default VPC, security groups look fine (HTTP/HTTPS/SSH allowed inbound from everywhere, everything allowed outbound), and network ACLs are standard (everything allowed). The endpoint seems accessible:

$ curl https://ecs.eu-west-1.amazonaws.com/

Missing Authentication Token

Here's the IAM policy attached to the instance role:

{
  "Statement": [
    {
      "Resource": "*",
      "Action": [
        "ecs:CreateCluster",
        "ecs:DeregisterContainerInstance",
        "ecs:DiscoverPollEndpoint",
        "ecs:Poll",
        "ecs:RegisterContainerInstance",
        "ecs:StartTelemetrySession",
        "ecs:Submit*",
        "ecr:GetAuthorizationToken",
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage"
      ],
      "Effect": "Allow"
    },
    {
      "Resource": "*",
      "Action": "ec2:DescribeInstances",
      "Effect": "Allow"
    },
    {
      "Resource": [
        "arn:aws:logs:*:*:*"
      ],
      "Action": [
        "logs:*"
      ],
      "Effect": "Allow"
    }
  ],
  "Version": "2012-10-17"
}

I tried rebooting, stopping/starting, etc. No luck :-/

Let me know if you need more information.

more info needed

juliensimon

All 14 comments

@juliensimon It looks like something might be affecting the networking inside the container. Are you launching the instance with user-data? Areas I'd check:

Docker options in /etc/sysconfig/docker
Output of sudo docker info
Docker daemon logs in /var/log/docker
Output of sudo ip addr show docker0
Contents of the container's /etc/resolv.conf, which you can see by doing sudo docker cp ecs-agent:/etc/resolv.conf /tmp/agent && cat /tmp/agent/resolv.conf
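The resolv.conf check in the last item can be scripted. This is a minimal sketch, assuming a POSIX shell with awk and the container name ecs-agent used above; the helper name `nameservers` and the /tmp paths are illustrative and not part of the agent or ecs-init. It prints the resolver addresses from a resolv.conf-style file so the host's list and the container's list can be compared.

```shell
# Print the resolver addresses listed in a resolv.conf-style file, one per line.
# "nameservers" is a hypothetical helper name, not an ECS or Docker tool.
nameservers() {
  awk '$1 == "nameserver" { print $2 }' "$1"
}

# Usage sketch (run on the instance; needs Docker and the ecs-agent container):
#   mkdir -p /tmp/agent
#   sudo docker cp ecs-agent:/etc/resolv.conf /tmp/agent/
#   nameservers /etc/resolv.conf        > /tmp/host-dns.txt
#   nameservers /tmp/agent/resolv.conf  > /tmp/agent-dns.txt
#   diff /tmp/host-dns.txt /tmp/agent-dns.txt && echo "container uses the host resolvers"
```

If the two lists differ, the Docker options in /etc/sysconfig/docker are the first place to look.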

samuelkarp on 1 Feb 2016

Same issue for me, it was a wrong --dns option in /etc/sysconfig/docker ;)

fabien0102 on 2 Feb 2016

Hi @fabien0102,
Could you give us more details, please? What configuration did you put in /etc/sysconfig/docker?
I have exactly the same issue.
Thanks,

algorines on 2 Feb 2016

I've fixed it. I deleted the IP addresses from /etc/sysconfig/docker, so that the Docker daemon uses the same DNS servers as the host.

Thanks.
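A sketch of that kind of edit, assuming the addresses were passed with Docker's `--dns` flag inside the OPTIONS line (the earlier comment mentions a wrong `--dns` option, but the exact file contents were not posted). The helper name `strip_dns_options` is illustrative. It filters stdin and prints the result, so nothing is changed on disk until the output has been reviewed.

```shell
# Remove every "--dns <addr>" or "--dns=<addr>" option from the text on stdin.
# Flags such as --dns-search are left alone, because the pattern requires
# a space or "=" directly after "--dns".
# "strip_dns_options" is a hypothetical helper name.
strip_dns_options() {
  sed -E 's/[[:space:]]*--dns[= ][^[:space:]"]+//g'
}

# Usage sketch:
#   strip_dns_options < /etc/sysconfig/docker
```

After writing the cleaned content back, the Docker daemon has to be restarted for the change to take effect.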

algorines on 3 Feb 2016

;)

fabien0102 on 3 Feb 2016

@juliensimon Were you able to investigate the areas I recommended?

samuelkarp on 19 Feb 2016

In case it helps anyone else, I had the exact same issue and it was resolved by modifying /etc/sysconfig/docker to correct a DNS entry. The root cause was an edited version of a CloudFormation template that had not been modified to use a new VPC CIDR. It seems to rely on an (undocumented?) feature of VPCs that the address of the VPC DNS server is at the VPC base CIDR + 2 (e.g. 10.0.0.2 for a 10.0.0.0/16 VPC).
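The base-plus-two rule is easy to compute, which helps when a template hard-codes a DNS address for a different CIDR. AWS's VPC documentation does describe this address as reserved for the Amazon-provided DNS server. The sketch below assumes a POSIX shell with 64-bit arithmetic and an IPv4 CIDR whose address part is already the network base; `vpc_resolver` is a hypothetical helper name.

```shell
# Print the base-plus-two address for an IPv4 CIDR such as 10.0.0.0/16.
# Assumes the part before "/" is already the network base address.
# The function body runs in a subshell so the IFS change does not leak.
vpc_resolver() (
  base=${1%/*}          # drop the prefix length, e.g. "/16"
  IFS=.
  set -- $base          # split the dotted quad into $1..$4
  n=$(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 + 2 ))
  printf '%d.%d.%d.%d\n' \
    $(( (n >> 24) & 255 )) $(( (n >> 16) & 255 )) \
    $(( (n >> 8) & 255 ))  $(( n & 255 ))
)

# vpc_resolver 10.0.0.0/16    prints 10.0.0.2
# vpc_resolver 172.31.0.0/16  prints 172.31.0.2
```

Comparing this value with any `--dns` address in /etc/sysconfig/docker shows quickly whether a template was copied from a VPC with another CIDR.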

jhovell on 3 Mar 2016

👍2

@juliensimon Are you still encountering problems here?

samuelkarp on 17 Mar 2016

I'm going to close this issue for now. @juliensimon, if you continue to have problems please let us know.

samuelkarp on 22 Mar 2016

Still an issue for me; /etc/sysconfig/docker does not mention any IP. Edit: fixed. It was caused by a missing outbound rule for the SSL port (443) on the security group.
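To tell a blocked port apart from a DNS problem, curl's exit status is more useful than its output. This is a sketch assuming curl is installed; the helper name and the hint texts are illustrative heuristics, while 6, 7 and 28 are curl's documented exit codes for "could not resolve host", "failed to connect" and "operation timed out".

```shell
# Map a curl exit status to the most likely cause discussed in this thread.
# "explain_curl_exit" is a hypothetical helper; the hints are heuristics only.
explain_curl_exit() {
  case "$1" in
    0)  echo "reachable" ;;
    6)  echo "DNS resolution failed: check resolv.conf and any --dns option" ;;
    7)  echo "failed to connect: check routes and network ACLs" ;;
    28) echo "timed out: check outbound 443 in the security group, routes, NAT" ;;
    *)  echo "other curl error $1" ;;
  esac
}

# Usage sketch (needs network access, so it is left commented out):
#   curl -sS -o /dev/null --connect-timeout 5 https://ecs.eu-west-1.amazonaws.com/
#   explain_curl_exit $?
```

Run from the host, this only tests the instance's own network path. The agent runs in a container and may use different DNS settings, as the earlier comments found.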

naheedmk on 3 Mar 2017

I'm getting a very similar issue. Running ECS + ALB (ELB v2), with the ALB in public subnet, ECS in private subnet, in custom VPC and routes handled with a NAT gateway. Using the latest ECS-optimised AMI (ami-42e9f921). This is all terraformed (v0.9.5).

Agent log:

$ tail -f /var/log/ecs/ecs-agent.log.2017-06-29-01
2017-06-29T01:58:45Z [INFO] Starting Agent: Amazon ECS Agent - v1.14.3 (15de319)
2017-06-29T01:58:45Z [INFO] Loading configuration
2017-06-29T01:58:45Z [INFO] Checkpointing is enabled. Attempting to load state
2017-06-29T01:58:45Z [INFO] Loading state! module="statemanager"
2017-06-29T01:58:45Z [INFO] Event stream ContainerChange start listening...
2017-06-29T01:58:45Z [INFO] Detected Docker versions [1.17 1.18 1.19 1.20 1.21 1.22 1.23]
2017-06-29T01:58:45Z [INFO] Registering Instance with ECS
2017-06-29T01:59:05Z [ERROR] Could not register: RequestError: send request failed
caused by: Post https://ecs.ap-southeast-2.amazonaws.com/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2017-06-29T01:59:05Z [ERROR] Error registering: RequestError: send request failed
caused by: Post https://ecs.ap-southeast-2.amazonaws.com/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

Instance docker config:

$ cat /etc/sysconfig/docker
DAEMON_MAXFILES=1048576

OPTIONS="--default-ulimit nofile=1024:4096"

DAEMON_PIDFILE_TIMEOUT=10
export HTTP_PROXY=http://10.0.0.131:3128/
export NO_PROXY=169.254.169.254

Instance ECS config:

$ cat /etc/ecs/ecs.config
ECS_CLUSTER=core_ecs
HTTP_PROXY=10.0.0.131:3128
NO_PROXY=169.254.169.254,169.254.170.2,/var/run/docker.sock

ECS connection check:

$ curl https://ecs.ap-southeast-2.amazonaws.com/

Missing Authentication Token

My security group allows egress on 443 too, for example:

$ curl https://google.com

302 Moved

302 Moved

The document has moved
here.

I have checked the IAM role for the ECS instance:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Action": [
        "ecs:UpdateContainerInstancesState",
        "ecs:Submit*",
        "ecs:StartTelemetrySession",
        "ecs:RegisterContainerInstance",
        "ecs:Poll",
        "ecs:DiscoverPollEndpoint",
        "ecs:DeregisterContainerInstance",
        "ecs:CreateCluster",
        "ecr:GetDownloadUrlForLayer",
        "ecr:GetAuthorizationToken",
        "ecr:BatchGetImage",
        "ecr:BatchCheckLayerAvailability"
      ],
      "Resource": "*"
    }
  ]
}

Don't know where to go from here.

EDIT: I solved this issue by removing the HTTP_PROXY and NO_PROXY variables.
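One way to check whether the proxy itself is the obstacle is to send the same request through it, since the plain curl above may not have used the proxy at all. This is a sketch assuming the KEY=value format shown in ecs.config above; `config_value` is a hypothetical helper, and `-x` is curl's proxy option.

```shell
# Print the value of KEY from a KEY=value file (the last occurrence wins).
# "config_value" is a hypothetical helper name.
config_value() {
  sed -n "s/^$1=//p" "$2" | tail -n 1
}

# Usage sketch (needs network access, so it is left commented out):
#   proxy=$(config_value HTTP_PROXY /etc/ecs/ecs.config)
#   curl -sS -o /dev/null --connect-timeout 5 -x "http://$proxy" \
#     https://ecs.ap-southeast-2.amazonaws.com/ ; echo "curl exit: $?"
```

A non-zero exit status here, while the direct request succeeds, points at the proxy rather than at the security group or the IAM role.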

L226 on 29 Jun 2017

Hello, I had the same problem and figured out that one of my subnets wasn't correctly configured/added in the AWS route table.

Marcel

marcelalburg on 1 Jan 2019

👍3

@marcelalburg What did you configure and add in the route table?

Fettah on 6 May 2020

My AWS account was quite old and I didn't have the eu-west-1c subnet, so I created it manually myself. This was the error. I removed this subnet and wrote to AWS customer service to ask whether they could create the new subnet in my account. On the website, I couldn't create the subnet with the correct settings. I hope this helps ... this topic is quite old and I hope I remember correctly :)

marcelalburg on 6 May 2020
