With the latest ECS-optimized AMI (ami-13f84d60) in eu-west-1, the ECS agent cannot register the instance. It waits for 20 seconds, times out and exits. It is then relaunched by ecs-init and the same thing happens again and again.
$ tail -f /var/log/ecs/ecs-agent.log.2016-01-27-10
2016-01-27T10:13:25Z [INFO] Starting Agent: Amazon ECS Agent - v1.7.1 (007985c)
2016-01-27T10:13:25Z [INFO] Loading configuration
2016-01-27T10:13:25Z [INFO] Checkpointing is enabled. Attempting to load state
2016-01-27T10:13:25Z [INFO] Loading state! module="statemanager"
2016-01-27T10:13:25Z [INFO] Detected Docker versions [1.17 1.18 1.19 1.20]
2016-01-27T10:13:25Z [INFO] Registering Instance with ECS
2016-01-27T10:13:45Z [ERROR] Could not register module="api client" err="RequestError: send request failed
caused by: Post https://ecs.eu-west-1.amazonaws.com/: net/http: request canceled while waiting for connection"
2016-01-27T10:13:45Z [ERROR] Error registering: RequestError: send request failed
caused by: Post https://ecs.eu-west-1.amazonaws.com/: net/http: request canceled while waiting for connection
2016-01-27T10:13:46Z [INFO] Starting Agent: Amazon ECS Agent - v1.7.1 (007985c)
2016-01-27T10:13:46Z [INFO] Loading configuration
2016-01-27T10:13:46Z [INFO] Checkpointing is enabled. Attempting to load state
2016-01-27T10:13:46Z [INFO] Loading state! module="statemanager"
2016-01-27T10:13:46Z [INFO] Detected Docker versions [1.17 1.18 1.19 1.20]
2016-01-27T10:13:46Z [INFO] Registering Instance with ECS
$ tail -f /var/log/ecs/ecs-init.log.2016-01-27-10
2016-01-27T10:13:45Z [INFO] Removing existing agent container ID: 73152e83cbe27745fa2e1f8ca1a9a60f62458bf7ddc67c47fcee803f82a9a93c
2016-01-27T10:13:45Z [INFO] Starting Amazon EC2 Container Service Agent
2016-01-27T10:14:07Z [INFO] Agent exited with code 1
2016-01-27T10:14:07Z [INFO] Container name: /ecs-agent
2016-01-27T10:14:07Z [INFO] Removing existing agent container ID: 6984bd3711bfc56989369ad63460cc19fb4d6bb60da117200752fd02ec98e276
2016-01-27T10:14:07Z [INFO] Starting Amazon EC2 Container Service Agent
2016-01-27T10:14:29Z [INFO] Agent exited with code 1
2016-01-27T10:14:29Z [INFO] Container name: /ecs-agent
2016-01-27T10:14:29Z [INFO] Removing existing agent container ID: bade70df4d946b2be47b613562595111dfc92720a3e0d54710cfcf21f12b177b
2016-01-27T10:14:29Z [INFO] Starting Amazon EC2 Container Service Agent
$ docker version
Client:
Version: 1.9.1
API version: 1.21
Go version: go1.4.2
Git commit: a34a1d5/1.9.1
Built:
OS/Arch: linux/amd64
Server:
Version: 1.9.1
API version: 1.21
Go version: go1.4.2
Git commit: a34a1d5/1.9.1
Built:
OS/Arch: linux/amd64
The cluster exists (visible in the ECS console) and /etc/ecs/ecs.config is correctly set:
$ aws ecs describe-clusters --clusters microservices
{
"clusters": [
{
"status": "ACTIVE",
"clusterName": "microservices",
"registeredContainerInstancesCount": 0,
"pendingTasksCount": 0,
"runningTasksCount": 0,
"activeServicesCount": 0,
"clusterArn": "arn:aws:ecs:eu-west-1:613904931467:cluster/microservices"
}
],
"failures": []
}
[ec2-user@ip-172-31-7-9 ~]$ cat /etc/ecs/ecs.config
ECS_CLUSTER=microservices
Instances are running in the default VPC, security groups look fine (HTTP/HTTPS/SSH allowed inbound from everywhere, everything allowed outbound), and network ACLs are standard (everything allowed). The endpoint seems accessible:
$ curl https://ecs.eu-west-1.amazonaws.com/
Here's the IAM policy attached to the instance role:
{
"Statement": [
{
"Resource": "*",
"Action": [
"ecs:CreateCluster",
"ecs:DeregisterContainerInstance",
"ecs:DiscoverPollEndpoint",
"ecs:Poll",
"ecs:RegisterContainerInstance",
"ecs:StartTelemetrySession",
"ecs:Submit*",
"ecr:GetAuthorizationToken",
"ecr:BatchCheckLayerAvailability",
"ecr:GetDownloadUrlForLayer",
"ecr:BatchGetImage"
],
"Effect": "Allow"
},
{
"Resource": "*",
"Action": "ec2:DescribeInstances",
"Effect": "Allow"
},
{
"Resource": [
"arn:aws:logs:*:*:*"
],
"Action": [
"logs:*"
],
"Effect": "Allow"
}
],
"Version": "2012-10-17"
}
I tried rebooting, stopping/starting, etc. No luck :-/
Let me know if you need more information.
@juliensimon It looks like something might be affecting the networking inside the container. Are you launching the instance with user-data? Areas I'd check:
/etc/sysconfig/docker
sudo docker info
/var/log/docker
sudo ip addr show docker0
/etc/resolv.conf, which you can see by doing sudo docker cp ecs-agent:/etc/resolv.conf /tmp/agent && cat /tmp/agent/resolv.conf
Same issue for me, it was a wrong --dns option in /etc/sysconfig/docker ;)
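To make the --dns check above concrete, here is a minimal sketch that prints the DNS servers configured for the Docker daemon and the ones the host itself uses, so the two can be compared. The function names `list_docker_dns` and `list_resolv_nameservers` are hypothetical helpers, not part of Docker or the ECS agent, and the sketch assumes the options live on uncommented lines of /etc/sysconfig/docker.

```shell
# Sketch: print every --dns value found in a Docker sysconfig file.
# list_docker_dns is a hypothetical helper; the default path is the one from this thread.
list_docker_dns() {
    file="${1:-/etc/sysconfig/docker}"
    # Skip comment lines, then match "--dns 10.0.0.2" and "--dns=10.0.0.2".
    # "--dns-search" and "--dns-opt" are not matched because "-" follows "--dns".
    grep -v '^[[:space:]]*#' "$file" \
        | grep -oE -- '--dns[= ][^ "]+' \
        | sed -E 's/^--dns[= ]//'
}

# Sketch: print the nameservers from a resolv.conf file (the host's by default).
list_resolv_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}
```

Running `list_docker_dns` and `list_resolv_nameservers` with no arguments on the instance shows both lists; an address that appears only in the first one is a candidate for the wrong --dns option described here.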
Hi @fabien0102,
Could you give us any more details, please? What configuration did you write in /etc/sysconfig/docker?
I have exactly the same issue.
Thanks,
I've fixed it. I deleted the IPs from /etc/sysconfig/docker, so that the Docker daemon uses the same DNS servers as the host.
Thanks.
;)
@juliensimon Were you able to investigate the areas I recommended?
In case it helps anyone else, I had the exact same issue and it was resolved by modifying /etc/sysconfig/docker to correct a DNS entry. The root cause was an edited version of a CloudFormation template that had not been modified to use a new VPC CIDR. It seems to rely on an (undocumented?) feature of VPCs that the address of the VPC DNS server is at the VPC base CIDR + 2 (e.g. 10.0.0.2 for a 10.0.0.0/16 VPC).
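The base + 2 arithmetic mentioned above can be sketched in plain shell. This only illustrates the rule as described in the comment; `vpc_dns_address` is a hypothetical helper name, and the sketch assumes an IPv4 CIDR whose octets have no leading zeros.

```shell
# Sketch: derive the address at "network base + 2" from an IPv4 VPC CIDR.
# vpc_dns_address is a hypothetical helper name.
vpc_dns_address() {
    cidr="$1"
    ip="${cidr%/*}"      # address part, e.g. 10.0.0.0
    prefix="${cidr#*/}"  # prefix length, e.g. 16
    # Split the dotted quad into four positional parameters.
    old_ifs="$IFS"; IFS=.
    set -- $ip
    IFS="$old_ifs"
    addr=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    # Zero the host bits to get the network base, then add 2.
    mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    dns=$(( (addr & mask) + 2 ))
    echo "$(( (dns >> 24) & 255 )).$(( (dns >> 16) & 255 )).$(( (dns >> 8) & 255 )).$(( dns & 255 ))"
}
```

For example, `vpc_dns_address 10.0.0.0/16` prints 10.0.0.2, matching the example in the comment. Comparing that value with the --dns entry in /etc/sysconfig/docker would reveal a template that still points at the old VPC range.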
@juliensimon Are you still encountering problems here?
I'm going to close this issue for now. @juliensimon, if you continue to have problems please let us know.
Still an issue for me: /etc/sysconfig/docker does not have any IP mentioned. Fixed: it was due to the outbound SSL port (443) not being allowed in the security group.
I'm getting a very similar issue. Running ECS + ALB (ELB v2), with the ALB in public subnet, ECS in private subnet, in custom VPC and routes handled with a NAT gateway. Using the latest ECS-optimised AMI (ami-42e9f921). This is all terraformed (v0.9.5).
Agent log:
$ tail -f /var/log/ecs/ecs-agent.log.2017-06-29-01
2017-06-29T01:58:45Z [INFO] Starting Agent: Amazon ECS Agent - v1.14.3 (15de319)
2017-06-29T01:58:45Z [INFO] Loading configuration
2017-06-29T01:58:45Z [INFO] Checkpointing is enabled. Attempting to load state
2017-06-29T01:58:45Z [INFO] Loading state! module="statemanager"
2017-06-29T01:58:45Z [INFO] Event stream ContainerChange start listening...
2017-06-29T01:58:45Z [INFO] Detected Docker versions [1.17 1.18 1.19 1.20 1.21 1.22 1.23]
2017-06-29T01:58:45Z [INFO] Registering Instance with ECS
2017-06-29T01:59:05Z [ERROR] Could not register: RequestError: send request failed
caused by: Post https://ecs.ap-southeast-2.amazonaws.com/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2017-06-29T01:59:05Z [ERROR] Error registering: RequestError: send request failed
caused by: Post https://ecs.ap-southeast-2.amazonaws.com/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Instance docker config:
$ cat /etc/sysconfig/docker
DAEMON_MAXFILES=1048576
OPTIONS="--default-ulimit nofile=1024:4096"
DAEMON_PIDFILE_TIMEOUT=10
export HTTP_PROXY=http://10.0.0.131:3128/
export NO_PROXY=169.254.169.254
Instance ECS config:
$ cat /etc/ecs/ecs.config
ECS_CLUSTER=core_ecs
HTTP_PROXY=10.0.0.131:3128
NO_PROXY=169.254.169.254,169.254.170.2,/var/run/docker.sock
ECS connection check:
$ curl https://ecs.ap-southeast-2.amazonaws.com/
Missing Authentication Token
My security group allows egress on port 443 too, for example:
$ curl https://google.com
302 Moved
302 Moved
The document has moved
here.
I have checked the IAM role for the ECS instance:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "",
"Effect": "Allow",
"Action": [
"ecs:UpdateContainerInstancesState",
"ecs:Submit*",
"ecs:StartTelemetrySession",
"ecs:RegisterContainerInstance",
"ecs:Poll",
"ecs:DiscoverPollEndpoint",
"ecs:DeregisterContainerInstance",
"ecs:CreateCluster",
"ecr:GetDownloadUrlForLayer",
"ecr:GetAuthorizationToken",
"ecr:BatchGetImage",
"ecr:BatchCheckLayerAvailability"
],
"Resource": "*"
}
]
}
Don't know where to go from here.
EDIT: I solved this issue by removing the HTTP_PROXY and NO_PROXY variables.
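For anyone retracing this, a small sketch that lists the proxy-related assignments in the two config files shown above, so it is clear what is set before anything is removed. `list_proxy_settings` is a hypothetical helper name; it only reads the files and changes nothing.

```shell
# Sketch: show proxy-related settings in the given config files.
# list_proxy_settings is a hypothetical helper name.
list_proxy_settings() {
    for f in "$@"; do
        # Silently skip files that do not exist or are not readable.
        [ -r "$f" ] || continue
        # Print "file:line" for each HTTP_PROXY / HTTPS_PROXY / NO_PROXY
        # assignment, with or without a leading "export".
        grep -HE '^[[:space:]]*(export[[:space:]]+)?(HTTPS?_PROXY|NO_PROXY)=' "$f"
    done
    return 0
}
```

A typical call would be `list_proxy_settings /etc/ecs/ecs.config /etc/sysconfig/docker`.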
Hello, I had the same problem and figured out that one of my subnets wasn't correctly configured/added in the Amazon routing table.
Marcel
@marcelalburg What did you configure and add in the routing table?
My AWS account was quite old and I didn't have the eu-west-1c subnet, so I created it manually myself; this was the error. So I removed this subnet and asked AWS support whether they could create the new subnet in my account. On the website, I couldn't create the subnet with the correct settings. I hope this helps... this topic is quite old and I hope I remember correctly :)