Amazon-ecs-agent: Agent fails to start ("unable to get vpc id from instance metadata")

Created on 15 Feb 2019 · 15 Comments · Source: aws/amazon-ecs-agent

Summary

I am attempting to add container instances to an existing cluster. The instances never join the cluster. The ECS agent logs indicate a 404 when trying to fetch the VPC ID from the metadata service. However, these instances were not launched in a VPC and reside in EC2-Classic.

Description

I've tried the following AMIs:

amzn-ami-2018.03.m-amazon-ecs-optimized (ami-0796380bc6e51157f)
amzn2-ami-ecs-hvm-2.0.20190204-x86_64-ebs (ami-032564940f9afd5c0)

My /etc/ecs/ecs.config contains this:

ECS_CLUSTER=thunder
ECS_LOGLEVEL=debug

Expected Behavior

Agent starts, and instances join cluster.

Observed Behavior

Agent never starts; instances do not join cluster. Agent logs contain the following:

2019-02-14T23:29:46Z [INFO] Amazon ECS agent Version: 1.25.2, Commit: 0821fbc7
2019-02-14T23:29:46Z [DEBUG] Loaded config: Cluster: thunder,  Region: us-east-1,  DataDir: /data, Checkpoint: true, AuthType: , UpdatesEnabled: true, DisableMetrics: false, PollMetrics: false, PollingMetricsWaitDuration: 15s, ReservedMem: 0, TaskCleanupWaitDuration: 3h0m0s, DockerStopTimeout: 30s, ContainerStartTimeout: 3m0s, TaskCPUMemLimit: 3, , PauseContainerImageName: amazon/amazon-ecs-pause, PauseContainerTag: 0.1.0
2019-02-14T23:29:46Z [INFO] Creating root ecs cgroup: /ecs
2019-02-14T23:29:46Z [INFO] Creating cgroup /ecs
2019-02-14T23:29:46Z [INFO] Loading state! module="statemanager"
2019-02-14T23:29:46Z [INFO] Event stream ContainerChange start listening...
2019-02-14T23:29:46Z [CRITICAL] Unable to initialize Task ENI dependencies: unable to get vpc id from instance metadata: EC2MetadataError: failed to make EC2Metadata request
caused by: <?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
         "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
 <head>
  <title>404 - Not Found</title>
 </head>
 <body>
  <h1>404 - Not Found</h1>
 </body>
</html>

Environment Details

The instance is in EC2-Classic, with a public IP, so it is unclear why the agent is fetching a VPC ID at all.
IAM role is ecsInstanceRole with a single AWS-managed policy: AmazonEC2ContainerServiceforEC2Role
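
For anyone wanting to confirm the same condition on their own instance, here is a minimal Python sketch of the lookup the error message describes. It is not the agent's code (the agent is written in Go). It assumes the standard instance metadata paths `mac` and `network/interfaces/macs/<mac>/vpc-id` and plain unauthenticated metadata access; the function names `http_get` and `lookup_vpc_id` are made up for illustration.

```python
import urllib.error
import urllib.request

# Link-local address of the EC2 instance metadata service.
IMDS = "http://169.254.169.254/latest/meta-data"


def http_get(path, timeout=2):
    """Fetch one metadata path. Only reachable from inside an EC2 instance."""
    with urllib.request.urlopen(f"{IMDS}/{path}", timeout=timeout) as resp:
        return resp.read().decode().strip()


def lookup_vpc_id(fetch=http_get):
    """Return the VPC ID of the primary interface, or None when the
    metadata service answers 404 for the vpc-id key."""
    mac = fetch("mac")
    try:
        return fetch(f"network/interfaces/macs/{mac}/vpc-id")
    except urllib.error.HTTPError as err:
        if err.code == 404:
            # Same response as in the agent log: no VPC ID is published.
            return None
        raise  # Any other HTTP error is unexpected; let it surface.
```

Calling `lookup_vpc_id()` on the instance and getting `None` back would match the 404 in the agent log. The `fetch` parameter exists only so the function can be exercised without a real metadata service.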

Supporting Log Snippets

See below (zip format to make GitHub happy)

kind/bug · pending release · scope/ECS Agent · workaround available

Source

kian

Most helpful comment

Hey @kian

I did a quick dive into the problem and it looks like there's a bug in how the agent detects EC2-Classic instances. This really should degrade gracefully instead of failing like this -- and that's something we will need to fix on our end.

That said, you may be able to avoid this code path by adding the following to your ecs.config file:

ECS_ENABLE_TASK_ENI=false

petderek on 15 Feb 2019

👍3
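
If the workaround has to be rolled out to several instances, a small helper along these lines could apply it without duplicating the line on repeated runs. This is only a sketch: `set_ecs_option` is a hypothetical name, and it assumes the file is plain `KEY=value` lines like the ecs.config shown in the issue.

```python
from pathlib import Path


def set_ecs_option(config_path, key, value):
    """Set KEY=value in an ecs.config-style file.

    Any existing line for the same key is dropped first, so running this
    twice leaves exactly one entry. Other lines are kept in order.
    """
    path = Path(config_path)
    lines = path.read_text().splitlines() if path.exists() else []
    kept = [ln for ln in lines if not ln.strip().startswith(f"{key}=")]
    kept.append(f"{key}={value}")
    path.write_text("\n".join(kept) + "\n")
    return kept
```

Usage would be `set_ecs_option("/etc/ecs/ecs.config", "ECS_ENABLE_TASK_ENI", "false")`, run with permission to write that file. The agent presumably needs a restart before it picks up the change.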

All 15 comments

ecs-logs-collector bundle: collect.zip

kian on 15 Feb 2019

Can you check whether you also have the trust relationship set up as described in this doc?
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/instance_IAM_role.html

suneyz on 15 Feb 2019

@suneyz Verified:

{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

kian on 15 Feb 2019

Not sure if it's relevant here, but the 4 existing instances that are currently part of the cluster are running agent 1.14.1 and docker 1.12.6.

kian on 15 Feb 2019

I took a look at the iptables log you provided. It looks like the iptables setup is incomplete; specifically, you are missing the route to port 51679. Can you try setting up the iptables rules again using these instructions?
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-agent-install.html

Note that you might want to follow the manual setup steps:
"To install the Amazon ECS container agent on a non-Amazon Linux EC2 instance" -> steps 5-7

suneyz on 15 Feb 2019

I'm confused - these are the Amazon Linux ECS-optimized AMIs, so why do I need custom configuration? Have I selected the wrong AMI?

kian on 15 Feb 2019

@kian You should not need any special setup.

@suneyz This sounds like a bug in the ECS agent. @kian is running in EC2-Classic, which means the instance is not running inside a VPC. The agent should tolerate the lack of a VPC ID and disable features that depend on it (like awsvpc network mode).

samuelkarp on 15 Feb 2019

👍3
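
The graceful degradation described in the comment above can be sketched roughly as follows. This is an illustration of the idea only, not the agent's implementation or the eventual fix; `MetadataNotFound`, `AgentFeatures`, and `init_features` are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import Optional


class MetadataNotFound(Exception):
    """Hypothetical error standing in for a 404 from instance metadata."""


@dataclass
class AgentFeatures:
    task_eni_enabled: bool
    vpc_id: Optional[str] = None


def init_features(get_vpc_id, task_eni_requested=True):
    """Decide whether task ENI support (awsvpc network mode) can be enabled.

    get_vpc_id is any callable that returns the VPC ID, or raises
    MetadataNotFound when the instance has none (as in EC2-Classic).
    """
    if not task_eni_requested:
        # Feature switched off in config: never touch the metadata lookup.
        return AgentFeatures(task_eni_enabled=False)
    try:
        vpc_id = get_vpc_id()
    except MetadataNotFound:
        # No VPC: keep starting up, just without the VPC-dependent feature.
        return AgentFeatures(task_eni_enabled=False)
    return AgentFeatures(task_eni_enabled=True, vpc_id=vpc_id)
```

The point of the sketch is that a missing VPC ID turns one feature off instead of ending startup, and that disabling the feature up front skips the lookup entirely, which is the path the `ECS_ENABLE_TASK_ENI=false` workaround relies on.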

(note: nothing here depends on trust relationships or iptables rules)

samuelkarp on 15 Feb 2019

Thanks guys - we're in the process of addressing the recent runc security hole (hence the need to roll out updates), so I'm happy to test any patches or provide more information.

kian on 15 Feb 2019

@kian If this is blocking you, please note that you can upgrade Docker (and runc) from the Amazon Linux yum repositories without updating the ECS agent. You can do so by running sudo yum update docker. See ALAS-2019-1156 and the AWS security bulletin for more information.

samuelkarp on 15 Feb 2019

perfect, I'll give it a shot. thanks for the tip.

kian on 15 Feb 2019

Looks like docker has a dependency on ecs-init, so yum update docker upgraded both. The same error is now happening on the old 2016.09.g instance I upgraded in place.

2019-02-15T01:25:23Z [INFO] pre-start
2019-02-15T01:25:23Z [INFO] start
2019-02-15T01:25:23Z [INFO] Container name: /ecs-agent
2019-02-15T01:25:23Z [INFO] Removing existing agent container ID: b088183bd5bf538fbd545119416871da120a80d26c0d1d39c1f7dd180bc5e6c4
2019-02-15T01:25:23Z [INFO] Starting Amazon Elastic Container Service Agent
2019-02-15T01:25:24Z [INFO] Agent exited with code 5
2019-02-15T01:25:24Z [ERROR] agent exited with terminal exit code
2019-02-15T01:25:24Z [INFO] post-stop
2019-02-15T01:25:24Z [INFO] Cleaning up the credentials endpoint setup for Amazon Elastic Container Service Agent

sounds like I will need to hold off until the agent is fixed?

kian on 15 Feb 2019

Hey @kian

That said, you may be able to avoid this code path by adding the following to your ecs.config file: