Agent 1.20.0 does not start with state saved by 1.19.1.
Restarting Docker doesn't help.
Removing all running containers and issuing `start ecs` doesn't help either.
The only thing that works is to purge `/var/lib/ecs/data/ecs_agent_data.json` and then run `start ecs`.
However, the agent then registers as a new ECS container instance and loses the previous container instance ARN.
Expected: after the upgrade, ecs-agent starts.
Observed: after updating ecs-agent from 1.19.1 to 1.20.0, the ecs-agent Docker container fails to start.
1.19.1 shutdown
2018-08-07T20:36:07Z [INFO] Saving state! module="statemanager"
2018-08-07T20:37:42Z [INFO] Saving state! module="statemanager"
2018-08-07T20:37:43Z [INFO] Loading configuration
2018-08-07T20:37:43Z [INFO] Amazon ECS agent Version: 1.20.0, Commit: cd331230
2018-08-07T20:37:43Z [INFO] Creating root ecs cgroup: /ecs
2018-08-07T20:37:43Z [INFO] Creating cgroup /ecs
2018-08-07T20:37:43Z [INFO] Loading state! module="statemanager"
2018-08-07T20:37:43Z [INFO] Event stream ContainerChange start listening...
2018-08-07T20:37:43Z [CRITICAL] Error loading previously saved state: invalid Volume: must include a type
1.20.0 startup
docker logs ecs-agent
2018-08-08T12:33:52Z [INFO] Loading configuration
2018-08-08T12:33:52Z [INFO] Amazon ECS agent Version: 1.20.0, Commit: cd331230
2018-08-08T12:33:52Z [INFO] Creating root ecs cgroup: /ecs
2018-08-08T12:33:52Z [INFO] Creating cgroup /ecs
2018-08-08T12:33:52Z [INFO] Loading state! module="statemanager"
2018-08-08T12:33:52Z [INFO] Event stream ContainerChange start listening...
2018-08-08T12:33:52Z [CRITICAL] Error loading previously saved state: invalid Volume: must include a type
after purging /var/lib/ecs/data/ecs_agent_data.json
docker logs ecs-agent
2018-08-08T13:30:49Z [INFO] Loading configuration
2018-08-08T13:30:49Z [INFO] Amazon ECS agent Version: 1.20.0, Commit: cd331230
2018-08-08T13:30:49Z [INFO] Creating root ecs cgroup: /ecs
2018-08-08T13:30:49Z [INFO] Creating cgroup /ecs
2018-08-08T13:30:49Z [INFO] Loading state! module="statemanager"
2018-08-08T13:30:49Z [INFO] Event stream ContainerChange start listening...
2018-08-08T13:30:49Z [INFO] Registering Instance with ECS
2018-08-08T13:30:49Z [INFO] Registered container instance with cluster!
2018-08-08T13:30:49Z [INFO] Registration completed successfully. I am running as '
2018-08-08T13:30:49Z [INFO] Saving state! module="statemanager"
2018-08-08T13:30:49Z [INFO] Beginning Polling for updates
2018-08-08T13:30:49Z [INFO] Event stream DeregisterContainerInstance start listening...
2018-08-08T13:30:49Z [INFO] Initializing stats engine
2018-08-08T13:30:49Z [INFO] NO_PROXY set:169.254.169.254,169.254.170.2,/var/run/docker.sock
2018-08-08T13:30:49Z [INFO] Connected to TCS endpoint
2018-08-08T13:30:49Z [INFO] Connected to ACS endpoint
We are facing the same issue in production. We managed to start the agent on version 1.20.0, but it doesn't report the correct information: the ECS Instances page shows 0 running tasks, while `docker container list` shows 18 running containers.
tailf /var/log/ecs/ecs-agent.log.2018-08-08-14:
2018-08-08T14:48:24Z [DEBUG] No container health metrics to report
2018-08-08T14:48:24Z [DEBUG] Instance is idle. No task metrics to report
2018-08-08T14:48:24Z [DEBUG] TCS client sending payload: {"type":"PublishMetricsRequest","message":{"metadata":{"cluster":"production-ecs","containerInstance":"arn:aws:ecs:eu-west-1:[removed]:container-instance/[removed]","fin":true,"idle":true,"messageId":"[removed]"},"timestamp":1533739704}}
2018-08-08T14:48:24Z [DEBUG] Received message of type: AckPublishMetric
2018-08-08T14:48:24Z [DEBUG] Received AckPublishMetric from tcs
Same issue with the upgrade. After deleting `/var/lib/ecs/data` it works; we're not seeing the other issue that @manoelhc describes.
@marksullivancrowd, @vad: We're looking into the issue related to the data file.
@manoelhc: This appears to be unrelated. Do you mind cutting a new issue for this?
@adnxn nope, will create a ticket for my issue.
This issue surfaces when upgrading the agent to 1.20.0 on an instance that is managing tasks that use volumes.
Until the fix is released, there are three workarounds:
1. Roll back to the older version of the agent.
2. Terminate the instance and launch a new one.
3. Modify the state file to include the additional fields expected by the agent.
For the third workaround, on an instance where the agent ran into this issue:
There should be multiple `"volumes"` entries in the state file `/var/lib/ecs/data/ecs_agent_data.json`,
e.g. `"volumes":[{"host":{"sourcePath":"/home/ec2-user"},"name":"volume1"},{"host":{"sourcePath":"/home/ec2-user"},"name":"volume2"}]`
Add `,"type":"host"` to each volume blob, like this:
`"volumes":[{"host":{"sourcePath":"/home/ec2-user"},"name":"volume1","type":"host"},{"host":{"sourcePath":"/home/ec2-user"},"name":"volume2","type":"host"}]`
Then run `sudo start ecs`, which should bring the agent back up on 1.20.0.
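If there are many volume entries, the state file edit could be scripted instead of done by hand. The following is only a rough Python sketch, not an official tool: the function names `add_volume_types` and `patch_state_file` are made up for illustration. Because the full layout of the state file isn't shown in this thread, it assumes that walking the whole JSON document and patching every `"volumes"` list it finds is acceptable. It only touches entries that have a `"host"` key and no `"type"` yet, matching the example above.

```python
import json


def add_volume_types(node):
    """Recursively add "type": "host" to host volumes that lack a type.

    Assumption: any list stored under a "volumes" key holds volume blobs
    shaped like the ones quoted in this thread.
    Returns the number of volume entries that were changed.
    """
    changed = 0
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "volumes" and isinstance(value, list):
                for vol in value:
                    # Only patch host volumes that have no type yet.
                    if isinstance(vol, dict) and "host" in vol and "type" not in vol:
                        vol["type"] = "host"
                        changed += 1
            # Keep walking so nested tasks are covered as well.
            changed += add_volume_types(value)
    elif isinstance(node, list):
        for item in node:
            changed += add_volume_types(item)
    return changed


def patch_state_file(path):
    """Load the state file, patch it in memory, and rewrite it if needed."""
    with open(path) as f:
        state = json.load(f)
    changed = add_volume_types(state)
    if changed:
        with open(path, "w") as f:
            json.dump(state, f)
    return changed
```

Calling `patch_state_file("/var/lib/ecs/data/ecs_agent_data.json")` as root would apply the change in place. It is probably wise to keep a backup copy of the file first, since the sketch overwrites it.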
We've released agent v1.20.1 that includes a fix for this issue.