Hi @samuelkarp,
One of our ECS instances started hanging when running the docker ps command.
We hit the same issue when we
moved to the amzn-ami-2015.09.d-amazon-ecs-optimized AMI, and eventually rolled back to amzn-ami-2015.09.c-amazon-ecs-optimized.
However, amzn-ami-2015.09.c-amazon-ecs-optimized now seems to show the same issue.
Image: amzn-ami-2015.09.c-amazon-ecs-optimized
Issue: running the docker ps command hangs
top - 06:25:29 up 9 days, 15 min, 3 users, load average: 0.13, 0.17, 0.22
Tasks: 252 total, 1 running, 251 sleeping, 0 stopped, 0 zombie
Cpu(s): 1.2%us, 0.1%sy, 0.0%ni, 98.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 30871392k total, 18639196k used, 12232196k free, 768708k buffers
Swap: 0k total, 0k used, 0k free, 13476696k cached
free -m
total used free shared buffers cached
Mem: 30147 18203 11943 0 750 13161
-/+ buffers/cache: 4291 25856
Swap: 0 0 0
vmstat -s
30871392 total memory
18635860 used memory
13366228 active memory
4282160 inactive memory
12235532 free memory
768888 buffer memory
13471276 swap cache
0 total swap
0 used swap
0 free swap
32476646 non-nice user cpu ticks
1538 nice user cpu ticks
37910674 system cpu ticks
1169278443 idle cpu ticks
2209692 IO-wait cpu ticks
1767 IRQ cpu ticks
199125 softirq cpu ticks
248840 stolen cpu ticks
6706077546 pages paged in
483007840 pages paged out
0 pages swapped in
0 pages swapped out
2182560729 interrupts
69339519 CPU context switches
1454998199 boot time
6270932 forks
I am attaching all of the ECS agent and Docker logs. Please advise, as this is affecting our production deployments and we need to do a lot of manual work to remove the faulty instance each time.
The attached logs include an strace of the docker ps command, so you can see the state in which it hangs.
Let me know if you need any other info.
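For anyone who wants to detect the hang automatically instead of noticing it by hand, here is a minimal sketch. It runs a command with a timeout and reports whether it failed to return in time. The helper name `command_hangs` and the 10-second threshold are illustrative assumptions, not something from the ECS agent or Docker.

```python
import subprocess


def command_hangs(cmd, timeout_s=10.0):
    """Return True if cmd does not finish within timeout_s seconds.

    cmd is an argument list such as ["docker", "ps"]. The timeout value
    is an arbitrary choice for illustration; tune it for your hosts.
    """
    try:
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # the child is killed if this expires
        )
    except subprocess.TimeoutExpired:
        return True
    return False


# Example use on an instance (assumes the docker CLI is on PATH):
#   if command_hangs(["docker", "ps"]):
#       print("docker ps did not return within 10s")
```

A check like this only tells you that the command stalled, not why; the strace output and daemon logs are still needed for that.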
We are also seeing this issue.
Running AMI amzn-ami-2015.09.e-amazon-ecs-optimized
I have also been seeing this, running amzn-ami-2015.09.e-amazon-ecs-optimized (ami-cb2305a1).
Same here, and it seems to trigger the ECS timeouts described in #278 and #309.
Interestingly, I had been running an outdated ECS-optimized AMI, and moving to the latest, 2015.09.g, solved the problem.
docker ps still occasionally takes longer, but it no longer triggers the task shutdown described in #309.
@hridyeshpant Could you please verify if you are seeing this issue with the latest 2015.09.g ECS Optimized AMI?
@jbergknoff @hridyeshpant Are you still seeing this problem with the 2015.09.g ECS-optimized AMI?
@samuelkarp I haven't seen the issue since upgrading.
@samuelkarp No, after moving to .g we are no longer seeing the docker ps hang.
Thanks for fixing it.
Thanks @jbergknoff @hridyeshpant !
Closing this issue.
I'm seeing this again on the 2015.09.g AMI. Maybe related: a task on the same instance is failing to start with CannotPullContainerError: dial unix /var/run/docker.sock: too many open files
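If you hit the "too many open files" error, it may help to compare how many file descriptors a process holds against its limit. The sketch below reads the Linux /proc filesystem; the function names are made up for illustration, you have to supply the Docker daemon's PID yourself, and reading another process's fd directory usually requires root.

```python
import os


def count_open_fds(pid, proc_root="/proc"):
    """Count entries in <proc_root>/<pid>/fd (one per open descriptor)."""
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))


def soft_nofile_limit(pid, proc_root="/proc"):
    """Return the soft 'Max open files' limit, or None if unlimited/missing."""
    with open(os.path.join(proc_root, str(pid), "limits")) as f:
        for line in f:
            if line.startswith("Max open files"):
                # Columns: "Max open files  <soft>  <hard>  files"
                soft = line.split()[3]
                return None if soft == "unlimited" else int(soft)
    return None


# Example use (the PID is a placeholder you must look up yourself):
#   used, limit = count_open_fds(pid), soft_nofile_limit(pid)
#   print(f"{used} of {limit} descriptors in use")
```

A count close to the limit would be consistent with the error above, though it does not by itself explain what is leaking descriptors.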
@samuelkarp We're seeing the same thing on amzn-ami-2016.03.b-amazon-ecs-optimized (ami-a1fa1acc)
Would you suggest changing to amzn-ami-2015.09.g-amazon-ecs-optimized - ami-33b48a59 ?
@mjaverto Can you open a new issue? When you do so, please include:
sudo pvs; sudo vgs; sudo pvs
docker info
I'd also recommend that you take a look at the CloudWatch metrics for the EBS volumes attached to your instance(s); we've definitely seen Docker performance suffer if disk I/O starts to take a long time. Note that by default the ECS-optimized AMI comes with gp2 volumes, which have credit-based performance characteristics. A good discussion on I/O characteristics is available in the EBS documentation.
@samuelkarp will do, thanks for the info, I'll be back with more.
@samuelkarp just to update this issue, it was 100% I/O, no need to send stats.
For anyone else reading this, check out the EC2 > Volumes > Monitoring > Avg Queue length metric. For us it was around 30-40, when < 1 is ideal.
Since we were using CloudFormation, we just increased the provisioned gp2 disk size from 22 GB to 150 GB to get the IOPS we needed in the short term, until we fixed the IOPS issue in our services.
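To show why a larger gp2 volume helps, here is a rough sketch of the gp2 baseline calculation: 3 IOPS per GiB with a floor of 100. The function name is hypothetical, the ceiling is an assumption (AWS has changed the gp2 maximum over time, so check the EBS documentation for the current value), and the sizes above are treated as GiB. Burst credits, which let small volumes temporarily exceed the baseline, are not modeled.

```python
def gp2_baseline_iops(size_gib, per_gib=3, floor=100, ceiling=16000):
    """Approximate gp2 baseline IOPS for a volume of size_gib GiB.

    per_gib and floor reflect the documented gp2 baseline rule;
    ceiling is an assumed cap and may differ from the current limit.
    """
    return max(floor, min(ceiling, per_gib * size_gib))


for size in (22, 150):
    print(f"{size} GiB -> {gp2_baseline_iops(size)} baseline IOPS")
# 22 GiB -> 100 baseline IOPS
# 150 GiB -> 450 baseline IOPS
```

Under these assumptions the resize takes the baseline from 100 to 450 IOPS, so the volume depends far less on burst credits under sustained load.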
@mjaverto Thanks for confirming!
We are seeing this problem with "amzn-ami-2016.09.f-amazon-ecs-optimized [ami-ec2be583]". Is anyone else experiencing this?
Yup... I can't do anything.
I am seeing this issue again, please re-open.
If you are currently experiencing problems, please open a new issue with information on what you're experiencing.
I am locking this issue.