amazon-ecs-agent 🚀 - CannotInspectContainerError: Could not transition to inspecting

+1. ECS is getting jankier by the day.

bilalaslamseattle on 11 Mar 2016

@hridyeshpant @bilalaslamseattle I've typically seen this happen when the Docker daemon gets very slow as the disk gets full. Do you see anything in the Docker daemon log in /var/log/docker? Can you share the output of sudo pvs, sudo vgs, sudo lvs, and dmesg on the affected instance?

samuelkarp on 11 Mar 2016

Hi @samuelkarp, I am responding here for @bilalaslamseattle.

Context: We were upgrading all our machines to the latest AMI (amzn-ami-2015.09.g-amazon-ecs-optimized-4ce33fd9-63ff-4f35-8d3a-939b641f1931-ami-33b48a59.3) and we are running ecs-agent 1.8.1.

We have plenty of space in our docker-pool volume:

[root@ip-10-0-1-214 ~]# pvs
  PV         VG     Fmt  Attr PSize   PFree
  /dev/sdcz1 docker lvm2 a--  100.00g    0
[root@ip-10-0-1-214 ~]# vgs
  VG     #PV #LV #SN Attr   VSize   VFree
  docker   1   1   0 wz--n- 100.00g    0
[root@ip-10-0-1-214 ~]# lvs
  LV          VG     Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  docker-pool docker twi-aot--- 99.79g             34.43  60.02
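
For checking thin-pool usage across many instances, here is a minimal Python sketch. It assumes the input comes from `lvs --noheadings --separator , -o lv_name,data_percent,metadata_percent`, which avoids the empty Pool and Origin columns in the default layout. The function names and the 80% threshold are illustrative assumptions, not anything the ECS agent or Docker uses.

```python
def parse_thin_pool_usage(lvs_output):
    """Parse lines like 'docker-pool,34.43,60.02' into {name: (data_pct, meta_pct)}."""
    usage = {}
    for line in lvs_output.splitlines():
        fields = [f.strip() for f in line.split(",")]
        # Skip blank lines and volumes that report no thin-pool percentages.
        if len(fields) != 3 or not fields[1] or not fields[2]:
            continue
        usage[fields[0]] = (float(fields[1]), float(fields[2]))
    return usage


def pools_over_threshold(usage, threshold=80.0):
    """Return the names of pools whose data or metadata usage is at or above threshold."""
    return sorted(
        name
        for name, (data_pct, meta_pct) in usage.items()
        if data_pct >= threshold or meta_pct >= threshold
    )


# Values taken from the lvs output above (Data% 34.43, Meta% 60.02).
sample = "  docker-pool,34.43,60.02\n"
usage = parse_thin_pool_usage(sample)
print(usage)
print(pools_over_threshold(usage))  # threshold of 80% is an assumption
```

With the numbers above, neither value crosses the assumed 80% threshold, though metadata usage (60.02%) is noticeably higher than data usage (34.43%).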

Here is our latest dmesg output (are we looking for disk-device-related issues? I didn't find anything like that):

[148598.000407] XFS (dm-4): Mounting V4 Filesystem
[148598.223849] XFS (dm-4): Ending clean mount
[148598.248095] XFS (dm-4): Unmounting Filesystem
[148598.382691] XFS (dm-4): Mounting V4 Filesystem
[148598.693982] XFS (dm-4): Ending clean mount
[148598.698291] device veth9179b2d entered promiscuous mode
[148598.700779] IPv6: ADDRCONF(NETDEV_UP): veth9179b2d: link is not ready
[148598.703262] docker0: port 3(veth9179b2d) entered forwarding state
[148598.705716] docker0: port 3(veth9179b2d) entered forwarding state
[148598.748298] docker0: port 3(veth9179b2d) entered disabled state
[148598.750980] eth0: renamed from vethab1b5b3
[148598.768437] IPv6: ADDRCONF(NETDEV_CHANGE): veth9179b2d: link becomes ready
[148598.771048] docker0: port 3(veth9179b2d) entered forwarding state
[148598.773322] docker0: port 3(veth9179b2d) entered forwarding state
[148600.590332] vethb28e6b2: renamed from eth0
[148600.612217] docker0: port 5(vethfd6e9d5) entered disabled state
[148600.649720] docker0: port 5(vethfd6e9d5) entered disabled state
[148600.655303] device vethfd6e9d5 left promiscuous mode
[148600.657479] docker0: port 5(vethfd6e9d5) entered disabled state
[148600.770529] XFS (dm-5): Unmounting Filesystem
[148603.118288] vetha84dc5c: renamed from eth0
[148603.132139] docker0: port 6(veth139e811) entered disabled state
[148603.156986] docker0: port 6(veth139e811) entered disabled state
[148603.162356] device veth139e811 left promiscuous mode
[148603.164594] docker0: port 6(veth139e811) entered disabled state
[148603.272042] XFS (dm-8): Unmounting Filesystem
[148604.252063] docker0: port 7(vethe35c9ba) entered forwarding state
[148606.742376] veth191b7a6: renamed from eth0
[148606.756246] docker0: port 10(vethb53e8e1) entered disabled state
[148606.789104] docker0: port 10(vethb53e8e1) entered disabled state
[148606.793692] device vethb53e8e1 left promiscuous mode
[148606.795567] docker0: port 10(vethb53e8e1) entered disabled state
[148606.895606] XFS (dm-12): Unmounting Filesystem
[148607.644069] docker0: port 8(veth81f571b) entered forwarding state
[148609.366559] veth79e46a7: renamed from eth0
[148609.388130] docker0: port 8(veth81f571b) entered disabled state
[148609.412588] docker0: port 8(veth81f571b) entered disabled state
[148609.417083] device veth81f571b left promiscuous mode
[148609.419377] docker0: port 8(veth81f571b) entered disabled state
[148609.519901] XFS (dm-10): Unmounting Filesystem
[148613.792075] docker0: port 3(veth9179b2d) entered forwarding state
[148615.322276] vethab1b5b3: renamed from eth0
[148615.344134] docker0: port 3(veth9179b2d) entered disabled state
[148615.372810] docker0: port 3(veth9179b2d) entered disabled state
[148615.377364] device veth9179b2d left promiscuous mode
[148615.379603] docker0: port 3(veth9179b2d) entered disabled state
[148615.484147] XFS (dm-4): Unmounting Filesystem
[148656.938239] vethfd23535: renamed from eth0
[148656.952144] docker0: port 14(veth20f8a9b) entered disabled state
[148656.992160] docker0: port 14(veth20f8a9b) entered disabled state
[148656.996866] device veth20f8a9b left promiscuous mode
[148656.998910] docker0: port 14(veth20f8a9b) entered disabled state
[148657.094089] XFS (dm-18): Unmounting Filesystem
[148698.794368] veth0c0039f: renamed from eth0
[148698.808248] docker0: port 7(vethe35c9ba) entered disabled state
[148698.832905] docker0: port 7(vethe35c9ba) entered disabled state
[148698.837463] device vethe35c9ba left promiscuous mode
[148698.839596] docker0: port 7(vethe35c9ba) entered disabled state
[148698.934218] XFS (dm-9): Unmounting Filesystem
[148760.490350] veth55c5eb5: renamed from eth0
[148760.500137] docker0: port 23(vethdb0ee4f) entered disabled state
[148760.532975] docker0: port 23(vethdb0ee4f) entered disabled state
[148760.537540] device vethdb0ee4f left promiscuous mode
[148760.539777] docker0: port 23(vethdb0ee4f) entered disabled state
[148760.620828] XFS (dm-25): Unmounting Filesystem
[148765.343538] XFS (dm-4): Mounting V4 Filesystem
[148765.432009] XFS (dm-4): Ending clean mount
[148765.459511] XFS (dm-4): Unmounting Filesystem
[148765.580933] XFS (dm-4): Mounting V4 Filesystem
[148765.707909] XFS (dm-4): Ending clean mount
[148765.724402] XFS (dm-4): Unmounting Filesystem
[148765.822841] XFS (dm-4): Mounting V4 Filesystem
[148765.907867] XFS (dm-4): Ending clean mount
[148765.912327] device veth1d1aea1 entered promiscuous mode
[148765.914662] IPv6: ADDRCONF(NETDEV_UP): veth1d1aea1: link is not ready
[148765.984366] eth0: renamed from veth228537c
[148766.004445] IPv6: ADDRCONF(NETDEV_CHANGE): veth1d1aea1: link becomes ready
[148766.006923] docker0: port 3(veth1d1aea1) entered forwarding state
[148766.009063] docker0: port 3(veth1d1aea1) entered forwarding state
[148768.250410] veth228537c: renamed from eth0
[148768.260140] docker0: port 3(veth1d1aea1) entered disabled state
[148768.280564] docker0: port 3(veth1d1aea1) entered disabled state
[148768.284735] device veth1d1aea1 left promiscuous mode
[148768.286477] docker0: port 3(veth1d1aea1) entered disabled state
[148768.383826] XFS (dm-4): Unmounting Filesystem
[148778.600429] XFS (dm-4): Mounting V4 Filesystem
[148778.690413] XFS (dm-4): Ending clean mount
[148778.711659] XFS (dm-4): Unmounting Filesystem
[148778.855437] XFS (dm-4): Mounting V4 Filesystem
[148778.966238] XFS (dm-4): Ending clean mount
[148778.985373] XFS (dm-4): Unmounting Filesystem
[148779.071083] XFS (dm-4): Mounting V4 Filesystem
[148779.167678] XFS (dm-4): Ending clean mount
[148779.171848] device vethea945a4 entered promiscuous mode
[148779.174717] IPv6: ADDRCONF(NETDEV_UP): vethea945a4: link is not ready
[148779.212435] eth0: renamed from vethe07e898
[148779.232526] IPv6: ADDRCONF(NETDEV_CHANGE): vethea945a4: link becomes ready
[148779.235638] docker0: port 3(vethea945a4) entered forwarding state
[148779.237822] docker0: port 3(vethea945a4) entered forwarding state
[148788.610572] vethe07e898: renamed from eth0
[148788.636258] docker0: port 3(vethea945a4) entered disabled state
[148788.668787] docker0: port 3(vethea945a4) entered disabled state
[148788.673903] device vethea945a4 left promiscuous mode
[148788.675876] docker0: port 3(vethea945a4) entered disabled state
[148788.779910] XFS (dm-4): Unmounting Filesystem
[148794.279414] XFS (dm-4): Mounting V4 Filesystem
[148794.368444] XFS (dm-4): Ending clean mount
[148794.399327] XFS (dm-4): Unmounting Filesystem
[148794.510381] XFS (dm-4): Mounting V4 Filesystem
[148794.645299] XFS (dm-4): Ending clean mount
[148794.671919] XFS (dm-4): Unmounting Filesystem
[148794.762757] XFS (dm-4): Mounting V4 Filesystem
[148794.844144] XFS (dm-4): Ending clean mount
[148794.848370] device vetha6e28bc entered promiscuous mode
[148794.850718] IPv6: ADDRCONF(NETDEV_UP): vetha6e28bc: link is not ready
[148794.892448] eth0: renamed from vethed5fdd0
[148794.916531] IPv6: ADDRCONF(NETDEV_CHANGE): vetha6e28bc: link becomes ready
[148794.919526] docker0: port 3(vetha6e28bc) entered forwarding state
[148794.921777] docker0: port 3(vetha6e28bc) entered forwarding state
[148808.210349] vethed5fdd0: renamed from eth0
[148808.228137] docker0: port 3(vetha6e28bc) entered disabled state
[148808.256748] docker0: port 3(vetha6e28bc) entered disabled state
[148808.261328] device vetha6e28bc left promiscuous mode
[148808.263068] docker0: port 3(vetha6e28bc) entered disabled state
[148808.356413] XFS (dm-4): Unmounting Filesystem
[148831.836999] XFS (dm-4): Mounting V4 Filesystem
[148831.928650] XFS (dm-4): Ending clean mount
[148831.951532] XFS (dm-4): Unmounting Filesystem
[148832.075715] XFS (dm-4): Mounting V4 Filesystem
[148832.201509] XFS (dm-4): Ending clean mount
[148832.216837] XFS (dm-4): Unmounting Filesystem
[148832.305603] XFS (dm-4): Mounting V4 Filesystem
[148832.401634] XFS (dm-4): Ending clean mount
[148832.405974] device veth0dcc7c2 entered promiscuous mode
[148832.408614] IPv6: ADDRCONF(NETDEV_UP): veth0dcc7c2: link is not ready
[148832.516329] eth0: renamed from vethfeef268
[148832.536519] IPv6: ADDRCONF(NETDEV_CHANGE): veth0dcc7c2: link becomes ready
[148832.539154] docker0: port 3(veth0dcc7c2) entered forwarding state
[148832.541706] docker0: port 3(veth0dcc7c2) entered forwarding state
[148835.564258] XFS (dm-5): Mounting V4 Filesystem
[148841.924312] XFS (dm-5): Ending clean mount
[148842.398973] XFS (dm-5): Unmounting Filesystem
[148844.393552] XFS (dm-5): Mounting V4 Filesystem
[148847.580085] docker0: port 3(veth0dcc7c2) entered forwarding state
[148850.548260] XFS (dm-5): Ending clean mount
[148850.624212] XFS (dm-5): Unmounting Filesystem
[148852.812985] XFS (dm-5): Mounting V4 Filesystem
[148858.530565] XFS (dm-5): Ending clean mount
[148858.534994] device veth3a99e11 entered promiscuous mode
[148858.537181] IPv6: ADDRCONF(NETDEV_UP): veth3a99e11: link is not ready
[148858.572456] eth0: renamed from veth63750f4
[148858.604592] IPv6: ADDRCONF(NETDEV_CHANGE): veth3a99e11: link becomes ready
[148858.607058] docker0: port 5(veth3a99e11) entered forwarding state
[148858.609223] docker0: port 5(veth3a99e11) entered forwarding state
[148873.628081] docker0: port 5(veth3a99e11) entered forwarding state
[148935.574289] veth63750f4: renamed from eth0
[148935.600134] docker0: port 5(veth3a99e11) entered disabled state
[148935.639834] docker0: port 5(veth3a99e11) entered disabled state
[148935.644606] device veth3a99e11 left promiscuous mode
[148935.646573] docker0: port 5(veth3a99e11) entered disabled state
[148935.771992] XFS (dm-5): Unmounting Filesystem
[148937.062075] XFS (dm-5): Mounting V4 Filesystem
[148938.804554] XFS (dm-5): Ending clean mount
[148939.234080] XFS (dm-8): Mounting V4 Filesystem
[148940.986489] XFS (dm-8): Ending clean mount
[148941.013282] XFS (dm-5): Unmounting Filesystem
[148942.027481] XFS (dm-5): Mounting V4 Filesystem
[148943.854810] XFS (dm-5): Ending clean mount
[148943.911279] XFS (dm-5): Unmounting Filesystem
[148944.113933] XFS (dm-8): Unmounting Filesystem
[148945.020340] XFS (dm-5): Mounting V4 Filesystem
[148946.837711] XFS (dm-5): Ending clean mount
[148946.890850] XFS (dm-5): Unmounting Filesystem
[148947.380962] XFS (dm-5): Mounting V4 Filesystem
[148949.165454] XFS (dm-5): Ending clean mount
[148949.584833] XFS (dm-8): Mounting V4 Filesystem
[148951.156715] XFS (dm-8): Ending clean mount
[148951.160826] device vethee3e81d entered promiscuous mode
[148951.163062] IPv6: ADDRCONF(NETDEV_UP): vethee3e81d: link is not ready
[148951.224396] eth0: renamed from veth008f71d
[148951.248540] IPv6: ADDRCONF(NETDEV_CHANGE): vethee3e81d: link becomes ready
[148951.251062] docker0: port 5(vethee3e81d) entered forwarding state
[148951.253346] docker0: port 5(vethee3e81d) entered forwarding state
[148951.768479] XFS (dm-9): Mounting V4 Filesystem
[148954.935612] XFS (dm-9): Ending clean mount
[148954.940273] device veth9cea596 entered promiscuous mode
[148954.943171] IPv6: ADDRCONF(NETDEV_UP): veth9cea596: link is not ready
[148954.996484] eth0: renamed from veth8254d1f
[148955.020507] IPv6: ADDRCONF(NETDEV_CHANGE): veth9cea596: link becomes ready
[148955.023199] docker0: port 6(veth9cea596) entered forwarding state
[148955.025674] docker0: port 6(veth9cea596) entered forwarding state
[148956.069941] XFS (dm-10): Mounting V4 Filesystem
[148960.903733] XFS (dm-10): Ending clean mount
[148961.027654] XFS (dm-5): Unmounting Filesystem
[148963.626409] XFS (dm-5): Mounting V4 Filesystem
[148966.300061] docker0: port 5(vethee3e81d) entered forwarding state
[148970.076065] docker0: port 6(veth9cea596) entered forwarding state
[148970.308383] XFS (dm-5): Ending clean mount
[148972.406098] XFS (dm-12): Mounting V4 Filesystem
[148979.204529] XFS (dm-12): Ending clean mount
[148979.310072] XFS (dm-12): Unmounting Filesystem
[148979.633569] XFS (dm-10): Unmounting Filesystem
[148983.123119] XFS (dm-10): Mounting V4 Filesystem
[148991.624617] XFS (dm-10): Ending clean mount
[148991.719102] XFS (dm-10): Unmounting Filesystem
[148992.273999] XFS (dm-5): Unmounting Filesystem
[148996.400852] XFS (dm-5): Mounting V4 Filesystem
[149006.362503] XFS (dm-5): Ending clean mount
[149006.366398] device veth4c17192 entered promiscuous mode
[149006.368944] IPv6: ADDRCONF(NETDEV_UP): veth4c17192: link is not ready
[149006.371481] docker0: port 7(veth4c17192) entered forwarding state
[149006.374193] docker0: port 7(veth4c17192) entered forwarding state
[149006.376485] docker0: port 7(veth4c17192) entered disabled state
[149006.488697] eth0: renamed from vethf1f16f6
[149006.512744] IPv6: ADDRCONF(NETDEV_CHANGE): veth4c17192: link becomes ready
[149006.515220] docker0: port 7(veth4c17192) entered forwarding state
[149006.517499] docker0: port 7(veth4c17192) entered forwarding state
[149009.640492] XFS (dm-10): Mounting V4 Filesystem
[149021.532083] docker0: port 7(veth4c17192) entered forwarding state
[149025.218276] veth008f71d: renamed from eth0
[149025.240136] docker0: port 5(vethee3e81d) entered disabled state
[149025.264561] docker0: port 5(vethee3e81d) entered disabled state
[149025.269322] device vethee3e81d left promiscuous mode
[149025.271411] docker0: port 5(vethee3e81d) entered disabled state
[149027.535403] XFS (dm-10): Ending clean mount
[149027.635720] XFS (dm-10): Unmounting Filesystem
[149030.095504] XFS (dm-10): Mounting V4 Filesystem
[149037.251238] XFS (dm-10): Ending clean mount
[149037.255680] device veth33fd42f entered promiscuous mode
[149037.258285] IPv6: ADDRCONF(NETDEV_UP): veth33fd42f: link is not ready
[149037.308295] eth0: renamed from veth5313aad
[149037.324240] IPv6: ADDRCONF(NETDEV_CHANGE): veth33fd42f: link becomes ready
[149037.326997] docker0: port 5(veth33fd42f) entered forwarding state
[149037.329532] docker0: port 5(veth33fd42f) entered forwarding state
[149039.817639] XFS (dm-12): Mounting V4 Filesystem
[149046.010233] veth8254d1f: renamed from eth0
[149046.040515] docker0: port 6(veth9cea596) entered disabled state
[149046.047601] docker0: port 6(veth9cea596) entered disabled state
[149046.052412] device veth9cea596 left promiscuous mode
[149046.054473] docker0: port 6(veth9cea596) entered disabled state
[149049.174165] XFS (dm-12): Ending clean mount
[149049.193840] XFS (dm-8): Unmounting Filesystem
[149049.362115] XFS (dm-9): Unmounting Filesystem
[149051.283358] XFS (dm-8): Mounting V4 Filesystem
[149052.380095] docker0: port 5(veth33fd42f) entered forwarding state
[149057.369324] XFS (dm-8): Ending clean mount
[149057.373851] device vethed38967 entered promiscuous mode
[149057.376259] IPv6: ADDRCONF(NETDEV_UP): vethed38967: link is not ready
[149057.416396] eth0: renamed from veth75b5f7e
[149057.420649] XFS (dm-12): Unmounting Filesystem
[149057.432477] IPv6: ADDRCONF(NETDEV_CHANGE): vethed38967: link becomes ready
[149057.434989] docker0: port 6(vethed38967) entered forwarding state
[149057.437407] docker0: port 6(vethed38967) entered forwarding state
[149060.549355] XFS (dm-9): Mounting V4 Filesystem
[149070.293835] XFS (dm-9): Ending clean mount
[149070.391627] XFS (dm-9): Unmounting Filesystem
[149072.292865] XFS (dm-9): Mounting V4 Filesystem
[149072.476064] docker0: port 6(vethed38967) entered forwarding state
[149078.183708] XFS (dm-9): Ending clean mount
[149078.188850] device vethb4d1e4c entered promiscuous mode
[149078.191233] IPv6: ADDRCONF(NETDEV_UP): vethb4d1e4c: link is not ready
[149078.244379] eth0: renamed from veth3e75d38
[149078.268413] IPv6: ADDRCONF(NETDEV_CHANGE): vethb4d1e4c: link becomes ready
[149078.271102] docker0: port 8(vethb4d1e4c) entered forwarding state
[149078.273546] docker0: port 8(vethb4d1e4c) entered forwarding state
[149085.046306] vethfeef268: renamed from eth0
[149085.068155] docker0: port 3(veth0dcc7c2) entered disabled state
[149085.100852] docker0: port 3(veth0dcc7c2) entered disabled state
[149085.105209] device veth0dcc7c2 left promiscuous mode
[149085.107396] docker0: port 3(veth0dcc7c2) entered disabled state
[149085.213981] XFS (dm-4): Unmounting Filesystem
[149087.414439] vethf1f16f6: renamed from eth0
[149087.432156] docker0: port 7(veth4c17192) entered disabled state
[149087.457176] docker0: port 7(veth4c17192) entered disabled state
[149087.462018] device veth4c17192 left promiscuous mode
[149087.464300] docker0: port 7(veth4c17192) entered disabled state
[149087.569520] XFS (dm-5): Unmounting Filesystem
[149093.276064] docker0: port 8(vethb4d1e4c) entered forwarding state
[149117.174419] veth75b5f7e: renamed from eth0
[149117.188177] docker0: port 6(vethed38967) entered disabled state
[149117.216693] docker0: port 6(vethed38967) entered disabled state
[149117.221916] device vethed38967 left promiscuous mode
[149117.224040] docker0: port 6(vethed38967) entered disabled state
[149117.359820] XFS (dm-8): Unmounting Filesystem
[149132.998406] veth3e75d38: renamed from eth0
[149133.012226] docker0: port 8(vethb4d1e4c) entered disabled state
[149133.036571] docker0: port 8(vethb4d1e4c) entered disabled state
[149133.041418] device vethb4d1e4c left promiscuous mode
[149133.043390] docker0: port 8(vethb4d1e4c) entered disabled state
[149133.192824] XFS (dm-9): Unmounting Filesystem
[149139.006335] veth44f57f7: renamed from eth0
[149139.024246] docker0: port 4(vethe231830) entered disabled state
[149139.053278] docker0: port 4(vethe231830) entered disabled state
[149139.057511] device vethe231830 left promiscuous mode
[149139.059856] docker0: port 4(vethe231830) entered disabled state
[149139.153957] XFS (dm-6): Unmounting Filesystem
[149143.314316] veth5313aad: renamed from eth0
[149143.328130] docker0: port 5(veth33fd42f) entered disabled state
[149143.356825] docker0: port 5(veth33fd42f) entered disabled state
[149143.361370] device veth33fd42f left promiscuous mode
[149143.363408] docker0: port 5(veth33fd42f) entered disabled state
[149143.465966] XFS (dm-10): Unmounting Filesystem
[149175.831770] XFS (dm-4): Mounting V4 Filesystem
[149175.930204] XFS (dm-4): Ending clean mount
[149175.959535] XFS (dm-4): Unmounting Filesystem
[149176.067605] XFS (dm-4): Mounting V4 Filesystem
[149176.196382] XFS (dm-4): Ending clean mount
[149176.213340] XFS (dm-4): Unmounting Filesystem
[149176.310475] XFS (dm-4): Mounting V4 Filesystem
[149176.397907] XFS (dm-4): Ending clean mount
[149176.402287] device veth9338a6a entered promiscuous mode
[149176.404542] IPv6: ADDRCONF(NETDEV_UP): veth9338a6a: link is not ready
[149176.452380] eth0: renamed from vethea1b473
[149176.472466] IPv6: ADDRCONF(NETDEV_CHANGE): veth9338a6a: link becomes ready
[149176.474950] docker0: port 3(veth9338a6a) entered forwarding state
[149176.477048] docker0: port 3(veth9338a6a) entered forwarding state
[149191.516079] docker0: port 3(veth9338a6a) entered forwarding state
[149254.617881] XFS (dm-5): Mounting V4 Filesystem
[149263.403141] XFS (dm-5): Ending clean mount
[149263.914423] XFS (dm-5): Unmounting Filesystem
[149267.864165] XFS (dm-5): Mounting V4 Filesystem
[149275.100959] XFS (dm-5): Ending clean mount
[149275.182856] XFS (dm-5): Unmounting Filesystem
[149277.596934] XFS (dm-5): Mounting V4 Filesystem
[149284.623996] XFS (dm-5): Ending clean mount
[149284.631655] device vethb3758cf entered promiscuous mode
[149284.634042] IPv6: ADDRCONF(NETDEV_UP): vethb3758cf: link is not ready
[149284.696315] eth0: renamed from vethd39df60
[149284.724486] IPv6: ADDRCONF(NETDEV_CHANGE): vethb3758cf: link becomes ready
[149284.727024] docker0: port 4(vethb3758cf) entered forwarding state
[149284.729253] docker0: port 4(vethb3758cf) entered forwarding state
[149287.533498] XFS (dm-6): Mounting V4 Filesystem
[149295.371425] XFS (dm-6): Ending clean mount
[149298.228503] XFS (dm-8): Mounting V4 Filesystem
[149299.740079] docker0: port 4(vethb3758cf) entered forwarding state
[149303.598461] vethf805a12: renamed from eth0
[149303.608136] docker0: port 9(veth12fa990) entered disabled state
[149303.628754] docker0: port 9(veth12fa990) entered disabled state
[149303.633505] device veth12fa990 left promiscuous mode
[149303.635528] docker0: port 9(veth12fa990) entered disabled state
[149306.406484] XFS (dm-8): Ending clean mount
[149308.456428] XFS (dm-9): Mounting V4 Filesystem
[149315.351190] XFS (dm-9): Ending clean mount
[149317.382990] XFS (dm-10): Mounting V4 Filesystem
[149324.241323] XFS (dm-10): Ending clean mount
[149326.322200] XFS (dm-12): Mounting V4 Filesystem
[149333.257038] XFS (dm-12): Ending clean mount
[149335.614014] XFS (dm-13): Mounting V4 Filesystem
[149342.630850] XFS (dm-13): Ending clean mount
[149342.646079] XFS (dm-11): Unmounting Filesystem
[149345.614473] XFS (dm-11): Mounting V4 Filesystem
[149352.419559] XFS (dm-11): Ending clean mount
[149354.606527] XFS (dm-14): Mounting V4 Filesystem
[149361.989438] XFS (dm-14): Ending clean mount
[149364.248298] XFS (dm-15): Mounting V4 Filesystem
[149371.521260] XFS (dm-15): Ending clean mount
[149373.659157] XFS (dm-16): Mounting V4 Filesystem
[149380.988747] XFS (dm-16): Ending clean mount
[149381.228193] XFS (dm-12): Unmounting Filesystem

Here are the ecs-agent logs around the incident:

  2016-03-13T01:23:20Z [INFO] Pulling container module="TaskEngine" task="Churn_Prediction_320:7 arn:aws:ecs:us-west-2:156473786033:task/da12924f-5f6f-4483-bcdc-644fa0169027, Status: (NONE->RUNNING) Containers: [churn_drivers (NONE->RUNNING),]" container="churn_drivers(quay.io/appuri/churn-drivers:aa9829e) (NONE->RUNNING)"
  2016-03-13T01:34:46Z [INFO] Creating container module="TaskEngine" task="Churn_Prediction_320:7 arn:aws:ecs:us-west-2:156473786033:task/da12924f-5f6f-4483-bcdc-644fa0169027, Status: (NONE->RUNNING) Containers: [churn_drivers (PULLED->RUNNING),]" container="churn_drivers(quay.io/appuri/churn-drivers:aa9829e) (PULLED->RUNNING)"
  2016-03-13T01:34:46Z [INFO] Created container name mapping for task Churn_Prediction_320:7 arn:aws:ecs:us-west-2:156473786033:task/da12924f-5f6f-4483-bcdc-644fa0169027, Status: (NONE->RUNNING) Containers: [churn_drivers (PULLED->RUNNING),] - churn_drivers(quay.io/appuri/churn-drivers:aa9829e) (PULLED->RUNNING) -> ecs-Churn_Prediction_320-7-churndrivers-b8a792d9f0dc87fcca01
  2016-03-13T01:35:12Z [INFO] Created docker container for task Churn_Prediction_320:7 arn:aws:ecs:us-west-2:156473786033:task/da12924f-5f6f-4483-bcdc-644fa0169027, Status: (NONE->RUNNING) Containers: [churn_drivers (PULLED->RUNNING),]: churn_drivers(quay.io/appuri/churn-drivers:aa9829e) (PULLED->RUNNING) -> a42e60f08bf661333f736178a27ad81a028972b84697fc2f397fc2c1f64e0b02
  2016-03-13T01:35:12Z [INFO] Starting container module="TaskEngine" task="Churn_Prediction_320:7 arn:aws:ecs:us-west-2:156473786033:task/da12924f-5f6f-4483-bcdc-644fa0169027, Status: (CREATED->RUNNING) Containers: [churn_drivers (CREATED->RUNNING),]" container="churn_drivers(quay.io/appuri/churn-drivers:aa9829e) (CREATED->RUNNING)"
  2016-03-13T01:35:55Z [INFO] Error transitioning container module="TaskEngine" task="Churn_Prediction_320:7 arn:aws:ecs:us-west-2:156473786033:task/da12924f-5f6f-4483-bcdc-644fa0169027, Status: (CREATED->RUNNING) Containers: [churn_drivers (CREATED->RUNNING),]" container="churn_drivers(quay.io/appuri/churn-drivers:aa9829e) (CREATED->RUNNING)" state="RUNNING"
  2016-03-13T01:35:55Z [WARN] Error with docker; stopping container module="TaskEngine" task="Churn_Prediction_320:7 arn:aws:ecs:us-west-2:156473786033:task/da12924f-5f6f-4483-bcdc-644fa0169027, Status: (CREATED->RUNNING) Containers: [churn_drivers (RUNNING->RUNNING),]" container="churn_drivers(quay.io/appuri/churn-drivers:aa9829e) (RUNNING->RUNNING)" err="Could not transition to inspecting; timed out after waiting 30s"
  2016-03-13T01:35:55Z [INFO] Stopping container module="TaskEngine" task="Churn_Prediction_320:7 arn:aws:ecs:us-west-2:156473786033:task/da12924f-5f6f-4483-bcdc-644fa0169027, Status: (RUNNING->STOPPED) Containers: [churn_drivers (RUNNING->STOPPED),]" container="churn_drivers(quay.io/appuri/churn-drivers:aa9829e) (RUNNING->STOPPED)"
  2016-03-13T01:36:20Z [INFO] Redundant container state change for task Churn_Prediction_320:7 arn:aws:ecs:us-west-2:156473786033:task/da12924f-5f6f-4483-bcdc-644fa0169027, Status: (RUNNING->STOPPED) Containers: [churn_drivers (RUNNING->STOPPED),]: churn_drivers(quay.io/appuri/churn-drivers:aa9829e) (RUNNING->STOPPED) to RUNNING, but already RUNNING
  2016-03-13T01:36:55Z [INFO] Error transitioning container module="TaskEngine" task="Churn_Prediction_320:7 arn:aws:ecs:us-west-2:156473786033:task/da12924f-5f6f-4483-bcdc-644fa0169027, Status: (RUNNING->STOPPED) Containers: [churn_drivers (RUNNING->STOPPED),]" container="churn_drivers(quay.io/appuri/churn-drivers:aa9829e) (RUNNING->STOPPED)" state="STOPPED"
  2016-03-13T01:36:55Z [INFO] Error for 'docker stop' of container; assuming it's stopped anyways module="TaskEngine" task="Churn_Prediction_320:7 arn:aws:ecs:us-west-2:156473786033:task/da12924f-5f6f-4483-bcdc-644fa0169027, Status: (RUNNING->STOPPED) Containers: [churn_drivers (STOPPED->STOPPED),]"
  2016-03-13T01:36:55Z [INFO] Task change event module="TaskEngine" event="{TaskArn:arn:aws:ecs:us-west-2:156473786033:task/da12924f-5f6f-4483-bcdc-644fa0169027 Status:STOPPED Reason: SentStatus:NONE}"
  2016-03-13T01:36:55Z [INFO] Adding event module="eventhandler" change="ContainerChange: arn:aws:ecs:us-west-2:156473786033:task/da12924f-5f6f-4483-bcdc-644fa0169027 churn_drivers -> STOPPED, Reason CannotInspectContainerError: Could not transition to inspecting; timed out after waiting 30s, Known Sent: NONE"
  2016-03-13T01:36:55Z [INFO] Adding event module="eventhandler" change="TaskChange: arn:aws:ecs:us-west-2:156473786033:task/da12924f-5f6f-4483-bcdc-644fa0169027 -> STOPPED, Known Sent: NONE"
  2016-03-13T01:36:55Z [INFO] Sending container change module="eventhandler" event="ContainerChange: arn:aws:ecs:us-west-2:156473786033:task/da12924f-5f6f-4483-bcdc-644fa0169027 churn_drivers -> STOPPED, Reason CannotInspectContainerError: Could not transition to inspecting; timed out after waiting 30s, Known Sent: NONE" change="ContainerChange: arn:aws:ecs:us-west-2:156473786033:task/da12924f-5f6f-4483-bcdc-644fa0169027 churn_drivers -> STOPPED, Reason CannotInspectContainerError: Could not transition to inspecting; timed out after waiting 30s, Known Sent: NONE"
  2016-03-13T01:36:55Z [INFO] Sending task change module="eventhandler" event="TaskChange: arn:aws:ecs:us-west-2:156473786033:task/da12924f-5f6f-4483-bcdc-644fa0169027 -> STOPPED, Known Sent: NONE" change="TaskChange: arn:aws:ecs:us-west-2:156473786033:task/da12924f-5f6f-4483-bcdc-644fa0169027 -> STOPPED, Known Sent: NONE"
  2016-03-13T01:48:49Z [INFO] Redundant container state change for task Churn_Prediction_320:7 arn:aws:ecs:us-west-2:156473786033:task/da12924f-5f6f-4483-bcdc-644fa0169027, Status: (STOPPED->STOPPED) Containers: [churn_drivers (STOPPED->STOPPED),]: churn_drivers(quay.io/appuri/churn-drivers:aa9829e) (STOPPED->STOPPED) to STOPPED, but already STOPPED
  2016-03-13T01:48:49Z [INFO] Redundant container state change for task Churn_Prediction_320:7 arn:aws:ecs:us-west-2:156473786033:task/da12924f-5f6f-4483-bcdc-644fa0169027, Status: (STOPPED->STOPPED) Containers: [churn_drivers (STOPPED->STOPPED),]: churn_drivers(quay.io/appuri/churn-drivers:aa9829e) (STOPPED->STOPPED) to STOPPED, but already STOPPED
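
To see where the time goes in agent logs like these, here is a rough Python sketch that reports the gaps between consecutive log lines. The regular expression assumes the `<timestamp>Z [LEVEL] message` layout seen above. The function names and the 20-second cutoff are hypothetical, and the sample is an abbreviated subset of the lines above with the messages shortened.

```python
import re
from datetime import datetime

# Matches lines such as: 2016-03-13T01:23:20Z [INFO] Pulling container ...
LINE_RE = re.compile(r"^\s*(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z \[(\w+)\] (.*)$")


def parse_events(log_text):
    """Return a list of (timestamp, message) tuples for lines that match LINE_RE."""
    events = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            ts = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
            events.append((ts, match.group(3)))
    return events


def slow_steps(events, min_gap_seconds=20):
    """Return (gap_seconds, message) for each line followed by a gap >= min_gap_seconds."""
    slow = []
    for (t0, msg0), (t1, _msg1) in zip(events, events[1:]):
        gap = int((t1 - t0).total_seconds())
        if gap >= min_gap_seconds:
            slow.append((gap, msg0))
    return slow


# Abbreviated subset of the agent log above (messages shortened).
sample = """
  2016-03-13T01:23:20Z [INFO] Pulling container
  2016-03-13T01:34:46Z [INFO] Creating container
  2016-03-13T01:35:12Z [INFO] Created docker container
  2016-03-13T01:35:12Z [INFO] Starting container
  2016-03-13T01:35:55Z [INFO] Error transitioning container
"""
for gap, message in slow_steps(parse_events(sample)):
    print(f"{gap:>5}s after: {message}")
```

On this subset, the pull accounts for 686 seconds, the create for 26 seconds, and the start attempt for 43 seconds before the error is logged.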

There seems to be nothing important in the Docker logs:

  time="2016-03-13T01:36:01.995075070Z" level=info msg="GET /v1.17/containers/87d1b17d023a8439128c406428d1aa72902b9c134303b84645129f904e5aa64a/json"
  time="2016-03-13T01:36:10.964134870Z" level=info msg="POST /v1.17/images/create?fromImage=quay.io%2Fappuri%2Ftable-flipper%3A414c315"
  time="2016-03-13T01:36:12.412347684Z" level=info msg="GET /v1.17/containers/a42e60f08bf661333f736178a27ad81a028972b84697fc2f397fc2c1f64e0b02/json"
  time="2016-03-13T01:36:12.562930603Z" level=info msg="GET /v1.17/containers/b3c8c148af6d0f255ed082c1b9e6ac09b75579642c1e5f3bd90d1b3484fd4524/json"
  time="2016-03-13T01:36:12.709987581Z" level=info msg="POST /v1.17/containers/create?name=ecs-Refresh_Users_and_Segments_335-3-tableflippercontainer-b69198ec999ed7d0fd01"
  time="2016-03-13T01:36:15.328778441Z" level=info msg="POST /v1.17/images/create?fromImage=quay.io%2Fappuri%2Fsql-task-runner%3A0866c7f"
  time="2016-03-13T01:36:17.285630820Z" level=info msg="POST /v1.17/images/create?fromImage=quay.io%2Fappuri%2Ftable-flipper%3A414c315"
  time="2016-03-13T01:36:17.343400360Z" level=info msg="POST /v1.17/containers/create?name=ecs-Generate_User_Event_Index_216-2-sqltaskrunner-90feaca3cf8b9fbd5500"
  time="2016-03-13T01:36:18.199433459Z" level=info msg="POST /v1.17/containers/4183cf25197d310d9db930ecad95125eed1f1b271251d53643fd9cdabea7c105/start"
  time="2016-03-13T01:36:18.797324550Z" level=info msg="POST /v1.17/containers/create?name=ecs-Refresh_Users_and_Segments_164-3-tableflippercontainer-aef8a582ec9e9c9b0400"
  time="2016-03-13T01:36:20.477754710Z" level=info msg="GET /v1.17/containers/07ece9de53c7a63024b15358688ca2026f2f806e2414e22b19a5b209bf7d767b/json"
  time="2016-03-13T01:36:20.836527734Z" level=info msg="POST /v1.17/images/create?fromImage=quay.io%2Fappuri%2Ftable-flipper%3A414c315"
  time="2016-03-13T01:36:22.572960232Z" level=info msg="POST /v1.17/images/create?fromImage=quay.io%2Fappuri%2Ftable-flipper%3A414c315"
  time="2016-03-13T01:36:22.629403330Z" level=info msg="POST /v1.17/containers/create?name=ecs-Refresh_Users_and_Segments_177-3-tableflippercontainer-ecbafdedf5b1839b9101"
  time="2016-03-13T01:36:24.145977361Z" level=info msg="POST /v1.17/images/create?fromImage=quay.io%2Fappuri%2Ftable-flipper%3A414c315"
  time="2016-03-13T01:36:24.188707563Z" level=info msg="POST /v1.17/containers/create?name=ecs-Refresh_Users_and_Segments_143-3-tableflippercontainer-a2f7abffa3c2aebefc01"
  time="2016-03-13T01:36:25.816666122Z" level=info msg="POST /v1.17/containers/create?name=ecs-Refresh_Users_and_Segments_190-3-tableflippercontainer-dc90f0b7f4f8bfd9e701"
  time="2016-03-13T01:36:26.404627303Z" level=info msg="POST /v1.17/images/create?fromImage=quay.io%2Fappuri%2Ftable-flipper%3A414c315"
  time="2016-03-13T01:36:28.024001357Z" level=info msg="POST /v1.17/containers/create?name=ecs-Refresh_Users_and_Segments_199-3-tableflippercontainer-f4cf919cc8a087af5f00"
  time="2016-03-13T01:36:29.954773671Z" level=info msg="GET /v1.17/containers/ec6b67268206ea5c1fcdc278e591221a5afca19391e49731e0e31971f07de83f/json"
  time="2016-03-13T01:36:33.822792337Z" level=info msg="GET /v1.17/containers/c73932d8393c2d75c0edc72074363f113f38f014c85fed5eb5869b05dff3ac5a/json"
  time="2016-03-13T01:36:39.267658752Z" level=info msg="POST /v1.17/containers/b3c8c148af6d0f255ed082c1b9e6ac09b75579642c1e5f3bd90d1b3484fd4524/start"
  time="2016-03-13T01:36:39.268587000Z" level=info msg="GET /v1.17/containers/088421aba4f4b810dabde3f1520ad720319bd8aaff3f0e257a7ac2e430ac8544/json"
  time="2016-03-13T01:36:42.412616709Z" level=info msg="GET /v1.17/containers/07ece9de53c7a63024b15358688ca2026f2f806e2414e22b19a5b209bf7d767b/json"
  time="2016-03-13T01:36:46.939015988Z" level=info msg="GET /v1.17/containers/9a941f8592625c090faa7bbb29d5d051460695bac358ff77da8968a9401eda5e/json"
  time="2016-03-13T01:36:50.478022196Z" level=info msg="GET /v1.17/containers/9be050a0c9e8844fa3fc41140ab3347a5ff9786b8b214e09d9f9d9a142b4fe5f/json"
  time="2016-03-13T01:36:50.906360414Z" level=info msg="POST /v1.17/images/create?fromImage=quay.io%2Fappuri%2Ftable-flipper%3A414c315"
  time="2016-03-13T01:36:52.611985364Z" level=info msg="POST /v1.17/containers/create?name=ecs-Refresh_Users_and_Segments_264-4-tableflippercontainer-aa9fa48bf1d98f999801"
  time="2016-03-13T01:36:53.298984787Z" level=info msg="GET /v1.17/containers/c7ae4c0ac987b161f36e2afbaa02d50344f2a0ce7efcac1a1ff03723b22de34f/json"

I will set ecs-agent log level to debug to see if we can get more interesting info.

Jakub

jwerak on 13 Mar 2016

@veverjak Thanks for providing that information. Yes, you assumed correctly in that I was looking for disk device related issues in dmesg. The Agent logs do show timeouts, which is leading me to believe that the Docker daemon is responding slowly. Can you see how long it takes to perform a docker inspect of a container? Also, can you share the output of docker info and how long it took for that to return?

samuelkarp on 13 Mar 2016
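
For anyone wanting to repeat the timing check suggested above in a scriptable way, here is a minimal sketch. It assumes Python 3 is available on the instance; `time_command` is a hypothetical helper name, not part of Docker or the ECS agent.

```python
import subprocess
import time


def time_command(argv, timeout_s=300.0):
    """Run argv once and return (elapsed_seconds, exit_code).

    Output is discarded; only the wall-clock duration matters here.
    subprocess.TimeoutExpired is raised if the command exceeds timeout_s.
    """
    start = time.monotonic()
    completed = subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        timeout=timeout_s,
    )
    return time.monotonic() - start, completed.returncode
```

Calling `time_command(["docker", "info"])` or `time_command(["docker", "inspect", "<container-id>"])` (with a real container ID substituted) gives roughly the same number as the `real` line printed by the shell's `time`.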

it does take a lot of time indeed...

time docker info

Containers: 3802
Images: 154
Server Version: 1.9.1
Storage Driver: devicemapper
 Pool Name: docker-docker--pool
 Pool Blocksize: 524.3 kB
 Base Device Size: 107.4 GB
 Backing Filesystem: xfs
 Data file:
 Metadata file:
 Data Space Used: 37.38 GB
 Data Space Total: 107.2 GB
 Data Space Available: 69.78 GB
 Metadata Space Used: 65.97 MB
 Metadata Space Total: 109.1 MB
 Metadata Space Available: 43.09 MB
 Udev Sync Supported: true
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: true
 Deferred Deleted Device Count: 0
 Library Version: 1.02.93-RHEL7 (2015-01-28)
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.1.17-22.30.amzn1.x86_64
Operating System: Amazon Linux AMI 2015.09
CPUs: 4
Total Memory: 7.308 GiB
Name: ip-10-0-1-214
ID: PBSV:AZVO:LRPJ:G55J:U4VW:L6YV:YMKU:WLMH:53UF:ZYD3:G7CI:L26L

real    1m35.782s
user    0m0.024s
sys     0m0.008s

Seems like it's related to this issue and is solved in Docker 1.10.

Do you know of this issue? Is there any workaround?
I believe docker 1.10 is not in amzn ECS AMI yet, right?

Jakub

jwerak on 13 Mar 2016

It looks like you're already using the mitigation we attempted for the ECS-optimized AMI after discussion in https://github.com/docker/docker/issues/18314, but the Docker daemon is still being slow. Docker 1.10.x is not yet available in the Amazon Linux AMI repo (and unfortunately there is no RPM for the Amazon Linux AMI from get.docker.com).

A potential option here is raising the timeouts, which are defined here, but my worry is that making them higher will make detecting failure slower and that the daemon could still continue to exceed raised timeouts.

If you have reproduction steps for the behavior you're seeing, I'd really appreciate them so I can try and find another mitigation.

samuelkarp on 13 Mar 2016

I don't have repro steps, unfortunately. I can't see the issue consistently. It seems to be working now, so it's a delight to debug, I suppose.

I will try to test a few scenarios tomorrow and let you know if I have repro steps.

We have our own custom scheduler and we burst a lot of tasks at a time, so high load could be the source of the issue, but right now I am just guessing.

Jakub

jwerak on 13 Mar 2016

@samuelkarp to add some color to what @veverjak said - we spin up containers as part of a background job scheduling system. We can easily spin up 50-100 containers at the same time. These are small containers (typical CPU/mem is 32/32) and they should fit fine on the ECS instances we use. Unfortunately, we see the issues in this thread.

bilalaslamseattle on 14 Mar 2016

I have one question, @samuelkarp: you said we are using the mitigation you tried in https://github.com/docker/docker/issues/18314, but this means using dm.basesize=10G, right?
We currently have the default Base Device Size: 107.4 GB.

Can I change this setting during the cloud-init run? It seems like the LVM volume is already created when cloud-init runs.

Can you advise how to do this in an automated way during instance initialization?

jwerak on 14 Mar 2016

@samuelkarp ping on this. Does the ECS team have a plan for moving to Docker 1.10? We're down to periodically restarting Docker and the ECS agent - which kills all running containers.

bilalaslamseattle on 15 Mar 2016

This issue is really driving us crazy. Each morning we need to terminate the ECS instances that have this error in the agent log, so that the ASG can spin up new instances.
@samuelkarp could you please help here? Sorry, we don't have any reproduction steps.

docker info
Containers: 33
Images: 382
Server Version: 1.9.1
Storage Driver: devicemapper
Pool Name: docker-docker--pool
Pool Blocksize: 524.3 kB
Base Device Size: 107.4 GB
Backing Filesystem: xfs
Data file:
Metadata file:
Data Space Used: 10.16 GB
Data Space Total: 78.16 GB
Data Space Available: 68 GB
Metadata Space Used: 6.648 MB
Metadata Space Total: 25.17 MB
Metadata Space Available: 18.52 MB
Udev Sync Supported: true
Deferred Removal Enabled: true
Deferred Deletion Enabled: true
Deferred Deleted Device Count: 0
Library Version: 1.02.93-RHEL7 (2015-01-28)
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.1.17-22.30.amzn1.x86_64
Operating System: Amazon Linux AMI 2015.09
CPUs: 16
Total Memory: 29.44 GiB
Name: ip-**
ID: FP62:I76P:IAUQ:LMFR:A2LP:HAKN:6U4W:VXGE:VMS5:OQHE:2FWB:22S5

hridyeshpant on 16 Mar 2016

@hridyeshpant we restart Docker and the agent ... I think this gets around the need to terminate instances wholesale. @veverjak if I'm right, can you share the instructions?

bilalaslamseattle on 16 Mar 2016

But we can't restart Docker and the agent in a production environment; it would make the service unavailable.
We deregister the faulty container instance so that all running tasks can migrate to other ECS instances, and then terminate that instance. @bilalaslamseattle

hridyeshpant on 16 Mar 2016

@hridyeshpant that sucks .. sorry .. we're running into this for batch (background) jobs where a restart is ok because the job will get kicked off again.

bilalaslamseattle on 16 Mar 2016

@samuelkarp @aws-dpt - can you please update this issue with next steps? As @hridyeshpant already noted, this is causing major disruptions in services that rely on ECS. I also believe the More Info Needed tag on the issue is incorrect, you have all the info you need.

bilalaslamseattle on 16 Mar 2016

👍1

@veverjak The mitigation we use in the ECS-optimized AMI is the direct-lvm setup (two disks, with the second disk being a dedicated thin pool for Docker). LVM is set up as a bootcmd, so it runs fairly early in the boot process (see /etc/cloud/cloud.cfg.d/90_ecs.cfg on your instance), ahead of most user-data. You can change dm.basesize if you want (and it's a reasonable thing to try) with the following user-data:

#cloud-boothook
cloud-init-per instance dm_basesize sh -c "echo 'OPTIONS=\"\${OPTIONS} --storage-opt dm.basesize=10G\"' >> /etc/sysconfig/docker"

Note that the user-data I provided here is not guaranteed to work in the future if we change the AMI configuration.

@hridyeshpant @bilalaslamseattle Any sort of information you can provide on reproduction steps would be a great help here. If you're not sure how to reproduce it, answering these questions might help:

What's your peak and average task start rate? (Calls to RunTask/StartTask or tasks initiated by a Service)
What are general parameters for features you enable (labels, privileged, ports, etc)?
Have you set any configuration for the Agent? If so, what?
Have you set any configuration for Docker? If so, what?
What are the characteristics of your images? How many layers, what are the sizes, are there many different images or are they rerunning the same image many times, etc?
Does anything show up in Agent or Docker daemon logs? What about dmesg?
What does the disk activity on your EBS volumes look like (IOPS rate, provisioned/burst IOPS, etc)?

samuelkarp on 17 Mar 2016

Hi @samuelkarp,

Your last hint was correct in our case: we have one job that bursts a lot of reads from disk, and this caused device saturation.
Overloading the devicemapper device is obviously what causes the Docker slowness.
I don't think we are hitting any ECS or Docker bug here.

Thank you for all your help though,
Jakub

jwerak on 18 Mar 2016

@veverjak I'm glad to hear that we've come to what sounds like a root cause. Let me know if you continue to have problems.

@hridyeshpant If you're still running into problems, can you check the things I suggested: https://github.com/aws/amazon-ecs-agent/issues/336#issuecomment-198026978?

samuelkarp on 18 Mar 2016

@samuelkarp can you tell me how to check what the disk activity on our EBS volumes looks like (IOPS rate, provisioned/burst IOPS, etc.)?
Is there any command to get such data?

hridyeshpant on 19 Mar 2016

@hridyeshpant These links should be helpful:

samuelkarp on 19 Mar 2016
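
As a rough on-instance complement to the EBS metrics asked about above, the kernel's per-device counters in /proc/diskstats can be sampled twice and turned into read/write operations per second. This is only a sketch, assuming a Linux host with Python 3; the function names are made up for illustration, and the numbers are OS-level I/O counts, which may not match exactly how EBS accounts for IOPS.

```python
import time


def parse_diskstats(text):
    """Map device name -> (reads_completed, writes_completed).

    In /proc/diskstats the third column is the device name, the fourth is
    reads completed and the eighth is writes completed.
    """
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or truncated lines
        counters[fields[2]] = (int(fields[3]), int(fields[7]))
    return counters


def iops_between(before, after, interval_s):
    """Per-device (read_iops, write_iops) between two parsed samples."""
    rates = {}
    for dev, (r1, w1) in after.items():
        if dev in before:
            r0, w0 = before[dev]
            rates[dev] = ((r1 - r0) / interval_s, (w1 - w0) / interval_s)
    return rates


def sample_iops(interval_s=5.0, path="/proc/diskstats"):
    """Take two samples interval_s apart and return the per-device rates."""
    with open(path) as f:
        before = parse_diskstats(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_diskstats(f.read())
    return iops_between(before, after, interval_s)
```

Running `sample_iops()` while tasks are being launched and looking at the entry for the Docker data volume's device gives a quick feel for whether the disk is being driven hard.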

@samuelkarp I am attaching some EBS metrics for the faulty machines, where we just had 56 CannotInspectContainerError errors.

our instance type is c4.4xlarge

"c4.4xlarge",

hridyeshpant on 19 Mar 2016

cat /etc/ecs/ecs.config
ECS_CLUSTER=test
ECS_ENGINE_AUTH_TYPE=dockercfg
ECS_LOGLEVEL=debug
ECS_AVAILABLE_LOGGING_DRIVERS=["json-file","syslog","fluentd"]
ECS_ENGINE_AUTH_DATA=**
cat /etc/sysconfig/docker
DAEMON_MAXFILES=1048576
# Additional startup options for the Docker daemon, for example:
# OPTIONS="--ip-forward=true --iptables=true"
# By default we limit the number of open files per container
OPTIONS="--default-ulimit nofile=1024:4096"

3. Avg. number of tasks per container is 12.

Some part of the dmesg output:
[260462.820386] docker0: port 7(veth86fe7c7) entered disabled state
[260462.869498] docker0: port 7(veth86fe7c7) entered disabled state
[260462.874361] device veth86fe7c7 left promiscuous mode
[260462.876368] docker0: port 7(veth86fe7c7) entered disabled state
[260463.125963] XFS (dm-9): Unmounting Filesystem
[260492.142324] veth6c9096f: renamed from eth0
[260492.172412] docker0: port 9(vethdc874aa) entered disabled state
[260492.199239] docker0: port 9(vethdc874aa) entered disabled state
[260492.205366] device vethdc874aa left promiscuous mode
[260492.208201] docker0: port 9(vethdc874aa) entered disabled state
[260492.351408] XFS (dm-11): Unmounting Filesystem
[260536.428108] XFS (dm-9): Mounting V4 Filesystem
[260536.522019] XFS (dm-9): Ending clean mount
[260536.555449] XFS (dm-9): Unmounting Filesystem
[260536.736970] XFS (dm-9): Mounting V4 Filesystem
[260536.837464] XFS (dm-9): Ending clean mount
[260536.867861] XFS (dm-9): Unmounting Filesystem
[260536.969889] XFS (dm-9): Mounting V4 Filesystem
[260537.043118] XFS (dm-9): Ending clean mount
[260537.048536] device vethe1c78ad entered promiscuous mode
[260537.052022] IPv6: ADDRCONF(NETDEV_UP): vethe1c78ad: link is not ready
[260537.136577] eth0: renamed from veth9b38b0e
[260537.160471] IPv6: ADDRCONF(NETDEV_CHANGE): vethe1c78ad: link becomes ready
[260537.164446] docker0: port 7(vethe1c78ad) entered forwarding state
[260537.167952] docker0: port 7(vethe1c78ad) entered forwarding state
[260541.715365] XFS (dm-11): Mounting V4 Filesystem
[260552.220087] docker0: port 7(vethe1c78ad) entered forwarding state
[260554.714441] XFS (dm-11): Ending clean mount
[260556.483769] XFS (dm-11): Unmounting Filesystem
[260564.652740] XFS (dm-11): Mounting V4 Filesystem
[260585.893416] XFS (dm-11): Ending clean mount
[260586.093343] XFS (dm-11): Unmounting Filesystem
[260593.330227] XFS (dm-11): Mounting V4 Filesystem
[260610.060457] XFS (dm-11): Ending clean mount
[260610.065625] device vethb70f455 entered promiscuous mode
[260610.068293] IPv6: ADDRCONF(NETDEV_UP): vethb70f455: link is not ready
[260610.071219] docker0: port 9(vethb70f455) entered forwarding state
[260610.074052] docker0: port 9(vethb70f455) entered forwarding state
[260610.076849] docker0: port 9(vethb70f455) entered disabled state
[260610.132762] eth0: renamed from vethc919f19
[260610.156696] IPv6: ADDRCONF(NETDEV_CHANGE): vethb70f455: link becomes ready
[260610.159872] docker0: port 9(vethb70f455) entered forwarding state
[260610.162634] docker0: port 9(vethb70f455) entered forwarding state
[260625.180080] docker0: port 9(vethb70f455) entered forwarding state
[260668.074440] veth9b38b0e: renamed from eth0
[260668.092350] docker0: port 7(vethe1c78ad) entered disabled state
[260668.132824] docker0: port 7(vethe1c78ad) entered disabled state
[260668.138011] device vethe1c78ad left promiscuous mode
[260668.140150] docker0: port 7(vethe1c78ad) entered disabled state
[260668.249059] XFS (dm-9): Unmounting Filesystem
[260673.386527] vethc919f19: renamed from eth0
[260673.412393] docker0: port 9(vethb70f455) entered disabled state
[260673.444496] docker0: port 9(vethb70f455) entered disabled state
[260673.450579] device vethb70f455 left promiscuous mode
[260673.453434] docker0: port 9(vethb70f455) entered disabled state
[260680.374898] XFS (dm-11): Unmounting Filesystem
[260693.794132] XFS (dm-9): Mounting V4 Filesystem
[260693.900630] XFS (dm-9): Ending clean mount
[260693.931487] XFS (dm-9): Unmounting Filesystem
[260694.101033] XFS (dm-9): Mounting V4 Filesystem
[260698.644485] XFS (dm-9): Ending clean mount
[260698.795421] XFS (dm-9): Unmounting Filesystem
[260700.612764] XFS (dm-9): Mounting V4 Filesystem
[260704.997406] XFS (dm-9): Ending clean mount
[260705.003288] device veth6350b76 entered promiscuous mode
[260705.006703] IPv6: ADDRCONF(NETDEV_UP): veth6350b76: link is not ready
[260705.092591] eth0: renamed from vethf1febb6
[260705.116692] IPv6: ADDRCONF(NETDEV_CHANGE): veth6350b76: link becomes ready
[260705.120869] docker0: port 7(veth6350b76) entered forwarding state
[260705.124481] docker0: port 7(veth6350b76) entered forwarding state
[260707.881401] XFS (dm-11): Mounting V4 Filesystem
[260720.156063] docker0: port 7(veth6350b76) entered forwarding state
[260737.694229] XFS (dm-11): Ending clean mount
[260738.679503] XFS (dm-11): Unmounting Filesystem
[260744.542503] XFS (dm-11): Mounting V4 Filesystem
We are running a similar base image each time.
Let me know if you need more info.

hridyeshpant on 19 Mar 2016

I had this issue after I updated the AMI version to amzn-ami-2015.09.g on m3.medium instances with the default storage of 22 GB SSD. I was able to resolve it by increasing the storage size to 120 GB, but I want to know if there is any other workaround that avoids expanding the storage.

moranmathias on 21 Mar 2016

@hridyeshpant Based on the graphs you posted, it looks like you have fairly high sustained and burst write IOPS on that volume. The EBS documentation has a good section on how IOPS credits are accumulated and consumed.

samuelkarp on 22 Mar 2016
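
To read graphs like these numerically: CloudWatch reports VolumeReadOps and VolumeWriteOps as the total number of operations in each period, so dividing a period's Sum by the period length in seconds gives the average IOPS for that period. The sketch below does that conversion and lists the periods that ran above a given baseline. The helper names are hypothetical, and the baseline value has to come from your own volume's size and type.

```python
def average_iops(ops_sum, period_s):
    """Average IOPS for one CloudWatch period, from the period's Sum of ops."""
    return ops_sum / period_s


def periods_over_baseline(write_ops_sums, period_s, baseline_iops):
    """Indices of periods whose average write IOPS exceeded baseline_iops."""
    return [
        i
        for i, ops_sum in enumerate(write_ops_sums)
        if average_iops(ops_sum, period_s) > baseline_iops
    ]
```

For example, with 5-minute periods (300 s), Sums of 15000, 45000 and 90000 correspond to 50, 150 and 300 average IOPS, so against an assumed baseline of 100 IOPS the second and third periods are spending burst credits.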

@samuelkarp we are seeing a similar thing. Our use case is to run scheduled jobs (lots of them) on a timer. For example, we routinely launch 50+ containers on an instance at once. The thing is that most of these containers are instances of the same 4 or 5 images. I would assume that Docker would be smart about caching these images to reduce IO, i.e. if I launch 10 instances of the same container, I don't read the same image 10 times. Do you know if that's the case?

bilalaslamseattle on 22 Mar 2016

@bilalaslamseattle Docker will create metadata and a new write layer for each container that starts. I'm not sure offhand how much IO Docker will perform during concurrent container creation; your best bet is probably measuring what you consume in this particular use-case.

samuelkarp on 22 Mar 2016

@samuelkarp big thanks for your help here. We found that this problem went away when we ran more smaller instances e.g. c4.large instead of a few bigger ones.

bilalaslamseattle on 26 Mar 2016

@bilalaslamseattle @hridyeshpant Please let us know if you need more help here or continue to face issues. I am closing this issue for now, as it seems we have root-caused it and have a remediation as well.

aaithal on 29 Mar 2016

Confirming that this problem reported by OP was indeed related to exhausted EBS IOPS credits when using the default 22 GiB /dev/xvdcz volume (using c4.4xlarge instances with up to 20 tasks running per instance). The problem is observable in CloudWatch by comparing VolumeWriteOps with VolumeQueueLength: we could see VolumeQueueLength jump to over 10 once IOPS credits were exhausted (about 8 hrs after launch).

The problem was particularly reproducible in the scenario where there are a number of services in the cluster that are continually deploying tasks (i.e. due to failing containers).

As per ECS team's advice, we solved it by configuring /dev/xvdcz as a 1,000GiB gp2 volume during launch to ensure we don't exhaust IOPS credits.

mattcallanan on 5 Jul 2016

👍6
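
The arithmetic behind that sizing advice can be sketched as follows, assuming the gp2 burst model as documented by EBS at the time: a baseline of 3 IOPS per GiB with a 100 IOPS floor, bursting up to 3,000 IOPS, and a credit bucket of 5.4 million I/O credits that starts full. The upper cap on baseline IOPS for very large volumes is ignored here, and the function names are made up for illustration.

```python
GP2_IOPS_PER_GIB = 3           # baseline credits earned per GiB per second
GP2_MIN_BASELINE_IOPS = 100    # floor for small volumes
GP2_BURST_IOPS = 3000          # maximum rate while credits remain
GP2_CREDIT_BUCKET = 5_400_000  # I/O credits in a full bucket


def gp2_baseline_iops(size_gib):
    """Baseline IOPS for a gp2 volume of size_gib (upper cap ignored)."""
    return max(GP2_MIN_BASELINE_IOPS, GP2_IOPS_PER_GIB * size_gib)


def seconds_until_credits_exhausted(size_gib, demand_iops):
    """Seconds for a full bucket to drain at a sustained demand.

    Returns None when the demand never exceeds the baseline, in which
    case the bucket does not drain at all.
    """
    baseline = gp2_baseline_iops(size_gib)
    demand = min(demand_iops, GP2_BURST_IOPS)  # cannot go above burst rate
    if demand <= baseline:
        return None
    return GP2_CREDIT_BUCKET / (demand - baseline)
```

Under these assumptions a 22 GiB volume has a 100 IOPS baseline, and a sustained demand of about 287.5 IOPS drains the bucket in 28,800 seconds, which is consistent with the roughly 8 hours reported above (the actual workload was presumably burstier). A 1,000 GiB volume has a 3,000 IOPS baseline, equal to the burst rate, so `seconds_until_credits_exhausted(1000, 3000)` returns None.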

Pointers on how to change size/type/iops of /dev/xvdcz volume?

chriskinsman on 28 Dec 2017

👍1
