Amazon-ecs-agent: Task containers dependencies resolution stuck forever

Created on 18 Aug 2020  ·  6 Comments  ·  Source: aws/amazon-ecs-agent

Summary

A task gets stuck in the PENDING state when container dependency resolution runs into a failed dependency.
Setting a start timeout has no effect either.
This reopens my previous issue #2350, with additional details and a Task Definition that reproduces the error.

Description

I have 3 containers:

| Container name | Essential | Depends On Container | Depends On State | Start Timeout (s) |
|----------------|-----------|----------------------|------------------|---------------|
| exit1 | false | | | 10 |
| mysql | true | exit1 | SUCCESS | 30 |
| nginx | true | mysql | START | 60 |

(nginx depends on mysql which depends on exit1)

When container exit1 fails with exit code 1, containers mysql and nginx remain in the PENDING state forever.
When container exit1 succeeds with exit code 0, everything works fine.
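
To make the expected propagation concrete, here is a minimal sketch of how a failed SUCCESS dependency should block everything downstream of it. It is not the agent's actual resolution code; the names `DEPENDS_ON` and `unstartable` are hypothetical and only model the table above.

```python
from typing import Dict, List, Set, Tuple

# Each container maps to a list of (dependency name, condition) pairs,
# mirroring the table in the issue description.
DEPENDS_ON: Dict[str, List[Tuple[str, str]]] = {
    "exit1": [],
    "mysql": [("exit1", "SUCCESS")],
    "nginx": [("mysql", "START")],
}


def unstartable(
    depends_on: Dict[str, List[Tuple[str, str]]],
    exit_codes: Dict[str, int],
) -> Set[str]:
    """Return the containers that can never start, given the exit codes of
    the containers that have already stopped (hypothetical helper)."""
    blocked: Set[str] = set()
    changed = True
    while changed:  # repeat until stable so failures propagate transitively
        changed = False
        for name, deps in depends_on.items():
            if name in blocked:
                continue
            for dep, condition in deps:
                # A SUCCESS condition can never be met once the dependency
                # has exited with a non-zero code.
                failed_success = (
                    condition == "SUCCESS"
                    and dep in exit_codes
                    and exit_codes[dep] != 0
                )
                if failed_success or dep in blocked:
                    blocked.add(name)
                    changed = True
                    break
    return blocked


print(sorted(unstartable(DEPENDS_ON, {"exit1": 1})))  # ['mysql', 'nginx']
```

Because the loop runs to a fixed point, the result does not depend on the order in which dependencies are listed, which is the behavior I would expect from the agent as well.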

I also tried setting the dependencies of container nginx like this:

      [
        {
          "containerName": "mysql",
          "condition": "START"
        },
        {
          "containerName": "exit1",
          "condition": "SUCCESS"
        }
      ]

The result is the same.

However, the behavior seems to be correct if the order of the dependencies for nginx is reversed:

      [
        {
          "containerName": "exit1",
          "condition": "SUCCESS"
        },
        {
          "containerName": "mysql",
          "condition": "START"
        }
      ]
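
As a possible workaround based only on the observation above, a task definition could be rewritten so that every container lists its transitive dependencies explicitly, with upstream containers first. The sketch below is an assumption-laden illustration: `expand_depends_on` is a hypothetical helper, and I have not verified that this ordering avoids the problem in every case.

```python
import copy
from typing import Any, Dict, List


def expand_depends_on(task_def: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of task_def in which each container lists its transitive
    dependencies explicitly, upstream dependencies first (hypothetical helper).

    If a container is reachable with two different conditions, the first one
    encountered (the upstream one) is kept.
    """
    result = copy.deepcopy(task_def)
    by_name = {c["name"]: c for c in task_def["containerDefinitions"]}

    def resolve(name: str, path: List[str]) -> List[Dict[str, str]]:
        if name in path:
            raise ValueError(f"dependency cycle through {name!r}")
        ordered: List[Dict[str, str]] = []
        for dep in by_name[name].get("dependsOn", []):
            upstream = resolve(dep["containerName"], path + [name])
            # Upstream entries go first, then the direct dependency itself.
            for entry in upstream + [dep]:
                already_listed = any(
                    e["containerName"] == entry["containerName"] for e in ordered
                )
                if not already_listed:
                    ordered.append(dict(entry))
        return ordered

    for container in result["containerDefinitions"]:
        deps = resolve(container["name"], [])
        if deps:
            container["dependsOn"] = deps
    return result
```

Applied to a definition where nginx depends only on mysql (START) and mysql depends on exit1 (SUCCESS), this produces the dependency list shown above for nginx: exit1 (SUCCESS) first, then mysql (START).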

Expected Behavior

The task fails to start.

Observed Behavior

The task is stuck in the PENDING state forever.

Environment Details

  • ECS Agent version: 1.43.0

Task Definitions to reproduce:


With transitive dependencies - stuck as PENDING

{
    "containerDefinitions": [
        {
            "command": [
                "cat",
                "123"
            ],
            "image": "alpine",
            "startTimeout": 10,
            "name": "exit1",
            "essential": false
        },
        {
            "image": "nginx",
            "startTimeout": 60,
            "dependsOn": [
                {
                    "containerName": "mysql",
                    "condition": "START"
                }
            ],
            "name": "nginx"
        },
        {
            "image": "mysql:5.7",
            "startTimeout": 30,
            "dependsOn": [
                {
                    "containerName": "exit1",
                    "condition": "SUCCESS"
                }
            ],
            "name": "mysql"
        }
    ],
    "memory": "100",
    "family": "reproduce-dependency-problem",
    "requiresCompatibilities": [
        "EC2"
    ],
    "cpu": "128"
}


With unordered multiple dependencies - stuck as PENDING

{
    "containerDefinitions": [
        {
            "command": [
                "cat",
                "123"
            ],
            "image": "alpine",
            "startTimeout": 10,
            "name": "exit1",
            "essential": false
        },
        {
            "image": "nginx",
            "startTimeout": 60,
            "dependsOn": [
                {
                    "containerName": "mysql",
                    "condition": "START"
                },
                {
                    "containerName": "exit1",
                    "condition": "SUCCESS"
                }
            ],
            "name": "nginx"
        },
        {
            "image": "mysql:5.7",
            "startTimeout": 30,
            "dependsOn": [
                {
                    "containerName": "exit1",
                    "condition": "SUCCESS"
                }
            ],
            "name": "mysql"
        }
    ],
    "memory": "100",
    "family": "reproduce-dependency-problem",
    "requiresCompatibilities": [
        "EC2"
    ],
    "cpu": "128"
}


With logically ordered multiple dependencies - task STOPPED as expected ✅

{
    "containerDefinitions": [
        {
            "command": [
                "cat",
                "123"
            ],
            "image": "alpine",
            "startTimeout": 10,
            "name": "exit1",
            "essential": false
        },
        {
            "image": "nginx",
            "startTimeout": 60,
            "dependsOn": [
                {
                    "containerName": "exit1",
                    "condition": "SUCCESS"
                },
                {
                    "containerName": "mysql",
                    "condition": "START"
                }
            ],
            "name": "nginx"
        },
        {
            "image": "mysql:5.7",
            "startTimeout": 30,
            "dependsOn": [
                {
                    "containerName": "exit1",
                    "condition": "SUCCESS"
                }
            ],
            "name": "mysql"
        }
    ],
    "memory": "100",
    "family": "reproduce-dependency-problem",
    "requiresCompatibilities": [
        "EC2"
    ],
    "cpu": "128"
}

kind/bug  ·  pending release

All 6 comments

@Trane9991 Sorry that you are facing this issue. I will try to reproduce it on my end.

Meanwhile, could you send the task definition (the one with which you are seeing the issue) and the Agent logs to [email protected]?

Hey @shubham2892
Thanks for the quick reply. I published the Task Definitions that reproduce the issue under the collapsible spoilers in the "Task Definitions to reproduce" section of the issue description :)

Let me know if you are able to reproduce it; with the Task Definitions shared in the issue description, it reproduces in 100% of cases for me.

I was able to reproduce the PENDING state behavior with both the "With transitive dependencies" and the "With unordered multiple dependencies" task definitions. I will mark this as a bug and work on getting it fixed.

Hi,

The PR with the fix for the ordered container dependency problem is https://github.com/aws/amazon-ecs-agent/pull/2615.

Regards,
Utsa

This fix has been released as part of ECS Agent 1.44.4: https://github.com/aws/amazon-ecs-agent/releases/tag/1.44.4

Please update the Agent (https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-agent-update.html), or use the latest ECS Optimized AMIs containing ECS Agent 1.44.4, which you can find here: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-optimized_AMI.html
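
To check whether a given container instance already runs a fixed Agent, a version string can be compared against 1.44.4. The sketch below is only illustrative: the helper names are hypothetical, and it assumes you already have the Agent's version string (for example, from the agent's introspection endpoint on the instance) in a form that contains `major.minor.patch`.

```python
import re
from typing import Tuple

# Release named in this thread as containing the fix.
FIXED_IN: Tuple[int, int, int] = (1, 44, 4)


def parse_agent_version(text: str) -> Tuple[int, int, int]:
    """Extract the first major.minor.patch triple from a version string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch))


def has_dependency_fix(version_text: str) -> bool:
    """True if the reported Agent version is 1.44.4 or newer."""
    return parse_agent_version(version_text) >= FIXED_IN


print(has_dependency_fix("1.43.0"))  # False
print(has_dependency_fix("1.44.4"))  # True
```

Tuples compare element by element, so 1.100.0 is correctly treated as newer than 1.44.4, which a plain string comparison would get wrong.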
