Copilot-cli: Fails to create a new environment the first time an ECS cluster is created in an AWS account

Created on 25 Aug 2020 · 8 Comments · Source: aws/copilot-cli

Reproduce:

1. Create a new AWS account.
2. Use that account for copilot init.
3. Answer 'Yes' when asked to deploy a test environment (or answer 'No' and run copilot env init afterwards).

The CloudFormation stack for the new environment then fails with an error like:

Invalid request provided: CreateCluster Invalid Request: Unable to assume the service linked role. Please verify that the ECS service linked role exists. (Service: Ecs, Status Code: 400, Request ID: a973c2ca-204c-4715-9e2e-6097da7eb8c1, Extended Request ID: null)

Here is the CLI output:

~ snip ~
All right, you're all set for local development.
Deploy: Yes

✘ Failed to create the infrastructure for the test environment.
- Virtual private cloud on 2 availability zones to hold your services                       [Failed]
- Virtual private cloud on 2 availability zones to hold your services                       [Failed]
  Resource creation cancelled
  - Internet gateway to connect the network to the internet                                 [Failed]
  Resource creation cancelled
  - Public subnets for internet facing services                                             [In Progress]
  - Private subnets for services that can't be reached from the internet                    [In Progress]
  - Routing tables for services to talk with each other                                     [In Progress]
- ECS Cluster to hold your services                                                         [Failed]
  Invalid request provided: CreateCluster Invalid Request: Unable to assume the service linked role
- Application load balancer to distribute traffic                                           [In Progress]
✘ wait until stack prod-ready-copilot-test create is complete: ResourceNotReady: failed waiting for successful resource state

$ copilot --version
copilot version: v0.3.0

I think we need documentation and/or clearer CLI output that tells users how to resolve this problem: remove the existing failed stack and run copilot env init again.

Labels: area/env, type/bug, type/request

toricls

👍11

All 8 comments

Yeah, this is a weird one. You can just run env init again and it'll work (copilot will clean up failed stacks), but it's an odd race condition between the cluster and the service-linked role (SLR) being created. I'll also bring this up with the service team, since I don't think this is the behavior we're expecting.

kohidave on 25 Aug 2020

@kohidave Thanks! Just FYI, I tried the repro three times with different new AWS accounts and got the same result (successfully?) every time 😉

toricls on 25 Aug 2020

Were the accounts brand new too? I wonder if accounts that aren't new but have never used ECS would have the same issue. Either way, we should figure out a better way to handle this!

Ah, update: I think I understand now. This might be a bit of a bug in the CloudFormation resource itself. It creates the role but doesn't wait out the eventual-consistency delay of the SLR.

kohidave on 25 Aug 2020

Ah, guess we need a dependency on it

toricls on 25 Aug 2020

We may be able to create a custom resource that sets it up (it's not a normal role) and then make the cluster depend on that custom resource. I'll work with the service team in the meantime to see if there are other ways to work around this.

kohidave on 26 Aug 2020

👍1

OK, a better workaround is to create the SLR ourselves in our env stack: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-iam-servicelinkedrole.html

That should alleviate this issue and is pretty simple!

kohidave on 27 Aug 2020
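
Editor's sketch of the workaround described in the comment above: declaring the service-linked role in the stack and making the cluster depend on it. This is a minimal illustration using only the Python standard library to build a CloudFormation template fragment. The logical IDs `ECSServiceLinkedRole` and `Cluster` are hypothetical and are not taken from Copilot's actual environment template, and the sketch does not handle accounts where the role already exists.

```python
import json


def build_env_template_fragment():
    """Return a CloudFormation template fragment as a plain dict."""
    return {
        "Resources": {
            # Hypothetical logical ID: declares the ECS service-linked role.
            "ECSServiceLinkedRole": {
                "Type": "AWS::IAM::ServiceLinkedRole",
                "Properties": {
                    "AWSServiceName": "ecs.amazonaws.com",
                    "Description": "Service-linked role for Amazon ECS",
                },
            },
            # Hypothetical logical ID: DependsOn makes CloudFormation
            # create the cluster only after the role resource is created.
            "Cluster": {
                "Type": "AWS::ECS::Cluster",
                "DependsOn": "ECSServiceLinkedRole",
            },
        }
    }


if __name__ == "__main__":
    # Print the fragment as JSON, which CloudFormation also accepts.
    print(json.dumps(build_env_template_fragment(), indent=2))
```

Whether an explicit dependency fully avoids the eventual-consistency delay mentioned earlier in the thread is not confirmed here; the fragment only shows the ordering relationship.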

❤1

I ran into this as well and had to manually create the ECS role. Running init again didn't help.
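
Editor's sketch of creating the ECS role manually, as the last comment describes. It assumes a boto3 IAM client is passed in as `iam`; the helper name `ensure_ecs_service_linked_role` is hypothetical, and the commenter did not say which tool they actually used. The function checks for the role named `AWSServiceRoleForECS` and creates it only when it is missing.

```python
ECS_SLR_ROLE_NAME = "AWSServiceRoleForECS"
ECS_SERVICE_NAME = "ecs.amazonaws.com"


def ensure_ecs_service_linked_role(iam):
    """Create the ECS service-linked role if it is missing.

    `iam` is assumed to behave like a boto3 IAM client.
    Returns True if the role was created, False if it already existed.
    """
    try:
        # get_role raises NoSuchEntityException when the role is absent.
        iam.get_role(RoleName=ECS_SLR_ROLE_NAME)
        return False
    except iam.exceptions.NoSuchEntityException:
        iam.create_service_linked_role(AWSServiceName=ECS_SERVICE_NAME)
        return True
```

With boto3 installed and credentials configured, this could be called as `ensure_ecs_service_linked_role(boto3.client("iam"))` before running copilot env init again. The AWS CLI equivalent is `aws iam create-service-linked-role --aws-service-name ecs.amazonaws.com`.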