Copilot-cli: Can't redeploy or delete a service

Created on 22 Jul 2020 · 5 Comments · Source: aws/copilot-cli

First off, kudos for the copilot CLI. The amount of infrastructure code and automation that I've been able to delete b/c of it is great! On to the issue...

It appears my app is stuck in a crash/restart loop, which seems to be causing the CF status of the service to appear as UPDATE_IN_PROGRESS, which prevents me from doing either of the following successfully:

copilot svc delete --env dev --name backend --yes
✘ Failed to delete service backend from environment dev: delete stack XXX-dev-backend: ValidationError: Stack [arn:aws:cloudformation:us-east-2:xxxxxxxx:stack/XXX-dev-backend/45063280-cbb3-11ea-979b-06e9f06d5ec4] cannot be deleted while in status UPDATE_IN_PROGRESS
    status code: 400, request id: c743537f-8594-4a63-a953-265563a15486.
✘ delete stack XXX-dev-backend: ValidationError: Stack [arn:aws:cloudformation:us-east-2:xxxxxxxx:stack/XXX-dev-backend/45063280-cbb3-11ea-979b-06e9f06d5ec4] cannot be deleted while in status UPDATE_IN_PROGRESS
    status code: 400, request id: c743537f-8594-4a63-a953-265563a15486
copilot svc deploy --name backend --env dev
...
...
19503f7a9eec: Layer already exists
88cc1a200eb9: Layer already exists
05df73b: digest: sha256:3b1fe01a380e4bec79cc4572c23f5d1cddabae08df1a568d370b28452f5e8a0a size: 2622
✘ Failed to deploy service.

✘ deploy service: stack XXX-dev-backend is currently being updated and cannot be deployed to

In the list of ECS service names, I see two: one is RUNNING (I believe this is the version I deployed prior to introducing my crash/restart bug). The other cycles between PROVISIONING and ACTIVATING.

The CF resources for my service show one Resource as perpetually in progress:

Service | arn:aws:ecs:us-east-2:xxxxxxxx:service/XXX-dev-Cluster-n3KxvSFKqWOR/XXX-dev-backend-Service-1TGEYMJZS14Z9 | AWS::ECS::Service | UPDATE_IN_PROGRESS

The CF event history also shows the most recent Service update stuck in progress:

Timestamp | Logical ID | Status | Status reason
-- | -- | -- | --
2020-07-22 06:57:09 UTC-0400 | Service | UPDATE_IN_PROGRESS | -
2020-07-22 06:57:07 UTC-0400 | TaskDefinition | UPDATE_COMPLETE | -
2020-07-22 06:57:07 UTC-0400 | TaskDefinition | UPDATE_IN_PROGRESS | Resource creation Initiated
2020-07-22 06:57:07 UTC-0400 | TaskDefinition | UPDATE_IN_PROGRESS | Requested update requires the creation of a new physical resource; hence creating one.
2020-07-22 06:57:03 UTC-0400 | AddonsStack | UPDATE_COMPLETE | -
2020-07-22 06:57:03 UTC-0400 | AddonsStack | UPDATE_IN_PROGRESS | -
2020-07-22 06:56:57 UTC-0400 | XXX-dev-backend | UPDATE_IN_PROGRESS | User Initiated
2020-07-22 05:22:56 UTC-0400 | XXX-dev-backend | UPDATE_COMPLETE | -
2020-07-22 05:22:56 UTC-0400 | AddonsStack | UPDATE_COMPLETE | -
2020-07-22 05:22:55 UTC-0400 | XXX-dev-backend | UPDATE_COMPLETE_CLEANUP_IN_PROGRESS | -
2020-07-22 05:22:51 UTC-0400 | AddonsStack | UPDATE_COMPLETE | -
2020-07-22 05:22:51 UTC-0400 | AddonsStack | UPDATE_IN_PROGRESS | -
2020-07-22 05:22:46 UTC-0400 | XXX-dev-backend | UPDATE_IN_PROGRESS | User Initiated
2020-07-22 05:20:36 UTC-0400 | XXX-dev-backend | UPDATE_COMPLETE | -
2020-07-22 05:20:35 UTC-0400 | AddonsStack | UPDATE_COMPLETE | -
2020-07-22 05:20:35 UTC-0400 | TaskDefinition | DELETE_COMPLETE | -
2020-07-22 05:20:34 UTC-0400 | TaskDefinition | DELETE_IN_PROGRESS | -
2020-07-22 05:20:33 UTC-0400 | XXX-dev-backend | UPDATE_COMPLETE_CLEANUP_IN_PROGRESS | -
2020-07-22 05:20:32 UTC-0400 | Service | UPDATE_COMPLETE | -
2020-07-22 05:17:30 UTC-0400 | Service | UPDATE_IN_PROGRESS | -
2020-07-22 05:17:28 UTC-0400 | TaskDefinition | UPDATE_COMPLETE | -
2020-07-22 05:17:28 UTC-0400 | TaskDefinition | UPDATE_IN_PROGRESS | Resource creation Initiated
2020-07-22 05:17:28 UTC-0400 | TaskDefinition | UPDATE_IN_PROGRESS | Requested update requires the creation of a new physical resource; hence creating one.
2020-07-22 05:17:25 UTC-0400 | AddonsStack | UPDATE_COMPLETE | -
2020-07-22 05:17:24 UTC-0400 | AddonsStack | UPDATE_IN_PROGRESS | -
2020-07-22 05:17:20 UTC-0400 | XXX-dev-backend | UPDATE_IN_PROGRESS | User Initiated
2020-07-21 20:39:49 UTC-0400 | XXX-dev-backend | CREATE_COMPLETE | -
2020-07-21 20:39:48 UTC-0400 | Service | CREATE_COMPLETE | -
2020-07-21 20:37:47 UTC-0400 | Service | CREATE_IN_PROGRESS | Resource creation Initiated
2020-07-21 20:37:46 UTC-0400 | Service | CREATE_IN_PROGRESS | -
2020-07-21 20:37:45 UTC-0400 | TaskDefinition | CREATE_COMPLETE | -
2020-07-21 20:37:44 UTC-0400 | TaskDefinition | CREATE_IN_PROGRESS | Resource creation Initiated
2020-07-21 20:37:44 UTC-0400 | TaskDefinition | CREATE_IN_PROGRESS | -
2020-07-21 20:37:43 UTC-0400 | AddonsStack | CREATE_COMPLETE | -
2020-07-21 20:37:00 UTC-0400 | TaskRole | CREATE_COMPLETE | -
2020-07-21 20:37:00 UTC-0400 | ExecutionRole | CREATE_COMPLETE | -
2020-07-21 20:35:49 UTC-0400 | DiscoveryService | CREATE_COMPLETE | -
2020-07-21 20:35:49 UTC-0400 | DiscoveryService | CREATE_IN_PROGRESS | Resource creation Initiated
2020-07-21 20:35:49 UTC-0400 | LogGroup | CREATE_COMPLETE | -
2020-07-21 20:35:49 UTC-0400 | AddonsStack | CREATE_IN_PROGRESS | Resource creation Initiated
2020-07-21 20:35:49 UTC-0400 | TaskRole | CREATE_IN_PROGRESS | Resource creation Initiated
2020-07-21 20:35:48 UTC-0400 | LogGroup | CREATE_IN_PROGRESS | Resource creation Initiated
2020-07-21 20:35:48 UTC-0400 | ExecutionRole | CREATE_IN_PROGRESS | Resource creation Initiated
2020-07-21 20:35:48 UTC-0400 | TaskRole | CREATE_IN_PROGRESS | -
2020-07-21 20:35:48 UTC-0400 | ExecutionRole | CREATE_IN_PROGRESS | -
2020-07-21 20:35:48 UTC-0400 | AddonsStack | CREATE_IN_PROGRESS | -
2020-07-21 20:35:48 UTC-0400 | LogGroup | CREATE_IN_PROGRESS | -
2020-07-21 20:35:48 UTC-0400 | DiscoveryService | CREATE_IN_PROGRESS | -
2020-07-21 20:35:44 UTC-0400 | XXX-dev-backend | CREATE_IN_PROGRESS | User Initiated
2020-07-21 20:35:40 UTC-0400 | XXX-dev-backend | REVIEW_IN_PROGRESS | User Initiated

I'm currently using:

copilot --version
copilot version: v0.2.0

However, if it's relevant, I believe I deployed the app/env/svc with 0.1.0, then deleted the service with 0.1.0, then deployed it again (without deleting/recreating the app/env created by 0.1.0) with 0.2.0. It doesn't seem like that's an issue here, but figured I should mention it just in case.

If this is a bug, any advice on how to work around it for the short term would be greatly appreciated.


All 5 comments

I'm so sorry about this - thank you so much for your detailed response, though.

Yea, this is unfortunate behavior in our ECS CloudFormation resource. If a service fails to stabilize (that is, all containers come up and stay up), the CloudFormation resource has to time out (which takes ~3 hours) before the stack can be updated or deleted.

I think there is some work we can do to mitigate this issue - but as a mitigation for you, for right now, you can set the desired count of your service to 0 - which forces it to stabilize (you can do this through the console or the AWS CLI).

I'll keep this issue open so we can brainstorm ways to help fix this. One idea is to, when you delete a service, set the desired count to 0 first, then delete. We'll keep brainstorming, and thank you for your patience!
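For anyone landing here, a minimal sketch of the workaround above, assuming the AWS CLI and Copilot are installed and configured. The helper names (`recovery_commands`, `run_all`) are made up for illustration; the cluster, service, and stack names are the redacted placeholders from the report. It scales the service to 0, waits for the stack update to finish, then retries the delete. Whether the stack settles in a state the waiter accepts is an assumption, so the script defaults to a dry run that only prints the commands.

```python
import subprocess
from typing import List


def recovery_commands(cluster: str, service: str, stack: str,
                      region: str, env: str, name: str) -> List[List[str]]:
    """Build the three commands: scale to 0, wait for the stack, delete."""
    return [
        # Setting the desired count to 0 lets the ECS service stabilize.
        ["aws", "ecs", "update-service",
         "--cluster", cluster, "--service", service,
         "--desired-count", "0", "--region", region],
        # Blocks until the stack reaches UPDATE_COMPLETE; the waiter exits
        # with an error if the stack ends in a failed or rollback state.
        ["aws", "cloudformation", "wait", "stack-update-complete",
         "--stack-name", stack, "--region", region],
        # Same delete command as in the original report.
        ["copilot", "svc", "delete", "--env", env, "--name", name, "--yes"],
    ]


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(recovery_commands(
        cluster="XXX-dev-Cluster-n3KxvSFKqWOR",
        service="XXX-dev-backend-Service-1TGEYMJZS14Z9",
        stack="XXX-dev-backend",
        region="us-east-2",
        env="dev",
        name="backend",
    ))
```

Pass `dry_run=False` to `run_all` once the printed commands look right for your account.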

Hey there - the workaround (setting desired count to 0) works for now. Thanks! My fault for not having a top level try/catch in my code to avoid the restart in the first place.

Not sure how to approach solving the deeper issue here. Maybe something like an optional --timeout parameter when doing the svc deploy? I can see how the rollback handling for that might get complicated very quickly though. :)
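To make the --timeout idea concrete, here is a rough sketch of the polling side only. It is not how Copilot works and not a proposal for its internals. `wait_for_update` is a hypothetical helper, and `get_status` stands in for whatever returns the stack's current status string. The rollback handling mentioned above is deliberately left out.

```python
import time
from typing import Callable

IN_PROGRESS = "UPDATE_IN_PROGRESS"


def wait_for_update(get_status: Callable[[], str],
                    timeout_s: float,
                    poll_s: float = 10.0,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> str:
    """Poll until the stack leaves UPDATE_IN_PROGRESS or the timeout expires.

    Returns the first status that is not UPDATE_IN_PROGRESS (this includes
    cleanup and rollback states, which the caller has to interpret).
    Raises TimeoutError if the deadline passes first.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status != IN_PROGRESS:
            return status
        if clock() >= deadline:
            raise TimeoutError(
                f"stack still {IN_PROGRESS} after {timeout_s} seconds")
        sleep(poll_s)
```

`sleep` and `clock` are parameters only so the loop can be exercised without real waiting; what should happen on timeout (cancel, roll back, or just report) is the hard part and is not covered here.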

Could a --force option be added that sets the desired task count to 0? This would help users.

Heya! We now create services with deployment circuit breakers enabled by default (https://github.com/aws/copilot-cli/issues/1780) so that the service doesn't get stuck like this. I'll close the issue, but please feel free to re-open if you'd like a different behavior!
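If you want to check whether the circuit breaker is actually on for an existing service, one option is to inspect the JSON printed by `aws ecs describe-services --cluster <cluster> --services <service>`. The sketch below assumes that output has been captured as a string; `circuit_breaker_enabled` is an illustrative helper, not part of Copilot or the AWS CLI.

```python
import json


def circuit_breaker_enabled(describe_services_output: str) -> bool:
    """Return True if the first service in the output has the breaker enabled.

    Expects the JSON document printed by `aws ecs describe-services`.
    A missing service or missing setting is treated as "not enabled".
    """
    data = json.loads(describe_services_output)
    services = data.get("services", [])
    if not services:
        return False
    breaker = (services[0]
               .get("deploymentConfiguration", {})
               .get("deploymentCircuitBreaker", {}))
    return bool(breaker.get("enable", False))


if __name__ == "__main__":
    # Hand-written sample shaped like the relevant part of the real output.
    sample = json.dumps({
        "services": [{
            "deploymentConfiguration": {
                "deploymentCircuitBreaker": {"enable": True, "rollback": True}
            }
        }]
    })
    print(circuit_breaker_enabled(sample))
```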

Hello @efekarakus, how can I configure this? I cannot find any documentation about it, and my deployments keep failing in a loop.
I have also tried putting count: 0 in my manifest, but I get "ScalableTarget was not found". To make it work, I have to manually set the desired task count to 0 and update the service.

I am on v1.2.
