Aws-cdk: CLI: reduce deployment timeout

Created on 8 Jan 2019 · 13 Comments · Source: aws/aws-cdk

From what I observed, if the tasks of an ECS Fargate service fail to start, the deployment keeps retrying, and I never saw it actually time out.

Is there anything we can do from CDK to time out a deployment?

effort/medium feature-request needs-discussion p2 package/tools

mouyigang

👍9

All 13 comments

It will time out eventually, but it will take a while (an hour or so).

Whether we can set a timeout is a good question. I actually don't know the answer to that.

cc @SoManyHs ?

rix0rrr on 8 Jan 2019

I ran into this today as well. My fix was to kill CDK, which was hung, and go into the AWS console to "Cancel update stack". From there the stack was rolled back to its original state.

Seems like in these cases you'd want to fail fast(er) with a timeout override and do the above steps automatically to get back to a known state.

digitalsanctum on 9 Jan 2019

👍1

That is a good point. You can use the console to interrupt the deployment. You don't even have to kill CDK to do it: CDK will show the rollback starting and exit with an error at the end if you do.

rix0rrr on 9 Jan 2019
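
The console's "Cancel update stack" action corresponds to the CloudFormation `CancelUpdateStack` operation, which the AWS CLI exposes as `aws cloudformation cancel-update-stack`. A minimal Python sketch of scripting that step, assuming the AWS CLI is installed and configured; the helper names and the stack name `MyStack` are hypothetical:

```python
import subprocess


def cancel_update_command(stack_name, region=None):
    """Build the AWS CLI call that cancels an in-progress stack update."""
    cmd = ["aws", "cloudformation", "cancel-update-stack",
           "--stack-name", stack_name]
    if region:
        cmd += ["--region", region]
    return cmd


def cancel_update(stack_name, region=None):
    """Run the cancel call; raises CalledProcessError if the CLI fails."""
    subprocess.run(cancel_update_command(stack_name, region), check=True)


# Example (hypothetical stack name):
# cancel_update("MyStack")
```

The cancel request only applies while the stack is in the `UPDATE_IN_PROGRESS` state, so it does not help with a stack that is still being created for the first time.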

Another issue I found related to this: after canceling the deploy (and deleting the stack in CloudFormation), the log groups (created by the stack with specified static names) were not removed, which makes the next cdk deploy fail.

I had to manually remove those log groups.

mouyigang on 9 Jan 2019

👍1
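
The manual cleanup mentioned above can be scripted with `aws logs delete-log-group`. A rough sketch, assuming the AWS CLI is installed and configured; the helper names and the log group name in the example are hypothetical, and deleting a log group permanently removes the logs it contains:

```python
import subprocess


def delete_log_group_commands(log_group_names):
    """Build one AWS CLI delete call per leftover log group."""
    return [
        ["aws", "logs", "delete-log-group", "--log-group-name", name]
        for name in log_group_names
    ]


def delete_log_groups(log_group_names):
    """Run the delete calls in order; stops at the first CLI failure."""
    for cmd in delete_log_group_commands(log_group_names):
        subprocess.run(cmd, check=True)


# Example (hypothetical log group name):
# delete_log_groups(["/ecs/my-service"])
```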

That is actually on purpose: logs are always retained because they might be important.

There is a parameter to control that, but I suppose we could improve the default depending on whether a static name was specified.

rix0rrr on 9 Jan 2019

> Another issue I found related to this: after canceling the deploy (and deleting the stack in CloudFormation), the log groups (created by the stack with specified static names) were not removed, which makes the next cdk deploy fail.
>
> I had to manually remove those log groups.

Yes, I ran into this as well. IMO, the defaults should be such that destroy -> deploy should be idempotent.

digitalsanctum on 9 Jan 2019

That is only true if you haven't used your stack in the meantime, I'd think.

If you've accepted money from someone, you're now required by law to keep logs on that. Also, for security reasons, we don't want to make it too easy to destroy those logs. Or customer data, or whatever state you've accumulated in the meantime.

In any case, if it's bothering you, you can always pass retainLogGroup: false

rix0rrr on 9 Jan 2019

I don't think there's any way to control the per-resource timeout in CloudFormation. We can control the stack-wide timeout instead though. Should be a toolkit feature.

rix0rrr on 4 Jun 2019

@rix0rrr Note that stack-wide timeouts seem to be limited to creation rather than update :(

mipearson on 25 Aug 2019
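
For reference, the creation-only stack-wide timeout is the `TimeoutInMinutes` parameter of CloudFormation's `CreateStack`, exposed by the AWS CLI as `--timeout-in-minutes`; `UpdateStack` has no equivalent parameter. A minimal sketch of building such a call, assuming the AWS CLI is installed and a synthesized template file is available; the helper name, stack name and template path are hypothetical:

```python
def create_stack_command(stack_name, template_path, timeout_minutes):
    """Build an AWS CLI create-stack call with a stack-wide creation timeout."""
    if timeout_minutes < 1:
        raise ValueError("timeout_minutes must be at least 1")
    return [
        "aws", "cloudformation", "create-stack",
        "--stack-name", stack_name,
        "--template-body", f"file://{template_path}",
        # Applies to stack creation only, not to later updates.
        "--timeout-in-minutes", str(timeout_minutes),
    ]


# Example (hypothetical stack name and template path):
# create_stack_command("MyStack", "cdk.out/MyStack.template.json", 30)
```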

👍1

Looks like there is already a creation timeout for nested stacks. Is there any way to specify a timeout for updating a nested stack?

zxkane on 18 Mar 2020

> We can control the stack-wide timeout instead though. Should be a toolkit feature.

That would be an awesome feature ☝️ ❤️, and hopefully it would help avoid having to set up a complex manual check for an ECS deployment crash loop like this one: https://aws.amazon.com/blogs/compute/automating-rollback-of-failed-amazon-ecs-deployments/

On the other hand, I can confirm that currently an ECS EC2 service deployment takes 3 hours to decide that it failed when the error comes from the application layer within the Docker container.

 3/6 | 6:33:37 AM | UPDATE_IN_PROGRESS   | AWS::ECS::Service        | xxx
 ...
 5/6 | 9:35:07 AM | UPDATE_FAILED        | AWS::ECS::Service        | xxx  Service arn:aws:ecs:xxx:xxx:service/xxx did not stabilize.

ivawzh on 4 May 2020
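
As a quick check on the "3 hours" figure, the gap between the two event timestamps in the log excerpt above can be computed directly. This sketch assumes both events happened on the same day; the helper name is hypothetical:

```python
from datetime import datetime


def elapsed(start, end, fmt="%I:%M:%S %p"):
    """Time between two same-day timestamps in the log's clock format."""
    return datetime.strptime(end, fmt) - datetime.strptime(start, fmt)


# UPDATE_IN_PROGRESS at 6:33:37 AM, UPDATE_FAILED at 9:35:07 AM
wait = elapsed("6:33:37 AM", "9:35:07 AM")
print(wait)  # 3:01:30
```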

The current workaround is updating the service's desired count to 0.

acro5piano on 26 Aug 2020
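
The desired-count workaround can be applied from a script with `aws ecs update-service --desired-count 0`. A minimal sketch, assuming the AWS CLI is installed and configured; the helper names and the cluster and service names are hypothetical:

```python
import subprocess


def scale_to_zero_command(cluster, service):
    """Build the AWS CLI call that sets an ECS service's desired count to 0."""
    return [
        "aws", "ecs", "update-service",
        "--cluster", cluster,
        "--service", service,
        "--desired-count", "0",
    ]


def scale_to_zero(cluster, service):
    """Run the call; raises CalledProcessError if the CLI fails."""
    subprocess.run(scale_to_zero_command(cluster, service), check=True)


# Example (hypothetical names):
# scale_to_zero("my-cluster", "my-service")
```

With no tasks left to start, the service presumably stops cycling through failing tasks, so the stuck deployment can finish or fail sooner. The desired count has to be restored once the underlying problem is fixed.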

What if the CDK CLI trapped a WINCH process signal to trigger the sending of a cloudformation:CancelUpdateStack? If we did this, then the user could run CDK like this: timeout -sWINCH 15m cdk deploy ...
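
Until something like that exists in the CLI, a similar effect can be approximated from outside it. The sketch below is not the signal-trapping proposal itself but an external wrapper: it runs the deploy command with a time limit and, if the limit is hit, issues a cancel command. It assumes `cdk` and the AWS CLI are on the PATH; the function name and the stack name in the example are hypothetical:

```python
import subprocess


def deploy_with_timeout(deploy_cmd, cancel_cmd, timeout_seconds):
    """Run deploy_cmd; if it outlives timeout_seconds, stop it and run cancel_cmd.

    Returns "completed", "failed", or "cancelled".
    """
    try:
        # subprocess.run kills the child process once the timeout expires.
        result = subprocess.run(deploy_cmd, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # Killing the local CLI does not stop CloudFormation, so ask it
        # to cancel the update explicitly.
        subprocess.run(cancel_cmd, check=True)
        return "cancelled"
    return "completed" if result.returncode == 0 else "failed"


# Example (hypothetical stack name):
# deploy_with_timeout(
#     ["cdk", "deploy", "MyStack"],
#     ["aws", "cloudformation", "cancel-update-stack", "--stack-name", "MyStack"],
#     timeout_seconds=15 * 60,
# )
```

Because `cancel-update-stack` only works on a stack in `UPDATE_IN_PROGRESS`, this wrapper covers updates to an existing stack, not a first-time creation.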