aws-cdk: Throttling: Rate exceeded

Created on 3 Jan 2020  ·  19 comments  ·  Source: aws/aws-cdk

When deploying multiple CDK stacks simultaneously, a throttling error occurs when the CDK checks the status of the stacks.
The CloudFormation deployment itself runs just fine, but the CDK returns an error because the rate limit was exceeded.

We're using TypeScript.

Issue #1647 says that this error was resolved, but looking at the fix (#2053), it only increased the default number of retries, which just makes the error less likely to happen.
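To make the "less likely, but still possible" point concrete: if we assume (a simplification, not something measured in this issue) that each API call is throttled independently with probability p, a call retried up to n times only fails when all n + 1 attempts are throttled, i.e. with probability p^(n+1). A minimal sketch:

```typescript
// Probability that a call still fails after `retries` retries, ASSUMING every
// attempt is throttled independently with probability `p`. Real throttling is
// bursty, so attempts are correlated and this is an optimistic estimate.
function failureProbability(p: number, retries: number): number {
  if (p < 0 || p > 1) throw new RangeError("p must be in [0, 1]");
  if (!Number.isInteger(retries) || retries < 0) {
    throw new RangeError("retries must be a non-negative integer");
  }
  // All (retries + 1) attempts must be throttled for the call to fail.
  return Math.pow(p, retries + 1);
}

console.log(failureProbability(0.5, 3)); // 0.0625
```

More retries shrink the number, but for any p > 0 it never reaches zero, and a deployment makes many status calls, so the failures reported in this thread remain possible.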

Is there at least a way to override the base retryOptions in a CDK project? If there is, I can override it on my side so the error does not occur.
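For reference, the usual technique for making throttling non-fatal is capped exponential backoff with jitter around each API call. The sketch below is a generic wrapper, not a CDK or AWS SDK API: `withRetry`, `RetryOptions`, and its fields are hypothetical names, and the caller decides which errors count as retryable.

```typescript
// Hypothetical options; names are illustrative, not a CDK or SDK API.
interface RetryOptions {
  maxRetries: number; // retries allowed after the first attempt
  baseDelayMs: number; // delay ceiling for the first retry
  maxDelayMs: number; // cap on any single delay
  shouldRetry: (err: unknown) => boolean; // e.g. "is this a throttling error?"
  random?: () => number; // injectable for tests; defaults to Math.random
  sleep?: (ms: number) => Promise<void>; // injectable for tests; defaults to setTimeout
}

// Capped exponential backoff with "full jitter":
// a random delay in [0, min(maxDelayMs, baseDelayMs * 2^attempt)).
function backoffDelayMs(attempt: number, opts: RetryOptions): number {
  const rand = opts.random ?? Math.random;
  const ceiling = Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** attempt);
  return Math.floor(rand() * ceiling);
}

async function withRetry<T>(call: () => Promise<T>, opts: RetryOptions): Promise<T> {
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      // Give up on non-retryable errors or once the retry budget is spent.
      if (!opts.shouldRetry(err) || attempt >= opts.maxRetries) throw err;
      await sleep(backoffDelayMs(attempt, opts));
    }
  }
}
```

The jitter is the important design choice here: many parallel deployments that all back off by the same deterministic amount would simply collide again on the next attempt.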

Even if there is, I think this should be solved in the base project.
I don't think the CDK should ever fail because of rate limiting while checking the stack status in CloudFormation, as it does not affect the end result (the deployment of the stack).

Use Case

One of our applications has one CDK stack per customer (27 in total). When there's an important fix that needs to be sent to every customer, we run the cdk deploy command for each stack simultaneously via a Jenkins pipeline.

Error Log

00:03:13   ❌  MyStackName failed: Throttling: Rate exceeded
00:03:13  Rate exceeded

Labels: bug, effort/medium, in-progress, p1, package/tools

All 19 comments

Hi @danielfariati, thanks for reporting this. We will update this issue when there is movement.

We hit this issue regularly and it is getting really annoying 🤨

In the last build, 2 of 10 stacks failed with the "Throttling: Rate exceeded" error.
A retrigger of the CI/CD pipeline will most likely succeed!

This is becoming a bigger and bigger issue for my team as well; we are now forced to stagger deployments that could otherwise run in parallel. It would be a big quality-of-life improvement to have this fixed.
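Staggering does not have to mean fully serial: a common middle ground is to cap how many deployments run at once. The sketch below is a generic bounded-concurrency runner; `runWithConcurrency` is a hypothetical helper, and each task would wrap whatever starts one deployment in the pipeline (for example one cdk deploy per stack).

```typescript
// Runs async tasks with at most `limit` in flight at any time and returns
// the results in the original task order.
async function runWithConcurrency<T>(
  tasks: Array<() => Promise<T>>,
  limit: number,
): Promise<T[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results: T[] = new Array(tasks.length);
  let next = 0;
  // Each worker repeatedly claims the next unstarted task until none remain.
  // Claiming is safe without locks because JavaScript runs this code on one thread.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }
  const workers = Array.from({ length: Math.min(limit, tasks.length) }, () => worker());
  await Promise.all(workers);
  return results;
}
```

The limit trades total wall-clock time against how many status pollers hit the API at the same moment.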

@Silverwolf90 that does not sound ideal at all and we should be providing a better experience natively.

Bumping this up to p1, as it's affecting a lot of our users.

Just to add another voice to this: it is affecting my team as well.

In particular, we have several CDK apps that create over 100 stacks each.
If more than one of these apps is deploying at the same time, they fail with the rate exceeded message and exit, failing our CI build with no apparent retries.

Found this error because we are also experiencing it. By the way, there is no mention of it in CloudFormation or CloudWatch Logs. This looks like an API that is not integrated with the rest of AWS.

[2020-05-11T23:50:36.610Z]  ❌  dev-XXXX: Throttling: Rate exceeded
[2020-05-11T23:50:36.610Z] Rate exceeded

Picking this task up.

Hey @shivlaks, one quick question: is your task going to be to expose the retry delay parameter, or to allow async deploys (i.e., call execute-change-set and then return immediately)?

@richardhboyd

I'm still exploring the options, but some of the things we are considering include:

  • make retries more configurable
  • allow opting out of polling altogether
  • handle rate throttling more gracefully (after a better backoff and exhausting retries, we might need to bail and just provide the stack ARN and perhaps a CloudFormation console link)
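The three options above could fit together roughly as follows. Everything here is hypothetical (`MonitorOptions`, `monitorStack`, the `getStatus` callback, and the result shape are illustrative, not CDK code): polling can be switched off, throttled polls back off instead of failing, and once the throttling budget is spent the monitor gives up and hands back the stack ARN instead of failing the deployment.

```typescript
type MonitorResult =
  | { kind: "finished"; status: string }
  | { kind: "not-monitored"; stackArn: string } // polling was disabled
  | { kind: "gave-up"; stackArn: string }; // throttled too often; check the console

interface MonitorOptions {
  pollingEnabled: boolean;
  pollIntervalMs: number;
  maxThrottledPolls: number; // consecutive throttled polls tolerated before giving up
  sleep: (ms: number) => Promise<void>;
}

// `getStatus` returns the current stack status or throws; `isThrottle` decides
// whether a thrown error counts as throttling (other errors are rethrown).
async function monitorStack(
  stackArn: string,
  getStatus: () => Promise<string>,
  isThrottle: (err: unknown) => boolean,
  opts: MonitorOptions,
): Promise<MonitorResult> {
  if (!opts.pollingEnabled) return { kind: "not-monitored", stackArn };
  let throttled = 0;
  while (true) {
    let status: string | undefined;
    try {
      status = await getStatus();
      throttled = 0;
    } catch (err) {
      if (!isThrottle(err)) throw err;
      throttled++;
      if (throttled > opts.maxThrottledPolls) return { kind: "gave-up", stackArn };
    }
    // Assumption: statuses ending in _COMPLETE or _FAILED are terminal.
    if (status !== undefined && /(_COMPLETE|_FAILED)$/.test(status)) {
      return { kind: "finished", status };
    }
    // Double the wait for each consecutive throttled poll.
    await opts.sleep(opts.pollIntervalMs * 2 ** throttled);
  }
}
```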

The downside of bailing on stack monitoring is that subsequent deploys could not be initiated by the CDK. For example, if stack B depends on stack A, we can't start B's deployment until A has completed, and we would not know that A had completed if we stopped monitoring.

This would affect wildcard deployments and any scenario where we can't reason about the status of the stack without polling.
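The dependency constraint described above is a topological-ordering problem. A minimal sketch, assuming the dependency graph is available as a map from each stack to the stacks it depends on (a hypothetical input shape): a layered version of Kahn's algorithm groups stacks into waves, where each wave may only start once monitoring has confirmed the previous waves finished.

```typescript
// deps: stack name -> names of the stacks it depends on (hypothetical input shape).
// Returns deployment "waves": each stack depends only on stacks in earlier waves,
// so the stacks inside one wave could be deployed in parallel.
function deploymentWaves(deps: Record<string, string[]>): string[][] {
  // For every stack, the set of dependencies that have not been scheduled yet.
  const pending = new Map<string, Set<string>>();
  Object.keys(deps).forEach((stack) => {
    pending.set(stack, new Set(deps[stack]));
    // Dependencies without an entry of their own are treated as having none.
    deps[stack].forEach((d) => {
      if (!(d in deps)) pending.set(d, new Set<string>());
    });
  });

  const waves: string[][] = [];
  while (pending.size > 0) {
    // Take every stack that has nothing left to wait for.
    const ready = Array.from(pending.keys())
      .filter((s) => pending.get(s)!.size === 0)
      .sort();
    if (ready.length === 0) throw new Error("dependency cycle detected");
    ready.forEach((s) => pending.delete(s));
    pending.forEach((remaining) => ready.forEach((s) => remaining.delete(s)));
    waves.push(ready);
  }
  return waves;
}
```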

Handling rate limiting more gracefully is a precursor to attempting parallel deployments.
Let me know if you have any additional thoughts, and I'll work them in as I prototype a proof of concept and test out the trade-offs.

We know the directed acyclic graph (DAG) of stack dependencies, so we could support bailing on terminal nodes in that graph, because we don't care about their status (in the sense of it blocking future actions), though we wouldn't be able to display stack outputs for bailed deployments.
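Finding the terminal nodes mentioned above is straightforward with the same kind of dependency map (again a hypothetical input shape): a stack is terminal when no other stack lists it as a dependency, so nothing downstream waits on its status.

```typescript
// deps: stack name -> names of the stacks it depends on (hypothetical input shape).
// Returns the stacks that no other stack depends on: the last nodes in deployment
// order, whose monitoring could be dropped without blocking anything else.
function terminalStacks(deps: Record<string, string[]>): string[] {
  const dependedOn = new Set<string>();
  Object.keys(deps).forEach((stack) => deps[stack].forEach((d) => dependedOn.add(d)));
  return Object.keys(deps)
    .filter((stack) => !dependedOn.has(stack))
    .sort();
}
```

As the comment notes, the trade-off is that no stack outputs could be shown for the stacks this returns if monitoring is dropped for them.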

@richardhboyd, good point. It's another option to add to the list of things to consider.

I wonder if it would be a useful feature to allow retrieving stack outputs as a command, i.e., poll all the specified stacks and write their outputs to a specified location.

What about avoiding polling altogether so that a large number of stacks can be deployed in parallel? For example, a CDK service endpoint that CDK clients would subscribe to: once a stack has finished deploying, the client gets an (event-driven) notification and continues to the next stack.
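The subscription pattern behind this idea can be illustrated in-process. This is only a sketch: `StackCompletionNotifier` is hypothetical, and a real version would need a remote transport between the deployment and the CDK client rather than an in-memory map.

```typescript
type Listener = (status: string) => void;

// In-memory stand-in for a push channel: clients register interest in a stack
// and are called back once, when that stack reaches a final status.
class StackCompletionNotifier {
  private listeners = new Map<string, Listener[]>();

  onComplete(stackName: string, listener: Listener): void {
    const list = this.listeners.get(stackName) ?? [];
    list.push(listener);
    this.listeners.set(stackName, list);
  }

  // Called by whatever observes the deployment; notifies, then forgets, the listeners.
  complete(stackName: string, status: string): void {
    const list = this.listeners.get(stackName) ?? [];
    this.listeners.delete(stackName);
    list.forEach((listener) => listener(status));
  }

  // Promise-based convenience so a client can await a stack instead of polling it.
  waitFor(stackName: string): Promise<string> {
    return new Promise((resolve) => this.onComplete(stackName, resolve));
  }
}
```

A client would `await notifier.waitFor("StackA")` before starting the stacks that depend on it, with no polling loop in between.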

We're still seeing this; any news? Since it is a retryable error, why doesn't the API simply ... retry? Here is the common stack trace from CDK 1.44 in case it helps:
    Error occurred while monitoring stack: Throttling: Rate exceeded
        at Request.extractError (/usr/lib/node_modules/aws-cdk/node_modules/aws-sdk/lib/protocol/query.js:50:29)
        at Request.callListeners (/usr/lib/node_modules/aws-cdk/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
        at Request.emit (/usr/lib/node_modules/aws-cdk/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
        at Request.emit (/usr/lib/node_modules/aws-cdk/node_modules/aws-sdk/lib/request.js:683:14)
        at Request.transition (/usr/lib/node_modules/aws-cdk/node_modules/aws-sdk/lib/request.js:22:10)
        at AcceptorStateMachine.runTo (/usr/lib/node_modules/aws-cdk/node_modules/aws-sdk/lib/state_machine.js:14:12)
        at /usr/lib/node_modules/aws-cdk/node_modules/aws-sdk/lib/state_machine.js:26:10
        at Request.<anonymous> (/usr/lib/node_modules/aws-cdk/node_modules/aws-sdk/lib/request.js:38:9)
        at Request.<anonymous> (/usr/lib/node_modules/aws-cdk/node_modules/aws-sdk/lib/request.js:685:12)
        at Request.callListeners (/usr/lib/node_modules/aws-cdk/node_modules/aws-sdk/lib/sequential_executor.js:116:18) {
      message: 'Rate exceeded',
      code: 'Throttling',
      time: 2020-06-11T06:52:29.217Z,
      requestId: 'a74453e2-3df4-4a14-b09a-80c40e9ab1e5',
      statusCode: 400,
      retryable: true
    }
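The error object at the end of that trace already signals how it could be handled: `code: 'Throttling'`, `statusCode: 400`, `retryable: true`. A small classifier based only on the fields visible in that log might look like this; the `SdkErrorLike` type is an assumption about the error's shape, not the AWS SDK's declared type.

```typescript
// Shape inferred from the logged error above; not the AWS SDK's own type definition.
interface SdkErrorLike {
  code?: string;
  statusCode?: number;
  retryable?: boolean;
}

// Treat an error as retryable if it was flagged retryable, or if its code is Throttling.
function shouldRetry(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as SdkErrorLike;
  return e.retryable === true || e.code === "Throttling";
}
```

A status-monitoring loop could route errors for which this returns true into a backoff-and-retry path instead of failing the whole deployment.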

I don't know if this is related, but I've started seeing similar throttling errors in a single stack when trying to create an IAM role within a stack:

```
10/25 | 12:01:36 | CREATE_FAILED | AWS::IAM::Role | SingletonLambda3f2d0f3dc42f4a18ab66a6ebeb8fa36f/ServiceRole (SingletonLambda3f2d0f3dc42f4a18ab66a6ebeb8fa36fServiceRoleDAA100A1) Rate exceeded (Service: AmazonIdentityManagement; Status Code: 400; Error Code: Throttling; Request ID: f4dd183c-5fd3-4a26-a6ce-4e1f34924fa7)
new Role (C:\Users\JOHN~1.TIP\AppData\Local\Temp\jsii-kernel-P2u1xG\node_modules\@aws-cdk\aws-iam\lib\role.js:41:22)
_ new Function (C:\Users\JOHN~1.TIP\AppData\Local\Temp\jsii-kernel-P2u1xG\node_modules\@aws-cdk\aws-lambda\lib\function.js:61:35)
_ SingletonFunction.ensureLambda (C:\Users\JOHN~1.TIP\AppData\Local\Temp\jsii-kernel-P2u1xG\node_modules\@aws-cdk\aws-lambda\lib\singleton-lambda.js:58:16)
_ new SingletonFunction (C:\Users\JOHN~1.TIP\AppData\Local\Temp\jsii-kernel-P2u1xG\node_modules\@aws-cdk\aws-lambda\lib\singleton-lambda.js:19:36)
_ C:\Users\JOHN~1.TIP\AppData\Local\Temp\jsii-java-runtime14499259584469591882\jsii-runtime.js:7877:49
_ Kernel._wrapSandboxCode (C:\Users\JOHN~1.TIP\AppData\Local\Temp\jsii-java-runtime14499259584469591882\jsii-runtime.js:8350:20)
_ Kernel._create (C:\Users\JOHN~1.TIP\AppData\Local\Temp\jsii-java-runtime14499259584469591882\jsii-runtime.js:7877:26)
_ Kernel.create (C:\Users\JOHN~1.TIP\AppData\Local\Temp\jsii-java-runtime14499259584469591882\jsii-runtime.js:7621:21)
_ KernelHost.processRequest (C:\Users\JOHN~1.TIP\AppData\Local\Temp\jsii-java-runtime14499259584469591882\jsii-runtime.js:7411:28)
_ KernelHost.run (C:\Users\JOHN~1.TIP\AppData\Local\Temp\jsii-java-runtime14499259584469591882\jsii-runtime.js:7349:14)
_ Immediate._onImmediate (C:\Users\JOHN~1.TIP\AppData\Local\Temp\jsii-java-runtime14499259584469591882\jsii-runtime.js:7352:37)
_ processImmediate (internal/timers.js:456:21)

```

We experienced the same:

  73/101 | 9:33:55 AM | CREATE_FAILED        | AWS::IAM::Role                              | distributor-api-v1/ServiceRole (distributorapiv1ServiceRole61262089) Rate exceeded (Service: AmazonIdentityManagement; Status Code: 400; Error Code: Throttling; Request ID: e673fb40-db19-4103-b8e6-8ab9dbfa9c64)
    new Role (/builds/c2w/api/backend/node_modules/@aws-cdk/aws-iam/lib/role.ts:319:18)
        \_ new Function (/builds/c2w
       ...

I have been seeing something similar today. I think there may be an AWS issue, as I am not creating many roles and haven't seen this issue on the same stack and account before.

There was an IAM issue overnight, but it appears to be resolved or in the process of resolving now.

We love the CDK, but throttling from CloudFormation when using the CDK is still an issue for our team. We often have CDK builds/deploys running for different stages/stacks in our AWS account at once, and we run into this quite often.

We're currently on CDK version 1.54.0


We also have the same issue in our team; #8711 did not fix it for us. I would like to see a simple option to control the poll interval of the CDK CLI in order to avoid exceeding the rate limit.
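A rough calculation shows why the poll interval matters. Assuming, purely for illustration, that each deployment makes one status call per poll and that the account allows at most R such calls per second, N concurrent deployments polling every T seconds issue N / T calls per second, so staying under the limit requires T ≥ N / R. The limit used below is hypothetical, not CloudFormation's documented quota.

```typescript
// Smallest poll interval (ms) that keeps N concurrent pollers under a rate limit
// of `maxCallsPerSecond`, ASSUMING one API call per poll per deployment.
function minPollIntervalMs(concurrentDeployments: number, maxCallsPerSecond: number): number {
  if (concurrentDeployments < 0 || maxCallsPerSecond <= 0) {
    throw new RangeError("invalid arguments");
  }
  // T >= N / R seconds, converted to milliseconds and rounded up.
  return Math.ceil((concurrentDeployments * 1000) / maxCallsPerSecond);
}

// 27 stacks (the use case at the top of this issue) with a hypothetical limit of 5 calls/s:
console.log(minPollIntervalMs(27, 5)); // 5400
```

The shape of the relationship is the point: doubling the number of parallel deployments doubles the interval needed, which is why a fixed poll interval eventually trips the limit.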
