Aws-cdk: Throttling: Rate exceeded

Created on 3 Jan 2020 · 19 Comments · Source: aws/aws-cdk

When deploying multiple CDK stacks simultaneously, a throttling error occurs when CDK checks the status of the stacks.
The CloudFormation deployment itself runs just fine, but CDK returns an error because the API rate limit was exceeded.

We're using TypeScript.

Issue #1647 says that this error was resolved, but looking at the fix (#2053), it only increased the default number of retries, which just makes the error less likely to happen.

Is there at least a way to override the base retryOptions in a CDK project? If there is, I can just override it on my side so the error does not occur.
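For reference, the AWS SDK for JavaScript v2 (the `aws-sdk` package that appears in the CLI stack traces in this thread) exposes retry behaviour through the client options `maxRetries` and `retryDelayOptions`, which accepts `base` and `customBackoff`. Whether the CDK CLI lets a project pass its own values through is exactly the open question, so the wiring comment at the end is hypothetical. The sketch below is a capped exponential backoff with "full jitter", written in the `(retryCount, err) => delayMs` shape that `customBackoff` takes; `BASE_MS` and `CAP_MS` are illustrative assumptions, not CDK or SDK defaults.

```typescript
// Sketch: capped exponential backoff with full jitter.
// BASE_MS and CAP_MS are assumptions for illustration, not CDK/SDK defaults.
const BASE_MS = 300;
const CAP_MS = 20000;

/**
 * Delay in ms before retry number `retryCount` (0-based).
 * `random` is injectable so the schedule can be tested deterministically.
 */
function fullJitterBackoff(
  retryCount: number,
  _err?: Error,
  random: () => number = Math.random,
): number {
  // Exponential ceiling base * 2^retryCount, never above CAP_MS.
  const ceiling = Math.min(CAP_MS, BASE_MS * Math.pow(2, retryCount));
  // Pick uniformly in [0, ceiling) so parallel deployments that were
  // throttled together do not all retry at the same instant.
  return Math.floor(random() * ceiling);
}

// Hypothetical wiring, only if you construct the SDK client yourself:
// new AWS.CloudFormation({ maxRetries: 10, retryDelayOptions: { customBackoff: fullJitterBackoff } });
```

Full jitter trades a slightly longer average wait for far fewer synchronized retries, which is the failure mode when many deployments share one account's rate limit.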

Even if there is, I think that this should be solved in the base project.
I don't think CDK should ever fail because of rate limiting while checking the stack status in CloudFormation, as it does not affect the end result (the deployment of the stack).

Use Case

One of our applications has one CDK stack per customer (27 in total). When there's an important fix that needs to be sent to every customer, we run the cdk deploy command for each stack simultaneously via a Jenkins pipeline.

Error Log

00:03:13   ❌  MyStackName failed: Throttling: Rate exceeded
00:03:13  Rate exceeded

bug effort/medium in-progress p1 package/tools

Source

danielfariati

👍25

All 19 comments

Hi @danielfariati, thanks for reporting this. We will update this issue when there is movement.

SomayaB on 15 Jan 2020

👍3

We hit this issue regularly and it is getting really annoying 🤨

In the last build, 2 of 10 stacks failed with the "Throttling: Rate exceeded" error
...a retrigger of the CI/CD pipeline will most likely succeed!

gwriss on 13 Feb 2020

👍11

This is becoming a bigger and bigger issue for my team as well; we are now forced to stagger deployments that could otherwise run in parallel. Having this fixed would be a big quality-of-life improvement.
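A middle ground between fully parallel and fully staggered deployments is to cap how many run at once. The sketch below is a generic bounded-concurrency runner; `deployStack` and `customerStacks` in the usage comment are hypothetical stand-ins for whatever launches one `cdk deploy` in the pipeline, and the limit of 4 is an arbitrary assumption to be tuned against the throttling you observe.

```typescript
/**
 * Runs `task` for every item with at most `limit` tasks in flight.
 * Results keep the input order. The returned promise rejects on the first
 * failure; other workers are not cancelled and keep draining the queue.
 */
async function runWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  if (limit < 1) throw new Error("limit must be >= 1");
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next item to start

  // Each worker pulls the next unstarted item until none are left.
  // JavaScript runs this on one thread, so `next++` needs no locking.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i]);
    }
  }

  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker());
  await Promise.all(workers);
  return results;
}

// Hypothetical usage: deploy the customer stacks 4 at a time.
// await runWithConcurrency(customerStacks, 4, (name) => deployStack(name));
```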

Silverwolf90 on 15 Apr 2020

@Silverwolf90 that does not sound ideal at all and we should be providing a better experience natively.

Bumping this up to p1, as it's affecting a lot of our users.

shivlaks on 16 Apr 2020

👍5

Just to add another voice: this is affecting my team as well.

In particular, we have several CDK apps that each create over 100 stacks.
If more than one of these apps is deploying at the same time, they fail with the rate exceeded message and simply exit, failing our CI build with no apparent retries.

phcyso on 17 Apr 2020

👍6

Found this issue because we are experiencing it too. BTW, there is no mention of it in CloudFormation or CloudWatch Logs; this looks like an API that is not integrated with the rest of AWS.

[2020-05-11T23:50:36.610Z]  ❌  dev-XXXX: Throttling: Rate exceeded
[2020-05-11T23:50:36.610Z] Rate exceeded

michft-v on 12 May 2020

Picking this task up.

shivlaks on 21 May 2020

🎉4

Hey @shivlaks, quick question: is your task going to be exposing the retry delay parameter, or allowing async deploys (i.e. call execute-change-set and then return immediately)?

richardhboyd on 21 May 2020

@richardhboyd

I'm still exploring the options, but some of the things we are considering include:

- make retries more configurable
- allow opting out of polling altogether
- handle rate throttling more gracefully (after exhausting retries with a better backoff, we might need to bail and just provide a link to the stack ARN and perhaps a CloudFormation console link)
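The third option could look roughly like the sketch below: keep polling, treat throttling as a reason to slow down rather than to fail, and after too many consecutive throttles stop monitoring and return a console link instead of an error. `getStatus`, the thresholds, the linear slow-down, and the result shape are all hypothetical, and the console URL format is an assumption; the terminal-status check is also simplified compared with CloudFormation's full status list.

```typescript
type StackStatus = string; // e.g. "UPDATE_IN_PROGRESS", "UPDATE_COMPLETE"

type MonitorResult =
  | { kind: "finished"; status: StackStatus }
  | { kind: "abandoned"; consoleUrl: string };

interface MonitorOptions {
  pollIntervalMs: number; // normal delay between polls
  maxConsecutiveThrottles: number; // stop monitoring after this many in a row
  sleep?: (ms: number) => Promise<void>;
}

// Matches the error shape seen in the CLI traces: { code: 'Throttling' }.
function isThrottle(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { code?: unknown }).code === "Throttling";
}

// Simplified: anything ending in _COMPLETE or _FAILED counts as settled.
const isTerminal = (s: StackStatus) => s.endsWith("_COMPLETE") || s.endsWith("_FAILED");

async function monitorStack(
  stackArn: string,
  region: string,
  getStatus: () => Promise<StackStatus>, // hypothetical wrapper around a describe call
  opts: MonitorOptions,
): Promise<MonitorResult> {
  const sleep = opts.sleep ?? ((ms: number) => new Promise<void>((r) => setTimeout(r, ms)));
  let throttles = 0;
  for (;;) {
    try {
      const status = await getStatus();
      throttles = 0; // a successful call resets the throttle streak
      if (isTerminal(status)) return { kind: "finished", status };
    } catch (err) {
      if (!isThrottle(err)) throw err; // real errors still fail the deploy
      throttles++;
      if (throttles >= opts.maxConsecutiveThrottles) {
        // Stop polling; the CloudFormation deployment itself keeps running.
        const consoleUrl =
          `https://console.aws.amazon.com/cloudformation/home?region=${region}` +
          `#/stacks/stackinfo?stackId=${encodeURIComponent(stackArn)}`;
        return { kind: "abandoned", consoleUrl };
      }
    }
    // Slow down linearly with the throttle streak (an arbitrary choice here).
    await sleep(opts.pollIntervalMs * (1 + throttles));
  }
}
```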

The downside of bailing on stack monitoring is that subsequent deploys will not be initiated by the CDK. For example, if stack B has a dependency that requires stack A to be deployed, we can't start B's deployment until A has completed, and that would not be possible if we stopped monitoring.

This would affect wildcard deployments and any scenario where we can't reason about the status of the stack without polling.
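To illustrate why the CLI has to know when a stack has finished: with dependencies, stacks can only be started in waves, where each wave waits for the previous one to complete. A minimal sketch using Kahn's topological-sort algorithm; the function name and the `deps` map shape (stack name to the stacks it depends on) are hypothetical, not the CDK's internal representation.

```typescript
/**
 * Groups stacks into waves: every stack in wave k depends only on stacks in
 * earlier waves, so a wave may deploy in parallel once the previous wave has
 * *finished*. `deps` maps each stack to the stacks it depends on.
 */
function deploymentWaves(deps: Record<string, string[]>): string[][] {
  const remaining = new Map<string, number>(); // stack -> unmet dependency count
  const dependents = new Map<string, string[]>(); // stack -> stacks waiting on it
  for (const [stack, ds] of Object.entries(deps)) {
    remaining.set(stack, ds.length);
    for (const d of ds) {
      if (!(d in deps)) throw new Error(`unknown dependency ${d} of ${stack}`);
      const list = dependents.get(d);
      if (list) list.push(stack);
      else dependents.set(d, [stack]);
    }
  }

  const waves: string[][] = [];
  let current = Array.from(remaining).filter(([, n]) => n === 0).map(([s]) => s);
  let placed = 0;
  while (current.length > 0) {
    waves.push(current);
    placed += current.length;
    const next: string[] = [];
    // Completing a stack unblocks its dependents once all their deps are done.
    for (const done of current) {
      for (const dep of dependents.get(done) ?? []) {
        const n = remaining.get(dep)! - 1;
        remaining.set(dep, n);
        if (n === 0) next.push(dep);
      }
    }
    current = next;
  }
  if (placed !== remaining.size) throw new Error("dependency cycle detected");
  return waves;
}
```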

Handling rate limiting more gracefully is a precursor to attempting parallel deployments.
Let me know if you have any additional thoughts, and I'll work them in as I prototype a proof of concept and test out the tradeoffs.

shivlaks on 22 May 2020

👍2

We know the directed acyclic graph (DAG) of stack dependencies, so we could support bailing on terminal nodes in that graph, because we don’t care about their status (in the sense of blocking future actions), though we wouldn’t be able to display stack outputs for bailed deployments.
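Finding those terminal nodes is cheap: they are the stacks that no other stack depends on. A minimal sketch, assuming the same hypothetical `deps` shape as above (stack name to the stacks it depends on):

```typescript
/**
 * Returns the sinks of the dependency DAG: stacks no other stack depends on.
 * Dropping monitoring for these would not block any later deployment.
 */
function terminalStacks(deps: Record<string, string[]>): string[] {
  const hasDependents = new Set<string>();
  for (const ds of Object.values(deps)) {
    for (const d of ds) hasDependents.add(d);
  }
  return Object.keys(deps).filter((s) => !hasDependents.has(s));
}
```

In the per-customer setup from the original report, where the stacks are independent, every stack is terminal, so this option would cover that case entirely.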

richardhboyd on 22 May 2020

@richardhboyd Good point. It's another option to add to the list of things to consider.

I wonder if it would be a useful feature to allow retrieving stack outputs as a command, i.e. poll all the specified stacks and write their outputs to a specified location.
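The core of such a command is a small transformation. The sketch below mirrors the `Stacks[].Outputs[]` structure (`StackName`, `OutputKey`, `OutputValue`) returned by CloudFormation's `DescribeStacks`; the local interfaces are a trimmed-down assumption, the function names are hypothetical, and fetching the stacks and writing the file are left out.

```typescript
// Trimmed-down local types mirroring the DescribeStacks response shape.
interface StackOutput { OutputKey?: string; OutputValue?: string }
interface DescribedStack { StackName: string; Outputs?: StackOutput[] }

/** Flattens DescribeStacks-style results into { stackName: { key: value } }. */
function collectOutputs(stacks: DescribedStack[]): Record<string, Record<string, string>> {
  const result: Record<string, Record<string, string>> = {};
  for (const stack of stacks) {
    const outputs: Record<string, string> = {};
    for (const o of stack.Outputs ?? []) {
      // Skip incomplete entries rather than writing "undefined" keys.
      if (o.OutputKey !== undefined && o.OutputValue !== undefined) {
        outputs[o.OutputKey] = o.OutputValue;
      }
    }
    result[stack.StackName] = outputs;
  }
  return result;
}

// The JSON text such a command could write to the requested location.
const toOutputsFile = (stacks: DescribedStack[]) => JSON.stringify(collectOutputs(stacks), null, 2);
```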

shivlaks on 22 May 2020

What about avoiding polling altogether while scaling to a large number of stacks in parallel: have a CDK service endpoint that CDK clients would subscribe to. Once a stack finishes deploying, the client gets an (event-driven) notification and continues to the next stack.
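The client-side half of that proposal can be sketched in-process: `waitFor` returns a promise that settles when a notification for that stack arrives, with no API calls in between. The `StackNotifier` class, its methods, and the status values are hypothetical, and the proposed CDK service endpoint does not exist; CloudFormation can already publish stack events to SNS topics through a stack's notification ARNs, which is one conceivable transport for `notify`.

```typescript
type FinalStatus = "SUCCEEDED" | "FAILED";

/** Hypothetical registry: a delivered notification wakes every waiter for that stack. */
class StackNotifier {
  private waiters = new Map<string, Array<(s: FinalStatus) => void>>();
  private finished = new Map<string, FinalStatus>();

  /** Resolves once `notify` is called for `stack`, or immediately if it already was. */
  waitFor(stack: string): Promise<FinalStatus> {
    const known = this.finished.get(stack);
    if (known !== undefined) return Promise.resolve(known);
    return new Promise<FinalStatus>((resolve) => {
      const list = this.waiters.get(stack) ?? [];
      list.push(resolve);
      this.waiters.set(stack, list);
    });
  }

  /** Called by whatever delivers the event (e.g. a subscription handler). */
  notify(stack: string, status: FinalStatus): void {
    this.finished.set(stack, status);
    for (const resolve of this.waiters.get(stack) ?? []) resolve(status);
    this.waiters.delete(stack);
  }
}
```

The trade-off is that the client no longer spends API calls while waiting, but it now depends on the notification actually being delivered, so a real design would still need a timeout or a fallback poll.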

alexpulver on 22 May 2020

👍2

We're still seeing this; any news? In case it helps, here is the common stack trace from CDK 1.44. Since it is a retryable error, why doesn't the API simply ... retry?
    Error occurred while monitoring stack: Throttling: Rate exceeded
        at Request.extractError (/usr/lib/node_modules/aws-cdk/node_modules/aws-sdk/lib/protocol/query.js:50:29)
        at Request.callListeners (/usr/lib/node_modules/aws-cdk/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
        at Request.emit (/usr/lib/node_modules/aws-cdk/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
        at Request.emit (/usr/lib/node_modules/aws-cdk/node_modules/aws-sdk/lib/request.js:683:14)
        at Request.transition (/usr/lib/node_modules/aws-cdk/node_modules/aws-sdk/lib/request.js:22:10)
        at AcceptorStateMachine.runTo (/usr/lib/node_modules/aws-cdk/node_modules/aws-sdk/lib/state_machine.js:14:12)
        at /usr/lib/node_modules/aws-cdk/node_modules/aws-sdk/lib/state_machine.js:26:10
        at Request.<anonymous> (/usr/lib/node_modules/aws-cdk/node_modules/aws-sdk/lib/request.js:38:9)
        at Request.<anonymous> (/usr/lib/node_modules/aws-cdk/node_modules/aws-sdk/lib/request.js:685:12)
        at Request.callListeners (/usr/lib/node_modules/aws-cdk/node_modules/aws-sdk/lib/sequential_executor.js:116:18) {
      message: 'Rate exceeded',
      code: 'Throttling',
      time: 2020-06-11T06:52:29.217Z,
      requestId: 'a74453e2-3df4-4a14-b09a-80c40e9ab1e5',
      statusCode: 400,
      retryable: true
    }
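The error object at the end of that trace already carries `code: 'Throttling'` and `retryable: true`, so a caller can tell it apart from a genuine failure. A sketch of a wrapper that retries only errors flagged that way; `withRetries`, the attempt budget of 8, and the delay schedule are assumptions, and `describeEvents` in the usage comment is hypothetical.

```typescript
// The two fields visible on the SDK error in the trace.
interface SdkLikeError { code?: string; retryable?: boolean }

function isRetryable(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as SdkLikeError;
  return e.retryable === true || e.code === "Throttling";
}

async function withRetries<T>(
  call: () => Promise<T>,
  maxAttempts = 8,
  delayMs: (attempt: number) => number = (a) => Math.min(30000, 500 * Math.pow(2, a)),
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((r) => setTimeout(r, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      // Rethrow non-retryable errors, or retryable ones once the budget is spent.
      if (!isRetryable(err) || attempt + 1 >= maxAttempts) throw err;
      await sleep(delayMs(attempt));
    }
  }
}

// Hypothetical usage: const events = await withRetries(() => describeEvents(stackName));
```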

pontusvision on 11 Jun 2020

I don't know if this is related, but I've started seeing similar throttling errors in a single stack when it creates an IAM role:

```
10/25 | 12:01:36 | CREATE_FAILED | AWS::IAM::Role | SingletonLambda3f2d0f3dc42f4a18ab66a6ebeb8fa36f/ServiceRole (SingletonLambda3f2d0f3dc42f4a18ab66a6ebeb8fa36fServiceRoleDAA100A1) Rate exceeded (Service: AmazonIdentityManagement; Status Code: 400; Error Code: Throttling; Request ID: f4dd183c-5fd3-4a26-a6ce-4e1f34924fa7)
new Role (C:\Users\JOHN~1.TIP\AppData\Local\Temp\jsii-kernel-P2u1xG\node_modules\@aws-cdk\aws-iam\lib\role.js:41:22)
_ new Function (C:\Users\JOHN~1.TIP\AppData\Local\Temp\jsii-kernel-P2u1xG\node_modules\@aws-cdk\aws-lambda\lib\function.js:61:35)
_ SingletonFunction.ensureLambda (C:\Users\JOHN~1.TIP\AppData\Local\Temp\jsii-kernel-P2u1xG\node_modules\@aws-cdk\aws-lambda\lib\singleton-lambda.js:58:16)
_ new SingletonFunction (C:\Users\JOHN~1.TIP\AppData\Local\Temp\jsii-kernel-P2u1xG\node_modules\@aws-cdk\aws-lambda\lib\singleton-lambda.js:19:36)
_ C:\Users\JOHN~1.TIP\AppData\Local\Temp\jsii-java-runtime14499259584469591882\jsii-runtime.js:7877:49
_ Kernel._wrapSandboxCode (C:\Users\JOHN~1.TIP\AppData\Local\Temp\jsii-java-runtime14499259584469591882\jsii-runtime.js:8350:20)
_ Kernel._create (C:\Users\JOHN~1.TIP\AppData\Local\Temp\jsii-java-runtime14499259584469591882\jsii-runtime.js:7877:26)
_ Kernel.create (C:\Users\JOHN~1.TIP\AppData\Local\Temp\jsii-java-runtime14499259584469591882\jsii-runtime.js:7621:21)
_ KernelHost.processRequest (C:\Users\JOHN~1.TIP\AppData\Local\Temp\jsii-java-runtime14499259584469591882\jsii-runtime.js:7411:28)
_ KernelHost.run (C:\Users\JOHN~1.TIP\AppData\Local\Temp\jsii-java-runtime14499259584469591882\jsii-runtime.js:7349:14)
_ Immediate._onImmediate (C:\Users\JOHN~1.TIP\AppData\Local\Temp\jsii-java-runtime14499259584469591882\jsii-runtime.js:7352:37)
_ processImmediate (internal/timers.js:456:21)

```

john-tipper on 12 Jun 2020

👍3

We experienced the same:

  73/101 | 9:33:55 AM | CREATE_FAILED        | AWS::IAM::Role                              | distributor-api-v1/ServiceRole (distributorapiv1ServiceRole61262089) Rate exceeded (Service: AmazonIdentityManagement; Status Code: 400; Error Code: Throttling; Request ID: e673fb40-db19-4103-b8e6-8ab9dbfa9c64)
    new Role (/builds/c2w/api/backend/node_modules/@aws-cdk/aws-iam/lib/role.ts:319:18)
        \_ new Function (/builds/c2w
       ...

followben on 12 Jun 2020

I have been seeing something similar today. I think there may be an AWS issue, as I am not creating many roles and haven't seen this issue on the same stack + account before.

adamnoakes on 12 Jun 2020

There was an IAM issue overnight, but it appears to be resolved, or in the process of being resolved, now.

richardhboyd on 12 Jun 2020

👍4

We love the CDK, but throttling from CloudFormation when using the CDK is still an issue for our team. We often have CDK builds/deploys running for different stages/stacks in our AWS account at the same time, and we run into this quite often.

We're currently on CDK version 1.54.0

salimhamed on 10 Sep 2020

👍3

Our team has the same issue; #8711 did not fix it for us. I would like to see a simple option to control the poll interval of the cdk CLI command in order to avoid exceeding the rate limit.
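A back-of-the-envelope way to reason about such an option: the combined request rate grows linearly with the number of concurrent deployments and shrinks as the poll interval grows. In the sketch below, `callsPerPoll` and the per-second budget are assumptions for illustration, not documented CloudFormation or CDK values, and both function names are hypothetical.

```typescript
/** Combined API calls per second from `deploys` concurrent polling loops. */
function requestsPerSecond(deploys: number, pollIntervalMs: number, callsPerPoll = 1): number {
  return (deploys * callsPerPoll * 1000) / pollIntervalMs;
}

/** Smallest poll interval (ms) that keeps the combined rate at or under `budgetPerSecond`. */
function minPollIntervalMs(deploys: number, budgetPerSecond: number, callsPerPoll = 1): number {
  return Math.ceil((deploys * callsPerPoll * 1000) / budgetPerSecond);
}

// Example with the 27 customer stacks from the original report and an
// assumed shared budget of 5 calls/s:
// requestsPerSecond(27, 1000) -> 27 calls/s at a 1 s interval
// minPollIntervalMs(27, 5)    -> 5400 ms
```

The same arithmetic shows why retries alone only postpone the problem: with a fixed interval, doubling the number of parallel deployments doubles the load on the shared limit.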