Deployment of one of our CDK projects randomly fails with rate exceeded errors. These errors occur when CDK creates LogRetention resources related to the Lambda functions we have.
The issue occurs when deploying multiple CDK stacks that contain a fairly large number of Lambda functions with log retention resources.
I created a small test project to reproduce the issue: https://github.com/jaapvanblaaderen/log-retention-rate-limit With this simple setup, I wasn't able to reproduce the issue when deploying a few stacks sequentially (which is what we use in our actual project). The issue can however be observed when deploying the stacks in parallel.
128/101 | 9:04:29 AM | CREATE_IN_PROGRESS | Custom::LogRetention | hello_5/LogRetention (hello5LogRetention5D258C6A) Resource creation Initiated
129/101 | 9:04:29 AM | CREATE_FAILED | Custom::LogRetention | hello_5/LogRetention (hello5LogRetention5D258C6A) Failed to create resource. Rate exceeded
new LogRetention (/repos/logretention-rate-limit/node_modules/@aws-cdk/aws-lambda/lib/log-retention.ts:67:22)
\_ new Function (/repos/logretention-rate-limit/node_modules/@aws-cdk/aws-lambda/lib/function.ts:537:28)
\_ new LogRetentionRateLimitStack (/repos/logretention-rate-limit/lib/log-retention-rate-limit-stack.ts:17:18)
\_ Object.<anonymous> (/repos/logretention-rate-limit/bin/log-retention-rate-limit.ts:8:3)
\_ Module._compile (internal/modules/cjs/loader.js:1151:30)
\_ Module.m._compile (/repos/logretention-rate-limit/node_modules/ts-node/src/index.ts:858:23)
\_ Module._extensions..js (internal/modules/cjs/loader.js:1171:10)
\_ Object.require.extensions.<computed> [as .ts] (/repos/logretention-rate-limit/node_modules/ts-node/src/index.ts:861:12)
\_ Module.load (internal/modules/cjs/loader.js:1000:32)
\_ Function.Module._load (internal/modules/cjs/loader.js:899:14)
\_ Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:71:12)
\_ main (/repos/logretention-rate-limit/node_modules/ts-node/src/bin.ts:227:14)
\_ Object.<anonymous> (/repos/logretention-rate-limit/node_modules/ts-node/src/bin.ts:513:3)
\_ Module._compile (internal/modules/cjs/loader.js:1151:30)
\_ Object.Module._extensions..js (internal/modules/cjs/loader.js:1171:10)
\_ Module.load (internal/modules/cjs/loader.js:1000:32)
\_ Function.Module._load (internal/modules/cjs/loader.js:899:14)
\_ Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:71:12)
\_ /usr/local/lib/node_modules/npm/node_modules/libnpx/index.js:268:14
The deployment fails when creating CloudWatch log groups. The issue could be fixed by relaxing the retry options for the CloudWatch Logs SDK client. I tested this locally by changing it to:
const cloudwatchlogs = new AWS.CloudWatchLogs({ apiVersion: '2014-03-28', maxRetries: 6, retryDelayOptions: { base: 300 }});
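To give a feel for what `maxRetries: 6` and `base: 300` mean in practice, here is a minimal sketch of the retry schedule. It assumes the SDK's default backoff picks a random delay between 0 and 2^n * base milliseconds for retry n (0-based); that formula is an assumption here, not something confirmed in this thread. The helper names `maxBackoffDelays` and `jitteredDelay` are hypothetical and not part of the SDK or CDK.

```typescript
// Upper bound on the wait before each retry under exponential backoff.
// Assumption: retry n (0-based) waits at most 2^n * baseMs milliseconds.
function maxBackoffDelays(maxRetries: number, baseMs: number): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(Math.pow(2, attempt) * baseMs);
  }
  return delays;
}

// Jittered delay for a single retry: a random fraction of the upper bound.
// The random source is injectable so the function can be tested deterministically.
function jitteredDelay(
  attempt: number,
  baseMs: number,
  random: () => number = Math.random
): number {
  return random() * Math.pow(2, attempt) * baseMs;
}

// Settings from the snippet above: 6 retries, base of 300 ms.
const upperBounds = maxBackoffDelays(6, 300);
const worstCaseTotalMs = upperBounds.reduce((sum, d) => sum + d, 0);

console.log(upperBounds);      // [ 300, 600, 1200, 2400, 4800, 9600 ]
console.log(worstCaseTotalMs); // 18900
```

Under these assumptions, a single call can spend up to roughly 19 seconds waiting across all retries, which gives parallel stack deployments more room to spread their requests out.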
Another solution might be increasing a service limit. Unfortunately, I have no clue which rate limit is being hit here. It's not clear from the documentation:
This is :bug: Bug Report
Added PR with a change that fixes the issue. Code was inspired by a similar fix in: https://github.com/aws/aws-cdk/pull/2053/files
Not sure if this is the right approach though. I can imagine this being better managed in one central location and/or made configurable.
Re-classified this as a feature request.
This is my config for logRetentionRetryOptions.

I'm still testing but it seems to fix the 'Rate exceeded' error.

@jaapvanblaaderen thank you for developing this feature!
Mine seems fixed too. Thanks @georstoy !