Aws-cdk: S3 error: Access Denied on cdk deploy

Created on 25 Feb 2020  ·  11 Comments  ·  Source: aws/aws-cdk

Around 10% of cdk deploy commands fail with S3 error: Access Denied.

Reproduction Steps

cdk "deploy" "-e" "true" "--require-approval" "never" "--no-ci" "--output" "1432235223.cdk.out" "--no-staging" "MyStack"

Error Log

MyStack: creating CloudFormation changeset...
 ❌  MyStack failed: ValidationError: S3 error: Access Denied
For more information check http://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html
S3 error: Access Denied
For more information check http://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html

Environment

  • CLI Version: 1.25.0
  • Framework Version: 1.25.0
  • OS: Linux Ubuntu 18.04
  • Language: TypeScript

Other

The stack itself is irrelevant. This can even happen with an almost empty stack.


This is :bug: Bug Report

@aws-cdk/aws-s3 bug p1

All 11 comments

@AlexCheema

You said the stack itself is irrelevant; nonetheless, can you please provide a minimal repro stack that this happens with?

Closing for now since there hasn't been a response in a while. Feel free to reopen.

We've had this twice now. Seems like just a random blip. Must be a race condition of some kind... The stack deploys fine the second time.

I get this about 1 time in 20. It's really embarrassing when running a demo in front of customers. Please take it seriously.

We nevertheless are going to require a minimal reproduction before we can work on it (we have never faced this particular problem ourselves).

What is the smallest possible CDK app you can come up with that triggers the issue?
Does your environment have anything particular about it? In particular - how are credentials provided?

What I can tell you:

  • I only use the python flavor of the CDK.
  • This has been happening for the past year (as long as I've been using the CDK), across many versions/updates and across different deployments. As I write this I'm on 1.34.1 (build 7b21aa0) but there's been no improvement over the past year in any release. I'm using Python 3.7.4.
  • I've seen it in tiny projects and large ones; doesn't seem to matter.
  • It's not related to the AWS account. I've seen it on accounts that are years old and ones created in the last few days. I've seen the same deployment to two accounts fail for one and succeed for the other.
  • It's very random...almost certainly an infrequent race condition. Appears unrelated to the size/complexity of the actual deployment.
  • I only write serverless apps, so my cdk stacks typically contain some combo of {API Gateway, Lambda, AppSync, DynamoDB, Step Functions, SQS}.

I can send you a simple registration service app if that helps, though it's nothing more than AppSync with a Lambda resolver that calls DDB. You would have to create a synthetic harness that makes innocuous changes to the Lambda and attempts to deploy them and then run that in a loop until the bug hits - just doing an "empty" deploy won't tickle the bug.
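The loop described above could be sketched roughly as follows. This is only an illustration, not an existing tool: the `Harness` interface, its `touch` and `deploy` hooks, and `deployUntilAccessDenied` are hypothetical names, and the only thing taken from this issue is the error string being searched for.

```typescript
// Result of one deploy attempt; exitCode 0 means success.
interface DeployResult {
  exitCode: number;
  output: string;
}

// Hypothetical hooks (not part of the CDK):
// `touch` makes an innocuous change to the Lambda source,
// `deploy` runs one `cdk deploy` and captures its combined output.
interface Harness {
  touch(attempt: number): void;
  deploy(): DeployResult;
}

// Error text as reported in this issue.
const ACCESS_DENIED = "S3 error: Access Denied";

// Repeats touch + deploy until the Access Denied error appears or
// maxAttempts is reached. Returns the 1-based attempt number that hit
// the error, or undefined if no attempt did.
function deployUntilAccessDenied(h: Harness, maxAttempts: number): number | undefined {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    h.touch(attempt);
    const result = h.deploy();
    if (result.exitCode === 0) {
      continue; // deploy succeeded, try again with a new change
    }
    if (result.output.indexOf(ACCESS_DENIED) !== -1) {
      return attempt; // reproduced the error from this issue
    }
    // Any other failure is unrelated; stop rather than mask it.
    throw new Error("attempt " + attempt + " failed for another reason:\n" + result.output);
  }
  return undefined;
}
```

In a Node script, `deploy` could wrap `child_process.spawnSync("cdk", ["deploy", "--require-approval", "never", "--verbose", "MyStack"], { encoding: "utf8" })` and `touch` could rewrite a comment in the Lambda handler; both wirings are assumptions left out here so the sketch has no dependencies.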

Credentials: In my case, creds come from ~/.aws/credentials. I've seen the bug on both the default credentials and with explicit --profile args provided; doesn't seem to matter. I've seen it happen on both the root account creds and IAM role creds. Different profiles in my credentials file relate to different accounts at the moment, but I've seen this bug in simple, single account cases, in multi-role-within-one-account cases, and in cross-account cases.

HTH.

@serverlessunicorn thanks for these details...

To confirm: no fancy things with respect to credentials, like MFA or anything of the sort?

Additionally - is there any particular region you're experiencing this with?

It might also be useful to get a --verbose trace of the failing cdk deploy (I understand it might be tough to actually get this, but if possible, it might prove invaluable).

No fancy MFA :). us-east-1 most of the time.

(Sent by email in reply to Romain Marcadier's comment of Wed, Apr 22, 2020: https://github.com/aws/aws-cdk/issues/6430#issuecomment-617907274)

@NetaNir might actually have identified an S3 negative cache entry situation which could cause the symptoms you're facing.

We're checking if an asset exists in S3 before uploading (to avoid re-uploading possibly large assets). When the asset is not there yet, that check creates a negative cache entry, which causes the HEAD/GET-after-PUT to become eventually consistent...

We're looking to switch to using a LIST-based check, which would not create the negative cache entry, and hopefully stop this issue altogether.
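To make the suspected mechanism concrete, here is a toy in-memory model. It is not real S3 and not the CDK's code: `ToyBucket`, `uploadThenRead`, `headCheck` and `listCheck` are hypothetical names, and the rule that a negative entry expires after one stale read is an assumption chosen only to mirror the "works the second time" reports above.

```typescript
// Toy model only: mimics "looking up a missing key leaves a negative
// cache entry that a later read can still see". Not real S3 behaviour.
class ToyBucket {
  private objects: string[] = [];
  private negativeCache: string[] = [];

  // HEAD on a missing key records a negative cache entry (assumed).
  head(key: string): boolean {
    const found = this.objects.indexOf(key) !== -1;
    if (!found && this.negativeCache.indexOf(key) === -1) {
      this.negativeCache.push(key);
    }
    return found;
  }

  // LIST scans stored keys and never touches the negative cache.
  list(prefix: string): string[] {
    return this.objects.filter(k => k.indexOf(prefix) === 0);
  }

  put(key: string): void {
    if (this.objects.indexOf(key) === -1) {
      this.objects.push(key);
    }
  }

  // A read is refused while a stale negative entry exists. Assumption:
  // the entry expires after a single stale read. The error text mirrors
  // the message reported in this issue.
  get(key: string): string {
    const stale = this.negativeCache.indexOf(key);
    if (stale !== -1) {
      this.negativeCache.splice(stale, 1);
      throw new Error("Access Denied");
    }
    if (this.objects.indexOf(key) === -1) {
      throw new Error("Not Found");
    }
    return key;
  }
}

type ExistsCheck = (bucket: ToyBucket, key: string) => boolean;

// Existence check via HEAD (the approach described as problematic).
const headCheck: ExistsCheck = (b, k) => b.head(k);
// Existence check via LIST (the approach proposed as the fix).
const listCheck: ExistsCheck = (b, k) => b.list(k).indexOf(k) !== -1;

// Uploads the asset if the check says it is missing, then reads it back,
// standing in for the service that fetches the asset after upload.
// Returns the key on success or the error message on failure.
function uploadThenRead(bucket: ToyBucket, key: string, exists: ExistsCheck): string {
  if (!exists(bucket, key)) {
    bucket.put(key);
  }
  try {
    return bucket.get(key);
  } catch (e) {
    return (e as Error).message;
  }
}
```

In this model, `uploadThenRead(new ToyBucket(), "assets/abc.zip", headCheck)` returns `"Access Denied"`, while the same call with `listCheck` returns `"assets/abc.zip"`, because the LIST-based check never creates the negative entry.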

I intend on releasing a version of CDK with the fix in it today, so y'all can start collecting evidence on whether that was it or not... 🤞🏻
