Serverless-application-model: Policy create issues with Dead Letter Queue and XRay Tracing

Created on 30 Nov 2017 · 7 comments · Source: aws/serverless-application-model

I tried adding a Dead Letter Queue (DLQ) SQS queue to my function and enabling X-Ray tracing. The stack executor role also had permissions for the following resources: sqs:* and xray:*.

The function has the following inline policy for X-Ray:

- Effect: "Allow" # xray permissions (required)
  Action:
    - "xray:PutTraceSegments"
    - "xray:PutTelemetryRecords"
  Resource:
    - "*"

For SQS:

```yaml
- Effect: Allow
  Action:
    - sqs:SendMessage
  Resource: "*"
```

When the stack's change set is applied, it fails with:

| Timestamp | Status | Resource type | Logical ID | Status reason |
| --- | --- | --- | --- | --- |
| 12:47:11 UTC+0200 | UPDATE_ROLLBACK_IN_PROGRESS | AWS::CloudFormation::Stack | notifications-lambda | The following resource(s) failed to update: [GeneratorFunction]. |
| 12:47:08 UTC+0200 | UPDATE_FAILED | AWS::Lambda::Function | GeneratorFunction | The provided execution role does not have permissions to call PutTraceSegments on XRAY |

The current workaround is to deploy the function the first time without the following options:

```yaml
      DeadLetterQueue:
        Type: SQS
        TargetArn: !GetAtt NotificationDeadLetterQueue.Arn
      Tracing: Active
```

After that you can enable the options again, deploy, and the stack update will succeed. It seems the Serverless Framework folks have the same issue: https://github.com/serverless/serverless/pull/3742.

All 7 comments

@riston This issue is unfortunately out of SAM's control. If you read through the issue you linked, you will see:

> From AWS support:
>
> "...Policy modifications typically do take a little time to replicate. Unfortunately, this is known to occasionally cause issues with CloudFormation updates such as this one, when a subsequent change is immediately dependent on the IAM resource."
>
> "I searched our issue tracker and I can see that our CloudFormation team is working on this issue by adding some delay to the creation and updates of IAM::Policy resources, so that updates to Roles and Users can propagate before other resources attempt to use those permissions.
>
> So I +1'd your case to the issue and we may see it as a new feature in an upcoming release of CloudFormation." - @caevyn

They later commented that CloudFormation said it was fixed, but it looks like they are still facing this issue. Until CloudFormation resolves this completely, the only workaround is the one you described.

Closing, as this is not a SAM issue.

It would be great if you could add your voice to the CloudFormation issue; it might help get it prioritised. Essentially, roles take a while to become available, and CloudFormation continues before they are ready to use. I did get some feedback that they had fixed it for policies, but I still encounter the same problem. I've run into this outside of serverless as well, e.g. with custom Config rules or anything involving a Lambda function that needs a recently provisioned role.

@caevyn I was planning to dig up what I can internally and see if I can bring it to their attention more, especially if it was communicated as fixed (from what I read on the other issue). The reality is that me commenting on an internal issue isn't enough. The more the community brings this up and pushes on the pain point, the better shot we have at getting this done (and done sooner).

I have let them know it didn't solve our issue, so they are aware of that. My communication with them is via a small not-for-profit's AWS support, so if it comes from these high-profile OSS projects it might get more visibility. There are workarounds, like splitting up stacks, but they don't work well with tools like Serverless and SAM.

Ok, so at least they are aware. I should be able to find the issue for it and add some more details to it. I am not sure exactly what I will be able to share out of that (if anything), but I will see what I can do. :)

I agree they don't work well for these types of tools. We are trying to streamline some of this configuration for customers, but needing to split up stacks or deploy twice is not ideal at all. That leads to the tools being harder to use. We totally hear ya, and I am completely with you on that.

I was thinking about this more. If there was a way we could force an arbitrary wait (effectively a sleep) between resources, we might be able to stop the bleeding until CloudFormation has the root cause fixed. I looked into WaitConditions, but those only wait for a certain number of success signals, and I don't see a way we could leverage that to solve this. I looked at Custom Resources as well. I still don't understand them completely, but at first glance they don't look like they help much either. :(

Open to other ideas that you guys have. Going to leave this as closed for now, since I don't think we have any options to get around it at the moment, but happy to reopen if we can figure out a path forward (other than bugging CloudFormation).
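
For anyone curious what such an arbitrary wait could look like, below is a hypothetical sketch of a Lambda-backed custom resource that simply sleeps before signalling success, which the function then depends on. This is not something SAM provides; all names and values are illustrative, and the InlineCode property assumes a newer SAM version:

```yaml
Resources:
  # Hypothetical: a function that sleeps, then reports SUCCESS to CloudFormation.
  DelayFunction:
    Type: AWS::Serverless::Function
    Properties:
      Runtime: python3.6
      Handler: index.handler
      Timeout: 120             # must exceed the sleep below
      InlineCode: |
        import json
        import time
        import urllib.request

        def handler(event, context):
            # Only wait on Create/Update; Delete should return immediately.
            if event['RequestType'] in ('Create', 'Update'):
                time.sleep(int(event['ResourceProperties'].get('Seconds', '30')))
            body = json.dumps({
                'Status': 'SUCCESS',
                'PhysicalResourceId': 'iam-propagation-delay',
                'StackId': event['StackId'],
                'RequestId': event['RequestId'],
                'LogicalResourceId': event['LogicalResourceId'],
            }).encode()
            # Custom resources report back by PUT-ing to a pre-signed S3 URL.
            req = urllib.request.Request(
                event['ResponseURL'], data=body,
                headers={'Content-Type': ''}, method='PUT')
            urllib.request.urlopen(req)

  # Creating/updating this resource takes roughly `Seconds` seconds.
  IamPropagationDelay:
    Type: Custom::Delay
    Properties:
      ServiceToken: !GetAtt DelayFunction.Arn
      Seconds: 30

  GeneratorFunction:
    Type: AWS::Serverless::Function
    DependsOn: IamPropagationDelay   # wait out the delay before touching the function
    Properties:
      Handler: index.handler
      Runtime: nodejs6.10
      CodeUri: ./src
      Tracing: Active
      # DeadLetterQueue and Policies as in the earlier sketch
```

The caveat, and roughly why this doesn't look like a clean fix: for the wait to be meaningful, the delay resource would itself need to depend on the IAM policy that is propagating, which is awkward when SAM generates the execution role implicitly.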

Hi, I got some additional info from AWS support regarding this issue. See https://github.com/serverless/serverless/pull/3742#issuecomment-362946277
Initial testing with the serverless framework seems promising, although not perfect.
