Serverless-application-model: AutoPublishAlias removes the previous stage alias

Created on 15 Feb 2018 · 14 Comments · Source: aws/serverless-application-model

I have several stages in my pipeline, let's say dev and prod. I have one SAM template, which I use for both of them with different parameters.

The pipeline flow is something like:
source -> build -> stage-dev -> integrations-tests -> stage-prod

Everything works fine, but after stage-prod is executed, there is only one (PROD) alias in my lambda.

Is there a way to keep the alias from the previous stage?

My SAM template:

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Lambda functions logs uploaded files from S3

Parameters:
  BucketName:
    Description: "S3 bucket name"
    Type: "String"
  DeployAlias:
    Description: "Lambda alias to deploy"
    Type: "String"

Resources:
  Bucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Ref BucketName

  UploadEventLogger:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: UploadEventLogger
      Handler: cz.net21.ttulka.aws.lambda.UploadEventLogger
      CodeUri: ./target/UploadEventLogger-1.0.0-SNAPSHOT.jar
      Runtime: java8
      AutoPublishAlias: !Ref DeployAlias
      MemorySize: 256
      Timeout: 30
      Policies:
      - Version: '2012-10-17'
        Statement:
          - Resource: !Sub "arn:aws:s3:::${BucketName}/*"
            Action: s3:GetObject*
            Effect: Allow
          - Resource: arn:aws:logs:*
            Action:
            - logs:CreateLogGroup
            - logs:CreateLogStream
            - logs:PutLogEvents
            Effect: Allow
      Events:
        FileUpload:
          Type: S3
          Properties:
            Bucket: !Ref Bucket
            Events: s3:ObjectCreated:*
      DeploymentPreference:
        Type: AllAtOnce

And my staging configs:

{ "Parameters": {
    "BucketName": "ttulka-upload-bucket-dev",
    "DeployAlias": "DEV" } }

{ "Parameters": {
    "BucketName": "ttulka-upload-bucket-prod",
    "DeployAlias": "PROD" } }


ttulka

👍4

Most helpful comment

Sorry, this is going to be a long response. I am fairly passionate about things like this :)

Why are DEV and PROD any different from DEV and BETA? If they are the same, why are they separated? How come DEV and BETA are special but not PROD?

For a minute, just take a step back from the names of the stages. You have a pipeline with 3 stages (A, B, and C).

Scenario one:
A (Stack 1) -> B (Stack 2) -> C (Stack 3)
Each stage in this example is one CloudFormation stack. Between any two stages, you can run integration tests, performance tests, etc. Everything is consistent and isolated. You do not have to worry about clashing resources, overriding properties, or affecting traffic in the next stage. If you follow least privilege, each stage stays scoped to itself, since policies only change when a deployment happens to that stage (which goes back to clashing and overriding). Adding a new stage is almost trivial: add Stage D with Stack 4 anywhere in the process and nothing else has to change (maybe the deployment configuration and what you pass into the stack, but that is minimal).

Scenario two:
A (Stack 1) -> B (Stack 1) -> C (Stack 2)
Between any two stages, you can still run integration tests, performance tests, etc., but everything is no longer consistent. You are forcing two stages into one stack, so how do you handle adding a new stage? Does it share Stack 1 or Stack 2? You are adding extra complexity to your deployment process. When different stages share a stack, you have to handle this in your deployment system or in your template, and both are unnecessarily complex. When adding new resources to your template, you need to consider how they will behave in Stack 1 and in Stack 2, instead of the resource simply being there and it not mattering because each stage behaves the same.

I am sure there are things I have not considered in the first scenario that could lead someone to use scenario two, but my (to be honest, strong) opinion favors scenario one. I have used this model many times now and enjoy the benefits of it. I am very heavy into scoping resources, changes, etc., and scenario one gives me the platform to do it safely.

For a moment, let's go back to your original setup: one Lambda Function with two aliases (one for dev and one for prod). One thing you may not have considered in this setup is how concurrency works. Since you are using the same Lambda Function, calls to the dev alias consume the same concurrency that the prod alias draws from. So running integration tests, performance tests, or general testing can now lead to an availability issue for your customers. On the flip side, your tests may get throttled instead of your customers, and now an engineer needs to spend time finding out what happened. This is a poor experience for your customers as well as wasted dev effort that could have been avoided.

I am not suggesting that you avoid Aliases or Versions by any means. You are still free to add these to the Lambda Function generated by SAM as you wish. You may want to do Weighted Aliases in some other process outside of SAM. Or use an Alias to point to the most stable version of a function and have another for an unstable one, where each Alias points to a different Version of the API. Or maybe your Lambda Function is used as a backend for an API and you want to use Versions to version the Lambda Function with an API Version. Others in the community may be able to give better examples of what they have used Aliases and Versions for as well.

The point I was driving at is that the way you are using AutoPublishAlias, to have multiple Aliases on a Function, is not supported. It is meant to give you one Alias that always points at the most up-to-date Version being deployed. It is not $LATEST because $LATEST is mutable while Versions are not.

jfuss on 16 Feb 2018

👍8 🎉4

All 14 comments

@ttulka I have a couple clarifying questions.

Do you have one stack per env (dev and prod)?
Is your Prod and Dev Lambda the same Lambda Function?

jfuss on 15 Feb 2018

@jfuss both answers are yes.
Everything is done in a single account. The ideal result would be to have an alias for each stage. In the happy flow, the aliases DEV and PROD would reference the same Lambda version, let's say 2. If something goes wrong in the integration tests, the DEV alias references the new version, 2, but the PROD alias stays at 1.
Another benefit is that external consumers, like end-to-end tests, can stick with an alias (this is not possible when the alias is removed by the next stage).
Thank you!

ttulka on 15 Feb 2018

@ttulka I do not recommend this at all, and AutoPublishAlias is not meant to be used for this case.

Reason why I strongly encourage teams not to do this:
You are changing your prod Lambda for a dev environment. Which means you are actually changing the same resource that is taking Production traffic. In the event something goes wrong, you now have an outage. There are other reasons you would want to avoid this but this is the biggest one to me.

To reduce this concern and limit the blast radius of a given deployment by scoping your functions to each stage, you can use multiple stacks. In this model, you still maintain your one CloudFormation template, but instead of dev and prod updating the same CloudFormation stack, dev would update the dev stack (with the correct dev parameters) and prod would update the prod stack (with the correct prod parameters). Since you are in one account, you will have to update the function name to be scoped to dev or prod as well. You can do this by appending 'dev' or 'prod' to the function name, by passing this value through another Parameter, or by just letting CloudFormation do its thing and control the names of your resources.

Does this model make sense to use for your case?

Happy to discuss this further or go deeper into what we recommend as best practices.

jfuss on 15 Feb 2018
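
The one-template, one-stack-per-stage model described above could look roughly like the fragment below. This is only a sketch: the Stage parameter and the "UploadEventLogger-${Stage}" naming scheme are assumptions made for illustration, not something taken from the thread. The other properties are copied from the template in the question.

```yaml
Parameters:
  # Hypothetical parameter; each pipeline stage passes its own value.
  Stage:
    Type: String
    AllowedValues: [dev, prod]

Resources:
  UploadEventLogger:
    Type: AWS::Serverless::Function
    Properties:
      # The stage suffix keeps the two functions from clashing in a single account.
      # Omitting FunctionName entirely lets CloudFormation generate a unique name instead.
      FunctionName: !Sub "UploadEventLogger-${Stage}"
      Handler: cz.net21.ttulka.aws.lambda.UploadEventLogger
      CodeUri: ./target/UploadEventLogger-1.0.0-SNAPSHOT.jar
      Runtime: java8
      MemorySize: 256
      Timeout: 30
```

Each pipeline stage would then deploy this same template to its own stack (for example, one stack for dev and another for prod), so a dev deployment never touches the function that serves production traffic.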

@jfuss this is definitely a practice I have already considered. I will go even further and use a separate account for each stage. My question was more about "finding the limits" of SAM, considering options and pros/cons.

DEV and PROD is probably not the best example, but we can talk about DEV and BETA, both meant for testing. In this case it can make sense to me to have really only one stack and to test different versions of the Lambda with different tests. Those tests can even be independent of the pipeline (like performance tests running regularly at clock-based intervals). For such tests I would need a stable set of predefined aliases. Of course, even this could (and probably better) be solved by implementing different stacks, as you proposed...

But then one thing comes to mind: in such a strategy aliasing actually has no meaning, because there is always only one correct version ($LATEST). So consumers always reference the Lambda as arn:aws:lambda:region:account-id:function:function-name-PROD:$LATEST (or even just arn:aws:lambda:region:account-id:function:function-name-PROD), and it makes little sense to use arn:aws:lambda:region:account-id:function:function-name-PROD:PROD.

ttulka on 15 Feb 2018

@jfuss Thank you very much, I think all my questions are answered now. This is not an easy issue and I guess your answer will be very helpful for other users as well!

ttulka on 16 Feb 2018

👍2

@jfuss I got caught up on this thread. Is there any purpose of AutoPublishAlias beyond Weighted Aliases or other CodeDeploy features?

My team will transition from a One Stack (To Rule Them All) approach to a stack per environment.
That is actually 3 stacks for each environment, because we're waiting on Regional Endpoints and BasePath mapping (#248). We currently deploy the Lambda/APIG stack, use the SDK to set the regional endpoint, deploy the regional APIG CustomDomain Mapping Stack (one per region), then deploy a global Route53 Mapping Stack (latency routing for multi-regional APIs).

Shockolate on 27 Feb 2018

@Shockolate AutoPublishAlias is really good for cases where you want a single Alias to always point to the latest version being deployed. It is not restricted to Weighted Aliases or CodeDeploy, but it complements those features.

Multiple stacks for a service is not a bad practice to follow, I actually encourage it. In your case, some of these things will go away shortly. :)

jfuss on 27 Feb 2018
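
As a rough illustration of the single-alias usage described above, the fragment below pairs AutoPublishAlias with a CodeDeploy traffic-shifting preference. The alias name "live" and the choice of Canary10Percent5Minutes are assumptions picked for the example. The thread's original template used AllAtOnce.

```yaml
Resources:
  UploadEventLogger:
    Type: AWS::Serverless::Function
    Properties:
      Handler: cz.net21.ttulka.aws.lambda.UploadEventLogger
      CodeUri: ./target/UploadEventLogger-1.0.0-SNAPSHOT.jar
      Runtime: java8
      # One fixed alias per function; each deployment publishes a new version
      # and moves this alias to it. "live" is an illustrative name.
      AutoPublishAlias: live
      DeploymentPreference:
        # Shift 10% of traffic to the new version, then the rest after 5 minutes.
        Type: Canary10Percent5Minutes
```

Consumers that invoke the function through the alias follow whichever version it currently points at, while the stage separation itself comes from deploying to separate stacks.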

I see this issue and that there is a solution, but I'd just like to weigh in on this. I came across this bug while looking up the exact same issue. I had a set of Lambdas for which I wanted to create a 'test' snapshot to expose to my customer. Reading the docs, I thought AutoPublishAlias would work for me. I wanted the 'test' alias to be locked to a version while development continued.

The way I thought it would work is that I created a template file with a 'test' AutoPublishAlias and deployed it, then changed the AutoPublishAlias to 'dev' for continuing work. Deploying with the same stack to dev would then remove the 'test' alias and replace it with dev. My expectation was that it would create a new alias to the new version and leave the old alias pointing to the old version.

I understand the workaround of creating a separate stack for test vs dev, but based on the current documentation it was not clear that this would be needed. I only came to realize this when I found this GitHub issue and the suggested way forward.

ldm314 on 15 Aug 2018

👍5

Just thought I'd add another use case where multiple aliases would be useful other than prod/dev. I have a complex serverless application with many microservices each in their own stack and each containing a handful of lambda functions. All stacks and functions in a particular account are of the same development stage (production).

The goal of this architecture is very low development/deployment cycle time. This is achieved through loose coupling between microservices so any microservice can be deployed at any time without impacting any other part of the system. Each microservice owns one specific unit of domain data or logic. This means that microservices need to coordinate with each other for complex tasks and lambda contains the logic that other microservices expect to behave a certain way. Essentially these lambda functions are behaving as an interface regardless of the inter-microservice communication mechanism.

Although in an ideal world that interface would never change, in messy reality, of course there would occasionally need to be changes to a lambda function which may be incompatible with other parts of the system. When this happens it leaves two deployment options: 1) push - a big bang where the breaking update is coordinated with all other functions that depend on it or 2) pull - a deployment strategy where a new version of one function can be deployed while all existing consumers continue to use the previous version until they can be adapted to the new version and deployed independently.

The complexity, overhead, and cycle time of option 1 makes it unattractive and more prone to failure. For option 2 we have two possible strategies: 1) a new stack, or 2) lambda versions and aliases. A new stack is possible but less attractive because of the complexity it adds to operations: monitoring, logging, deployment, historical performance tracking, etc. This is non-trivial when you’re talking n-stacks per microservice in a system with hundreds of microservices.

To make lambda aliases/versioning work, when one microservice (A) initiates some task that depends on another microservice (B), it specifies the alias it is targeting. When releasing a new version of B, if it is a change that is compatible with the rest of the system, the alias is updated to target the newer version of the lambda function. AutoPublishAlias works perfectly for this scenario. However, in the case of a breaking change, a new alias needs to be added for the new version and the old alias needs to be left in place so the rest of the system can continue to operate and be updated independently. Without that capability you would be forced into either big-bang deployments of dependencies or stack-per-breaking change.

Leaving the alias would essentially give us the same capabilities as semantic versioning for lambda.

jimcatts on 4 Jan 2019

👍3

Two tricks can be used to overcome these issues and make aliases behave how you think they should. This may not work for all cases, but if you are using Python in a git repo it works fine and you can use sam package/deploy.
First, use "git archive HEAD myFunction.py -o myFunction.py.zip" to create a zip of your Lambda function (are you using Layers for your libraries?). It stores all timestamps as "0" and so reliably generates the same md5sum. You must append .zip to the CodeUri, of course. This works for your Layer too.
Then, in addition to the AWS::Serverless::Function in your SAM template, explicitly add an AWS::Lambda::Version and an AWS::Lambda::Alias (do not use the AutoPublishAlias property) with "DeletionPolicy: Retain". You must add a Condition to these based on an input parameter (e.g., Condition: addAlias).
Run sam package as normal. Hash-based filenames will be generated from your git archive zip, so any file alteration will update the template, and no changes means no update for any given Lambda.
Then you need to run sam deploy TWICE: first with a parameter that causes addAlias to be true, then again so that addAlias is false. The alias will be removed from the stack, but not deleted.
On your next deployment, when you add the alias, it will be added to the stack as a new alias, and will not update or delete the old alias, which is no longer part of the stack.
This is necessary because DeletionPolicy is overridden for an UPDATE to the stack resource, but is honoured if there is a DELETE for the stack resource.
Caveat: you will need to delete the aliases outside of SAM/CloudFormation before updating them because they are no longer part of the stack. I achieve this using the AWS CLI in a deployment wrapper shell script (which also calls sam package/deploy).

Josh-Preston on 6 Jun 2019
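
The conditional Version/Alias part of the approach above might be sketched as the following template fragment. The names AddAlias, MyFunctionVersion, MyFunctionAlias and the MyFunction logical ID are hypothetical placeholders. The comment itself only specifies a condition called addAlias and "DeletionPolicy: Retain".

```yaml
Parameters:
  # Hypothetical switch: "true" on the first deploy, "false" on the second.
  AddAlias:
    Type: String
    AllowedValues: ["true", "false"]
    Default: "false"
  DeployAlias:
    Type: String

Conditions:
  addAlias: !Equals [!Ref AddAlias, "true"]

Resources:
  # MyFunction is assumed to be an AWS::Serverless::Function defined
  # elsewhere in the same template, without AutoPublishAlias.
  MyFunctionVersion:
    Type: AWS::Lambda::Version
    Condition: addAlias
    DeletionPolicy: Retain
    Properties:
      FunctionName: !Ref MyFunction

  MyFunctionAlias:
    Type: AWS::Lambda::Alias
    Condition: addAlias
    DeletionPolicy: Retain
    Properties:
      FunctionName: !Ref MyFunction
      FunctionVersion: !GetAtt MyFunctionVersion.Version
      Name: !Ref DeployAlias
```

When the condition flips to false on the second deploy, both resources leave the stack, and the Retain policy is what keeps the published version and the alias alive in Lambda.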

@jfuss it sounds like you are saying best practices are not to use API Gateway stages and stage variables with Lambda function versions and aliases as described here:
https://aws.amazon.com/blogs/compute/using-api-gateway-stage-variables-to-manage-lambda-functions/

Instead, each environment should be created by its own CloudFormation stack and therefore have its own API Gateway and Lambda functions. This sounds simple and easy to me and is what I have done in the past, but I was just starting to look into using stages and Lambda versions and aliases.
If this is the case, I would also assume each environment's API Gateway would really only use or need a single stage, and therefore the API Gateway stage name is unimportant.

Can you please confirm if I am interpreting your above responses correctly?

Thanks!

paulsson on 17 Oct 2019

@jfuss do you still recommend having different stacks (one lambda function for dev, stage, prod, etc)? If so, how do you handle situations like blue/green deployments that would normally be handled via traffic shifting?

I find it odd that AWS would design Lambda versions/aliases in this manner but not allow for it via automated sam-cli deployments.

RichDavis1 on 13 Oct 2020

https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-attribute-updatereplacepolicy.html
Setting the UpdateReplacePolicy: Retain attribute on a CloudFormation resource will help you retain the previous alias, but for that you need to use an explicit AWS::Lambda::Alias resource instead of the AutoPublishAlias property of AWS::Serverless::Function.
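
A minimal sketch of that suggestion, assuming the UploadEventLogger function and DeployAlias parameter from the original question. The logical IDs UploadEventLoggerVersion and UploadEventLoggerAlias are made up for the example.

```yaml
Resources:
  # Explicit version resource. Note (an assumption worth verifying for your setup):
  # CloudFormation only publishes a new version when a new AWS::Lambda::Version
  # resource is created, so its logical ID typically has to change per release.
  UploadEventLoggerVersion:
    Type: AWS::Lambda::Version
    Properties:
      FunctionName: !Ref UploadEventLogger

  UploadEventLoggerAlias:
    Type: AWS::Lambda::Alias
    # Changing Name replaces the alias; Retain keeps the old one in Lambda
    # instead of deleting it during the replacement.
    UpdateReplacePolicy: Retain
    Properties:
      FunctionName: !Ref UploadEventLogger
      FunctionVersion: !GetAtt UploadEventLoggerVersion.Version
      Name: !Ref DeployAlias
```

A retained alias is no longer managed by the stack, so later updates or clean-up of it would have to happen outside CloudFormation.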