Aws-cli: cloudformation package is always generating a new zip

Created on 6 Feb 2018 · 32 comments · Source: aws/aws-cli

I have a Golang lambda with the following template:

AWSTemplateFormatVersion : '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Billing Api Create Application

Resources:
  BillingCreate:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: billing-create
      Handler: main
      CodeUri: ./build
      Runtime: go1.x
      Policies: AWSLambdaDynamoDBExecutionRole

Even when the code hasn't changed (go build produces the same compiled code), the aws cloudformation package command generates a new zip file.

Labels: cloudformation package-deploy, customization, feature-request

All 32 comments

Could you elaborate a little more on why it is a problem that every package command creates a new zip? It may be difficult to avoid making a new zip every time, because the package command computes an MD5 of the zip it creates and compares it with the MD5 of the previous upload to decide whether it needs to re-upload the code to S3.
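For readers following along, here is a simplified Python model of the behaviour described above, assuming (as the comment says) that the zip's MD5 decides whether an upload happens. The helper names (`zip_directory`, `file_md5`, `package`) and the set standing in for the S3 bucket are hypothetical; this is not the aws-cli source.

```python
import hashlib
import os
import tempfile
import zipfile


def zip_directory(src_dir, zip_path):
    """Zip every file under src_dir, storing paths relative to src_dir."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                # zf.write records the file's mtime in the entry header.
                zf.write(full, os.path.relpath(full, src_dir))


def file_md5(path):
    """MD5 of a file's bytes, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def package(src_dir, bucket):
    """Hypothetical model: the zip's MD5 is the S3 key.

    `bucket` is a plain set standing in for S3. Returns (key, uploaded).
    """
    with tempfile.TemporaryDirectory() as tmp:
        zip_path = os.path.join(tmp, "artifact.zip")
        zip_directory(src_dir, zip_path)
        key = file_md5(zip_path)
    if key in bucket:
        return key, False  # identical zip bytes: nothing to upload
    bucket.add(key)
    return key, True
```

Running `package` twice on an untouched directory reuses the key, but changing only a file's modification time (for example with `os.utime`, or by rebuilding an identical binary) yields a new key and a second upload, which is the behaviour reported in this issue.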

Imagine a scenario where a CloudFormation template declares more than one Lambda function.
Let's call them FN1 and FN2; the first is in fn1.go and the second in fn2.go.

I build both of them, which generates two binaries, fn1 and fn2.

I run cloudformation package, and it generates two zip files and sends them to S3.

One week later, I change the fn1 function but not fn2. My CI builds both of them, but only the first has a different MD5 (the second has the same MD5 as before).

The problem is that the package command generates a new zip for the second one too, even though the file did not change, which causes every function declared in my CloudFormation template to be redeployed.

I'm having the same issue with Python code. Every time I run aws cloudformation package, it creates and uploads a new zip file and changes the CloudFormation template.

@rizidoro Can you download the zip files from S3, unzip them locally, and diff them? It turned out I had one file that actually was different, because it included a "generated at" date that was updated every time I built the CloudFormation script.

You also need to check for timestamp differences among the files.

the package command does an md5 of the zip it creates to see if it needs to reupload the code to s3 by comparing the two

That is exactly the issue. If the timestamps of the files in the zip differ, the MD5 is different even when the contents are the same. In the case of scripting languages, this is probably not an issue. However, with Go, each time you run go build, a new binary is created, and thus it gets a new timestamp.

This is especially troublesome if you are trying to use CodePipeline and CodeBuild (see https://docs.aws.amazon.com/lambda/latest/dg/automating-deployment.html) because no matter what, package is always going to create a zip with a different md5.

Perhaps package should md5 each file in the zip instead of the zip as a whole. As it is now, it's not an accurate comparison.

@jmassara That is exactly the problem I'm facing right now. The binary that go build generates gets a new timestamp every time.

@rizidoro Yes. This is a bug with package. It should probably create a temporary file that has a list of the md5 hashes of all files going into the zip. Then md5 this temporary file and use that value as the name of the S3 object.
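A minimal sketch of that suggestion, assuming Python's standard `hashlib` and `os` (the function name `manifest_md5` is hypothetical): MD5 each file's contents, write the digests into a manifest sorted by relative path, and MD5 the manifest. Including the path in each line is a design choice so that renames also change the result.

```python
import hashlib
import os


def manifest_md5(src_dir):
    """Hash a manifest of per-file MD5s instead of the zip itself.

    Each manifest line is "<md5-of-contents>  <relative/path>", sorted by path
    so the result does not depend on os.walk order. File timestamps are never
    read, so touching or rebuilding an identical file leaves the key unchanged.
    """
    entries = []
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, src_dir).replace(os.sep, "/")
            with open(full, "rb") as f:
                digest = hashlib.md5(f.read()).hexdigest()
            entries.append((rel, f"{digest}  {rel}"))
    entries.sort()
    manifest = "\n".join(line for _rel, line in entries).encode("utf-8")
    return hashlib.md5(manifest).hexdigest()
```

Because timestamps are never read, rebuilding or touching an unchanged file leaves the digest the same, while any content change produces a new one.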

I have the same issue. I have a CodeCommit repo with a sam.yml containing multiple Lambdas.

When I run the AWS CLI twice in a row from the prompt on my VM, the first run uploads a .zip for every Lambda. The second run does nothing because nothing changed, which is correct.

But doing exactly the same from CodePipeline/CodeBuild (aws cloudformation package, etc.) does not work. You can trigger the pipeline with "Release Change" instead of pushing a commit. CodeBuild starts an AWS CLI Docker container, gets the input sources from S3, and unzips them. It then calls cloudformation package, which DOES re-upload unchanged code for every Lambda, causing redeployments in the next steps.

  1. How does nobody using CodePipeline and Lambda run into this?
  2. It seems that fetching unchanged sources from S3, unzipping them, and running package leads to a different MD5, which is NOT OK.

Does anyone know a workaround and when this bug will be fixed?
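One plausible explanation for point 2, offered as an assumption rather than something confirmed for CodeBuild: if the tool that unpacks the source artifact does not restore the timestamps stored in the zip, every file gets a fresh modification time, and that alone changes the MD5 of the zip that package builds. Python's own `zipfile` module behaves this way, as this small demonstration (with a hypothetical file `app.py`) shows.

```python
import os
import tempfile
import time
import zipfile

# Timestamp stored in the zip entry header; an arbitrary value for the demo.
STORED = (2020, 1, 30, 0, 0, 0)


def mtimes_after_unzip():
    """Extract a one-file zip and return (stored_ts, extracted_file_mtime)."""
    work = tempfile.mkdtemp()
    src_zip = os.path.join(work, "source.zip")
    with zipfile.ZipFile(src_zip, "w") as zf:
        zf.writestr(zipfile.ZipInfo("app.py", date_time=STORED), "print('hi')\n")

    out_dir = os.path.join(work, "out")
    with zipfile.ZipFile(src_zip) as zf:
        zf.extractall(out_dir)  # zipfile does not restore the stored timestamp

    # Interpret the stored header time as local time (DST unknown).
    stored_ts = time.mktime(STORED + (0, 0, -1))
    extracted_ts = os.path.getmtime(os.path.join(out_dir, "app.py"))
    return stored_ts, extracted_ts


stored, extracted = mtimes_after_unzip()
print(extracted > stored)  # True: the file gets the extraction time instead
```

On a VM, by contrast, the files keep their modification times between runs, which would match the behaviour described above.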

I am having the same issue. I'm finding reviewing CloudFormation change sets painful because they are polluted with changes to Lambda resources that didn't materially change.

I'm seeing the same problem as @jmassara reported above, with Node. This one is painful for us because we are trying to use CodePipeline to deploy Lambda@Edge functions with the CDN in the same stack: even when we don't touch the function code, the CLI thinks the files changed during packaging, which results in a CDN update (a 15-minute wait). It is far more than just an unnecessary version publish in the change set; it slows the entire CD process down because CloudFront updates are so slow.

Hi, is there any progress on this feature request?
Comparing the md5sum of each file within a zip, instead of the md5sum of the zip file, sounds like a good solution for this problem.
I'd appreciate your thoughts and a possible fix. We have a CI/CD pipeline with many Lambda functions, and this problem causes a new version of each Lambda to be deployed unnecessarily every time.

We are also facing this exact issue.

@rmmeans I have exactly same issue. This not only slows down the deployment, but also the rollbacks.

Guys, my question is not 100% related to this particular bug (I bypassed it by using separate Lambdas), but there is something I really can't bypass and I am giving up on it. I would really appreciate any help or suggestions; please take a look at this error:

[image: screenshot of the error]

The package command fails when I have too many dependencies in my package.json, and unfortunately, due to the nature of the Lambda, there is no way to reduce the number of files.

So, is there any way to run it with zip64 support? Please help, I have already given up on this.

The solution may depend on the programming language (and therefore, potentially not possible for some). We solved it in the λ# CLI as follows:

.NET Core has a deterministic build system, which means that if the source files and NuGet packages have not changed, the resulting compiled binaries remain identical as well. During the build phase of the package, the CLI creates a checksum of the file contents and filenames instead of the ZIP file itself, because the latter contains date and time stamps that would cause the checksum to change with every build. The result is a package filename that only changes when the underlying code changes, which, in turn, only updates Lambda functions (or Lambda layers) when required.

Any updates on this issue?

I'm facing the exact same problem

I've also been suffering from this issue. I am using the sam-cli and have been trying to optimise the time it takes to run sam package and sam deploy. So far I've got to a nice place using a node script to pre-package each of the 29 Lambdas into its own directory with the required node_modules. This is important so that I can make a code change in one file, run a deployment, and have it quickly deploy only the Lambdas affected by that change. In the best case it affects one Lambda and my deployment takes a few seconds.

As per the rest of the conversation in this issue, the md5 of the zip is different each time. Here is a demonstration:

~/C/t/test ❯❯❯ mkdir out
~/C/t/test ❯❯❯ touch out/test
~/C/t/test ❯❯❯ echo "Hello world" > out/test
~/C/t/test ❯❯❯
~/C/t/test ❯❯❯ md5 out/test
MD5 (out/test) = f0ef7081e1539ac00ef5b761b4fb01b3

~/C/t/test ❯❯❯ zip -rqX out.zip out
~/C/t/test ❯❯❯ md5 out.zip
MD5 (out.zip) = 5f28021c0b6fc266abbfb1b36870fa1d
~/C/t/test ❯❯❯
~/C/t/test ❯❯❯ zip -rqX out2.zip out
~/C/t/test ❯❯❯ md5 out2.zip
MD5 (out2.zip) = 5f28021c0b6fc266abbfb1b36870fa1d
~/C/t/test ❯❯❯ # Same md5!

~/C/t/test ❯❯❯ echo "Hello world" > out/test
~/C/t/test ❯❯❯ md5 out/test
MD5 (out/test) = f0ef7081e1539ac00ef5b761b4fb01b3
~/C/t/test ❯❯❯ # Same md5 for file!

~/C/t/test ❯❯❯ zip -rqX out3.zip out
~/C/t/test ❯❯❯ md5 out3.zip
MD5 (out3.zip) = 1a8ec423697ce9c657b6f1c12c51476f
~/C/t/test ❯❯❯ # Different zip file md5!

Digging into the source code for the zipping + uploading functionality you can see that the code walks the file tree and adds each file to the zipfile: https://github.com/aws/aws-cli/blob/384ae0aec97a706d1ff9ca9ce206dc93c9667038/awscli/customizations/cloudformation/artifact_exporter.py#L183-L196

My proposal would be that in this step it also MD5s each file as it adds it to the zip, and then finally MD5s the combined result. I'm not sure what the performance impact would be, but for this kind of workflow it should make the final deployment significantly faster.
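A sketch of that proposal, assuming a simplified walk in the spirit of the linked code rather than a patch to it (`zip_and_hash` is a hypothetical name): the same loop that adds each file to the zip also hashes the file's relative path and contents, and the per-file digests are folded into one total.

```python
import hashlib
import os
import zipfile


def zip_and_hash(src_dir, zip_path):
    """Write the zip and, in the same pass, compute a timestamp-free digest.

    Paths are sorted so both the digest and the zip's entry order are
    stable regardless of the order os.walk returns files in.
    """
    paths = []
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            paths.append(os.path.join(root, name))
    paths.sort()

    total = hashlib.md5()
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for full in paths:
            rel = os.path.relpath(full, src_dir).replace(os.sep, "/")
            zf.write(full, rel)  # header still records the mtime
            with open(full, "rb") as f:
                contents = f.read()
            # Per-file MD5 of path + contents, then fold it into the total.
            per_file = hashlib.md5(rel.encode("utf-8") + b"\0" + contents)
            total.update(per_file.hexdigest().encode("ascii"))
    return total.hexdigest()
```

The zip bytes still change when timestamps change, but the returned digest does not, so it could serve as the S3 key in place of the zip's MD5. Reading each file a second time for hashing is the main extra cost.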


I tested this locally on a Lambda with a small 😛 node_modules (total directory size ~20 MB):

~/C/g/a/.s/Api ❯❯❯ time find . -type f -exec md5 \{\} >> ../out.md5 \;
       10.51 real         3.18 user         6.76 sys
~/C/g/a/.s/Api ❯❯❯ md5 ../out.md5
MD5 (../out.md5) = 6e6584c968e3974b60ba7b4e244a84b5

This was for 3098 files.

Yes, that's close to how it's done in λ# for the .NET zip packages. Make sure to sort the files by their full path first, then MD5 the file contents and the file path. If you omit the latter, the MD5 doesn't change when you change the capitalization of a file name!
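To illustrate why the path matters, here is a small Python comparison with in-memory files (the dictionaries and function names are hypothetical): hashing contents alone cannot tell `handler.js` from `Handler.js`, while hashing each path alongside its contents can.

```python
import hashlib


def content_only_md5(files):
    """files: dict of relative path -> bytes. Hashes contents in path order."""
    h = hashlib.md5()
    for path in sorted(files):
        h.update(files[path])
    return h.hexdigest()


def path_and_content_md5(files):
    """Same ordering, but each path is hashed alongside its contents."""
    h = hashlib.md5()
    for path in sorted(files):
        h.update(path.encode("utf-8") + b"\0")  # \0 separates path from digest
        h.update(hashlib.md5(files[path]).digest())
    return h.hexdigest()


before = {"handler.js": b"exports.x = 1;\n"}
after = {"Handler.js": b"exports.x = 1;\n"}  # only the capitalization changed

print(content_only_md5(before) == content_only_md5(after))          # True: rename missed
print(path_and_content_md5(before) == path_and_content_md5(after))  # False: rename detected
```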

See details at https://github.com/LambdaSharp/LambdaSharpTool/blob/9767b96fda1c459f21ebf68c1dd18670970c012d/src/LambdaSharp.Tool/Internal/StringEx.cs#L164

@stealthycoin would there be any appetite for a PR implementing this?

@stealthycoin any update on this? I'd be happy to take a crack at a PR to implement the behaviour discussed.

Hello guys, any updates, please? :) I'm facing the same issue: I have multiple Lambdas in a monorepo.
Once I update one Lambda, sam package generates multiple S3 zip files for the others, even though I didn't make any changes to them.
Is it a bug or a feature request?

Hi all, I've created a pull request that seems to solve the issue we were facing. Basically, it computes the checksum over the entire function content (after installing all requirements) rather than over the resulting ZIP file (the current behavior).
The main difference is that a checksum computed over the ZIP changes every time a file is re-created (it takes file mtime and ctime into account), even if there is no actual change in the file content.

It would be great if this pull gets accepted and merged.
Thanks.
G

@gpiccinni I implemented a similar solution to yours in September here https://github.com/aws/aws-cli/pull/4526, but unfortunately nothing ever came of it.

@wmonk many thanks for pointing this out! Looking at your pull request, I realized that in my case the checksum does not change when filenames change (which, in my opinion, it should), whereas your code already addresses this!

I'll look into other libraries such as dirhash, where the filename and path are included in the checksum, and possibly update my pull request.

Thanks
G

@gpiccinni, awesome! And thanks! I hope your PR can be merged quickly; it could fix a lot of pipelines.

That is exactly the issue. If timestamps on files are different in the zip file, even if they are the same contents, the md5 is different. In the case of scripting languages, this is probably not an issue. However with Go, each time you run go build, a new binary is created and thus a new timestamp.

@jmassara This problem exists for scripting languages too. I am facing the same problem with Node.js Lambdas. It looks like it is due to the zip headers. Have a look at this Stack Overflow discussion.
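If the zip headers are the culprit, another option is to make the archive itself deterministic. This is a sketch, assuming Python's `zipfile` (the function name and the 1980-01-01 date are choices for illustration, not aws-cli behaviour): files are added in sorted order and every entry gets the same fixed timestamp, while the permission bits are kept so executables stay executable.

```python
import io
import os
import zipfile

FIXED_DATE = (1980, 1, 1, 0, 0, 0)  # earliest date a zip header can store


def deterministic_zip(src_dir):
    """Return zip bytes whose headers do not depend on file mtimes."""
    paths = []
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            paths.append(os.path.join(root, name))
    paths.sort()  # stable entry order

    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for full in paths:
            rel = os.path.relpath(full, src_dir).replace(os.sep, "/")
            info = zipfile.ZipInfo(rel, date_time=FIXED_DATE)
            info.compress_type = zipfile.ZIP_DEFLATED
            # Keep the mode bits (e.g. the executable bit on a Go binary).
            info.external_attr = (os.stat(full).st_mode & 0xFFFF) << 16
            with open(full, "rb") as f:
                zf.writestr(info, f.read())
    return buf.getvalue()
```

Because nothing in the output depends on modification times, hashing these bytes gives the same MD5 until a file's path, contents, or permissions change.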

Well, doesn't the CDK team avoid this problem? Find out what they are doing and do the same.

After being frustrated by this issue for a while, I've fixed it in my own deploy scripts. Hopefully this helps others, and maybe someone can suggest optimisations! I'm not sure if this is the "right" way to do it, but it's been working fine for us. One big benefit I've found is that I can make config changes without redeploying every function whose code hadn't changed.

# Hash every file (sorted by path so the order is stable), then hash that list.
find src node_modules -type f -exec md5sum {} + | LC_ALL=C sort -k 2 > tmp-md5
CODE_MD5=$(md5sum tmp-md5 | cut -c 1-32)

# zip writes "$CODE_MD5.zip", so that is the file to check for.
if [ ! -f "$CODE_MD5.zip" ]; then
    zip -q -r "$CODE_MD5.zip" src node_modules # more files here
fi

# Upload only if no object with this key exists yet.
aws s3 ls "s3://bucket-name/$CODE_MD5" || aws s3 cp "$CODE_MD5.zip" "s3://bucket-name/$CODE_MD5"

sam deploy --parameter-overrides CodeUriKey="$CODE_MD5"

Parameters:
  CodeUriKey:
    Type: String
    NoEcho: true

Resources:
  Lambda:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri:
        Bucket: bucket-name
        Key: !Ref CodeUriKey

I have found another workaround for this issue (it may be easier for those who have many Lambda functions in one pipeline).

The key to this workaround was finding out what makes the MD5 of a zip differ even when the contents of the files within it have not changed. I found the files' modified timestamps to be the culprit. So the idea is: if all files have a consistent modified timestamp just before the 'aws cloudformation package' or 'sam package' command runs, the resulting zip files will have a consistent MD5 across build executions.

find . -exec touch -m --date="2020-01-30" {} + # the date does not matter as long as it never changes
aws cloudformation package --template-file template.yml --s3-bucket <bucket> --output-template-file package-template.yml

The above trick has worked for me so far.
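For environments where GNU `touch --date` may not be available, the same idea can be sketched in Python (`normalize_mtimes` and the fixed timestamp are hypothetical choices). It sets every file and directory under a root to one fixed time; run it just before the package command.

```python
import os

# 2020-01-30T00:00:00Z; any value works as long as it never changes.
FIXED_TS = 1580342400


def normalize_mtimes(root, ts=FIXED_TS):
    """Set the access and modification time of everything under root to ts.

    Like `find . -exec touch -m ...`, but also sets atime; Python's zipfile
    records only the modification time, so the extra atime change is harmless.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            os.utime(os.path.join(dirpath, name), (ts, ts))
    os.utime(root, (ts, ts))
```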

does not work for me
