Aws-cli: 'aws cloudformation package' should use a hash value for uniquely identifying packages

Created on 5 May 2017 · 18 Comments · Source: aws/aws-cli

I have a trivial my.sam.yaml SAM CFN Template with a CodeUri: ./code/

$ aws cloudformation package --template-file my.sam.yaml --s3-bucket my-3 --output-template-file output1.yaml
$ aws cloudformation package --template-file my.sam.yaml --s3-bucket my-3 --output-template-file output2.yaml
$ touch ./code/lambda_function.py
$ aws cloudformation package --template-file my.sam.yaml --s3-bucket my-3 --output-template-file output3.yaml

output1.yaml and output2.yaml are identical. output3.yaml has a CodeUri: with a different object key in S3.

From this behaviour, I assume that the CodeUri object-key is generated based on the timestamp(s) of the code repository. This is fine for local packaging, but when running on a build server, it creates redundant artifacts and no-op change-sets.

Could the object-key be changed to a content-based hash that won't change unless the code data (not meta-data) changes?

Labels: cloudformation package-deploy, customization, feature-request

All 18 comments

Marking as feature request. @sanathkr do you have any thoughts on this?

It zips all the files and generates an MD5. Sure, we can do a content-based hash, but that would be more complex and hard to get right.

Logic along the lines of:
find ./code/ -type f -print0 | sort -z | xargs -0 cat | md5
?
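A minimal Python sketch of the same idea, for illustration only. It is not the aws-cli implementation, and the function name hash_directory is hypothetical. It walks the directory in sorted order and feeds only relative paths and file bytes into the digest, so timestamps never influence the result.

```python
import hashlib
import os


def hash_directory(root):
    """Return an MD5 hex digest over relative paths and file bytes under root.

    File metadata (mtime, permissions) is never read, so touching a file
    does not change the digest. hash_directory is a hypothetical helper.
    """
    digest = hashlib.md5()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sorting in place makes the traversal order deterministic
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            # Include the relative path so that renaming a file changes the digest.
            digest.update(rel.encode("utf-8"))
            digest.update(b"\0")
            with open(path, "rb") as handle:
                for chunk in iter(lambda: handle.read(65536), b""):
                    digest.update(chunk)
            digest.update(b"\0")
    return digest.hexdigest()
```

With this approach, running touch ./code/lambda_function.py between two calls would leave the digest unchanged, while editing the file would change it.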

Is anyone already working on this?

AFAIK no

So it looks like the MD5 hash is created in the uploader. That seems to go beyond its responsibility of just uploading artifacts.

I suggest moving the hashing logic into artifact_exporter.py and using it to hash the zipped directory for a consistent filename before handing it to the uploader.

If we want to keep the method uploader.upload_with_dedup, I suggest moving these functions into a new utils.py file.

I'd also add that the package command currently doesn't seem to recognize when the directory specified through the CodeUri property is shared between multiple functions. So it ends up uploading the same package over and over again (once for each function that uses the same CodeUri). It would be better to keep the hash of the CodeUri property and upload the corresponding package only once for each hash.

(I was referred to paste this extra bit here from https://github.com/aws/aws-cli/issues/2712#issuecomment-315418115)
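The "upload once per shared CodeUri" idea could look roughly like the sketch below. It is not taken from the aws-cli source: package_functions and the upload callable are hypothetical names, and for simplicity the cache is keyed on the normalized local path rather than on a content hash.

```python
import os


def package_functions(code_uris, upload):
    """Map each function name to an S3 URL, uploading each directory once.

    code_uris: dict of function name -> local CodeUri path.
    upload: hypothetical callable that takes a local path and returns an S3 URL.
    """
    uploaded = {}  # normalized local path -> S3 URL already obtained
    result = {}
    for function_name, code_uri in code_uris.items():
        # "./code", "code/" and "code" all normalize to the same key.
        key = os.path.abspath(code_uri)
        if key not in uploaded:
            uploaded[key] = upload(key)
        result[function_name] = uploaded[key]
    return result
```

Four functions that share one CodeUri would then trigger a single call to upload, and all four would receive the same S3 URL.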

The MD5 hash is supposed to do the job, but for some reason it doesn't work right in some cases. Need to look deeper.

Yes, there are actually a few tricks to get it working right. For one, if we do a build job in CI/CD, some tools (such as npm) actually generate new package files (e.g. when running npm install) that change their content on every rerun. In addition, we must ignore file modification dates. So to make a global hash calculation more consistent (as in, to detect actual changes to our code), we need to filter the files that are being generated (e.g. only run it on *.js and *.json) and ignore some common offenders (e.g. package.json). I'm not sure how to make that process framework-agnostic though.
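As an illustration of the filtering described above, here is a hedged sketch that hashes only files matching an include list and skips an exclude list. The function name hash_selected_files and its default patterns are assumptions chosen to mirror the examples in this comment (*.js, *.json, package.json). They are not options of the AWS CLI.

```python
import fnmatch
import hashlib
import os


def hash_selected_files(root, include=("*.js", "*.json"), exclude=("package.json",)):
    """Hash files whose names match include and do not match exclude.

    Only relative paths and file bytes are hashed, so modification
    dates are ignored. The patterns are illustrative defaults.
    """
    digest = hashlib.md5()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            if not any(fnmatch.fnmatch(name, pattern) for pattern in include):
                continue
            if any(fnmatch.fnmatch(name, pattern) for pattern in exclude):
                continue
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            digest.update(rel.encode("utf-8") + b"\0")
            with open(path, "rb") as handle:
                digest.update(handle.read())
            digest.update(b"\0")
    return digest.hexdigest()
```

The pattern lists are exactly the part that is hard to make framework-agnostic: each toolchain would need its own include and exclude rules.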

@dinvlad - those rules seem very specific to certain package managers. I personally would feel uncomfortable embedding such logic in the AWS CLI.

For your npm issues, does yarn give more predictable output?

Not sure, I haven't tried it yet. Actually I converted all of my functions to regular ones to address the nested stack issue. Now that hash calculation is predictable for my use case.

Any updates on this?

Good Morning!

We're closing this issue here on GitHub, as part of our migration to UserVoice for feature requests involving the AWS CLI.

This will let us get the most important features to you, by making it easier to search for and show support for the features you care the most about, without diluting the conversation with bug reports.

As a quick UserVoice primer (if not already familiar): after an idea is posted, people can vote on the ideas, and the product team will be responding directly to the most popular suggestions.

We've imported existing feature requests from GitHub - Search for this issue there!

And don't worry, this issue will still exist on GitHub for posterity's sake. As it's a text-only import of the original post into UserVoice, we'll still be keeping in mind the comments and discussion that already exist here on the GitHub issue.

GitHub will remain the channel for reporting bugs.

Once again, this issue can now be found by searching for the title on: https://aws.uservoice.com/forums/598381-aws-command-line-interface

-The AWS SDKs & Tools Team

This entry can specifically be found on UserVoice at: https://aws.uservoice.com/forums/598381-aws-command-line-interface/suggestions/33168394--aws-cloudformation-package-should-use-a-hash-val

Based on community feedback, we have decided to return feature requests to GitHub issues.

Is there any movement on this issue? The UserVoice pages 404 so not sure if anything happened.
I am creating simple Python Lambda functions, and the package command is creating new zip files every time.
I have diffed the contents of the zip and they are identical :-(
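One possible explanation, which is an assumption and not something confirmed in this thread: ZIP entries record each file's modification time, so two archives with identical file contents can still differ byte for byte and therefore hash differently. The sketch below builds an archive with fixed entry timestamps and permissions so that its MD5 depends only on paths and contents. The name deterministic_zip_md5 is hypothetical, and this is not how the package command builds its archive.

```python
import hashlib
import io
import os
import zipfile

# 1980-01-01 is the earliest timestamp a ZIP entry can hold.
FIXED_DATE_TIME = (1980, 1, 1, 0, 0, 0)


def deterministic_zip_md5(root):
    """Zip root in memory with fixed metadata and return the archive's MD5."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # deterministic traversal order
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                rel = os.path.relpath(path, root).replace(os.sep, "/")
                info = zipfile.ZipInfo(rel, date_time=FIXED_DATE_TIME)
                info.compress_type = zipfile.ZIP_DEFLATED
                # Fixed permissions; note this drops any executable bit.
                info.external_attr = 0o644 << 16
                with open(path, "rb") as handle:
                    archive.writestr(info, handle.read())
    return hashlib.md5(buffer.getvalue()).hexdigest()
```

Comparing this digest before and after a touch of the source files is a quick way to check whether timestamps are what makes the generated S3 key change.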

aws --version
aws-cli/1.16.10 Python/3.7.0 Darwin/17.7.0 botocore/1.12.0

+1 to getting this sorted. We have 50 MB of source and dependencies reused across 4 Lambda functions. The package command repeatedly uploads the code to the same S3 key.

Any updates on this? I tried pointing to separate source files for each function, but then finding out that they all just get uploaded anyway kind of defeats the purpose.

Any updates now? I do not completely understand how the md5sum is created. Can I find the source code somewhere, to maybe look for a workaround?
