_This issue was originally opened by @danielpintilie as hashicorp/terraform#15714. It was migrated here as a result of the provider split. The original body of the issue is below._
Hi,
Is there a way to download the files that you have stored on an S3 bucket using terraform?
Thank you!
I have the same need. I want to download pre-existing files from S3 to install binaries/apps on newly launched EC2 instances using Terraform.
The files are large and cannot be uploaded every time with `remote-exec`, because we provision new systems frequently and the upload takes a lot of time. Also, the app servers where these packages/files are required are in a private subnet, so scp'ing would be a two-step process.
I'll add my need for this here as well. What I would like to be able to do is have my Chef Automate servers upload their validation keys to an s3 bucket when they are created (already done) and then fetch them with terraform so that I can use them with the chef provisioner.
Yes, declarative management of S3 resources is handy, especially in loosely-coupled scenarios. I was hoping to leverage `aws_s3_bucket_object` to reference some private PKI material managed by another process, but the uploads have the auto-detected MIME type of `application/x-x509-ca-cert`, so, no joy.
This would be super appreciated!
Also needed.
Hi all!
There was a previous discussion that covered using the `aws_s3_bucket_object` data source to access and pass S3 objects to provisioners.
I'm going to copy some example code from that issue (credit to the brilliant @apparentlymart!):
```hcl
data "aws_s3_bucket_object" "secret_key" {
  bucket = "awesomecorp-secret-keys"
  key    = "awesomeapp-secret-key"
}

resource "aws_instance" "example" {
  # ...

  provisioner "file" {
    content     = "${data.aws_s3_bucket_object.secret_key.body}"
    destination = "/tmp/awesomeapp-secret-key" # the file provisioner also requires a destination path
  }
}
```
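If the goal is to actually download the object to disk on the machine running Terraform (the original request), a similar pattern can feed the body into a `local_file` resource. This is a minimal sketch assuming the object is plain UTF-8 text and the `local` provider is available; the bucket and key names are placeholders:

```hcl
data "aws_s3_bucket_object" "secret_key" {
  bucket = "awesomecorp-secret-keys"
  key    = "awesomeapp-secret-key"
}

# Writes the retrieved body to a local file next to the configuration.
resource "local_file" "secret_key" {
  content  = "${data.aws_s3_bucket_object.secret_key.body}"
  filename = "${path.module}/awesomeapp-secret-key"
}
```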
I hope this helps!
@mildwonkey that is only a partial solution, since the data source only supports a very limited set of Content-Types. You can't actually download arbitrary files/content.
Ah, I'm sorry @lorengordon, that's fair. I don't have an answer for that, so I'm going to re-open this ticket and flag it as a feature request.
The constraint on MIME types is a result of the fact that this data source is, in a sense, serving two purposes: to get _metadata_ about an S3 object, and to get the _content_ of an S3 object. Since S3 can be used both to store small text objects and large binary objects, we wanted to make sure you could always retrieve the metadata of any object without causing Terraform to try to load a huge binary object into memory.
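For context, here is a minimal sketch of the metadata-only use described above; `content_type` and `etag` are documented attributes of the data source, and the bucket and key are placeholders:

```hcl
data "aws_s3_bucket_object" "artifact" {
  bucket = "awesomecorp-artifacts"
  key    = "release/app.tar.gz"
}

# Metadata attributes are always populated, even when the body is not
# because the object's Content-Type is not a recognized text type.
output "artifact_content_type" {
  value = "${data.aws_s3_bucket_object.artifact.content_type}"
}

output "artifact_etag" {
  value = "${data.aws_s3_bucket_object.artifact.etag}"
}
```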
However, I do agree that the current approach of guessing based on `Content-Type` doesn't scale well, since the set of reasonable formats to retrieve is always growing.
Therefore my suggestion would be to add some new arguments to this data source (a hypothetical usage sketch follows at the end of this comment):

- `get_body` - a boolean argument that, if set to `true`, will override the Content-Type-based heuristic and just always populate the body. However, since Terraform strings are required to be UTF-8 for correct operation, an error should be produced if the retrieved object isn't valid UTF-8. (If we don't generate an error here then Terraform's internals will silently corrupt the binary data.)
- `get_body_base64` - similar to `get_body` but instead populates a new attribute `body_base64` that _can_ accept arbitrary binary data. It would still be inadvisable to retrieve _large_ objects using this, since the provider would need to load them fully into memory and then transmit them over the RPC channel to Terraform Core, but it would allow retrieving e.g. small gzipped objects for use with arguments like `user_data_base64` on `aws_instance`. (base64 is the conventional way to copy around small raw binary payloads in a Terraform configuration.)

Due to current limitations of the provider SDK it would not be possible to distinguish `get_body = false` from it not being set at all, but there is work in progress to address that, so `get_body = false` could also eventually be used to override the heuristic the _other_ way, and prevent the provider from retrieving the body of an object that _is_ detected as text by its `Content-Type`. As far as I can tell, it'd be safe to implement the above _without_ the ability to set it to false for now and then retrofit that capability in a later release once the SDK limitations have been addressed.
I notice that unfortunately the `aws_s3_bucket_object` data source uses `body` while the corresponding resource type uses `content` for this argument. It might be nice to address this at the same time, by adding `content` as an alias for `body` and making the new attribute described above be `content_base64` instead, with the flags above then being `get_content` and `get_content_base64` instead.
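For illustration only, here is a sketch of how the proposed arguments might look in a configuration. None of these arguments exist today; `get_body_base64` and `body_base64` (or their `content`-flavored equivalents above) are hypothetical names, and the bucket, key, and AMI values are placeholders:

```hcl
data "aws_s3_bucket_object" "bootstrap" {
  bucket = "awesomecorp-bootstrap"
  key    = "cloud-init.gz"

  # Hypothetical argument: force retrieval of binary content as base64,
  # regardless of the object's Content-Type.
  get_body_base64 = true
}

resource "aws_instance" "example" {
  ami           = "ami-12345678"
  instance_type = "t2.micro"

  # Hypothetical attribute: base64-encoded object content, suitable for
  # arguments that accept base64 such as user_data_base64.
  user_data_base64 = "${data.aws_s3_bucket_object.bootstrap.body_base64}"
}
```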
What if Terraform implemented a lazy reader for the `body` attribute? That way the body would not even be retrieved unless something accessed the attribute, and it wouldn't be retrieved until the calling resource needed to be resolved. The reader should also implement a memory-efficient chunking/streaming algorithm so the content can be written efficiently.
The UTF-8 requirement is super limiting. It would be nice if there were an option to use a byte array for such operations.
Guys, any progress on this last PR? ^ The Content-Type limitation is forcing me to work around it in profane ways :(
This is affecting us as well. What's the status on the two PRs that could solve this?
+1