_This issue was originally opened by @danielpintilie as hashicorp/terraform#15714. It was migrated here as a result of the provider split. The original body of the issue is below._
Hi,
Is there a way to download the files that you have stored on an S3 bucket using terraform?
Thank you!
I have the same need. I want to download pre-existing files from S3 to install binaries/apps on newly launched EC2 instances using Terraform.
The files are large and cannot be uploaded every time with `remote-exec`, because we provision new systems frequently and the upload takes a lot of time. Also, the app servers where these packages/files are required are in a private subnet, so scp'ing would be a two-step process.
I'll add my need for this here as well. What I would like to be able to do is have my Chef Automate servers upload their validation keys to an s3 bucket when they are created (already done) and then fetch them with terraform so that I can use them with the chef provisioner.
Yes, declarative management of S3 resources is handy, especially in loosely-coupled scenarios. I was hoping to leverage `aws_s3_bucket_object` to reference some private PKI material managed by another process, but the uploads have the auto-detected MIME type of `application/x-x509-ca-cert`, so, no joy.
This would be super appreciated!
Also needed.
Hi all!
There was a previous discussion that covered using the `aws_s3_bucket_object` data source to access and pass S3 objects to provisioners.
I'm going to copy some example code from that issue (credit to the brilliant @apparentlymart!):
```hcl
data "aws_s3_bucket_object" "secret_key" {
  bucket = "awesomecorp-secret-keys"
  key    = "awesomeapp-secret-key"
}

resource "aws_instance" "example" {
  # ...

  provisioner "file" {
    content     = "${data.aws_s3_bucket_object.secret_key.body}"
    destination = "/tmp/awesomeapp-secret-key" # the file provisioner also requires a destination path
  }
}
```
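If the goal is to actually download the object to disk on the machine running Terraform (the original request), a similar pattern can feed the body into a `local_file` resource. This is a minimal sketch assuming the object is plain UTF-8 text and the `local` provider is available; the bucket and key names are placeholders:

```hcl
data "aws_s3_bucket_object" "secret_key" {
  bucket = "awesomecorp-secret-keys"
  key    = "awesomeapp-secret-key"
}

# Writes the retrieved body to a local file next to the configuration.
resource "local_file" "secret_key" {
  content  = "${data.aws_s3_bucket_object.secret_key.body}"
  filename = "${path.module}/awesomeapp-secret-key"
}
```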
I hope this helps!
@mildwonkey that is only a partial solution, since the data source only supports a very limited set of Content-Types. You can't actually download arbitrary files/content.
Ah, I'm sorry @lorengordon, that's fair. I don't have an answer for that, so I'm going to re-open this ticket and flag it as a feature request.
The constraint on MIME types is a result of the fact that this data source is, in a sense, serving two purposes: to get _metadata_ about an S3 object, and to get the _content_ of an S3 object. Since S3 can be used both to store small text objects and large binary objects, we wanted to make sure you could always retrieve the metadata of any object without causing Terraform to try to load a huge binary object into memory.
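For context, here is a minimal sketch of the metadata-only use described above; `content_type` and `etag` are documented attributes of the data source, and the bucket and key are placeholders:

```hcl
data "aws_s3_bucket_object" "artifact" {
  bucket = "awesomecorp-artifacts"
  key    = "release/app.tar.gz"
}

# Metadata attributes are always populated, even when the body is not
# because the object's Content-Type is not a recognized text type.
output "artifact_content_type" {
  value = "${data.aws_s3_bucket_object.artifact.content_type}"
}

output "artifact_etag" {
  value = "${data.aws_s3_bucket_object.artifact.etag}"
}
```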
However, I do agree that the current approach of guessing based on `Content-Type` doesn't scale well, since the set of reasonable formats to retrieve is always growing.
Therefore my suggestion would be to add some new arguments to this data source (a hypothetical usage sketch follows at the end of this comment):

- `get_body` - a boolean argument that, if set to `true`, will override the Content-Type-based heuristic and just always populate the body. However, since Terraform strings are required to be UTF-8 for correct operation, an error should be produced if the retrieved object isn't valid UTF-8. (If we don't generate an error here then Terraform's internals will silently corrupt the binary data.)
- `get_body_base64` - similar to `get_body` but instead populates a new attribute `body_base64` that _can_ accept arbitrary binary data. It would still be inadvisable to retrieve _large_ objects using this, since the provider would need to load them fully into memory and then transmit them over the RPC channel to Terraform Core, but it would allow retrieving e.g. small gzipped objects for use with arguments like `user_data_base64` on `aws_instance`. (base64 is the conventional way to copy around small raw binary payloads in a Terraform configuration.)

Due to current limitations of the provider SDK it would not be possible to distinguish `get_body = false` from it not being set at all, but there is work in progress to address that, so `get_body = false` could also eventually be used to override the heuristic the _other_ way, and prevent the provider from retrieving the body of an object that _is_ detected as text by its `Content-Type`. As far as I can tell, it'd be safe to implement the above _without_ the ability to set it to false for now and then retrofit that capability in a later release once the SDK limitations have been addressed.
I notice that unfortunately the `aws_s3_bucket_object` data source uses `body` while the corresponding resource type uses `content` for this argument. It might be nice to address this at the same time, by adding `content` as an alias for `body` and making the new attribute described above be `content_base64` instead, with the flags above then being `get_content` and `get_content_base64` instead.
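For illustration only, here is a sketch of how the proposed arguments might look in a configuration. None of these arguments exist today; `get_body_base64` and `body_base64` (or their `content`-flavored equivalents above) are hypothetical names, and the bucket, key, and AMI values are placeholders:

```hcl
data "aws_s3_bucket_object" "bootstrap" {
  bucket = "awesomecorp-bootstrap"
  key    = "cloud-init.gz"

  # Hypothetical argument: force retrieval of binary content as base64,
  # regardless of the object's Content-Type.
  get_body_base64 = true
}

resource "aws_instance" "example" {
  ami           = "ami-12345678"
  instance_type = "t2.micro"

  # Hypothetical attribute: base64-encoded object content, suitable for
  # arguments that accept base64 such as user_data_base64.
  user_data_base64 = "${data.aws_s3_bucket_object.bootstrap.body_base64}"
}
```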
What if Terraform implemented a lazy reader for the `body` attribute? That way the body would not even be retrieved unless something accessed the attribute, and it wouldn't be retrieved until the calling resource needed to be resolved. The reader should also implement a memory-efficient chunking/streaming algorithm so the content can be written efficiently.
The UTF-8 requirement is super limiting. It would be nice if there were an option to use a byte array for such operations.
Guys, any progress on this last PR? ^ The Content-Type limitation is forcing me to work around it in profane ways :(
This is affecting us as well. What's the status on the two PRs that could solve this?
+1