S3 support was released in 1.6; however, there are a number of outstanding requests for improvements in the original ticket: https://github.com/fluent/fluent-bit/issues/1004
Please comment with new S3 feature requests here.
@elrob 's requests: https://github.com/fluent/fluent-bit/issues/1004#issuecomment-711627228
My response: https://github.com/fluent/fluent-bit/issues/1004#issuecomment-711672125
make the position of the automatically added object key suffix configurable, so it is possible to have the key end with an extension (e.g. -objectqZ7jv9Qt.jsonl)
For this one, I'm considering adding another special format string in the S3 key, $UUID (or maybe $RANDOM), which would give you some number of random characters. If you enable use_put_object, then having $UUID in the S3 key would be required.
That's not a perfect solution though...
The PutObject API is called under two circumstances: when the buffered data is too small to be worth a multipart upload, and when use_put_object is enabled. In both cases I want to force some sort of UUID interpolation to ensure the key is unique. I suppose one thing I could do is split the S3 key on . and add the UUID before the last piece (if the key contains dots). That way, if you have an S3 key of the form something.extension, the UUID will come before the extension.
Another option would just be to include the $UUID special format string and require that it is always used.
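To make the split-on-dot idea concrete, here is a rough sketch in Python (add_uuid_to_key is a hypothetical name for illustration only; the plugin's actual implementation would be in C):

```python
import random
import string

def add_uuid_to_key(key: str, n: int = 8) -> str:
    """Insert n random characters before the final extension of an S3 key.

    'logs/app.jsonl' -> 'logs/app-Ab3xY9Qz.jsonl'
    'logs/app'       -> 'logs/app-Ab3xY9Qz'   (no dot: append at the end)

    Note: this naive split would also match a dot in a directory name;
    a real implementation would only split the final path segment.
    """
    uuid = ''.join(random.choices(string.ascii_letters + string.digits, k=n))
    base, dot, ext = key.rpartition('.')
    if dot:  # key contains a dot: place the random part before the extension
        return f"{base}-{uuid}.{ext}"
    return f"{key}-{uuid}"
```

This keeps keys like something.extension recognizable to downstream tools that infer the format from the extension, at the cost of being a little surprising if the key contains dots elsewhere.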
Thoughts?
@shailegu requests pre-signed URLs: https://github.com/fluent/fluent-bit/issues/1004#issuecomment-671485657
I am doubtful about the use case, though; I think pre-signed URLs are one-time use only, which does not really fit a project like Fluent Bit that is meant to continually upload data.
Supporting parquet as an output format was requested as well: https://github.com/fluent/fluent-bit/issues/1004#issuecomment-606199771
@PettitWesley Thank you. I think adding the UUID part before the last . is a reasonable solution, as long as it is documented so it doesn't surprise people, or, ideally, can be toggled. Alternatively, having a $UUID part within the key format and making it mandatory would work fine for me. That is probably the most flexible solution with the least surprises.
Hi @PettitWesley I came across an issue when configuring S3 Output to use an Object Lock enabled S3 bucket. Would it be possible to include the Content-MD5 header with requests?
From the AWS Object Lock doc...
If you configure a default retention period on a bucket, requests to upload objects in such a bucket must include the Content-MD5 header
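For reference, the Content-MD5 value S3 expects is the base64 encoding of the raw 16-byte MD5 digest of the request body, not the hex digest. A minimal sketch of computing it (content_md5 is an illustrative helper, not plugin code):

```python
import base64
import hashlib

def content_md5(body: bytes) -> str:
    """Return the Content-MD5 header value S3 expects:
    base64 of the raw 16-byte MD5 digest of the request body."""
    return base64.b64encode(hashlib.md5(body).digest()).decode('ascii')

# The plugin would then send this alongside PutObject, e.g.:
# headers['Content-MD5'] = content_md5(chunk_bytes)
```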
The Fluent Bit debug logs confirm the documented behavior:
[2020/10/27 18:46:22] [debug] [output:s3:s3.0] Running upload timer callback..
[2020/10/27 18:46:22] [debug] [aws_credentials] Requesting credentials from the env provider..
[2020/10/27 18:46:23] [debug] [http_client] server s3.us-west-2.amazonaws.com:443 will close connection #37
[2020/10/27 18:46:23] [debug] [aws_client] s3.us-west-2.amazonaws.com: http_do=0, HTTP Status: 400
[2020/10/27 18:46:23] [debug] [aws_client] Unable to parse API response- response is not valid JSON.
[2020/10/27 18:46:23] [debug] [output:s3:s3.0] PutObject http status=400
[2020/10/27 18:46:23] [error] [output:s3:s3.0] PutObject API responded with error='InvalidRequest', message='Content-MD5 HTTP header is required for Put Object requests with Object Lock parameters'
[2020/10/27 18:46:23] [error] [output:s3:s3.0] Raw PutObject response: HTTP/1.1 400 Bad Request
x-amz-request-id: xxx
x-amz-id-2: xxx
Content-Type: application/xml
Transfer-Encoding: chunked
Date: Tue, 27 Oct 2020 18:46:22 GMT
Connection: close
Server: AmazonS3
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>InvalidRequest</Code><Message>Content-MD5 HTTP header is required for Put Object requests with Object Lock parameters</Message><RequestId>xxx</RequestId><HostId>xxx</HostId></Error>
[2020/10/27 18:46:23] [error] [output:s3:s3.0] PutObject request failed
[2020/10/27 18:46:23] [error] [output:s3:s3.0] Could not send chunk with tag syslog.0
Thanks!
- gzip compression support (I know this is mentioned above, but it is currently a blocker for me migrating to fluent-bit with S3, so I want to express my desire for it)
@PettitWesley Can we expect the gzip compression support for the S3 output plugin to be added anytime soon? It's the only impediment for our team migrating to using fluent-bit with S3
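For anyone curious what the requested behavior amounts to, a minimal sketch of compressing a chunk before upload (compress_chunk is a hypothetical helper; the real plugin would do this in C and would also need to adjust the object key and content metadata):

```python
import gzip

def compress_chunk(data: bytes) -> bytes:
    """Gzip-compress a buffered log chunk before the S3 upload.
    Objects uploaded this way are typically given a .gz key suffix
    (or a Content-Encoding: gzip header) so readers decompress them."""
    return gzip.compress(data)
```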
@PettitWesley Per our discussion on Slack - it's pretty important that the S3 plugin be able to set the ACL on the files it's uploading to S3. Without that, you cannot do cross-account writing safely, even with the https://docs.aws.amazon.com/AmazonS3/latest/user-guide/add-object-ownership.html feature. At a minimum, there should be support for the canned "bucket-owner-full-control" ACL. Better would be for us to be able to configure the ACL applied to the files. I think this should be a pretty simple change overall.
General note: I cannot make any definite promises on the timeline, but we are watching this issue, and my team and I will be making our way through these requests over the next few weeks and months.
@diranged Hi, I am from Wesley's team and am working on the issue you mentioned above, supporting ACLs in S3. Do you think canned ACLs are good enough (https://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html)? That is, you would request the canned ACL you need and we would apply it to the objects we upload to your bucket. Or are canned ACLs insufficient, and you want to grant permissions to specific users or AWS accounts?
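For context, a canned ACL is applied by sending the x-amz-acl request header on PutObject (or on multipart upload creation). A minimal sketch (put_object_headers is an illustrative helper, not plugin code):

```python
def put_object_headers(canned_acl: str = "bucket-owner-full-control") -> dict:
    """Extra headers a PutObject request would carry to apply a canned ACL.
    S3 reads the canned ACL from the x-amz-acl request header;
    bucket-owner-full-control is the one that makes cross-account
    writes readable by the destination bucket's owner."""
    return {"x-amz-acl": canned_acl}
```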
Hi @PettitWesley, thanks so much for your work on the S3 plugin. Just wondering if the compression is still being worked on? 😄
@hawkesnc it has been merged but not released IIRC.
CC @zhonghui12
Some discussion on s3_key_format in #2905