Vector: aws_s3 sink is not setting a Content-Type on uploaded files

Created on 9 Jun 2020 · 6 Comments · Source: timberio/vector

I'm trying out replacing our current fluentd log shipping system with vector, and I noticed your aws_s3 sink is not setting any Content-Type on uploaded log files. AWS then fills in a default of application/octet-stream, which, while sufficient for downloading S3 files via a CLI/API, isn't so nice an experience in the browser (e.g. with the AWS S3 Console).

Can I suggest hard coding text/plain; charset=utf8 as the best option, even for encoding.codec = "ndjson"? That is what is suggested in https://stackoverflow.com/questions/51690624/json-lines-mime-type

I'm guessing the code to change is in https://github.com/timberio/vector/blob/ee953b7f674f9cd4d844928fc06d2360361fe7a2/src/sinks/aws_s3.rs#L291-L308 but don't know how you'd feel about a hard-coded value...

must aws_s3 bug

All 6 comments

Thanks for reporting @tyrken. We'll prioritize and get this fixed.

@tyrken thank you for the issue and pull request. I experimented a little with content-type / content-encoding too:


Commands:

aws s3api put-object --bucket s3-issue-2769 --body data.log --key data1.log --content-type text/x-log > /dev/null
aws s3api put-object --bucket s3-issue-2769 --body data.log.gz --key data2.log.gz --content-type application/x-gzip > /dev/null

aws s3api put-object --bucket s3-issue-2769 --body data.log --key data3.log --content-type binary/octet-stream > /dev/null
aws s3api put-object --bucket s3-issue-2769 --body data.log.gz --key data4.log.gz --content-type binary/octet-stream > /dev/null
aws s3api put-object --bucket s3-issue-2769 --body data.log.gz --key data5.log.gz --content-type binary/octet-stream --content-encoding gzip > /dev/null

aws s3api put-object --bucket s3-issue-2769 --body data.log --key data6.log --content-type text/plain > /dev/null
aws s3api put-object --bucket s3-issue-2769 --body data.log.gz --key data7.log.gz --content-type text/plain > /dev/null
aws s3api put-object --bucket s3-issue-2769 --body data.log.gz --key data8.log.gz --content-type text/plain --content-encoding gzip > /dev/null

I tried all files in Firefox (77.0.1) and Chrome (84.0.4147.45) on Fedora 32.

Uncompressed files opened in Firefox only with content-type text/plain (in Chrome, also with content-type text/x-log).
With gzip compression, only content-type text/plain + content-encoding gzip opened in both browsers.
I think we should leave the current content-encoding option and add content-type text/plain. What do you think?

Uncompressed - I don't like text/plain, as when Downloading it renames the file from a ".log" extension to ".txt" with Chrome/Firefox on both Linux & Windows. OTOH text/x-log doesn't rename but still opens nicely, so I'd +1 to that.

Compressed: I can see why you suggest text/plain and content-encoding: gzip as both Open & Download appear to work - as the browser/AWS decompresses the file as it downloads. Hence from a starting data.log.gz Download gives you a data.txt that works as a text file (albeit actually being ndjson, not plain text). However I regard that as bad for the Download case as I expect a file called data.log.gz.

text/x-log and content-encoding: gzip together are worse, as while Open still works there is no renaming of the file extensions so Download gives you a data.log.gz which is plain text, not actually a gzip. When you click to open it in a file explorer that launches some compressed-file-viewer which says it's corrupt.
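The mismatch described above can be reproduced locally without S3 or a browser. This is only a rough simulation: `gzip -dc` stands in for the browser decoding the body because of Content-Encoding: gzip, and the file names and temporary directory are made up for illustration.

```shell
# Local simulation only: no S3 or browser is involved.
workdir=$(mktemp -d)
mkdir "$workdir/downloads"

# A small ndjson log and its gzip-compressed form, as it would be uploaded.
printf '{"message":"hello"}\n{"message":"world"}\n' > "$workdir/data.log"
gzip -c "$workdir/data.log" > "$workdir/upload.log.gz"

# Stand-in for the browser: the body is decoded during transfer,
# but the saved file keeps the object's name, data.log.gz.
gzip -dc "$workdir/upload.log.gz" > "$workdir/downloads/data.log.gz"

# The downloaded "archive" is byte-for-byte the original plain text...
if cmp -s "$workdir/data.log" "$workdir/downloads/data.log.gz"; then
  same_as_plain=yes
else
  same_as_plain=no
fi

# ...so gzip's integrity test rejects it, much like the compressed-file viewer.
if gzip -t "$workdir/downloads/data.log.gz" 2>/dev/null; then
  valid_gzip=yes
else
  valid_gzip=no
fi

echo "same as plain text: $same_as_plain, valid gzip: $valid_gzip"
```

The final line should report that the saved file matches the plain text and is not a valid gzip, which is the "corrupt" result described above.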

Personally I'd prefer "Download" mean "Download" (no renaming or magic decompression) & value that over the convenience of "Open", given most professionals will have an OS set up to open compressed files easily. So I'd vote for NOT putting the content-encoding metadata on for compression.
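As a sketch of that preference, in terms of the aws CLI commands used earlier in the thread: the helper below is hypothetical (its name, `metadata_flags`, is invented here), and pairing application/x-gzip with compressed keys is an assumption borrowed from the data2.log.gz test, not something decided in this issue. It never emits --content-encoding.

```shell
# Hypothetical helper: prints the put-object metadata flags for a given key.
# Compressed objects get a gzip content type and no --content-encoding,
# so "Download" returns the real .gz file; plain logs get text/x-log.
metadata_flags() {
  case "$1" in
    *.gz) echo "--content-type application/x-gzip" ;;
    *)    echo "--content-type text/x-log" ;;
  esac
}

metadata_flags data.log
metadata_flags data.log.gz
```

It could then be spliced into an upload, for example: aws s3api put-object --bucket s3-issue-2769 --body data.log.gz --key data.log.gz $(metadata_flags data.log.gz)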

@tyrken I made these values configurable, so it's possible to set any value.

Thanks - that's the best we can do for now.

I'll update here/open another issue if AWS Support come back with any better suggestions. So far they've mentioned using Content-Disposition but it's not helped yet, awaiting 2nd line...

FYI AWS support have replied saying the only thing they do is override the Content-Disposition response header to be attachment for the "Download" button and inline for the "Open" button. All other behaviour, especially the decompression-during-transfer, is the response of the web browser (Chrome/Firefox) to the standard Content-Encoding: gzip header.

They have registered as a Feature Request the ability to override the Download button to set Content-Disposition: attachment; filename="dummy.log", but I wouldn't hold your breath...

So thanks for your efforts & looking forward to the next 0.10 beta release to try the config items out...
