Vector: Update sink encoding to be non optional

Created on 27 Aug 2019  路  8Comments  路  Source: timberio/vector

Most sinks allow the user to specify an encoding type. If this option is not provided the sinks will attempt to check each event and encode it based on if it was explicitly structured by the user. Due to precidence the current implementation of the match statement will attempt to encode the event as a structured event even if the user explicitly included the encoding = "text" option. This should be fixed by forcing the left tuple match of the boolean value to be None.

Update:

Based on the comments below, the sinks will be modified to contain explicit options to encode their events. This means that encoding is no longer optional and a user must select their desired encoding or the sink can not be built.

Implementation wise this should be a simple fix to remove the second tuple item in this match statement https://github.com/timberio/vector/blob/master/src/sinks/aws_cloudwatch_logs/mod.rs#L268 and to change the encoding config item to not be wrapped in an option. In a separate issue, we can consolidate encoding options.

Sinks that need this fix:

  • [x] aws_cloudwatch_logs
  • [x] aws_s3
  • [x] aws_kinesis_streams
  • [x] kafka
  • [x] splunk
  • [x] tcp
  • [x] All sinks should have tests for this behavior
good first issue aws_cloudwatch_logs aws_kinesis_streams aws_s3 kafka socket splunk_hec bug

All 8 comments

So in the process of upgrading the aws_s3 sink I realized that this behavior is very confusing. Let's take for example I write a custom transform in lua that adds some explicit field. I then set my output sink to encode as text, as a user, I would assume that I would get the explicit field I added in the text output. This is not the case since text encoding will just extract the MESSAGE field instead of the whole event structured map.

I brought this up with @lukesteensen offline and we both agreed that it may be a better user experience if we made encoding options _non-optional_. This would make it very explicit to the user what type of encoding happens. This would also remove the feature of having dynamic encoding which I think in reality is also very confusing. I believe by forcing the user to choose their output encoding type we can remove weird issues that may happen downstream and are at the cause of vector sending the wrong assumed encoding type.

We could also allow a third encoding option that is Encoding::Dynamic that a user can _explicitly_ choose to use, thus opting into the more implicit behavior.

Proposed changes:

  • Make encoding option _non-optional_
  • Add Encoding::Dynamic to allow the user to explicitly choose to dynamically encode events instead of doing it by default.

cc @binarylogic I'd like to hear your thoughts?

I'm 馃憤 on making this explicit. I'm questioning why we would keep the dynamic option at all? Why would a user explicitly set that?

Yeah, definitely leave out Dynamic. That's exactly what we want to get rid of :smile:

I then set my output sink to encode as text, as a user, I would assume that I would get the explicit field I added in the text output.

Can you provide an example of what this would look like? JSON is text, and it would encode the field, so I'm curious how you'd include that field when text is provided. Basically, I'm wondering if it makes more sense to rename text to message_only or something more clear?

Basically, I'm wondering if it makes more sense to rename text to message_only or something more clear?

This is a good point, the issue is that encoding as message_only is not really an "encoding". So maybe there is something we are missing here, maybe we really need to define what text encoding really is?

I think we've defined it, text encoding is just the message field value. I just wanted to clarify your comment because it alludes to including other non-message fields. So I'm confused.

@binarylogic Right, I was just stating that when I was writing the test that is kinda how I would have naturally assumed the output of text would be if I didn't know in advance that that wasn't the behavior. For this issue, I am not suggesting we change the behavior.

Got it. Yeah, I'm 馃憤 on making this explicit. It's clearly confusing. We can document that text is the message field only.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

LucioFranco picture LucioFranco  路  3Comments

raghu999 picture raghu999  路  3Comments

MOZGIII picture MOZGIII  路  3Comments

binarylogic picture binarylogic  路  4Comments

LucioFranco picture LucioFranco  路  3Comments