Describe the enhancement:
Once in a while people want to merge messages into a single event based not on a pattern but on the number of lines to be merged. This may be because there is no clear, usable pattern, or simply because they want to reduce the number of lines in a message by combining several. In some situations it can also be handy to combine the lines into a JSON array that can be consumed by other applications.
I propose introducing an extra multiline parameter, kind, that distinguishes this behavior. All the other parameters would of course remain valid, so in theory you could combine the pattern and max_lines parameters, although in practice I do not expect that.
The values of the kind parameter would be <<empty>> (default, the current implementation), merge, and merge-json, where merge-json combines the messages into a JSON array.
Describe a specific use case for the enhancement or feature:
It is useful when you know the number of lines of an event but there is no clear pattern.
For example, someone has dumped a database table one field per line. In that case you know the number of lines per row (= the number of columns), but creating a pattern for it may be hard. In this situation the configuration could be as follows:
```yaml
multiline.kind: "merge"
multiline.pattern: ".*"
multiline.match: "before"
multiline.negate: false
multiline.max_lines: 13
```
where 13 is the number of columns in a row. This creates a single event per row. If you chose merge-json instead, the lines would be combined into one JSON array.
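As an illustration of the intended difference (hypothetical data): for a row with three columns dumped as the lines 42, Alice, and alice@example.com, merge would produce one event with the three lines concatenated, while merge-json would produce an event containing a JSON array such as:

```json
["42", "Alice", "alice@example.com"]
```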
Another use case is that someone just wants to group a set of events that are similar. For example, the application creates a lot of events and you want to put them into buckets of 300 each, so that you can handle such a group as a single event. In that case the configuration could be as follows:
```yaml
multiline.kind: "merge"
multiline.pattern: ".*"
multiline.match: "before"
multiline.negate: false
multiline.max_lines: 300
```
A side effect of the merge and merge-json options is that no lines are discarded.
Pinging @elastic/integrations-services (Team:Services)
I am not sure I completely understand your request. Is merge-json a kind of multiline? If you would like to parse JSON, why not use the decode_json_fields processor in this case?
If you configure merge, do you still need the pattern-based multiline aggregation as well? Or do you just want to read every N lines into a single event from a file?
> I am not sure I completely understand your request. Is merge-json a kind of multiline? If you would like to parse JSON, why not use the decode_json_fields processor in this case?
Thanks for investigating this topic. The merge-json kind is about the output: it combines the given number of lines into a JSON-array event instead of a single concatenated event. This could be handy when the lines represent single fields, as in a database-table dump. So it does not refer to the format of the input lines.
> If you configure merge, do you still need the pattern-based multiline aggregation as well? Or do you just want to read every N lines into a single event from a file?
In theory you could use the pattern as well, but in practice I would expect it to just read every N lines into a single event from a file, so we could remove or hard-code the other parameters, as that would make the usage clearer and simpler.
I have opened this PR to add a new mode to the multiline reader to aggregate N lines: https://github.com/elastic/beats/pull/18352
With the following configuration you can aggregate 5 lines and parse the JSON:
```yaml
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/*.log
  multiline.type: count
  multiline.lines_count: 5

processors:
  - decode_json_fields:
      fields: ["message"]
      target: ""
      overwrite_keys: true
```
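Note that for the decode_json_fields step to succeed, the five aggregated lines need to form a valid JSON document once joined. For example, a pretty-printed object like this hypothetical input (exactly five lines) would parse:

```json
{
  "id": 42,
  "name": "Alice",
  "email": "alice@example.com"
}
```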
Does this solve your problem?
Thanks for your PR. I like your name count better than merge. I would keep max_lines as it is familiar to people, although count_lines describes the purpose better, so I would be happy with either. The JSON is not for the reader but for the writer: the idea is to concatenate the different lines into a single JSON array. I expect that your configuration will not work, as the concatenated lines will not be valid JSON. I will test it as well.
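To illustrate the concern (hypothetical input): with a field-per-line dump, aggregated lines such as

```text
42
Alice
alice@example.com
```

joined by newlines do not form a valid JSON document, so decode_json_fields would have nothing to parse.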
I think the change can be much smaller and I will create a PR (based on your PR) as soon as I have some time.
> I would keep max_lines as it is familiar to people, although count_lines describes the purpose better, so I would be happy with either.
I introduced a new option because max_lines does not describe the feature exactly. It implies that the number of lines might be smaller than the configured value. count_lines expresses that the number of lines must always be the same.
> The JSON is not for the reader but for the writer: the idea is to concatenate the different lines into a single JSON array. I expect that your configuration will not work, as the concatenated lines will not be valid JSON.
I am not sure why it does not fit your use case. Could you please share a few example logs so I can understand it?
> I am not sure why it does not fit your use case. Could you please share a few example logs so I can understand it?
I will test it first, and if it fails I will give you some examples.
I created a changelist in my own fork that contains an implementation with fewer changes. My changelist is based on this one, so comparing should be straightforward. It reuses the current multiline implementation, so if that is not preferred, the implementation of this PR can be used. I also fixed the Go test and the Python test.
Your approach leads to a smaller changeset. However, I do not want to add more complexity to the already pretty complicated pattern-based matcher of the multiline reader. So I would rather go with my own solution. I hope that is fine with you. :)
I am looking forward to seeing the results of your tests.
> I hope that is fine with you. :)
Absolutely. I will test and let you know the results when I have some time.
Hi, I tested your change and the concatenation of the lines works fine. Thanks.
As expected, the JSON decoder fails. This is because the lines are concatenated with a newline character, and even if you replaced that with a comma, I would expect it to fail, as the surrounding brackets {} or [] are missing. Find below some sample input files:
and the filebeat.yml:
As the json extension would make the feature harder to understand, and we can do without it by post-processing the events using the newline as a separator, I am fine with only this improvement.
I added a new option, skip_newline. If you set it to true, the newline character is not added between the concatenated lines.
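A sketch of the combined configuration for the database-dump use case from above, using the option names mentioned in this thread (the option names in the released version may differ):

```yaml
multiline.type: count
multiline.lines_count: 13     # number of columns per row, as in the earlier example
multiline.skip_newline: true  # new option: do not insert a newline between merged lines
```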
Thanks all. I just integrated filebeat 7.9.x version (which contains this change) in our system and it works like a charm. Thanks again.