# filebeat 6.2.2
```yaml
filebeat.prospectors:
- paths: [ "/var/lib/docker/containers/*/*-json.log" ]
  harvester_buffer_size: 65536
  json.message_key: log
  json.keys_under_root: true
  json.add_error_key: true
  fields_under_root: true

processors:
- add_docker_metadata: ~

output.file:
  path: "/tmp/filebeat"
  filename: filebeat
```
How to reproduce:
use my config, start a container and run the following:
```bash
#!/bin/bash
key="somerandomkey_"
value="somerandomvalue_"
echo -n '{'
for i in $(seq 420); do
  echo -n "\"${key}${i}\":\"${value}${i}\","
done
echo '"lastkey":"end"}'
```
This produces valid JSON output. For me the message is cut off at "somerand" and a new message continues with "omkey_396".
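For a rough sense of scale (my own back-of-the-envelope estimate, not from the original report): each pair like `"somerandomkey_400":"somerandomvalue_400",` is roughly 40 bytes, so 420 pairs come to about 16.8 KB, just past a 16 KiB limit. Assuming the script above is saved as `genline.sh` (a name I made up), the line length can be checked like this:

```bash
# Print the byte length of the generated JSON line; anything above
# 16384 bytes (16 KiB) gets split by the docker json-file logger.
bash genline.sh | wc -c
```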
We also tried the following config and it does not fix the issue (both with and without the harvester_buffer_size option):
```yaml
filebeat.prospectors:
- type: docker
  containers.ids:
  - "*"

processors:
- add_docker_metadata: ~
```
@paltryeffort https://discuss.elastic.co/t/filebeat-splits-message-after-16k/123718/4
+1
We encounter this with some messages as well. Can it be fixed?
I initially tried to reproduce this without docker to see if it is something in Filebeat itself or more likely related to Docker. As I didn't manage to reproduce it with local files I started to investigate a bit deeper into Docker. It seems this breaking change was introduced with 1.13 and is kind of a known issue: https://github.com/moby/moby/issues/34855 With https://github.com/moby/moby/pull/34888 it was made possible for log drivers to make the buffer size configurable, but if no buffer size is set it will fall back to the 16k we see above.
As Filebeat reads the files from disk I assume the lines are already split up in the log file which docker created. I need to further investigate if the above is correct and if there is a config option in the json-file driver to change the buffer size.
I did a test with the json-file logger and indeed in the file itself the log message is split up across two lines, which means Filebeat cannot do much else than read the file line by line. I looked at the BufSize interface and the only logger which makes use of it is the cloudwatchlogs logger: https://github.com/moby/moby/search?utf8=%E2%9C%93&q=BufSize&type= Based on this I assume this option does not exist for the json-file logger. There seem to be some other logging options like max-size, but they only seem to influence the overall rollover size of the file. https://docs.docker.com/config/containers/logging/json-file/
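To see the split at the Docker level, independent of Filebeat, something along these lines can be used (just a sketch; the container name, the ~20 KB payload, and the need for root access to /var/lib/docker are my assumptions):

```bash
# Emit a single ~20 KB line from a throwaway container, then look at the raw
# json-file log: the line shows up as two JSON records, the first truncated
# at 16 KiB.
docker run --name longline --rm -d alpine \
  sh -c 'head -c 20000 /dev/zero | tr "\0" "a"; echo; sleep 60'
sudo sh -c "cat /var/lib/docker/containers/$(docker inspect -f '{{.Id}}' longline)/*-json.log"
```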
I've noticed the docker json-file format doesn't add a \r\n to messages when it splits them, so we could potentially detect that and do the join (see the sketch after the example). Example:
{"log":"\"1\":\"1\",\"2\":\"2\",\"3\":\"3\", ... ,\"1328\":\"1328\",\"1329\"","stream":"stdout","time":"2018-03-26T10:33:19.796343111Z"}
{"log":":\"1329\",\"1330\":\"1330\" ... ,\"1419\":\"1419\",\"1420\":\"1420\",/ # \r\n","stream":"stdout","time":"2018-03-26T10:33:53.840245637Z"}
@exekias
If I understand this correctly, docker adds a PARTIAL flag to messages when they are truncated.
Here is how the docker journald plugin handles this:
https://github.com/moby/moby/blob/80edccda708c6771824c245b4d634c2a9ce29795/daemon/logger/journald/journald.go#L110
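For containers using the journald log driver, that flag surfaces as a journal field (the field name below is taken from the linked code; the container name is just a placeholder), so it can be inspected roughly like this:

```bash
# Show journal entries for a container together with the partial flag set by
# the docker journald log driver.
journalctl CONTAINER_NAME=mycontainer -o json \
  | jq '{msg: .MESSAGE, partial: .CONTAINER_PARTIAL_MESSAGE}'
```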
@f0 That is interesting. But as this is internal it is up to the specific logger to make use of it. Do you know if the json-file logger has something similar?
@ruflin I checked with 17.09.1-ce; it seems that the json-file logger does not have this flag.
I can confirm that the log ends with \r\n
So does a ticket have to be opened with moby for this?
@ruflin @f0 What seems more interesting is this work to add partial message metadata. It might be better to base the Filebeat work off this: https://github.com/moby/moby/pull/35831
@wallies Thanks for the link. I only skimmed through the PR but it seems in case of partial data the json logger would skip the newline and put the second object on the same line. It would mean we have 2 (or more) json objects on the same line. Not sure yet how we would deal with it on our end but I'm sure we would find a way.
From a Filebeat perspective I would prefer it if the json logger had a configurable buffer size so we would not have to glue things together. I assume a PR could be opened against the json-file logger to add support for it, but I don't know if that would be an option from the moby project's perspective. Does anyone know more here?
@bitsofinfo It is probably worth opening an issue about buffer support in the json-file logger to get a feeling for whether a PR in this direction would have a chance in the moby project.
@ruflin Would it have more weight if someone from Elastic opened such an issue? I'm not sure which moby project or area of code to reference and point them to. I'm also concerned that any moby-side solution will take months, if not more than a year, to actually get implemented :/ Can nothing be done in beats to handle this?
As long as there is no flag or indication that the same event is spread across multiple lines, there is not much beats can do.
I think the quickest way to get a change in is to open a PR and get it merged, as I doubt someone would open a PR based on the GitHub issue alone if there is no real urgency for it. Whether the PR is opened by a Beats team member or an external contributor should not, I assume, make a difference. It would probably be best if it were opened by someone who has made a few changes to moby / docker in the past.
Right now filebeat fully depends on the json-file logger. There are other outputs like journald which might be supported by filebeat in the future; at least we are investigating it. This is also a long shot, but perhaps some of these outputs already have these features built in.
@bitsofinfo The biggest pushback I expect from the moby contributors is the question of why someone would have 16 KB in a single log line (based on the comments in the thread). The good news is that some PRs to make this possible were already accepted, so this objection might be obsolete. But the changes never made it into the default logger so far.
Added an issue at https://github.com/moby/moby/issues/36777, please chime in there.
Is there any way to fix this in Filebeat without waiting for movement in moby/moby?
Even if they add mechanisms that Filebeat can hook into to piece partial messages back together, it's not a given that a new Docker release can be rolled out everywhere.
Kubernetes usually states a supported Docker engine version that is a few releases behind, not everybody can freely choose their Docker version, and so on.
I think we can fix this by checking the string ending, I'll do some tests
+1 the changes will likely never happen in moby, or at least not within a reasonable timeframe
I've opened https://github.com/elastic/beats/pull/6967 to handle partial messages when using the docker prospector. It seems to work for me; additional confirmation would be nice.
@exekias I'd love to help validating this. Can you throw a snapshot into a registry somewhere?
I've pushed a build of my branch to exekias/filebeat:6967. Of course it is not intended for production, just for testing this; take into account that it's based on master.
Settings should look like this (take partial.enabled into account):
```yaml
prospectors:
- type: docker
  containers:
    partial: true
    ids:
    - '*'
```
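For anyone else who wants to try it, one way to run the test image against the host's docker logs might look like this (the mount paths and the filebeat.yml name are my assumptions, not from the comment above):

```bash
# Run the test build with a local filebeat.yml that contains the docker
# prospector snippet above, mounting the host's container logs read-only.
docker run --rm \
  -v /var/lib/docker/containers:/var/lib/docker/containers:ro \
  -v "$PWD/filebeat.yml:/usr/share/filebeat/filebeat.yml:ro" \
  exekias/filebeat:6967
```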
@exekias I've tested your build and it works like magic. I hit the limit in ES, where an indexed field cannot contain more than 32k characters, but when I split the message into multiple fields, I could reach 300k with no problems.
Thanks a lot for the quick action.