# filebeat 6.2.2
```yaml
filebeat.prospectors:
- paths: [ "/var/lib/docker/containers/*/*-json.log" ]
  harvester_buffer_size: 65536
  json.message_key: log
  json.keys_under_root: true
  json.add_error_key: true
  fields_under_root: true

processors:
- add_docker_metadata: ~

output.file:
  path: "/tmp/filebeat"
  filename: filebeat
```
How to reproduce:
use my config, start a container and run the following:
```bash
#!/bin/bash
key="somerandomkey_"
value="somerandomvalue_"
echo -n '{'
for i in $(seq 420); do
  echo -n "\"${key}${i}\":\"${value}${i}\","
done
echo '"lastkey":"end"}'
```
This produces valid JSON output. For me the message is cut off at "somerand" and a new message continues with "omkey_396".
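For a rough sense of scale (my own back-of-the-envelope estimate, not from the original report): each pair like `"somerandomkey_400":"somerandomvalue_400",` is roughly 40 bytes, so 420 pairs come to about 16.8 KB, just past a 16 KiB limit. Assuming the script above is saved as `genline.sh` (a name I made up), the line length can be checked like this:

```bash
# Print the byte length of the generated JSON line; anything above
# 16384 bytes (16 KiB) gets split by the docker json-file logger.
bash genline.sh | wc -c
```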
We also tried the following config and it does not fix the issue (both with and without the harvester_buffer_size option):
```yaml
filebeat.prospectors:
- type: docker
  containers.ids:
  - "*"

processors:
- add_docker_metadata: ~
```
@paltryeffort https://discuss.elastic.co/t/filebeat-splits-message-after-16k/123718/4
+1
We encounter this with some messages as well. Can it be fixed?
I initially tried to reproduce this without docker to see if it is something in Filebeat itself or more likely related to Docker. As I didn't manage to reproduce it with local files I started to investigate a bit deeper into Docker. It seems this breaking change was introduced with 1.13 and is kind of a known issue: https://github.com/moby/moby/issues/34855 With https://github.com/moby/moby/pull/34888 it was made possible for log drivers to make the buffer size configurable, but if no buffer size is set it will fall back to the 16k we see above.
As Filebeat reads the files from disk I assume the lines are already split up in the log file which docker created. I need to further investigate if the above is correct and if there is a config option in the json-file driver to change the buffer size.
I did a test with the json-file logger and indeed in the file itself the log message is split up across two lines, which means Filebeat cannot do much else than read the file line by line. I looked at the BufSize interface and the only logger which makes use of it is the cloudwatchlogs logger: https://github.com/moby/moby/search?utf8=%E2%9C%93&q=BufSize&type= Based on this I assume this option does not exist for the json-file logger. There seem to be some other logging options like max-size, but they only seem to influence the overall rollover size of the file. https://docs.docker.com/config/containers/logging/json-file/
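To see the split at the Docker level, independent of Filebeat, something along these lines can be used (just a sketch; the container name, the ~20 KB payload, and the need for root access to /var/lib/docker are my assumptions):

```bash
# Emit a single ~20 KB line from a throwaway container, then look at the raw
# json-file log: the line shows up as two JSON records, the first truncated
# at 16 KiB.
docker run --name longline --rm -d alpine \
  sh -c 'head -c 20000 /dev/zero | tr "\0" "a"; echo; sleep 60'
sudo sh -c "cat /var/lib/docker/containers/$(docker inspect -f '{{.Id}}' longline)/*-json.log"
```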
I've noticed the docker json-file format doesn't add a \r\n to messages when it splits them, so we could potentially detect that and do the join (see the sketch after the example). Example:
{"log":"\"1\":\"1\",\"2\":\"2\",\"3\":\"3\", ... ,\"1328\":\"1328\",\"1329\"","stream":"stdout","time":"2018-03-26T10:33:19.796343111Z"}
{"log":":\"1329\",\"1330\":\"1330\" ... ,\"1419\":\"1419\",\"1420\":\"1420\",/ # \r\n","stream":"stdout","time":"2018-03-26T10:33:53.840245637Z"}
@exekias
If I understand this correctly, docker adds a PARTIAL flag to messages when they are truncated.
Here is how the docker journald plugin handles this:
https://github.com/moby/moby/blob/80edccda708c6771824c245b4d634c2a9ce29795/daemon/logger/journald/journald.go#L110
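For containers using the journald log driver, that flag surfaces as a journal field (the field name below is taken from the linked code; the container name is just a placeholder), so it can be inspected roughly like this:

```bash
# Show journal entries for a container together with the partial flag set by
# the docker journald log driver.
journalctl CONTAINER_NAME=mycontainer -o json \
  | jq '{msg: .MESSAGE, partial: .CONTAINER_PARTIAL_MESSAGE}'
```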
@f0 That is interesting. But as this is internal it is up to the specific logger to make use of it. Do you know if the json-file logger has something similar?
@ruflin I checked with 17.09.1-ce; it seems that the json-file logger does not have this flag.
I can confirm that the log ends with \r\n
So does a ticket have to be opened with moby for this?
@ruflin @f0 What seems more interesting is this work to add partial message metadata. It might be better to base the Filebeat work off this: https://github.com/moby/moby/pull/35831
@wallies Thanks for the link. I only skimmed through the PR but it seems in case of partial data the json logger would skip the newline and put the second object on the same line. It would mean we have 2 (or more) json objects on the same line. Not sure yet how we would deal with it on our end but I'm sure we would find a way.
From a Filebeat perspective I would prefer it if the json logger had a configurable buffer size so we would not have to glue things together. I assume a PR could be opened against the json-file logger to add support for it, but I don't know if that would be an option from the moby project's perspective. Does anyone know more here?
@bitsofinfo It is probably worth opening an issue about buffer support in the json-file logger to get a feeling for whether a PR in this direction would have a chance in the moby project.
@ruflin Would it have more weight if someone from Elastic opened such an issue? I'm not sure which moby project or area of code to reference and point them to. I'm also concerned that any moby-side solution will take months, if not more than a year, to actually get implemented :/ Can nothing be done in beats to handle this?
As long as there is no flag or indication that the same event is spread across multiple lines, there is not much beats can do.
I think the quickest way to get a change in is to open a PR and get it merged, as I doubt someone would open a PR based on the GitHub issue alone if there is no real urgency for it. Whether the PR is opened by a Beats team member or an external contributor should not, I assume, make a difference. It would probably be best if it were opened by someone who has made a few changes to moby / docker in the past.
Right now filebeat fully depends on the json-file logger. There are other outputs like journald which might be supported by filebeat in the future; at least we are investigating it. This is also a long shot, but perhaps some of these outputs already have these features built in.
@bitsofinfo The biggest pushback I expect from the moby contributors is the question of why someone would have 16 KB in a single log line (based on the comments in the thread). The good news is that some PRs to make this possible were already accepted, so this objection might be obsolete. But the changes never made it into the default logger so far.
Added an issue at https://github.com/moby/moby/issues/36777, please chime in there.
Is there any way to fix this in Filebeat without waiting for movement in moby/moby?
Even if they add mechanisms that Filebeat can hook into to piece partial messages back together, it's not a given that a new Docker release can be rolled out everywhere.
Kubernetes usually states a supported Docker engine version that is a few releases behind, not everybody can freely choose their Docker version, and so on.
I think we can fix this by checking the string ending, I'll do some tests
+1 the changes will likely never happen in moby, or at least not within a reasonable timeframe
I've opened https://github.com/elastic/beats/pull/6967 to handle partial messages when using the docker prospector. It seems to work for me; additional confirmation would be nice.
@exekias I'd love to help validating this. Can you throw a snapshot into a registry somewhere?
I've pushed a build of my branch to exekias/filebeat:6967. Of course it is not intended for production, just for testing this; take into account that it's based on master.
Settings should look like this (take partial.enabled into account):
```yaml
prospectors:
- type: docker
  containers:
    partial: true
    ids:
    - '*'
```
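For anyone else who wants to try it, one way to run the test image against the host's docker logs might look like this (the mount paths and the filebeat.yml name are my assumptions, not from the comment above):

```bash
# Run the test build with a local filebeat.yml that contains the docker
# prospector snippet above, mounting the host's container logs read-only.
docker run --rm \
  -v /var/lib/docker/containers:/var/lib/docker/containers:ro \
  -v "$PWD/filebeat.yml:/usr/share/filebeat/filebeat.yml:ro" \
  exekias/filebeat:6967
```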
@exekias I've tested your build and it works like magic. I hit the limit in ES, where an indexed field cannot contain more than 32k characters, but when I split the message into multiple fields, I could reach 300k with no problems.
Thanks a lot for the quick action.