Fluent-bit: Docker_mode to recombine multiline records in json-log from docker

Created on 15 Feb 2019 · 31 comments · Source: fluent/fluent-bit

Problem
If an application in Kubernetes logs a multiline message, Docker splits it into multiple json-log entries.

The actual output from the application

[2019-02-15 10:36:31.224][38][debug][http] source/common/http/conn_manager_impl.cc:521] [C463][S12543431219240717937] request headers complete (end_stream=true):
':authority', 'customer1.demo1.acme.us'
':path', '/api/config/namespaces/test/routes'
':method', 'GET'
'user-agent', 'Go-http-client/1.1'
'cookie', 'X-ACME-GW-AUTH=eyJpc3N1ZWxxxxxxxx948b94'
'accept-encoding', 'gzip'
'connection', 'close'

In the Docker log, which Fluent Bit's in_tail then parses, this becomes (the example differs from the one above):

{"log":"[2019-02-15 11:00:08.688][9][debug][router] source/common/router/router.cc:303] [C0][S14319188767040639561] router decoding headers:\n","stream":"stderr","time":"2019-02-15T11:00:08.688733409Z"}
{"log":"':method', 'POST'\n","stream":"stderr","time":"2019-02-15T11:00:08.688736209Z"}
{"log":"':path', '/envoy.api.v2.ClusterDiscoveryService/StreamClusters'\n","stream":"stderr","time":"2019-02-15T11:00:08.688757909Z"}
{"log":"':authority', 'xds_cluster'\n","stream":"stderr","time":"2019-02-15T11:00:08.688760809Z"}
{"log":"':scheme', 'http'\n","stream":"stderr","time":"2019-02-15T11:00:08.688763609Z"}
{"log":"'te', 'trailers'\n","stream":"stderr","time":"2019-02-15T11:00:08.688766209Z"}
{"log":"'content-type', 'application/grpc'\n","stream":"stderr","time":"2019-02-15T11:00:08.688768809Z"}
{"log":"'x-envoy-internal', 'true'\n","stream":"stderr","time":"2019-02-15T11:00:08.688771609Z"}
{"log":"'x-forwarded-for', '192.168.6.6'\n","stream":"stderr","time":"2019-02-15T11:00:08.688774309Z"}
{"log":"\n","stream":"stderr","time":"2019-02-15T11:00:08.688777009Z"}

According to the documentation, Docker_Mode On shall "recombine split Docker log lines before passing them to any parser as configured above".
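For illustration only (this is not the actual C implementation in plugins/in_tail/tail_dockermode), the recombination Docker_Mode performs can be sketched as follows: Docker's json-file driver splits long application lines at roughly 16 KB, and only the final piece carries the trailing newline, so "log" values are buffered until one ends in \n.

```python
import json

def docker_mode_recombine(raw_lines):
    """Sketch of Docker_Mode-style recombination: buffer the "log"
    values of consecutive json-file entries until one ends with a
    newline, then emit the reassembled line."""
    buffered = ""
    for raw in raw_lines:
        buffered += json.loads(raw)["log"]
        if buffered.endswith("\n"):
            yield buffered
            buffered = ""
    if buffered:  # flush an unterminated trailing piece
        yield buffered

# A long line split into two entries, followed by a complete one:
split_entries = [
    '{"log":"first half... ","stream":"stdout","time":"2019-02-15T11:00:08Z"}',
    '{"log":"second half\\n","stream":"stdout","time":"2019-02-15T11:00:08Z"}',
    '{"log":"whole line\\n","stream":"stdout","time":"2019-02-15T11:00:09Z"}',
]
print(list(docker_mode_recombine(split_entries)))
# → ['first half... second half\n', 'whole line\n']
```

Note the distinction: this logic rejoins lines Docker split at its size limit, but the issue above is about application-level multiline messages, where every piece already ends with \n.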

I would expect it to apply to this case as well; however, it does not. My configuration is provided below.

Describe the solution you'd like

in_tail with Docker_Mode should be able to read Docker's json-log as a stream of the original text. The JSON parser here is just a pre-processor that buffers the "log" key, so that multiline regexp patterns can be applied afterwards.
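A minimal sketch of the requested behavior, assuming a hypothetical firstline pattern matching the envoy-style timestamps shown above (the pattern and function names are illustrative, not part of Fluent Bit):

```python
import json
import re

# Assumed start-of-record pattern for the envoy-style logs above (illustrative).
FIRSTLINE = re.compile(r"^\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")

def group_multiline(json_lines):
    """Pre-process docker json-log entries: extract the "log" text,
    start a new record whenever the firstline pattern matches, and
    append continuation lines to the current record."""
    record = ""
    for raw in json_lines:
        text = json.loads(raw)["log"]
        if FIRSTLINE.match(text) and record:
            yield record
            record = text
        else:
            record += text
    if record:
        yield record

entries = [
    '{"log":"[2019-02-15 11:00:08.688][9][debug][router] router decoding headers:\\n"}',
    "{\"log\":\"':method', 'POST'\\n\"}",
    '{"log":"[2019-02-15 11:00:09.001][9][debug][router] next record\\n"}',
]
records = list(group_multiline(entries))
print(len(records))  # → 2
```

The first two entries are merged into one record; the third timestamped line starts a new one.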

Describe alternatives you've considered

I believe this problem can be avoided if:

  1. docker logs are sent directly to fluentd (docker fluentd driver, https://docs.docker.com/config/containers/logging/fluentd/)
  2. docker logs are sent to journald/syslog, etc.

However:

  • on hosted Kubernetes platforms, you are not allowed to change the Docker logging driver (Azure AKS, for example).
  • on hosted environments, mixing your Docker logs with system logs (journal, syslog) is not desired.

Fluent Bit FILTERs are applied after parsing, so they cannot transform the stream early enough.

Additional context

The Fluent Bit config I am using:

  input-kubernetes.conf: |
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/*.log
        Parser            docker
        DB                /var/log/flb_kube.db
        Skip_Long_Lines   Off
        Docker_Mode       On
        Refresh_Interval  10
        Chunk_Size        32k
        Buffer_Max_Size   2M
  filter-kubernetes.conf: |
    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc.cluster.local:443
        Merge_Log           On
        K8S-Logging.Parser  On

    [PARSER]
        Name        docker
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   On
        # Command      |  Decoder | Field | Optional Action
        # =============|==================|=================
        Decode_Field_As   escaped_utf8    log    do_next
        Decode_Field_As   escaped         log    do_next
        Decode_Field_As   json            log


All 31 comments

Repository that can be used for testing: https://github.com/epcim/fluentbit-sandbox

Hey, I'm struggling with the same thing right now. Is there any additional feature or bug fix planned for this? Docker_Mode On is exactly what I want; my parsers can then extract fields. I'm struggling to find any solution for Spring Boot stack traces with Fluent Bit at all (using either Multiline or Docker_Mode). Any update or feedback would be appreciated.

I'm struggling with this right now. Do you have any solution for multiline logs on k8s?

I am also stuck on the same issue. The multiline log parser is not working in K8s.

I'm stuck with this as well; is there a set of input flags under which large input lines from Docker (over the 16 KB split limit) will work?

I'm also experiencing the same problem with not being able to parse multiline logs in a Kubernetes cluster. I have tried the solutions suggested in related threads in this repo but couldn't get them working.

My input config:

  input-kubernetes.conf: |
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/abc-*.log
        Parser            docker
        Parser_Firstline  multiline_parser_head
        Parser_1          multiline_parser_error
        Multiline         On
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     10MB
        Skip_Long_Lines   On
        Refresh_Interval  10

Parsers:

  parsers.conf: |
    [PARSER]
        Name        json
        Format      json
        Time_Key    time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name        docker
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   On

    [PARSER]
        Name        multiline_parser_head
        Format      regex
        Regex       /\d{4}-\d{1,2}-\d{1,2}/

    [PARSER]
        Name        multiline_parser_error
        Format      regex
        Regex       /(?<timestamp>[^ ]* [^ ]*) (?<level>[^\s]+:)(?<message>[\s\S]*)/
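For reference, the two multiline regexes above behave roughly like this. A hedged sketch: Fluent Bit uses Onigmo's (?<name>...) named-group syntax, which Python writes as (?P<name>...); the sample log lines are made up.

```python
import re

# Parser_Firstline: any line containing a yyyy-m-d date starts a new record.
head = re.compile(r"\d{4}-\d{1,2}-\d{1,2}")

# Parser_1, translated to Python's (?P<name>...) named-group syntax.
error = re.compile(
    r"(?P<timestamp>[^ ]* [^ ]*) (?P<level>[^\s]+:)(?P<message>[\s\S]*)"
)

assert head.search("2019-02-15 10:36:31,000 ERROR: boom")
assert not head.search("    at com.example.Foo.run(Foo.java:42)")  # continuation line

m = error.match("2019-02-15 10:36:31,000 ERROR: boom")
print(m.group("level"))  # → ERROR:
```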

Same issue here. Any news on this?

> I'm also experiencing the same problem with not being able to parse multiline logs in a Kubernetes cluster. I have tried solutions suggested in related threads in this repo but couldn't get it working. (full config quoted in the comment above)

@isurusiri Are you able to figure out any solution for this?

I have the same issue with multi-line JSON output in docker logs.

@isurusiri Based on my understanding of the documentation, the Parser directive is ignored by the tail input when Multiline is set to On. However, Parser_Firstline and Parser_N are not ignored.

Edit: Link to 1.3 documentation referenced above: https://docs.fluentbit.io/manual/v/1.3/input/tail#multiline

Hi there, any update on that issue? Thank you

Is fluentd the only alternative to fix this issue?

> Is fluentd the only alternative to fix this issue?

No, I'm using Elastic Filebeat for this and it works like a charm.

> No, I'm using Elastic Filebeat for this and it works like a charm.

Can you please share your solution to that? Thank you

I think @TomaszKlosinski is referring to using Elastic Filebeat as a log shipper instead of Fluent Bit.

Unfortunately, that only works if you're using the ELK stack, which is not much help to those of us using other products, e.g. Splunk.

@edsiper are you able to look into the issue? The original author of the PR that added this feature in https://github.com/fluent/fluent-bit/pull/863 is no longer on GitHub. I took a look at plugins/in_tail/tail_dockermode to see if I could help, but the lack of code comments and the use of opaque abbreviations make it pretty inaccessible.

I prepared some changes in the dockermode plugin (#2043). I still need to bring it in line with the contributing guide, but I believe it's worth a look; feedback is welcome.

Output for the input from this issue:

[0] containers.var.log.containers.test.log: [1585073268.000318200, {"log"=>"{"log":"[2019-02-15 11:00:08.688][9][debug][router] source/common/router/router.cc:303] [C0][S14319188767040639561] router decoding headers:\n':method', 'POST'\n':path', '/envoy.api.v2.ClusterDiscoveryService/StreamClusters'\n':authority', 'xds_cluster'\n':scheme', 'http'\n'te', 'trailers'\n'content-type', 'application/grpc'\n'x-envoy-internal', 'true'\n'x-forwarded-for', '192.168.6.6'\n\n","stream":"stderr","time":"2019-02-15T11:00:08.688777009Z"}"}]

@sumo-drosiek - any updates on merging this?

@sumo-drosiek any updates?

@collardmsc @vishiy Sorry for no updates. The PR has been reviewed. I'm working on the runtime tests and everything should be ready soon :)

@sumo-drosiek Thanks a bunch for your time and effort working on this!

PR is ready for another review.

:wave: just wondering if anyone has any updates on this? It'd really help me!

Looks like the issue was solved, brilliant. Which version number will support the new Docker_Mode_Parser field?

@Oduig AFAIK 1.5.0 supports Docker_Mode_Parser.

This is not documented yet, is it?
At least I cannot find anything about Docker_Mode_Parser in the official docs here:
https://docs.fluentbit.io/manual/pipeline/inputs/tail#docker_mode

@davelosert That's right. I haven't documented it yet.

I'm also stuck with fluent-bit and multiline logs in EKS... Has anyone found a solution/workaround for this? If so, I would appreciate your comments.

Hey @shake76 Did you find any solution yet? If yes, please share a sample config for the docker parser combined with a multiline parser.

@shake76 @ankit1mg Is something wrong with docker_mode_parser and EKS?

Hey folks, regarding the latest reports of logs not working: I wanted to check whether this might be an issue of CRI-format vs. Docker-format log parsing (https://docs.fluentbit.io/manual/installation/kubernetes#container-runtime-interface-cri-parser), or whether the ask is fully around multiline + Docker mode.
