Fluent-bit: Docker_mode to recombine multiline records in json-log from docker

Created on 15 Feb 2019 · 31 comments · Source: fluent/fluent-bit

Problem
If an application in Kubernetes logs a multiline message, Docker splits it into multiple json-log entries.

The actual output from the application

[2019-02-15 10:36:31.224][38][debug][http] source/common/http/conn_manager_impl.cc:521] [C463][S12543431219240717937] request headers complete (end_stream=true):
':authority', 'customer1.demo1.acme.us'
':path', '/api/config/namespaces/test/routes'
':method', 'GET'
'user-agent', 'Go-http-client/1.1'
'cookie', 'X-ACME-GW-AUTH=eyJpc3N1ZWxxxxxxxx948b94'
'accept-encoding', 'gzip'
'connection', 'close'

In the Docker log, which Fluent Bit's in_tail then parses, this becomes (the example differs from the one above):

{"log":"[2019-02-15 11:00:08.688][9][debug][router] source/common/router/router.cc:303] [C0][S14319188767040639561] router decoding headers:\n","stream":"stderr","time":"2019-02-15T11:00:08.688733409Z"}
{"log":"':method', 'POST'\n","stream":"stderr","time":"2019-02-15T11:00:08.688736209Z"}
{"log":"':path', '/envoy.api.v2.ClusterDiscoveryService/StreamClusters'\n","stream":"stderr","time":"2019-02-15T11:00:08.688757909Z"}
{"log":"':authority', 'xds_cluster'\n","stream":"stderr","time":"2019-02-15T11:00:08.688760809Z"}
{"log":"':scheme', 'http'\n","stream":"stderr","time":"2019-02-15T11:00:08.688763609Z"}
{"log":"'te', 'trailers'\n","stream":"stderr","time":"2019-02-15T11:00:08.688766209Z"}
{"log":"'content-type', 'application/grpc'\n","stream":"stderr","time":"2019-02-15T11:00:08.688768809Z"}
{"log":"'x-envoy-internal', 'true'\n","stream":"stderr","time":"2019-02-15T11:00:08.688771609Z"}
{"log":"'x-forwarded-for', '192.168.6.6'\n","stream":"stderr","time":"2019-02-15T11:00:08.688774309Z"}
{"log":"\n","stream":"stderr","time":"2019-02-15T11:00:08.688777009Z"}

According to the documentation, Docker_Mode On shall "recombine split Docker log lines before passing them to any parser as configured above".
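For illustration only (this is not the actual C implementation in plugins/in_tail/tail_dockermode), the recombination Docker_Mode performs can be sketched as follows: Docker's json-file driver splits long application lines at roughly 16 KB, and only the final piece carries the trailing newline, so "log" values are buffered until one ends in \n.

```python
import json

def docker_mode_recombine(raw_lines):
    """Sketch of Docker_Mode-style recombination: buffer the "log"
    values of consecutive json-file entries until one ends with a
    newline, then emit the reassembled line."""
    buffered = ""
    for raw in raw_lines:
        buffered += json.loads(raw)["log"]
        if buffered.endswith("\n"):
            yield buffered
            buffered = ""
    if buffered:  # flush an unterminated trailing piece
        yield buffered

# A long line split into two entries, followed by a complete one:
split_entries = [
    '{"log":"first half... ","stream":"stdout","time":"2019-02-15T11:00:08Z"}',
    '{"log":"second half\\n","stream":"stdout","time":"2019-02-15T11:00:08Z"}',
    '{"log":"whole line\\n","stream":"stdout","time":"2019-02-15T11:00:09Z"}',
]
print(list(docker_mode_recombine(split_entries)))
# → ['first half... second half\n', 'whole line\n']
```

Note the distinction: this logic rejoins lines Docker split at its size limit, but the issue above is about application-level multiline messages, where every piece already ends with \n.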

I would expect it to apply to this case as well; however, it does not. My configuration is provided below.

Describe the solution you'd like

in_tail with Docker_Mode should be able to read Docker's json-log as a stream of the original text. The JSON parser here is just a pre-processor that buffers the "log" key, so that multiline regexp patterns can be applied afterwards.
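A minimal sketch of the requested behavior, assuming a hypothetical firstline pattern matching the envoy-style timestamps shown above (the pattern and function names are illustrative, not part of Fluent Bit):

```python
import json
import re

# Assumed start-of-record pattern for the envoy-style logs above (illustrative).
FIRSTLINE = re.compile(r"^\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")

def group_multiline(json_lines):
    """Pre-process docker json-log entries: extract the "log" text,
    start a new record whenever the firstline pattern matches, and
    append continuation lines to the current record."""
    record = ""
    for raw in json_lines:
        text = json.loads(raw)["log"]
        if FIRSTLINE.match(text) and record:
            yield record
            record = text
        else:
            record += text
    if record:
        yield record

entries = [
    '{"log":"[2019-02-15 11:00:08.688][9][debug][router] router decoding headers:\\n"}',
    "{\"log\":\"':method', 'POST'\\n\"}",
    '{"log":"[2019-02-15 11:00:09.001][9][debug][router] next record\\n"}',
]
records = list(group_multiline(entries))
print(len(records))  # → 2
```

The first two entries are merged into one record; the third timestamped line starts a new one.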

Describe alternatives you've considered

I believe this problem can be avoided if:

  1. docker logs are sent directly to fluentd (docker fluentd driver, https://docs.docker.com/config/containers/logging/fluentd/)
  2. docker logs are sent to journald/syslog, etc.

However:

  • on hosted Kubernetes platforms, you are not allowed to change the Docker logging driver (Azure AKS, for example).
  • on hosted environments, mixing your Docker logs with system logs (journal, syslog) is not desired.

Fluent Bit FILTERs are applied after parsing, so they cannot transform the stream early enough.

Additional context

The Fluent Bit config I am using:

  input-kubernetes.conf: |
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/*.log
        Parser            docker
        DB                /var/log/flb_kube.db
        Skip_Long_Lines   Off
        Docker_Mode       On
        Refresh_Interval  10
        Chunk_Size        32k
        Buffer_Max_Size   2M
  filter-kubernetes.conf: |
    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc.cluster.local:443
        Merge_Log           On
        K8S-Logging.Parser  On

    [PARSER]
        Name        docker
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   On
        # Command      |  Decoder | Field | Optional Action
        # =============|==================|=================
        Decode_Field_As   escaped_utf8    log    do_next
        Decode_Field_As   escaped         log    do_next
        Decode_Field_As   json            log


All 31 comments

Repository that can be used for testing: https://github.com/epcim/fluentbit-sandbox

Hey, I'm struggling with the same thing right now. Is there any additional feature or bug fix planned for this? Docker_Mode On is exactly what I want; my parsers can then extract fields. I'm struggling to find any solution for Spring Boot stack traces with Fluent Bit at all (using either Multiline or Docker_Mode). Any update or feedback would be appreciated.

I'm struggling with this right now. Do you have any solution for multiline logs on k8s?

I am also stuck on the same issue. The multiline log parser is not working in K8s.

I'm stuck with this as well; is there a set of input flags under which large input lines from Docker (over the 16 KB split limit) will work?

I'm also experiencing the same problem with not being able to parse multiline logs in a Kubernetes cluster. I have tried the solutions suggested in related threads in this repo but couldn't get them working.

My input config:

  input-kubernetes.conf: |
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/abc-*.log
        Parser            docker
        Parser_Firstline  multiline_parser_head
        Parser_1          multiline_parser_error
        Multiline         On
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     10MB
        Skip_Long_Lines   On
        Refresh_Interval  10

Parsers:

  parsers.conf: |
    [PARSER]
        Name        json
        Format      json
        Time_Key    time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name        docker
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   On

    [PARSER]
        Name        multiline_parser_head
        Format      regex
        Regex       /\d{4}-\d{1,2}-\d{1,2}/

    [PARSER]
        Name        multiline_parser_error
        Format      regex
        Regex       /(?<timestamp>[^ ]* [^ ]*) (?<level>[^\s]+:)(?<message>[\s\S]*)/
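For reference, the two multiline regexes above behave roughly like this. A hedged sketch: Fluent Bit uses Onigmo's (?<name>...) named-group syntax, which Python writes as (?P<name>...); the sample log lines are made up.

```python
import re

# Parser_Firstline: any line containing a yyyy-m-d date starts a new record.
head = re.compile(r"\d{4}-\d{1,2}-\d{1,2}")

# Parser_1, translated to Python's (?P<name>...) named-group syntax.
error = re.compile(
    r"(?P<timestamp>[^ ]* [^ ]*) (?P<level>[^\s]+:)(?P<message>[\s\S]*)"
)

assert head.search("2019-02-15 10:36:31,000 ERROR: boom")
assert not head.search("    at com.example.Foo.run(Foo.java:42)")  # continuation line

m = error.match("2019-02-15 10:36:31,000 ERROR: boom")
print(m.group("level"))  # → ERROR:
```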

Same issue here. Any news on this?

> I'm also experiencing the same problem with not being able to parse multiline logs in a Kubernetes cluster. I have tried solutions suggested in related threads in this repo but couldn't get it working. (full config quoted in the comment above)

@isurusiri Are you able to figure out any solution for this?

I have the same issue with multi-line JSON output in docker logs.

@isurusiri Based on my understanding of the documentation, the Parser directive is ignored by the tail input when Multiline is set to On. However, Parser_Firstline and Parser_N are not ignored.

Edit: Link to 1.3 documentation referenced above: https://docs.fluentbit.io/manual/v/1.3/input/tail#multiline

Hi there, any update on that issue? Thank you

Is fluentd the only alternative to fix this issue?

> Is fluentd the only alternative to fix this issue?

No, I'm using Elastic Filebeat for this and it works like a charm.

> No, I'm using Elastic Filebeat for this and it works like a charm.

Can you please share your solution to that? Thank you

I think @TomaszKlosinski is referring to using Elastic Filebeat as a log shipper instead of Fluent Bit.

Unfortunately, that only works if you're using the ELK stack, which is not much help to those of us using other products, e.g. Splunk.

@edsiper are you able to look into the issue? The original author of the PR that added this feature in https://github.com/fluent/fluent-bit/pull/863 is no longer on GitHub. I took a look at plugins/in_tail/tail_dockermode to see if I could help, but the lack of code comments and the use of opaque abbreviations make it pretty inaccessible.

I prepared some changes in the dockermode plugin (#2043). I still need to bring it in line with the contributing guide, but I believe it's worth a look; feedback is welcome.

Output for the input from this issue:

[0] containers.var.log.containers.test.log: [1585073268.000318200, {"log"=>"{"log":"[2019-02-15 11:00:08.688][9][debug][router] source/common/router/router.cc:303] [C0][S14319188767040639561] router decoding headers:\n':method', 'POST'\n':path', '/envoy.api.v2.ClusterDiscoveryService/StreamClusters'\n':authority', 'xds_cluster'\n':scheme', 'http'\n'te', 'trailers'\n'content-type', 'application/grpc'\n'x-envoy-internal', 'true'\n'x-forwarded-for', '192.168.6.6'\n\n","stream":"stderr","time":"2019-02-15T11:00:08.688777009Z"}"}]

@sumo-drosiek - any updates on merging this?

@sumo-drosiek any updates?

@collardmsc @vishiy Sorry for no updates. The PR has been reviewed. I'm working on the runtime tests and everything should be ready soon :)

@sumo-drosiek Thanks a bunch for your time and effort working on this!

PR is ready for another review.

:wave: just wondering if anyone has any updates on this? It'd really help me!

Looks like the issue was solved, brilliant. Which version number will support the new Docker_Mode_Parser field?

@Oduig AFAIK 1.5.0 supports Docker_Mode_Parser.

This is not documented yet, is it?
At least I cannot find anything about Docker_Mode_Parser in the official docs here:
https://docs.fluentbit.io/manual/pipeline/inputs/tail#docker_mode

@davelosert That's right. I haven't documented it yet.

I'm also stuck with fluent-bit and multiline logs in EKS... Has anyone found a solution/workaround for this? If so, I would appreciate your comments.

Hey @shake76 Did you find any solution yet? If yes, please share a sample config for the docker parser combined with a multiline parser.

@shake76 @ankit1mg Is something wrong with docker_mode_parser and EKS?

Hey folks, regarding the latest reports of logs not working: I wanted to check whether this might be an issue of CRI-format vs. Docker-format log parsing (https://docs.fluentbit.io/manual/installation/kubernetes#container-runtime-interface-cri-parser), or whether the ask is fully around multiline + Docker mode.
