Fluent-bit: Parser nginx regex not generating fields expected from rubular

Created on 9 Oct 2018  Â·  15Comments  Â·  Source: fluent/fluent-bit

Bug Report

Description:
I followed https://github.com/fluent/fluent-bit/issues/729 including the nginx regex pattern but to no avail. I tried a different pattern, will be mentioned below, that works in Rubular. Log is not extracted and merged.

Steps to reproduce:

  • I'm also using fluent-bit inside a kubernetes cluster (minikube)
apiVersion: v1
kind: ConfigMap
metadata:
  name: efk-fluent-bit-config
  labels:
    app: efk-fluent-bit
data:
  fluent-bit.conf: |-
    [SERVICE]
        Flush        1
        Daemon       Off
        Log_Level    info
        Parsers_File parsers.conf

    [INPUT]
        Name              tail
        Tag               kube.default.nginx.*
        Path              /var/log/containers/*_nginx*.log
        Parser            docker
        DB                /var/log/flb_kube.db
        Refresh_Interval  10
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On

    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Merge_Log           On
        K8S-Logging.Parser  On

    [OUTPUT]
        Name  es
        Match *
        Host  efk-elasticsearch-client.default.svc.cluster.local
        Port  9200
        Logstash_Format On
        Retry_Limit False
        Type  flb_type
        Logstash_Prefix kubernetes_cluster
        Time_Key @timestamp_es

  parsers.conf: |-
    [PARSER]
        Name         docker
        Format       json
        Time_Key     time
        Time_Format  %Y-%m-%dT%H:%M:%S.%L
        Time_Keep    On
        # Command      |  Decoder | Field | Optional Action
        # =============|==================|=================
        Decode_Field_As   escaped    log

     [PARSER]
        Name   nginx
        Format regex
        Regex  ^(?<remote>\S*) (?<host>\S*) (?<user>\S*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?:\s+\S*)?)?" (?<code>\S*) (?<size>\S*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)"(?:\s+(?<http_x_forwarded_for>\S+))?)?$
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z

And in my deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: redacted-app
  labels:
    app: redacted
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redacted
  template:
    metadata:
      annotations:
        fluentbit.io/parser: nginx
      labels:
        app: redacted

And from the logs (from Kibana)

{
  "_index": "kubernetes_cluster-2018.10.09",
  "_type": "flb_type",
  "_id": "ogP3WGYBlVl5afTGSP3L",
  "_version": 1,
  "_score": null,
  "_source": {
    "@timestamp_es": "2018-10-09T13:14:52.960Z",
    "log": "172.17.0.1 - - [09/Oct/2018:13:14:52 +0000] \"POST /enterprise/info/ HTTP/1.1\" 404 104 \"http://192.168.39.251:30309/enterprise/info/\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.16 Safari/537.36\" \"-\"\n",
    "stream": "stdout",
    "time": "2018-10-09T13:14:52.960867435Z",
    "kubernetes": {
      "pod_name": "redacted-app-586c454745-7ztk8",
      "namespace_name": "default",
      "pod_id": "d1361c4f-cbc0-11e8-abc9-7487d6dd4e5c",
      "labels": {
        "app": "redacted",
        "pod-template-hash": "1427010301"
      },
      "annotations": {
        "fluentbit.io/parser": "nginx"
      },
      "host": "minikube",
      "container_name": "nginx-proxy",
      "docker_id": "b6ba28183a2edc3269657037f967c7cc2d70874a109509548ef56e3f2657e1c8"
    }
  },
  "fields": {
    "@timestamp_es": [
      "2018-10-09T13:14:52.960Z"
    ],
    "time": [
      "2018-10-09T13:14:52.960Z"
    ]
  },
  "highlight": {
    "kubernetes.annotations.fluentbit.io/parser.keyword": [
      "@kibana-highlighted-field@nginx@/kibana-highlighted-field@"
    ],
    "kubernetes.annotations.fluentbit.io/parser": [
      "@kibana-highlighted-field@nginx@/kibana-highlighted-field@"
    ],
    "kubernetes.container_name": [
      "@kibana-highlighted-field@nginx@/kibana-highlighted-field@-proxy"
    ]
  },
  "sort": [
    1539090892960
  ]
}

The difference from above is the regex used ^(?<remote>\S*) (?<host>\S*) (?<user>\S*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?:\s+\S*)?)?" (?<code>\S*) (?<size>\S*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)"(?:\s+(?<http_x_forwarded_for>\S+))?)?$

Test: http://rubular.com/r/glM0o40A9M

Q: Does the last part of the log \" \"-\"\n from the json matter ?

Expected behavior
Same expected output from https://github.com/fluent/fluent-bit/issues/729. To get all the fields that I described in the Rubular output in the fluent-bit log.

_Originally posted by @aldnav in https://github.com/fluent/fluent-bit/issue_comments#issuecomment-428192888_

All 15 comments

Inspecting the logs yield

Fluentbit logs

[2018/10/10 06:38:44] [debug] [in_tail] file=/var/log/containers/redacted-app-586c454745-7ztk8_default_nginx-proxy-cb46caffcb63ad165faa760e5f4dd009f167d3b7afded8b3b5e36745c0f71c07.log event
[2018/10/10 06:38:44] [debug] [filter_kube] could not merge JSON log as requested
[2018/10/10 06:38:44] [debug] [input tail.0] [mem buf] size = 638

Tailing that log

root@fluent-bit-p4ng2:/# tail /var/log/containers/redacted-app-586c454745-7ztk8_default_nginx-proxy-cb46caffcb63ad165faa760e5f4dd009f167d3b7afded8b3b5e36745c0f71c07.log
{"log":"172.17.0.1 - - [10/Oct/2018:06:35:09 +0000] \"GET /api-sileo/winners/winners-page/filter/?top=0\u0026bottom=8 HTTP/1.1\" 200 12 \"http://192.168.39.251:30309/\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.16 Safari/537.36\" \"-\"\n","stream":"stdout","time":"2018-10-10T06:35:09.267040602Z"}
{"log":"172.17.0.1 - - [10/Oct/2018:06:35:09 +0000] \"GET /api-sileo/winners/cf-tournament-winners/filter/?top=0\u0026bottom=8 HTTP/1.1\" 200 12 \"http://192.168.39.251:30309/\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.16 Safari/537.36\" \"-\"\n","stream":"stdout","time":"2018-10-10T06:35:09.271704302Z"}
{"log":"172.17.0.1 - - [10/Oct/2018:06:35:09 +0000] \"GET /api/video/video-player/filter/?slug=how-to-win-in-redacted HTTP/1.1\" 200 14 \"http://192.168.39.251:30309/\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.16 Safari/537.36\" \"-\"\n","stream":"stdout","time":"2018-10-10T06:35:09.287174688Z"}
{"log":"172.17.0.1 - - [10/Oct/2018:06:35:09 +0000] \"GET /api/video/video-player/filter/?slug=cfschool-what-is-redactedcom HTTP/1.1\" 200 14 \"http://192.168.39.251:30309/\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.16 Safari/537.36\" \"-\"\n","stream":"stdout","time":"2018-10-10T06:35:09.288450651Z"}
{"log":"172.17.0.1 - - [10/Oct/2018:06:35:09 +0000] \"GET /api-sileo/home/profile/filter/?top=0 HTTP/1.1\" 200 12 \"http://192.168.39.251:30309/\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.16 Safari/537.36\" \"-\"\n","stream":"stdout","time":"2018-10-10T06:35:09.298623044Z"}
{"log":"172.17.0.1 - - [10/Oct/2018:06:35:09 +0000] \"GET /api-sileo/landing/featured/filter/?size_per_request=10 HTTP/1.1\" 200 12 \"http://192.168.39.251:30309/\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.16 Safari/537.36\" \"-\"\n","stream":"stdout","time":"2018-10-10T06:35:09.322164595Z"}
{"log":"172.17.0.1 - - [10/Oct/2018:06:38:43 +0000] \"GET / HTTP/1.1\" 200 40431 \"-\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.16 Safari/537.36\" \"-\"\n","stream":"stdout","time":"2018-10-10T06:38:43.631043381Z"}
{"log":"172.17.0.1 - - [10/Oct/2018:06:38:43 +0000] \"GET /session-info/get-session-info/ HTTP/1.1\" 200 407 \"http://192.168.39.251:30309/\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.16 Safari/537.36\" \"-\"\n","stream":"stdout","time":"2018-10-10T06:38:43.677699068Z"}
{"log":"172.17.0.1 - - [10/Oct/2018:06:38:44 +0000] \"GET /session-info/get-session-info/ HTTP/1.1\" 200 407 \"http://192.168.39.251:30309/\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.16 Safari/537.36\" \"-\"\n","stream":"stdout","time":"2018-10-10T06:38:44.255370783Z"}
{"log":"172.17.0.1 - - [10/Oct/2018:06:38:44 +0000] \"GET /session-info/get-session-info/ HTTP/1.1\" 200 407 \"http://192.168.39.251:30309/\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.16 Safari/537.36\" \"-\"\n","stream":"stdout","time":"2018-10-10T06:38:44.321339656Z"}

I made adjustments with the regex used to consider the backslashes.
See http://rubular.com/r/jw07YfUfTn
But this config did not work either. [filter_kube] could not merge JSON log as requested still appears and fields aren't parsed and merged.

I made a test with my local build of fluentbit

testconf.conf

[SERVICE]
    Flush        1
    Daemon       Off
    Log_Level    debug
    Parsers_File testparser.conf

[INPUT]
    Name    tail
    ### This input is copied straight from nginx stdout access log ###
    Tag     nginx.data
    Path    nginxlog

[INPUT]
    Name dummy
    Tag  dummy.data
    #### This input has backslash from kibana json view ###
    Dummy {"log": "172.17.0.1 - - [10/Oct/2018:09:38:14 +0000] \"GET /api-sileo/landing/opentournament/filter/?status=1u0026started_date__lte=1539164292u0026prize_order=u0026top=0u0026bottom=9 HTTP/1.1\" 200 12 \"http://192.168.39.251:30309/browsing/\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.16 Safari/537.36\" \"-\"\n"}

[FILTER]
    Name        parser
    Match       nginx.*
    Key_Name    log
    Parser      nginxp

[FILTER]
    Name        parser
    Match       dummy.*
    Key_Name    log
    Parser      dummy_test_2

[OUTPUT]
    Name    stdout
    Match   *

testparser.conf

[PARSER]
    Name     nginxp
    Format   regex
    ### clean nginx regex (get input from nginx tail) ###
    Regex    ^(?<remote>\S*) (?<host>\S*) (?<user>\S*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?:\s+\S*)?)?" (?<code>\S*) (?<size>\S*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)"(?:\s+(?<http_x_forwarded_for>\S+))?)?$
    Time_Key time
    Time_Format %d/%b/%Y:%H:%M:%S %z

[PARSER]
    Name   dummy_test_2
    Format regex
    ### get input from kibana json (with backslashes) ###
    Regex  ^(?<remote>\S*) (?<host>\S*) (?<user>\S*) \[(?<time>[^\]]*)\] \\"(?<method>\S+)(?: +(?<path>[^\"]*?)(?:\s+\S*)?)?" (?<code>\S*) (?<size>\S*)(?: \\"(?<referer>[^\"]*)\\" \\"(?<agent>[^\"]*)\\"(?:\s+\\"(?<http_x_forwarded_for>\S+)\\"\S+)?)?$

The test uses these two regex patterns:
With backslash http://rubular.com/r/glM0o40A9M
Without backslash http://rubular.com/r/jw07YfUfTn

I tested both on kubernetes hoping that the parser pipeline follows docker -> nginx but the log remains intact.

The local test has the following results.

➜  build bin/fluent-bit -c testconf.conf
Fluent-Bit v0.14.1
Copyright (C) Treasure Data

[2018/10/10 17:58:20] [ info] [engine] started (pid=5241)
[2018/10/10 17:58:20] [debug] [in_tail] inotify watch fd=19
[2018/10/10 17:58:20] [debug] [in_tail] scanning path nginxlog
[2018/10/10 17:58:20] [debug] [in_tail] add to scan queue nginxlog, offset=0
[2018/10/10 17:58:20] [debug] [router] input=tail.0 'DYNAMIC TAG'
[2018/10/10 17:58:20] [debug] [router] match rule dummy.0:stdout.0
[2018/10/10 17:58:20] [debug] [input tail.0] [mem buf] size = 1388
[2018/10/10 17:58:20] [debug] [in_tail] file=nginxlog read=1532 lines=4
[2018/10/10 17:58:20] [debug] [in_tail] file=nginxlog promote to TAIL_EVENT
[0] nginx.data: [1539164292.000000000, {"remote"=>"172.17.0.1", "host"=>"-", "user"=>"-", "method"=>"GET", "path"=>"/api-sileo/landing/featured/filter/?size_per_request=10", "code"=>"200", "size"=>"12", "referer"=>"http://192.168.39.251:30309/browsing/", "agent"=>"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.16 Safari/537.36", "http_x_forwarded_for"=>""-""}]
[2018/10/10 17:58:20] [debug] [task] created task=0x1bcd880 id=0 OK
[1] nginx.data: [1539164292.000000000, {"remote"=>"172.17.0.1", "host"=>"-", "user"=>"-", "method"=>"GET", "path"=>"/api-sileo/landing/trendingtournament/filter/?top=0&bottom=9", "code"=>"200", "size"=>"12", "referer"=>"http://192.168.39.251:30309/browsing/", "agent"=>"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.16 Safari/537.36", "http_x_forwarded_for"=>""-""}]
[2] nginx.data: [1539164293.000000000, {"remote"=>"172.17.0.1", "host"=>"-", "user"=>"-", "method"=>"GET", "path"=>"/api-sileo/landing/battlingtournament/filter/?status=3&started_date__lte=1539164292&prize_order=&top=0&bottom=9", "code"=>"200", "size"=>"12", "referer"=>"http://192.168.39.251:30309/browsing/", "agent"=>"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.16 Safari/537.36", "http_x_forwarded_for"=>""-""}]
[3] nginx.data: [1539164294.000000000, {"remote"=>"172.17.0.1", "host"=>"-", "user"=>"-", "method"=>"GET", "path"=>"/api-sileo/landing/rankingtournament/filter/?status=2&started_date__lte=1539164292&prize_order=&top=0&bottom=9", "code"=>"200", "size"=>"12", "referer"=>"http://192.168.39.251:30309/browsing/", "agent"=>"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.16 Safari/537.36", "http_x_forwarded_for"=>""-""}]
[2018/10/10 17:58:20] [debug] [input dummy.0] [mem buf] size = 416
[2018/10/10 17:58:20] [debug] [task] destroy task=0x1bcd880 (task_id=0)
[2018/10/10 17:58:20] [debug] [dyntag tail.0] 0x1bcdb90 destroy (tag=nginx.data, bytes=1388)
[0] dummy.data: [1539165501.000329612, {"remote"=>"172.17.0.1", "host"=>"-", "user"=>"-", "time"=>"10/Oct/2018:09:38:14 +0000", "method"=>"GET", "path"=>"/api-sileo/landing/opentournament/filter/?status=1u0026started_date__lte=1539164292u0026prize_order=u0026top=0u0026bottom=9", "code"=>"200", "size"=>"12", "referer"=>"http://192.168.39.251:30309/browsing/", "agent"=>"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.16 Safari/537.36", "http_x_forwarded_for"=>"-"}]
[2018/10/10 17:58:21] [debug] [task] created task=0x1bcd880 id=0 OK
[2018/10/10 17:58:21] [debug] [input dummy.0] [mem buf] size = 416
[2018/10/10 17:58:21] [debug] [task] destroy task=0x1bcd880 (task_id=0)
^C[engine] caught signal (SIGINT)
[2018/10/10 17:58:22] [ info] [input] pausing tail.0
[2018/10/10 17:58:22] [ info] [input] pausing dummy.0

The local test parsed it correctly.

The pattern is:

^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$

But you have

Dummy {"log": "172.17.0.1 - - [10/Oct/2018:09:38:14 +0000] \"GET /api-sileo/landing/opentournament/filter/?status=1u0026started_date__lte=1539164292u0026prize_order=u0026top=0u0026bottom=9 HTTP/1.1\" 200 12 \"http://192.168.39.251:30309/browsing/\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.16 Safari/537.36\" \"-\"\n"}

a trailing "-"\n

Unescaped, your log is:

172.17.0.1 - - [10/Oct/2018:09:38:14 +0000] "GET /api-sileo/landing/opentournament/filter/?status=1u0026started_date__lte=1539164292u0026prize_order=u0026top=0u0026bottom=9 HTTP/1.1" 200 12 "http://192.168.39.251:30309/browsing/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.16 Safari/537.36" "-"\n

So, i'm not sure why you have that "-"\n on the end. If you have changed your nginx config to use a different format in access log, you'll need to change your signature. If you feel your log is correct, the signature would be:

^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?

consider also setting "merge_log_trim" in the Kubernetes config if you are using it.

Thanks @donbowman The funny thing is I did not customize the log_format of Nginx.

I also tried "Merge_Log_Trim On". And tried your last regexp but I still get [filter_kube] could not merge JSON log as requested

If this matters:

I managed to set nginx acess_log to:

log_format test '$remote_addr $host $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent"';
access_log /var/log/nginx/access.log test;

and tailing the log file

root@fluent-bit-47p6z:/# tail /var/log/containers/redacted-app-59564c9548-c9g52_default_nginx-proxy-19a6fc87c0c06558c1fcd18039c1aad51ca1df74936c75bf202f22e79c38f42e.log
{"log":"172.17.0.1 192.168.39.251 - [16/Oct/2018:02:14:51 +0000] \"GET /session-info/get-session-info/ HTTP/1.1\" 200 221 \"http://192.168.39.251:31714/\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.54 Safari/537.36\"\n","stream":"stdout","time":"2018-10-16T02:14:51.199419312Z"}
{"log":"172.17.0.1 192.168.39.251 - [16/Oct/2018:02:14:52 +0000] \"GET /session-info/get-session-info/ HTTP/1.1\" 200 221 \"http://192.168.39.251:31714/\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.54 Safari/537.36\"\n","stream":"stdout","time":"2018-10-16T02:14:52.655943811Z"}
{"log":"172.17.0.1 192.168.39.251 - [16/Oct/2018:02:14:52 +0000] \"GET /session-info/get-session-info/ HTTP/1.1\" 200 221 \"http://192.168.39.251:31714/\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.54 Safari/537.36\"\n","stream":"stdout","time":"2018-10-16T02:14:52.674953975Z"}
{"log":"172.17.0.1 192.168.39.251 - [16/Oct/2018:02:14:52 +0000] \"GET /?ajax=true HTTP/1.1\" 200 539 \"http://192.168.39.251:31714/\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.54 Safari/537.36\"\n","stream":"stdout","time":"2018-10-16T02:14:52.785946146Z"}
{"log":"172.17.0.1 192.168.39.251 - [16/Oct/2018:02:14:52 +0000] \"GET /api-sileo/winners/winners-page/filter/?top=0\u0026bottom=8 HTTP/1.1\" 200 12 \"http://192.168.39.251:31714/\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.54 Safari/537.36\"\n","stream":"stdout","time":"2018-10-16T02:14:52.835820492Z"}
{"log":"172.17.0.1 192.168.39.251 - [16/Oct/2018:02:14:52 +0000] \"GET /api-sileo/winners/cf-tournament-winners/filter/?top=0\u0026bottom=8 HTTP/1.1\" 200 12 \"http://192.168.39.251:31714/\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.54 Safari/537.36\"\n","stream":"stdout","time":"2018-10-16T02:14:52.840595389Z"}
{"log":"172.17.0.1 192.168.39.251 - [16/Oct/2018:02:14:52 +0000] \"GET /api/video/video-player/filter/?slug=how-to-win-in-redacted HTTP/1.1\" 200 14 \"http://192.168.39.251:31714/\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.54 Safari/537.36\"\n","stream":"stdout","time":"2018-10-16T02:14:52.866050142Z"}
{"log":"172.17.0.1 192.168.39.251 - [16/Oct/2018:02:14:52 +0000] \"GET /api-sileo/landing/featured/filter/?size_per_request=10 HTTP/1.1\" 200 12 \"http://192.168.39.251:31714/\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.54 Safari/537.36\"\n","stream":"stdout","time":"2018-10-16T02:14:52.866085395Z"}
{"log":"172.17.0.1 192.168.39.251 - [16/Oct/2018:02:14:52 +0000] \"GET /api/video/video-player/filter/?slug=cfschool-what-is-redactedcom HTTP/1.1\" 200 14 \"http://192.168.39.251:31714/\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.54 Safari/537.36\"\n","stream":"stdout","time":"2018-10-16T02:14:52.867881901Z"}
{"log":"172.17.0.1 192.168.39.251 - [16/Oct/2018:02:14:57 +0000] \"GET /business/?ajax=true HTTP/1.1\" 200 441 \"http://192.168.39.251:31714/\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.54 Safari/537.36\"\n","stream":"stdout","time":"2018-10-16T02:14:57.244942033Z"}

and from Kibana JSON view

 "_source": {
    "@timestamp_es": "2018-10-16T02:14:57.244Z",
    "log": "172.17.0.1 192.168.39.251 - [16/Oct/2018:02:14:57 +0000] \"GET /business/?ajax=true HTTP/1.1\" 200 441 \"http://192.168.39.251:31714/\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.54 Safari/537.36\"\n",
    "stream": "stdout",
    "time": "2018-10-16T02:14:57.244942033Z",
    "kubernetes": {
      "pod_name": "redacted-app-59564c9548-c9g52",
      "namespace_name": "default",
      "pod_id": "07149771-d0e9-11e8-a6a1-7487d6dd4e5c",
      "labels": {
        "app": "redacted",
        "pod-template-hash": "1512075104"
      },
      "annotations": {
        "fluentbit.io/parser": "nginxp"
      },
      "host": "minikube",
      "container_name": "nginx-proxy",
      "docker_id": "19a6fc87c0c06558c1fcd18039c1aad51ca1df74936c75bf202f22e79c38f42e"
    }
  },

\n and \" are still in the log so I used the following regex pattern

^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] \\"(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: \\"(?<referer>[^\"]*)\\" \\"(?<agent>[^\"]*)\\")?\\n$

a slight modification from the nginx parser regex. As in http://rubular.com/r/KZRaYAMrDK

However, I still get

[2018/10/16 02:14:57] [debug] [filter_kube] could not merge JSON log as requested

fluent-bit.conf

[SERVICE]
    Flush        1
    Daemon       Off
    Log_Level    debug
    Parsers_File parsers-wtf.conf
[INPUT]
    Name              tail
    Tag               kube.default.nginx.*
    Path              /var/log/containers/*_nginx*.log
    Parser            docker
    DB                /var/log/flb_kube.db
    Refresh_Interval  10
    Mem_Buf_Limit     5MB
    Skip_Long_Lines   On
[FILTER]
    Name                kubernetes
    Match               kube.*
    Kube_URL            https://kubernetes.default.svc:443
    Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
    Merge_Log_Trim      On
    Merge_Log           On
    # Merge_Log_Key       rlog
    K8S-Logging.Parser  On
[OUTPUT]
    Name  es
    Match *
    Host  efk-elasticsearch-client.default.svc.cluster.local
    Port  9200
    Logstash_Format On
    Retry_Limit False
    Type  flb_type
    Logstash_Prefix kubernetes_cluster
    Time_Key @timestamp_es
[OUTPUT]
    Name stdout
    Match *

parsers-wtf.conf

[PARSER]
    Name         docker
    Format       json
    Time_Key     time
    Time_Format  %Y-%m-%dT%H:%M:%S.%L
    Time_Keep    On
    # Command      |  Decoder | Field | Optional Action
    # =============|==================|=================
    Decode_Field_As   escaped    log
    Decode_Field_As   escaped    stream

[PARSER]
    Name        nginxp
    Format      regex
    Regex       ^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] \\"(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: \\"(?<referer>[^\"]*)\\" \\"(?<agent>[^\"]*)\\")?\\n$
    Time_Key    time
    Time_Format %d/%b/%Y:%H:%M:%S

Anyone lucky to get this working?

i don't think you need the escapes in the parser:
You have:
^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] \\"(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: \\"(?<referer>[^\"]*)\\" \\"(?<agent>[^\"]*)\\")?\\n$

I think you want \" instead of \" etc.

You can test the pattern on https://regex101.com/ pretty simply.

I applied you log message:

172.17.0.1 192.168.39.251 - [16/Oct/2018:02:14:51 +0000] "GET /session-info/get-session-info/ HTTP/1.1" 200 221 "http://192.168.39.251:31714/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.54 Safari/537.36"

and the suggested pattern:

^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?

on regex101.com and I see it extract ok (and thus the merge should work i think).

if the merge doesn't work it would imply maybe a duplicate field, likely time,and you could consider a docker line w/o the time key.

@edsiper can you take a look at this please?

@donbowman That would be perfect only that the log message contains \" and \n

It seems like no regex pattern is working?

I tried setting
Decode_Field_As escaped_utf8 log
in docker parser which yields

"log": "172.17.0.1 192.168.39.251 - [16/Oct/2018:03:18:40 +0000] \"GET /api-sileo/version-info/version-info/filter/?platform=1 HTTP/1.1\" 200 12 \"http://192.168.39.251:31714/populaasdfasdfasdf/\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.54 Safari/537.36\"\u0000",

then I adjusted the regex pattern, (\n becomes \u0000), to

Regex       ^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] \\"(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: \\"(?<referer>[^\"]*)\\" \\"(?<agent>[^\"]*)\\")?\\u0000$

as in http://rubular.com/r/oUqgP7DUtp

I also tried to change the time key in the pattern <rtime> to avoid duplicate when merging. Still no luck.

Thanks @donbowman I tried your pattern and it did merge fields.

However, I'm still puzzled as to why kibana json output log as

172.17.0.1 192.168.39.251 - [16/Oct/2018:03:45:20 +0000] \"GET /populaasdfasdfasdf/ HTTP/1.1\" 404 4617 \"-\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.54 Safari/537.36\"\n

Question
I was under the impression that the parser flow is docker -> nginxp that's why I tested it against the above log. If that's not the case, how else would I test the log against other patterns (for other apps aside from nginx)?

if you have \" and \n in the message, it sounds like the decoder is not
running.

On Mon, 15 Oct 2018 at 23:00, Aldrin Navarro notifications@github.com
wrote:

@donbowman https://github.com/donbowman That would be perfect only that
the log message contains \" and \n

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/fluent/fluent-bit/issues/838#issuecomment-430084050,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AE5Ok9la_ryFrN87Dv2SiUBHD3ZtEbr9ks5ulUvXgaJpZM4XTS5F
.

i think that is normal for kibana (except the \n). Kibana is re-escaping to make it valid json.

E.g. I see:

{\"version\":\"e5b877e0587fff4797e0dc3a6c01514601ea2d562cf5cd7e3f927bcaaea3e7ec\",\"initContainers\":[\"istio-init\"],\"containers\":[\"istio-proxy\"],\"volumes\":[\"istio-envoy\",\"istio-certs\"],\"imagePullSecrets\":[\"regcred\"]}

But the fields are parsed out as well.

to see the raw log, i run kubectl logs (or docker log, or jsut cp the file from /var/lib/containers/...).

Thanks @donbowman for clearing that. Closing issue now.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

thrift24 picture thrift24  Â·  4Comments

JavaCS3 picture JavaCS3  Â·  3Comments

iamshreeram picture iamshreeram  Â·  3Comments

edsiper picture edsiper  Â·  4Comments

arienchen picture arienchen  Â·  3Comments