Fluent-bit: nginx regex does not generate the fields

Created on 30 Aug 2018  路  3Comments  路  Source: fluent/fluent-bit

Bug Report

Describe the bug
A custom Regex on the nginx parser does not generate the corresponding fields in the output.

To Reproduce

  • Regex: ^(?<remote_addr>[^ ]*) - (?<user>[^ ]*) - (?<host>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+) (?<path>[^ ]*?) (?<proto>[^"]*?)" (?<upstream_addr>[^ ]*) upstream_response_time (?<upstream_response_time_sec>[^ ]*) msec (?<upstream_response_time_msec>[^ ]*) request_time (?<request_time_sec>[^ ]*) (?<status>[^ ]*) (?<bytes_sent>[^ ]*) "(?<referer>[^"]*)" (?<user_agent>.*) (?<x_forwarded_for>-|[\d\.]*)?$
  • Example log message if applicable: 172.17.0.1 - - - local.mydomain.io [29/Aug/2018:10:55:13 +0000] "GET /user/config/NOTFOUND HTTP/1.1" 10.109.220.149:80 upstream_response_time 0.000 msec 1535540113.743 request_time 0.001 404 423 "-" Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.75 Safari/537.36 -

Rubular shows the output correctly:


remote_addr | 172.17.0.1
-- | --
user | -
host | local.mydomain.io
time | 29/Aug/2018:10:55:13 +0000
method | GET
path | /user/config/NOTFOUND
proto | HTTP/1.1
upstream_addr | 10.109.220.149:80
upstream_response_time_sec | 0.000
upstream_response_time_msec | 1535540113.743
request_time_sec | 0.001
status | 404
bytes_sent | 423
referer | -
user_agent | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.75 Safari/537.36
x_forwarded_for | -

This is the log I get in the standard output plugin:

[0] kube.nginx.var.log.containers.nginx-b7tm6_default_nginx-3b424de3f8df0b696d90b43a3206da0c8e68506d015b0c03c1cc6f87696ce6fc.log: [1535548128.215045251, {"log"=>"{"log":"172.17.0.1 - - - local.mydomain.io [29/Aug/2018:10:55:13 +0000] \"GET /user/config/NOTFOUND HTTP/1.1\" 10.109.220.149:80 upstream_response_time 0.001 msec 1535540113.743 request_time 0.001 404 423 \"-\" Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.75 Safari/537.36 -\n","stream":"stdout","time":"2018-08-29T13:08:48.214745384Z"}"}]
  • Steps to reproduce the problem:
    I'm running fluent-bit inside a kubernetes cluster and this is the config I'm using
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: monitoring
  labels:
    k8s-app: fluent-bit
data:
  # Configuration files: server, input, filters and output
  # ======================================================
  fluent-bit.conf: |
    [SERVICE]
        Flush         1
        Log_Level     info
        Daemon        off
        Parsers_File  parsers.conf
        HTTP_Server   Off
    @INCLUDE input-kubernetes-nginx.conf
    @INCLUDE filter-grep.conf
    @INCLUDE output-stdout.conf
  input-kubernetes-nginx.conf: |
    [INPUT]
        Name              tail
        Tag               kube.default.nginx.*
        Path              /var/log/containers/*_nginx*.log
        Parser            nginx
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
        Refresh_Interval  10
  filter-grep.conf: |
    [FILTER]
        Name    grep
        Match   kube.default.*
        Regex   log 404
        Regex   log 500
        Regex   log error
        Regex   log critical
  output-stdout.conf: |
    [OUTPUT]
        Name    stdout
        Match   kube.default.*
  parsers.conf: |
    [PARSER]
        Name   nginx
        Format regex
        Regex  ^(?<remote_addr>[^ ]*) - (?<user>[^ ]*) - (?<host>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+) (?<path>[^ ]*?) (?<proto>[^"]*?)" (?<upstream_addr>[^ ]*) upstream_response_time (?<upstream_response_time_sec>[^ ]*) msec (?<upstream_response_time_msec>[^ ]*) request_time (?<request_time_sec>[^ ]*) (?<status>[^ ]*) (?<bytes_sent>[^ ]*) "(?<referer>[^"]*)" (?<user_agent>.*) (?<x_forwarded_for>-|[\d\.]*)?$
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z

Expected behavior
I expected to get all the fields that I described in the Rubular output in the fluent-bit log.

fixed question

Most helpful comment

Thanks for the detailed answer.

I have my pods now properly annotated and properly getting structured logs, which makes my life easier to be able to filter by regex filter additionally when needed.

I think this explanation could be properly added into documentation and give some examples in the yaml files managing these annotations, otherwise is hard to get to this point. I couldn't read something like this in any place as far as I remember.

Anyway thanks again for the help.

All 3 comments

@mbelchin

Despite your Pods generate a valid log line, note this message is encapsulated by Docker engine and stored in a JSON map:

__1__ message generated by your Pod

172.17.0.1 - - - local.mydomain.io [29/Aug/2018:10:55:13 +0000] "GET /user/config/NOTFOUND HTTP/1.1" 10.109.220.149:80 upstream_response_time 0.001 msec 1535540113.743 request_time 0.001 404 423 "-" Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.75 Safari/537.36 -

__2__ message encapsulated (and quoted) by Docker engine in a JSON map:

{"log":"172.17.0.1 - - - local.mydomain.io [29/Aug/2018:10:55:13 +0000] \"GET /user/config/NOTFOUND HTTP/1.1\" 10.109.220.149:80 upstream_response_time 0.001 msec 1535540113.743 request_time 0.001 404 423 \"-\" Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.75 Safari/537.36 -\n","stream":"stdout","time":"2018-08-29T13:08:48.214745384Z"}

so as you can see the content that arrives to Fluent Bit is the one in number __2__ , so the proper parser to apply is _docker_ instead of _nginx_.

Now obviously you want to convert that unstructured message to structured, the way to workaround that is to use the Annotation feature that is part of Kubernetes Filter in Fluent Bit. So your config should looks like (just replacing some parts):

    [INPUT]
        Name              tail
        Tag               kube.default.nginx.*
        Path              /var/log/containers/*_nginx*.log
        Parser            docker
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
        Refresh_Interval  10

in your parsers file expand to:

[PARSER]
    Name         docker
    Format       json
    Time_Key     time
    Time_Format  %Y-%m-%dT%H:%M:%S.%L
    Time_Keep    On
    # Command      |  Decoder | Field | Optional Action
    # =============|==================|=================
    Decode_Field_As   escaped    log

 [PARSER]
    Name   nginx
    Format regex
    Regex  ^(?<remote_addr>[^ ]*) - (?<user>[^ ]*) - (?<host>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+) (?<path>[^ ]*?) (?<proto>[^"]*?)" (?<upstream_addr>[^ ]*) upstream_response_time (?<upstream_response_time_sec>[^ ]*) msec (?<upstream_response_time_msec>[^ ]*) request_time (?<request_time_sec>[^ ]*) (?<status>[^ ]*) (?<bytes_sent>[^ ]*) "(?<referer>[^"]*)" (?<user_agent>.*) (?<x_forwarded_for>-|[\d\.]*)?$
    Time_Key time
    Time_Format %d/%b/%Y:%H:%M:%S %z

this is an example of an apache pod with proper annotation, just change the docker image and parser name to nginx:

apiVersion: v1
kind: Pod
metadata:
  name: apache-logs-annotated
  labels:
    app: apache-logs-annotated
  annotations:
    fluentbit.io/parser: apache
spec:
  containers:
  - name: apache
    image: edsiper/apache_logs

so for your case the part _fluentbit.io/parser: apache_ should be _fluentbit.io/parser: nginx_.

With that config in place when Fluent Bit gather the metadata from the Pod, it will realize that there is a suggested parser (the parser must be registered in the configmap) and will apply it to the original content of the log generated in numer __1__.

Note that you must add the Kubernetes filter, otherwise it will not work, as a reference take a look here:

https://github.com/fluent/fluent-bit-kubernetes-logging/blob/master/output/elasticsearch/fluent-bit-configmap.yaml#L36

Thanks for the detailed answer.

I have my pods now properly annotated and properly getting structured logs, which makes my life easier to be able to filter by regex filter additionally when needed.

I think this explanation could be properly added into documentation and give some examples in the yaml files managing these annotations, otherwise is hard to get to this point. I couldn't read something like this in any place as far as I remember.

Anyway thanks again for the help.

thanks for the update. Indeed after this thread I merged the pending docs, you can see the updated docs with examples here:

https://docs.fluentbit.io/manual/filter/kubernetes#kubernetes-annotations

(if you think it needs some additional info, just let me know)

closing the ticket for now, regards.

Was this page helpful?
0 / 5 - 0 ratings