Describe the bug
A custom Regex on the nginx parser does not generate the corresponding fields in the output.
To Reproduce
^(?<remote_addr>[^ ]*) - (?<user>[^ ]*) - (?<host>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+) (?<path>[^ ]*?) (?<proto>[^"]*?)" (?<upstream_addr>[^ ]*) upstream_response_time (?<upstream_response_time_sec>[^ ]*) msec (?<upstream_response_time_msec>[^ ]*) request_time (?<request_time_sec>[^ ]*) (?<status>[^ ]*) (?<bytes_sent>[^ ]*) "(?<referer>[^"]*)" (?<user_agent>.*) (?<x_forwarded_for>-|[\d\.]*)?$172.17.0.1 - - - local.mydomain.io [29/Aug/2018:10:55:13 +0000] "GET /user/config/NOTFOUND HTTP/1.1" 10.109.220.149:80 upstream_response_time 0.000 msec 1535540113.743 request_time 0.001 404 423 "-" Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.75 Safari/537.36 -Rubular shows the output correctly:
remote_addr | 172.17.0.1
-- | --
user | -
host | local.mydomain.io
time | 29/Aug/2018:10:55:13 +0000
method | GET
path | /user/config/NOTFOUND
proto | HTTP/1.1
upstream_addr | 10.109.220.149:80
upstream_response_time_sec | 0.000
upstream_response_time_msec | 1535540113.743
request_time_sec | 0.001
status | 404
bytes_sent | 423
referer | -
user_agent | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.75 Safari/537.36
x_forwarded_for | -
This is the log I get in the standard output plugin:
[0] kube.nginx.var.log.containers.nginx-b7tm6_default_nginx-3b424de3f8df0b696d90b43a3206da0c8e68506d015b0c03c1cc6f87696ce6fc.log: [1535548128.215045251, {"log"=>"{"log":"172.17.0.1 - - - local.mydomain.io [29/Aug/2018:10:55:13 +0000] \"GET /user/config/NOTFOUND HTTP/1.1\" 10.109.220.149:80 upstream_response_time 0.001 msec 1535540113.743 request_time 0.001 404 423 \"-\" Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.75 Safari/537.36 -\n","stream":"stdout","time":"2018-08-29T13:08:48.214745384Z"}"}]
apiVersion: v1
kind: ConfigMap
metadata:
name: fluent-bit-config
namespace: monitoring
labels:
k8s-app: fluent-bit
data:
# Configuration files: server, input, filters and output
# ======================================================
fluent-bit.conf: |
[SERVICE]
Flush 1
Log_Level info
Daemon off
Parsers_File parsers.conf
HTTP_Server Off
@INCLUDE input-kubernetes-nginx.conf
@INCLUDE filter-grep.conf
@INCLUDE output-stdout.conf
input-kubernetes-nginx.conf: |
[INPUT]
Name tail
Tag kube.default.nginx.*
Path /var/log/containers/*_nginx*.log
Parser nginx
DB /var/log/flb_kube.db
Mem_Buf_Limit 5MB
Skip_Long_Lines On
Refresh_Interval 10
filter-grep.conf: |
[FILTER]
Name grep
Match kube.default.*
Regex log 404
Regex log 500
Regex log error
Regex log critical
output-stdout.conf: |
[OUTPUT]
Name stdout
Match kube.default.*
parsers.conf: |
[PARSER]
Name nginx
Format regex
Regex ^(?<remote_addr>[^ ]*) - (?<user>[^ ]*) - (?<host>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+) (?<path>[^ ]*?) (?<proto>[^"]*?)" (?<upstream_addr>[^ ]*) upstream_response_time (?<upstream_response_time_sec>[^ ]*) msec (?<upstream_response_time_msec>[^ ]*) request_time (?<request_time_sec>[^ ]*) (?<status>[^ ]*) (?<bytes_sent>[^ ]*) "(?<referer>[^"]*)" (?<user_agent>.*) (?<x_forwarded_for>-|[\d\.]*)?$
Time_Key time
Time_Format %d/%b/%Y:%H:%M:%S %z
Expected behavior
I expected to get all the fields that I described in the Rubular output in the fluent-bit log.
@mbelchin
Despite your Pods generate a valid log line, note this message is encapsulated by Docker engine and stored in a JSON map:
__1__ message generated by your Pod
172.17.0.1 - - - local.mydomain.io [29/Aug/2018:10:55:13 +0000] "GET /user/config/NOTFOUND HTTP/1.1" 10.109.220.149:80 upstream_response_time 0.001 msec 1535540113.743 request_time 0.001 404 423 "-" Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.75 Safari/537.36 -
__2__ message encapsulated (and quoted) by Docker engine in a JSON map:
{"log":"172.17.0.1 - - - local.mydomain.io [29/Aug/2018:10:55:13 +0000] \"GET /user/config/NOTFOUND HTTP/1.1\" 10.109.220.149:80 upstream_response_time 0.001 msec 1535540113.743 request_time 0.001 404 423 \"-\" Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.75 Safari/537.36 -\n","stream":"stdout","time":"2018-08-29T13:08:48.214745384Z"}
so as you can see the content that arrives to Fluent Bit is the one in number __2__ , so the proper parser to apply is _docker_ instead of _nginx_.
Now obviously you want to convert that unstructured message to structured, the way to workaround that is to use the Annotation feature that is part of Kubernetes Filter in Fluent Bit. So your config should looks like (just replacing some parts):
[INPUT]
Name tail
Tag kube.default.nginx.*
Path /var/log/containers/*_nginx*.log
Parser docker
DB /var/log/flb_kube.db
Mem_Buf_Limit 5MB
Skip_Long_Lines On
Refresh_Interval 10
in your parsers file expand to:
[PARSER]
Name docker
Format json
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%L
Time_Keep On
# Command | Decoder | Field | Optional Action
# =============|==================|=================
Decode_Field_As escaped log
[PARSER]
Name nginx
Format regex
Regex ^(?<remote_addr>[^ ]*) - (?<user>[^ ]*) - (?<host>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+) (?<path>[^ ]*?) (?<proto>[^"]*?)" (?<upstream_addr>[^ ]*) upstream_response_time (?<upstream_response_time_sec>[^ ]*) msec (?<upstream_response_time_msec>[^ ]*) request_time (?<request_time_sec>[^ ]*) (?<status>[^ ]*) (?<bytes_sent>[^ ]*) "(?<referer>[^"]*)" (?<user_agent>.*) (?<x_forwarded_for>-|[\d\.]*)?$
Time_Key time
Time_Format %d/%b/%Y:%H:%M:%S %z
this is an example of an apache pod with proper annotation, just change the docker image and parser name to nginx:
apiVersion: v1
kind: Pod
metadata:
name: apache-logs-annotated
labels:
app: apache-logs-annotated
annotations:
fluentbit.io/parser: apache
spec:
containers:
- name: apache
image: edsiper/apache_logs
so for your case the part _fluentbit.io/parser: apache_ should be _fluentbit.io/parser: nginx_.
With that config in place when Fluent Bit gather the metadata from the Pod, it will realize that there is a suggested parser (the parser must be registered in the configmap) and will apply it to the original content of the log generated in numer __1__.
Note that you must add the Kubernetes filter, otherwise it will not work, as a reference take a look here:
Thanks for the detailed answer.
I have my pods now properly annotated and properly getting structured logs, which makes my life easier to be able to filter by regex filter additionally when needed.
I think this explanation could be properly added into documentation and give some examples in the yaml files managing these annotations, otherwise is hard to get to this point. I couldn't read something like this in any place as far as I remember.
Anyway thanks again for the help.
thanks for the update. Indeed after this thread I merged the pending docs, you can see the updated docs with examples here:
https://docs.fluentbit.io/manual/filter/kubernetes#kubernetes-annotations
(if you think it needs some additional info, just let me know)
closing the ticket for now, regards.
Most helpful comment
Thanks for the detailed answer.
I have my pods now properly annotated and properly getting structured logs, which makes my life easier to be able to filter by regex filter additionally when needed.
I think this explanation could be properly added into documentation and give some examples in the yaml files managing these annotations, otherwise is hard to get to this point. I couldn't read something like this in any place as far as I remember.
Anyway thanks again for the help.