Which chart:
stable/fluent-bit
What happened:
An application produces a JSON log, e.g.
{"context":{"package":"@applaudience/cinema-data-scraper-http-client","namespace":"httpClient","logLevel":20},"message":"using proxy http://proxy-gateway:8080","sequence":19715,"time":1546821495356,"version":"1.0.0"}
but ES receives the log as a plain text message (log field):
{
  "_index": "kubernetes_cluster-2019.01.07",
  "_type": "flb_type",
  [..]
  "fields": {
    "@timestamp": [
      "2019-01-07T01:30:01.815Z"
    ]
  },
  "highlight": {
    "log": [
      "2019-01-07T00:38:15.356711687Z stdout F {\"context\":{\"package\":\"@applaudience/cinema-data-scraper-http-client\",\"namespace\":\"httpClient\",\"logLevel\":20},\"message\":\"using @kibana-highlighted-field@proxy@/kibana-highlighted-field@ http://@kibana-highlighted-field@proxy@/kibana-highlighted-field@-@kibana-highlighted-field@gateway@/kibana-highlighted-field@:8080\",\"sequence\":19715,\"time\":1546821495356,\"version\":\"1.0.0\"}"
    ],
    "kubernetes.container_name": [
      "@kibana-highlighted-field@data@/kibana-highlighted-field@-@kibana-highlighted-field@manager@/kibana-highlighted-field@"
    ]
  },
  "sort": [
    1546824601815
  ]
}
What you expected to happen:
I expect fluent-bit to parse the JSON message and provide the parsed message to ES.
How to reproduce it (as minimally and precisely as possible):
Using default configuration.
CC @naseemkullah @jknipper @vroyer (Recent contributors to stable/fluent-bit Chart).
What is the configuration for parsing JSON logs produced by an application running in a Kubernetes Pod?
I have tried multiple configuration combinations, as shown below.
Thank you for the link @kfox1111 .
I have tried multiple combinations, including:
backend:
  type: es
  es:
    host: elasticsearch-client
image:
  fluent_bit:
    tag: 1.0.1
parsers:
  json:
    - extraEntries: |-
        Decode_Field_As escaped log do_next
        Decode_Field_As json log
and
backend:
  type: es
  es:
    host: elasticsearch-client
image:
  fluent_bit:
    tag: 1.0.1
parsers:
  json:
    - extraEntries: |-
        Decode_Field_As escaped_utf8 log do_next
        Decode_Field_As json log
but it continues to come through as a plain-text log:
{
  "_index": "kubernetes_cluster-2019.01.07",
  "_type": "flb_type",
  "_id": "HnceJmgBINsxlb1YnF_W",
  "_score": 1,
  "_source": {
    "@timestamp": "2019-01-07T02:22:46.865Z",
    "log": "2019-01-07T02:22:46.865426429Z stdout F {\"context\":{\"package\":\"slonik\",\"logLevel\":20,\"executionTime\":\"1.28 ms\",\"rowCount\":1,\"sql\":\"SELECT id FROM event_seat_state_change WHERE event_id = ? ORDER BY id DESC LIMIT 1\"},\"message\":\"query\",\"sequence\":47781,\"time\":1546827766865,\"version\":\"1.0.0\"}",
    "kubernetes": {
      "pod_name": "adm-do-event-seating-lookups-7555bf849c-h4qlc",
      "namespace_name": "default",
      "pod_id": "563e747d-1208-11e9-b648-42010aa40038",
      "labels": {
        "chart-name": "data-manager",
        "chart-version": "1.0.0",
        "heritage": "Tiller",
        "pod-template-hash": "3111694057",
        "release": "adm-do-event-seating-lookups"
      },
      [...]
  },
  "fields": {
    "@timestamp": [
      "2019-01-07T02:22:46.865Z"
    ]
  }
}
@gajus but your key _source.log is not valid JSON. It starts with 2019-01-07T02:22:46.865426429Z stdout F {\"context\":{, i.e. there is a random string before the JSON. Isn't that the problem here?
My understanding is that the prefix to which you are referring is added by Docker.
I have SSHed to one of the GKE nodes and checked the raw logs to confirm this:
$ tail -1 /var/log/pods/6829cea2-1208-11e9-b648-42010aa40038/data-manager/0.log
2019-01-07T10:52:37.095818517Z stdout F {"context":{"package":"@applaudience/cinema-data-scraper-http-client","namespace":"httpClient","logLevel":20},"message":"using proxy http://proxy-gateway:8080","sequence":180167,"time":1546858357095,"version":"1.0.0"}
The app itself simply outputs a JSON line, i.e. just:
{"context":{"package":"@applaudience/cinema-data-scraper-http-client","namespace":"httpClient","logLevel":20},"message":"using proxy http://proxy-gateway:8080","sequence":180167,"time":1546858357095,"version":"1.0.0"}
My understanding is that fluent-bit is supposed to search the log message for occurrences satisfying the JSON parser and extract that information.
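(For context, this merging is what the kubernetes filter's Merge_Log option is documented to do; a minimal sketch of such a filter section, with the match tag taken from the chart's default config and the key name chosen arbitrarily for illustration, would be:)

[FILTER]
    Name           kubernetes
    Match          kube.*
    # Merge_Log tells the filter to treat the log field as JSON and lift its keys
    Merge_Log      On
    # Merge_Log_Key nests the lifted keys under this (arbitrarily named) key
    Merge_Log_Key  log_processed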
I see.
FWIW, this works for us with chart version 0.12.0 and these settings:
$ kubectl -n kube-system get -o yaml configmap fluent-bit-config
apiVersion: v1
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush        1
        Daemon       Off
        Log_Level    info
        Parsers_File parsers.conf
    [INPUT]
        Name             tail
        Path             /var/log/containers/*.log
        Parser           json
        Tag              kube.*
        Refresh_Interval 5
        Mem_Buf_Limit    5MB
        Skip_Long_Lines  On
    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Merge_Log           Off
        Merge_Log_Key       log
        K8S-Logging.Parser  On
        K8S-Logging.Exclude On
    [OUTPUT]
        Name splunk
        <removed>
  parsers.conf: |
    [PARSER]
        Name        json
        Format      json
        Time_Keep   On
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Decode_Field_As escaped_utf8 log do_next
        Decode_Field_As json log
        Decode_Field_As escaped_utf8 logj.message do_next
        Decode_Field_As json logj.message

    [PARSER]
        Name        docker
        Format      json
        Time_Keep   On
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
kind: ConfigMap
But it only works with top level JSON keys as far as I can tell. And here's one line from an nginx container that is set to log in JSON:
# tail -1 /var/log/pods/cb921f33-xxx-xxx-xxxx-0ae5e5d97c08/nginx-ingress-controller/0.log
{"log":"{ \"@timestamp\": \"2019-01-07T11:13:06+00:00\", \"body_bytes_sent\": \"0\", \"client_ip\":...
This works for us in Splunk, not ES.
Do you mind telling me what the contents of /etc/docker/daemon.json are in your Kubernetes cluster?
Mine is:
{
  "disable-legacy-registry": true,
  "live-restore": true,
  "storage-driver": "overlay2"
}
Shouldn't this say log-driver: json-file?
Never mind. It appears that docker was launched with that config:
$ docker info | grep 'Logging Driver: json-file'
Logging Driver: json-file
That is strange. I logged in to another cluster that I own to check the output format, and it is indeed just plain JSON.
Something somewhere is broken in my setup.
Mine is the default for an EKS node on AWS:
{ "log-driver": "json-file", "log-opts": { "max-size": "10m", "max-file": "10" }}
I think the default GKE node has the same log-driver, no?
I have identified the cause.
The reason the log format is off is that I am using containerd.
I have not discovered the fix yet.
There is something utterly broken with fluent-bit (or poorly documented).
After many hours... it looks like I needed to configure Merge_Log_Trim On.
https://docs.fluentbit.io/manual/filter/kubernetes
Everything else works after configuring Merge_Log_Trim.
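In the kubernetes filter section, that amounts to something like this (a sketch based on the linked docs; the match tag is assumed to be kube.* as in the chart's default config):

[FILTER]
    Name            kubernetes
    Match           kube.*
    Merge_Log       On
    # Per the linked docs, Merge_Log_Trim removes trailing newlines/whitespace
    # from the log field before it is parsed as JSON
    Merge_Log_Trim  On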
@gajus I had the same issue, and I can see it works only if you have a flat JSON map; if you have nested JSON, the decoder will still fail.
It doesn't matter whether you use the kubernetes filter, because the same result can be achieved using a normal JSON parser.
An issue was created several months ago regarding this matter: https://github.com/fluent/fluent-bit/issues/1130. Who knows C? :)
If the log is a nested JSON object, the decoder will fail and the keys will not be parsed.
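To illustrate the limitation described above (the payloads here are made up): a flat map such as {"message":"query","logLevel":20} gets its keys merged, while a payload with a nested object such as {"message":"query","context":{"logLevel":20}} reportedly trips up the decoder and the keys are not merged.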
There are many issues for this, just search. In this one they have a list of them. It's a fluent-bit problem though, not a chart issue.
I found this to be a working configuration to properly format json:
input:
  tail:
    parser: dockerCustom
parsers:
  enabled: true
  json:
    - name: dockerCustom
      extraEntries: |-
        Time_Key time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep On
        Decode_Field_As escaped_utf8 log do_next
        Decode_Field_As json log
I am also experiencing the same problem as others in this thread, and have tried the solutions presented here without success. In every case I'm seeing log: 2019-08-20T16:08:35.976976032Z stdout F {my_properly_formatted_json_map}, with no merging of the json map, and no obvious reason why it's not being merged. I don't have nested json or anything fancy, it appears to be a simple map of key names to strings.
@TarekAS, @gajus Since you both have this working, is it still working for you?
Hi, I'm facing this same problem, "log": "2019-09-17T18:17:07.799810563Z stdout F { ... }", using a managed cluster from IBMCloud. I've tried all the options in this post without success.
I am facing the same issue, "log": "2019-09-17T18:17:07.799810563Z stdout F { ... }", on an OpenShift cluster. Any suggestions?
Same issue on a k3s cluster:
"log": 2020-01-31T05:29:30.815679275+03:00 stdout F {"level":"warning","msg":"A new release has been found: 2.1.3. Please consider updating.","time":"2020-01-31T02:29:30Z"}
Same here:
2020-02-13T15:45:55.192902696Z stdout F {"@timestamp":"2020-02-13T15:45:55.192+00:00","@version":"1","message":"Span reported: 852eeff1c903a4e3:19b0f345af5bac8:1ba028c3f553cb66:1 - /permissions/_find_by_traits","logger_name":"io.jaegertracing.reporters.LoggingReporter","thread_name":"http-nio-8080-exec-9","level":"INFO","level_value":20000,"trace.id":"852eeff1c903a4e3"}
After many hours... it looks like I needed to configure Merge_Log_Trim On. https://docs.fluentbit.io/manual/filter/kubernetes
Everything else works after configuring Merge_Log_Trim.
I tried the same, but it still appears to have problems with the 2019-01-07T10:52:37.095818517Z stdout F prefix.
Log messages from app containers in an OpenShift cluster are modified before they are saved to log files. If the log message from the app container is This is test, then when it is saved to the file, something like 2019-01-07T10:52:37.095818517Z stdout F is prepended to it. If the log message itself is in JSON format, prepending it with 2019-01-07T10:52:37.095818517Z stdout F invalidates the JSON. I believe https://github.com/cri-o/cri-o/blob/0366d726c1b5b957671412efac86c96b63175987/utils/io/logger.go#L82 does this. I don't think there is any way to get rid of it, but you can configure the fluent-bit parser and input to make it more sensible.
In the fluent-bit config, have one INPUT for the k8s application pods' logs:
[INPUT]
    Name             tail
    Path             /var/log/containers/*.log
    Tag              kube.<namespace_name>.<container_name>.<pod_name>.<docker_id>-
    Tag_Regex        (?<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?(?:\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)-(?<docker_id>[a-z0-9]{64})\.log$
    DB               /var/log/containers/fluentbit_kube.db
    Parser           crio
    Mem_Buf_Limit    100MB
    Skip_Long_Lines  On
    Refresh_Interval 5
and a crio PARSER:
[PARSER]
    Name        crio
    Format      regex
    Regex       ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<log>.*)$
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L%z
This will change the log message to look like:
{
  "date": 1581526271.544489,
  "stream": "stdout",
  "logtag": "F",
  "log": "This is test"
}
@sgujrati-up I got it partly working by simply setting input.tail.parser to cri, which then uses the built-in parser https://github.com/fluent/fluent-bit/blob/551ae7778e5e4cad0593ad299010998b52ac2ee1/conf/parsers.conf#L97.
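In chart values, that is roughly (a sketch; input.tail.parser is the same key used elsewhere in this thread, and cri is the built-in parser name from the linked parsers.conf):

input:
  tail:
    parser: cri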
However, some of my messages contain logs in JSON which I would also like to parse.
Some look like this:
E0214 07:59:30.060195 1 reststorage.go:147] unable to fetch pod metrics for pod mip-playground/permission-selector-ui-service-5f6f84cdbd-7f7zs: no metrics known for pod
Perfect, nothing to change here.
But some look like this:
{"@timestamp":"2020-02-14T07:59:30.057+00:00","@version":"1","message":"Could not update new incoming content for search/cognitive feed with feedId.","logger_name":"newsupdater","thread_name":"pool-2-thread-1","level":"WARN","level_value":30000}
Here I would also love to parse the JSON and merge it.
All logs come from k8s-application.
Soooo, I got it working.
In addition to the above, I just needed to add another filter to rawConfig:
rawConfig: |-
  @INCLUDE fluent-bit-service.conf
  @INCLUDE fluent-bit-input.conf
  [FILTER]
      Name     parser
      Match    kube.*
      Key_Name message
      Parser   json
  @INCLUDE fluent-bit-filter.conf
  @INCLUDE fluent-bit-output.conf
Now it works!
@boxcee, could you show us the full fluent-bit config where the log is parsed as JSON, please?
And which fluent-bit image are you using?
So I spent some time getting this working.
I am running a Docker image that writes JSON output, so the files are as follows.
td-agent-bit.conf
[SERVICE]
    Flush        1
    Parsers_File parsers.conf
@INCLUDE inputs.conf
@INCLUDE filters.conf
@INCLUDE outputs.conf
filters.conf
[FILTER]
    Name         parser
    Match_regex  docker.fluentd.*
    Key_Name     log
    Reserve_Data true
    Parser       json
inputs.conf
[INPUT]
    Name              forward
    Listen            0.0.0.0
    Port              24224
    Buffer_Chunk_Size 32KB
    Buffer_Max_Size   64KB
To get the Docker container logging, I added tagging (the forward plugin does not support Tag), so these are the options I am using on my Docker container:
--log-driver fluentd \
--log-opt 'fluentd-address=localhost:24224' \
--log-opt 'fluentd-async-connect=true' \
--log-opt 'fluentd-buffer-limit=4m' \
--log-opt 'fluentd-max-retries=1000' \
--log-opt 'tag=docker.fluentd' \
--log-opt 'mode=non-blocking' \
Passing these values to the Helm chart solves this issue for me in k3d clusters:
parsers:
  enabled: true
  regex:
    - name: log_parser
      regex: ^(?<logtimestamp>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<log>.*)$
      timeKey: logtimestamp
      timeFormat: "%Y-%m-%dT%H:%M:%S.%L%z"
input:
  tail:
    parser: log_parser
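If the extracted log field is itself JSON, it may additionally need a parser filter like the one @boxcee posted above, pointed at the log key instead of message (a sketch, assuming the chart's rawConfig is used and the tag is kube.*):

[FILTER]
    Name         parser
    Match        kube.*
    Key_Name     log
    Parser       json
    # Reserve_Data keeps the other fields (e.g. kubernetes metadata) on the record
    Reserve_Data On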