Hi.
I'm using the fluent-bit 0.13.2 daemonset with the following configuration:
[SERVICE]
    Flush        1
    Daemon       Off
    Log_Level    info
    Parsers_File parsers.conf

[INPUT]
    Name              tail
    Tag               kube.*
    Path              /var/log/containers/*.log
    Exclude_Path      ${FLUENT_ELASTICSEARCH_EXCLUDE}
    Parser            docker
    DB                /var/log/${FLUENT_INTERNAL_DB}
    Mem_Buf_Limit     20MB
    Buffer_Chunk_Size 32k
    Buffer_Max_Size   128k

[FILTER]
    Name      kubernetes
    Match     kube.*
    Kube_URL  https://kubernetes.default.svc:443
    Merge_Log On

[OUTPUT]
    Name                es
    Match               *
    Host                ${FLUENT_ELASTICSEARCH_HOST}
    Port                ${FLUENT_ELASTICSEARCH_PORT}
    Logstash_DateFormat ${FLUENT_ELASTICSEARCH_INDEX_DATE_FORMAT}
    Logstash_Format     On
    Retry_Limit         False
    Logstash_Prefix     ${FLUENT_ELASTICSEARCH_INDEX}
    Index               ${FLUENT_ELASTICSEARCH_INDEX}
    Time_Key            _@timestamp
    Generate_ID         On
For log lines containing " like the one below, fluent-bit doesn't parse the log.
{"@timestamp":"2018-06-04T16:40:04.732+00:00","@version":1,"message":"Error processing (flow.ProcessErrorException: Failed to aggregate fields in Task from event -> Dimension HOME_NETWORK_ID has an empty value) event ->EventImpl [_internalMap=\nRD_ACCESS_POINT_NAME_NI (String) \"abc \"\nRD_CALL_EVENTS_COUNT (Long) 5\nRD_CALL_EVENT_DURATION (Long) 1084\nRD_CALL_EVENT_START_TIMESTAMP (String) \"20170101180000\"\nRD_CAUSE_FOR_TERMINATION (Long) 0\nRD_CHARGING_ID (Long) 00000000\nRD_DATA_VOLUME_INCOMING (Long) 0\nRD_DATA_VOLUME_OUTGOING (Long) 0\nRD_EVENT_TYPE (String) \"GPRS\"\nRD_EVENT_UTC_TIME_OFFSET (String) \"+0100\"\nRD_GGSN_ADDRESS (String) \"0.0.0.0\"\nRD_IMEI (String) \"9999999999999999\"\nRD_IMSI (String) \"9999999999999999\"\nRD_SGSN_ADDRESS (String) \"0.0.0.0\"\n, eventType=REGULAR, inputEntry=ControlId [3 - NXXXXXXXXXXC0010000] at position 5 5]","logger_name":"loading.services.LoadingProcessManager.[LoadingProcessManager]","thread_name":"LP(286)-P(0)-E(0)","level":"WARN","level_value":30000,"code":"G1000","instance":"product","layer":"backend"}
It seems related to the \"abc \" inside the message field's value. I was using the docker parser, and also tried changing the parser to:
[PARSER]
    Name        docker
    Format      json
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L
    Time_Keep   On
    # Command       | Decoder | Field | Optional Action
    # ==============|=========|=======|================
    Decode_Field_As   json      log     try_next
    Decode_Field_As   escaped   log
but the result is the same.
I was using version 0.11 and it was working fine with this kind of log.
I'm having the same issue at the moment. If anyone can assist, it'd be great.
I've been trying every Decode_Field_As combination I can imagine, without success. I need to find a way to keep the " escaped inside the message field.
@rgomesf would you please paste (in formatted mode) your docker parser content that comes from your configmap ?
Found the problem in your config: in your docker parser you are processing a key named "log", but in your log example the content is inside the key called "message". Just put something like this in your docker parser:
[PARSER]
    Name        docker
    Format      json
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L
    Time_Keep   On
    # Command       | Decoder | Field   | Optional Action
    # ==============|=========|=========|================
    Decode_Field_As   escaped   message
Hi @edsiper.
I already tried that config and it also didn't work.
I found two outcomes:
1. Decode_Field_As escaped json
The log isn't parsed as JSON when the message field has a " in its value. It seems the decoder removes the escaping from the original " and then the JSON is no longer valid.
2. Decode_Field_As escaped message
The log is parsed as JSON, but the quotes are stored as \" and shown as " in Kibana.
1. Decode_Field_As escaped json: that means unescape the field named json.
2. Decode_Field_As escaped message: that means unescape the field named message.
Also, there is something that needs to be clarified:
if your original message contains a " character, by the JSON spec it needs to be escaped, so it becomes \". So if your message was packed inside a JSON map (likely by Docker), you will get:
test "message"
in JSON becomes
{"log": "test \"message\""}
The decoders in Fluent Bit allow you to avoid double escaping when processing the text messages, but when sending the same message to Elasticsearch or Kibana, the JSON spec requires it to be escaped; otherwise it's an invalid JSON message and will not be accepted.
In short: if you see double escaping in Elasticsearch/Kibana, something needs to be fixed; otherwise, single escaping is per spec.
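For illustration, the escaping round trip can be sketched in a few lines of Python (my own sketch, not Fluent Bit's code path; the `log` key mirrors what Docker's json-file driver writes):

```python
import json

# A raw message containing a double quote...
msg = 'test "message"'

# ...must be escaped when packed into a JSON map, as Docker does per log line.
packed = json.dumps({"log": msg})
print(packed)  # {"log": "test \"message\""}

# Decoding reverses exactly one level: single escaping on the wire is per spec.
assert json.loads(packed)["log"] == msg
```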
The exact same logs are parsed correctly with fluent-bit 0.11.11.
I see \\\" in kibana JSON view
When using in_tail with Docker JSON logs, the kubernetes filter, and output to Splunk, with applications logging JSON to stdout, it seems we got everything right by not having any Decode_Field_As at all in the parsers config.
@edsiper is there anything else I can provide to help find a fix for this? Right now I have reverted to version 0.11.11.
Dealing with the same urgent problem here. Would be glad to help testing or whatever
We're having the same issue...
If a container writes the following to stdout, it is parsed correctly by Elasticsearch.
{"timestamp":"2016-12-01T02:23:12.236543Z","level":"info","message":"Hello busybox"}
However, if the same log line contains \", it does not get indexed properly in Elasticsearch.
{"timestamp":"2016-12-01T02:23:12.236543Z","level":"info","message":"Hello \"busybox\""}
We're using the docker parser with the log field escaped.
[PARSER]
    Name        docker
    Format      json
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L
    Time_Keep   Off
    # Command       | Decoder | Field | Optional Action
    # ==============|=========|=======|================
    Decode_Field_As   escaped   log
In essence:
I would expect the property containing the escaped quotes to be left as-is, but it seems that the backslashes are removed when the log field is escaped, resulting in invalid JSON:
{"timestamp":"2016-12-01T02:23:12.236543Z","level":"info","message":"Hello "busybox""}
What's the proper way of configuring fluent-bit to handle this? We're using fluent-bit 0.13.2.
I'm also facing a similar challenge and would like to know the correct approach. Steps to reproduce as follows. Run a container that outputs the following:
{"Message": "Hello from Kubernetes!"}
This can be achieved by performing the command:
kubectl run hi --quiet --restart=Never --image=alpine -- sh -c 'sleep 3; echo "{\"Message\": \"Hello from Kubernetes!\"}"'
I then have a docker kubernetes input setup as follows:
[INPUT]
    Name   tail
    Tag    kube.*
    Path   /var/log/containers/*.log
    Parser docker
    DB     /var/log/flb_kube.db
Which is parsed by the docker parser configured as:
[PARSER]
    Name        docker
    Format      json
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L
    Time_Keep   On
    # Command       | Decoder | Field | Optional Action
    # ==============|=========|=======|================
    Decode_Field_As   escaped   log
The JSON payload is then sent to logz.io as:
{
  "_index": "logz_io_index",
  "_type": "http-bulk",
  "_id": "foo",
  "_version": 1,
  "_score": null,
  "_source": {
    "kubernetes": {
      "container_name": "hi",
      "host": "1.2.3.4",
      "annotations": {
        "kubernetes.io/limit-ranger": "LimitRanger plugin set: cpu request for container hi"
      },
      "docker_id": "9d5c453d097cfb63f3ca237f0c9d8938147674c9b7bf8956a8b930f476b34e51",
      "pod_id": "d7e18544-8a6e-11e8-b430-0a948d0850cc",
      "pod_name": "hi",
      "namespace_name": "default",
      "labels": {
        "run": "hi"
      }
    },
    "@timestamp": "2018-07-18T09:45:28.000+00:00",
    "log": "{\"Message\": \"Hello from Kubernetes!\"}\n",
    "stream": "stdout",
    "@log_name": "kube.var.log.containers.hi_default_hi-9d5c453d097cfb63f3ca237f0c9d8938147674c9b7bf8956a8b930f476b34e51.log",
    "time": "2018-07-18T09:45:28.626181368Z",
    "type": "http-bulk",
    "tags": [
      "_logz_http_bulk_json_8070"
    ]
  },
  "fields": {
    "@timestamp": [
      1531907128000
    ]
  },
  "sort": [
    1531907128000
  ]
}
This causes the field to appear in elasticsearch/logz.io as an incorrectly indexed JSON string of:
{"Message": "Hello from Kubernetes!"}
I have spent many days pulling my hair out trying to work this out; can someone suggest the right configuration to have these JSON messages correctly indexed in elasticsearch?
I have the same issue here: if UTF-8 characters are inside the JSON in the log field, it cannot be parsed correctly. I think the dilemma is that if it escapes UTF-8 first, it cannot parse the JSON correctly; if it parses the JSON first, it cannot handle the escaped UTF-8 inside the JSON. Any suggestions? Is it a bug? I thought decoding as JSON first and then escaping UTF-8 would work, but it doesn't.
@meggarr your assessment is exactly what I'm seeing, and it appears to be a limitation per the fluent-bit docs. I am having the same problem of escaped JSON in the log field, which I can't parse as JSON because it's escaped, and when I use do_next after parsing, the JSON object is not parsed. I have a ticket in #691 which is a specific representation of my use case.
I've included below an excerpt from the docs suggesting that you can only use multiple decoders on a field that is unstructured raw text.
Perhaps this is a feature request and not a bug?
Optional Actions
By default, if a decoder fails to decode the field, or you want to try the next decoder, it is possible to define an optional action. Available actions are:
Name | Description
-- | --
try_next | if the decoder failed, apply the next Decoder in the list for the same field.
do_next | if the decoder succeeded or failed, apply the next Decoder in the list for the same field.
Note that actions are affected by some restrictions:
on Decode_Field_As, if it succeeded, another decoder of the same type on the same field can be applied only if the data continues to be an unstructured message (raw text).
on Decode_Field, if it succeeded, it can only be applied once to the same field. By nature, Decode_Field aims to decode a structured message.
Maybe this is a bug. What I am thinking is that parsing it as JSON should only care about double quotes, but apparently it transforms UTF-8 characters as well; e.g., in my case, after JSON decoding, \u003e is changed to u003e.
@edsiper
After some investigation of the code, I think the UTF-8 decoding part is good, except that it reads \\n as a JSON escaped character, which results in a decoded string that cannot be re-encoded as JSON. A sample log looks like:
{"log":"{\"timeMillis\":1532502611649,\"message\":\"Response \u003eGET /abc/xyz, 69ms, \u003e200 OK, \u003e{\\n \\\"_id\\\" : \\\"50-CCBE59-261712046121\\\",\\n \\\"manufacturer\\\" : \\\"BigBang\\\",\\n}\\n\",\"threadId\":64,\"threadPriority\":5}\r\n","stream":"stdout","time":"2018-07-25T07:10:11.650537766Z"}
The UTF-8 decoder reads \\n as a newline, but it is really not a newline yet at the time UTF-8 decoding is done.
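The two decode layers can be sketched in Python (an illustration assuming Docker's json-file format, not Fluent Bit's decoder code); the key point is that \\n belongs to the inner document and must survive the outer decode untouched:

```python
import json

# On disk: the app's own \n escape is doubled to \\n by Docker's wrapping,
# while the trailing \n is Docker's singly-escaped real newline.
outer = '{"log":"{\\"message\\":\\"line1\\\\nline2\\"}\\n","stream":"stdout"}'

record = json.loads(outer)            # first decode: Docker's layer
# record["log"] still holds a literal backslash-n here; turning it into a
# newline at this stage (as described above) corrupts the inner JSON.
inner = json.loads(record["log"])     # second decode: the application's JSON
assert inner["message"] == "line1\nline2"  # the real newline appears only now
```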
Hi @edsiper,
I am trying with the log message below: {"Message":"Hi Does double quotes \" work\"","severity":"info"}.
In my kibana I see the below result

I cannot see the Message and severity keys as indexable fields in Elasticsearch.
My parser config is below; I'm using the docker image fluent/fluent-bit:0.14.1 and Kubernetes version 1.11.1.
Has this behavior been fixed? Am I doing something wrong in my configuration?
[PARSER]
    Name        docker
    Format      json
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L
    Time_Keep   On
    # Command       | Decoder | Field | Optional Action
    # ==============|=========|=======|================
    Decode_Field_As   escaped   log     try_next
    Decode_Field_As   json      log
@HarishHothi would you please paste (formatted) your full Fluent Bit configuration? (And are you running in Kubernetes?)
Hi @edsiper, below is my full fluent-bit configuration. Yes, we are running fluent-bit in an on-premise Kubernetes cluster.
https://github.com/HarishHothi/SpringTestRepoHarish/blob/master/fluentbit-configmap.yaml
Hi @edsiper, is there any update?
I did a PR and built 0.14.2 with the cherry-picked fix. Could you test whether this works for you?
quay.io/sergioballesteros/fluentbit:0.14.2-pr769
Hi @ese, I still see the same problem. My sample log {"Message":"Hi Does double quotes \" work\"","severity":"info"} is still not indexable in Kibana. I am using the same fluent-bit configuration in the on-premise kube cluster.
https://raw.githubusercontent.com/HarishHothi/SpringTestRepoHarish/master/fluentbit-configmap.yaml
I also forwarded more details to your Slack account.
@HarishHothi This is the config I am using:
fluent-bit.conf: |
    [SERVICE]
        Flush        1
        Log_Level    info
        Daemon       off
        Parsers_File parsers.conf
        HTTP_Server  On
        HTTP_Listen  0.0.0.0
        HTTP_Port    2020

    [INPUT]
        Name             tail
        Path             /var/log/containers/*.log
        Parser           docker
        Refresh_Interval 5
        ignore_older     6h
        Tag              kube.*
        Mem_Buf_Limit    10MB
        Skip_Long_Lines  On

    [FILTER]
        Name                kubernetes
        Match               kube.*
        K8S-Logging.Parser  On
        K8S-Logging.Exclude On
        Merge_Log           On

    [OUTPUT]
        Name            es
        Match           kube.*
        Host            ${FLUENT_ELASTICSEARCH_HOST}
        Port            ${FLUENT_ELASTICSEARCH_PORT}
        Logstash_Format On
        Logstash_Prefix containers

parsers.conf: |
    [PARSER]
        Name        docker
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   On
@HarishHothi The quote must be double-backslash escaped in the logs; otherwise it will be treated as a JSON token, which is wrong.
E.g.
{
  "_index": "logs-2018.09.18",
  "_type": "flb_type",
  "_id": "ppHI62UBcPJcRNEQT6uo",
  "_version": 1,
  "_score": null,
  "_source": {
    "@timestamp": "2018-09-18T08:24:56.777Z",
    "message": "ABC Request >GET /xyz/123, data >{\"paramOne\":\"123456\"}",
    "log": "{\"timeMillis\":1537259096777,\"thread\":\"thread-1\",\"level\":\"DEBUG\",\"loggerName\":\"com.xyz.XyzHelper\",\"message\":\"ABC Request >GET /xyz/123, data >{\\\"paramOne\\\":\\\"123456\\\"}\",\"endOfBatch\":false,\"loggerFqcn\":\"io.vertx.core.logging.Logger\",\"contextMap\":{},\"threadId\":83,\"threadPriority\":5}\r",
    "stream": "stdout",
    "timeMillis": 1537259096777,
    "thread": "thread-1",
    "level": "DEBUG",
    ...
}
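The double-backslash rule falls straight out of the two JSON layers; here is a small Python sketch of the round trip (my own illustration, with made-up field values):

```python
import json

# Layer 1: the application serializes its record; inner quotes become \".
inner = json.dumps({"message": 'data >{"paramOne":"123456"}'})

# Layer 2: Docker wraps the line in a "log" field; every backslash and quote
# is escaped again, so each \" becomes \\\" on disk.
outer = json.dumps({"log": inner})
assert '\\\\\\"' in outer  # the literal sequence \\\" is present

# Unwinding both layers recovers the original value intact.
assert json.loads(json.loads(outer)["log"])["message"] == 'data >{"paramOne":"123456"}'
```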
Is this a related issue https://github.com/helm/charts/issues/10424 ?
For what it's worth, we ran into the same issue in our Kubernetes setup with the fluent-bit Helm chart using fluent-bit 1.0.4, and ended up solving it with the following:
In the docker parser:

    Decode_Field_As escaped_utf8 log

Just commenting here as well, as we were also facing the same issue, where " in the value field of our JSON-formatted logs was not being parsed/correctly understood, in our case by Splunk.
Our app log after it has been decoded by fluent-bit docker parser+decoder:
{"log":"{\"logName\":\"app-third_party\",\"timestamp\":\"1554804600355\",\"level\":\"INFO\",\"thread\":\"http-nio-8080-exec-5\",\"req.requestURI\":\"/xxx.do\",\"request_method\":\"GET\",\"message\":\"target=\\\"dsc-command-chx_datas
tax-readSession\\\" duration=42 result=SUCCESS \",\"context\":\"default\"}\n","stream":"stdout","time":"2019-04-09T14:25:00.103780996Z"}
When run through the fluent-bit docker parser/decoder with the following config:
[PARSER]
    Name        docker
    Format      json
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L
    Time_Keep   Off
    # Command       | Decoder | Field | Optional Action
    # ==============|=========|=======|================
    Decode_Field_As   escaped   log     do_next
    Decode_Field      json      log
the log was not showing up as valid JSON in Splunk, hence it could not extract these lovely JSON fields.
Once we changed the decoder to use escaped_utf8 instead of escaped, it all magically started to work, and Splunk was able to extract and index all the JSON fields, so a big thank you to @Pluies for the pointer!
@Pluies we are having the same issues with fluent-bit on Helm (escaping "). Can you show us how you did it in values.yaml?
This is what I'm doing right now:
...
parsers:
  enabled: true
  json:
    - name: docker
      extraEntries: |-
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   Off
        Decode_Field_As escaped_utf8 log do_next
        Decode_Field_As json log
For those using Helm, I found this to be a working configuration in values.yaml:
input:
  tail:
    parser: dockerCustom
parsers:
  enabled: true
  json:
    - name: dockerCustom
      extraEntries: |-
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   On
        Decode_Field_As escaped_utf8 log do_next
        Decode_Field_As json log
I appreciate that this issue is already closed, and I previously commented with the workaround of using Decode_Field_As escaped_utf8 log do_next, BUT I have found that this then breaks other things in my logs, mainly multiline stack traces.
When using escaped_utf8, the newline \n in the log line seems to get decoded to \\n, and hence just appears as that in Splunk, causing it NOT to display as a multiline stack trace.
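The symptom can be sketched in a couple of lines (my own illustration of the reported behavior, not Fluent Bit's decoder internals):

```python
# If a decoder escapes an already-real newline again, downstream viewers get
# a literal backslash-n instead of a line break.
trace = "Exception in thread main\n\tat Foo.bar(Foo.java:1)"
re_escaped = trace.replace("\n", "\\n")  # newline -> two chars: '\' and 'n'
assert "\n" not in re_escaped            # the multiline structure is gone
print(re_escaped)
```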
Anyone else seen this? I will raise a new issue anyway.
FYI: Please check the following comment on #1278 :
https://github.com/fluent/fluent-bit/issues/1278#issuecomment-499583503
Issue already fixed, ref: #1278 (comment)