Hello,
After discussing on Slack, I'll post the issue I'm facing here.
I've used Fluent Bit several times, but this time I found that logs created by an application containing some \n are being changed by Fluent Bit, which adds one more "\". This leads to a lot of bad escaping in Kibana, and debugging stack traces with it is very difficult. I've attached the two different logs (the application's raw logs and the logs parsed by Fluent Bit) as well as the SOS report. If you could take a look and give some advice, it would be much appreciated. Thanks,
From the application: {"@timestamp":"2017-12-11T17:22:07.421+00:00","@version":1,"message":"\nFailed events housekeeping completed in 4ms\n","logger_name":"LoadingProcessManager.[LoadingProcessManager]","thread_name":"[FR(234141)]@Wrk.80","level":"INFO","level_value":20000,"user":"adm","code":"G1000","instance":"product","layer":"converter"}
From Kibana:
{
"_index": "dev-logs-2017.12.11",
"_type": "flb_type",
"_id": "AWBGmTGR9USO52Xb23Br",
"_version": 1,
"_score": null,
"_source": {
"_@timestamp": "2017-12-11T17:22:07.424Z",
"@timestamp": "2017-12-11T17:22:07.421+00:00",
"@version": 1,
"message": "\\nFailed events housekeeping completed in 4ms\\n",
"logger_name": "LoadingProcessManager.[LoadingProcessManager]",
"thread_name": "[FR(234141)]@Wrk.80",
"level": "INFO",
"level_value": 20000,
"user": "adm",
"code": "G1000",
"instance": "product",
"layer": "converter"
},
"fields": {
"@timestamp": [
1513012927421
]
},
"sort": [
1513012927421
]
}
SOS Report
Customers of Treasure Data. For more details visit:
https://fluentd.treasuredata.com
[Fluent Bit]
Edition Community Edition
Version 0.12.9
Built Flags JSMN_PARENT_LINKS JSMN_STRICT FLB_HAVE_TLS FLB_HAVE_SQLDB FLB_HAVE_BUFFERING FLB_HAVE_TRACE FLB_HAVE_FLUSH_LIBCO FLB_HAVE_SYSTEMD FLB_HAVE_FORK FLB_HAVE_PROXY_GO FLB_HAVE_JEMALLOC JEMALLOC_MANGLE FLB_HAVE_REGEX FLB_HAVE_C_TLS FLB_HAVE_SETJMP FLB_HAVE_ACCEPT4 FLB_HAVE_INOTIFY
[Operating System]
Name Linux
Release 4.4.0-1043-aws
Version #52-Ubuntu SMP Tue Dec 5 10:49:06 UTC 2017
[Hardware]
Architecture x86_64
Processors 2
[Built Plugins]
Inputs cpu mem kmsg tail proc disk systemd netif dummy head health serial stdin tcp mqtt lib forward random syslog
Filters grep stdout kubernetes parser record_modifier
Outputs counter es exit file forward http influxdb kafka-rest nats null plot stdout td lib flowcounter
[SERVER] Runtime configuration
Flush 1
Daemon Off
Log_Level Debug
[INPUT] Instance
Name tail.0 (tail, id=0)
Flags DYN_TAG
Threaded No
Tag tail.0
Mem_Buf_Limit 9.5M
Path /var/log/file.log
Parser json-test
DB /var/log/flb_kube.db
Buffer_Chunk_Size 8k
Buffer_Max_Size 32k
[OUTPUT] Instance
Name es.0 (es, mask_id=1)
Match *
TLS Active No
Retry Limit no limit
Host.TCP_Port 80
Host.Name elastic.host
Logstash_Format On
Logstash_Prefix dev-logs
Index dev-logs
Time_Key _@timestamp
Did you find a fix for this?
@captainkerk not so far..
Ok, I'm experiencing something similar. I'm still debugging, I'll let you know if I figure anything out.
My fallback is to ditch this lightweight agent and go back to the full Fluentd agent (which doesn't have this problem).
Well, for me that's not a solution, as in a container environment I need everything as lightweight as possible; besides, this is not a major blocker for me.
hello everyone, my apologies for the delay on this.
@hdiass would you please provide the specific Docker log file that contains the problematic messages?
side question: are you able to reproduce the problem with Fluent Bit v0.12.11?
Hello,
I can't provide the entire Docker log, but in the first post you can see that in the application log (which Fluent Bit is tailing) there is only "\n" in the message field (at the beginning and end), while in Elasticsearch the message field for the same log has "\n" with one more backslash than it should, again at the beginning and end of the message. By the way, this also happens if there is any "\n" in the middle of the message. And as there is no app between Fluent Bit and our Elasticsearch, we can assume the issue is in the Fluent Bit processing.
Regarding the Fluent Bit version, I'm now on the latest and the same issue happens.
Thanks,
hi!
i'm experiencing the same issue using fluent/fluent-bit:0.12.11 on a 1.9 k8s cluster.
example log line from kubectl logs:
10.32.0.1 - - [18/Jan/2018:16:13:42 +0000] "GET /login/ HTTP/1.1" 200 1259 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36"
same line on elasticsearch:
{
"_index": "logstash-2018.01.18",
"_type": "flb_type",
"_id": "AWEKDCLo-e7v1YXKY33t",
"_score": null,
"_source": {
"@timestamp": "2018-01-18T16:13:42.203Z",
"log": "10.32.0.1 - - [18/Jan/2018:16:13:42 +0000] \\\"GET /login/ HTTP/1.1\\\" 200 1259 \\\"-\\\" \\\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36\\\"\\n",
"stream": "stdout",
"time": "2018-01-18T16:13:42.203127351Z",
"kubernetes": {
"pod_name": "insight-nginx-7f59b9c9cb-bddm6",
"namespace_name": "insight-staging",
"pod_id": "81e9f86e-fa11-11e7-bb88-005056970681",
"labels": {
"app": "insight",
"deployment": "app",
"pod-template-hash": "3915657576",
"process": "nginx",
"tier": "core"
},
"host": "shark-kube-node2",
"container_name": "insight-core-nginx",
"docker_id": "d2c21d5ffc702b397f843f5edd0d292f2ae8169eb8404f24fbc407c748aab21f"
}
}
}
here the full k8s fluent-bit yaml config
apiVersion: v1
kind: Namespace
metadata:
name: logging
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: fluent-bit
namespace: logging
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
name: fluent-bit-read
rules:
- apiGroups: [""]
resources:
- namespaces
- pods
verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
name: fluent-bit-read
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: fluent-bit-read
subjects:
- kind: ServiceAccount
name: fluent-bit
namespace: logging
---
apiVersion: v1
kind: ConfigMap
metadata:
name: fluent-bit-config
namespace: logging
labels:
k8s-app: fluent-bit
data:
# Configuration files: server, input, filters and output
# ======================================================
fluent-bit.conf: |
[SERVICE]
Flush 1
Log_Level info
Daemon off
Parsers_File parsers.conf
@INCLUDE input-kubernetes.conf
@INCLUDE filter-kubernetes.conf
@INCLUDE output-elasticsearch.conf
input-kubernetes.conf: |
[INPUT]
Name tail
Tag kube.*
Path /var/log/containers/*.log
Parser docker
DB /var/log/flb_kube.db
Mem_Buf_Limit 5MB
Skip_Long_Lines On
Refresh_Interval 10
filter-kubernetes.conf: |
[FILTER]
Name kubernetes
Match kube.*
Kube_URL https://kubernetes.default.svc:443
Merge_JSON_Log On
output-elasticsearch.conf: |
[OUTPUT]
Name es
Match *
Host ${FLUENT_ELASTICSEARCH_HOST}
Port ${FLUENT_ELASTICSEARCH_PORT}
Logstash_Format On
Retry_Limit False
parsers.conf: |
[PARSER]
Name apache
Format regex
Regex ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
Time_Key time
Time_Format %d/%b/%Y:%H:%M:%S %z
[PARSER]
Name apache2
Format regex
Regex ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
Time_Key time
Time_Format %d/%b/%Y:%H:%M:%S %z
[PARSER]
Name apache_error
Format regex
Regex ^\[[^ ]* (?<time>[^\]]*)\] \[(?<level>[^\]]*)\](?: \[pid (?<pid>[^\]]*)\])?( \[client (?<client>[^\]]*)\])? (?<message>.*)$
[PARSER]
Name nginx
Format regex
Regex ^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
Time_Key time
Time_Format %d/%b/%Y:%H:%M:%S %z
[PARSER]
Name json-test
Format json
Time_Key time
Time_Format %d/%b/%Y:%H:%M:%S %z
[PARSER]
Name docker
Format json
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%L
Time_Keep On
[PARSER]
Name syslog
Format regex
Regex ^\<(?<pri>[0-9]+)\>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
Time_Key time
Time_Format %b %d %H:%M:%S
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
name: fluent-bit
namespace: logging
labels:
k8s-app: fluent-bit-logging
version: v1
kubernetes.io/cluster-service: "true"
spec:
template:
metadata:
labels:
k8s-app: fluent-bit-logging
version: v1
kubernetes.io/cluster-service: "true"
spec:
containers:
- name: fluent-bit
image: fluent/fluent-bit:0.12.11
env:
- name: FLUENT_ELASTICSEARCH_HOST
value: "XXXXXXXXX"
- name: FLUENT_ELASTICSEARCH_PORT
value: "9200"
volumeMounts:
- name: varlog
mountPath: /var/log
- name: varlibdockercontainers
mountPath: /var/lib/docker/containers
readOnly: true
- name: fluent-bit-config
mountPath: /fluent-bit/etc/
terminationGracePeriodSeconds: 10
volumes:
- name: varlog
hostPath:
path: /var/log
- name: varlibdockercontainers
hostPath:
path: /var/lib/docker/containers
- name: fluent-bit-config
configMap:
name: fluent-bit-config
serviceAccountName: fluent-bit
tolerations:
- key: node-role.kubernetes.io/master
operator: Exists
effect: NoSchedule
fluent-bit logs some of this sometimes:
[2018/01/18 16:12:13] [ warn] [filter_kube] could not pack merged json
[2018/01/18 16:15:59] [ warn] [filter_kube] could not pack merged json
i'm available to provide more information if necessary :)
In order to identify in which part of Fluent Bit the problem is happening, I need to get a Docker log file. kubectl logs processes the JSON map of the Docker log engine; it's better to get the real source.
The log file should have a name containing "d2c21d5ffc702b397f843f5edd0d292f2ae8169eb8404f24fbc407c748aab21f" (the Docker ID).
hi @edsiper , thanks for your interest!
I attach a docker log file and paste how some lines appear on kibana.
docker log file:
insight-nginx-7f59b9c9cb-bddm6_insight-staging_insight-core-nginx-f4cd342bda99fdee3e0af2a03bc2070343cf5d7577e106fc04b58b5cf4f4d3fa.log
first log line from kibana:
{
"_index": "logstash-2018.01.19",
"_type": "flb_type",
"_id": "AWENsBEw-e7v1YXK5Szq",
"_score": null,
"_source": {
"@timestamp": "2018-01-19T09:11:37.63Z",
"log": "10.32.0.1 - - [19/Jan/2018:09:11:37 +0000] \\\"GET /dashboard HTTP/1.1\\\" 302 432 \\\"-\\\" \\\"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36\\\"\\n",
"stream": "stdout",
"time": "2018-01-19T09:11:37.063525473Z",
"kubernetes": {
"pod_name": "insight-nginx-7f59b9c9cb-bddm6",
"namespace_name": "insight-staging",
"pod_id": "81e9f86e-fa11-11e7-bb88-005056970681",
"labels": {
"app": "insight",
"deployment": "app",
"pod-template-hash": "3915657576",
"process": "nginx",
"tier": "core"
},
"host": "shark-kube-node2",
"container_name": "insight-core-nginx",
"docker_id": "f4cd342bda99fdee3e0af2a03bc2070343cf5d7577e106fc04b58b5cf4f4d3fa"
}
}
}
second log line from kibana:
{
"_index": "logstash-2018.01.19",
"_type": "flb_type",
"_id": "AWENsBEw-e7v1YXK5Szr",
"_score": null,
"_source": {
"@timestamp": "2018-01-19T09:11:37.150Z",
"log": "10.32.0.1 - - [19/Jan/2018:09:11:37 +0000] \\\"GET /login HTTP/1.1\\\" 301 436 \\\"-\\\" \\\"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36\\\"\\n",
"stream": "stdout",
"time": "2018-01-19T09:11:37.150930054Z",
"kubernetes": {
"pod_name": "insight-nginx-7f59b9c9cb-bddm6",
"namespace_name": "insight-staging",
"pod_id": "81e9f86e-fa11-11e7-bb88-005056970681",
"labels": {
"app": "insight",
"deployment": "app",
"pod-template-hash": "3915657576",
"process": "nginx",
"tier": "core"
},
"host": "shark-kube-node2",
"container_name": "insight-core-nginx",
"docker_id": "f4cd342bda99fdee3e0af2a03bc2070343cf5d7577e106fc04b58b5cf4f4d3fa"
}
}
}
hope it's enough for debugging!
Hello @edsiper,
I'm approaching you on slack to send the logs privately.
Thanks,
thanks everyone for reporting the problems associated to escaping.
@hdiass provided me some information that was helpful to identify the issue associated to one posted at the beginning:
When an application generates a formal JSON log, special ASCII characters like '\n' (note: an ASCII character, one byte) are converted to JSON string format, which is "\n" (two bytes: \ and n). Our JSON parser applied to the records is not converting the string "\n" back to the ASCII byte '\n', hence when an output plugin converts the record to JSON output, it re-escapes the original "\n" to "\\n".
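This mechanics can be reproduced in a few lines of Python (an illustrative model only, not Fluent Bit's actual C code):

```python
import json

# The application emits a message containing real newlines (one byte each).
raw_message = "\nFailed events housekeeping completed in 4ms\n"

# Serialized by the application, each '\n' byte becomes the
# two-character sequence backslash + 'n' in the JSON string.
app_log_line = json.dumps({"message": raw_message})

# Model of the bug: the parser keeps the field as the escaped
# two-character sequence instead of decoding it back to a real newline.
parsed_not_decoded = raw_message.replace("\n", "\\n")

# The output plugin then serializes to JSON again, escaping the
# backslash itself and producing the doubled \\n seen in Kibana.
reescaped = json.dumps({"message": parsed_not_decoded})
print(reescaped)
# {"message": "\\nFailed events housekeeping completed in 4ms\\n"}
```

The fix, as described below, is to decode the escaped sequence back to its single byte before the output stage, so the final serialization escapes it only once.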
I think the solution to the problem is to extend our JSON parser so we can instruct it to "decode" specific fields of the record; ideally this should be optional because, as you know, text parsing is expensive and I would like to avoid performance penalties in scenarios where logs with special characters do not exist.
A parser could look like this:
[PARSER]
Name docker_parser
Format json
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%L
Time_Keep On
Decode_JSON_Field message
As an example consider the following original record from a file:
{"@timestamp":"2018-01-19T11:38:06.225+00:00",
"@version":1,
"message":"\nFailed events housekeeping completed in 1ms\n",
}
in the example above the _message_ key contains stringified text. The \n is kept as it is, and if it hits an output plugin that re-formats to JSON (like Elasticsearch), that character gets escaped again.
I've also found that if a JSON string comes with encoded UTF-8, the escapes are not decoded and are kept as they come, e.g.:
{"log":"JP characters \u6628\u591c\u306e\u30b3\u30f3\u30b5"}
after processing to stdout we get:
[1516385197.832135298, {"log"=>"JP characters \u6628\u591c\u306e\u30b3\u30f3\u30b5"}]
we expected the Japanese characters 昨夜のコンサ
so a "decoding" feature at the Parser level is required, both for special characters like \n and for encoded UTF-8.
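For reference, a spec-compliant JSON decoder already handles the \uXXXX case; a quick Python check (illustrative only):

```python
import json

# The raw log line on disk is pure ASCII with \uXXXX escapes.
line = '{"log":"JP characters \\u6628\\u591c\\u306e\\u30b3\\u30f3\\u30b5"}'

# A conforming JSON decoder converts the escapes to real characters.
record = json.loads(line)
print(record["log"])  # JP characters 昨夜のコンサ
```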
hi @edsiper thanks for the investigation!
in the meanwhile,
do you think it is possible to use a different Docker logging driver and a different k8s DaemonSet input configuration to avoid the decoding issue?
do you have any suggestions on how to proceed that way?
thanks in advance,
@peelandsee
the problem is not the driver. Docker is agnostic about the content each application generates through stdout/stderr, so it treats any output coming from an application as a "string". When it's time to generate the JSON record in the log, any string needs to be formatted and escaped as required.
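A rough model of what the json-file log driver does with each stdout line (a simplified sketch; the field layout follows the Docker log records shown earlier in this thread, and the timestamp is illustrative):

```python
import json

def docker_json_file_line(stdout_line: str) -> str:
    """Simplified model of Docker's json-file driver: the application's
    output is an opaque string, so it is escaped when embedded in JSON."""
    return json.dumps({
        "log": stdout_line + "\n",
        "stream": "stdout",
        "time": "2018-01-19T09:11:37.063525473Z",  # illustrative timestamp
    })

# Even when the application itself printed JSON, the driver stores it
# as an escaped string value under the "log" key.
print(docker_json_file_line('{"message":"hello"}'))
```

So switching drivers would not help: any driver that wraps stdout into JSON must escape the string, and the agent tailing the file has to undo that escaping.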
@edsiper tried the commits above and they conflict with the kubernetes filter when merge_json_log is on and the input parser is docker. (This is the default setup in the official helm chart)
@sagikazarmark thanks, I will check that.
All, please test the latest 0.13-dev image which comes with several fixes:
https://github.com/fluent/fluent-bit-kubernetes-logging/tree/0.13-dev
all, any feedback on ^
@edsiper I will test it today
I'll check today also .
@edsiper, tested the 0.13-dev but the issue still remains.
{
"_index": "dev-logs-2018.02.28",
"_type": "flb_type",
"_id": "AWHcJeMzn3gzkgkFV7g1",
"_version": 1,
"_score": null,
"_source": {
"_@timestamp": "2018-02-28T11:22:05.321Z",
"@timestamp": "2018-02-28T11:22:05.321+00:00",
"@version": 1,
"message": "\\nPre-actions completed in 79ms\\n",
"logger_name": "loading.services.LoadingProcessManager.[DATAPUMP.LoadingProcessManager]",
"thread_name": "[CBPM.FR(667441)]@CBPM.FlowManager.JobManager.Wrk.1",
"level": "INFO",
"level_value": 20000,
"user": "xxx",
"code": "JAF_G1000",
"instance": "product",
"layer": "converter"
},
"fields": {
"@timestamp": [
1519816925321
]
},
"sort": [
1519816925321
]
}
As you can see in the message field, there is still an extra "\" being added to the "\n".
Thanks for your support on this case
@hdiass what does the original log line from Docker look like?
I've corrected my last comment; maybe it was misleading.
I've tested the new release but the issue unfortunately remains the same. Have you implemented that Decode JSON Field in this version (perhaps I have to configure it in my fluent-bit.conf) to disable the parsing inside the JSON field (message) itself?
(Tested on v0.12.15 and 0.13-dev)
Source log:
{"@timestamp":"2018-03-06T11:30:06.190+00:00","@version":1,"message":"Terminated partition 0 for process named lp_203_failed_events with execution id 204,295 and flow_id e20889a2-50ea-4a14-8b41-c2c1eab9ee24 \nPartition expression: N/A \nEvents: 0 \nStats: ReadTime=27 ms,WriteTime=0 ms,FlowTime=0 ms,TotalTime=129 ms\nEventsPerHour: 0\nStatus: SUCCESS","logger_name":"loading.services.LoadingProcessManager.[LoadingProcessManager]","thread_name":"LP(204295)-P(0)-E(0)","level":"INFO","level_value":20000,"user":"xxx","code":"yyy","instance":"product","layer":"converter"}
ElasticSearch log:
{
"_index": "dev-logs-2018.03.06",
"_type": "flb_type",
"_id": "AWH7Fwufn3gzkgkFq-Tr",
"_score": 1,
"_source": {
"_@timestamp": "2018-03-06T11:34:06.251Z",
"@timestamp": "2018-03-06T11:34:06.251+00:00",
"@version": 1,
"message": "Terminated partition 0 for process named lp_203_failed_events with execution id 204,307 and flow_id e20889a2-50ea-4a14-8b41-c2c1eab9ee24 \\nPartition expression: N/A \\nEvents: 0 \\nStats: ReadTime=33 ms,WriteTime=0 ms,FlowTime=0 ms,TotalTime=137 ms\\nEventsPerHour: 0\\nStatus: SUCCESS",
"logger_name": "loading.services.LoadingProcessManager.[LoadingProcessManager]",
"thread_name": "LP(204307)-P(0)-E(0)",
"level": "INFO",
"level_value": 20000,
"user": "xxx",
"code": "yyyy",
"instance": "product",
"layer": "converter"
},
"fields": {
"@timestamp": [
1520336046251
]
}
}
Hope this is useful to help solve the issue 👍
After an hour of conversation on the fluent-bit Slack, I updated my parsers.conf and fluent-bit.conf (in the new 0.13-dev release) and it solved the issue. No more \\n in valid JSON fields.
Thanks a lot @edsiper for your support!
@hdiass you are welcome!
for the record, the issue was generated by the JSON escaping behavior described in my earlier comment.
As a workaround for this kind of situation, Fluent Bit 0.13-dev introduced Parser Decoders, where through a simple configuration the parser can 'unescape' any special character, so the output plugin will only escape it once as required.
The example parser configuration provided to @hdiass was this:
[PARSER]
Name json
Format json
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%L%z
Time_Keep On
Decode_Field_As escaped message
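In effect, the `escaped` decoder turns those two-character sequences back into single bytes before the record reaches the output plugin. A simplified Python model of that step (not Fluent Bit's actual implementation; the real decoder also covers UTF-8 cases that `unicode_escape` does not):

```python
import codecs

def decode_escaped(value: str) -> str:
    """Rough model of the 'escaped' parser decoder: convert sequences
    like backslash + 'n' back into their single-byte characters so the
    output plugin escapes them only once."""
    return codecs.decode(value, "unicode_escape")

doubled = "\\nPre-actions completed in 79ms\\n"  # backslash + 'n', not newlines
print(repr(decode_escaped(doubled)))
# '\nPre-actions completed in 79ms\n'  -- real newlines again
```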
For anyone using Kubernetes, I suggest you try our 0.13-dev image and YAML files:
https://github.com/fluent/fluent-bit-kubernetes-logging/tree/0.13-dev
Closing this issue as fixed.