Describe the bug
We see fluent-bit crashing when we feed it a large JSON object with many top-level keys. We believe the problem is here: https://github.com/fluent/fluent-bit/blob/master/plugins/filter_record_modifier/filter_modifier.c#L264.
CRASH:
[2020/09/24 19:20:03] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
[2020/09/24 19:20:03] [ info] [sp] stream processor started
[engine] caught signal (SIGSEGV)
#0 0x55ef62185b48 in cb_modifier_filter() at plugins/filter_record_modifier/filter_modifier.c:306
#1 0xffffffffffffffff in ???() at ???:0
The SIGSEGV happens when we run fluent-bit in our Kubernetes cluster, but we are having trouble reproducing it locally with the fluent-bit container: when run locally, fluent-bit skips the problematic log lines instead of crashing.
Localhost output with debugging enabled (the large JSON logs are not read):
tseng@jjtseng-macOS:fluent-bit>./run-local.sh
Fluent Bit v1.5.3
* Copyright (C) 2019-2020 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io
[2020/09/25 18:08:30] [ info] Configuration:
[2020/09/25 18:08:30] [ info] flush time | 1.000000 seconds
[2020/09/25 18:08:30] [ info] grace | 5 seconds
[2020/09/25 18:08:30] [ info] daemon | 0
[2020/09/25 18:08:30] [ info] ___________
[2020/09/25 18:08:30] [ info] inputs:
[2020/09/25 18:08:30] [ info] tail
[2020/09/25 18:08:30] [ info] ___________
[2020/09/25 18:08:30] [ info] filters:
[2020/09/25 18:08:30] [ info] grep.0
[2020/09/25 18:08:30] [ info] lua.1
[2020/09/25 18:08:30] [ info] modify.2
[2020/09/25 18:08:30] [ info] modify.3
[2020/09/25 18:08:30] [ info] parser.4
[2020/09/25 18:08:30] [ info] modify.5
[2020/09/25 18:08:30] [ info] nest.6
[2020/09/25 18:08:30] [ info] record_modifier.7
[2020/09/25 18:08:30] [ info] ___________
[2020/09/25 18:08:30] [ info] outputs:
[2020/09/25 18:08:30] [ info] stdout.0
[2020/09/25 18:08:30] [ info] ___________
[2020/09/25 18:08:30] [ info] collectors:
[2020/09/25 18:08:30] [ info] [engine] started (pid=1)
[2020/09/25 18:08:30] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2020/09/25 18:08:30] [debug] [storage] [cio stream] new stream registered: tail.0
[2020/09/25 18:08:30] [ info] [storage] version=1.0.5, initializing...
[2020/09/25 18:08:30] [ info] [storage] in-memory
[2020/09/25 18:08:30] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2020/09/25 18:08:30] [debug] [input:tail:tail.0] inotify watch fd=20
[2020/09/25 18:08:30] [debug] [input:tail:tail.0] scanning path /logging-volume/*.log
[2020/09/25 18:08:30] [debug] [input:tail:tail.0] inode=12961276487 appended as /logging-volume/single-largejson.log
[2020/09/25 18:08:30] [debug] [input:tail:tail.0] scan_glob add(): /logging-volume/single-largejson.log, inode 12961276487
[2020/09/25 18:08:30] [debug] [input:tail:tail.0] 1 new files found on path '/logging-volume/*.log'
[2020/09/25 18:08:30] [debug] [filter:modify:modify.2] Initialized modify filter with 0 conditions and 1 rules
[2020/09/25 18:08:30] [debug] [filter:modify:modify.3] Initialized modify filter with 0 conditions and 1 rules
[2020/09/25 18:08:30] [debug] [filter:modify:modify.5] Initialized modify filter with 0 conditions and 9 rules
[2020/09/25 18:08:30] [debug] [router] match rule tail.0:stdout.0
[2020/09/25 18:08:30] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
[2020/09/25 18:08:30] [ info] [sp] stream processor started
[2020/09/25 18:08:30] [debug] [filter:modify:modify.2] Input map size 2 elements, output map size 2 elements
[2020/09/25 18:08:30] [debug] [filter:modify:modify.2] Input map size 2 elements, output map size 2 elements
[2020/09/25 18:08:30] [debug] [filter:modify:modify.2] Input map size 2 elements, output map size 2 elements
[2020/09/25 18:08:30] [debug] [filter:modify:modify.2] Input map size 2 elements, output map size 2 elements
[2020/09/25 18:08:30] [debug] [filter:modify:modify.5] Input map size 2 elements, output map size 11 elements
[2020/09/25 18:08:30] [debug] [filter:modify:modify.5] Input map size 2 elements, output map size 11 elements
[2020/09/25 18:08:30] [debug] [filter:modify:modify.5] Input map size 2 elements, output map size 11 elements
[2020/09/25 18:08:30] [debug] [filter:modify:modify.5] Input map size 2 elements, output map size 11 elements
[2020/09/25 18:08:30] [debug] [filter:nest:nest.6] outer map size is 11, will be 5, nested map size will be 7
[2020/09/25 18:08:30] [debug] [filter:nest:nest.6] outer map size is 11, will be 5, nested map size will be 7
[2020/09/25 18:08:30] [debug] [filter:nest:nest.6] outer map size is 11, will be 5, nested map size will be 7
[2020/09/25 18:08:30] [debug] [filter:nest:nest.6] outer map size is 11, will be 5, nested map size will be 7
[2020/09/25 18:08:30] [debug] [filter:modify:modify.2] Input map size 2 elements, output map size 2 elements
[2020/09/25 18:08:30] [debug] [filter:modify:modify.5] Input map size 2 elements, output map size 11 elements
[2020/09/25 18:08:30] [debug] [filter:nest:nest.6] outer map size is 11, will be 5, nested map size will be 7
[2020/09/25 18:08:30] [debug] [input:tail:tail.0] inode=12961276487 file=/logging-volume/single-largejson.log promote to TAIL_EVENT
[2020/09/25 18:08:30] [ info] inotify_fs_add(): inode=12961276487 watch_fd=1 name=/logging-volume/single-largejson.log
[2020/09/25 18:08:30] [debug] [task] created task=0x7fb975e302a0 id=0 OK
[2020/09/25 18:08:30] [debug] [input:tail:tail.0] scanning path /logging-volume/*.log
[0] in_tail: [1601057310.114859819, {"event"=>"Start SigSegV Repro Case on localhost...", "index"=>"ethos-sandbox-ue1", "host"=>"tseng-local", "sourcetype"=>"ethos-sigsegv", "fields"=>{"file_path"=>"/logging-volume/single-largejson.log", "namespace"=>"ns-personal-tseng", "node_ip"=>"0.0.0.0", "node_name"=>"localhost", "pod_ip"=>"0.0.0.0", "pod_name"=>"ethos-tseng-test", "pod_uid"=>"ethos-sigsegv-localtest"}}]
[1] in_tail: [1601057310.115355491, {"event"=>"test log between large logs...", "index"=>"ethos-sandbox-ue1", "host"=>"tseng-local", "sourcetype"=>"ethos-sigsegv", "fields"=>{"file_path"=>"/logging-volume/single-largejson.log", "namespace"=>"ns-personal-tseng", "node_ip"=>"0.0.0.0", "node_name"=>"localhost", "pod_ip"=>"0.0.0.0", "pod_name"=>"ethos-tseng-test", "pod_uid"=>"ethos-sigsegv-localtest"}}]
[2] in_tail: [1601057310.116012573, {"event"=>"test json log between large logs...", "index"=>"ethos-sandbox-ue1", "host"=>"tseng-local", "sourcetype"=>"ethos-sigsegv", "fields"=>{"file_path"=>"/logging-volume/single-largejson.log", "namespace"=>"ns-personal-tseng", "node_ip"=>"0.0.0.0", "node_name"=>"localhost", "pod_ip"=>"0.0.0.0", "pod_name"=>"ethos-tseng-test", "pod_uid"=>"ethos-sigsegv-localtest"}}]
[3] in_tail: [1601057310.116373777, {"event"=>"test log between large logs...", "index"=>"ethos-sandbox-ue1", "host"=>"tseng-local", "sourcetype"=>"ethos-sigsegv", "fields"=>{"file_path"=>"/logging-volume/single-largejson.log", "namespace"=>"ns-personal-tseng", "node_ip"=>"0.0.0.0", "node_name"=>"localhost", "pod_ip"=>"0.0.0.0", "pod_name"=>"ethos-tseng-test", "pod_uid"=>"ethos-sigsegv-localtest"}}]
[4] in_tail: [1601057310.121352195, {"event"=>"Stop SigSegv localhost testing...", "index"=>"ethos-sandbox-ue1", "host"=>"tseng-local", "sourcetype"=>"ethos-sigsegv", "fields"=>{"file_path"=>"/logging-volume/single-largejson.log", "namespace"=>"ns-personal-tseng", "node_ip"=>"0.0.0.0", "node_name"=>"localhost", "pod_ip"=>"0.0.0.0", "pod_name"=>"ethos-tseng-test", "pod_uid"=>"ethos-sigsegv-localtest"}}]
[2020/09/25 18:08:30] [debug] [input:tail:tail.0] scan_blog add(): dismissed: /logging-volume/single-largejson.log, inode 12961276487
[2020/09/25 18:08:30] [debug] [input:tail:tail.0] 0 new files found on path '/logging-volume/*.log'
To Reproduce
Feed in a large JSON object with more than 128 top-level keys. Example log line:
log.zip
{"key1":"value1","key2":"value2",..."key129":"value129"}
kubectl -n <namespace> apply -f ./fluent-bit-crashloop-largejson.yaml
k8s-manifest.zip
Localhost case (JSON logs skipped):
Run fluent-bit locally with a stdout output added right after the tail input and observe that the large JSON log lines are skipped.
log.zip
local-conf.zip
docker run --memory=512m -v /test-conf/:/fluent-bit/etc/ -v /log/:/logging-volume/ -v /db/:/var/fluent-bit/ fluent/fluent-bit:1.5.3-debug
Expected behavior
No SIGSEGV crash.
The localhost reproduction should not skip the JSON logs.
Your Environment
Version used: 1.4.6 & 1.5.3
Configuration:
Environment name and version: Kubernetes 1.17 (for the crashing behavior)
Server type and version: AWS and Azure
Operating System and version: Flatcar Linux
I am actively troubleshooting this.