Reported in #429; we will track this problem here:
[2018/04/07 01:44:14] [ info] [out_http] HTTP STATUS=200
[2018/04/07 01:44:15] [ info] [out_http] HTTP STATUS=200
fluent-bit: /tmp/src/lib/monkey/deps/flb_libco/amd64.c:121: crash: Assertion `0' failed.
Fluent-Bit v0.13.0
Copyright (C) Treasure Data
[engine] caught signal
#0 0x7f6defdfa529 in ???() at ???:0
#1 0x7f6defdf1e66 in ???() at ???:0
#2 0x7f6defdf1f11 in ???() at ???:0
#3 0x55994d80c389 in ???() at ???:0
#4 0xffffffffffffffff in ???() at ???:0
This problem is associated with incorrect use of the co-routines implementation; most likely an unexpected, explicit return from a co-routine triggers this crash.
cc: @StevenACoffman @jgsqware @onorua
Are you facing this crash only when the HTTP server is enabled?
I did not try disabling the HTTP server, to be honest, as we use it for metrics exposure.
@edsiper We had not previously noticed excessive restarts when the liveness and readiness probes were omitted. I will disable the HTTP Server, remove the Prometheus annotations and the liveness and readiness probes, start over, and let it run overnight.
Thanks. I was able to track down the issue in 0.13-dev; indeed, the problem is in the HTTP Server. A fix will be available shortly.
note: If you see a restart when the HTTP server is off, that's a separate problem.
Yes, we have the HTTP Server set up, and we use the dev version for this purpose.
I've fixed the issues associated with the HTTP Server that generated the crash. Please, everyone, upgrade to the following Docker image:
@edsiper FYI: after 24 hours running fluent/fluent-bit-0.13-dev:0.15 without the HTTP server, I had only 4 restarts on a single pod and one on another. The other 20 pods had no issues. This is not the same issue as what you fixed, but if you are curious, all 5 restarts were this error:
[2018/04/10 03:27:50] [ info] [out_http] HTTP STATUS=200
[engine] caught signal
Fluent-Bit v0.13.0
Copyright (C) Treasure Data
#0 0x7ffb141f2e63 in ???() at ???:0
#1 0x55de3bed7aec in ???() at ???:0
#2 0x55de3c0e7415 in msgpack_pack_ext_body() at lib/msgpack-2.1.3/include/msgpack/pack_template.h:890
#3 0x55de3c0e7415 in msgpack_pack_object() at lib/msgpack-2.1.3/src/objectc.c:72
#4 0x55de3bed84ce in ???() at ???:0
#5 0x55de3bed8d1c in ???() at ???:0
#6 0x55de3be7b3da in ???() at ???:0
#7 0x55de3be78e2c in ???() at ???:0
#8 0x55de3be7a98c in ???() at ???:0
#9 0x55de3be9e99d in ???() at ???:0
#10 0x55de3be9f7e8 in ???() at ???:0
#11 0x55de3be9d34c in ???() at ???:0
#12 0x55de3bea3721 in ???() at ???:0
#13 0x55de3be7aecb in ???() at ???:0
#14 0x55de3be81a86 in ???() at ???:0
#15 0x55de3be21ef3 in ???() at ???:0
#16 0x7ffb140ea2e0 in ???() at ???:0
#17 0x55de3be20449 in ???() at ???:0
#18 0xffffffffffffffff in ???() at ???:0
I have just applied the fluent/fluent-bit-0.13-dev:0.16 changes and will monitor it and let you know.
@edsiper fluent/fluent-bit-0.13-dev:0.16 with HTTP_Server On is in CrashLoopBackOff for all applied containers:
[2018/04/10 20:57:33] [ info] [out_http] HTTP STATUS=200
[2018/04/10 20:57:33] [ info] [out_http] HTTP STATUS=200
[engine] caught signal
Fluent-Bit v0.13.0
Copyright (C) Treasure Data
#0 0x55a212d8e727 in __mk_list_del() at lib/monkey/include/monkey/mk_core/mk_list.h:87
#1 0x55a212d8e75e in mk_list_del() at lib/monkey/include/monkey/mk_core/mk_list.h:93
#2 0x55a212d8eb64 in mk_http_thread_purge() at lib/monkey/mk_server/mk_http_thread.c:197
#3 0x55a212d8e8e6 in thread_cb_init_vars() at lib/monkey/mk_server/mk_http_thread.c:104
#4 0x55a212d995e6 in co_init() at lib/monkey/deps/flb_libco/amd64.c:117
#5 0xffffffffffffffff in ???() at ???:0
Some error out immediately, some run for a minute, but then they all crash with the same error.
Thanks for the info. Troubleshooting.
I hesitated to start a new issue, but I faced some crashes as well.
Version: fluent/fluent-bit-0.13-dev:0.16
Kubernetes: 1.8.4
[2018/04/16 06:04:21] [debug] [filter_kube] API Server (ns=online-xxx, pod=xxxxxx-bff-848967cd86-mlct5) http_do=0, HTTP Status: 200
[engine] caught signal
Fluent-Bit v0.13.0
Copyright (C) Treasure Data
#0 0x7f59d36f0f08 in ???() at ???:0
There are several hundred pods running, but it was caused by only one pod, an nginx pod. I changed the log format to JSON, and then fluent-bit started to crash on that particular node.
I figured out that the JSON was not valid. The JSON parser of fluent-bit crashes without giving any clue; the C implementation of the JSON parser is probably a limited one.
I attached a file with the broken log and one file with the fixed one. This way it might be reproducible.
@marckamerbeek what you have reported looks like a different issue, I've filed #567 for your case, let's follow up there.
We are currently experiencing the same since we upgraded to 0.16.
Would you please try the following image, which has several fixes on the HTTP server side?
edsiper/fluent-bit-0.13-next:0.17
note: this image is 150 MB and should only be used to try to reproduce the main problem in question
Currently, no restart with your version.
10 minutes without a restart; I'll let it run all night long and keep you posted.
thanks @jgsqware
@edsiper I have applied the change and am going to leave it running overnight as well. Here are results from one hour:
$ kubectl get pods -n kangaroo -l k8s-app=fluent-bit-logging
NAME READY STATUS RESTARTS AGE
fluent-bit-24c7c 3/3 Running 4 1h
fluent-bit-2f4ph 3/3 Running 0 1h
fluent-bit-884lq 3/3 Running 0 1h
fluent-bit-ccbp8 3/3 Running 1 1h
fluent-bit-fmq4k 2/3 Running 0 1h
fluent-bit-g7mjq 3/3 Running 10 1h
fluent-bit-mk5cf 3/3 Running 0 1h
fluent-bit-mtqhc 3/3 Running 0 1h
fluent-bit-pd629 3/3 Running 0 1h
fluent-bit-pfgsv 3/3 Running 2 1h
fluent-bit-s2phj 2/3 Running 0 1h
fluent-bit-tsbv5 3/3 Running 0 1h
The results of kubectl logs -n kangaroo fluent-bit-g7mjq -c fluent-bit --previous:
[2018/04/16 23:05:02] [ info] [out_http] HTTP STATUS=200
[2018/04/16 23:05:02] [ info] [out_http] HTTP STATUS=200
[2018/04/16 23:05:02] [ info] [out_http] HTTP STATUS=200
[engine] caught signal
[2018/04/16 23:05:02] [ info] [out_http] HTTP STATUS=200
[2018/04/16 23:05:02] [ info] [out_http] HTTP STATUS=200
[2018/04/16 23:05:02] [ info] [out_http] HTTP STATUS=200
All the pods with restarts exhibit the same behavior. Sometimes [engine] caught signal is the last log message, but generally it's a few before that.
If it is helpful, here are the startup logs:
[2018/04/16 23:14:56] [ info] [engine] started
[2018/04/16 23:14:56] [ info] [filter_kube] https=1 host=kubernetes.default.svc.cluster.local port=443
[2018/04/16 23:14:56] [ info] [filter_kube] local POD info OK
[2018/04/16 23:14:56] [ info] [filter_kube] testing connectivity with API server...
[2018/04/16 23:14:56] [ info] [filter_kube] API server connectivity OK
[2018/04/16 23:14:56] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
[2018/04/16 23:14:58] [ info] [out_http] HTTP STATUS=200
With log level set to debug:
[2018/04/16 23:20:56] [debug] [filter_kube] could not merge JSON log as requested
[2018/04/16 23:20:56] [debug] [filter_kube] could not merge JSON log as requested
[2018/04/16 23:20:56] [debug] [filter_kube] could not merge JSON log as requested
[2018/04/16 23:20:56] [debug] [filter_kube] could not merge JSON log as requested
[2018/04/16 23:20:56] [debug] [filter_kube] could not merge JSON log as requested
[2018/04/16 23:20:56] [debug] [filter_kube] could not merge JSON log as requested
[2018/04/16 23:20:56] [debug] [input tail.0] [mem buf] size = 5212894
[2018/04/16 23:20:56] [debug] [input] tail.0 paused (mem buf overlimit)
[2018/04/16 23:20:56] [debug] [in_tail] file=/var/log/containers/fluent-bit-g4nst_kangaroo_fluent-bit-5327b6357ef3287060d07e9ff6523f6ec9c592b5cd7d085617605cdd197bc631.log read=32753 lines=214
[2018/04/16 23:20:56] [ info] [out_http] HTTP STATUS=200
[2018/04/16 23:20:56] [debug] [task] created task=0x563255dcbd10 id=2 OK
[2018/04/16 23:20:57] [ info] [out_http] HTTP STATUS=200
In my configuration, I have been omitting these lines from the docker parser config:
# Command      | Decoder | Field | Optional Action
# =============|=========|=======|=================
Decode_Field_As escaped log
After adding those lines to the configmap, deleting the daemonset and re-applying with Log_Level debug, one of the restarting pods gives me these results from kubectl logs -n kangaroo fluent-bit-xprqc -c fluent-bit --previous:
494b78.log event
[2018/04/16 23:27:32] [debug] [in_tail] file=/var/log/containers/fluent-bit-xprqc_kangaroo_fluent-bit-caaa8b76c69cc60a0b427af79b0b2f33fbbfe116d41a275b5e7dafbf6c494b78.log event
[2018/04/16 23:27:32] [debug] [in_tail] file=/var/log/containers/fluent-bit-xprqc_kangaroo_fluent-bit-caaa8b76c69cc60a0b427af79b0b2f33fbbfe116d41a275b5e7dafbf6c494b78.log event
[2018/04/16 23:27:32] [debug] [in_tail] file=/var/log/containers/fluent-bit-xprqc_kangaroo_fluent-bit-caaa8b76c69cc60a0b427af79b0b2f33fbbfe116d41a275b5e7dafbf6c494b78.log event
[2018/04/16 23:27:32] [debug] [in_tail] file=/var/log/containers/fluent-bit-xprqc_kangaroo_fluent-bit-caaa8b76c69cc60a0b427af79b0b2f33fbbfe116d41a275b5e7dafbf6c494b78.log event
[2018/04/16 23:27:32] [debug] [in_tail] file=/var/log/containers/fluent-bit-xprqc_kangaroo_fluent-bit-caaa8b76c69cc60a0b427af79b0b2f33fbbfe116d41a275b5e7dafbf6c494b78.log event
[2018/04/16 23:27:32] [debug] [in_tail] file=/var/log/containers/fluent-bit-xprqc_kangaroo_fluent-bit-caaa8b76c69cc60a0b427af79b0b2f33fbbfe116d41a275b5e7dafbf6c494b78.log event
[2018/04/16 23:27:32] [debug] [in_tail] file=/var/log/containers/fluent-bit-xprqc_kangaroo_fluent-bit-caaa8b76c69cc60a0b427af79b0b2f33fbbfe116d41a275b5e7dafbf6c494b78.log event
[2018/04/16 23:27:32] [debug] [in_tail] file=/var/log/containers/fluent-bit-xprqc_kangaroo_fluent-bit-caaa8b76c69cc60a0b427af79b0b2f33fbbfe116d41a275b5e7dafbf6c494b78.log event
[2018/04/16 23:27:32] [debug] [in_tail] file=/var/log/containers/fluent-bit-xprqc_kangaroo_fluent-bit-caaa8b76c69cc60a0b427af79b0b2f33fbbfe116d41a275b5e7dafbf6c494b78.log event
[2018/04/16 23:27:32] [debug] [task] created task=0x562bdc6b5020 id=0 OK
[2018/04/16 23:27:32] [debug] [task] created task=0x562bdc6b4c30 id=1 OK
[2018/04/16 23:27:32] [debug] [task] created task=0x562bdc6c4b50 id=2 OK
[2018/04/16 23:27:32] [debug] [task] created task=0x562bdc6b69f0 id=3 OK
[2018/04/16 23:27:33] [ info] [out_http] HTTP STATUS=200
[2018/04/16 23:27:33] [debug] [in_tail] file=/var/log/containers/fluent-bit-xprqc_kangaroo_hermes-20c3961e13e5a92f2b2652a1cdce0c73ecba06a62286dc3a1f876d755c93cd52.log event
[2018/04/16 23:27:33] [debug] [input tail.0] [mem buf] size = 12404
[2018/04/16 23:27:33] [debug] [in_tail] file=/var/log/containers/fluent-bit-xprqc_kangaroo_hermes-20c3961e13e5a92f2b2652a1cdce0c73ecba06a62286dc3a1f876d755c93cd52.log read=206 lines=1
[2018/04/16 23:27:33] [debug] [in_tail] file=/var/log/containers/fluent-bit-xprqc_kangaroo_hermes-20c3961e13e5a92f2b2652a1cdce0c73ecba06a62286dc3a1f876d755c93cd52.log event
[2018/04/16 23:27:33] [debug] [input tail.0] [mem buf] size = 14774
[2018/04/16 23:27:33] [debug] [in_tail] file=/var/log/containers/fluent-bit-xprqc_kangaroo_hermes-20c3961e13e5a92f2b2652a1cdce0c73ecba06a62286dc3a1f876d755c93cd52.log read=414 lines=2
[engine] caught signal
Running overnight with edsiper/fluent-bit-0.13-next:0.17, logs set to Log_Level info, and omitting Decode_Field_As escaped log, when I run kubectl get pods -n kangaroo -l k8s-app=fluent-bit-logging I get this:
NAME READY STATUS RESTARTS AGE
fluent-bit-4qwsp 3/3 Running 61 14h
fluent-bit-8nhz4 2/3 Running 0 14h
fluent-bit-bnbgp 3/3 Running 61 14h
fluent-bit-gfd26 3/3 Running 0 14h
fluent-bit-jr74g 3/3 Running 14 14h
fluent-bit-mq66l 3/3 Running 1 14h
fluent-bit-mwmqp 2/3 CrashLoopBackOff 154 14h
fluent-bit-pqqqb 3/3 Running 8 14h
fluent-bit-pr429 3/3 Running 1 14h
fluent-bit-qgjp5 3/3 Running 106 14h
fluent-bit-vrzl4 3/3 Running 1 14h
fluent-bit-z7xbm 3/3 Running 7 14h
For the one in CrashLoopBackOff, I fetch the previous container's logs with kubectl logs --previous and get this:
[2018/04/17 13:47:32] [ info] [out_http] HTTP STATUS=200
[2018/04/17 13:47:32] [ info] [out_http] HTTP STATUS=200
[2018/04/17 13:47:32] [ info] [out_http] HTTP STATUS=200
[2018/04/17 13:47:32] [ info] [out_http] HTTP STATUS=200
[engine] caught signal
[2018/04/17 13:47:32] [ info] [out_http] HTTP STATUS=200
[2018/04/17 13:47:33] [ info] [out_http] HTTP STATUS=200
*** stack smashing detected ***: <unknown> terminated
[engine] caught signal
This is for Kubernetes 1.8.4:
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.6", GitCommit:"9f8ebd171479bec0ada837d7ee641dec2f8c6dd1", GitTreeState:"clean", BuildDate:"2018-03-21T15:21:50Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.4", GitCommit:"9befc2b8928a9426501d3bf62f72849d5cbcd5a3", GitTreeState:"clean", BuildDate:"2017-11-20T05:17:43Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
thanks @StevenACoffman
We definitely need more information associated with the crashes. I've pushed a new version which improves that area; would you please give it a try?
edsiper/fluent-bit-0.13-next:0.17-2
note: it also upgrades librdkafka to v0.11.4
Running with edsiper/fluent-bit-0.13-next:0.17-2, logs set to Log_Level debug, and omitting Decode_Field_As escaped log, when I run kubectl get pods -n kangaroo -l k8s-app=fluent-bit-logging I got this:
kubectl get pods -n kangaroo -l k8s-app=fluent-bit-logging
NAME READY STATUS RESTARTS AGE
fluent-bit-4b64n 2/3 OOMKilled 4 2m
fluent-bit-5wnrr 2/3 CrashLoopBackOff 4 2m
fluent-bit-7d77t 2/3 OOMKilled 4 2m
fluent-bit-8fcxs 2/3 CrashLoopBackOff 4 2m
fluent-bit-95gdz 2/3 CrashLoopBackOff 4 2m
fluent-bit-fwm57 2/3 OOMKilled 4 2m
fluent-bit-jl2hh 2/3 CrashLoopBackOff 4 2m
fluent-bit-m8gnz 2/3 CrashLoopBackOff 4 2m
fluent-bit-sqv2s 2/3 OOMKilled 4 2m
fluent-bit-vxtpn 2/3 CrashLoopBackOff 4 2m
fluent-bit-x284m 2/3 CrashLoopBackOff 4 2m
fluent-bit-xbmg7 2/3 CrashLoopBackOff 4 2m
I am adjusting the limits (chosen completely arbitrarily) to:
resources:
  requests:
    cpu: 5m
    memory: 50Mi
  limits:
    cpu: 50m
    memory: 260Mi
@StevenACoffman are you using the Kafka output?
No, just HTTP.
Hmm, please try disabling "debug" mode; it looks like Fluent Bit may be ingesting its own output and sending it back out through out_http?
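If Fluent Bit is indeed tailing its own container logs, every HTTP STATUS line it prints becomes new input, which at debug verbosity can snowball. A sketch of one way to break that loop, assuming the tail input supports Exclude_Path and the usual Kubernetes log path layout (the paths here are illustrative, not from this thread's config):

```
[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    # Illustrative: skip Fluent Bit's own container logs so its
    # output is not re-ingested as input.
    Exclude_Path      /var/log/containers/fluent-bit-*.log
```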
Running with edsiper/fluent-bit-0.13-next:0.17-2, logs set to Log_Level info, and omitting Decode_Field_As escaped log, when I run kubectl logs -n kangaroo fluent-bit-qmb6v -c fluent-bit --previous I got this:
$ kubectl logs -n kangaroo fluent-bit-qmb6v -c fluent-bit --previous
[2018/04/17 19:22:12] [ info] [engine] started
[2018/04/17 19:22:12] [ info] [filter_kube] https=1 host=kubernetes.default.svc.cluster.local port=443
[2018/04/17 19:22:12] [ info] [filter_kube] local POD info OK
[2018/04/17 19:22:12] [ info] [filter_kube] testing connectivity with API server...
[2018/04/17 19:22:13] [ info] [filter_kube] API server connectivity OK
[2018/04/17 19:22:13] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
=================================================================
==1==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fffd0c62e78 at pc 0x7ffb3c5db733 bp 0x7fffd0c61e60 sp 0x7fffd0c61608
READ of size 1645299 at 0x7fffd0c62e78 thread T0
#0 0x7ffb3c5db732 (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x79732)
#1 0x55ea7c25f98f in msgpack_sbuffer_write /home/edsiper/coding/fluent-bit/lib/msgpack-2.1.3/include/msgpack/sbuffer.h:84
#2 0x55ea7c74c7b1 in msgpack_pack_ext_body /home/edsiper/coding/fluent-bit/lib/msgpack-2.1.3/include/msgpack/pack_template.h:890
#3 0x55ea7c74c7b1 in msgpack_pack_object /home/edsiper/coding/fluent-bit/lib/msgpack-2.1.3/src/objectc.c:72
#4 0x55ea7c261826 in pack_map_content /home/edsiper/coding/fluent-bit/plugins/filter_kubernetes/kubernetes.c:322
#5 0x55ea7c262da3 in cb_kube_filter /home/edsiper/coding/fluent-bit/plugins/filter_kubernetes/kubernetes.c:491
#6 0x55ea7c1455b5 in flb_filter_do /home/edsiper/coding/fluent-bit/src/flb_filter.c:86
#7 0x55ea7c13f50b in flb_input_dbuf_write_end /home/edsiper/coding/fluent-bit/include/fluent-bit/flb_input.h:642
#8 0x55ea7c143122 in flb_input_dyntag_append_raw /home/edsiper/coding/fluent-bit/src/flb_input.c:894
#9 0x55ea7c1a3547 in process_content /home/edsiper/coding/fluent-bit/plugins/in_tail/tail_file.c:290
#10 0x55ea7c1a5c13 in flb_tail_file_chunk /home/edsiper/coding/fluent-bit/plugins/in_tail/tail_file.c:651
#11 0x55ea7c19e916 in in_tail_collect_static /home/edsiper/coding/fluent-bit/plugins/in_tail/tail.c:129
#12 0x55ea7c14437c in flb_input_collector_fd /home/edsiper/coding/fluent-bit/src/flb_input.c:995
#13 0x55ea7c155516 in flb_engine_handle_event /home/edsiper/coding/fluent-bit/src/flb_engine.c:296
#14 0x55ea7c155516 in flb_engine_start /home/edsiper/coding/fluent-bit/src/flb_engine.c:515
#15 0x55ea7c11baff in main /home/edsiper/coding/fluent-bit/src/fluent-bit.c:808
#16 0x7ffb3afd4b96 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b96)
#17 0x55ea7c118729 in _start (/fluent-bit/bin/fluent-bit+0x102729)
Address 0x7fffd0c62e78 is located in stack of thread T0 at offset 568 in frame
#0 0x55ea7c26239b in cb_kube_filter /home/edsiper/coding/fluent-bit/plugins/filter_kubernetes/kubernetes.c:396
This frame has 11 object(s):
[32, 40) 'off'
[96, 104) 'cache_buf'
[160, 168) 'cache_size'
[224, 240) 'tmp_pck'
[288, 304) 'props'
[352, 376) 'time'
[416, 440) 'map'
[480, 504) 'root'
[544, 568) 'tmp_sbuf'
[608, 640) 'result' <== Memory access at offset 568 partially underflows this variable
[672, 752) 'meta' <== Memory access at offset 568 partially underflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
(longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x79732)
Shadow bytes around the buggy address:
0x10007a184570: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10007a184580: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 f2 f2 f2
0x10007a184590: f2 f2 f2 f2 00 f2 f2 f2 f2 f2 f2 f2 00 f2 f2 f2
0x10007a1845a0: f2 f2 f2 f2 00 00 f2 f2 f2 f2 f2 f2 00 00 f2 f2
0x10007a1845b0: f2 f2 f2 f2 00 00 00 f2 f2 f2 f2 f2 00 00 00 f2
=>0x10007a1845c0: f2 f2 f2 f2 00 00 00 f2 f2 f2 f2 f2 00 00 00[f2]
0x10007a1845d0: f2 f2 f2 f2 00 00 00 00 f2 f2 f2 f2 00 00 00 00
0x10007a1845e0: 00 00 00 00 00 00 f2 f2 00 00 00 00 00 00 00 00
0x10007a1845f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10007a184600: 00 00 f1 f1 f1 f1 00 f2 f2 f2 f2 f2 f2 f2 00 f2
0x10007a184610: f2 f2 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==1==ABORTING
Another one, kubectl logs -n kangaroo fluent-bit-ncvpn -c fluent-bit --previous, gave:
[2018/04/17 19:23:03] [ info] [engine] started
[2018/04/17 19:23:03] [ info] [filter_kube] https=1 host=kubernetes.default.svc.cluster.local port=443
[2018/04/17 19:23:03] [ info] [filter_kube] local POD info OK
[2018/04/17 19:23:03] [ info] [filter_kube] testing connectivity with API server...
[2018/04/17 19:23:04] [ info] [filter_kube] API server connectivity OK
[2018/04/17 19:23:04] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
=================================================================
==1==ERROR: AddressSanitizer: unknown-crash on address 0x60400001c969 at pc 0x55908d7167b8 bp 0x7ffdb76fe530 sp 0x7ffdb76fe520
READ of size 24 at 0x60400001c969 thread T0
#0 0x55908d7167b7 in pack_map_content /home/edsiper/coding/fluent-bit/plugins/filter_kubernetes/kubernetes.c:320
#1 0x55908d717da3 in cb_kube_filter /home/edsiper/coding/fluent-bit/plugins/filter_kubernetes/kubernetes.c:491
#2 0x55908d5fa5b5 in flb_filter_do /home/edsiper/coding/fluent-bit/src/flb_filter.c:86
#3 0x55908d5f450b in flb_input_dbuf_write_end /home/edsiper/coding/fluent-bit/include/fluent-bit/flb_input.h:642
#4 0x55908d5f8122 in flb_input_dyntag_append_raw /home/edsiper/coding/fluent-bit/src/flb_input.c:894
#5 0x55908d658547 in process_content /home/edsiper/coding/fluent-bit/plugins/in_tail/tail_file.c:290
#6 0x55908d65ac13 in flb_tail_file_chunk /home/edsiper/coding/fluent-bit/plugins/in_tail/tail_file.c:651
#7 0x55908d653916 in in_tail_collect_static /home/edsiper/coding/fluent-bit/plugins/in_tail/tail.c:129
#8 0x55908d5f937c in flb_input_collector_fd /home/edsiper/coding/fluent-bit/src/flb_input.c:995
#9 0x55908d60a516 in flb_engine_handle_event /home/edsiper/coding/fluent-bit/src/flb_engine.c:296
#10 0x55908d60a516 in flb_engine_start /home/edsiper/coding/fluent-bit/src/flb_engine.c:515
#11 0x55908d5d0aff in main /home/edsiper/coding/fluent-bit/src/fluent-bit.c:808
#12 0x7f79e755eb96 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b96)
#13 0x55908d5cd729 in _start (/fluent-bit/bin/fluent-bit+0x102729)
0x60400001c980 is located 0 bytes to the right of 48-byte region [0x60400001c950,0x60400001c980)
allocated by thread T0 here:
#0 0x7f79e8bcab50 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xdeb50)
#1 0x55908d6340ad in flb_malloc /home/edsiper/coding/fluent-bit/include/fluent-bit/flb_mem.h:57
#2 0x55908d637e17 in tokens_to_msgpack /home/edsiper/coding/fluent-bit/src/flb_pack.c:159
#3 0x55908d638142 in flb_pack_json /home/edsiper/coding/fluent-bit/src/flb_pack.c:198
#4 0x55908d715893 in merge_log_handler /home/edsiper/coding/fluent-bit/plugins/filter_kubernetes/kubernetes.c:138
#5 0x55908d715ed6 in pack_map_content /home/edsiper/coding/fluent-bit/plugins/filter_kubernetes/kubernetes.c:235
#6 0x55908d717da3 in cb_kube_filter /home/edsiper/coding/fluent-bit/plugins/filter_kubernetes/kubernetes.c:491
#7 0x55908d5fa5b5 in flb_filter_do /home/edsiper/coding/fluent-bit/src/flb_filter.c:86
#8 0x55908d5f450b in flb_input_dbuf_write_end /home/edsiper/coding/fluent-bit/include/fluent-bit/flb_input.h:642
#9 0x55908d5f8122 in flb_input_dyntag_append_raw /home/edsiper/coding/fluent-bit/src/flb_input.c:894
#10 0x55908d658547 in process_content /home/edsiper/coding/fluent-bit/plugins/in_tail/tail_file.c:290
#11 0x55908d65ac13 in flb_tail_file_chunk /home/edsiper/coding/fluent-bit/plugins/in_tail/tail_file.c:651
#12 0x55908d653916 in in_tail_collect_static /home/edsiper/coding/fluent-bit/plugins/in_tail/tail.c:129
#13 0x55908d5f937c in flb_input_collector_fd /home/edsiper/coding/fluent-bit/src/flb_input.c:995
#14 0x55908d60a516 in flb_engine_handle_event /home/edsiper/coding/fluent-bit/src/flb_engine.c:296
#15 0x55908d60a516 in flb_engine_start /home/edsiper/coding/fluent-bit/src/flb_engine.c:515
#16 0x55908d5d0aff in main /home/edsiper/coding/fluent-bit/src/fluent-bit.c:808
#17 0x7f79e755eb96 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b96)
SUMMARY: AddressSanitizer: unknown-crash /home/edsiper/coding/fluent-bit/plugins/filter_kubernetes/kubernetes.c:320 in pack_map_content
Shadow bytes around the buggy address:
0x0c087fffb8d0: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa
0x0c087fffb8e0: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa
0x0c087fffb8f0: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa
0x0c087fffb900: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa
0x0c087fffb910: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa
=>0x0c087fffb920: fa fa 00 00 00 00 00 fa fa fa 00 00 00[00]00 00
0x0c087fffb930: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c087fffb940: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c087fffb950: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c087fffb960: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c087fffb970: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==1==ABORTING
[2018/04/17 19:25:35] [ info] [engine] started
[2018/04/17 19:25:36] [ info] [filter_kube] https=1 host=kubernetes.default.svc.cluster.local port=443
[2018/04/17 19:25:36] [ info] [filter_kube] local POD info OK
[2018/04/17 19:25:36] [ info] [filter_kube] testing connectivity with API server...
[2018/04/17 19:25:37] [ info] [filter_kube] API server connectivity OK
[2018/04/17 19:25:37] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
=================================================================
==1==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000010811 at pc 0x5592af840710 bp 0x7ffc8c88e540 sp 0x7ffc8c88e530
READ of size 24 at 0x602000010811 thread T0
#0 0x5592af84070f in pack_map_content /home/edsiper/coding/fluent-bit/plugins/filter_kubernetes/kubernetes.c:319
#1 0x5592af841da3 in cb_kube_filter /home/edsiper/coding/fluent-bit/plugins/filter_kubernetes/kubernetes.c:491
#2 0x5592af7245b5 in flb_filter_do /home/edsiper/coding/fluent-bit/src/flb_filter.c:86
#3 0x5592af71e50b in flb_input_dbuf_write_end /home/edsiper/coding/fluent-bit/include/fluent-bit/flb_input.h:642
#4 0x5592af722122 in flb_input_dyntag_append_raw /home/edsiper/coding/fluent-bit/src/flb_input.c:894
#5 0x5592af782547 in process_content /home/edsiper/coding/fluent-bit/plugins/in_tail/tail_file.c:290
#6 0x5592af784c13 in flb_tail_file_chunk /home/edsiper/coding/fluent-bit/plugins/in_tail/tail_file.c:651
#7 0x5592af77d916 in in_tail_collect_static /home/edsiper/coding/fluent-bit/plugins/in_tail/tail.c:129
#8 0x5592af72337c in flb_input_collector_fd /home/edsiper/coding/fluent-bit/src/flb_input.c:995
#9 0x5592af734516 in flb_engine_handle_event /home/edsiper/coding/fluent-bit/src/flb_engine.c:296
#10 0x5592af734516 in flb_engine_start /home/edsiper/coding/fluent-bit/src/flb_engine.c:515
#11 0x5592af6faaff in main /home/edsiper/coding/fluent-bit/src/fluent-bit.c:808
#12 0x7f64a89f1b96 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b96)
#13 0x5592af6f7729 in _start (/fluent-bit/bin/fluent-bit+0x102729)
0x602000010818 is located 0 bytes to the right of 8-byte region [0x602000010810,0x602000010818)
allocated by thread T0 here:
#0 0x7f64aa05db50 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xdeb50)
#1 0x5592af75e0ad in flb_malloc /home/edsiper/coding/fluent-bit/include/fluent-bit/flb_mem.h:57
#2 0x5592af761e17 in tokens_to_msgpack /home/edsiper/coding/fluent-bit/src/flb_pack.c:159
#3 0x5592af762142 in flb_pack_json /home/edsiper/coding/fluent-bit/src/flb_pack.c:198
#4 0x5592af83f893 in merge_log_handler /home/edsiper/coding/fluent-bit/plugins/filter_kubernetes/kubernetes.c:138
#5 0x5592af83fed6 in pack_map_content /home/edsiper/coding/fluent-bit/plugins/filter_kubernetes/kubernetes.c:235
#6 0x5592af841da3 in cb_kube_filter /home/edsiper/coding/fluent-bit/plugins/filter_kubernetes/kubernetes.c:491
#7 0x5592af7245b5 in flb_filter_do /home/edsiper/coding/fluent-bit/src/flb_filter.c:86
#8 0x5592af71e50b in flb_input_dbuf_write_end /home/edsiper/coding/fluent-bit/include/fluent-bit/flb_input.h:642
#9 0x5592af722122 in flb_input_dyntag_append_raw /home/edsiper/coding/fluent-bit/src/flb_input.c:894
#10 0x5592af782547 in process_content /home/edsiper/coding/fluent-bit/plugins/in_tail/tail_file.c:290
#11 0x5592af784c13 in flb_tail_file_chunk /home/edsiper/coding/fluent-bit/plugins/in_tail/tail_file.c:651
#12 0x5592af77d916 in in_tail_collect_static /home/edsiper/coding/fluent-bit/plugins/in_tail/tail.c:129
#13 0x5592af72337c in flb_input_collector_fd /home/edsiper/coding/fluent-bit/src/flb_input.c:995
#14 0x5592af734516 in flb_engine_handle_event /home/edsiper/coding/fluent-bit/src/flb_engine.c:296
#15 0x5592af734516 in flb_engine_start /home/edsiper/coding/fluent-bit/src/flb_engine.c:515
#16 0x5592af6faaff in main /home/edsiper/coding/fluent-bit/src/fluent-bit.c:808
#17 0x7f64a89f1b96 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b96)
SUMMARY: AddressSanitizer: heap-buffer-overflow /home/edsiper/coding/fluent-bit/plugins/filter_kubernetes/kubernetes.c:319 in pack_map_content
Shadow bytes around the buggy address:
0x0c047fffa0b0: fa fa fd fa fa fa fd fa fa fa fd fa fa fa fd fa
0x0c047fffa0c0: fa fa fd fa fa fa fd fa fa fa fd fa fa fa fd fa
0x0c047fffa0d0: fa fa fd fa fa fa fd fa fa fa fd fa fa fa fd fa
0x0c047fffa0e0: fa fa fd fa fa fa fd fa fa fa fd fa fa fa fd fa
0x0c047fffa0f0: fa fa fd fa fa fa fd fa fa fa fd fa fa fa fd fa
=>0x0c047fffa100: fa fa[00]fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fffa110: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fffa120: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fffa130: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fffa140: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fffa150: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==1==ABORTING
[2018/04/17 19:31:44] [ info] [out_http] HTTP STATUS=200
ASAN:DEADLYSIGNAL
=================================================================
==1==ERROR: AddressSanitizer: stack-overflow on address 0x62b00059a220 (pc 0x62b00059a220 bp 0x62b00059a080 sp 0x62b00059a060 T0)
[2018/04/17 19:31:45] [ info] [out_http] HTTP STATUS=200
==1==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7f20bdfc4880; bottom 0x62b000599000; size: 0x1c70bda2b880 (31270543472768)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
ASAN:DEADLYSIGNAL
==1==AddressSanitizer: while reporting a bug found another one. Ignoring.
#0 0x62b00059a21f (<unknown module>)
SUMMARY: AddressSanitizer: stack-overflow (<unknown module>)
==1==ABORTING
I'm just cherry-picking the ones that look different:
[2018/04/17 19:41:35] [ info] [out_http] HTTP STATUS=200
ASAN:DEADLYSIGNAL
=================================================================
==1==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000008 (pc 0x5599ee118809 bp 0x62b0008b8100 sp 0x62b0008b8090 T4)
==1==The signal is caused by a READ memory access.
==1==Hint: address points to the zero page.
ASAN:DEADLYSIGNAL
==1==AddressSanitizer: while reporting a bug found another one. Ignoring.
#0 0x5599ee118808 in flb_thread_yield /home/edsiper/coding/fluent-bit/include/fluent-bit/flb_thread_libco.h:69
#1 0x5599ee118808 in net_io_read_async /home/edsiper/coding/fluent-bit/src/flb_io.c:413
#2 0x5599ee118808 in flb_io_net_read /home/edsiper/coding/fluent-bit/src/flb_io.c:484
#3 0x5599ee20d9a6 in flb_http_do /home/edsiper/coding/fluent-bit/src/flb_http_client.c:824
#4 0x5599ee1a46c8 in cb_http_flush /home/edsiper/coding/fluent-bit/plugins/out_http/http.c:390
#5 0x5599ee0e48f5 in output_pre_cb_flush /home/edsiper/coding/fluent-bit/include/fluent-bit/flb_output.h:310
#6 0x5599ee724d86 in co_init /home/edsiper/coding/fluent-bit/lib/monkey/deps/flb_libco/amd64.c:117
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /home/edsiper/coding/fluent-bit/include/fluent-bit/flb_thread_libco.h:69 in flb_thread_yield
Thread T4 (monkey: wrk/0) created by T2 (monkey: server) here:
#0 0x7fbb6b2aad2f in __interceptor_pthread_create (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x37d2f)
#1 0x5599ee6f9610 in mk_sched_launch_thread /home/edsiper/coding/fluent-bit/lib/monkey/mk_server/mk_scheduler.c:438
#2 0x5599ee711e23 in mk_server_launch_workers /home/edsiper/coding/fluent-bit/lib/monkey/mk_server/mk_server.c:255
#3 0x5599ee71886a in mk_server_setup /home/edsiper/coding/fluent-bit/lib/monkey/mk_server/monkey.c:169
#4 0x5599ee6e5008 in mk_lib_worker /home/edsiper/coding/fluent-bit/lib/monkey/mk_server/mk_lib.c:140
#5 0x7fbb6a4d86da in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da)
Thread T2 (monkey: server) created by T0 here:
#0 0x7fbb6b2aad2f in __interceptor_pthread_create (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x37d2f)
#1 0x5599ee7215f9 in mk_utils_worker_spawn /home/edsiper/coding/fluent-bit/lib/monkey/mk_core/mk_utils.c:244
#2 0x5599ee6e5579 in mk_start /home/edsiper/coding/fluent-bit/lib/monkey/mk_server/mk_lib.c:182
#3 0x5599ee11b99c in flb_hs_start /home/edsiper/coding/fluent-bit/src/http_server/flb_hs.c:96
#4 0x5599ee0e31cc in flb_engine_start /home/edsiper/coding/fluent-bit/src/flb_engine.c:504
#5 0x5599ee0a9aff in main /home/edsiper/coding/fluent-bit/src/fluent-bit.c:808
#6 0x7fbb69ce5b96 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b96)
==1==ABORTING
@StevenACoffman thanks for the report :) , that is very helpful.
in the nodes where Fluent Bit is crashing, is there any Pod using the new Annotations feature to specify a parser?
Go-spew is available as open source and is running on one of the nodes. Most applications do not have that annotation, so the majority of nodes do not have apps with that annotation.
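(For context: the Annotations feature referred to above lets a Pod suggest one of the pre-defined parsers to the kubernetes filter through a fluentbit.io/parser annotation, when K8s-Logging.Parser is enabled in the filter. A minimal sketch of such a Pod, with hypothetical names:)

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: go-spew                  # hypothetical pod name
  annotations:
    # Asks the kubernetes filter to apply the 'json' parser to this
    # pod's log lines; the parser must exist in the parsers file.
    fluentbit.io/parser: json
spec:
  containers:
    - name: go-spew
      image: example/go-spew:latest   # hypothetical image
```

The "annotation parser 'json' not found" warnings later in this thread come from exactly this lookup failing.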
@StevenACoffman
I've not been able to make it crash. Anyway, I have added some warning messages to stdout around one of the principal crash sites reported above. Would you please try the following test image? (The only changes are some messages like "[issue_557]...".)
edsiper/fluent-bit-0.13-next:0.17-3
question: if you disable the HTTP Server, do you see the same crash?
Changes associated with the problem reported:
==1==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000010811 at pc 0x5592af840710 bp 0x7ffc8c88e540 sp 0x7ffc8c88e530
READ of size 24 at 0x602000010811 thread T0
#0 0x5592af84070f in pack_map_content /home/edsiper/coding/fluent-bit/plugins/filter_kubernetes/kubernetes.c:319
#1 0x5592af841da3 in cb_kube_filter /home/edsiper/coding/fluent-bit/plugins/filter_kubernetes/kubernetes.c:491
#2 0x5592af7245b5 in flb_filter_do /home/edsiper/coding/fluent-bit/src/flb_filter.c:86
#3 0x5592af71e50b in flb_input_dbuf_write_end /home/edsiper/coding/fluent-bit/include/fluent-bit/flb_input.h:642
#4 0x5592af722122 in flb_input_dyntag_append_raw /home/edsiper/coding/fluent-bit/src/flb_input.c:894
#5 0x5592af782547 in process_content /home/edsiper/coding/fluent-bit/plugins/in_tail/tail_file.c:290
#6 0x5592af784c13 in flb_tail_file_chunk /home/edsiper/coding/fluent-bit/plugins/in_tail/tail_file.c:651
#7 0x5592af77d916 in in_tail_collect_static /home/edsiper/coding/fluent-bit/plugins/in_tail/tail.c:129
#8 0x5592af72337c in flb_input_collector_fd /home/edsiper/coding/fluent-bit/src/flb_input.c:995
#9 0x5592af734516 in flb_engine_handle_event /home/edsiper/coding/fluent-bit/src/flb_engine.c:296
#10 0x5592af734516 in flb_engine_start /home/edsiper/coding/fluent-bit/src/flb_engine.c:515
#11 0x5592af6faaff in main /home/edsiper/coding/fluent-bit/src/fluent-bit.c:808
#12 0x7f64a89f1b96 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b96)
#13 0x5592af6f7729 in _start (/fluent-bit/bin/fluent-bit+0x102729)
__research__: the stack trace above can be a false positive, since AddressSanitizer is not aware of coroutines and stack switches.
@StevenACoffman
in order to continue troubleshooting I need your help with:
When I run edsiper/fluent-bit-0.13-next:0.17 (no -2, no -3) with HTTP Server disabled, annotations off, and the health and readiness checks removed, I have not yet had any restarts.
$ kubectl get pods -n kangaroo -l k8s-app=fluent-bit-logging
NAME READY STATUS RESTARTS AGE
fluent-bit-4xcl4 3/3 Running 0 19m
fluent-bit-5ts6g 3/3 Running 0 19m
fluent-bit-7k42q 3/3 Running 0 19m
fluent-bit-b6n6z 3/3 Running 0 19m
fluent-bit-bfxrf 3/3 Running 0 19m
fluent-bit-g5ggf 3/3 Running 0 19m
fluent-bit-g7npj 3/3 Running 0 19m
fluent-bit-gmdht 3/3 Running 0 19m
fluent-bit-h8jk9 3/3 Running 0 19m
fluent-bit-qff6j 3/3 Running 0 19m
fluent-bit-tmd49 3/3 Running 0 19m
fluent-bit-tmjjs 3/3 Running 0 19m
I have shared the source code with you, but the documentation is, embarrassingly, not well prepared for collaboration. The docker registry is a private one, and you need to specify a KAFKA_BOOTSTRAP environment variable for the go-kafka-logsink (also called hermes) to locate the kafka brokers, or it will attempt to retrieve this information from a Eureka service registry. Similarly, the go-s3-logsink (also called iris) expects access to an AWS S3 bucket (S3_BUCKET) and region (S3_REGION).
When I run edsiper/fluent-bit-0.13-next:0.17 (no -2, no -3) with HTTP Server disabled, prometheus annotations off, and the health and readiness checks removed, I have not had any restarts from fluent-bit for 12 hours.
$ kubectl get pods -n kangaroo -l k8s-app=fluent-bit-logging
NAME READY STATUS RESTARTS AGE
fluent-bit-4xcl4 3/3 Running 0 12h
fluent-bit-5ts6g 3/3 Running 0 12h
fluent-bit-7k42q 3/3 Running 0 12h
fluent-bit-b6n6z 3/3 Running 0 12h
fluent-bit-bfxrf 3/3 Running 0 12h
fluent-bit-g5ggf 3/3 Running 1 12h
fluent-bit-g7npj 3/3 Running 0 12h
fluent-bit-gmdht 3/3 Running 0 12h
fluent-bit-h8jk9 3/3 Running 0 12h
fluent-bit-qff6j 3/3 Running 0 12h
fluent-bit-tmd49 3/3 Running 0 12h
fluent-bit-tmjjs 3/3 Running 0 12h
The single restart was from my go-s3-logsink container, not fluent-bit. The termination message was a 500 error uploading to AWS S3 (probably because the session I reused expired); otherwise, all the other fluent-bit HTTP POST requests were answered with 200 status codes.
Uploading to S3 sometimes does take a comparatively long time, depending on object size, so I can imagine that fluent-bit having to wait a long time for the HTTP response might be problematic.
@StevenACoffman thanks for the feedback.
Note that I've just found that versions 0.17-2 and 0.17-3 are memory bombs, since they include address sanitizer checks.
@edsiper I can reliably reproduce the problem with the code all in this public repository:
This repository also contains the dummy golang service container referenced in the daemonset yaml, which immediately replies to every HTTP POST with 200 OK. This makes me think the problem is not the delayed responses from AWS S3 or Kafka.
I can also see you are actively working on edsiper/fluent-bit-0.13-next:0.17-5, but I have been applying the changes you make by pinning the sha256 digest, most recently edsiper/fluent-bit-0.13-next:0.17-5@sha256:b74a30e3ec7308006e6dbe00c45c159e06bad66f263c11cc62cdb0cf868547fd.
So far when I run kubectl logs -n kangaroo fluent-bit-pk78r -c fluent-bit --previous I have not seen any useful termination error messages.
@StevenACoffman thanks, I will give it a try now!
Indeed, I have been trying to reproduce it multiple times. I found a minor problem in the I/O interface that was fixed, but I don't think it is related to what you stated previously.
continue troubleshooting...
btw, are you able to crash it with the HTTP Server off?
No, it is only with the HTTP Server on (and adding the prometheus metrics and enabling health checks) that I can crash it reliably.
Just to warn you, this configuration expects the namespace to be kangaroo instead of logging. This allows us to use multiple log shipping methods in parallel so I can experiment with the latest version on real clusters with real data.
I have wondered if there's any problem caused by how in my configuration I am using the HTTP output plugin twice.
If it is at all helpful to you, I have invited you as a collaborator on the StevenACoffman/logsink github repository. Thank you so much for your persistence and effort on my use case. I'm a huge fan of your work on fluent-bit, and it's been very valuable even with the HTTP Server and metrics off.
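For reference, the two-http-output setup being discussed looks roughly like this in fluent-bit's classic configuration format. The ports match the 127.0.0.1:3000 and 127.0.0.1:4000 sinks seen in the logs above; the Match patterns and remaining keys are assumptions:

```ini
[SERVICE]
    Flush        5
    Log_Level    info
    HTTP_Server  On
    HTTP_Listen  0.0.0.0
    HTTP_Port    2020

# First sink: go-kafka-logsink ("hermes"), assumed on port 3000
[OUTPUT]
    Name    http
    Match   *
    Host    127.0.0.1
    Port    3000

# Second sink: go-s3-logsink ("iris"), assumed on port 4000
[OUTPUT]
    Name    http
    Match   *
    Host    127.0.0.1
    Port    4000
```

Each [OUTPUT] section with Match * runs its own flush co-routine against the same records, which is the "more than one http output" scenario confirmed below.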
yes, the problem is reproducible when using more than one http output :) I was able to make it crash locally with that setup (without kubernetes/docker)
Ok, thanks. Is that problem expected, and should I avoid doing that? If so, would using both the kafka and http outputs be supported?
I find it odd that it appears to work with http server off.
nope, it's a nasty bug. I am not sure if it affects other kinds of setups... but there is definitely something wrong in the I/O and event loop
@StevenACoffman
I've pushed a new version with relevant fixes for the scheduler issue that generated the corruption:
fluent/fluent-bit-0.13-dev:0.17
please let me know if you are able to reproduce the problem with all features enabled.
@edsiper Thanks. I am not able to pull it yet. Still pushing?
I see:
$ docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock ivanilves/lstags fluent/fluent-bit-0.13-dev
<STATE> <DIGEST> <(local) ID> <Created At> <IMAGE>:<TAG>
ABSENT sha256:f41081bed4870c910df045e489c067fd0 n/a 2018-04-10T19:20:27Z fluent/fluent-bit-0.13-dev:0.16
And
$ docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock ivanilves/lstags edsiper/fluent-bit-0.13-next:0.17-6
<STATE> <DIGEST> <(local) ID> <Created At> <IMAGE>:<TAG>
PRESENT sha256:0aceef3e6dd0c4072dbb890a3bc87b734 6c0b37898e5d 2018-04-20T17:06:16Z edsiper/fluent-bit-0.13-next:0.17-6
yeah, try now.
Got it! Thanks. Running with image fluent/fluent-bit-0.13-dev:0.17@sha256:e7d6f7c984ffd018adda95a2b5bbe38585f7177c5b99b1be0d89b697d9fc753a with HTTP_Server On, prometheus annotations applied, and both readiness and liveness checks on.
Running for 5 minutes with only one restart so far, which is much better.
kubectl logs -n kangaroo fluent-bit-qptk7 -c fluent-bit --previous gives:
[2018/04/20 18:02:32] [ info] [out_http] 127.0.0.1:4000, HTTP status=200
[2018/04/20 18:02:32] [ info] [out_http] 127.0.0.1:4000, HTTP status=200
[engine] caught signal
[2018/04/20 18:02:34] [ info] [out_http] 127.0.0.1:4000, HTTP status=200
[2018/04/20 18:02:35] [ info] [out_http] 127.0.0.1:4000, HTTP status=200
[2018/04/20 18:02:35] [ info] [out_http] 127.0.0.1:4000, HTTP status=200
[engine] caught signal
Fluent-Bit v0.13.0
Copyright (C) Treasure Data
#0 0x55d2d6fc9dee in ???() at ???:0
#1 0x55d2d70142c4 in ???() at ???:0
#2 0x55d2d6ff2d9e in ???() at ???:0
#3 0x55d2d6fb44aa in ???() at ???:0
#4 0x55d2d6fb40b1 in ???() at ???:0
#5 0x55d2d6f542b3 in ???() at ???:0
#6 0x7fea80d632e0 in ???() at ???:0
#7 0x55d2d6f52809 in ???() at ???:0
#8 0xffffffffffffffff in ???() at ???:0
Got two more restarts, one at nine minutes and another at 12 minutes. They look the same, so I am only mentioning it once:
[2018/04/20 18:08:13] [ info] [out_http] 127.0.0.1:4000, HTTP status=200
[2018/04/20 18:08:13] [ info] [out_http] 127.0.0.1:3000, HTTP status=200
[2018/04/20 18:08:13] [ info] [out_http] 127.0.0.1:4000, HTTP status=200
[2018/04/20 18:08:13] [ info] [out_http] 127.0.0.1:4000, HTTP status=200
[2018/04/20 18:08:13] [ info] [out_http] 127.0.0.1:3000, HTTP status=200
fluent-bit: /tmp/src/lib/monkey/deps/flb_libco/amd64.c:121: crash: Assertion `0' failed.
[engine] caught signal
Fluent-Bit v0.13.0
Copyright (C) Treasure Data
[2018/04/20 18:08:13] [ info] [out_http] 127.0.0.1:3000, HTTP status=200
[2018/04/20 18:08:13] [ info] [out_http] 127.0.0.1:3000, HTTP status=200
[2018/04/20 18:08:13] [ info] [out_http] 127.0.0.1:3000, HTTP status=200
#0 0x7f3f05393529 in ???() at ???:0
#1 0x7f3f0538ae66 in ???() at ???:0
#2 0x7f3f0538af11 in ???() at ???:0
#3 0x55bb999b8709 in crash() at lib/monkey/deps/flb_libco/amd64.c:121
That's an older version (judging by the status message format)
Hmmm... I re-did kubectl delete -f 50fluent-bit-ds-http.yaml, and waited until there were no running pods, then I did kubectl apply -f 50fluent-bit-ds-http.yaml and the pod definitions show:
Image: fluent/fluent-bit-0.13-dev:0.17@sha256:e7d6f7c984ffd018adda95a2b5bbe38585f7177c5b99b1be0d89b697d9fc753a
Image ID: docker-pullable://fluent/fluent-bit-0.13-dev@sha256:e7d6f7c984ffd018adda95a2b5bbe38585f7177c5b99b1be0d89b697d9fc753a
But after running for a minute, I got a restart, which kubectl logs -n kangaroo fluent-bit-ntnwl -c fluent-bit --previous showed:
[2018/04/20 18:24:38] [ info] [out_http] 127.0.0.1:3000, HTTP status=200
[2018/04/20 18:24:38] [ info] [out_http] 127.0.0.1:3000, HTTP status=200
[2018/04/20 18:24:39] [ info] [out_http] 127.0.0.1:4000, HTTP status=200
[2018/04/20 18:24:39] [ info] [out_http] 127.0.0.1:4000, HTTP status=200
fluent-bit: /tmp/src/lib/monkey/deps/flb_libco/amd64.c:121: crash: Assertion `0' failed.
[engine] caught signal
[2018/04/20 18:24:39] [ info] [out_http] 127.0.0.1:4000, HTTP status=200
Fluent-Bit v0.13.0
Copyright (C) Treasure Data
[2018/04/20 18:24:39] [ info] [out_http] 127.0.0.1:4000, HTTP status=200
[2018/04/20 18:24:39] [ info] [out_http] 127.0.0.1:3000, HTTP status=200
[2018/04/20 18:24:39] [ info] [out_http] 127.0.0.1:3000, HTTP status=200
[2018/04/20 18:24:39] [ info] [out_http] 127.0.0.1:3000, HTTP status=200
[2018/04/20 18:24:39] [ info] [out_http] 127.0.0.1:3000, HTTP status=200
#0 0x7f035f4ec529 in ???() at ???:0
#1 0x7f035f4e3e66 in ???() at ???:0
#2 0x7f035f4e3f11 in ???() at ???:0
#3 0x56235d9bc709 in crash() at lib/monkey/deps/flb_libco/amd64.c:121
#4 0xffffffffffffffff in ???() at ???:0
@edsiper That looks like it was generated by https://github.com/fluent/fluent-bit/blob/ed4b2f09d68f1b71bed53aaa37a048700adc2b5d/plugins/out_http/http.c#L404
It looks like HTTP STATUS was changed to HTTP status in defac7869ff19a95d81fad469ab76baaca2af48a, so I think this is the latest version.
Although I do see:
[2018/04/20 18:35:02] [ warn] [filter_kube] annotation parser 'json' not found (ns='teachers' pod_name='go-spew-5d99bbd878-5jq8p')
Which you changed in b4b1e09f55bc82a36e6b774ef1b60e176941d09f, so maybe an older version got pushed to that docker tag?
I did see one that got this message before it crashed with the same error:
[2018/04/20 18:36:23] [error] [out_http] no upstream connections available
Ah, right. The configmap overwrote the new parser.conf with the old json-test, so I needed to update it there to match. I can't tell how to verify that I am not using an older version. When I run kubectl exec -it fluent-bit-ncv9l -n kangaroo -c fluent-bit -- /bin/sh and then /fluent-bit/bin/fluent-bit --version I get Fluent Bit v0.13.0
When I run ./fluent-bit --sosreport I get:
Fluent-Bit v0.13.0
Copyright (C) Treasure Data
[2018/04/20 19:04:35] [ info] [engine] started
Fluent Bit Enterprise - SOS Report
==================================
The following report aims to be used by Fluent Bit and Fluentd Enterprise
Customers of Treasure Data. For more details visit:
https://fluentd.treasuredata.com
[Fluent Bit]
Edition Community Edition
Version 0.13.0
Built Flags JSMN_PARENT_LINKS JSMN_STRICT FLB_HAVE_TLS FLB_HAVE_SQLDB FLB_HAVE_BUFFERING FLB_HAVE_METRICS FLB_HAVE_HTTP_SERVER FLB_HAVE_FLUSH_LIBCO FLB_HAVE_SYSTEMD FLB_HAVE_FORK FLB_HAVE_TIMESPEC_GET FLB_HAVE_PROXY_GO FLB_HAVE_JEMALLOC JEMALLOC_MANGLE FLB_HAVE_LIBBACKTRACE FLB_HAVE_REGEX FLB_HAVE_C_TLS FLB_HAVE_ACCEPT4 FLB_HAVE_INOTIFY
[Operating System]
Name Linux
Release 4.4.0-1020-aws
Version #29-Ubuntu SMP Wed Jun 14 15:54:52 UTC 2017
[Hardware]
Architecture x86_64
Processors 2
[Built Plugins]
Inputs cpu mem kmsg tail proc disk systemd netif dummy head exec health serial stdin tcp mqtt lib forward random syslog
Filters grep stdout throttle kubernetes parser record_modifier
Outputs counter es exit file forward http influxdb kafka kafka-rest nats null plot splunk stdout td lib flowcounter
[SERVER] Runtime configuration
Flush 5
Daemon Off
Log_Level Info
With HTTP_Server Off, the prometheus annotations removed, and the liveness and readiness probes removed, I got a restart at 44 minutes on a single pod:
$ POD_ID=fluent-bit-bqlsb;kubectl logs -n kangaroo $POD_ID -c fluent-bit --previous
[2018/04/20 19:57:54] [ info] [engine] started
[2018/04/20 19:57:54] [ info] [filter_kube] https=1 host=kubernetes.default.svc.cluster.local port=443
[2018/04/20 19:57:54] [ info] [filter_kube] local POD info OK
[2018/04/20 19:57:54] [ info] [filter_kube] testing connectivity with API server...
[2018/04/20 19:57:55] [ info] [filter_kube] API server connectivity OK
[2018/04/20 19:57:55] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
[engine] caught signal
Fluent-Bit v0.13.0
Copyright (C) Treasure Data
#0 0x55ae209bf572 in msgpack_pack_object() at lib/msgpack-2.1.3/src/objectc.c:103
#1 0x55ae207aeaeb in ???() at ???:0
#2 0x55ae207af339 in ???() at ???:0
#3 0x55ae2075179a in ???() at ???:0
#4 0x55ae2074f1ec in ???() at ???:0
#5 0x55ae20750d4c in ???() at ???:0
#6 0x55ae20774f4d in ???() at ???:0
#7 0x55ae20775d98 in ???() at ???:0
#8 0x55ae20773787 in ???() at ???:0
#9 0x55ae2075128b in ???() at ???:0
#10 0x55ae20757ee2 in ???() at ???:0
#11 0x55ae206f82b3 in ???() at ???:0
#12 0x7fcde44382e0 in ???() at ???:0
#13 0x55ae206f6809 in ???() at ???:0
#14 0xffffffffffffffff in ???() at ???:0
After more than two hours, I have had no further restarts with the HTTP Server off. I will let it run overnight, and until Monday.
thanks for the report. Focusing on this specific error:
#0 0x55ae209bf572 in msgpack_pack_object() at lib/msgpack-2.1.3/src/objectc.c:103
my assumption is that some invalid data was passed to the msgpack-c library... it looks closely associated with #567, where passing an invalid Map (in this case, a missing value for a key) generates a small corruption.
I have done two changes:
Please test the following new image:
edsiper/fluent-bit-0.13-next:0.18
With HTTP Server on, metrics on, and liveness probes on, they are all in CrashLoopBackOff after running for only a few minutes:
[2018/04/20 23:17:43] [ info] [engine] started
[2018/04/20 23:17:43] [ info] [filter_kube] https=1 host=kubernetes.default.svc.cluster.local port=443
[2018/04/20 23:17:43] [ info] [filter_kube] local POD info OK
[2018/04/20 23:17:43] [ info] [filter_kube] testing connectivity with API server...
[2018/04/20 23:17:48] [ info] [filter_kube] API server connectivity OK
[2018/04/20 23:17:48] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
[engine] caught signal
are you able to see this issue again?
#0 0x55ae209bf572 in msgpack_pack_object() at lib/msgpack-2.1.3/src/objectc.c:103
I got this once:
[engine] caught signal
[2018/04/20 23:22:47] [ warn] [input] cannot disable event for tail.0
[2018/04/20 23:22:47] [ warn] [input] cannot disable event for tail.0
[2018/04/20 23:22:47] [ warn] [input] cannot disable event for tail.0
[2018/04/20 23:22:48] [ warn] [engine] service will stop in 5 seconds
Fluent-Bit v0.13.0
Copyright (C) Treasure Data
When I disable the HTTP Server, I'm still getting similar immediate crash behavior.
@StevenACoffman
I've pushed the following new image:
edsiper/fluent-bit-0.13-next:0.18-2
the co-routines library was not built with the required pre-processor flags to enable multithreading support; hence, when the HTTP server started (in a separate thread) and launched multiple co-routines while the Fluent Bit output engine was doing the same, it led to data corruption.
After adding the missing flags I was not able to reproduce the crash. Would you please double-check on your end?
With edsiper/fluent-bit-0.13-next:0.18-2, with HTTP_Server On, the prometheus annotations applied, and the liveness and readiness probes applied, things are more stable:
$ kubectl get pods -n kangaroo -l k8s-app=fluent-bit-logging
NAME READY STATUS RESTARTS AGE
fluent-bit-2jwj5 3/3 Running 0 51m
fluent-bit-4nksl 3/3 Running 0 51m
fluent-bit-56dl2 3/3 Running 0 51m
fluent-bit-92qvj 3/3 Running 0 51m
fluent-bit-crvhn 3/3 Running 0 51m
fluent-bit-fnqqp 3/3 Running 2 51m
fluent-bit-lrlpx 3/3 Running 0 51m
fluent-bit-m2ghf 2/3 CrashLoopBackOff 15 51m
fluent-bit-q56gt 3/3 Running 0 51m
fluent-bit-s7rvz 3/3 Running 0 51m
fluent-bit-x2vh5 3/3 Running 0 51m
fluent-bit-xkncq 3/3 Running 0 51m
$ POD_ID=fluent-bit-m2ghf; kubectl logs -n kangaroo $POD_ID -c fluent-bit --previous
[2018/04/22 13:16:35] [ info] [engine] started
[2018/04/22 13:16:36] [ info] [filter_kube] https=1 host=kubernetes.default.svc.cluster.local port=443
[2018/04/22 13:16:36] [ info] [filter_kube] local POD info OK
[2018/04/22 13:16:36] [ info] [filter_kube] testing connectivity with API server...
[2018/04/22 13:16:36] [ info] [filter_kube] API server connectivity OK
[2018/04/22 13:16:36] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
[engine] caught signal
$ POD_ID=fluent-bit-fnqqp; kubectl logs -n kangaroo $POD_ID -c fluent-bit --previous
[2018/04/22 12:24:39] [ info] [engine] started
[2018/04/22 12:24:39] [ info] [filter_kube] https=1 host=kubernetes.default.svc.cluster.local port=443
[2018/04/22 12:24:39] [ info] [filter_kube] local POD info OK
[2018/04/22 12:24:39] [ info] [filter_kube] testing connectivity with API server...
[2018/04/22 12:24:39] [ info] [filter_kube] API server connectivity OK
[2018/04/22 12:24:39] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
[2018/04/22 12:24:40] [ warn] [filter_kube] annotation parser 'json' not found (ns='kangaroo' pod_name='go-spew-pgm52')
[2018/04/22 12:24:45] [ info] [out_http] 127.0.0.1:4000, HTTP status=200
[2018/04/22 12:24:45] [ info] [out_http] 127.0.0.1:4000, HTTP status=200
[2018/04/22 12:24:45] [ info] [out_http] 127.0.0.1:4000, HTTP status=200
[2018/04/22 12:24:45] [ info] [out_http] 127.0.0.1:4000, HTTP status=200
[2018/04/22 12:24:45] [ info] [out_http] 127.0.0.1:4000, HTTP status=200
[2018/04/22 12:24:45] [ info] [out_http] 127.0.0.1:4000, HTTP status=200
[2018/04/22 12:24:45] [ info] [out_http] 127.0.0.1:3000, HTTP status=200
[2018/04/22 12:24:45] [ info] [out_http] 127.0.0.1:3000, HTTP status=200
[2018/04/22 12:24:45] [ info] [out_http] 127.0.0.1:3000, HTTP status=200
[2018/04/22 12:24:45] [ info] [out_http] 127.0.0.1:3000, HTTP status=200
[2018/04/22 12:24:46] [ info] [out_http] 127.0.0.1:3000, HTTP status=200
[2018/04/22 12:24:46] [ info] [out_http] 127.0.0.1:3000, HTTP status=200
[2018/04/22 12:24:46] [ info] [out_http] 127.0.0.1:3000, HTTP status=200
[2018/04/22 12:24:46] [ info] [out_http] 127.0.0.1:3000, HTTP status=200
[2018/04/22 12:24:46] [ info] [out_http] 127.0.0.1:3000, HTTP status=200
[2018/04/22 12:24:46] [ info] [out_http] 127.0.0.1:3000, HTTP status=200
[2018/04/22 12:24:46] [ info] [out_http] 127.0.0.1:3000, HTTP status=200
[2018/04/22 12:24:46] [ info] [out_http] 127.0.0.1:3000, HTTP status=200
[2018/04/22 12:24:46] [ info] [out_http] 127.0.0.1:4000, HTTP status=200
[2018/04/22 12:24:46] [ info] [out_http] 127.0.0.1:3000, HTTP status=200
[2018/04/22 12:24:46] [ info] [out_http] 127.0.0.1:3000, HTTP status=200
[2018/04/22 12:24:46] [ info] [out_http] 127.0.0.1:4000, HTTP status=200
[2018/04/22 12:24:46] [ info] [out_http] 127.0.0.1:3000, HTTP status=200
[2018/04/22 12:24:46] [ info] [out_http] 127.0.0.1:3000, HTTP status=200
[2018/04/22 12:24:46] [ info] [out_http] 127.0.0.1:3000, HTTP status=200
[2018/04/22 12:24:46] [ info] [out_http] 127.0.0.1:3000, HTTP status=200
[2018/04/22 12:24:46] [ info] [out_http] 127.0.0.1:3000, HTTP status=200
[2018/04/22 12:24:46] [ info] [out_http] 127.0.0.1:4000, HTTP status=200
[2018/04/22 12:24:46] [ info] [out_http] 127.0.0.1:4000, HTTP status=200
[2018/04/22 12:24:46] [ info] [out_http] 127.0.0.1:4000, HTTP status=200
[2018/04/22 12:24:46] [ info] [out_http] 127.0.0.1:4000, HTTP status=200
[2018/04/22 12:24:46] [ info] [out_http] 127.0.0.1:4000, HTTP status=200
[2018/04/22 12:24:46] [ info] [out_http] 127.0.0.1:4000, HTTP status=200
[2018/04/22 12:24:46] [ info] [out_http] 127.0.0.1:4000, HTTP status=200
[2018/04/22 12:24:46] [ info] [out_http] 127.0.0.1:4000, HTTP status=200
[engine] caught signal
I adjusted that one pod that was in CrashLoopBackOff to Log_Level debug:
$ POD_ID=fluent-bit-ngmph; kubectl logs -n kangaroo $POD_ID -c fluent-bit
[2018/04/22 14:18:51] [ info] [engine] started
[2018/04/22 14:18:51] [debug] [in_tail] inotify watch fd=20
[2018/04/22 14:18:51] [debug] [in_tail] scanning path /var/log/containers/*.log
[2018/04/22 14:18:51] [debug] [in_tail] add to scan queue /var/log/containers/consortium-books-holdings-1524268140-q9sp5_teachers_consortium-books-holdings-4807e8810c330d5a9d69f239c9c2480ece3d1d0f9156626bb7c31fef2e5d975d.log, offset=382774
[2018/04/22 14:18:51] [debug] [in_tail] add to scan queue /var/log/containers/eviz-hackathon-77fbb76f-2xw6v_teachers_eureka-yup-9726a3ba513d5a05d064d7c3b8b33a95ecd480388c457eb6ea0cf57ab9c91a3b.log, offset=1091595
[2018/04/22 14:18:51] [debug] [in_tail] add to scan queue /var/log/containers/eviz-hackathon-77fbb76f-2xw6v_teachers_eureka-yup-e0125f0f07d00e2e9b3749b60f8bf0654214096564dd4725c86ec58c0497e056.log, offset=229024
[2018/04/22 14:18:51] [debug] [in_tail] add to scan queue /var/log/containers/eviz-hackathon-77fbb76f-2xw6v_teachers_eviz-hackathon-e969ec8e57c1d14ce5f9a208297e815a539863ae161f2e46081c489874e3bb46.log, offset=0
[2018/04/22 14:18:51] [debug] [in_tail] add to scan queue /var/log/containers/fluent-bit-ngmph_kangaroo_fluent-bit-0065e2781b06f877ae5207d0eeac9a7c1a3c592f9c7f034fbea0018ec221701a.log, offset=0
[2018/04/22 14:18:51] [debug] [in_tail] add to scan queue /var/log/containers/fluent-bit-ngmph_kangaroo_fluent-bit-7d05c4331e95df14fccdea3732be855e975c64854aa88c5687f7952721d5291b.log, offset=0
[2018/04/22 14:18:51] [debug] [in_tail] add to scan queue /var/log/containers/fluent-bit-ngmph_kangaroo_go-kafka-logsink-af433108645cd7378b5f435c20480a084af395b4a5236b4f7f341139fd7ad775.log, offset=0
[2018/04/22 14:18:51] [debug] [in_tail] add to scan queue /var/log/containers/fluent-bit-ngmph_kangaroo_go-s3-logsink-ab15461b1abc4064b5bee72b73c6670e116513138ec32969e784d538b2f5a21d.log, offset=0
[2018/04/22 14:18:51] [debug] [in_tail] add to scan queue /var/log/containers/go-spew-5d99bbd878-5jq8p_teachers_go-spew-2796b3ae129b6f38835e28724ce92d0b08ee16b248fa8630de0f68e06e6cc062.log, offset=7495520
[2018/04/22 14:18:51] [debug] [in_tail] add to scan queue /var/log/containers/go-spew-gbbzk_kangaroo_go-spew-71d3b944a5a9c944409d2356aca4a3706b3435b4fb93823afe82f12c4a04a119.log, offset=301401
[2018/04/22 14:18:51] [debug] [in_tail] add to scan queue /var/log/containers/hive-metastore-65467bc98f-2tgbm_teachers_hive-metastore-ff0fbfc32857f67a512a9096d5ee72477a2fcae99b56d5074a0e6d842a3a504e.log, offset=847530
[2018/04/22 14:18:51] [debug] [in_tail] add to scan queue /var/log/containers/ingress-nginx-1911116232-jrq5t_kube-ingress_dashboard-ingress-controller-b513dbbf44d2b9639d85b1b9c661aa28ac8a798be40035526bca0785a615de5d.log, offset=3899152
[2018/04/22 14:18:51] [debug] [in_tail] add to scan queue /var/log/containers/jenkins-slave-pkgd2-mb2s9_teachers_curl-47d37a334a4fe421f2a9db10969756046d1013efe78e9b3d5a5f0302a4062e5c.log, offset=0
[2018/04/22 14:18:51] [debug] [in_tail] add to scan queue /var/log/containers/jenkins-slave-pkgd2-mb2s9_teachers_docker-aaf0fa3cb424d1307b62ac8b26efd36e73ad219ff57557bf88e53e4a0c4075bb.log, offset=0
[2018/04/22 14:18:51] [debug] [in_tail] add to scan queue /var/log/containers/jenkins-slave-pkgd2-mb2s9_teachers_jnlp-93828032008ad110576779239086784470d2c0e4702e160c21701acf3abab963.log, offset=7985
[2018/04/22 14:18:51] [debug] [in_tail] add to scan queue /var/log/containers/jenkins-slave-pkgd2-mb2s9_teachers_kubectl-faadee10626ff802d1833ab5e553825d5141a7449a020875062371f546064376.log, offset=0
[2018/04/22 14:18:51] [debug] [in_tail] add to scan queue /var/log/containers/kube-proxy-ip-172-28-154-19.ec2.internal_kube-system_kube-proxy-8dea9e1e6f7b0837857a51efc732a0489d9fe67f13ead0454f016cc581c63a0e.log, offset=179002
[2018/04/22 14:18:51] [debug] [in_tail] add to scan queue /var/log/containers/kube2iam-bxjp9_kube-system_kube2iam-c42d3e90ca4ae7089c946cdfdde1532f8c8e2d6f949bc12a271ac9356fb52b51.log, offset=130367
[2018/04/22 14:18:51] [debug] [in_tail] add to scan queue /var/log/containers/licensing-service-dstar-2598-e3c971-6ccd87468-5xk98_teachers_go-eureka-2dcbe2818580157b4f6880f54a594859fd2f99583886362ff0da6f9d65f0d22a.log, offset=4548006
[2018/04/22 14:18:51] [debug] [in_tail] add to scan queue /var/log/containers/licensing-service-dstar-2598-e3c971-6ccd87468-5xk98_teachers_licensing-service-dstar-2598-e3c971-2871d75cdaa1ef18acaaeecae012cb3d619ecbc9cc287c2e6f53325fd3004452.log, offset=0
[2018/04/22 14:18:51] [debug] [in_tail] add to scan queue /var/log/containers/licensing-service-dstar-2598-e3c971-6ccd87468-5xk98_teachers_licensing-service-dstar-2598-e3c971-74eb7c39bc5e375a5d613117c1a4372c3d417a09adb69b6d50d1fa06f6c34107.log, offset=0
[2018/04/22 14:18:51] [debug] [in_tail] add to scan queue /var/log/containers/selenium-python-6479976d89-tfg9x_teachers_selenium-python-c22ceee7b235c9310ec00920d8716b72911a8c0820d77b77a1e9cb35788f5309.log, offset=221
[2018/04/22 14:18:51] [debug] [in_tail] add to scan queue /var/log/containers/servicegraph-7677cbf586-qxszr_istio-system_servicegraph-8dd37c54695619087e413e9a9160c9274104d5e89795fc37d54eaf85cdb7cf56.log, offset=129
[2018/04/22 14:18:51] [debug] [in_tail] add to scan queue /var/log/containers/sonarqube-77784d95b-tfn9x_teachers_backup-sidecar-144b980cb69f453cd5254e7752b9676c6224c635e4255f1e8c0ad43e702afc3a.log, offset=1218185
[2018/04/22 14:18:51] [debug] [in_tail] add to scan queue /var/log/containers/sonarqube-77784d95b-tfn9x_teachers_eureka-yup-5837c6caafbab943a2266e10bb63aa2c888d8b011212b21d5134958a5ed74545.log, offset=5638801
[2018/04/22 14:18:51] [debug] [in_tail] add to scan queue /var/log/containers/sonarqube-77784d95b-tfn9x_teachers_eureka-yup-79e46f557cfd8b201296e93012b8781cead883c49411fe747b322840fdf1dd18.log, offset=65418
[2018/04/22 14:18:51] [debug] [in_tail] add to scan queue /var/log/containers/sonarqube-77784d95b-tfn9x_teachers_restore-settings-51cd10ed588a7e477766adacb4abc21b6ebf3da0e4148949e47d192ebf5fa724.log, offset=374
[2018/04/22 14:18:51] [debug] [in_tail] add to scan queue /var/log/containers/sonarqube-77784d95b-tfn9x_teachers_restore-settings-8e9e10aa191a12184e49919f7ebc1915a8da858dc0a5064b04ad59dd2cb4204a.log, offset=374
[2018/04/22 14:18:51] [debug] [in_tail] add to scan queue /var/log/containers/sonarqube-77784d95b-tfn9x_teachers_restore-settings-8fc4ead750d8022aa26a9bbb987f270851e12da6f946ae126ebbd074bfc28235.log, offset=374
[2018/04/22 14:18:51] [debug] [in_tail] add to scan queue /var/log/containers/sonarqube-77784d95b-tfn9x_teachers_restore-settings-c2d228943e009424bc7950b526bc555beb9debb3472b2664ac63aca9bec77d7a.log, offset=374
[2018/04/22 14:18:51] [debug] [in_tail] add to scan queue /var/log/containers/sonarqube-77784d95b-tfn9x_teachers_sonarqube-04431a036b0aec39da0e661a5b76961c59443905450a840fef1c9d91a96dfb31.log, offset=355692
[2018/04/22 14:18:51] [debug] [in_tail] add to scan queue /var/log/containers/test-nginix-release-nginx-ingress-controller-4c2r8_kangaroo_nginx-ingress-controller-99ddb30b5661fb5a3a90b441b9494b58be44f0a8f3724e5a717d4309b97b7501.log, offset=65346
[2018/04/22 14:18:51] [debug] [in_tail] add to scan queue /var/log/containers/test-prometheus-prometheus-node-exporter-9xjt6_kangaroo_prometheus-node-exporter-59f18e0015509c0fe053c685e6d13b35ca316160596a3f777a7eef4380284854.log, offset=5335
[2018/04/22 14:18:51] [debug] [in_tail] add to scan queue /var/log/containers/weave-net-gtc5g_kube-system_weave-c3d87840f863d35d7006baacbfa6053ba97f18c6fcd96d35268e211658f36b41.log, offset=5198496
[2018/04/22 14:18:51] [debug] [in_tail] add to scan queue /var/log/containers/weave-net-gtc5g_kube-system_weave-npc-f307ee0f21479416fd7e27a333da6d05028be8d601d8193724d075196c338c7c.log, offset=56754
[2018/04/22 14:18:51] [ info] [filter_kube] https=1 host=kubernetes.default.svc.cluster.local port=443
[2018/04/22 14:18:51] [ info] [filter_kube] local POD info OK
[2018/04/22 14:18:51] [ info] [filter_kube] testing connectivity with API server...
[2018/04/22 14:18:51] [debug] [filter_kube] API Server (ns=kangaroo, pod=fluent-bit-ngmph) http_do=0, HTTP Status: 200
[2018/04/22 14:18:51] [ info] [filter_kube] API server connectivity OK
[2018/04/22 14:18:51] [debug] [router] input=tail.0 'DYNAMIC TAG'
[2018/04/22 14:18:51] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
[engine] caught signal
With edsiper/fluent-bit-0.13-next:0.18-2, HTTP_Server Off, the prometheus annotations removed, the liveness and readiness probes removed, and Log_Level info, whatever pod gets scheduled to that node still has a problem:
$ kubectl get pods -n kangaroo -l k8s-app=fluent-bit-logging
NAME READY STATUS RESTARTS AGE
fluent-bit-2k4d7 3/3 Running 0 1m
fluent-bit-485r8 3/3 Running 0 1m
fluent-bit-87wzl 3/3 Running 0 1m
fluent-bit-btf6t 3/3 Running 0 1m
fluent-bit-chns9 3/3 Running 0 1m
fluent-bit-dfzsz 3/3 Running 0 1m
fluent-bit-dsd6v 3/3 Running 0 1m
fluent-bit-gjjhb 2/3 CrashLoopBackOff 3 1m
fluent-bit-j4sg2 3/3 Running 0 1m
fluent-bit-pd4h9 3/3 Running 0 1m
fluent-bit-rn9n8 3/3 Running 0 1m
fluent-bit-tvh4k 3/3 Running 0 1m
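For anyone reproducing this, disabling the built-in monitoring endpoint is done in the [SERVICE] section of the classic config. This is only a minimal sketch; your actual [SERVICE] section almost certainly carries other keys (parsers file, flush interval, etc.):

```ini
[SERVICE]
    Flush        1
    Log_Level    info
    # Turn the embedded monitoring HTTP server off entirely
    HTTP_Server  Off
```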
I don't see an obvious error message:
$ POD_ID=fluent-bit-gjjhb; kubectl logs -n kangaroo $POD_ID -c fluent-bit
[2018/04/22 14:27:37] [ info] [engine] started
[2018/04/22 14:27:37] [ info] [filter_kube] https=1 host=kubernetes.default.svc.cluster.local port=443
[2018/04/22 14:27:37] [ info] [filter_kube] local POD info OK
[2018/04/22 14:27:37] [ info] [filter_kube] testing connectivity with API server...
[2018/04/22 14:27:37] [ info] [filter_kube] API server connectivity OK
[engine] caught signal
@edsiper I am not sure what's going on with that one pod, but I have not seen any other restarts after 30 minutes.
@StevenACoffman
I've pushed a new 0.13 dev image:
fluent/fluent-bit-0.13-dev:0.18
this version merges the latest changes and also adds some minor information about the signal being trapped (which is fundamental to continued troubleshooting). Please give it a try...
With fluent/fluent-bit-0.13-dev:0.18 and the HTTP_Server Off, I have no restarts after an hour:
$ kubectl get pods -n kangaroo -l k8s-app=fluent-bit-logging
NAME READY STATUS RESTARTS AGE
fluent-bit-2g259 3/3 Running 0 1h
fluent-bit-2l998 3/3 Running 0 1h
fluent-bit-62zvd 3/3 Running 0 1h
fluent-bit-6pjh7 3/3 Running 0 1h
fluent-bit-72dm4 3/3 Running 0 1h
fluent-bit-8rggc 3/3 Running 0 1h
fluent-bit-8tx9c 3/3 Running 0 1h
fluent-bit-fdxqs 3/3 Running 0 1h
fluent-bit-g7rgp 3/3 Running 0 1h
fluent-bit-nf76d 3/3 Running 0 1h
fluent-bit-rrz6n 3/3 Running 0 1h
fluent-bit-zhn9w 3/3 Running 0 1h
I'm going to turn the HTTP_Server On, with prometheus annotations and liveness/readiness checks applied.
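For reference, the scrape annotations and probes I'm re-applying look roughly like the fragment below. This is an illustrative pod-template sketch, not our exact manifest; the probe paths and the metrics path are assumptions based on Fluent Bit's HTTP server listening on port 2020:

```yaml
# Hypothetical DaemonSet pod-template fragment (illustrative only)
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "2020"
    prometheus.io/path: "/api/v1/metrics/prometheus"
spec:
  containers:
    - name: fluent-bit
      livenessProbe:
        httpGet:
          path: /
          port: 2020
      readinessProbe:
        httpGet:
          path: /
          port: 2020
```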
Well, so far, so good. With fluent/fluent-bit-0.13-dev:0.18 and the HTTP_Server On with prometheus annotations, liveness probes, readiness probes applied, I'm not seeing any restarts:
$ kubectl get pods -n kangaroo -l k8s-app=fluent-bit-logging
NAME READY STATUS RESTARTS AGE
fluent-bit-889qs 3/3 Running 0 8h
fluent-bit-8qj5m 3/3 Running 0 8h
fluent-bit-gwg5c 3/3 Running 0 8h
fluent-bit-jj8b5 3/3 Running 0 8h
fluent-bit-k6bmx 3/3 Running 0 8h
fluent-bit-nqspb 3/3 Running 0 8h
fluent-bit-nqt99 3/3 Running 0 8h
fluent-bit-rqkjs 3/3 Running 0 8h
fluent-bit-s7pnv 3/3 Running 0 8h
fluent-bit-tf622 3/3 Running 0 8h
fluent-bit-wq5h9 3/3 Running 0 8h
fluent-bit-xqvhm 3/3 Running 0 8h
I will continue to monitor it, and I have some additional clusters I can apply this to. Nice work!
Please disregard this last error message (since deleted). I mistakenly reverted to an older version.
That's running an older version of Fluent Bit
On Mon, Apr 23, 2018, 15:45 Steve Coffman notifications@github.com wrote:
I applied the change to another cluster, and within the first few seconds I got:
$ kubectl logs -n kangaroo fluent-bit-qrb8v -c fluent-bit --previous
[2018/04/23 21:41:44] [ info] [engine] started
[2018/04/23 21:41:44] [ info] [filter_kube] https=1 host=kubernetes.default.svc.cluster.local port=443
[2018/04/23 21:41:44] [ info] [filter_kube] local POD info OK
[2018/04/23 21:41:44] [ info] [filter_kube] testing connectivity with API server...
[2018/04/23 21:41:50] [ info] [filter_kube] API server connectivity OK
[2018/04/23 21:41:50] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
[2018/04/23 21:41:51] [ info] [out_http] HTTP STATUS=200
[2018/04/23 21:41:51] [ info] [out_http] HTTP STATUS=200
[2018/04/23 21:41:51] [ info] [out_http] HTTP STATUS=200
[2018/04/23 21:41:51] [ info] [out_http] HTTP STATUS=200
[... roughly 130 more identical "[out_http] HTTP STATUS=200" lines over the next ~20 seconds trimmed ...]
Fluent-Bit v0.13.0
Copyright (C) Treasure Data
retry read
retry read
[... "retry read" repeated ~1,100 more times, trimmed ...]
[2018/04/23 21:42:13] [ info] [out_http] HTTP STATUS=200
[engine] caught signal
[2018/04/23 21:42:13] [ info] [out_http] HTTP STATUS=200
[2018/04/23 21:42:13] [ info] [out_http] HTTP STATUS=200
retry read
I let it run in three clusters overnight, and I have had no more restarts. I think this issue can be closed as fixed! Thank you!
Same for me! Thanks
thanks everyone who helped to troubleshoot this issue! now it's time to close it :)
Fixed.
Great job Eduardo!