Describe the bug
Our fluentBit fails to flush chunks to Kafka output plugin after Kafka cluster is recover from downtime. the following error occurs every few seconds [ warn] [engine] failed to flush chunk '1-1588758003.4494800.flb', retry in 9 seconds: task_id=14, input=dummy.0 > output=kafka.1
To Reproduce
[2020/05/06 10:12:35] [debug] [out_kafka] enqueued message (50 bytes) for topic 'bit'
[2020/05/06 10:12:35] [debug] [out_kafka] enqueued message (50 bytes) for topic 'bit'
[2020/05/06 10:12:35] [debug] [out_kafka] enqueued message (50 bytes) for topic 'bit'
[2020/05/06 10:12:35] [debug] [task] destroy task=0x7fbd9b045000 (task_id=0)
[2020/05/06 10:12:39] [debug] [task] created task=0x7fbd9b045000 id=0 OK
[2020/05/06 10:12:39] [debug] [out_kafka] enqueued message (50 bytes) for topic 'bit'
[0] dummy.log: [1588759955.004882100, {"message"=>"dummy"}]
[1] dummy.log: [1588759956.001468000, {"message"=>"dummy"}]
[2] dummy.log: [1588759957.001931300, {"message"=>"dummy"}]
[3] dummy.log: [1588759958.000703700, {"message"=>"dummy"}]
[4] dummy.log: [1588759959.001432000, {"message"=>"dummy"}]
[2020/05/06 10:12:40] [debug] [out_kafka] enqueued message (50 bytes) for topic 'bit'
[2020/05/06 10:12:40] [debug] [out_kafka] enqueued message (50 bytes) for topic 'bit'
[2020/05/06 10:12:40] [debug] [out_kafka] enqueued message (50 bytes) for topic 'bit'
[2020/05/06 10:12:40] [debug] [out_kafka] enqueued message (50 bytes) for topic 'bit'
[2020/05/06 10:12:40] [debug] [task] destroy task=0x7fbd9b045000 (task_id=0)
[2020/05/06 10:12:45] [debug] [task] created task=0x7fbd9b045000 id=0 OK
% Failed to produce to topic bit: Local: Queue full
[2020/05/06 10:12:45] [ warn] [out_kafka] internal queue is full, retrying in one second
[0] dummy.log: [1588759960.002851100, {"message"=>"dummy"}]
[1] dummy.log: [1588759961.000785600, {"message"=>"dummy"}]
[2] dummy.log: [1588759962.001597600, {"message"=>"dummy"}]
[3] dummy.log: [1588759963.002372700, {"message"=>"dummy"}]
[4] dummy.log: [1588759964.001981600, {"message"=>"dummy"}]
% Failed to produce to topic bit: Local: Queue full
[2020/05/06 10:12:46] [ warn] [out_kafka] internal queue is full, retrying in one second
% Failed to produce to topic bit: Local: Queue full
[2020/05/06 10:12:46] [ warn] [out_kafka] internal queue is full, retrying in one second
% Failed to produce to topic bit: Local: Queue full
[2020/05/06 10:12:48] [ warn] [out_kafka] internal queue is full, retrying in one second
% Failed to produce to topic bit: Local: Queue full
[2020/05/06 10:12:49] [ warn] [out_kafka] internal queue is full, retrying in one second
[0] dummy.log: [1588759965.004790100, {"message"=>"dummy"}]
[1] dummy.log: [1588759966.002579100, {"message"=>"dummy"}]
[2] dummy.log: [1588759967.001390400, {"message"=>"dummy"}]
[3] dummy.log: [1588759968.002448000, {"message"=>"dummy"}]
[4] dummy.log: [1588759969.004080300, {"message"=>"dummy"}]
[2020/05/06 10:12:50] [debug] [task] created task=0x7fbd9b045460 id=1 OK
% Failed to produce to topic bit: Local: Queue full
[2020/05/06 10:12:50] [ warn] [out_kafka] internal queue is full, retrying in one second
[2020/05/06 10:12:50] [debug] [retry] new retry created for task_id=1 attemps=1
[2020/05/06 10:12:50] [ warn] [engine] failed to flush chunk '1-1588759965.4815900.flb', retry in 7 seconds: task_id=1, input=dummy.0 > output=kafka.1
% Failed to produce to topic bit: Local: Queue full
[2020/05/06 10:12:51] [ warn] [out_kafka] internal queue is full, retrying in one second
% Failed to produce to topic bit: Local: Queue full
[2020/05/06 10:12:52] [ warn] [out_kafka] internal queue is full, retrying in one second
% Failed to produce to topic bit: Local: Queue full
[2020/05/06 10:12:53] [ warn] [out_kafka] internal queue is full, retrying in one second
% Failed to produce to topic bit: Local: Queue full
[2020/05/06 10:12:53] [ warn] [out_kafka] internal queue is full, retrying in one second
[2020/05/06 10:12:55] [debug] [task] created task=0x7fbd9b045500 id=2 OK
[0] dummy.log: [1588759970.004754900, {"message"=>"dummy"}]
[1] dummy.log: [1588759971.003325600, {"message"=>"dummy"}]
[2] dummy.log: [1588759972.003833000, {"message"=>"dummy"}]
[3] dummy.log: [1588759973.002538400, {"message"=>"dummy"}]
[4] dummy.log: [1588759974.001743000, {"message"=>"dummy"}]
[2020/05/06 10:12:55] [debug] [retry] new retry created for task_id=2 attemps=1
[2020/05/06 10:12:55] [ warn] [engine] failed to flush chunk '1-1588759970.4818500.flb', retry in 8 seconds: task_id=2, input=dummy.0 > output=kafka.1
[2020/05/06 10:12:55] [debug] [retry] new retry created for task_id=0 attemps=1
[2020/05/06 10:12:55] [ warn] [engine] failed to flush chunk '1-1588759960.2916400.flb', retry in 11 seconds: task_id=0, input=dummy.0 > output=kafka.1
[2020/05/06 10:12:57] [debug] [retry] re-using retry for task_id=1 attemps=2
[2020/05/06 10:12:57] [ warn] [engine] failed to flush chunk '1-1588759965.4815900.flb', retry in 93 seconds: task_id=1, input=dummy.0 > output=kafka.1
[0] dummy.log: [1588759975.004836600, {"message"=>"dummy"}]
[1] dummy.log: [1588759976.001987300, {"message"=>"dummy"}]
[2] dummy.log: [1588759977.002878600, {"message"=>"dummy"}]
[3] dummy.log: [1588759978.005304800, {"message"=>"dummy"}]
[4] dummy.log: [1588759979.002412500, {"message"=>"dummy"}]
[2020/05/06 10:12:59] [debug] [task] created task=0x7fbd9b0455a0 id=3 OK
[2020/05/06 10:12:59] [debug] [retry] new retry created for task_id=3 attemps=1
[2020/05/06 10:13:00] [ warn] [engine] failed to flush chunk '1-1588759975.4849500.flb', retry in 11 seconds: task_id=3, input=dummy.0 > output=kafka.1
Expected behavior
Fluentbit Kafka output plugin expected to start producing messages after Kafka cluster is working again and fully recover from downtime
Your Environment
[SERVICE]
Flush 5
Daemon Off
Log_Level ${LOG_LEVEL}
Parsers_File parsers.conf
Plugins_File plugins.conf
HTTP_Server On
HTTP_Listen 0.0.0.0
HTTP_Port 2020
[INPUT]
Name dummy
Rate 1
Tag dummy.log
[OUTPUT]
Name stdout
Match *
[OUTPUT]
Name kafka
Match *
Brokers ${BROKER_ADDRESS}
Topics bit
Timestamp_Key @timestamp
Retry_Limit false
# Specify the number of extra seconds to monitor a file once is rotated in case some pending data is flushed
Rotate_Wait 20
rdkafka.queue.buffering.max.ms ${KAFKA_QUEUE_BUFFER_MAX_MS}
# for logs you'll probably want this to be 0 or 1, not more
rdkafka.request.required.acks ${KAFKA_REQUIRE_ACKS}
# hides errors "Receive failed: Disconnected" when kafka kills idle connections
rdkafka.log.connection.close false
# producer buffer is not included in http://fluentbit.io/documentation/0.12/configuration/memory_usage.html#estimating
rdkafka.queue.buffering.max.kbytes 1
Additional context
All seems to function properly when Kafka cluster is recover after a short period of time (1 minute for example) and Fluentbit successfully re-producing the messages. but when we get 'Queue is full' error after around 10 minutes in production, we are not able to recover.
If Kafka is down, our Kafka output connector (based on librdkafka) won't be able to deliver the records, hence every time the engine passes records to the plugin will be enqueued by librdkafka. If the message cannot be enqueued in librdkafka, Fluent Bit will retry later.
If librdkafka queue is not flushed you will face this issue and is expected. If librdkafka is not flushing fast enough
Fluent Bit v1.3 uses librdkafka v1.2 while Fluent Bit v1.4 uses librdkafka v1.3.
Looking at librdkafka changelog, on 1.2 they reported an issue for high throughput applications:
https://github.com/edenhill/librdkafka/releases/tag/v1.2.1
Warning: v1.2.1 has a producer performance regression which may
affect high-throughput producer applications. We recommend such
users to upgrade to v1.3.0
So I would suggest you try to reproduce this problem using Fluent Bit v1.4 which uses librdkafka v1.3. If we issue persists, we can take a look more in detail.
I checked it with Fluent Bit 1.4 and it seems to have the same issue
@edsiper thanks for the responses.
it seems that the same issue happen with 1.4 version. in addition, it looks like that if we are running in log level of info, we are not getting any log message that can helps us (we cannot run in debug level in production of course)
If theres something i can help with in the investigation please let me know
please help to elaborate the problem, if I understand correctly the issue is:
is that correct ?
Yes, that is what we observed.
@edsiper Hi !
any news or updates about the issue ?
@shyimo https://github.com/fluent/fluent-bit/pull/2894 should fix this. Please help out in testing :hugs:
Actually, even with that PR in place, fluent-bit completely freezes at some point. Here is the backtrace:
Thread 13 (Thread 1.17):
#0 futex_abstimed_wait_cancelable (private=0, abstime=0x7fbfd4bf3770, expected=0,
futex_word=0x7fbfd6421294) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1 __pthread_cond_wait_common (abstime=0x7fbfd4bf3770, mutex=0x7fbfd6421240, cond=0x7fbfd6421268)
at pthread_cond_wait.c:539
#2 __pthread_cond_timedwait (cond=0x7fbfd6421268, mutex=0x7fbfd6421240, abstime=0x7fbfd4bf3770)
at pthread_cond_wait.c:667
#3 0x00007fbfdf734149 in cnd_timedwait (cond=<optimized out>, mutex=<optimized out>,
time_point=<optimized out>) at cnd_timedwait.c:25
#4 0x000055939a45fdcf in cnd_timedwait_abs (cnd=0x7fbfd6421268, mtx=0x7fbfd6421240,
tspec=0x7fbfd4bf3770) at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/tinycthread_extra.c:103
#5 0x000055939a3a40d6 in rd_kafka_q_pop_serve (rkq=0x7fbfd6421240, timeout_us=999967, version=0,
cb_type=RD_KAFKA_Q_CB_RETURN, callback=0x0, opaque=0x0)
at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/rdkafka_queue.c:404
#6 0x000055939a3a420c in rd_kafka_q_pop (rkq=0x7fbfd6421240, timeout_us=999967, version=0)
at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/rdkafka_queue.c:428
#7 0x000055939a3065a6 in rd_kafka_broker_ops_serve (rkb=0x7fbfd6417400, timeout_us=999967)
at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/rdkafka_broker.c:3295
#8 0x000055939a3067f4 in rd_kafka_broker_ops_io_serve (rkb=0x7fbfd6417400,
abs_timeout=8401009486523)
at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/rdkafka_broker.c:3346
#9 0x000055939a308ad6 in rd_kafka_broker_producer_serve (rkb=0x7fbfd6417400,
abs_timeout=8401009486523)
at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/rdkafka_broker.c:3930
#10 0x000055939a3131f6 in rd_kafka_broker_serve (rkb=0x7fbfd6417400, timeout_ms=1000)
at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/rdkafka_broker.c:4981
#11 0x000055939a3135d9 in rd_kafka_broker_thread_main (arg=0x7fbfd6417400)
at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/rdkafka_broker.c:5035
#12 0x00007fbfdf7280a2 in start_thread (arg=<optimized out>) at pthread_create.c:483
#13 0x00007fbfded664cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 12 (Thread 1.16):
#0 futex_abstimed_wait_cancelable (private=0, abstime=0x7fbfd53f4770, expected=0,
futex_word=0x7fbfd64211d0) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1 __pthread_cond_wait_common (abstime=0x7fbfd53f4770, mutex=0x7fbfd6421180, cond=0x7fbfd64211a8)
at pthread_cond_wait.c:539
#2 __pthread_cond_timedwait (cond=0x7fbfd64211a8, mutex=0x7fbfd6421180, abstime=0x7fbfd53f4770)
at pthread_cond_wait.c:667
#3 0x00007fbfdf734149 in cnd_timedwait (cond=<optimized out>, mutex=<optimized out>,
time_point=<optimized out>) at cnd_timedwait.c:25
#4 0x000055939a45fdcf in cnd_timedwait_abs (cnd=0x7fbfd64211a8, mtx=0x7fbfd6421180,
tspec=0x7fbfd53f4770) at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/tinycthread_extra.c:103
#5 0x000055939a3a40d6 in rd_kafka_q_pop_serve (rkq=0x7fbfd6421180, timeout_us=999962, version=0,
cb_type=RD_KAFKA_Q_CB_RETURN, callback=0x0, opaque=0x0)
at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/rdkafka_queue.c:404
#6 0x000055939a3a420c in rd_kafka_q_pop (rkq=0x7fbfd6421180, timeout_us=999962, version=0) at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/rdkafka_queue.c:428
#7 0x000055939a3065a6 in rd_kafka_broker_ops_serve (rkb=0x7fbfd6416800, timeout_us=999962) at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/rdkafka_broker.c:3295
#8 0x000055939a3067f4 in rd_kafka_broker_ops_io_serve (rkb=0x7fbfd6416800, abs_timeout=8401009486480) at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/rdkafka_broker.c:3346
#9 0x000055939a308ad6 in rd_kafka_broker_producer_serve (rkb=0x7fbfd6416800, abs_timeout=8401009486480) at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/rdkafka_broker.c:3930
#10 0x000055939a3131f6 in rd_kafka_broker_serve (rkb=0x7fbfd6416800, timeout_ms=1000) at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/rdkafka_broker.c:4981
#11 0x000055939a3135d9 in rd_kafka_broker_thread_main (arg=0x7fbfd6416800) at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/rdkafka_broker.c:5035
#12 0x00007fbfdf7280a2 in start_thread (arg=<optimized out>) at pthread_create.c:483
#13 0x00007fbfded664cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 11 (Thread 1.15):
#0 futex_abstimed_wait_cancelable (private=0, abstime=0x7fbfd5bf5770, expected=0, futex_word=0x7fbfd6421110) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1 __pthread_cond_wait_common (abstime=0x7fbfd5bf5770, mutex=0x7fbfd64210c0, cond=0x7fbfd64210e8) at pthread_cond_wait.c:539
#2 __pthread_cond_timedwait (cond=0x7fbfd64210e8, mutex=0x7fbfd64210c0, abstime=0x7fbfd5bf5770) at pthread_cond_wait.c:667
#3 0x00007fbfdf734149 in cnd_timedwait (cond=<optimized out>, mutex=<optimized out>, time_point=<optimized out>) at cnd_timedwait.c:25
#4 0x000055939a45fdcf in cnd_timedwait_abs (cnd=0x7fbfd64210e8, mtx=0x7fbfd64210c0, tspec=0x7fbfd5bf5770) at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/tinycthread_extra.c:103
#5 0x000055939a3a40d6 in rd_kafka_q_pop_serve (rkq=0x7fbfd64210c0, timeout_us=999949, version=0, cb_type=RD_KAFKA_Q_CB_RETURN, callback=0x0, opaque=0x0) at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/rdkafka_queue.c:404
#6 0x000055939a3a420c in rd_kafka_q_pop (rkq=0x7fbfd64210c0, timeout_us=999949, version=0) at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/rdkafka_queue.c:428
#7 0x000055939a3065a6 in rd_kafka_broker_ops_serve (rkb=0x7fbfd6415c00, timeout_us=999949) at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/rdkafka_broker.c:3295
#8 0x000055939a3067f4 in rd_kafka_broker_ops_io_serve (rkb=0x7fbfd6415c00, abs_timeout=8401009556313) at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/rdkafka_broker.c:3346
#9 0x000055939a308ad6 in rd_kafka_broker_producer_serve (rkb=0x7fbfd6415c00, abs_timeout=8401009556313) at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/rdkafka_broker.c:3930
#10 0x000055939a3131f6 in rd_kafka_broker_serve (rkb=0x7fbfd6415c00, timeout_ms=1000) at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/rdkafka_broker.c:4981
#11 0x000055939a3135d9 in rd_kafka_broker_thread_main (arg=0x7fbfd6415c00) at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/rdkafka_broker.c:5035
#12 0x00007fbfdf7280a2 in start_thread (arg=<optimized out>) at pthread_create.c:483
#13 0x00007fbfded664cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 10 (Thread 1.14):
#0 0x00007fbfded5b819 in __GI___poll (fds=0x7fbfd241d098, nfds=2, timeout=1000) at ../sysdeps/unix/sysv/linux/poll.c:29
#1 0x000055939a3e7286 in rd_kafka_transport_poll (rktrans=0x7fbfd241d000, tmout=1000) at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/rdkafka_transport.c:965
#2 0x000055939a3e6677 in rd_kafka_transport_io_serve (rktrans=0x7fbfd241d000, timeout_ms=1000) at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/rdkafka_transport.c:792
#3 0x000055939a306792 in rd_kafka_broker_ops_io_serve (rkb=0x7fbfd6415000, abs_timeout=8401010015931) at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/rdkafka_broker.c:3338
#4 0x000055939a308ad6 in rd_kafka_broker_producer_serve (rkb=0x7fbfd6415000, abs_timeout=8401010015931) at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/rdkafka_broker.c:3930
#5 0x000055939a3131f6 in rd_kafka_broker_serve (rkb=0x7fbfd6415000, timeout_ms=1000) at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/rdkafka_broker.c:4981
#6 0x000055939a313c10 in rd_kafka_broker_thread_main (arg=0x7fbfd6415000) at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/rdkafka_broker.c:5140
#7 0x00007fbfdf7280a2 in start_thread (arg=<optimized out>) at pthread_create.c:483
#8 0x00007fbfded664cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 9 (Thread 1.13):
#0 0x00007fbfdf7315d7 in __close (fd=584) at ../sysdeps/unix/sysv/linux/close.c:27
#1 0x000055939a52e88e in _mk_event_loop_destroy (ctx=0x7fbfd6800040) at /tmp/src/lib/monkey/mk_core/mk_event_epoll.c:92
#2 0x000055939a52f17c in mk_event_loop_destroy (loop=0x7fbfd6815000) at /tmp/src/lib/monkey/mk_core/mk_event.c:80
#3 0x000055939a525082 in mk_server_worker_loop (server=0x7fbfdcab9c80) at /tmp/src/lib/monkey/mk_server/mk_server.c:510
#4 0x000055939a516e0f in mk_sched_launch_worker_loop (data=0x7fbfd7e00008) at /tmp/src/lib/monkey/mk_server/mk_scheduler.c:412
#5 0x00007fbfdf727fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#6 0x00007fbfded664cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 8 (Thread 1.12):
#0 0x00007fbfded33720 in __GI___nanosleep (requested_time=requested_time@entry=0x7fbfd7df69d0, remaining=remaining@entry=0x7fbfd7df69d0) at ../sysdeps/unix/sysv/linux/nanosleep.c:28
#1 0x00007fbfded3362a in __sleep (seconds=0) at ../sysdeps/posix/sleep.c:55
#2 0x000055939a522f8e in mk_clock_worker_init (data=0x7fbfdcab9c80) at /tmp/src/lib/monkey/mk_server/mk_clock.c:114
#3 0x00007fbfdf727fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#4 0x00007fbfded664cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 7 (Thread 1.11):
#0 0x00007fbfdf729495 in __GI___pthread_timedjoin_ex (threadid=140461928867584, thread_return=0x0, abstime=0x0, block=<optimized out>) at pthread_join_common.c:89
#1 0x000055939a517e10 in mk_sched_workers_join (server=0x7fbfdcab9c80) at /tmp/src/lib/monkey/mk_server/mk_scheduler.c:810
#2 0x000055939a528af6 in mk_exit_all (server=0x7fbfdcab9c80) at /tmp/src/lib/monkey/mk_server/monkey.c:187
#3 0x000055939a509ca2 in mk_lib_worker (data=0x7fbfdacc89a0) at /tmp/src/lib/monkey/mk_server/mk_lib.c:164
#4 0x00007fbfdf727fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#5 0x00007fbfded664cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 6 (Thread 1.10):
#0 futex_abstimed_wait_cancelable (private=0, abstime=0x7fbfd99f3770, expected=0, futex_word=0x7fbfdcac0394) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1 __pthread_cond_wait_common (abstime=0x7fbfd99f3770, mutex=0x7fbfdcac0340, cond=0x7fbfdcac0368) at pthread_cond_wait.c:539
#2 __pthread_cond_timedwait (cond=0x7fbfdcac0368, mutex=0x7fbfdcac0340, abstime=0x7fbfd99f3770) at pthread_cond_wait.c:667
#3 0x00007fbfdf734149 in cnd_timedwait (cond=<optimized out>, mutex=<optimized out>, time_point=<optimized out>) at cnd_timedwait.c:25
#4 0x000055939a45fdcf in cnd_timedwait_abs (cnd=0x7fbfdcac0368, mtx=0x7fbfdcac0340, tspec=0x7fbfd99f3770) at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/tinycthread_extra.c:103
#5 0x000055939a3a40d6 in rd_kafka_q_pop_serve (rkq=0x7fbfdcac0340, timeout_us=999997, version=0, cb_type=RD_KAFKA_Q_CB_RETURN, callback=0x0, opaque=0x0) at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/rdkafka_queue.c:404
#6 0x000055939a3a420c in rd_kafka_q_pop (rkq=0x7fbfdcac0340, timeout_us=999997, version=0) at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/rdkafka_queue.c:428
#7 0x000055939a3065a6 in rd_kafka_broker_ops_serve (rkb=0x7fbfdca02800, timeout_us=999997) at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/rdkafka_broker.c:3295
#8 0x000055939a3067f4 in rd_kafka_broker_ops_io_serve (rkb=0x7fbfdca02800, abs_timeout=8401009954024) at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/rdkafka_broker.c:3346
#9 0x000055939a308ad6 in rd_kafka_broker_producer_serve (rkb=0x7fbfdca02800, abs_timeout=8401009954024) at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/rdkafka_broker.c:3930
#10 0x000055939a3131f6 in rd_kafka_broker_serve (rkb=0x7fbfdca02800, timeout_ms=1000) at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/rdkafka_broker.c:4981
#11 0x000055939a3135d9 in rd_kafka_broker_thread_main (arg=0x7fbfdca02800) at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/rdkafka_broker.c:5035
#12 0x00007fbfdf7280a2 in start_thread (arg=<optimized out>) at pthread_create.c:483
#13 0x00007fbfded664cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 5 (Thread 1.9):
#0 futex_abstimed_wait_cancelable (private=0, abstime=0x7fbfda1f4790, expected=0, futex_word=0x7fbfdca2d8d0) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1 __pthread_cond_wait_common (abstime=0x7fbfda1f4790, mutex=0x7fbfdca2d880, cond=0x7fbfdca2d8a8) at pthread_cond_wait.c:539
#2 __pthread_cond_timedwait (cond=0x7fbfdca2d8a8, mutex=0x7fbfdca2d880, abstime=0x7fbfda1f4790) at pthread_cond_wait.c:667
#3 0x00007fbfdf734149 in cnd_timedwait (cond=<optimized out>, mutex=<optimized out>, time_point=<optimized out>) at cnd_timedwait.c:25
#4 0x000055939a45fdcf in cnd_timedwait_abs (cnd=0x7fbfdca2d8a8, mtx=0x7fbfdca2d880, tspec=0x7fbfda1f4790) at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/tinycthread_extra.c:103
#5 0x000055939a3a40d6 in rd_kafka_q_pop_serve (rkq=0x7fbfdca2d880, timeout_us=999998, version=0, cb_type=RD_KAFKA_Q_CB_RETURN, callback=0x0, opaque=0x0) at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/rdkafka_queue.c:404
#6 0x000055939a3a420c in rd_kafka_q_pop (rkq=0x7fbfdca2d880, timeout_us=999998, version=0) at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/rdkafka_queue.c:428
#7 0x000055939a3065a6 in rd_kafka_broker_ops_serve (rkb=0x7fbfdca01c00, timeout_us=999998) at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/rdkafka_broker.c:3295
#8 0x000055939a3067f4 in rd_kafka_broker_ops_io_serve (rkb=0x7fbfdca01c00, abs_timeout=8401009411228) at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/rdkafka_broker.c:3346
#9 0x000055939a3071ab in rd_kafka_broker_internal_serve (rkb=0x7fbfdca01c00, abs_timeout=8401009411228) at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/rdkafka_broker.c:3517
#10 0x000055939a3131a5 in rd_kafka_broker_serve (rkb=0x7fbfdca01c00, timeout_ms=1000) at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/rdkafka_broker.c:4979
#11 0x000055939a3135d9 in rd_kafka_broker_thread_main (arg=0x7fbfdca01c00) at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/rdkafka_broker.c:5035
#12 0x00007fbfdf7280a2 in start_thread (arg=<optimized out>) at pthread_create.c:483
#13 0x00007fbfded664cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 4 (Thread 1.8):
#0 futex_abstimed_wait_cancelable (private=0, abstime=0x7fbfda9f5850, expected=0, futex_word=0x7fbfdca2db10) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1 __pthread_cond_wait_common (abstime=0x7fbfda9f5850, mutex=0x7fbfdca2dac0, cond=0x7fbfdca2dae8) at pthread_cond_wait.c:539
#2 __pthread_cond_timedwait (cond=0x7fbfdca2dae8, mutex=0x7fbfdca2dac0, abstime=0x7fbfda9f5850) at pthread_cond_wait.c:667
#3 0x00007fbfdf734149 in cnd_timedwait (cond=<optimized out>, mutex=<optimized out>, time_point=<optimized out>) at cnd_timedwait.c:25
#4 0x000055939a45fdcf in cnd_timedwait_abs (cnd=0x7fbfdca2dae8, mtx=0x7fbfdca2dac0, tspec=0x7fbfda9f5850) at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/tinycthread_extra.c:103
#5 0x000055939a3a441d in rd_kafka_q_serve (rkq=0x7fbfdca2dac0, timeout_ms=999, max_cnt=0, cb_type=RD_KAFKA_Q_CB_CALLBACK, callback=0x0, opaque=0x0) at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/rdkafka_queue.c:474
#6 0x000055939a2dd1c8 in rd_kafka_thread_main (arg=0x7fbfdabb9000) at /tmp/src/plugins/out_kafka/librdkafka-1.5.0/src/rdkafka.c:1926
#7 0x00007fbfdf7280a2 in start_thread (arg=<optimized out>) at pthread_create.c:483
#8 0x00007fbfded664cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 3 (Thread 1.7):
#0 0x00007fbfded667ef in epoll_wait (epfd=9, events=0x7fbfdca2b0c0, maxevents=16, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1 0x000055939a52ef5d in _mk_event_wait (loop=0x7fbfdca0a018) at /tmp/src/lib/monkey/mk_core/mk_event_epoll.c:354
#2 0x000055939a52f4d8 in mk_event_wait (loop=0x7fbfdca0a018) at /tmp/src/lib/monkey/mk_core/mk_event.c:163
#3 0x0000559399ebc087 in log_worker_collector (data=0x7fbfdca2b000) at /tmp/src/src/flb_log.c:126
#4 0x0000559399eead6e in step_callback (data=0x7fbfdca31070) at /tmp/src/src/flb_worker.c:51
#5 0x00007fbfdf727fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#6 0x00007fbfded664cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 2 (Thread 1.6):
#0 __libc_write (nbytes=8, buf=0x7fbfc9070c38, fd=569) at ../sysdeps/unix/sysv/linux/write.c:26
#1 __libc_write (fd=569, buf=0x7fbfc9070c38, nbytes=8) at ../sysdeps/unix/sysv/linux/write.c:24
#2 0x0000559399fa414b in flb_output_return (ret=1, th=0x7fbfdca31c40) at /tmp/src/include/fluent-bit/flb_output.h:571
#3 0x0000559399fa4304 in flb_output_return_do (x=1) at /tmp/src/include/fluent-bit/flb_output.h:602
#4 0x0000559399fa6741 in cb_kafka_flush (data=0x7fbfc88c3a00, bytes=68080, tag=0x7fbfd2529440 "def.iap.kube.var.log.containers.placements-api-64bf9bd575-5zp9r_placements-api_placements-api-e28c8f523cd682e55fead0bf48e8c9e737cec1c21997851d9bee7c8ef8f461e5.log", tag_len=163, i_ins=0x7fbfdacc3180, out_context=0x7fbfdcac69c0, config=0x7fbfddc32a40) at /tmp/src/plugins/out_kafka/kafka.c:421
#5 0x0000559399edf10c in output_pre_cb_flush () at /tmp/src/include/fluent-bit/flb_output.h:473
#6 0x000055939a532319 in co_init () at /tmp/src/lib/monkey/deps/flb_libco/amd64.c:117
#7 0x0000000000000000 in ?? ()
Thread 1 (Thread 1.1):
#0 0x00007fbfded33720 in __GI___nanosleep (requested_time=requested_time@entry=0x7ffc0667f610, remaining=remaining@entry=0x7ffc0667f610) at ../sysdeps/unix/sysv/linux/nanosleep.c:28
#1 0x00007fbfded3362a in __sleep (seconds=0) at ../sysdeps/posix/sleep.c:55
#2 0x0000559399ebb77e in flb_loop (ctx=0x7fbfddc0b340) at /tmp/src/src/flb_lib.c:554
#3 0x0000559399e3c231 in flb_main (argc=3, argv=0x7ffc0667f7c8) at /tmp/src/src/fluent-bit.c:1044
#4 0x0000559399e3c2a0 in main (argc=3, argv=0x7ffc0667f7c8) at /tmp/src/src/fluent-bit.c:1055
quit
Detaching from program: /tmp/src/build/bin/fluent-bit, process 1
Ending remote debugging.
[Inferior 1 (process 1) detached]
It seems like FLB tries to exit for some reason and then fails. There's nothing interesting in logs besides that mem buf limit has been reached and that more messages had been queued which were never sent.
Most helpful comment
@edsiper Hi !
any news or updates about the issue ?