We're doing a load test at 10K logs per second from syslog and found at least two possible design problems with the filesystem buffer, which (I hope) is supposed to relieve backpressure:
- As soon as the input memory limit is hit, each new chunk is immediately put down after a single log record is appended, effectively creating one file per record. This is easily solvable by putting down every locked (full) chunk instead, which bounds the worst-case behavior.
- The engine seems to create a new output task as soon as an input chunk arrives, making the memory limit on inputs rather useless when the filesystem buffer is used and inputs are therefore never paused. It keeps opening new output connections, and memory keeps growing far beyond the limit (due to busy chunks, i.e. those in the process of being sent to the output).
PS: Due to the nature of syslog forwarding, pausing the input doesn't work well and always results in massive losses in our tests.
Can you please share your Fluent Bit configuration and the version you are using?
Version: 1.3.9

[SERVICE]
    Flush                     5
    Daemon                    False
    Log_Level                 debug
    Parsers_File              parsers.conf
    Storage.Path              /data/
    Storage.Max_Chunks_Up     1024
    Storage.Backlog.Mem_Limit 32M

[INPUT]
    Name              syslog
    Alias             syslog_debug
    Parser            syslog-rfc5424
    Listen            127.0.0.1
    Port              5140
    Mode              tcp
    Mem_Buf_Limit     64M
    Buffer_Chunk_Size 16M
    Buffer_Max_Size   16M
    Storage.Type      filesystem

[OUTPUT]
    Name              forward
    Alias             forward
    Match             *
    Host              somewhere
    Port              1234
    tls               on
    tls.verify        off
    KeepAlive         on
    KeepAlive_Timeout 60
    Retry_Limit       off
The hardcoded input chunk limit of 2048000 bytes has been raised to 6MB. The first problem can be noticed by listing the storage/queue directory: at first we get large files slightly over the chunk limit, then it's all single-record files of around 1-2KB (we're feeding about 10GB / several million logs).
The second problem was detected via netstat and via my failed patch that attempts to unload (down) previous chunks rather than the current/last one when the limit is hit. It showed that most of the time all but 1-2 chunks are busy.
At the bottom of flb_input_chunk_append_raw() in src/flb_input_chunk.c:
if (flb_input_chunk_is_overlimit(in) == FLB_TRUE &&
    si->type == CIO_STORE_FS) {
    if (cio_chunk_is_locked(ic->chunk)) {
        if (cio_chunk_is_up(ic->chunk) == CIO_TRUE) {
            flb_warn("[input chunk] unload current");
            cio_chunk_down(ic->chunk);
        }
    }
    else {
        struct mk_list *head;
        struct mk_list *tmp;
        struct flb_input_chunk *ic_prev;
        int i = 0;

        flb_warn("[input chunk] try to unload previous chunk...");

        /* Walk the chunk list in reverse and put down every chunk
         * that is up and not busy, until we are back under the
         * input memory limit. */
        mk_list_foreach_safe_r(head, tmp, &ic->_head) {
            i -= 1;
            ic_prev = mk_list_entry(head, struct flb_input_chunk, _head);
            flb_warn("[input chunk] walking %i: busy=%i, up=%i",
                     i, ic_prev->busy, cio_chunk_is_up(ic_prev->chunk));
            if (ic_prev->busy == FLB_TRUE) {
                break;
            }
            if (cio_chunk_is_up(ic_prev->chunk) == CIO_TRUE) {
                /* Account for the released memory before putting the
                 * chunk down; note %zd for the ssize_t value. */
                ssize_t prev_chunk_size = cio_chunk_get_content_size(ic_prev->chunk);
                in->mem_chunks_size -= prev_chunk_size;
                flb_warn("[input chunk] unload %i, size=%zd", i, prev_chunk_size);
                cio_chunk_down(ic_prev->chunk);
                if (flb_input_chunk_is_overlimit(in) == FLB_FALSE) {
                    flb_warn("[input chunk] finished unloading previous chunks");
                    return 0;
                }
            }
        }
        flb_warn("[input chunk] unloading previous chunks failed, "
                 "consider increasing the memory buffer");
    }
    return 0;
}
// flb_input_chunk_protect ... for memory buffer type input
PS: the patch caused corrupted saves. No idea why, but unloading all locked chunks seems to work well enough.
If fluent-bit is restarted after all inputs have finished, in_storage_backlog and the buffer limit do function normally.
I'll try to limit busy chunks in flb_engine_dispatch; maybe that could help with the second problem.
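To illustrate the idea, here is a minimal self-contained sketch of that kind of busy-chunk throttling. It is not fluent-bit code: struct chunk, MAX_BUSY_CHUNKS and dispatch() are hypothetical stand-ins for the engine's chunk list, a configurable cap, and flb_engine_dispatch respectively.

#include <stdio.h>

#define MAX_BUSY_CHUNKS 8   /* hypothetical cap on in-flight chunks */

struct chunk {
    int busy;               /* 1 while an output task owns the chunk */
};

/* Count chunks currently owned by output tasks. */
static int count_busy(const struct chunk *chunks, int n)
{
    int i;
    int busy = 0;

    for (i = 0; i < n; i++) {
        busy += chunks[i].busy;
    }
    return busy;
}

/* Dispatch sketch: promote idle chunks to output tasks only while the
 * in-flight count is below the cap; everything else stays parked in
 * the (filesystem) buffer until a later flush cycle. */
static void dispatch(struct chunk *chunks, int n)
{
    int i;

    for (i = 0; i < n; i++) {
        if (count_busy(chunks, n) >= MAX_BUSY_CHUNKS) {
            printf("throttled: %d chunks already in flight\n",
                   MAX_BUSY_CHUNKS);
            return;
        }
        if (!chunks[i].busy) {
            chunks[i].busy = 1;   /* would create an output task here */
        }
    }
}

int main(void)
{
    struct chunk chunks[20] = {{0}};

    dispatch(chunks, 20);   /* prints the throttle message once */
    return 0;
}

The design point is simply that chunks beyond the cap never become tasks, so memory use and connection counts stop tracking the input rate.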
I am troubleshooting. With a null output and a flush time of 5 seconds, things look bad:

$ bin/flb-tcp-writer -c 1 -d ../data/syslog/syslog.log -r 100000 -p `pidof fluent-bit` -o 127.0.0.1:5150
records write (b) write secs | % cpu user (ms) sys (ms) Mem (bytes) Mem
-------- ---------- -------- ----- + ------ --------- -------- ----------- -------
100000 18524763 17.67M 1.20 | 19.12 150 80 68616192 65.44M
100000 18524763 17.67M 1.20 | 20.00 130 110 70975488 67.69M
100000 18524763 17.67M 1.25 | 17.63 160 60 68857856 65.67M
100000 18524763 17.67M 1.25 | 18.43 120 110 70635520 67.36M
100000 18524763 17.67M 1.19 | 17.62 130 80 68538368 65.36M
100000 18524763 17.67M 1.18 | 19.43 130 100 70590464 67.32M
100000 18524763 17.67M 1.20 | 15.00 120 60 68493312 65.32M
100000 18524763 17.67M 1.19 | 20.12 150 90 70778880 67.50M
100000 18524763 17.67M 1.21 | 19.05 150 80 68681728 65.50M
100000 18524763 17.67M 1.26 | 18.31 110 120 70909952 67.62M
0 0 0 b 1.00 | 20.00 120 80 69259264 66.05M
0 0 0 b 1.00 | 20.00 120 80 71237632 67.94M
0 0 0 b 1.00 | 22.00 140 80 69664768 66.44M
0 0 0 b 1.00 | 20.00 150 50 68091904 64.94M
0 0 0 b 1.00 | 21.00 150 60 70787072 67.51M
0 0 0 b 1.00 | 17.00 100 70 69214208 66.01M
0 0 0 b 1.00 | 21.00 120 90 67567616 64.44M
0 0 0 b 1.00 | 21.00 140 70 70262784 67.01M
0 0 0 b 1.00 | 21.00 130 80 68616192 65.44M
0 0 0 b 1.00 | 21.00 150 60 71311360 68.01M
0 0 0 b 1.00 | 20.00 120 80 69640192 66.41M
0 0 0 b 1.00 | 20.00 110 90 68038656 64.89M
$ bin/flb-tcp-writer -c 1 -d ../data/syslog/syslog.log -r 100000 -p `pidof fluent-bit` -o 127.0.0.1:5150
records write (b) write secs | % cpu user (ms) sys (ms) Mem (bytes) Mem
-------- ---------- -------- ----- + ------ --------- -------- ----------- -------
100000 18524763 17.67M 1.20 | 32.62 370 20 10452992 9.97M
100000 18524763 17.67M 1.20 | 34.17 390 20 10293248 9.82M
100000 18524763 17.67M 1.24 | 33.07 390 20 10518528 10.03M
100000 18524763 17.67M 1.21 | 34.68 390 30 10518528 10.03M
100000 18524763 17.67M 1.21 | 33.80 390 20 10518528 10.03M
100000 18524763 17.67M 1.22 | 31.86 370 20 10584064 10.09M
100000 18524763 17.67M 1.21 | 34.65 370 50 10727424 10.23M
100000 18524763 17.67M 1.22 | 33.72 390 20 10727424 10.23M
100000 18524763 17.67M 1.18 | 35.59 400 20 10727424 10.23M
100000 18524763 17.67M 1.22 | 33.61 390 20 10727424 10.23M
0 0 0 b 1.00 | 0.00 0 0 10727424 10.23M
0 0 0 b 1.00 | 0.00 0 0 10727424 10.23M
0 0 0 b 1.00 | 0.00 0 0 10727424 10.23M
- Summary
- Process : fluent-bit
- PID : 16076
- Elapsed Time: 12.11 seconds
- Avg Memory : 10.12M
- Avg CPU : 33.78%
- Avg Rate : 11.69M/sec
I will continue troubleshooting contention with a higher flush value; then I will troubleshoot behavior with retries and queued tasks.
I have made good progress on this. This time I did a couple of improvements for a more intensive scenario:
Report after code improvements
$ bin/flb-tcp-writer -c 1 -d ../data/syslog/syslog.log -r 200000 -p `pidof fluent-bit` -o 127.0.0.1:5150
records write (b) write secs | % cpu user (ms) sys (ms) Mem (bytes) Mem
-------- ---------- -------- ----- + ------ --------- -------- ----------- -------
200000 37041308 35.33M 1.60 | 46.93 710 40 55701504 53.12M
200000 37041308 35.33M 3.08 | 100.01 2950 130 91095040 86.88M
200000 37041308 35.33M 9.79 | 99.06 9300 400 135757824 129.47M
200000 37041308 35.33M 11.16 | 99.32 10640 440 164794368 157.16M
200000 37041308 35.33M 11.03 | 99.84 10620 390 197013504 187.89M
200000 37041308 35.33M 13.57 | 99.40 12990 500 231346176 220.63M
200000 37041308 35.33M 15.04 | 97.67 14020 670 261799936 249.67M
200000 37041308 35.33M 14.23 | 99.91 13730 490 293003264 279.43M
200000 37041308 35.33M 14.55 | 93.91 13050 610 334270464 318.79M
200000 37041308 35.33M 13.88 | 99.19 13270 500 368586752 351.51M
0 0 0 b 1.00 | 99.99 940 60 370651136 353.48M
0 0 0 b 1.00 | 69.77 640 60 372703232 355.44M
0 0 0 b 1.00 | 25.00 250 0 375934976 358.52M
0 0 0 b 1.00 | 58.99 580 10 374755328 357.39M
0 0 0 b 1.00 | 45.99 380 80 18169856 17.33M
0 0 0 b 1.00 | 0.00 0 0 18169856 17.33M
0 0 0 b 1.00 | 0.00 0 0 18169856 17.33M
0 0 0 b 1.00 | 0.00 0 0 18169856 17.33M
- Summary
- Process : fluent-bit
- PID : 7935
- Elapsed Time: 112.93 seconds
- Avg Memory : 196.04M
- Avg CPU : 82.33%
- Avg Rate : 3.05M/sec
To make it easier to troubleshoot internals, I've implemented a simple dump that can be triggered with the SIGCONT signal (similar to Fluentd). This is the output while running:
===== Input =====
syslog_debug (syslog)
|- status
|- overlimit : yes
|- mem size : 286.6M (300553227 bytes)
|- mem limit : 61.0M (64000000 bytes)
|- tasks
|- total tasks : 143
|- status
|- new : 0
|- running : 143
|- coroutines : 143
|- chunks
|- total chunks : 150
|- up chunks : 150
|- down chunks: 0
|- busy chunks: 143
|- size : 273.0M (286235640 bytes)
|- size err: 0
===== Storage Layer =====
total chunks : 150
|- mem chunks : 0
|- fs chunks : 150
|- up : 150
|- down : 0
The code improvement is not yet on GitHub; I will make sure to push the changes to a special branch so you can play with them.
Another definite improvement is to manage the scenario where an incoming data chunk is bigger than our chunk capacity; I will work on that tomorrow.
Note: before this improvement we would usually see more than 100k small files; that is now fixed.
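For the oversized-chunk scenario mentioned above, one plausible approach (a sketch only, with hypothetical names CHUNK_CAPACITY, append_chunk and append_large) is to slice the incoming payload so each piece fits in one chunk; a real implementation would also have to split on record boundaries rather than at arbitrary byte offsets.

#include <stdio.h>

#define CHUNK_CAPACITY 6291456   /* hypothetical 6MB chunk limit */

/* Stand-in for the real chunk append; just reports the slice size. */
static void append_chunk(const char *data, size_t len)
{
    (void) data;
    printf("appended slice of %zu bytes\n", len);
}

/* Append an arbitrarily large payload by slicing it into pieces that
 * each fit within one chunk. A real implementation would split on
 * record boundaries instead of raw byte offsets. */
static void append_large(const char *data, size_t len)
{
    size_t off = 0;

    while (off < len) {
        size_t slice = len - off;
        if (slice > CHUNK_CAPACITY) {
            slice = CHUNK_CAPACITY;
        }
        append_chunk(data + off, slice);
        off += slice;
    }
}

int main(void)
{
    /* ~12.6MB payload: expect two full slices plus a remainder */
    static char payload[2 * CHUNK_CAPACITY + 1024];

    append_large(payload, sizeof(payload));
    return 0;
}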
Here's my latest patch for throttling concurrent output tasks and unloading locked chunks immediately, trying to keep the number of up chunks low: https://gist.github.com/AqD/6fce5fc0c26c81d014878504e609f3d0
With a 1 second flush interval and the null output, it still performs oddly as soon as the memory limit is hit with 10k logs/s of syslog input. Could the processing of incoming input (we always hit 100% CPU at 10k logs/s) be interfering with output tasks? Debug logs showed that even with only the null output, some tasks take seconds to be destroyed.
We're moving to a new setup with a dedicated fluent-bit instance as the input/log processor and a dedicated output forwarder that sends those filesystem chunks.
@AqD
I've pushed several improvements based on your feedback. I did high-stress tests and the results look good here, but I would like to get your feedback.
Details about the fix and optimizations here: 9420ed9fe8cbd5413313e6bb59459fdb516bc61f
There are other patches involved. I've also included a new 'dump' feature on Git master that can be used to dump internal stats to stdout for debugging/dev purposes, e.g.:
$ kill -CONT `pidof fluent-bit`
output:
[engine] caught signal (SIGCONT)
[2020/03/18 18:21:49] Fluent Bit Dump
===== Input =====
syslog_debug (syslog)
│
├─ status
│  ├─ overlimit : no
│  ├─ mem size  : 0b (0 bytes)
│  └─ mem limit : 61.0M (64000000 bytes)
│
├─ tasks
│  ├─ total tasks : 0
│  ├─ new         : 0
│  ├─ running     : 0
│  └─ size        : 0b (0 bytes)
│
└─ chunks
   ├─ total chunks : 0
   ├─ up chunks    : 0
   ├─ down chunks  : 0
   ├─ busy chunks  : 0
   ├─ size         : 0b (0 bytes)
   └─ size err     : 0
storage_backlog.1 (storage_backlog)
│
├─ status
│  ├─ overlimit : no
│  ├─ mem size  : 0b (0 bytes)
│  └─ mem limit : 0b (0 bytes)
│
├─ tasks
│  ├─ total tasks : 17
│  ├─ new         : 16
│  ├─ running     : 1
│  └─ size        : 32.2M (33714420 bytes)
│
└─ chunks
   ├─ total chunks : 17
   ├─ up chunks    : 17
   ├─ down chunks  : 0
   ├─ busy chunks  : 17
   ├─ size         : 32.2M (33714420 bytes)
   └─ size err     : 0
===== Storage Layer =====
total chunks : 26
├─ mem chunks : 0
├─ fs chunks  : 26
└─ up         : 17
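As an aside, the usual safe shape for this kind of signal-triggered dump is a handler that only sets a flag, with the actual printing done from the main loop, since stdio is not async-signal-safe. A minimal sketch of that pattern, not the actual fluent-bit implementation:

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t dump_requested = 0;

/* Only set a flag here: stdio is not async-signal-safe, so the real
 * printing must happen outside the handler. */
static void on_sigcont(int sig)
{
    (void) sig;
    dump_requested = 1;
}

int main(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigcont;
    sigaction(SIGCONT, &sa, NULL);

    for (;;) {
        if (dump_requested) {
            dump_requested = 0;
            /* this is where the internal stats tree would be walked */
            printf("===== dump =====\n");
        }
        sleep(1);   /* stand-in for the engine's event loop */
    }
    return 0;
}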
Please test Git master and let me know if any issue persists.
Did you try the latest v1.3.11, which comes with the improvements?
ping @AqD
@edsiper Thanks! The first problem disappeared with 1.3.11; the second problem remains, and it quickly runs out of file descriptors at around 20K logs/s from syslog.
However, 1.3.11 is good enough for our forwarder to work with (sending fluent-bit chunk files via a separate service), so I'll stop here.
What is running out of file descriptors? Syslog in TCP mode?
Yes. I guess it's due to too many output tasks, because memory kept increasing and there were more and more output connections (up to hundreds, until fluent-bit segfaults).
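A simple way to watch that descriptor growth from outside is to count the entries in /proc/<pid>/fd. A small Linux-only sketch (the program name fdcount is hypothetical):

#include <dirent.h>
#include <stdio.h>

/* Count the open file descriptors of a process by listing
 * /proc/<pid>/fd (Linux only). Usage: ./fdcount <pid> */
int main(int argc, char **argv)
{
    char path[64];
    DIR *dir;
    struct dirent *de;
    int count = 0;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }
    snprintf(path, sizeof(path), "/proc/%s/fd", argv[1]);
    dir = opendir(path);
    if (dir == NULL) {
        perror("opendir");
        return 1;
    }
    while ((de = readdir(dir)) != NULL) {
        if (de->d_name[0] != '.') {
            count++;
        }
    }
    closedir(dir);
    printf("pid %s: %d open fds\n", argv[1], count);
    return 0;
}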