Fluent-bit: tail plugin sends messages out of order

Created on 18 Nov 2019 · 12 comments · Source: fluent/fluent-bit

Bug Report

Describe the bug
We use Fluent Bit to forward logs to a centralized location. When writing 100,000 lines, all of them arrive, but their order is scrambled (apparently on the sender side?).

To Reproduce
On sender:

$ for n in $(seq 0 100000); do echo "$n qqq" >> lines; done
$ wc -l lines
100000 lines
$ grep '1234 qqq' lines
1234 qqq
11234 qqq
21234 qqq
31234 qqq
41234 qqq
51234 qqq
61234 qqq
71234 qqq
81234 qqq
91234 qqq

On the receiving machine:

$ wc -l lines
100000 lines
$ grep '1234 qqq' lines
1234 qqq
11234 qqq
71234 qqq
81234 qqq
51234 qqq
61234 qqq
41234 qqq
21234 qqq
31234 qqq
91234 qqq
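A mechanical way to confirm the reordering (a hypothetical check, not part of the original report): `sort -n -c` stops at the first line that violates numeric order and exits non-zero, so it distinguishes the sender's file from the receiver's.

```shell
# Check whether the received file is still in numeric order.
# sort -c validates order without producing output; -n compares
# the leading sequence numbers numerically.
if sort -n -c lines 2>/dev/null; then
    echo "in order"
else
    echo "out of order"
fi
```

Run on the sender's copy this prints "in order"; on the receiver's copy above it prints "out of order".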

Expected behavior
I expected the lines to arrive in the same order as they were written.

Your Environment

  • Version used:
curl -s  127.0.0.1:2020 | jq
{
  "fluent-bit": {
    "version": "1.3.2",
    "edition": "Community",
    "flags": [
      "FLB_HAVE_PARSER",
      "FLB_HAVE_RECORD_ACCESSOR",
      "FLB_HAVE_STREAM_PROCESSOR",
      "FLB_HAVE_TLS",
      "FLB_HAVE_SQLDB",
      "FLB_HAVE_TRACE",
      "FLB_HAVE_METRICS",
      "FLB_HAVE_HTTP_SERVER",
      "FLB_HAVE_SYSTEMD",
      "FLB_HAVE_VALGRIND",
      "FLB_HAVE_FORK",
      "FLB_HAVE_GMTOFF",
      "FLB_HAVE_UNIX_SOCKET",
      "FLB_IS_TD_AGENT",
      "FLB_HAVE_PROXY_GO",
      "FLB_HAVE_SYSTEM_STRPTIME",
      "FLB_HAVE_JEMALLOC",
      "FLB_HAVE_LIBBACKTRACE",
      "FLB_HAVE_REGEX",
      "FLB_HAVE_LUAJIT",
      "FLB_HAVE_C_TLS",
      "FLB_HAVE_ACCEPT4",
      "FLB_HAVE_INOTIFY"
    ]
  }
}
  • Configuration:
  • Environment name and version: Docker 19.03.2
  • Server type and version:
  • Operating System and version: ubuntu 18.04
  • Filters and plugins:
[SERVICE]
    Flush        5
    Daemon       Off
    Log_Level    info
    Parsers_File parsers.conf
    Config_Watch On
    HTTP_Server  On
    HTTP_Listen  0.0.0.0
    HTTP_Port    2020
    storage.backlog.mem_limit 50M

[INPUT]
    Name              tail-traces
    Tag               traces
    Multiline         On
    Skip_Long_Lines   On
    Parser_Firstline  traces
    Path_Key          filename
    Path              /path/to/all/logs/*/*
    DB                /var/log/.logs.db
    Mem_Buf_Limit     100MB
    Buffer_Chunk_Size 128k
    Buffer_Max_Size   16384k
    Refresh_Interval  5
    Exclude_Path      *.gz,*.zip,*.backup

[FILTER]
    Name record_modifier
    Match *
    Record var1 ${VAR1}
    Record var2 ${VAR2}

[OUTPUT]
    Name          forward
    Match         *
    Host          fluentd
    Port          60399
    Retry_Limit   5

Log:

[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=70 lines=10
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=738 lines=92
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=9362 lines=1026
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=6050 lines=605
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=10270 lines=1027
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=10020 lines=1002
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=10170 lines=1017
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=20320 lines=2032
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=10670 lines=1067
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=10400 lines=1040
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=10070 lines=1007
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11200 lines=1025
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11528 lines=1048
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11473 lines=1043
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11693 lines=1063
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11297 lines=1027
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11165 lines=1015
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11550 lines=1050
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11517 lines=1047
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11451 lines=1041
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11286 lines=1026
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11275 lines=1025
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11231 lines=1021
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11231 lines=1021
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=12012 lines=1092
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=12111 lines=1101
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11935 lines=1085
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=10318 lines=938
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11187 lines=1017
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=23045 lines=2095
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=25531 lines=2321
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=14234 lines=1294
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11781 lines=1071
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11539 lines=1049
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=21340 lines=1940
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11990 lines=1090
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11660 lines=1060
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11473 lines=1043
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11550 lines=1050
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11836 lines=1076
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11561 lines=1051
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11792 lines=1072
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11374 lines=1034
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11572 lines=1052
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11396 lines=1036
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11561 lines=1051
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11572 lines=1052
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11319 lines=1029
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11429 lines=1039
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11638 lines=1058
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11099 lines=1009
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=22682 lines=2062
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=12210 lines=1110
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11748 lines=1068
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11693 lines=1063
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11671 lines=1061
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11715 lines=1065
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11671 lines=1061
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11407 lines=1037
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11275 lines=1025
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11286 lines=1026
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11341 lines=1031
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11693 lines=1063
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11476 lines=1043
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11723 lines=1066
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11517 lines=1047
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11517 lines=1047
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=12089 lines=1099
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11946 lines=1086
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11682 lines=1062
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11352 lines=1032
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11583 lines=1053
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11737 lines=1067
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=22704 lines=2064
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=12793 lines=1163
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11605 lines=1055
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11396 lines=1036
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11638 lines=1058
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11836 lines=1076
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11825 lines=1075
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11693 lines=1063
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11682 lines=1062
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11572 lines=1052
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11880 lines=1080
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11715 lines=1065
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11715 lines=1065
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=12001 lines=1091
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11528 lines=1048
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11858 lines=1078
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11913 lines=1083
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=11550 lines=1050
[2019/11/18 12:47:22] [debug] [in_tail] file=lines event
[2019/11/18 12:47:22] [debug] [in_tail] file=lines read=3092 lines=281

Additional context
People reading log files expect the lines to arrive in the order they were written.


All 12 comments

Can you check if this problem is specific to the forward output plugin?

How would I go about doing that?

Checked with stdout plugin:

[OUTPUT]
    Name          stdout
    Match         *

It seems to work:

$ /opt/td-agent-bit/bin/td-agent-bit -c /etc/fluentbit/fluentbit.conf | grep '1234 qqq'
[1253] traces: [1574231224.589101536, {"filename"=>"/path/to/the/lines", "log"=>"1234 qqq", "var1"=>"1", "var2"=>"2"}]
[11274] traces: [1574231224.853998168, {"filename"=>"/path/to/the/lines", "log"=>"11234 qqq", "var1"=>"1", "var2"=>"2"}]
[4598] traces: [1574231224.943185502, {"filename"=>"/path/to/the/lines", "log"=>"21234 qqq", "var1"=>"1", "var2"=>"2"}]
[14598] traces: [1574231225.053945177, {"filename"=>"/path/to/the/lines", "log"=>"31234 qqq", "var1"=>"1", "var2"=>"2"}]
[7949] traces: [1574231225.214258257, {"filename"=>"/path/to/the/lines", "log"=>"41234 qqq", "var1"=>"1", "var2"=>"2"}]
[1306] traces: [1574231225.348539263, {"filename"=>"/path/to/the/lines", "log"=>"51234 qqq", "var1"=>"1", "var2"=>"2"}]
[11306] traces: [1574231225.445014308, {"filename"=>"/path/to/the/lines", "log"=>"61234 qqq", "var1"=>"1", "var2"=>"2"}]
[4665] traces: [1574231225.635418289, {"filename"=>"/path/to/the/lines", "log"=>"71234 qqq", "var1"=>"1", "var2"=>"2"}]
[14665] traces: [1574231225.711601283, {"filename"=>"/path/to/the/lines", "log"=>"81234 qqq", "var1"=>"1", "var2"=>"2"}]
[9150] traces: [1574231226.829944456, {"filename"=>"/path/to/the/lines", "log"=>"91234 qqq", "var1"=>"1", "var2"=>"2"}]

I'm not very familiar with fluentd, but my broad impression from other tools such as SumoLogic is that timestamps, not order of creation or delivery, are the organising principle for data fragments at scale. SumoLogic, for example, typically supports using either "source" timestamps or "arrival" timestamps. With buffering, "source" timestamps may be more accurate, but data from multiple sources then relies on those various clocks being synchronised, using the same timezone, and so on.

@eplodn may I ask what your fluentd config looks like?

@eplodn also, what do you see if you use a 1MB Mem_Buf_Limit? In my measurements, if Fluent Bit sends more data than fluentd can write out, issues like the one you reported appear.

Order is not guaranteed when multiple chunks of data are generated (each chunk holds N records): if an output has to retry one chunk at flush/delivery time, that chunk will arrive later than the ones around it. Forcing an order would slow things down and add backpressure.
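A toy sketch of that retry effect (illustrative only, not Fluent Bit code): three chunks are flushed in order, the second fails once and is re-queued, so it lands at the destination after the third.

```shell
#!/bin/sh
# Simulate chunk delivery where chunk 2 needs one retry.
out=/tmp/received.$$
: > "$out"

deliver() { echo "chunk $1" >> "$out"; }

deliver 1
# chunk 2 fails on its first attempt and is re-queued for retry
deliver 3
deliver 2   # the retry succeeds after chunk 3 already went out

cat "$out"  # chunks arrive as 1, 3, 2 -- not the original order
rm -f "$out"
```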

@edsiper is this similar to flush threads in fluentd? Can this behaviour be controlled?

With plain fluentd you can use the buffer chunk key configuration to buffer logs by timestamp (and/or by source) before sending, so you can get some level of ordering depending on how long you wait for logs: https://docs.fluentd.org/configuration/buffer-section#time
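For reference, a minimal fluentd buffer-section sketch using time as the chunk key (the output type, path, and values are illustrative; see the linked docs for details):

```
<match **>
  @type file
  path /var/log/ordered
  <buffer time>
    timekey      60   # group records into 1-minute chunks
    timekey_wait 10s  # extra wait so late records land in their chunk
  </buffer>
</match>
```

Larger `timekey_wait` values tolerate later-arriving records at the cost of delivery latency.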

I'm currently investigating how to deal with this also (as seen in my most recent commit in https://github.com/grafana/loki/issues/898#issuecomment-567722481)

@edsiper there are situations where the order is more important than throughput. Is there a flag to control this (disable concurrent chunk processing) for any given output?

There is no such control at the moment, but there is a plan to implement it (Loki needs it).

@edsiper is this now implemented with the latest version?
