Fluent-bit: Unable to perform "GROUP BY" on "multiple fields" correctly in stream processing

Created on 7 Dec 2019 · 4 comments · Source: fluent/fluent-bit

Bug Report

Describe the bug
Fluent Bit version 1.3.3 (binary installed via yum) does not perform "GROUP BY" on "multiple fields" correctly in stream processing. See the incorrect aggregation result and the steps to reproduce below.

To Reproduce

  • Steps to reproduce the problem:

Step 1. Create a sample log like dns_bind.log below

2019-12-05 03:12:31.820221 client 192.168.11.12#55206 (google.com): query: google.com IN A +E(0)D (8.8.8.8)
2019-12-05 03:12:31.820221 client 192.168.11.12#55206 (google.com): query: google.com IN A +E(0)D (8.8.8.8)
2019-12-05 03:12:31.820221 client 192.168.11.12#55206 (facebook.com): query: facebook.com IN A +E(0)D (8.8.8.8)
2019-12-05 03:12:31.820221 client 192.168.11.12#55206 (facebook.com): query: facebook.com IN A +E(0)D (8.8.8.8)
2019-12-05 03:12:31.820221 client 192.168.11.12#55206 (google.com): query: google.com IN AAAA +E(0)D (8.8.8.8)
2019-12-05 03:12:31.820221 client 192.168.11.12#55206 (google.com): query: google.com IN AAAA +E(0)D (8.8.8.8)
2019-12-05 03:12:31.820221 client 192.168.11.12#55206 (facebook.com): query: facebook.com IN CNAME +E(0)D (8.8.8.8)
2019-12-05 03:12:31.820221 client 192.168.11.12#55206 (facebook.com): query: facebook.com IN CNAME +E(0)D (8.8.8.8)

Step 2. Set up a regex parser for the log above and save it as parser-bind.conf

[PARSER]
 Name bind
 Format regex
 Regex ^(?<time>[^ ]*\ [^ ]*) (?<client>[^ ]*) (?<client_ip>[^ ]*)#(?<client_port>[^ ]*) \((?<target_queryname>[^ ]*)\): (?<query>[^ ]*): (?<query_domain_name>[^ ]*) (?<class>[^ ]*) (?<query_type>[^ ]*) (?<recursion_desired_flag>[^ ]*) \((?<dns_server>[^ ]*)\)$

When a log line is parsed by the parser above, it is structured as

{"time"=>"2019-12-05 03:12:31.820221", "client"=>"client", "client_ip"=>"192.168.11.12", "client_port"=>"55206", "target_queryname"=>"facebook.com", "query"=>"query", "query_domain_name"=>"facebook.com", "class"=>"IN", "query_type"=>"A", "recursion_desired_flag"=>"+E(0)D", "dns_server"=>"8.8.8.8"}
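As a sanity check on the parser, the same pattern can be exercised with Python's `re` module. This is a minimal sketch, not part of the Fluent Bit setup; note that Python requires `(?P<name>...)` for named groups, so the syntax is adapted from the Onigmo pattern above:

```python
import re

# The parser's regex, rewritten with Python-style (?P<name>...) groups.
PATTERN = re.compile(
    r"^(?P<time>[^ ]* [^ ]*) (?P<client>[^ ]*) (?P<client_ip>[^ ]*)"
    r"#(?P<client_port>[^ ]*) \((?P<target_queryname>[^ ]*)\): "
    r"(?P<query>[^ ]*): (?P<query_domain_name>[^ ]*) (?P<class>[^ ]*) "
    r"(?P<query_type>[^ ]*) (?P<recursion_desired_flag>[^ ]*) "
    r"\((?P<dns_server>[^ ]*)\)$"
)

# One of the sample log lines from dns_bind.log above.
line = ("2019-12-05 03:12:31.820221 client 192.168.11.12#55206 "
        "(facebook.com): query: facebook.com IN A +E(0)D (8.8.8.8)")
record = PATTERN.match(line).groupdict()
print(record["query_domain_name"], record["query_type"])  # facebook.com A
```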

Step 3. Set up a stream processor configuration file stream-process-bind.conf as below

[STREAM_TASK]
 Name bind_sp_1
 Exec CREATE STREAM bind_sp_1 AS SELECT query_domain_name, query_type, COUNT(*) AS hits FROM STREAM:bind_raw_log WINDOW TUMBLING (60 SECOND) GROUP BY query_domain_name, query_type;

Step 4. Set up the main configuration file flb_main.conf as below

[SERVICE]
 Parsers_File parser-bind.conf
 Streams_File stream-process-bind.conf
 Log_Level info

[INPUT]
 Name tail
 Alias bind_raw_log
 Path dns_bind.log
 Parser bind

[OUTPUT]
 Name stdout
 Match bind_sp_1

Step 5. Run Fluent Bit with the configuration files above; it then emits the aggregation result below

{"query_domain_name"=>"google.com", "query_type"=>"A", "hits"=>4}
{"query_domain_name"=>"facebook.com", "query_type"=>"A", "hits"=>4}

However, this is not the correct result; the expected result is shown below.

Expected behavior
The expected result from the stream processor above should be:

{"query_domain_name"=>"google.com", "query_type"=>"A", "hits"=>2}
{"query_domain_name"=>"google.com", "query_type"=>"AAAA", "hits"=>2}
{"query_domain_name"=>"facebook.com", "query_type"=>"A", "hits"=>2}
{"query_domain_name"=>"facebook.com", "query_type"=>"CNAME", "hits"=>2}
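For reference, the expected aggregation can be reproduced with a minimal Python sketch over the eight sample records, grouping on the composite key (query_domain_name, query_type):

```python
from collections import Counter

# The eight sample log lines reduced to the two GROUP BY fields.
records = [
    ("google.com", "A"), ("google.com", "A"),
    ("facebook.com", "A"), ("facebook.com", "A"),
    ("google.com", "AAAA"), ("google.com", "AAAA"),
    ("facebook.com", "CNAME"), ("facebook.com", "CNAME"),
]

# A correct GROUP BY keeps one counter per distinct composite key,
# yielding four groups with two hits each.
hits = Counter(records)
for (domain, qtype), count in hits.items():
    print(domain, qtype, count)
```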

Your Environment

  • Version used: Fluent Bit v1.3.3
  • Configuration: Please see the configuration files above
  • Environment name and version (e.g. Kubernetes? What version?): NA
  • Server type and version: NA
  • Operating System and version: CentOS Linux release 7.2.1511 (Core)
  • Filters and plugins: NA

Additional Context
Hi Fluent Bit team,
Just want to say thank you, folks.
Fluent Bit is a really nice tool for log/metric collection and forwarding.
By the way, please feel free to let me know if there is something wrong with my configuration or usage above.

work-in-process


All 4 comments

Thanks for reaching out. We will take a look at this shortly.

cc: @koleini

Hi there @edsiper @koleini, I seem to be encountering the same bug, or something very close:

With this query:

Exec CREATE STREAM test WITH (tag='metrics') AS SELECT bucket, user, request_type, return_code FROM STREAM:syslog.0;

In the output we can see 3 distinct record types:

  • HEAD and 204
  • HEAD and 404
  • DELETE and 204
[63] metrics: [1591200463.381824000, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>"204"}]
[64] metrics: [1591200463.388523000, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"DELETE", "return_code"=>"204"}]
[86] metrics: [1591200463.979172000, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>"204"}]
[89] metrics: [1591200463.996844000, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>"404"}]
[90] metrics: [1591200464.004828000, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>"204"}]
[93] metrics: [1591200464.011969000, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"DELETE", "return_code"=>"204"}]
[13] metrics: [1591200464.590625000, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>"204"}]
[43] metrics: [1591200465.263458000, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"DELETE", "return_code"=>"204"}]
[57] metrics: [1591200465.844776000, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>"204"}]
[80] metrics: [1591200466.468888000, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>"204"}]
[88] metrics: [1591200466.501345999, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"DELETE", "return_code"=>"204"}]
[7] metrics: [1591200467.084268000, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>"204"}]
[12] metrics: [1591200467.101225000, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>"404"}]
[14] metrics: [1591200467.109168000, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>"204"}]
[18] metrics: [1591200467.116539000, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"DELETE", "return_code"=>"204"}]
[15] metrics: [1591200467.701247000, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>"204"}]
[18] metrics: [1591200467.718519000, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>"404"}]
[20] metrics: [1591200467.729933000, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>"204"}]
[22] metrics: [1591200467.737960000, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"DELETE", "return_code"=>"204"}]
[51] metrics: [1591200468.325867000, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>"204"}]
[58] metrics: [1591200468.356263000, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"DELETE", "return_code"=>"204"}]
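Grouping these records by hand shows what a correct GROUP BY should produce per window: one row per distinct (bucket, user, request_type, return_code) combination. A minimal sketch with a representative subset of the records above:

```python
from collections import Counter

# Representative subset of the records above, keyed by the four
# fields used in the GROUP BY clause.
records = [
    ("stanislas-test", "xxx", "HEAD", "204"),
    ("stanislas-test", "xxx", "DELETE", "204"),
    ("stanislas-test", "xxx", "HEAD", "204"),
    ("stanislas-test", "xxx", "HEAD", "404"),
]

groups = Counter(records)
# Three distinct keys -> three output rows, not one merged HEAD/204 row.
print(len(groups))  # 3
```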

However, with a GROUP BY on a 10-second window:

Exec CREATE STREAM test WITH (tag='metrics') AS SELECT bucket, user, request_type, return_code, COUNT(*) FROM STREAM:syslog.0 WINDOW TUMBLING (10 SECOND) GROUP BY bucket, user, request_type, return_code;

The 3 record types are merged into a single group per window; I am not sure why it ends up as HEAD/204 most of the time:

[1] metrics: [1591200024.539781325, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>22}]
[1] metrics: [1591200034.539642286, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>30}]
[1] metrics: [1591200044.539768685, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>29}]
[1] metrics: [1591200054.540146255, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>30}]
[1] metrics: [1591200064.539900097, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>28}]
[1] metrics: [1591200074.539819689, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>38}]
[1] metrics: [1591200084.540167859, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>25}]
[1] metrics: [1591200094.540202674, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>34}]
[1] metrics: [1591200104.540878807, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"DELETE", "return_code"=>204, "COUNT(*)"=>35}]
[1] metrics: [1591200114.539894250, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>27}]
[1] metrics: [1591200124.539826410, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>404, "COUNT(*)"=>25}]
[1] metrics: [1591200134.539776696, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>26}]
[1] metrics: [1591200144.539739610, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>28}]
[1] metrics: [1591200154.539677022, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>404, "COUNT(*)"=>25}]
[1] metrics: [1591200164.539861108, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>35}]
[1] metrics: [1591200174.539722122, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>404, "COUNT(*)"=>23}]
[1] metrics: [1591200184.539711198, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>404, "COUNT(*)"=>31}]
[1] metrics: [1591200194.539793784, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"DELETE", "return_code"=>204, "COUNT(*)"=>34}]
[1] metrics: [1591200204.539724211, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>404, "COUNT(*)"=>29}]
[0] metrics: [1591200214.539801103, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>24}]
[1] metrics: [1591200224.539757248, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>32}]
[1] metrics: [1591200234.539763650, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>35}]
[1] metrics: [1591200244.539648297, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>28}]
[1] metrics: [1591200254.539896066, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"DELETE", "return_code"=>204, "COUNT(*)"=>35}]
[0] metrics: [1591200264.539762363, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>38}]
[1] metrics: [1591200274.540079431, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>33}]
[1] metrics: [1591200284.540142305, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>35}]
[1] metrics: [1591200294.540164270, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"DELETE", "return_code"=>204, "COUNT(*)"=>31}]
[1] metrics: [1591200304.539846425, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>25}]
[1] metrics: [1591200314.540670716, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>30}]
[1] metrics: [1591200324.539992510, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>33}]
[1] metrics: [1591200334.540104294, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>28}]

When another user and bucket come into play, I now get two distinct rows, but within each (bucket, user) pair the request_type and return_code values are still merged into one group:

[0] metrics: [1591200688.539667544, {"bucket"=>"bucket1", "user"=>"user1", "request_type"=>"GET", "return_code"=>200, "COUNT(*)"=>1237}]
[1] metrics: [1591200688.539674959, {"bucket"=>"bucket2", "user"=>"user2", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>16}]
[0] metrics: [1591200698.539628194, {"bucket"=>"bucket1", "user"=>"user1", "request_type"=>"GET", "return_code"=>200, "COUNT(*)"=>2197}]
[1] metrics: [1591200698.539636080, {"bucket"=>"bucket2", "user"=>"user2", "request_type"=>"HEAD", "return_code"=>404, "COUNT(*)"=>32}]

Patch merged, it will be backported for v1.4.6 release too

