Describe the bug
Stream processing in Fluent Bit 1.3.3 (binary installed via yum) does not aggregate correctly when "GROUP BY" is applied to multiple fields. The incorrect aggregation result and the steps to reproduce it are described below.
To Reproduce
Step 1. Create a sample log, dns_bind.log, with the following content:
2019-12-05 03:12:31.820221 client 192.168.11.12#55206 (google.com): query: google.com IN A +E(0)D (8.8.8.8)
2019-12-05 03:12:31.820221 client 192.168.11.12#55206 (google.com): query: google.com IN A +E(0)D (8.8.8.8)
2019-12-05 03:12:31.820221 client 192.168.11.12#55206 (facebook.com): query: facebook.com IN A +E(0)D (8.8.8.8)
2019-12-05 03:12:31.820221 client 192.168.11.12#55206 (facebook.com): query: facebook.com IN A +E(0)D (8.8.8.8)
2019-12-05 03:12:31.820221 client 192.168.11.12#55206 (google.com): query: google.com IN AAAA +E(0)D (8.8.8.8)
2019-12-05 03:12:31.820221 client 192.168.11.12#55206 (google.com): query: google.com IN AAAA +E(0)D (8.8.8.8)
2019-12-05 03:12:31.820221 client 192.168.11.12#55206 (facebook.com): query: facebook.com IN CNAME +E(0)D (8.8.8.8)
2019-12-05 03:12:31.820221 client 192.168.11.12#55206 (facebook.com): query: facebook.com IN CNAME +E(0)D (8.8.8.8)
Step 2. Set up a regex parser for the log above and put it into parser-bind.conf:
[PARSER]
Name bind
Format regex
Regex ^(?<time>[^ ]*\ [^ ]*) (?<client>[^ ]*) (?<client_ip>[^ ]*)#(?<client_port>[^ ]*) \((?<target_queryname>[^ ]*)\): (?<query>[^ ]*): (?<query_domain_name>[^ ]*) (?<class>[^ ]*) (?<query_type>[^ ]*) (?<recursion_desired_flag>[^ ]*) \((?<dns_server>[^ ]*)\)$
When a log line is parsed by the parser above, it is organized into a record like this:
{"time"=>"2019-12-05 03:12:31.820221", "client"=>"client", "client_ip"=>"192.168.11.12", "client_port"=>"55206", "target_queryname"=>"facebook.com", "query"=>"query", "query_domain_name"=>"facebook.com", "class"=>"IN", "query_type"=>"A", "recursion_desired_flag"=>"+E(0)D", "dns_server"=>"8.8.8.8"}
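To check the parser outside Fluent Bit, the regex can be tried in a small script. The following is a minimal Python sketch and not part of the original setup. Python's `re` module spells named groups `(?P<name>...)` rather than `(?<name>...)`, so the pattern is converted with a plain string replacement. The names `FLB_REGEX`, `PY_REGEX`, `line`, and `record` are illustrative only.

```python
import re

# Regex copied from parser-bind.conf (Onigmo/Ruby-style named groups).
FLB_REGEX = (
    r"^(?<time>[^ ]*\ [^ ]*) (?<client>[^ ]*) (?<client_ip>[^ ]*)#(?<client_port>[^ ]*) "
    r"\((?<target_queryname>[^ ]*)\): (?<query>[^ ]*): (?<query_domain_name>[^ ]*) "
    r"(?<class>[^ ]*) (?<query_type>[^ ]*) (?<recursion_desired_flag>[^ ]*) "
    r"\((?<dns_server>[^ ]*)\)$"
)

# Python's re module needs (?P<name>...) for named groups.
PY_REGEX = re.compile(FLB_REGEX.replace("(?<", "(?P<"))

# One line taken from the sample dns_bind.log.
line = (
    "2019-12-05 03:12:31.820221 client 192.168.11.12#55206 "
    "(facebook.com): query: facebook.com IN A +E(0)D (8.8.8.8)"
)

# groupdict() returns the named captures as a field -> value mapping.
record = PY_REGEX.match(line).groupdict()
print(record)
```

This only confirms that the regex splits a sample line into the fields listed above. It says nothing about how Fluent Bit itself applies the parser.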
Step 3. Set up a stream processor configuration, stream-process-bind.conf:
[STREAM_TASK]
Name bind_sp_1
Exec CREATE STREAM bind_sp_1 AS SELECT query_domain_name, query_type, COUNT(*) AS hits FROM STREAM:bind_raw_log WINDOW TUMBLING (60 SECOND) GROUP BY query_domain_name, query_type;
Step 4. Set up the main configuration file, flb_main.conf:
[SERVICE]
Parsers_File parser-bind.conf
Streams_File stream-process-bind.conf
Log_Level info
[INPUT]
Name tail
alias bind_raw_log
Path dns_bind.log
Parser bind
[OUTPUT]
Name stdout
Match bind_sp_1
Step 5. Run Fluent Bit with the configuration files above. The aggregation result looks like this:
{"query_domain_name"=>"google.com", "query_type"=>"A", "hits"=>4}
{"query_domain_name"=>"facebook.com", "query_type"=>"A", "hits"=>4}
This result is incorrect: records with different query_type values are counted together under a single query_type per query_domain_name.
Expected behavior
The expected result from the stream processor above is:
{"query_domain_name"=>"google.com", "query_type"=>"A", "hits"=>2}
{"query_domain_name"=>"google.com", "query_type"=>"AAAA", "hits"=>2}
{"query_domain_name"=>"facebook.com", "query_type"=>"A", "hits"=>2}
{"query_domain_name"=>"facebook.com", "query_type"=>"CNAME", "hits"=>2}
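As a cross-check of the expected counts, grouping on both fields can be reproduced by hand. This is a minimal Python sketch of the intended GROUP BY semantics, not of Fluent Bit's implementation. The `records` list is typed in from the eight sample log lines, and the variable names are illustrative.

```python
from collections import Counter

# (query_domain_name, query_type) pairs taken from the eight sample log lines.
records = [
    ("google.com", "A"), ("google.com", "A"),
    ("facebook.com", "A"), ("facebook.com", "A"),
    ("google.com", "AAAA"), ("google.com", "AAAA"),
    ("facebook.com", "CNAME"), ("facebook.com", "CNAME"),
]

# Grouping on both fields means the composite tuple is the group key.
hits = Counter(records)

for (domain, qtype), count in hits.items():
    print({"query_domain_name": domain, "query_type": qtype, "hits": count})
```

With the composite key, the sample data falls into four groups of two records each, matching the expected result above.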
Your Environment
Additional Context
Hi Fluent Bit team,
Just want to say thank you, folks.
Fluent Bit is really nice to use for log/metric collection and forwarding.
By the way, please feel free to let me know if there's something wrong with my configuration or use case above.
Thanks for reaching out. We will take a look at this shortly.
cc: @koleini
Hi there @edsiper @koleini, I seem to be encountering the same bug, or something very close:
With this query:
Exec CREATE STREAM test WITH (tag='metrics') AS SELECT bucket, user, request_type, return_code FROM STREAM:syslog.0;
We can see 3 types of object:
HEAD and 204
HEAD and 404
DELETE and 204
[63] metrics: [1591200463.381824000, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>"204"}]
[64] metrics: [1591200463.388523000, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"DELETE", "return_code"=>"204"}]
[86] metrics: [1591200463.979172000, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>"204"}]
[89] metrics: [1591200463.996844000, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>"404"}]
[90] metrics: [1591200464.004828000, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>"204"}]
[93] metrics: [1591200464.011969000, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"DELETE", "return_code"=>"204"}]
[13] metrics: [1591200464.590625000, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>"204"}]
[43] metrics: [1591200465.263458000, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"DELETE", "return_code"=>"204"}]
[57] metrics: [1591200465.844776000, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>"204"}]
[80] metrics: [1591200466.468888000, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>"204"}]
[88] metrics: [1591200466.501345999, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"DELETE", "return_code"=>"204"}]
[7] metrics: [1591200467.084268000, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>"204"}]
[12] metrics: [1591200467.101225000, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>"404"}]
[14] metrics: [1591200467.109168000, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>"204"}]
[18] metrics: [1591200467.116539000, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"DELETE", "return_code"=>"204"}]
[15] metrics: [1591200467.701247000, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>"204"}]
[18] metrics: [1591200467.718519000, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>"404"}]
[20] metrics: [1591200467.729933000, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>"204"}]
[22] metrics: [1591200467.737960000, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"DELETE", "return_code"=>"204"}]
[51] metrics: [1591200468.325867000, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>"204"}]
[58] metrics: [1591200468.356263000, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"DELETE", "return_code"=>"204"}]
However with a GROUP BY on a 10s window:
Exec CREATE STREAM test WITH (tag='metrics') AS SELECT bucket, user, request_type, return_code, COUNT(*) FROM STREAM:syslog.0 WINDOW TUMBLING (10 SECOND) GROUP BY bucket, user, request_type, return_code;
The 3 types of query are merged into one record per window. Most of the time it is reported as HEAD/204, and I'm not sure why:
[1] metrics: [1591200024.539781325, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>22}]
[1] metrics: [1591200034.539642286, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>30}]
[1] metrics: [1591200044.539768685, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>29}]
[1] metrics: [1591200054.540146255, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>30}]
[1] metrics: [1591200064.539900097, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>28}]
[1] metrics: [1591200074.539819689, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>38}]
[1] metrics: [1591200084.540167859, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>25}]
[1] metrics: [1591200094.540202674, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>34}]
[1] metrics: [1591200104.540878807, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"DELETE", "return_code"=>204, "COUNT(*)"=>35}]
[1] metrics: [1591200114.539894250, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>27}]
[1] metrics: [1591200124.539826410, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>404, "COUNT(*)"=>25}]
[1] metrics: [1591200134.539776696, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>26}]
[1] metrics: [1591200144.539739610, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>28}]
[1] metrics: [1591200154.539677022, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>404, "COUNT(*)"=>25}]
[1] metrics: [1591200164.539861108, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>35}]
[1] metrics: [1591200174.539722122, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>404, "COUNT(*)"=>23}]
[1] metrics: [1591200184.539711198, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>404, "COUNT(*)"=>31}]
[1] metrics: [1591200194.539793784, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"DELETE", "return_code"=>204, "COUNT(*)"=>34}]
[1] metrics: [1591200204.539724211, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>404, "COUNT(*)"=>29}]
[0] metrics: [1591200214.539801103, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>24}]
[1] metrics: [1591200224.539757248, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>32}]
[1] metrics: [1591200234.539763650, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>35}]
[1] metrics: [1591200244.539648297, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>28}]
[1] metrics: [1591200254.539896066, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"DELETE", "return_code"=>204, "COUNT(*)"=>35}]
[0] metrics: [1591200264.539762363, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>38}]
[1] metrics: [1591200274.540079431, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>33}]
[1] metrics: [1591200284.540142305, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>35}]
[1] metrics: [1591200294.540164270, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"DELETE", "return_code"=>204, "COUNT(*)"=>31}]
[1] metrics: [1591200304.539846425, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>25}]
[1] metrics: [1591200314.540670716, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>30}]
[1] metrics: [1591200324.539992510, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>33}]
[1] metrics: [1591200334.540104294, {"bucket"=>"stanislas-test", "user"=>"xxx", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>28}]
When another user and bucket come into play, I now get two different objects, but the request_type and return_code are still merged into one:
[0] metrics: [1591200688.539667544, {"bucket"=>"bucket1", "user"=>"user1", "request_type"=>"GET", "return_code"=>200, "COUNT(*)"=>1237}]
[1] metrics: [1591200688.539674959, {"bucket"=>"bucket2", "user"=>"user2", "request_type"=>"HEAD", "return_code"=>204, "COUNT(*)"=>16}]
[0] metrics: [1591200698.539628194, {"bucket"=>"bucket1", "user"=>"user1", "request_type"=>"GET", "return_code"=>200, "COUNT(*)"=>2197}]
[1] metrics: [1591200698.539636080, {"bucket"=>"bucket2", "user"=>"user2", "request_type"=>"HEAD", "return_code"=>404, "COUNT(*)"=>32}]
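For comparison, here is a minimal Python sketch of what I would expect a 10-second tumbling window with a four-field GROUP BY to produce. The `events` list is copied from the first six records of the ungrouped output above. Aligning windows to multiples of 10 seconds since the epoch is an assumption made for the sketch, not how Fluent Bit necessarily places its window boundaries, and all names are illustrative.

```python
from collections import Counter

WINDOW_SECONDS = 10

# (timestamp, bucket, user, request_type, return_code), copied from the
# first six records of the ungrouped output above.
events = [
    (1591200463.381824, "stanislas-test", "xxx", "HEAD", "204"),
    (1591200463.388523, "stanislas-test", "xxx", "DELETE", "204"),
    (1591200463.979172, "stanislas-test", "xxx", "HEAD", "204"),
    (1591200463.996844, "stanislas-test", "xxx", "HEAD", "404"),
    (1591200464.004828, "stanislas-test", "xxx", "HEAD", "204"),
    (1591200464.011969, "stanislas-test", "xxx", "DELETE", "204"),
]

counts = Counter()
for ts, bucket, user, request_type, return_code in events:
    # Assumption: windows start at multiples of WINDOW_SECONDS since the epoch.
    window_start = int(ts // WINDOW_SECONDS) * WINDOW_SECONDS
    # All four GROUP BY fields are part of the key, alongside the window.
    counts[(window_start, bucket, user, request_type, return_code)] += 1

for key, count in sorted(counts.items()):
    print(key, count)
```

Under these assumptions the six records fall into one window with three separate groups (HEAD/204, HEAD/404, DELETE/204), rather than the single merged record per bucket and user shown above.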
Patch merged; it will also be backported for the v1.4.6 release.