Describe the bug
A custom parser is not found, so it is never applied.
To Reproduce
Create a custom parser
fluent-bit.conf: |
[SERVICE]
Flush 1
Log_Level info
Daemon off
Parsers_File parsers.conf
HTTP_Server On
HTTP_Listen 0.0.0.0
HTTP_Port 2020
@INCLUDE input-kubernetes.conf
@INCLUDE filter-kubernetes.conf
@INCLUDE output-kafka.conf
parsers.conf:-
[PARSER]
Name alexa
Format regex
Regex ^.*\{"request_id":"(?<aux5>.*?)","component":"(?<sitename>.*?)","timestamp":"(?<timestamp>.*?)","level":"(?<level>.*?)","message":{"method":"(?<method>.*?)","url":"(?<uri-stem>.*?)","query_string":(?<uri-query>.*?),"response_time":(?<rt>.*?),"status_code":(?<code>\d.*?),"response_length":(?<bytes_out>\d*+)\}\}$
Time_Key timestamp
Time_Format %Y-%m-%dT%H:%M:%S
Decode_Field_As escaped log
#Decode_Field json log
Create a tail input to use this parser:
[INPUT]
Name tail
Tag kube.*
Path /var/log/containers/alexa*.log
Parser alexa
DB /var/log/flb_kube.db
Mem_Buf_Limit 5MB
Skip_Long_Lines On
Refresh_Interval 10
Run docker fluent-bit container and check logs
[2019/07/30 12:53:48] [ info] [engine] started (pid=1)
[2019/07/30 12:53:48] [error] [in_tail] parser 'alexa' is not registered
Expected behavior
no errors
Your Environment
If I try your parser I get the following error:
[2019/07/31 11:13:21] [error] [parser:alexa] Invalid regex pattern ^.*\{"request_id":"(?<aux5>.*?)","component":"(?<sitename>.*?)","timestamp":"(?<timestamp>.*?)","level":"(?<level>.*?)","message":{"method":"(?<method>.*?)","url":"(?<uri-stem>.*?)","query_string":(?<uri-query>.*?),"response_time":(?<rt>.*?),"status_code":(?<code>\d.*?),"response_length":(?<bytes_out>\d*+)\}\}$
There is something wrong with the regex rule, so the parser is not registered.
Rubular passes the expression:
@edsiper @rmacian
The root issue is that Onigmo does not allow characters other than
alphanumerics and underscores in a group name. Since the regex uses hyphens
in some group names (e.g. <uri-stem>), Onigmo fails to compile it.
To avoid this issue, use underscores in the group names instead, as below.
Regex ^.*\{"request_id":"(?<aux5>.*?)","component":"(?<sitename>.*?)","timestamp":"(?<timestamp>.*?)","level":"(?<level>.*?)","message":{"method":"(?<method>.*?)","url":"(?<uri_stem>.*?)","query_string":(?<uri_query>.*?),"response_time":(?<rt>.*?),"status_code":(?<code>\d.*?),"response_length":(?<bytes_out>\d*+)\}\}$
I can confirm this version works fine with Fluent Bit.
$ fluent-bit -R parser.conf -i tail -p path=test.log -p parser=alexa -o stdout
Fluent Bit v1.3.0
Copyright (C) Treasure Data
[2019/08/05 09:02:02] [ info] [storage] initializing...
[2019/08/05 09:02:02] [ info] [storage] in-memory
[2019/08/05 09:02:02] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2019/08/05 09:02:02] [ info] [engine] started (pid=2706)
[2019/08/05 09:02:02] [ info] [sp] stream processor started
[0] tail.0: [1564421500.000000000, {"aux5"=>"7d2805e7-938e-40d5-a726-e6511d38f6e7","elapsed_time":"2ms", "sitename"=>"tvopenplatform.alexa.api", "level"=>"INFO", "method"=>"GET", "uri_stem"=>"/healthcheck", "uri_query"=>"null", "rt"=>""2ms"", "code"=>"200", "bytes_out"=>"106"}]
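As a quick sanity check for the naming rule described above, here is a minimal Python sketch that lists named groups containing anything other than letters, digits, and underscores. The helper name `invalid_group_names` and the shortened sample pattern are made up for illustration, and the check is a plain text scan rather than a real regex parser, so an escaped `\(?<` would be flagged too.

```python
import re

# Finds the opening of a named group such as (?<name> and captures the name.
# The lookbehind forms (?<= and (?<! are skipped.
NAMED_GROUP = re.compile(r"\(\?<(?![=!])([^>]*)>")

# Assumption: the rule as stated in this thread (alphanumerics and underscore).
VALID_NAME = re.compile(r"[A-Za-z0-9_]+")


def invalid_group_names(pattern):
    """Return the group names that contain characters outside [A-Za-z0-9_]."""
    return [
        name
        for name in NAMED_GROUP.findall(pattern)
        if not VALID_NAME.fullmatch(name)
    ]


# Shortened excerpt of the regex from this issue (illustrative only).
original = r'"url":"(?<uri-stem>.*?)","query_string":(?<uri-query>.*?),"status_code":(?<code>\d.*?)'
fixed = original.replace("uri-stem", "uri_stem").replace("uri-query", "uri_query")

print(invalid_group_names(original))  # ['uri-stem', 'uri-query']
print(invalid_group_names(fixed))     # []
```

Running something like this over a parsers file before deploying would catch the hyphenated names early, instead of finding out from the "not registered" error at startup.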
@fujimotos do we have the same behavior in Fluentd?
I launched a test instance and could confirm that Fluentd handles the original
regular expression without problems.
Fluentd uses Ruby's Regexp, which is also based on Onigmo. However, it seems that
Ruby extended the library to allow arbitrary characters in a group name, so the regular
expression works in Fluentd.
Hmm, is there any reason (performance?) why such characters are not allowed in Onigmo group names?
I can't reproduce this with 1.2.2 on the command line:
root@dd8789b1c011:~# fluent-bit -R parser.conf -i tail -p path=test.log -p parser=alexa -o stdout
[2019/08/05 12:32:48] [Warning] [config] invalid path file (null)
I created a config file and got it working with the msgpack output, as you posted. But look at what happens with JSON output; the backslashes are inconsistent:
{"date":1565009949.221786,"aux5":"bcc281cd-9072-48a4-a133-8691d674183b\",\"elapsed_time\":\"1ms","sitename":"tvopenplatform.alexa.api","timestamp":"2019-08-05T11:34:41.985Z","level":"INFO","method":"GET","uri_stem":"/healthcheck","uri_query":"null","rt":"\"1ms\"","code":"200","bytes_out":"106"}
Looking more carefully, I realized that the log output from my container is not exactly the same as what I read from the files:
$ oc logs -n gvp alexa-9-fmvrc |tail -1
16:20:04 0|alexa | {"request_id":"80de5c5e-f648-485f-a942-cdbcccdfe91c","elapsed_time":"2ms","component":"tvopenplatform.alexa.api","timestamp":"2019-08-05T16:20:04.900Z","level":"INFO","message":{"method":"GET","url":"/healthcheck","query_string":null,"response_time":"2ms","status_code":200,"response_length":106}}
$ tail -1 /var/log/containers/alexa-9-fmvrc_gvp_alexa-7a007f6643a0b93494fa082a12667b6d1de7718d9143a82084e56ed014a7aa18.log
{"log":"16:20:04 0|alexa | {\"request_id\":\"80de5c5e-f648-485f-a942-cdbcccdfe91c\",\"elapsed_time\":\"2ms\",\"component\":\"tvopenplatform.alexa.api\",\"timestamp\":\"2019-08-05T16:20:04.900Z\",\"level\":\"INFO\",\"message\":{\"method\":\"GET\",\"url\":\"/healthcheck\",\"query_string\":null,\"response_time\":\"2ms\",\"status_code\":200,\"response_length\":106}}\n","stream":"stdout","time":"2019-08-05T16:20:04.901571029Z"}
Any idea how I can get rid of this? I thought it was a misconfiguration of fluent-bit, but now I see it isn't; it's how the log is read.
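For what it's worth, the file on disk looks like Docker's json-file format: each line is a JSON object whose `log` field holds the original output with its quotes escaped. A minimal Python sketch of the difference follows; the sample line is shortened and made up for illustration.

```python
import json

# Shortened line in the same shape as the container log file above.
# Raw strings keep the backslashes exactly as they appear on disk.
raw_line = (
    r'{"log":"16:20:04 0|alexa | {\"request_id\":\"80de5c5e\",\"level\":\"INFO\"}\n",'
    r'"stream":"stdout","time":"2019-08-05T16:20:04.901571029Z"}'
)

# Decoding the outer JSON removes the escaping and exposes the original message.
record = json.loads(raw_line)
message = record["log"].rstrip("\n")

print('\\"' in raw_line)  # True: the raw file line contains escaped quotes
print('\\"' in message)   # False: the decoded message does not
print(message)            # 16:20:04 0|alexa | {"request_id":"80de5c5e","level":"INFO"}
```

A regex applied to the raw line sees the `\"` sequences, which would explain the stray backslashes in the captured values; applied to the decoded `log` value, it sees the same text that `oc logs` prints.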
@edsiper I haven't thought through the full implications yet, but it seems to me
that it is just a design choice, much like C, which does not allow hyphens
in variable names.
@rmacian What you need to do is parse the JSON first and then apply filter_parser
to each parsed record. Here is a simple example:
[INPUT]
Name tail
Path /path/to/your/log
Tag servlet.*
Parser docker
[FILTER]
Name parser
Match servlet.*
Key_Name log
Parser alexa
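To make the two stages concrete, here is a rough Python sketch of the same idea outside Fluent Bit: decode the outer JSON first (what the `docker` parser would do, assuming it is a JSON-format parser), then run a regex over the `log` field (what the parser filter with `Key_Name log` would do). The pattern is a simplified stand-in for the `alexa` regex, not a port of it, the function name `parse_line` is hypothetical, and Python spells named groups `(?P<name>...)`.

```python
import json
import re

# Simplified stand-in for the 'alexa' regex (illustrative, not the original).
ALEXA = re.compile(
    r'^.*?\{"request_id":"(?P<request_id>[^"]*)".*?'
    r'"level":"(?P<level>[^"]*)".*?'
    r'"url":"(?P<uri_stem>[^"]*)".*?'
    r'"status_code":(?P<code>\d+),"response_length":(?P<bytes_out>\d+)\}\}$'
)


def parse_line(raw_line):
    """Stage 1: decode the JSON wrapper. Stage 2: regex over the 'log' field."""
    record = json.loads(raw_line)
    match = ALEXA.match(record["log"].rstrip("\n"))
    return match.groupdict() if match else None


# Shortened sample in the shape of the container log file (made up for illustration).
raw_line = (
    r'{"log":"16:20:04 0|alexa | {\"request_id\":\"80de5c5e\",\"level\":\"INFO\",'
    r'\"message\":{\"method\":\"GET\",\"url\":\"/healthcheck\",\"query_string\":null,'
    r'\"response_time\":\"2ms\",\"status_code\":200,\"response_length\":106}}\n",'
    r'"stream":"stdout","time":"2019-08-05T16:20:04.901571029Z"}'
)

print(parse_line(raw_line))
# {'request_id': '80de5c5e', 'level': 'INFO', 'uri_stem': '/healthcheck',
#  'code': '200', 'bytes_out': '106'}
```

Because the regex only ever sees the decoded message, the captured values come out without the escaping backslashes.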
We will fix the docs to state that group names only allow alphanumeric characters and underscores.
thanks for the help @fujimotos
@edsiper I posted a patch that documents the limitation in the manual
at fluent/fluent-bit-docs/pull/201. Please let me know if you find anything
unclear...