Fluent-bit: [error] [in_tail] parser is not registered

Created on 30 Jul 2019 · 11 Comments · Source: fluent/fluent-bit

Bug Report

Describe the bug
The custom parser is not found and therefore is not applied.

To Reproduce
Create a custom parser

  fluent-bit.conf: |
    [SERVICE]
        Flush         1
        Log_Level     info
        Daemon        off
        Parsers_File  parsers.conf
        HTTP_Server   On
        HTTP_Listen   0.0.0.0
        HTTP_Port     2020

    @INCLUDE input-kubernetes.conf
    @INCLUDE filter-kubernetes.conf
    @INCLUDE output-kafka.conf
  parsers.conf: |
    [PARSER]
        Name              alexa
        Format            regex
        Regex             ^.*\{"request_id":"(?<aux5>.*?)","component":"(?<sitename>.*?)","timestamp":"(?<timestamp>.*?)","level":"(?<level>.*?)","message":{"method":"(?<method>.*?)","url":"(?<uri-stem>.*?)","query_string":(?<uri-query>.*?),"response_time":(?<rt>.*?),"status_code":(?<code>\d.*?),"response_length":(?<bytes_out>\d*+)\}\}$
        Time_Key          timestamp
        Time_Format       %Y-%m-%dT%H:%M:%S
        Decode_Field_As   escaped  log
        #Decode_Field      json     log

Create a tail input to use this parser:

    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/alexa*.log
        Parser            alexa
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
        Refresh_Interval  10

Run the Fluent Bit Docker container and check the logs:

[2019/07/30 12:53:48] [ info] [engine] started (pid=1)
[2019/07/30 12:53:48] [error] [in_tail] parser 'alexa' is not registered

Expected behavior
no errors

Your Environment

  • Version used: 1.2.2
  • Environment name and version (e.g. Kubernetes? What version?): Openshift 3.9
  • Operating System and version: RHEL7
  • Filters and plugins: Tail, kubernetes, custom filter

Labels: not-an-issue, waiting-for-user

All 11 comments

If I try your parser I get the following error:

[2019/07/31 11:13:21] [error] [parser:alexa] Invalid regex pattern ^.*\{"request_id":"(?<aux5>.*?)","component":"(?<sitename>.*?)","timestamp":"(?<timestamp>.*?)","level":"(?<level>.*?)","message":{"method":"(?<method>.*?)","url":"(?<uri-stem>.*?)","query_string":(?<uri-query>.*?),"response_time":(?<rt>.*?),"status_code":(?<code>\d.*?),"response_length":(?<bytes_out>\d*+)\}\}$

There is something wrong with the regex rule, so the parser is not registered.

Rubular passes the expression:

https://rubular.com/r/jREQNoAy2FTtO9

@edsiper @rmacian

The root issue is that Onigmo does not allow characters other than
alphanumerics and underscore in a group name. Since the regex uses hyphens
in some group names (e.g. <uri-stem>), Onigmo fails to compile it.

To avoid this issue, use underscores in the group names instead, like this:

Regex ^.*\{"request_id":"(?<aux5>.*?)","component":"(?<sitename>.*?)","timestamp":"(?<timestamp>.*?)","level":"(?<level>.*?)","message":{"method":"(?<method>.*?)","url":"(?<uri_stem>.*?)","query_string":(?<uri_query>.*?),"response_time":(?<rt>.*?),"status_code":(?<code>\d.*?),"response_length":(?<bytes_out>\d*+)\}\}$

I can confirm this version works fine with Fluent Bit.

$ fluent-bit -R parser.conf -i tail -p path=test.log -p parser=alexa -o stdout
Fluent Bit v1.3.0
Copyright (C) Treasure Data

[2019/08/05 09:02:02] [ info] [storage] initializing...
[2019/08/05 09:02:02] [ info] [storage] in-memory
[2019/08/05 09:02:02] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2019/08/05 09:02:02] [ info] [engine] started (pid=2706)
[2019/08/05 09:02:02] [ info] [sp] stream processor started
[0] tail.0: [1564421500.000000000, {"aux5"=>"7d2805e7-938e-40d5-a726-e6511d38f6e7","elapsed_time":"2ms", "sitename"=>"tvopenplatform.alexa.api", "level"=>"INFO", "method"=>"GET", "uri_stem"=>"/healthcheck", "uri_query"=>"null", "rt"=>""2ms"", "code"=>"200", "bytes_out"=>"106"}]
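
One way to catch this before deploying a parser is to scan the pattern for group names that fall outside the rule described above. The following Python sketch is only an illustration: the helper names `invalid_group_names` and `sanitize_group_names` are hypothetical, and the character rule is taken from this thread rather than from the Onigmo source.

```python
import re

# Opening of a named group "(?<name>", skipping lookbehinds "(?<=" and "(?<!".
NAMED_GROUP = re.compile(r"\(\?<(?![=!])([^>]+)>")
# Assumption: the rule as stated in this thread (alphanumerics and underscore only).
VALID_NAME = re.compile(r"[A-Za-z0-9_]+")


def invalid_group_names(pattern):
    """Return the group names that break the assumed naming rule."""
    return [n for n in NAMED_GROUP.findall(pattern) if not VALID_NAME.fullmatch(n)]


def sanitize_group_names(pattern):
    """Replace every disallowed character in a group name with an underscore."""
    def fix(match):
        return "(?<" + re.sub(r"[^A-Za-z0-9_]", "_", match.group(1)) + ">"
    return NAMED_GROUP.sub(fix, pattern)


# Shortened fragment of the regex from this issue.
sample = r'"url":"(?<uri-stem>.*?)","query_string":(?<uri-query>.*?),"code":(?<code>\d+)'

print(invalid_group_names(sample))
print(sanitize_group_names(sample))
```

The check is purely textual, so it does not prove that a pattern compiles; it only flags the specific naming problem discussed here.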

@fujimotos do we have the same behavior in Fluentd?

do we have the same behavior in Fluentd?

I launched a test instance and confirmed that Fluentd can handle the original
regular expression without problems.

Fluentd uses Ruby's Regexp, which is based on Onigmo too. However, it seems that
Ruby extended the library to allow arbitrary characters in a group name, so the regular
expression works in Fluentd.

Hmmm, is there any reason (perf?) why hyphens are not allowed in Onigmo?

I can't reproduce this with 1.2.2 on the command line:

root@dd8789b1c011:~# fluent-bit -R parser.conf -i tail -p path=test.log -p parser=alexa -o stdout 
[2019/08/05 12:32:48] [Warning] [config] invalid path file (null)

I created a config file and got it working with msgpack output, as you posted. But look at what happens with JSON output; the backslashes are inconsistent:

{"date":1565009949.221786,"aux5":"bcc281cd-9072-48a4-a133-8691d674183b\",\"elapsed_time\":\"1ms","sitename":"tvopenplatform.alexa.api","timestamp":"2019-08-05T11:34:41.985Z","level":"INFO","method":"GET","uri_stem":"/healthcheck","uri_query":"null","rt":"\"1ms\"","code":"200","bytes_out":"106"}

Looking more carefully, I realized that the log output from my container is not exactly the same as what I read from the files:

$ oc logs -n gvp alexa-9-fmvrc |tail -1
16:20:04 0|alexa  | {"request_id":"80de5c5e-f648-485f-a942-cdbcccdfe91c","elapsed_time":"2ms","component":"tvopenplatform.alexa.api","timestamp":"2019-08-05T16:20:04.900Z","level":"INFO","message":{"method":"GET","url":"/healthcheck","query_string":null,"response_time":"2ms","status_code":200,"response_length":106}}
$ tail -1 /var/log/containers/alexa-9-fmvrc_gvp_alexa-7a007f6643a0b93494fa082a12667b6d1de7718d9143a82084e56ed014a7aa18.log
{"log":"16:20:04 0|alexa  | {\"request_id\":\"80de5c5e-f648-485f-a942-cdbcccdfe91c\",\"elapsed_time\":\"2ms\",\"component\":\"tvopenplatform.alexa.api\",\"timestamp\":\"2019-08-05T16:20:04.900Z\",\"level\":\"INFO\",\"message\":{\"method\":\"GET\",\"url\":\"/healthcheck\",\"query_string\":null,\"response_time\":\"2ms\",\"status_code\":200,\"response_length\":106}}\n","stream":"stdout","time":"2019-08-05T16:20:04.901571029Z"}

Any idea how I can get rid of this? I thought it was a misconfiguration of Fluent Bit, but now I see it isn't; it's how the log is read.

@edsiper I haven't thought through the full implications yet, but it seems to me
that it is just a design choice, much as C does not allow variable names to
contain hyphens.

@rmacian What you need to do is parse the JSON first and then apply filter_parser
to each parsed record. Here is a simple example:

[INPUT]
  Name             tail
  Path             /path/to/your/log
  Tag              servlet.*
  Parser           docker

[FILTER]
  Name             parser
  Match            servlet.*
  Key_Name         log
  Parser           alexa
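
The two-stage idea in the configuration above can be mimicked outside Fluent Bit to see why it helps. The Python sketch below is an approximation under stated assumptions: the sample line is shortened from the one in this thread, the inner pattern is a simplified, hypothetical stand-in for the alexa regex, and Python spells named groups as (?P<name>...) rather than (?<name>...).

```python
import json
import re

# One line as stored in the container log file (shortened from the thread's sample).
raw = r'{"log":"16:20:04 0|alexa  | {\"request_id\":\"80de5c5e\",\"level\":\"INFO\"}\n","stream":"stdout","time":"2019-08-05T16:20:04.901571029Z"}'

# Hypothetical, simplified stand-in for the alexa regex (Python named-group syntax).
inner = re.compile(r'^.*\{"request_id":"(?P<aux5>.*?)","level":"(?P<level>.*?)"\}$')

# Applied directly to the file line, the pattern misses: the quotes are still escaped.
direct = inner.match(raw)

# Stage 1: decode the outer JSON, which is the role of the "docker" parser.
record = json.loads(raw)

# Stage 2: run the regex on the decoded "log" key, which is the role of the parser filter.
# The trailing newline is stripped here so that "$" anchors at the closing brace.
staged = inner.match(record["log"].rstrip("\n"))
fields = staged.groupdict()

print(direct)
print(fields)
```

Decoding the wrapper first means the regex only ever sees the application's own output, which is the text the pattern was written against.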

We will fix the docs to state that group names only allow alphanumeric characters and underscores.

Thanks for the help, @fujimotos.

@edsiper I posted a patch that documents the limitation in the manual
at fluent/fluent-bit-docs/pull/201. Please let me know if you find anything
unclear...
