Fluent-bit: Elasticsearch output plugin should fix duplicate fields. Elastic returns duplicate key exception

Created on 20 Jun 2018 · 14 comments · Source: fluent/fluent-bit

Imagine having the following log on a Kubernetes cluster:

{"log":"{\"level\":30,\"time\":1529440745411}\n","stream":"stdout","time":"2018-06-19T20:39:05.411862193Z"}

The Docker log contains a field "time". The application is also logging a field called "time". This causes a duplicate field exception in Elasticsearch.
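When the kubernetes filter unpacks the escaped JSON in 'log' and merges it back at the top level, the record ends up with two "time" keys, roughly like this (a hypothetical reconstruction from the sample above; the first "time" comes from the Docker wrapper, the second from the application):

{"time":"2018-06-19T20:39:05.411862193Z","stream":"stdout","level":30,"time":1529440745411}

Elasticsearch's JSON parser rejects such a document.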

What I tried to do:

  1. Used the option 'Merge_JSON_Key' in the Kubernetes filter, in the hope that it would prefix the fields inside the log content. This was without success:

    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://${KUBERNETES_SERVICE_HOST}:443
        Merge_JSON_Log      On
        Merge_JSON_Key      k8s
        K8S-Logging.Parser  On
        K8S-Logging.exclude True
        tls.verify          Off

  2. Used the Record Modifier filter. This didn't work out since it modifies both fields.
  3. Renamed the field 'time' in the applications that produce the logs. This is not feasible since we have so many applications and development teams.

Anybody with a solution? I really need some help. This has already cost me two days.

A solution could be to modify the Elasticsearch output plugin so that it suffixes duplicate fields with a number, e.g. "time1" and "time2".

Labels: fixed, question


All 14 comments

@marckamerbeek Can you try enabling Generate_ID in the elasticsearch output plugin configuration? https://fluentbit.io/documentation/0.13/output/elasticsearch.html
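For reference, a minimal sketch of enabling it (the host and port placeholders are taken from the output config later in this thread, not part of the original suggestion):

[OUTPUT]
    Name        es
    Match       *
    Host        ${FLUENT_ELASTICSEARCH_HOST}
    Port        ${FLUENT_ELASTICSEARCH_PORT}
    Generate_ID On

Generate_ID makes fluent-bit assign its own _id per record, which helps against duplicate documents on retries, but it does not change the record body itself.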

@MohdAhmad I will try this after the weekend. Today I have to finish some stuff. Thanks for mentioning it! I'll let you know.

@MohdAhmad I've tried it and I'm still getting errors like this:

{"took":14,"errors":true,"items":[{"index":{"_index":"gslogs-2018.08.28","_type":"flb_type","_id":"3f377a62-4722-234c-4746-00653973668f","status":400,"error":{"type":"mapper_parsing_exception","reason":"failed to parse","caused_by":{"type":"json_parse_exception","reason":"Duplicate field 'time'\n at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@7a895f42; line: 1, column: 746]"}}}},{"index":{"_index":"gslogs-2018.08.28","_type":"flb_type","_id":"70270de1-cafe-cfbd-87f2-12e6e3d5bd6b","status":400,"error":{"type":"mapper_parsing_exception","reason":"failed to parse","caused_by":{"type":"json_parse_exception","reason":"Duplicate field 'time'\n at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@7ae46faa; line: 1, column: 715]"}}}},{"index":{"_index":"gslogs-2018.08.28","_type":"flb_type","_id":"5f0886c8-eb18-9017-7090-8804a8c97187","status":400,"e

Same here. I could fix the duplicate @timestamp field by using Time_Key:

[OUTPUT]
    Name            es
    Match           *
    Host            ${FLUENT_ELASTICSEARCH_HOST}
    Port            ${FLUENT_ELASTICSEARCH_PORT}
    Logstash_Format On
    Retry_Limit     False
    Time_Key        @timestamp-es

... but then I saw a similar message for the time field. However, after looking around at the parser, I toggled the option Time_Keep from On to Off.

[PARSER]
    Name        docker
    Format      json
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L
    Time_Keep   Off

This is from the documentation:

Time_Keep: By default, when a time key is recognized and parsed, the parser will drop the original time field. Enabling this option makes the parser keep the original time field and its value in the log entry.

This got rid of the parse error. I will now check how this changed the logs in the index, but it looks okay to me.

@abrakhim thanks for the update. This sounds good. I'll check this next week in my own configuration. Thanks!

Would it be possible to implement a Time_Keep option in the Elasticsearch output plugin?

Shouldn't fluent-bit add items to the JSON by checking for existing keys rather than just appending them? A 'time' field twice means something went wrong.

Guys, Time_Keep only solves duplicated fields named time. What about other duplicated fields like _BOOT_ID (coming from journald)? IMO a new Record Modifier filter rule could be implemented to handle this (something like _Duplicate_).
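For completeness, the closest existing knob is record_modifier's Remove_key, though as noted earlier in the thread it acts on every occurrence of the key rather than de-duplicating; a sketch for the journald case (the match tag is an assumption):

[FILTER]
    Name        record_modifier
    Match       journal.*
    # NOTE: removes ALL _BOOT_ID fields from the record, not just the duplicate
    Remove_key  _BOOT_ID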

You can use the Kubernetes filter's "Merge_Log_Key" property so your unpacked 'log' content is set under a new key, avoiding duplicate-key cases like this.
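A sketch of that, assuming the newer option names (Merge_Log instead of the older Merge_JSON_Log; the key name log_processed is arbitrary):

[FILTER]
    Name            kubernetes
    Match           kube.*
    Merge_Log       On
    Merge_Log_Key   log_processed

With this, the application's fields are nested under log_processed, so its "time" becomes log_processed.time and no longer collides with the Docker-level "time".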

This issue crashed the Elasticsearch node.

It's not only about Kubernetes. I've got the same problem parsing my application's JSON log. And yes, ES nodes are crashed by this issue.

What would be the expected behavior when finding a duplicated key?

Might this be summarized as "fluent-bit cannot parse Kubernetes logs properly"?

