Fluent-bit: Elasticsearch output plugin should fix duplicate fields. Elastic returns duplicate key exception

Created on 20 Jun 2018 · 14 comments · Source: fluent/fluent-bit

Imagine having the following log on a Kubernetes cluster:

{"log":"{\"level\":30,\"time\":1529440745411}\n","stream":"stdout","time":"2018-06-19T20:39:05.411862193Z"}

The Docker log contains a field "time". The application is also logging a field called "time". This causes a duplicate field exception in Elasticsearch.
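When the kubernetes filter unpacks the escaped JSON in 'log' and merges it back at the top level, the record ends up with two "time" keys, roughly like this (a hypothetical reconstruction from the sample above; the first "time" comes from the Docker wrapper, the second from the application):

{"time":"2018-06-19T20:39:05.411862193Z","stream":"stdout","level":30,"time":1529440745411}

Elasticsearch's JSON parser rejects such a document.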

What I tried to do:

  1. Used the option 'Merge_JSON_Key' in the Kubernetes filter, in the hope that it would prefix the fields inside the log content. This was without success:

    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://${KUBERNETES_SERVICE_HOST}:443
        Merge_JSON_Log      On
        Merge_JSON_Key      k8s
        K8S-Logging.Parser  On
        K8S-Logging.exclude True
        tls.verify          Off

  2. Used the Record Modifier filter. This didn't work out since it modifies both fields.
  3. Renamed the field 'time' in the applications that produce the logs. This is not feasible since we have so many applications and development teams.

Anybody with a solution? I really need some help. This has already cost me two days.

A solution could be to modify the Elasticsearch output plugin so that it suffixes duplicate fields with a number, e.g. "time1" and "time2".

Labels: fixed, question


All 14 comments

@marckamerbeek Can you try enabling Generate_ID in the elasticsearch output plugin configuration? https://fluentbit.io/documentation/0.13/output/elasticsearch.html
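For reference, a minimal sketch of enabling it (the host and port placeholders are taken from the output config later in this thread, not part of the original suggestion):

[OUTPUT]
    Name        es
    Match       *
    Host        ${FLUENT_ELASTICSEARCH_HOST}
    Port        ${FLUENT_ELASTICSEARCH_PORT}
    Generate_ID On

Generate_ID makes fluent-bit assign its own _id per record, which helps against duplicate documents on retries, but it does not change the record body itself.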

@MohdAhmad I will try this after the weekend. Today I have to finish some stuff. Thanks for mentioning it! I'll let you know.

@MohdAhmad I've tried it and I'm still getting errors like this:

{"took":14,"errors":true,"items":[{"index":{"_index":"gslogs-2018.08.28","_type":"flb_type","_id":"3f377a62-4722-234c-4746-00653973668f","status":400,"error":{"type":"mapper_parsing_exception","reason":"failed to parse","caused_by":{"type":"json_parse_exception","reason":"Duplicate field 'time'\n at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@7a895f42; line: 1, column: 746]"}}}},{"index":{"_index":"gslogs-2018.08.28","_type":"flb_type","_id":"70270de1-cafe-cfbd-87f2-12e6e3d5bd6b","status":400,"error":{"type":"mapper_parsing_exception","reason":"failed to parse","caused_by":{"type":"json_parse_exception","reason":"Duplicate field 'time'\n at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@7ae46faa; line: 1, column: 715]"}}}},{"index":{"_index":"gslogs-2018.08.28","_type":"flb_type","_id":"5f0886c8-eb18-9017-7090-8804a8c97187","status":400,"e

Same here. I could fix the duplicate @timestamp field by using Time_Key:

[OUTPUT]
    Name            es
    Match           *
    Host            ${FLUENT_ELASTICSEARCH_HOST}
    Port            ${FLUENT_ELASTICSEARCH_PORT}
    Logstash_Format On
    Retry_Limit     False
    Time_Key        @timestamp-es

... but then I saw a similar message for the time field. However, after looking around at the parser, I toggled the option Time_Keep from On to Off.

[PARSER]
    Name        docker
    Format      json
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L
    Time_Keep   Off

This is from the documentation:

Time_Keep: By default, when a time key is recognized and parsed, the parser will drop the original time field. Enabling this option makes the parser keep the original time field and its value in the log entry.

This got rid of the parse error. I will now check how this changed the logs in the index, but it looks okay to me.

@abrakhim thanks for the update. This sounds good. I'll check this next week in my own configuration. Thanks!

Would it be possible to implement a Time_Keep option in the Elasticsearch output plugin?

Shouldn't fluent-bit add items to the JSON by checking for existing keys rather than just appending them? A 'time' field twice means something went wrong.

Guys, Time_Keep only solves duplicated fields named time. What about other duplicated fields like _BOOT_ID (coming from journald)? IMO a new Record Modifier filter rule could be implemented to handle this (something like _Duplicate_).
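For completeness, the closest existing knob is record_modifier's Remove_key, though as noted earlier in the thread it acts on every occurrence of the key rather than de-duplicating; a sketch for the journald case (the match tag is an assumption):

[FILTER]
    Name        record_modifier
    Match       journal.*
    # NOTE: removes ALL _BOOT_ID fields from the record, not just the duplicate
    Remove_key  _BOOT_ID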

You can use the Kubernetes filter's "Merge_Log_Key" property so your unpacked 'log' content is set under a new key, avoiding duplicate-key cases like this.
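A sketch of that, assuming the newer option names (Merge_Log instead of the older Merge_JSON_Log; the key name log_processed is arbitrary):

[FILTER]
    Name            kubernetes
    Match           kube.*
    Merge_Log       On
    Merge_Log_Key   log_processed

With this, the application's fields are nested under log_processed, so its "time" becomes log_processed.time and no longer collides with the Docker-level "time".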

This issue crashed the Elasticsearch node.

It's not only about Kubernetes. I've got the same problem parsing my application's JSON log. And yes, ES nodes are crashed by this issue.

What would be the expected behavior when finding a duplicated key?

Might this be summarized as "fluent-bit cannot parse Kubernetes logs properly"?

