Beats: FileBeat: decode_json_fields processor max_depth option not working

Created on 13 Jul 2020 · 4 comments · Source: elastic/beats

To prevent creating tons of document fields in an Elasticsearch log index, I want to control the nested JSON parsing depth.

Related discussion post: https://discuss.elastic.co/t/filebeat-decode-json-fields-processor-max-depth-option-not-working/240948

Filebeat version 7.8.0 (also tested on 6.8.10; the result is the same)

/tmp/filebeat.conf:

filebeat.inputs:
- type: log
  paths:
    - /tmp/filebeat.input

processors:
  - decode_json_fields:
      fields: ["message"]
      max_depth: 1
      target: "parsed"

output.console:
  pretty: true

/tmp/filebeat.input:

{"top": "top_value", "top_obj": {"level_1": "level_1_value", "level_1_obj": {"level_2": "level_2_value", "level_2_obj": {"level_3": "level_3_value"}}}}

Command:

filebeat -e -c /tmp/filebeat.conf

Result:

"parsed": {
  "top_obj": {
    "level_1_obj": {
      "level_2": "level_2_value",
      "level_2_obj": {
        "level_3": "level_3_value"
      }
    },
    "level_1": "level_1_value"
  },
  "top": "top_value"
}

Expected result:

"parsed": {
  "top_obj": {
    "level_1_obj": "{\"level_2\": \"level_2_value\", \"level_2_obj\": {\"level_3\": \"level_3_value\"}}",
    "level_1": "level_1_value"
  },
  "top": "top_value"
}
Labels: Integrations, Investigate, bug

All 4 comments

It works properly. It just doesn't do what you're expecting.

Your JSON input doesn't contain any nested JSON, i.e. JSON encoded as a string value. If you parse it in the browser with JSON.parse, you'll see that the whole line comes back as a single, fully structured object.

To get your desired effect, the level_1_obj value itself would have to be stringified first:

"level_1_obj":"{\"level_2\":\"level_2_value\",\"level_2_obj\":{\"level_3\":\"level_3_value\"}}"

What max_depth does is recursively try to decode the underlying fields until max_depth is hit. So if you set it to 2, it will still be able to decode "level_1_obj":"{\"level_2\":\"level_2_value\",\"level_2_obj\":{\"level_3\":\"level_3_value\"}}"
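
For illustration, here is a hypothetical input line built from the suggestion above, with level_1_obj already stringified (this exact line does not appear in the original report):

{"top": "top_value", "top_obj": {"level_1": "level_1_value", "level_1_obj": "{\"level_2\": \"level_2_value\", \"level_2_obj\": {\"level_3\": \"level_3_value\"}}"}}

With the configuration from the report (max_depth: 1), only the message string itself should be decoded, so level_1_obj would stay a string and the output would match the expected result above.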

Anyway, the documentation is not clear enough for me, and I suspect not only for me but for many other users as well.
The max_depth option behaves more like a limit to prevent a stack overflow, not like a way to parse JSON to N levels deep and leave everything below that as an unparsed string.
I ended up implementing this functionality with Logstash + the Ruby filter plugin and did all the necessary parsing logic in the Ruby script. Now only the first 2 levels become document fields in the Elasticsearch indexes; all deeper subfields are stored as string values.
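
The Ruby script itself isn't shown here, but the core idea, sketched in Python for readability (truncate_depth and the depth limit of 2 are illustrative choices, not the actual script), is: decode the whole document, then re-serialize anything below the chosen depth back into a JSON string.

import json

def truncate_depth(value, max_depth, depth=0):
    # Keep dicts/lists as structured fields up to max_depth levels;
    # re-serialize anything deeper back into a JSON string.
    if isinstance(value, (dict, list)) and depth >= max_depth:
        return json.dumps(value)
    if isinstance(value, dict):
        return {k: truncate_depth(v, max_depth, depth + 1) for k, v in value.items()}
    if isinstance(value, list):
        return [truncate_depth(v, max_depth, depth + 1) for v in value]
    return value

# The log line from the report, keeping only the first 2 levels as fields:
line = ('{"top": "top_value", "top_obj": {"level_1": "level_1_value", '
        '"level_1_obj": {"level_2": "level_2_value", "level_2_obj": {"level_3": "level_3_value"}}}}')
print(json.dumps(truncate_depth(json.loads(line), max_depth=2), indent=2))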

I understood it exactly as @vitaliy-kravchenko did. The max_depth option behaves as a limit to prevent mapping explosion.

I have tons of respect for Filebeat and I use it in multiple projects as a collector, but I just spent 3 days trying to debug this until I found this issue, and I agree that the documentation is not clear at all about this. While we're at it, I'm not sure that expecting the message to be stringified for this to work properly is reasonable; I've never seen logs like that. Right now I'm trying to work around the problem with some Elasticsearch ingest pipeline trickery, but it's depressing, because despite this issue Filebeat is so much better than ES pipelines...
