To prevent creating tons of document fields in an Elasticsearch log index, I want to control the nested JSON parsing depth.
Related discussion post: https://discuss.elastic.co/t/filebeat-decode-json-fields-processor-max-depth-option-not-working/240948
Filebeat version 7.8.0 (also tested on 6.8.10 and the result is the same)
/tmp/filebeat.conf:
filebeat.inputs:
- type: log
  paths:
    - /tmp/filebeat.input
  processors:
    - decode_json_fields:
        fields: ["message"]
        max_depth: 1
        target: "parsed"
output.console:
  pretty: true
/tmp/filebeat.input:
{"top": "top_value", "top_obj": {"level_1": "level_1_value", "level_1_obj": {"level_2": "level_2_value", "level_2_obj": {"level_3": "level_3_value"}}}}
Command:
filebeat -e -c /tmp/filebeat.conf
Result:
"parsed": {
"top_obj": {
"level_1_obj": {
"level_2": "level_2_value",
"level_2_obj": {
"level_3": "level_3_value"
}
},
"level_1": "level_1_value"
},
"top": "top_value"
}
Expected result:
"parsed": {
"top_obj": {
"level_1_obj": "{\"level_2\": \"level_2_value\", \"level_2_obj\": {\"level_3\": \"level_3_value\"}}",
"level_1": "level_1_value"
},
"top": "top_value"
}
It works properly. It just doesn't do what you're expecting.
Your JSON input doesn't contain any nested JSON strings. If you parse it in the browser with JSON.parse, you get a single fully nested object: every level is already an object, not an encoded string.
To get your desired effect, your level_1_obj value itself would have to be stringified first:
"level_1_obj":"{\"level_2\":\"level_2_value\",\"level_2_obj\":{\"level_3\":\"level_3_value\"}}"
What max_depth does is recursively try to decode the underlying fields until max_depth is hit. So if you set it to 2, it will still be able to decode "level_1_obj":"{\"level_2\":\"level_2_value\",\"level_2_obj\":{\"level_3\":\"level_3_value\"}}"
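For illustration, an input line where level_1_obj is already a JSON-encoded string would look like this (a hand-written example, not output I have run through Filebeat):

{"top": "top_value", "top_obj": {"level_1": "level_1_value", "level_1_obj": "{\"level_2\": \"level_2_value\", \"level_2_obj\": {\"level_3\": \"level_3_value\"}}"}}

With your config (max_depth: 1), decode_json_fields should decode only the outer message and leave level_1_obj as the unparsed string from your expected result; per the explanation above, max_depth: 2 would decode that embedded string as well.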
Anyway, the documentation is not clear enough for me, and I suppose not only for me but for many other users.
The max_depth option behaves more like a safety limit that prevents a stack overflow, not like a way to parse JSON only N levels deep and leave everything below that as an unparsed string.
I implemented this functionality with Logstash and the ruby filter plugin, and did all the necessary parsing logic in the Ruby script (a sketch of the approach follows below). Now only the first 2 levels end up as document fields in my Elasticsearch indices; all deeper subfields are stored as string values of those fields.
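For reference, the core of the workaround is a ruby filter that re-serializes everything below the second level back to a string. This is only a simplified sketch of the idea; the "message" source field, the "parsed" target, and the fixed two-level cutoff are illustrative assumptions, not my exact script:

filter {
  ruby {
    code => '
      require "json"
      begin
        parsed = JSON.parse(event.get("message"))
        if parsed.is_a?(Hash)
          # Keep the first two levels as real fields;
          # turn anything nested deeper back into a JSON string.
          parsed.each do |_key, value|
            next unless value.is_a?(Hash)
            value.each do |sub_key, sub_value|
              if sub_value.is_a?(Hash) || sub_value.is_a?(Array)
                value[sub_key] = sub_value.to_json
              end
            end
          end
          event.set("parsed", parsed)
        end
      rescue JSON::ParserError
        event.tag("_jsonparsefailure")
      end
    '
  }
}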
I understood it exactly as @vitaliy-kravchenko did. The max_depth option behaves as a limit to prevent mapping explosion.
I have tons of respect for Filebeat and I use it as a collector in multiple projects, but I just spent 3 days trying to debug this until I found this issue, and I agree the documentation is not clear at all about this. While we're at it, I'm not sure it's reasonable to expect the message to be stringified for this to work properly; I've never seen logs like that. Right now I'm trying to fix this problem with some Elasticsearch ingest pipeline trickery, but it's depressing because Filebeat is so much better than ES pipelines despite this issue... 😩
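For what it's worth, the ingest pipeline trickery I'm experimenting with looks roughly like this. It's only a sketch: the pipeline name is made up, it assumes the JSON is in the message field, and it relies on Painless exposing Json.dump (added in recent 7.x releases), so treat it as an assumption rather than tested code. It leaves the first level of objects intact and turns anything nested deeper into a string, like the expected result above:

PUT _ingest/pipeline/limit_json_depth
{
  "processors": [
    { "json": { "field": "message", "target_field": "parsed" } },
    {
      "script": {
        "source": "for (def key : new ArrayList(ctx.parsed.keySet())) { def value = ctx.parsed[key]; if (value instanceof Map) { for (def innerKey : new ArrayList(value.keySet())) { def innerValue = value[innerKey]; if (innerValue instanceof Map || innerValue instanceof List) { value[innerKey] = Json.dump(innerValue); } } } }"
      }
    }
  ]
}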