Graylog2-server: The JSON Extractor does not expand nested JSON *array* objects

Created on 15 May 2019 · 6 Comments · Source: Graylog2/graylog2-server

The JSON Extractor does not expand nested JSON array objects.

Expected Behavior


Splunk uses multivalue fields (mv* SPL).
Graylog should do the same (vertical storage) or store a list/tuple object (horizontal storage).

Current Behavior


Graylog stores the nested JSON array object in a single separate field (named after the parent), without expanding it.

Possible Solution


Splunk uses multivalue fields (mv* SPL).
Graylog should do the same (vertical storage) or store a list/tuple object (horizontal storage).
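
To make the two storage options concrete, here is a minimal Python sketch. It is not Graylog code: the helper names (`flatten`, `expand_vertical`, `expand_horizontal`) and the underscore-joined field names are assumptions chosen for illustration, and the input is a trimmed, valid-JSON version of the AffectedItems sample from this report.

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into underscore-joined keys (hypothetical naming scheme)."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


def expand_vertical(parent, items):
    """Multivalue style: one field per key, collecting that key's value from every element."""
    fields = {}
    for item in items:
        for key, value in flatten(item).items():
            fields.setdefault(f"{parent}_{key}", []).append(value)
    return fields


def expand_horizontal(parent, items):
    """List style: a single field holding a list of flattened objects."""
    return {parent: [flatten(item) for item in items]}


# Trimmed sample, built as a Python structure so the JSON text is guaranteed valid.
raw = json.dumps({
    "AffectedItems": [
        {"Id": "Rg", "ParentFolder": {"Id": "Lg", "Path": "\\Inbox"}, "Subject": "Quotation"},
        {"Id": "RA", "ParentFolder": {"Id": "Lg", "Path": "\\Inbox"}, "Subject": "Notification"},
    ]
})

message = json.loads(raw)
vertical = expand_vertical("AffectedItems", message["AffectedItems"])
horizontal = expand_horizontal("AffectedItems", message["AffectedItems"])

print(vertical["AffectedItems_Subject"])
print(horizontal["AffectedItems"][1]["ParentFolder_Path"])
```

One trade-off to keep in mind: in the vertical form, array elements that lack a key (such as Attachments in the full sample) make the per-key lists differ in length, so values can no longer be matched across fields by position. The horizontal form keeps each element's values together.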

Steps to Reproduce (for bugs)


Office 365 Management Activity logs contain a nested JSON array object, AffectedItems, when:

  • Workload = Exchange
  • Operation = SoftDelete

Sample JSON with a nested JSON array object:

{"AffectedItems": [
{Attachments=image001.jpg (1234b); image002.jpg (1234b); image003.jpg (1234b), Id=Rg, InternetMessageId=DB@outlook.com, ParentFolder={Id=Lg, Path=\Inbox}, Subject=Quotation}, {Attachments=image003.jpg (1234b); image004.jpg (1234b), Id=Rg, InternetMessageId=AM@outlook.com, ParentFolder={Id=Lg, Path=\Inbox}, Subject=Test}, {Id=RA, InternetMessageId=B@xxxxxxx.com, ParentFolder={Id=Lg, Path=\Inbox}, Subject=Notification}, {Id=Rg, InternetMessageId=V@outlook.com, ParentFolder={Id=Lg, Path=\Inbox}, Subject=Documents}, {Attachments=image001.jpg (1234b); image002.jpg (1234b); image003.jpg (1234b), Id=Rg, InternetMessageId=K@outlook.com, ParentFolder={Id=Lg, Path=\Inbox}, Subject=Works}]}

Context


Cannot use Graylog for Office 365 Management Activity logs.

Your Environment

  • Graylog Version: v3.0.2+1686930
  • Elasticsearch Version: 6.7.2
  • MongoDB Version: 4.0.9
  • Operating System: CentOS-7.6.1810
  • Browser version: N/A
Labels: elasticsearch, feature, processing, triaged

All 6 comments

I can agree more natively handling nested JSON would be fantastic.

In case someone else lands here: I've seen that attempting a second-phase parse_json on the newly parsed nested field inside the pipeline fails. I believe this is because the first parse_json pass removes the double quotes around the key-value pairs. That only shows up in the pipeline debug output; the field value displayed in the UI has all the quotes and is in fact a nested field inside Elasticsearch. The KV function may work for you inside the pipeline as a workaround until there is support for handling this in the pipeline function, or for flattening the structure like the old extractor does.
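
The failure described here is consistent with the nested value reaching the second parser as a key=value string (the form the sample in the issue is shown in) rather than as JSON text. The following Python sketch only illustrates that such a string is not valid JSON. It says nothing about Graylog internals, and `is_json` is a hypothetical helper.

```python
import json

# One array element as valid JSON (values taken from the sample in the issue).
as_json = '{"Id": "RA", "Subject": "Notification"}'

# The same element in the unquoted key=value rendering.
as_map_string = "{Id=RA, Subject=Notification}"


def is_json(text):
    """Return True if the text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(is_json(as_json))
print(is_json(as_map_string))
```

A strict JSON parser rejects the second string because property names must be enclosed in double quotes, which is why a key-value parser is the more natural fit for that representation.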

I can agree more natively handling nested JSON would be fantastic.
Dear Graylog team, has there been any progress on handling nested JSON inside the pipeline?

Any update on this?
The fact that the quotes are dropped when you run the first parse_json function seems like a bug worth fixing. I've been banging my head against nested JSON in Graylog for too long now.

Dear Graylog team, are there any plans to add support for this feature into the pipeline extractor?

Hi team,
I would also like to have this feature.
