Loki: Filtering out specific JSON keys from logs

Created on 12 Sep 2019  ·  19 Comments  ·  Source: grafana/loki

I have a Kubernetes application that produces JSON-formatted logs. One of the keys makes sense only in a CLI environment: it's the same log message in text format, with ANSI escape sequences for terminal colors.

I'd like to filter out this field while submitting logs to Loki using promtail. I have found how to extract specific JSON keys in promtail, but not how to _filter out_ specific keys.

I have considered removing this field, and reconstructing the formatting later using a different CLI tool, but the conversion to JSON is lossy, so I can't do it.

As a data point, Filebeat has this functionality in its drop_fields processor.

Labels: component/agent, good first issue, help wanted, keepalive, kind/feature

All 19 comments

Hello, are you currently using promtail?

If you want to remove JSON content from the log line, you can already parse and change the root object using JMESPath (http://jmespath.org/) via our pipeline configuration: https://github.com/grafana/loki/blob/master/docs/logentry/processing-log-lines.md#example-without-source-1

Hello, are you currently using promtail?

Yes, and suffering from this key that ends up in Grafana and makes log lines unreadable in the UI.

If you want to remove json content from the log line you can already parse and change the root object using http://jmespath.org/

Of course I have tried that.

However, the JMESPath language has no map/filter function over objects, only over arrays, so I cannot figure out how to filter out a known key from an object.

That's why I wrote

I have found how to extract specific JSON keys in promtail, but not how to filter out specific keys

in the original bug description.
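To make the request concrete, here is a minimal Python sketch of the desired behavior outside promtail: parse the JSON log line, drop one known top-level key, and re-serialize the rest. It is only an illustration, not promtail code; the function name `drop_key`, the key name `pretty`, and the sample line are all hypothetical.

```python
import json


def drop_key(line: str, key: str) -> str:
    """Return the JSON log line with one top-level key removed."""
    entry = json.loads(line)
    entry.pop(key, None)  # ignore the key if it is absent
    return json.dumps(entry)


# Hypothetical log line: "pretty" stands in for the ANSI-colored text field.
line = '{"level": "info", "msg": "started", "pretty": "\\u001b[32mstarted\\u001b[0m"}'
print(drop_key(line, "pretty"))  # {"level": "info", "msg": "started"}
```

Extracting keys is the opposite operation: you list what to keep, which does not work when the set of other keys is not known in advance.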

I think we could add support for jq in the json stage, by default it will stay jmespath.

- json:
    parser: jq
    expressions:
      output: del(.time)
- output:
    source: output

/cc @slim-bean WDYT? The above example removes the key time at the root level of the JSON received.

I like it @cyriltovena 👍

Note that there is no Go implementation of jq :/

So I just tried forking jmespath to hack around the language, and I was able to implement a del function. It is very simple: https://github.com/jmespath/go-jmespath/commit/0ccdb0503953fed1b67d8724073becd4a1bd12aa

Given:

{
    "bar": {},
    "deleteme": "ok",
    "foo": [1, 2, 3, 4]
}

With the expression: del(@,'deleteme','foo')

Result:

{
    "bar": {}
}

It basically returns the selected object minus the variadic list of fields.

I'm wondering if that is enough, or if we should try to find another way to implement it using something like a search, e.g. del(@,foo.[?name='bar']). I'm open to suggestions.
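The semantics described above can be modeled in a few lines of Python. This is a sketch of the behavior only, not the Go code from the linked commit; the name `del_fields` is hypothetical, and it handles top-level field names only, not search expressions.

```python
import json


def del_fields(obj: dict, *fields: str) -> dict:
    """Return a copy of obj without the named top-level fields."""
    return {k: v for k, v in obj.items() if k not in fields}


doc = json.loads('{"bar": {}, "deleteme": "ok", "foo": [1, 2, 3, 4]}')
print(json.dumps(del_fields(doc, "deleteme", "foo")))  # {"bar": {}}
```

Returning a copy rather than mutating the input keeps the original object available to other expressions evaluated against the same log line.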

If I understand correctly, the issue is about being able to manipulate a JSON log message while keeping it in JSON-encoded format, so that the log "message" pushed to Loki is still JSON.

_The following assumes my previous sentence is correct._

Filebeat has this functionality in drop_fields processor.

To my understanding, Filebeat's drop_fields covers a different use case. In Filebeat, all fields are pushed to Elasticsearch, so you need a way to remove fields from the decoded JSON. In Loki, only the log message (see the output stage) and labels (see the labels stage) are pushed, while all intermediate extracted data is discarded at the end of the pipeline execution.

To keep it simple:

  • Filebeat: you decode the JSON, all decoded fields are implicitly pushed to Elasticsearch, so you remove specific fields with drop_fields
  • Loki: you decode the JSON, all decoded fields are in the intermediate extracted data, so you cherry-pick which fields should be labels (labels stage) and which one should be the log entry (output stage)
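The Loki side of this comparison can be sketched as a toy Python model. It is a simplification under stated assumptions, not promtail's API: `run_pipeline` and its parameters are hypothetical, and expressions are plain key names rather than JMESPath.

```python
import json


def run_pipeline(line, expressions, label_names, output_source):
    """Toy model: decode, extract chosen keys, then pick labels and output."""
    decoded = json.loads(line)
    # Only the keys named in `expressions` enter the extracted data.
    extracted = {name: decoded.get(key) for name, key in expressions.items()}
    labels = {name: extracted[name] for name in label_names}
    output = extracted[output_source]
    return labels, output  # everything else in `extracted` is discarded


line = '{"level": "warn", "msg": "disk almost full", "time": "2019-09-12T10:00:00Z"}'
labels, output = run_pipeline(line, {"level": "level", "msg": "msg"}, ["level"], "msg")
print(labels)  # {'level': 'warn'}
print(output)  # disk almost full
```

In this model nothing is pushed unless it is named explicitly, which is why dropping a single key from an otherwise unknown object needs a different mechanism.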

So I just tried to fork jmespath to hack around the language and I was able to implement a del function.

Good job! From a UX perspective, however, this may look a bit complex to people not used to JMESPath. Having an expressive way to achieve it (JMESPath) is good, but I'm wondering if we should offer a more intuitive way to do simple JSON manipulation (like dropping fields). A couple of alternative ideas:

  1. A json_transform stage
  2. A json_encode stage which re-encodes specific fields from the extracted data into JSON, though this may be tricky if we want to guarantee losslessness (data types)
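The losslessness concern in the second idea can be seen with any decode/re-encode round trip. The sketch below uses Python's standard json module purely as an illustration; the sample document is made up, and how a Go implementation behaves would depend on the decoder it uses.

```python
import json

original = '{"id": 1.0e2, "ratio": 0.10}'
reencoded = json.dumps(json.loads(original))

print(reencoded)  # {"id": 100.0, "ratio": 0.1}
print(reencoded == original)  # False
```

The values survive the round trip, but the text does not, so a re-encoding stage cannot promise a byte-identical log line.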

_P.S. The current json stage name is a bit unfortunate because it actually does "JSON decoding"._

del(@,'deleteme','foo')

This syntax solves my immediate problem.

A richer filter, e.g. being able to say something similar to @.[?name!='bar'], might be occasionally useful, but does not add much.

you decode the JSON, all decoded fields are in the intermediate extracted data

You actually have to choose what goes into the extracted map.

We can definitely run another stage. While this seems more elegant, it is less efficient, as we would probably decode the JSON twice. Unless this is taken into consideration in the implementation: we could share the map[string]interface{} across all json stages, though that feels hacky.

I also feel that most of the time, when you want to delete a JSON property, you probably also want to select some labels.

I'm not sure a new stage would be a better fit. Can we get more feedback here? @slim-bean @rfratto @joe-elliott WDYT?

I see 3 options:

jmespath extensions

- json:
    expressions:
      output: del(@,'time','foo')
      level: level
- output:
    source: output
- labels:
    level:

new stages

- json_transform:
    drop_fields: time,foo
- json:
    expressions:
      level: level
- labels:
    level:

new stage property

- json:
    drop_fields: time,foo
    expressions:
      level: level
      output: '@'
- output:
    source: output
- labels:
    level:

I feel like the last option is the nicest, though it is less flexible than the first one (with the first, you could do del(.nested,'foo') to project only the nested value). The big question is: do we need this flexibility?
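As a rough illustration of the last option, the Python sketch below models a stage that first drops a comma-separated list of fields and then evaluates the expressions. It is a toy under stated assumptions: `json_stage` is a hypothetical name, drop_fields is a proposed property rather than an existing one, and only plain key names and "@" (the whole object) are supported instead of real JMESPath.

```python
import json


def json_stage(line, drop_fields, expressions):
    """Toy model of the proposed stage: drop fields, then evaluate expressions."""
    decoded = json.loads(line)
    for field in drop_fields.split(","):
        decoded.pop(field.strip(), None)
    extracted = {}
    for name, expr in expressions.items():
        # "@" stands for the whole (already filtered) object, re-encoded as JSON.
        extracted[name] = json.dumps(decoded) if expr == "@" else decoded.get(expr)
    return extracted


line = '{"time": "t0", "foo": 1, "level": "info", "msg": "hi"}'
extracted = json_stage(line, "time,foo", {"level": "level", "output": "@"})
print(extracted["level"])   # info
print(extracted["output"])  # {"level": "info", "msg": "hi"}
```

Because the fields are dropped before any expression runs, the JSON is decoded once and both the label and the output come from the same filtered object.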

I've started learning Go, and to practice I'm interested in contributing to Loki; I'm looking for my first issue to solve. This one looks interesting and useful for many people. Can I work on this?

Excellent, please go ahead! I would recommend letting us know how you plan to design this feature first, based on what Marco and I suggested.

new stage property

- json:
    drop_fields: time,foo
    expressions:
      level: level
      output: '@'
- output:
    source: output
- labels:
    level:

I like this one; the new stage property approach looks good, and I will try to implement it. What's your opinion? Shall I research this? I am currently reading the source code to get an idea of how the pipelines work.

I think that works yeah! Let us know if you need help, we’re on the grafana slack.

@thedeveloperr - Any progress on this?
Any help with this issue will be highly appreciated.

I will be working on this over the weekends. Sorry for the delay. I am new to Golang, so I'm learning it along the way.

_It will be a really useful feature; waiting for it._

Is this still in the works? I'm able to assist

Finally here in 2.0
