I have a Kubernetes application that produces JSON-formatted logs. One of the keys makes sense only in a CLI environment: it's the same log message in text format, with ANSI escape sequences for terminal colors.
I'd like to filter out this field while submitting logs to Loki using promtail. I have found how to extract specific JSON keys in promtail, but not how to _filter out_ specific keys.
I have considered removing this field, and reconstructing the formatting later using a different CLI tool, but the conversion to JSON is lossy, so I can't do it.
As a data point, Filebeat has this functionality in drop_fields processor.
Hello, are you currently using promtail?
If you want to remove json content from the log line you can already parse and change the root object using http://jmespath.org/ via our pipeline configuration https://github.com/grafana/loki/blob/master/docs/logentry/processing-log-lines.md#example-without-source-1
Hello are you currently using promtail ?
Yes, and suffering from this key that ends up in Grafana and makes log lines unreadable in the UI.
If you want to remove json content from the log line you can already parse and change the root object using http://jmespath.org/
Of course I have tried that.
However JMESPath language does not have a map/filter function over objects, only over arrays, so I cannot figure out how to filter out a known key from an object.
That's why I wrote
I have found how to extract specific JSON keys in promtail, but not how to filter out specific keys
in the original bug description.
I think we could add support for jq in the json stage, by default it will stay jmespath.
- json:
    parser: jq
    expressions:
      output: del(.time)
- output:
    source: output
/cc @slim-bean WDYT? The above example removes the key time at the root level of the JSON received.
I like it @cyriltovena 👍
Note there is no jq golang implementation :/ .
So I just tried to fork jmespath to hack around the language and I was able to implement a del function. It is very simple https://github.com/jmespath/go-jmespath/commit/0ccdb0503953fed1b67d8724073becd4a1bd12aa .
Given:
{
  "bar": {},
  "deleteme": "ok",
  "foo": [1, 2, 3, 4]
}
With the expression: del(@,'deleteme','foo')
Result:
{
  "bar": {}
}
It basically returns the selected object minus the variadic list of fields.
I'm wondering if that is enough, or if we should find another way to implement it using something like a search, e.g. del(@,foo.[?name='bar']). I'm open to suggestions.
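To make the proposed semantics concrete, here is a minimal Go sketch of "return the object minus a variadic list of fields". It does not use go-jmespath and is not the code from the linked commit; the helper name del and its signature are assumptions chosen only to mirror the expression above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// del returns a shallow copy of obj without the named keys.
// Hypothetical helper for illustration; not the code from the linked fork.
func del(obj map[string]interface{}, keys ...string) map[string]interface{} {
	out := make(map[string]interface{}, len(obj))
	for k, v := range obj {
		out[k] = v
	}
	for _, k := range keys {
		delete(out, k) // deleting a missing key is a no-op
	}
	return out
}

func main() {
	input := []byte(`{"bar": {}, "deleteme": "ok", "foo": [1, 2, 3, 4]}`)

	var obj map[string]interface{}
	if err := json.Unmarshal(input, &obj); err != nil {
		panic(err)
	}

	result, err := json.Marshal(del(obj, "deleteme", "foo"))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(result)) // prints {"bar":{}}
}
```

Copying the map first keeps the input untouched, which matters if other expressions in the same stage still need the deleted keys.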
If I understand correctly, the issue is about being able to manipulate a JSON log message while keeping it in the JSON-encoded format, so that the log "message" pushed to Loki is still JSON.
_The following assumes my previous sentence is correct._
Filebeat has this functionality in drop_fields processor.
To my understanding, Filebeat drop_fields covers a different use case. In Filebeat all fields are pushed to ElasticSearch, so you need a way to remove fields from the decoded JSON. In Loki, only the log message (see output stage) and labels (see labels stage) are pushed, while all intermediate extracted data is discarded at the end of the pipeline execution.
To keep it simple:
drop_fields
labels stage) and which one should be the log entry (output stage)
So I just tried to fork jmespath to hack around the language and I was able to implement a del function.
Good job! From a UX perspective, however, this may look a bit complex to people not used to JMESPath. Having an expressive way to achieve it (JMESPath) is good, but I'm wondering if we should offer a more intuitive way to do simple JSON manipulation (like dropping fields). A couple of alternative ideas:
- a json_transform stage
- a json_encode stage which re-encodes specific fields from the extracted data into JSON, but this may be tricky if we want to guarantee it is lossless (data types)
_P.S. The current json stage name is a bit unlucky because it actually does "json decoding"._
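The data-type concern with re-encoding can be seen with Go's standard encoding/json alone. The sketch below is not Loki code; the reencode helper and the sample field name id are made up for illustration. Decoding into interface{} turns every JSON number into a float64, so a large integer can change when it is written back, while Decoder.UseNumber keeps the original digits.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// reencode decodes a JSON object and encodes it again.
// Hypothetical helper: with useNumber=false numbers pass through float64,
// with useNumber=true they are kept as json.Number (the original text).
func reencode(line string, useNumber bool) (string, error) {
	dec := json.NewDecoder(strings.NewReader(line))
	if useNumber {
		dec.UseNumber()
	}
	var obj map[string]interface{}
	if err := dec.Decode(&obj); err != nil {
		return "", err
	}
	out, err := json.Marshal(obj)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	line := `{"id":9007199254740993}` // 2^53 + 1, not representable as float64

	lossy, err := reencode(line, false)
	if err != nil {
		panic(err)
	}
	exact, err := reencode(line, true)
	if err != nil {
		panic(err)
	}

	fmt.Println(lossy) // prints {"id":9007199254740992}
	fmt.Println(exact) // prints {"id":9007199254740993}
}
```

Any stage that decodes and re-encodes the log line would need to make a choice like this one to avoid silently altering values.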
del(@,'deleteme','foo')
This syntax solves my immediate problem.
A richer filter, e.g. being able to say something similar to @.[?name!='bar'], might be occasionally useful, but does not add much.
you decode the JSON, all decoded fields are in the intermediate extracted data
You actually have to choose what goes into the extracted map.
We can definitely run another stage. While this seems more elegant, it is less efficient, as we would probably decode the JSON twice, unless this is taken into consideration in the implementation: we could share the map[string]interface{} across all json stages, though that feels hacky.
I also feel that most of the time, when you want to delete a JSON property, you probably also want to select some labels.
I'm not sure if a new stage would be a better fit, can we get more feedback here @slim-bean @rfratto @joe-elliott WDYT ?
I see 3 options:

- json:
    expressions:
      output: del(@,'time','foo')
      level: level
- output:
    source: output
- labels:
    level:

- json_transform:
    drop_fields: time,foo
- json:
    expressions:
      level: level
- labels:
    level:

- json:
    drop_fields: time,foo
    expressions:
      level: level
      output: '@'
- output:
    source: output
- labels:
    level:

I feel like the last option is the nicest, though it is less flexible than the first one (with the first you could do del(.nested,'foo') to project only the nested value). The big question is: do we need this flexibility?
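As a rough illustration of what a drop_fields property might do to a log line, here is a Go sketch that removes a comma-separated list of root-level keys. It is not the Loki implementation; the dropFields name and the "time,foo" spec format are assumptions taken from the proposal above. Values are kept as json.RawMessage so numbers are never converted, although key order and whitespace of the output may differ from the input.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// dropFields removes the comma-separated root-level keys listed in spec
// from a JSON object line. Hypothetical helper for illustration only.
func dropFields(line, spec string) (string, error) {
	// RawMessage defers decoding of the values, so they are re-emitted
	// without passing through float64 or other Go types.
	var obj map[string]json.RawMessage
	if err := json.Unmarshal([]byte(line), &obj); err != nil {
		return "", err
	}
	for _, key := range strings.Split(spec, ",") {
		delete(obj, strings.TrimSpace(key))
	}
	out, err := json.Marshal(obj) // map keys are emitted in sorted order
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	line := `{"time":"2020-02-18T12:24:00Z","foo":1,"level":"info","msg":"hello"}`
	out, err := dropFields(line, "time,foo")
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // prints {"level":"info","msg":"hello"}
}
```

This only handles root-level keys, which matches the third option; projecting a nested object, as del(.nested,'foo') would, needs the more expressive first option.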
I have started learning Golang, and to practice I am interested in contributing to Loki. I'm looking for my first issue to solve. This one looks interesting and useful for many people. Can I work on this?
Excellent, please go ahead! I would recommend that you first let us know how you plan to design this feature, based on what Marco and I suggested.
new stage property
- json:
    drop_fields: time,foo
    expressions:
      level: level
      output: '@'
- output:
    source: output
- labels:
    level:
I like this one; the new stage property approach looks good, and I will try to implement it. What's your opinion? Shall I research this further? I am currently reading the source code to get an idea of how the pipelines work.
I think that works yeah! Let us know if you need help, we’re on the grafana slack.
@thedeveloperr - Any progress on this?
Any help with this issue will be highly appreciated.
I will be working on this over the weekends. Sorry for the delay. I am new to Golang, so I am learning it along the way.
_It will be a really useful feature, waiting for it_
Is this still in the works? I'm able to assist
https://twitter.com/theperiklis/status/1311942934837764096?s=20
it will land by the end of the month.
Finally here in 2.0