Currently, the Kibana Logs UI needs a mechanism to rebuild the original message from events coming from Filebeat modules. This doesn't scale well: every time we add or update an integration, changes need to happen on the Kibana side to support it.
For this reason, in order to provide a good experience, we can change the current behavior of modules to always send the original log line. This would mean:
- Add a new keep_message setting to all modules (default: false). If the user sets it to true, do not drop the original message like we do today: https://github.com/elastic/beats/blob/1db397d2dc3d1b49a1923d219fb84843b8b0186d/filebeat/module/nginx/access/ingest/default.json#L77-L80
- In 8.0, switch the default to true
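To make the proposed behavior concrete, here is a minimal plain-Python sketch of the final pipeline step. It is only a stand-in: the real change would live in each module's ingest pipeline, and the function name `finish_pipeline` is made up for illustration.

```python
import copy

def finish_pipeline(event, keep_message=False):
    """Mimic the last pipeline step: drop `message` unless asked to keep it.

    keep_message=False mirrors the proposed initial default; the proposal is
    to flip that default to True in 8.0.
    """
    result = copy.deepcopy(event)  # leave the caller's event untouched
    if not keep_message:
        result.pop("message", None)
    return result

parsed = {"message": "GET /index.html 200", "http": {"response": {"status_code": 200}}}
print(finish_pipeline(parsed))                     # message dropped
print(finish_pipeline(parsed, keep_message=True))  # message kept
```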
We also have log.original, which contains the raw original log line, whereas message is the same line without the initial timestamp. We need to come up with a solution that accounts for both.
Prior related issues, for completeness' sake: #8950, #8083
I've done some testing, sending the NASA access logs from July 1995 and tweaking the Apache pipeline to allow keeping the message and log.original fields. In this example log.original is an exact copy of message, as the timestamp is not located at the beginning of the line:
filebeat.inputs:
# Don't keep anything
- type: log
  paths:
    - ./bench/NASA_access_log_Jul95.1
  index: filebeat-8.0.0-keep_nothing
- type: log
  paths:
    - ./bench/NASA_access_log_Jul95.2
  fields:
    keep_message: true
  fields_under_root: true
  index: filebeat-8.0.0-keep_message
- type: log
  paths:
    - ./bench/NASA_access_log_Jul95.3
  fields:
    keep_message: true
    keep_original: true
  fields_under_root: true
  index: filebeat-8.0.0-keep_all
setup.ilm.enabled: false
filebeat.overwrite_pipelines: true
setup.template.settings:
  index.number_of_shards: 1
  #index.codec: best_compression
  #_source.enabled: false
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["localhost:9200"]
  pipeline: filebeat-8.0.0-apache-access-default
After force merge:
yellow open filebeat-8.0.0-keep_all dGQKyg4TQROa9GvSHlxYhA 1 1 1891714 0 763.5mb 763.5mb
yellow open filebeat-8.0.0-keep_nothing NJinYOw9RYip_bb0nHc_Iw 1 1 1891714 0 310.1mb 310.1mb
yellow open filebeat-8.0.0-keep_message -iL4OMRrTyaO8x9iUmV3QQ 1 1 1891714 0 406.5mb 406.5mb
It seems to me that we should make an effort to store only one of log.original or message, if possible.
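For reference, the relative overhead can be worked out from the sizes listed above. This is a quick sketch; it assumes the `mb` unit in the listing is 1024-based and uses keep_nothing as the baseline.

```python
# Store sizes in MB, copied from the index listing (after force merge).
DOC_COUNT = 1_891_714
SIZES_MB = {
    "keep_nothing": 310.1,
    "keep_message": 406.5,
    "keep_all": 763.5,
}

def overhead(name, baseline="keep_nothing"):
    """Return (extra MB, extra percent, extra bytes per document) vs. the baseline."""
    extra_mb = SIZES_MB[name] - SIZES_MB[baseline]
    percent = 100.0 * extra_mb / SIZES_MB[baseline]
    # Assumption: 1 MB = 1024 * 1024 bytes in the listing.
    bytes_per_doc = extra_mb * 1024 * 1024 / DOC_COUNT
    return extra_mb, percent, bytes_per_doc

for name in ("keep_message", "keep_all"):
    extra_mb, percent, per_doc = overhead(name)
    print(f"{name}: +{extra_mb:.1f} MB (+{percent:.0f}%), ~{per_doc:.0f} bytes/doc")
```

In this data set, keeping message costs roughly 31% more storage than keeping nothing, and keeping both fields costs roughly 146% more.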
@weltenwort have we considered in the past using log.original plus perhaps an offset field that tells you where the timestamp ends and the message starts? I'm talking about the general case, leaving out the ones where the message is actually inside some JSON field or similar.
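A rough sketch of what that could look like on the consumer side. The field name `message_offset` is purely hypothetical (nothing like it exists in ECS today), and this only covers the general case where the message is a plain suffix of the original line.

```python
from typing import Tuple

def split_original(original: str, message_offset: int) -> Tuple[str, str]:
    """Split the raw line into (prefix, message) at the given character offset."""
    if not 0 <= message_offset <= len(original):
        raise ValueError("offset outside of the original line")
    return original[:message_offset], original[message_offset:]

# Made-up sample line: a timestamp prefix followed by the actual message.
TIMESTAMP_PREFIX = "2019-03-01T10:15:00Z "
event = {
    "log": {"original": TIMESTAMP_PREFIX + "connection refused by upstream"},
    # Hypothetical field: index of the first character of the message.
    "message_offset": len(TIMESTAMP_PREFIX),
}

prefix, message = split_original(event["log"]["original"], event["message_offset"])
print(repr(prefix), repr(message))
```

Only one copy of the line is stored this way, but the message part is not indexed on its own, so searching just the message would still need a separate answer.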
Noting also a previous conversation about this very same thing: https://github.com/elastic/beats/pull/8448
It seems we also have an event.original field for storing the original log message, hmm.
Ignoring the redundancy of log.original and event.original for now, for log viewing both an indexed version of message and the original version without the timestamp would be valuable. Using that combination the log entry can be searched for but also displayed correctly.
The full original would be valuable for lossless reindexing, which we don't have a UI for but probably want to have at some point.
> using log.original + perhaps an offset field that tells you where timestamp ends and the message starts

@exekias that's an interesting idea, which I haven't heard mentioned before. It shouldn't be a problem from the UI perspective, but I wonder how we would reasonably integrate that into ECS.
Just checked the status of existing filesets. 35 out of 91 report a message field, and many of those don't contain the original message but only a subset of it:
for f in $(find filebeat/module/*/*/test/*-expected* x-pack/filebeat/module/*/*/test/*-expected*); do grep \"message $f > /dev/null && echo $f | sed -e "s/x-pack\///" | cut -d/ -f3,4; done | uniq
apache/error
auditd/log
elasticsearch/audit
elasticsearch/deprecation
elasticsearch/gc
elasticsearch/server
elasticsearch/slowlog
icinga/debug
icinga/main
icinga/startup
kafka/log
kibana/log
logstash/log
logstash/slowlog
mongodb/log
mysql/error
nats/log
nginx/error
postgresql/log
redis/log
system/auth
system/syslog
activemq/audit
activemq/log
azure/signinlogs
cef/log
cisco/ftd
cisco/ios
coredns/log
envoyproxy/log
ibmmq/errorlog
misp/threat
mssql/log
rabbitmq/log
suricata/eve
One possible option to avoid big breaking changes is:
- [...] log.original in all filesets
- [...] message field to new filesets, as log.original should be enough.
- [...] message). For instance, show the first match in this list:
  - message field
  - log.original with maybe some stripping rules to remove the date
  - UI can compile the message from structured data (only existing modules implemented in the UI)
- 8.0: remove message field from filesets, UI should fallback to log.original

Thoughts?
Also pinging @urso @ruflin
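The "first match" fallback could be sketched roughly as below, assuming the order is message, then log.original with the date stripped, then a message compiled from structured fields. The timestamp regex and the `compile_from_fields` helper are illustrative assumptions, not what the Logs UI actually implements.

```python
import re

# Assumed stripping rule: a leading ISO 8601-like timestamp followed by whitespace.
LEADING_TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}\S*\s+")

def compile_from_fields(event):
    """Hypothetical stand-in for the per-module rendering rules in the UI."""
    method = event.get("http", {}).get("request", {}).get("method")
    status = event.get("http", {}).get("response", {}).get("status_code")
    path = event.get("url", {}).get("original")
    if method and path and status:
        return f"{method} {path} {status}"
    return None

def display_message(event):
    """Return the first available representation, in the assumed order."""
    if event.get("message"):
        return event["message"]
    original = event.get("log", {}).get("original")
    if original:
        # Strip at most one leading timestamp; leave the rest untouched.
        return LEADING_TIMESTAMP.sub("", original, count=1)
    return compile_from_fields(event)

print(display_message({"log": {"original": "2019-03-01T10:15:00Z connection refused"}}))
```

Note that the second and third branches show text that is not stored in any single indexed field, so what the user sees and what they can search for may differ.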
> UI can compile the message from structured data (only existing modules implemented in the UI)

This is what the Logs UI already does and it seems to be very unintuitive for our users. It also doesn't scale and makes search/highlighting very complicated.
> 8.0: remove message field from filesets, UI should fallback to log.original

As I wrote before, the UI wants to display the message without the timestamp, which is handled separately. As such, being able to rely on sensible message content would be better and would make the search experience more predictable.
Ok, I had a good chat with @weltenwort, we discussed some things:
- [...] log.original for now because we don't really need to index it for log reindexing. The message field needs to be searchable by users
- [...] log.original or the structured fields), but it all boils down to one problem: If we introduce this kind of magic, users will be confused when they try to search over the message they are seeing.

So I think the safe play here is the simplest (as usual):
- [...] message field for all modules, make it mandatory for the new ones
- [...] message)