Beats: Filebeat modules: Always send message by default

Created on 22 Nov 2019 · 10 Comments · Source: elastic/beats

Currently the Kibana Logs UI needs a mechanism to rebuild the original message from events coming from Filebeat modules. This doesn't scale well: every time we add or update an integration, changes need to happen on the Kibana side to support it.

For this reason, in order to provide a good experience, we can change the current behavior of modules to always send the original log line. This would mean:

Labels: Integrations, Investigate, breaking change, discuss, enhancement

All 10 comments

We also have log.original, which contains the raw original log line, whereas message is the same line without the initial timestamp. We need to come up with a solution that accounts for both.

Prior related issues, for completeness' sake: #8950, #8083

I've done some testing: I sent the NASA access logs from Jul 95 and tweaked the apache pipeline so that it can keep the message and log.original fields. In this example log.original is an exact copy of message, as the timestamp is not located at the beginning of the line:

filebeat.inputs:
  # Don't keep anything
  - type: log
    paths:
      - ./bench/NASA_access_log_Jul95.1
    index: filebeat-8.0.0-keep_nothing

  # Keep only the message field
  - type: log
    paths:
      - ./bench/NASA_access_log_Jul95.2
    fields:
      keep_message: true
    fields_under_root: true
    index: filebeat-8.0.0-keep_message

  # Keep both message and log.original
  - type: log
    paths:
      - ./bench/NASA_access_log_Jul95.3
    fields:
      keep_message: true
      keep_original: true
    fields_under_root: true
    index: filebeat-8.0.0-keep_all

setup.ilm.enabled: false
filebeat.overwrite_pipelines: true

setup.template.settings:
  index.number_of_shards: 1
  #index.codec: best_compression
  #_source.enabled: false

output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["localhost:9200"]
  pipeline: filebeat-8.0.0-apache-access-default

After force merge:

yellow open filebeat-8.0.0-keep_all     dGQKyg4TQROa9GvSHlxYhA 1 1 1891714 0 763.5mb 763.5mb
yellow open filebeat-8.0.0-keep_nothing NJinYOw9RYip_bb0nHc_Iw 1 1 1891714 0 310.1mb 310.1mb
yellow open filebeat-8.0.0-keep_message -iL4OMRrTyaO8x9iUmV3QQ 1 1 1891714 0 406.5mb 406.5mb
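To put the listing above in relative terms, here is a small arithmetic sketch. It assumes the last two columns are the store size and primary store size, as in the default `_cat/indices` output; the names `sizes_mb` and `overhead_pct` are only illustrative.

```python
# Store sizes in MB, copied from the listing above.
sizes_mb = {
    "keep_nothing": 310.1,
    "keep_message": 406.5,
    "keep_all": 763.5,
}


def overhead_pct(variant, baseline="keep_nothing"):
    """Return the percentage growth of `variant` relative to `baseline`."""
    return (sizes_mb[variant] / sizes_mb[baseline] - 1.0) * 100.0


for name in ("keep_message", "keep_all"):
    print(f"{name}: +{overhead_pct(name):.1f}% over keep_nothing")
```

For this dataset, keeping message alone costs roughly 31% more storage, while keeping both message and log.original costs roughly 146% more.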

It sounds to me like we should make an effort to store only one of log.original or message, if possible.

@weltenwort have we considered in the past using log.original plus perhaps an offset field that tells you where the timestamp ends and the message starts? I'm talking about the general case, leaving out the ones where the message is actually inside some JSON field or similar.
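A minimal sketch of that offset idea, purely for illustration. The field name `log.message_offset` is hypothetical (nothing like it exists in ECS today), and the offset is assumed to count characters rather than bytes.

```python
from typing import Tuple


def split_original(original: str, message_offset: int) -> Tuple[str, str]:
    """Split a raw log line into (prefix, message) at message_offset.

    message_offset is a character index into `original`; everything
    before it (typically the timestamp) is the prefix.
    """
    if not 0 <= message_offset <= len(original):
        raise ValueError("message_offset out of range")
    return original[:message_offset], original[message_offset:]


# Hypothetical event: only log.original is stored, plus the offset.
event = {
    "log": {
        "original": "2019-11-22T10:15:00Z ERROR disk full",
        "message_offset": 21,  # hypothetical field, not part of ECS
    }
}

prefix, message = split_original(
    event["log"]["original"], event["log"]["message_offset"]
)
print(repr(prefix), repr(message))
```

With this layout only one copy of the line is stored, but the derived message is not itself an indexed field, so searches would run against the full original line.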

Noting also a previous conversation about this very same thing: https://github.com/elastic/beats/pull/8448

It seems we also have an event.original field for storing the original log message, hmm.

Ignoring the redundancy of log.original and event.original for now: for log viewing, both an indexed version of message and the original version without the timestamp would be valuable. With that combination the log entry can be searched for and also displayed correctly.

The full original would be valuable for lossless reindexing, which we don't have a UI for but probably want to have at some point.

> using log.original + perhaps an offset field that tells you where timestamp ends and the message starts

@exekias that's an interesting idea, which I haven't heard mentioned before. It shouldn't be a problem from the UI perspective, but I wonder how we would reasonably integrate that into ECS.

I just checked the status of the existing filesets. 35 out of 91 report a message field; many of them don't contain the original message, only a subset of it:

for f in $(find filebeat/module/*/*/test/*-expected* x-pack/filebeat/module/*/*/test/*-expected*); do grep -q '"message' "$f" && echo "$f" | sed -e "s/x-pack\///" | cut -d/ -f3,4; done | uniq
apache/error
auditd/log
elasticsearch/audit
elasticsearch/deprecation
elasticsearch/gc
elasticsearch/server
elasticsearch/slowlog
icinga/debug
icinga/main
icinga/startup
kafka/log
kibana/log
logstash/log
logstash/slowlog
mongodb/log
mysql/error
nats/log
nginx/error
postgresql/log
redis/log
system/auth
system/syslog
activemq/audit
activemq/log
azure/signinlogs
cef/log
cisco/ftd
cisco/ios
coredns/log
envoyproxy/log
ibmmq/errorlog
misp/threat
mssql/log
rabbitmq/log
suricata/eve
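For anyone who wants to repeat the scan without the shell pipeline, here is a rough Python counterpart. It assumes the same directory layout as the command above and, like it, simply looks for the string `"message` in the expected-output files; the function name `filesets_with_message` is made up for this sketch. Unlike `uniq`, it deduplicates globally and returns a sorted list, so the ordering differs from the output above.

```python
import glob
import os


def filesets_with_message(repo_root="."):
    """Return sorted "module/fileset" names whose expected test output
    contains a key starting with "message."""
    patterns = [
        "filebeat/module/*/*/test/*-expected*",
        "x-pack/filebeat/module/*/*/test/*-expected*",
    ]
    found = set()
    for pattern in patterns:
        for path in glob.glob(os.path.join(repo_root, pattern)):
            if not os.path.isfile(path):
                continue
            with open(path, encoding="utf-8", errors="replace") as fh:
                if '"message' not in fh.read():
                    continue
            # Path relative to the repo: [x-pack/]filebeat/module/<module>/<fileset>/test/...
            parts = os.path.relpath(path, repo_root).replace(os.sep, "/").split("/")
            idx = parts.index("module")
            found.add(f"{parts[idx + 1]}/{parts[idx + 2]}")
    return sorted(found)


if __name__ == "__main__":
    for name in filesets_with_message("."):
        print(name)
```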

One possible option to avoid big breaking changes is:

  • 7.x: Always report log.original in all filesets
  • Also: Stop adding message field to new filesets, as log.original should be enough.
  • 7.x: Come up with a plan to use it from the UI (instead of message). For instance, show the first match in this list:

    • UI can compile the message from structured data (only existing modules implemented in the UI)

    • message field

    • log.original with maybe some stripping rules to remove the date

  • 8.0: remove message field from filesets, UI should fall back to log.original
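The "first match in this list" lookup proposed for the UI could be sketched roughly as follows. This is an illustration only, not how the Logs UI is implemented: the `builders` mapping, the use of `event.dataset` as its key, and the timestamp-stripping regex are all assumptions made for the example.

```python
import re

# Hypothetical stripping rule: drop one leading ISO 8601 style timestamp.
LEADING_TIMESTAMP = re.compile(
    r"^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?(?:Z|[+-]\d{2}:?\d{2})?\s+"
)


def get_path(event, dotted):
    """Look up a dotted field name such as "log.original" in a nested dict."""
    node = event
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def display_message(event, builders):
    """Return the text to display for an event, using the first match of:
    1. a per-dataset builder that compiles structured fields,
    2. the message field,
    3. log.original with the leading timestamp stripped.
    """
    builder = builders.get(get_path(event, "event.dataset"))
    if builder is not None:
        return builder(event)
    message = get_path(event, "message")
    if message is not None:
        return message
    original = get_path(event, "log.original")
    if original is not None:
        return LEADING_TIMESTAMP.sub("", original, count=1)
    return None


event = {"log": {"original": "2019-11-22T10:15:00Z ERROR disk full"}}
print(display_message(event, builders={}))
```

Note that in cases 1 and 3 the displayed text is not an indexed field, so a search for exactly what is shown on screen may not match.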

Thoughts?

Also pinging @urso @ruflin

> UI can compile the message from structured data (only existing modules implemented in the UI)

This is what the Logs UI already does, and it seems to be very unintuitive for our users. It also doesn't scale, and it makes search and highlighting very complicated.

> 8.0: remove message field from filesets, UI should fall back to log.original

As I wrote before, the UI wants to display the message without the timestamp, which is handled separately. As such, being able to rely on sensible message content would be better and would make the search experience more predictable.

OK, I had a good chat with @weltenwort and we discussed a few things:

  • We can forget about log.original for now, because we don't really need to index it for log reindexing. The message field needs to be searchable by users.
  • We could think of all kinds of tricks to show a message from somewhere else (including log.original or the structured fields), but it all boils down to one problem: if we introduce this kind of magic, users will be confused when they try to search over the message they are seeing.

So I think the safe play here is the simplest (as usual):

  • 7.7: Start sending the message field for all modules, and make it mandatory for new ones
  • At some point: the UI can rely on it and stop using its heuristics to reconstruct the message