Provide a Logstash filter that reads arbitrary ES ingest node pipeline definitions and applies them to Logstash events. To ensure that ES ingest node pipelines in Logstash produce the same results as in ES, the ES ingest node code will be hosted and executed inside Logstash.
Users would supply their ES ingest node pipeline definition(s) in JSON format as a configuration option to the Logstash filter (multiple pipelines could be specified because the pipeline processor would be supported). E.g.:
{
  "set_and_lower": {
    "processors": [
      {
        "set": {
          "field": "my_field1",
          "value": "FOO BAR BAZ"
        }
      },
      {
        "lowercase": {
          "field": "my_field1",
          "target_field": "my_field2",
          "ignore_missing": false
        }
      }
    ]
  },
  "rename_hostname": {
    "processors": [
      {
        "rename": {
          "field": "hostname",
          "target_field": "host",
          "ignore_missing": true
        }
      }
    ]
  }
}
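As a sketch of how such a filter might be invoked from a Logstash pipeline (the plugin name ingest_pipeline and its option names below are hypothetical, not a settled API):

filter {
  # Hypothetical filter that hosts the ES ingest node code;
  # definitions_file and pipeline are illustrative option names.
  ingest_pipeline {
    definitions_file => "/etc/logstash/ingest_pipelines.json"
    pipeline => "set_and_lower"
  }
}

With the definitions above, an event passing through this filter would get my_field1 set to "FOO BAR BAZ" and my_field2 set to "foo bar baz".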
The Logstash filter would apply the ingest node processors defined above to all events in the pipeline.
As of the ES 6.7.0 release, all ingest node processors, including user_agent and geoip, would be supported, with the single exception of the set_security_user processor, which provides security functionality that is relevant only within the ES context. The current proposal is to override set_security_user to be a no-op when run within Logstash.
What advantages would this provide over the existing ingest node pipeline converter tool?
@praseodym, the biggest advantages are probably that unlike the converter tool, there are currently no limitations on the ingest node pipelines that could be run in Logstash -- all processors including user_agent and geoip are supported. There are also no differences in behavior between the Logstash Ruby implementations and ES Java implementations of the various operators such as grok because it is the actual ES ingest node code that is running within Logstash. Additionally, some people may consider it an advantage that the ES ingest node pipeline could be run directly in Logstash without having to go through an intermediate conversion step.
@danhermann @jsvd @jakelandis updated the original issue with the plugin & ES compatibility strategy that we decided on today. I've also removed the open question around dependencies as it's now been resolved.
@praseodym the ingest-converter tool was last updated two years ago, as far as I can see. It also has some bugs (try, for example, converting the Kafka ingest pipeline: https://github.com/elastic/beats/blob/master/filebeat/module/kafka/log/ingest/pipeline.json).
On the other hand, maintaining two different implementations (Java and Ruby) of the same operators could lead to different behaviours.
Lastly, the effort to migrate the Filebeat Logstash pipeline filters between major versions of the Elastic Stack is very cumbersome (take, for example, the breaking changes between 6.x and 7.x regarding renamed Filebeat fields: https://www.elastic.co/guide/en/beats/libbeat/7.0/breaking-changes-7.0.html#_field_name_changes).
+1 for off-loading workload to Logstash. The reason we have a Logstash cluster is to move data-processing work away from Elasticsearch.
As a user, it's awesome to have default out-of-the-box data processing for every module at hand that I don't have to maintain.
If I were to use the conversion tool, I would have to check for differences in the module's ingest code on EVERY update. And since the tool has a lot of limitations, that gets out of hand very quickly.
One option I would love to see from this filter is the ability to specify a pipeline to be read and processed. From there, I could further enhance the result of the ingest pipeline. E.g., if I were to use the Netflow Filebeat module:
input {
  beats {
    port => 5000
  }
}

filter {
  # Read and process the built-in Ingest Pipeline for this module
  # (The pipeline that comes from:
  # "filebeat setup --pipelines --modules netflow ...")
  pipeline {
    name => "%{[@metadata][pipeline]}"
  }

  # Lookup the tcp_flags in the YML file,
  # based on fields from the Ingest Pipeline
  translate {
    dictionary_path => "/etc/logstash/patterns/tcp_flags.yml"
    field => "[netflow][tcp_control_bits]"
    destination => "[netflow][tcp_flags]"
  }
}

output {
  elasticsearch {
    hosts => ["https://elasticsearch:9200/"]
    user => "user"
    password => "password"
    ssl => true
    index => "my_index"
  }
}
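For readers unfamiliar with the translate filter used above: the dictionary could also be supplied inline rather than via dictionary_path. A minimal sketch, where the mapping itself is illustrative (the keys are standard TCP control-bit values):

translate {
  field => "[netflow][tcp_control_bits]"
  destination => "[netflow][tcp_flags]"
  # Illustrative mapping of TCP control-bit values to flag names
  dictionary => {
    "2"  => "SYN"
    "16" => "ACK"
    "18" => "SYN-ACK"
  }
}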
%{[@metadata][pipeline]} could be stored in Elasticsearch (as it is now), but read and processed by Logstash.
I have no idea how portable the Ingest Node code is (from Elasticsearch), but making Logstash nodes eligible as Ingest Nodes too would remove the need to maintain two different implementations of the same thing (Ruby vs. Java).
This would also remove possible bugs and behaviour differences between the two implementations.
This is even more important given that Beats are starting to do more processing via an Elasticsearch ingest node out of the box. This means that when a Beat forwards to Logstash, Logstash doesn't actually have the final copy of the message, as the ingest node will transform the message further.
In the past this wasn't the case and it was trivial to have logstash forward the message to multiple destinations (ES, Kafka, File, Console) and each would end up with the exact same data.
Unfortunately, now that ES is doing the transform, we cannot use Logstash itself to forward the incoming parsed and processed log to multiple destinations.
If we could have logstash perform the more complex transforms by running the exact ES pipeline that ES would have run that would solve our issue.
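To make the fan-out scenario concrete, a minimal sketch of what such a configuration could look like once the ES pipeline runs inside Logstash. The ingest_pipeline filter name and its options are hypothetical (as above), and the hostnames, topic, and paths are placeholders:

filter {
  # Hypothetical filter running the exact ES ingest pipeline in Logstash
  ingest_pipeline {
    definitions_file => "/etc/logstash/ingest_pipelines.json"
    pipeline => "%{[@metadata][pipeline]}"
  }
}

output {
  # Every destination now receives the same fully processed event
  elasticsearch {
    hosts => ["https://elasticsearch:9200"]
  }
  kafka {
    bootstrap_servers => "kafka:9092"
    topic_id => "parsed-logs"
  }
  file {
    path => "/var/log/logstash/parsed-logs.json"
  }
}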