Logstash: Adopting ECS in Logstash

Created on 14 Nov 2019 · 8 comments · Source: elastic/logstash

The Elastic Common Schema (ECS) was introduced by Elastic in February 2019 to facilitate the use of data across multiple components of the Elastic Stack and to standardize external data source formats.

The schema, along with documentation and examples, is being maintained at elastic/ecs.

While products such as Beats and Kibana can either produce or consume data in ECS format, Logstash has yet to support it, even though there are plenty of community requests for it, such as:

The adoption has been discussed in a few places before, and the current main ideas are to approach it on two fronts:

a) Add toggleable "ecs_compatibility" behaviours to all plugins that produce fixed schemas, such as the geoip/useragent filters and the http/tcp inputs.
b) Create a new filter that outputs ECS-compatible events but lets the user perform most of the wiring between the source schema and ECS.
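As a sketch of option (a), the toggle on a geoip filter might look something like this. Note that `ecs_compatibility` is the proposed option, not a shipped setting, and the field names shown are illustrative:

```
filter {
  geoip {
    source            => "[source][ip]"
    # Proposed toggle: when enabled, emit ECS field names such as
    # [source][geo][country_iso_code] instead of the legacy geoip.* layout.
    ecs_compatibility => "v1"
  }
}
```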

Another idea that was considered was having an Elasticsearch template that adds ECS fields as aliases, as was done in elastic/beats. However, for Logstash the source schema is highly unpredictable, which reduces the benefit of this tactic.

Labels: SIEM, elastic-common-schema, meta

Most helpful comment

Users landing on this issue can also look at more recent issues https://github.com/elastic/logstash/issues/11623 and https://github.com/elastic/logstash/issues/11635 🙂

All 8 comments

Pinging @tsg @megatrontony @mchopda for awareness. This is targeting the 7.x release. cc @jsvd

Ping @webmat ;-)

I was about to open the same issue 😄 Here are a few ideas.

Documentation

I've recently discussed going over the docs with @karenzone, and I think we could do a few things at that level:

  • Replacing examples to be more in line with ECS. Could be as simple as changing the field name in the example.
  • Addressing predictable problems head on, perhaps with a Logstash docs page specifically about ECS, covering:
    • How to install the correct template
      • If users are getting a "field data" error, they probably forgot to install it
    • The two multi-field conventions:
      • ES and Logstash do:
        • `text` on the canonical field (e.g. `myfield`)
        • `keyword` on the multi-field `myfield.keyword`
      • ECS went with the reverse convention, to be in line with Beats:
        • `keyword` on the canonical field (e.g. `myfield`)
        • `text` on the multi-field (e.g. `myfield.text`)
        • To be precise, ECS doesn't have any multi-fields yet, but they are coming
    • Mapping conflicts on fields such as `host` (Logstash), `source` (Beats 6), etc.
    • Having more examples with nested fields, which are a requirement for ECS, e.g. showing groks with brackets for the nesting
    • Not sure if the docs talk about discuss.elastic.co. If they do, users should still create their posts in the Logstash section, but also consider applying the elastic-common-schema tag if the question is about ECS :-) (I just tagged 10 of them this way)
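For reference, the two multi-field conventions look like this as index mappings (field names are illustrative, shown side by side in one mapping for contrast):

```json
{
  "mappings": {
    "properties": {
      "myfield_logstash_style": {
        "type": "text",
        "fields": { "keyword": { "type": "keyword" } }
      },
      "myfield_ecs_style": {
        "type": "keyword",
        "fields": { "text": { "type": "text" } }
      }
    }
  }
}
```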

Template

Logstash could offer a flag (off by default, for backwards compatibility) to install the latest ECS template instead of the default Logstash template.

We also need to think about how to name the resulting index: should it still be logstash-*, or should we come up with a second convention?

We try to release ECS a few weeks before stack releases, which should leave time to update the Logstash ECS template. Note that the sample templates in the ECS repo are meant for experimentation rather than production (see their settings and index_patterns), so they shouldn't be used as-is. But producing a template ready to use as-is for Logstash is easy. Happy to help with that.
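As a sketch, such a flag could map onto the template options the elasticsearch output already has; the path, template name, and index pattern below are illustrative, not a decided convention:

```
output {
  elasticsearch {
    hosts              => ["localhost:9200"]
    # Hypothetical second index convention; the naming question above is open.
    index              => "ecs-logstash-%{+YYYY.MM.dd}"
    # Hypothetical path to a production-ready ECS template shipped with Logstash.
    template           => "/etc/logstash/ecs-template.json"
    template_name      => "ecs-logstash"
    template_overwrite => true
  }
}
```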

grok patterns

We should not change the existing grok patterns, obviously.

But we should publish new ones for each grok pattern that has field names in it. People are looking for them.
Having `ECS_SYSLOGBASE` right beside `SYSLOGBASE` would solve the need, for example.

I did some analysis a while ago (use this to extract field names from groks) to figure out which groks had field names in them, and whether they would cause a mapping conflict. At the time I counted 800+ field names in the groks. So this will probably need to be done gradually :-)
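As an illustration of that kind of analysis, here is a small standalone sketch (not the original script) that pulls the captured field names out of a grok expression:

```python
import re

# Matches the field name in grok captures like %{WORD:field} or
# %{WORD:[nested][field]}; bare pattern references like %{SYSLOGFACILITY}
# have no colon and are skipped.
GROK_CAPTURE = re.compile(r"%\{[A-Z0-9_]+:([^:}]+)")

def grok_field_names(pattern: str) -> list[str]:
    """Return the field names captured by a grok expression, in order."""
    return GROK_CAPTURE.findall(pattern)

# The stock SYSLOGBASE pattern names two fields at the top level
# (SYSLOGPROG adds more once expanded):
syslogbase = ("%{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?"
              "%{SYSLOGHOST:logsource} %{SYSLOGPROG}:")
print(grok_field_names(syslogbase))  # ['timestamp', 'logsource']
```

Running this over the shipped pattern files would give the list of grok patterns that need an ECS-flavored sibling.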

plugins

Totally agree we should adjust some plugins to offer the option to output using the ECS field names. :+1:

I had laid down some thoughts about creating a "mass rename" plugin in logstash#9768, which would remove the need for long series of mutate/rename + type coercion and so on. That issue is no longer quite up to date, so if/when someone wants to start on this, please ping me :-)
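For context, the wiring such a plugin would replace currently looks like a hand-written mutate chain per source schema; the source field names here are illustrative:

```
filter {
  mutate {
    # One rename entry per source field, repeated for every schema...
    rename => {
      "src_ip"   => "[source][ip]"
      "src_port" => "[source][port]"
      "dst_ip"   => "[destination][ip]"
      "dst_port" => "[destination][port]"
    }
    # ...plus the type coercion ECS expects.
    convert => {
      "[source][port]"      => "integer"
      "[destination][port]" => "integer"
    }
  }
}
```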

call to action

Please feel free to:

  • loop me in on discussions around this, happy to join the appropriate meetings
  • ping me on any ECS-related issue

Really looking forward to this! ❤️

In the Beats stack we have two processors, community_id & registered_domain, which look for certain criteria and output the relevant ECS fields. Can we have these added to Logstash as well? Right now I don't see any way of duplicating these processors in Logstash without extensive scripting knowledge, and they're very beneficial to have, especially the community_id field. I have multiple firewall logs being sent to my Logstash for parsing, which are being output to the correct ECS fields.

If we could perhaps use the same terminology and say something like:

```yaml
processors:
  - community_id:
```

Logstash would then spit out the community_id, provided the prerequisite fields are in the correct ECS fields. Otherwise, we can simply define them:

```yaml
processors:
  - community_id:
      fields:
        source_ip: my_source_ip
        source_port: my_source_port
        destination_ip: my_dest_ip
        destination_port: my_dest_port
        iana_number: my_iana_number
        transport: my_transport
        icmp_type: my_icmp_type
        icmp_code: my_icmp_code
      target: network.community_id
```

Perhaps we can also do the same for source.bytes and destination.bytes; simplifying that would be great as well, since at the moment it also requires scripting in order to populate the network.bytes field.
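For anyone wanting to prototype this (e.g. in a ruby filter or a new plugin) before a built-in exists, the Community ID v1 algorithm itself is small: a seeded SHA-1 over the direction-normalized flow tuple, base64-encoded with a "1:" version prefix. A minimal sketch, IPv4 only, following the published spec:

```python
import base64
import hashlib
import socket
import struct

def community_id_v1(saddr: str, daddr: str, sport: int, dport: int,
                    proto: int = 6, seed: int = 0) -> str:
    """Community ID v1 for an IPv4 flow (proto 6 = TCP, 17 = UDP)."""
    # Each endpoint is (4-byte address || 2-byte port, big-endian).
    src = socket.inet_aton(saddr) + struct.pack("!H", sport)
    dst = socket.inet_aton(daddr) + struct.pack("!H", dport)
    # Order endpoints so A->B and B->A hash to the same value.
    if src > dst:
        src, dst = dst, src
    # Layout: seed(2) | saddr(4) | daddr(4) | proto(1) | pad(1) | sport(2) | dport(2)
    data = (struct.pack("!H", seed) + src[:4] + dst[:4] +
            struct.pack("BB", proto, 0) + src[4:] + dst[4:])
    return "1:" + base64.b64encode(hashlib.sha1(data).digest()).decode("ascii")
```

Both directions of a flow produce the same ID, which is the point of the scheme: `community_id_v1("1.2.3.4", "5.6.7.8", 1122, 3344)` equals `community_id_v1("5.6.7.8", "1.2.3.4", 3344, 1122)`.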

@Aqualie Note that in the meantime, if your Logstash sends directly to Elasticsearch, you can configure your ES output to send to an Elasticsearch ingest pipeline. This will let you compute the community ID anyway, until a Logstash filter is created for this.
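Concretely, the elasticsearch output already has a `pipeline` option for this; the pipeline name below is hypothetical, and the ingest pipeline itself (with the enrichment logic) has to be created in Elasticsearch separately:

```
output {
  elasticsearch {
    hosts    => ["localhost:9200"]
    # Route events through a pre-created ingest pipeline that adds
    # network.community_id before indexing.
    pipeline => "community-id"
  }
}
```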

I've had this issue with an upstream grok pattern (RT_FLOW/JUNOS) creating a field called "event". I've fixed the pattern (in a fork), but how are we handling grok patterns and ECS? It's not easy to toggle between ECS and non-ECS in a pattern.

I'm not sure if I should open a merge request or not, as my fix was to put everything under a subfolder (ECS style), but that isn't what is done for non-ECS patterns.

```
{"type"=>"mapper_parsing_exception", "reason"=>"object mapping for [event] tried to parse field [event] as object, but found a concrete value"}
```

