Vector: ECS log schema support

Created on 23 Apr 2020 · 3 Comments · Source: timberio/vector

Hi Vector team, a general question: how can we apply the Elastic Common Schema (ECS) to Vector data before writing it to Elasticsearch?

data model logs processing remap idea approval enhancement

All 3 comments

@raghu999 great question! Vector's schema assumptions are currently very simple. Common field names can be controlled via the global `log_schema` options. Outside of that, your best bet is to use the `rename_fields` transform to make your data match that schema.

But I really like the idea of Vector defining a more explicit schema around all fields, specifically the fields added by transforms like `ec2_metadata` and `geoip`. All of that should be customizable in a global sense.

Our current pipeline also tries to comply with ECS before writing data to Elasticsearch.

Considering the following log message, our pipeline looks like this:

2020-13-10T10:01:23Z - 12345 - INFO - My.Namespace.Component || My log message

A first `regex_parser` stage extracts the individual raw parts from the log message. After parsing, the LogEvent looks like this:

| Field | Value |
| ------------- | :-------------:|
| log_timestamp | 2020-13-10T10:01:23Z |
| log_thread_id | 12345 |
| log_level | INFO |
| log_logger | My.Namespace.Component |
| log_message | My log message |
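The thread does not show the pattern used in the `regex_parser` stage. As a rough illustration only, here is a Python sketch (not Vector configuration) of a regular expression that would produce the fields in the table above; the pattern and the `parse_line` helper are assumptions made for this example.

```python
import re

# Hypothetical pattern: the real regex_parser pattern is not given in the thread.
# Named groups mirror the raw field names from the table above.
LOG_PATTERN = re.compile(
    r"^(?P<log_timestamp>\S+) - (?P<log_thread_id>\d+) - (?P<log_level>\w+)"
    r" - (?P<log_logger>\S+) \|\| (?P<log_message>.*)$"
)


def parse_line(line: str) -> dict:
    """Return the raw fields as strings, or an empty dict if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else {}


fields = parse_line(
    "2020-13-10T10:01:23Z - 12345 - INFO - My.Namespace.Component || My log message"
)
```

Every value stays a string at this point, which is why a later stage is needed to convert the thread id and timestamp.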

We then use a combination of `rename_fields` and `lua` transforms (to parse the thread id and timestamp) to rename the fields according to ECS.

Our final LogEvent will look like this:

| Field | Value |
| ------------- | :-------------:|
| @timestamp | 2020-13-10T10:01:23Z |
| process.thread.id | 12345 |
| log.level | INFO |
| log.logger | My.Namespace.Component |
| message | My log message |
| host.name | node01 |
| log.original | 2020-13-10T10:01:23Z - 12345 - INFO - My.Namespace.Component || My log message |
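The renaming step amounts to a lookup from raw field names to ECS field names, plus a type conversion and two extra fields. The Python sketch below illustrates that idea outside of Vector. The `ECS_FIELD_MAP` table and the `to_ecs` helper are hypothetical names, not part of the pipeline described here, and the sketch assumes the host name and original line are supplied by the caller. Timestamp parsing is left out, so `@timestamp` stays a string.

```python
# Raw field name -> ECS field name, following the two tables above.
ECS_FIELD_MAP = {
    "log_timestamp": "@timestamp",
    "log_thread_id": "process.thread.id",
    "log_level": "log.level",
    "log_logger": "log.logger",
    "log_message": "message",
}


def to_ecs(raw: dict, original_line: str, host_name: str) -> dict:
    """Rename raw fields to flat ECS-style keys and attach host and original line."""
    event = {ECS_FIELD_MAP[key]: value for key, value in raw.items() if key in ECS_FIELD_MAP}
    # The thread id arrives as a string from the regex stage; store it as an integer.
    if "process.thread.id" in event:
        event["process.thread.id"] = int(event["process.thread.id"])
    event["host.name"] = host_name
    event["log.original"] = original_line
    return event


line = "2020-13-10T10:01:23Z - 12345 - INFO - My.Namespace.Component || My log message"
raw = {
    "log_timestamp": "2020-13-10T10:01:23Z",
    "log_thread_id": "12345",
    "log_level": "INFO",
    "log_logger": "My.Namespace.Component",
    "log_message": "My log message",
}
event = to_ecs(raw, line, host_name="node01")
```

Keeping the mapping in one table makes it easy to see which ECS fields are covered and to extend the list as new raw fields appear.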

Hope that helps

Thanks, @oktal, that's helpful. We are actively outlining first-class support for schemas like ECS. We hope to get the initial versions out this quarter (#3910). It'll likely start with more control over field mapping at the source and sink level and then progress into formal support for the schemas.
