Beats: Transition Beats to ECS

Created on 19 Oct 2018  ·  14 Comments  ·  Source: elastic/beats

With 7.0, Beats will transition to ECS (https://github.com/elastic/ecs). This meta issue tracks all changes needed in Beats. The list will be extended over time.

Migration Strategy

The overall migration strategy is to add an alias layer to 7.x, which is opt-in, to be backward compatible with 6.x data if needed. For some of the core fields used in the Infra / Logging UI, aliases are introduced in 6.x for the 7.x data.
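As a rough illustration of what such an alias layer looks like on the Elasticsearch side, the sketch below builds a mapping fragment in which old field names become fields of type alias that point at the new ECS names. The helper name alias_mapping and the RENAMES table are hypothetical and only use two renames mentioned in this issue; the actual alias definitions in Beats are generated from its own field files and may differ.

```python
# Minimal sketch: build an Elasticsearch "properties" tree in which each old
# field name is an alias pointing at its new ECS name.
# `alias_mapping` and `RENAMES` are hypothetical names used for illustration.

def alias_mapping(renames):
    """Return a nested mapping fragment with one alias field per old name."""
    properties = {}
    for old_name, new_name in renames.items():
        node = properties
        parts = old_name.split(".")
        # Walk (or create) the object levels for dotted names like "fileset.module".
        for part in parts[:-1]:
            node = node.setdefault(part, {"properties": {}})["properties"]
        # The leaf is an alias field whose path is the new ECS field.
        node[parts[-1]] = {"type": "alias", "path": new_name}
    return properties


# Two renames taken from the checklist in this issue.
RENAMES = {
    "offset": "log.offset",
    "fileset.module": "event.module",
}

mapping = alias_mapping(RENAMES)
```

With such a mapping in place, a query against the old name (for example offset) would be resolved by Elasticsearch against the new field (log.offset).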

6.x (6.6 / 6.7)

7.0

  • [x] Remove old fields which had a 1-1 mapping like beat.hostname
  • [x] Make agent.* overwritable for apm-server https://github.com/elastic/beats/pull/9952
  • [x] Make sure all aliases from the migration contain the migrate: * flag

Fields changes

Libbeat adjustments

Beats processors

  • [x] Review all processors for necessary changes, and list changes required below
  • [x] add_cloud_metadata https://github.com/elastic/beats/pull/9265

    • [x] Remove nesting under meta.*, cloud.* should be at the top level.

    • [x] Rename to cloud.instance.id

    • [x] Rename to cloud.machine.type

    • [x] ECS doesn't have cloud.project_id or project.id. Should we address this in ECS or leave project_id as is? No

  • [x] add_docker_metadata https://github.com/elastic/beats/pull/9412
  • [x] Decide what to do with docker.container.labels as alias does not work here (object)

    • Decision is we don't migrate the labels, it's a breaking change.

  • [x] add_host_metadata (fields already in ECS schema, some additional fields like build or codename exist which is ok)
  • [x] add_locale https://github.com/elastic/beats/pull/9458
  • [x] add_process_metadata https://github.com/elastic/beats/pull/9949

    • the names seem to match very well, but some fields are missing from ECS. We should add them for 7.0.0 and make sure Beats is in sync. (additional fields are ignored for now)

  • [x] dns (no option default values to change)

Auditbeat

  • [x] Add missing ECS field defs used by Auditbeat elastic/beats#9318
  • [x] Review current Auditbeat GA modules for ECS compatibility elastic/beats#10111

    • [x] Perform minor changes outlined in this review

  • [x] Review new SecOps Auditbeat modules for ECS compatibility

Filebeat

  • [x] The redis input has a read_timestamp which should be changed to event.created elastic/beats#9924

Filebeat modules

  • [x] Rename fileset.name to event.dataset https://github.com/elastic/beats/pull/8879
  • [x] Rename fileset.module to event.module https://github.com/elastic/beats/pull/8879
  • [x] Convert source field to ECS https://github.com/elastic/beats/pull/8902
  • [x] Rename offset to log.offset https://github.com/elastic/beats/pull/8923
  • [x] Rename source_ecs to source elastic/beats#8983
  • [ ] How do we migrate lower case for http.request.method?
  • [x] Changes likely to affect multiple modules at once

    • [x] Output timestamp when Filebeat read an event to event.created, and not read_timestamp elastic/beats#10139

    • [x] Use [source|destination].address for the ambiguous address (prior to parsing an IP, socket, domain) everywhere elastic/beats#10141

    • [x] Transition HTTP size and timing metrics to use ECS fields elastic/beats#10188

    • [x] [Optional] Finish event duration migration: remove all old fields, mention them in ecs-migration as alias: false and with scale:, use the shared Ingest Node code, to reduce compilations. elastic/beats#10274

    • [x] Finish transition to ECS of the user_agent output: get rid of all the field renames in the pipelines. #10472, #10441

    • [x] Remove deprecated field url.hostname. #10469
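To make the effect of the renames in this list concrete, here is a small sketch that applies a few of them to a single event represented as a nested dict. It is only an illustration under assumptions: the function names and the sample event are hypothetical, and in Beats the renames are applied in the Beats code and ingest pipelines rather than by a helper like this.

```python
# Minimal sketch: move values from old field paths to their ECS paths in a
# nested event dict. All helper names and the sample event are hypothetical.

def pop_path(event, dotted):
    """Remove and return the value at a dotted path, or None if it is absent."""
    parts = dotted.split(".")
    node = event
    for part in parts[:-1]:
        node = node.get(part)
        if not isinstance(node, dict):
            return None
    return node.pop(parts[-1], None)


def set_path(event, dotted, value):
    """Set a value at a dotted path, creating intermediate objects as needed."""
    parts = dotted.split(".")
    node = event
    for part in parts[:-1]:
        node = node.setdefault(part, {})
    node[parts[-1]] = value


def to_ecs(event, renames):
    """Apply each old -> new rename to the event in place and return it."""
    for old, new in renames.items():
        value = pop_path(event, old)
        if value is not None:
            set_path(event, new, value)
    return event


# Renames mentioned in the checklist above.
RENAMES = {
    "fileset.module": "event.module",
    "offset": "log.offset",
    "read_timestamp": "event.created",
}

old_event = {
    "offset": 120,
    "read_timestamp": "2019-01-01T00:00:00Z",
    "fileset": {"module": "nginx", "name": "access"},
}
new_event = to_ecs(old_event, RENAMES)
```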

Filebeat Module migrations

  • [x] apache2

    • [x] access elastic/beats#9245

    • [x] error elastic/beats#8963

  • [x] auditd: log elastic/beats#10192
  • [x] elasticsearch: audit, deprecation, gc, server, slowlog elastic/beats#9293
  • [x] haproxy: log elastic/beats#9117
  • [x] icinga: debug, main, startup elastic/beats#9294
  • [x] iis

    • [x] access elastic/beats#9084

    • [x] error elastic/beats#9955

  • [x] kafka: log elastic/beats#9297
  • [x] kibana: log elastic/beats#9301
  • [x] logstash: log, slowlog elastic/beats#9935
  • [x] mongodb: log elastic/beats#10009
  • [x] mysql: error, slowlog elastic/beats#10008
  • [x] nginx

    • [x] access elastic/beats#9081

    • [x] error elastic/beats#10007

  • [ ] osquery: result elastic/beats#10088
  • [x] postgresql: log elastic/beats#9308
  • [x] redis: log, slowlog elastic/beats#9315
  • [x] system:

    • [x] auth elastic/beats#9138

    • [x] syslog elastic/beats#9135

  • [x] Update Suricata vs Mike's spreadsheet elastic/beats#10006
  • [x] traefik: access elastic/beats#9005
  • [x] Revisit all modules doing int coercions in Grok, to see if we need to coerce using :long instead
    elastic/beats#9598
  • [x] Add service.type to modules: https://github.com/elastic/beats/pull/10042

Metricbeat modules

Packetbeat

Journalbeat

Heartbeat

Winlogbeat

  • [x] Any changes to Winlogbeat needed? #10333

Varia

  • [x] Temporary fix for dashboards elastic/beats#9031
  • [x] Populate ecs.version in all relevant places https://github.com/elastic/beats/pull/9284
  • [x] Finish the transition of user_agent parsing to ECS for all web access logs.
  • [x] Better representation of field aliases in the documentation elastic/beats#9269
  • Part 2 to improve alias representation in docs elastic/beats#9288 (can also happen later)

See also all issues tagged "ecs"

Others

Open questions:

  • Should we rename co.elastic.logs/fileset to co.elastic.logs/dataset for autodiscovery (@exekias )
  • Should we change the metricsets config option in Metricbeat?
  • Proposal by @ruflin Keep it for now as we keep also the field fileset and metricset around

Notes

  • The code side is not changed as part of this migration.
  • The Filebeat generated files must often be updated. Use the following two commands:
    INTEGRATION_TESTS=1 GENERATE=1 nosetests tests/system/test_modules.py -v
    x-pack: MODULES_PATH=./module INTEGRATION_TESTS=1 GENERATE=1 nosetests tests/system/test_xpack_modules.py -v
Integrations ecs meta v7.0.0-beta1

All 14 comments

@ruflin

  • I've created a section listing all modules. We can expand modules to one line per fileset only when needed. I get the feeling we'll end up doing some modules with 1 PR for the whole module, rather than per fileset.
  • I've also checked the tasks that were listed and have been merged (e.g. beat.name). If any of those required follow-up work, please add subtasks.

@ruflin about fileset -> dataset. This relies on the Filebeat docs, which name these filesets: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-modules-overview.html

In my opinion that makes sense, as we are talking about files. They will generate datasets, and that's correct too, but as long as we name this fileset in the Filebeat docs I think the annotation should keep that nomenclature.

@exekias We must change it in the docs too. Would this solve the issue?

If we completely rename the thing, I would say yes, annotations must follow

@ruflin I've updated the "Beats processors" section. The list and fields to change should be pretty comprehensive now. Please take a look, to confirm I haven't missed something.

cc @roncohen

@ruflin Just added this to the "Field changes" section. I think this would be best solved by moving ECS docs to asciidoc on the doc website:

  • Some ECS field definitions casually refer to other ECS Readme sections in the Beats docs. We need to address this better

For our UI ML Module automated testing, we do the following:

  • restore a data snapshot
  • call the Kibana API data recogniser
  • call the Kibana API module setup - this creates ML jobs, datafeeds and visualisations and dashboards
  • then run the ML jobs
  • then check ML results are visible
  • then check you can click through to the newly created dashboards (this bit is manual)

We currently use

  • filebeat-* where "fileset.module": "nginx" AND "fileset.name": "access"
  • filebeat-* where "fileset.module": "apache2" AND "fileset.name": "access"
  • auditbeat-* where "event.type": "syscall" for docker containers and hosts (tbc if ECS changes will affect ML in 7.0)

As we start with data snapshots in our existing test framework, is the beats team able to supply snapshots of indices containing
a) pure new 7.0 ECS data and b) backward compatible indices?
With (a) being the priority.

With 7.0 you will be able for the above queries to just rely on event.dataset: nginx.access as an example. BTW we also renamed apache2 to apache to be in line with the metricbeat module.
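The difference between the two query styles can be sketched as Elasticsearch query DSL bodies built in Python. This is an illustrative sketch only: the function names are hypothetical, and it assumes event.dataset has the form module.fileset, as in the nginx.access example above.

```python
# Minimal sketch: the 6.x-style filter on fileset.* versus the 7.0-style
# filter on event.dataset. Function names are hypothetical.

def legacy_filter(module, fileset):
    """6.x style: match on fileset.module AND fileset.name."""
    return {
        "bool": {
            "filter": [
                {"term": {"fileset.module": module}},
                {"term": {"fileset.name": fileset}},
            ]
        }
    }


def ecs_filter(module, fileset):
    """7.0 style: a single term on event.dataset, e.g. "nginx.access"."""
    return {"term": {"event.dataset": f"{module}.{fileset}"}}


old_query = legacy_filter("nginx", "access")
new_query = ecs_filter("nginx", "access")
# Because apache2 was renamed to apache, the 7.0 Apache filter uses the new name.
apache_query = ecs_filter("apache", "access")
```

Either dict could be sent as the query part of a search request against filebeat-* indices, the first against 6.x data and the second against 7.0 ECS data.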

I assume the data you are looking for is nginx and apache data for the logs. What I could produce is a few lines of example data based on our test suite logs. Would that be enough, or do you need larger logs? If you have larger log files for nginx and apache I can easily create the data.

@ruflin Can we please start with some example data snapshots?
(We do have larger logs, but I'm not sure if we retained them in their original raw format as they were anonymised. Will need to check and will share with you if we can).

Our test logs can be found here:

I initially thought I would provide you with a snapshot or es_archiver zip file from ES. But I think it's easier if whoever works on these files ingests the data themselves. That way your apache files can also be used, and it does not have to go through me anymore.

To make the module work with any file path, var.paths must be adjusted in the module config: https://github.com/elastic/beats/blob/master/filebeat/modules.d/apache.yml.disabled

For testing use the snapshot builds:

@webmat Above I did check the checkbox around http.request.method to normalise it. I suggest we skip this for now.

@ruflin Understood. If I can get around to it in time would you have any objections, though?

Not 100% sure I can (e.g. if we don't have what we need in field generation), but I'd like to get it done if possible.

@webmat No objections :-)

Closing this issue as all the checkboxes have been done except the following 3:

  • http method lower case: We will figure out something at a later stage
  • osquery module: According to @webmat conversion to ECS does not make sense
  • Breaking changes files: PR is open and will be merged soonish.

A big thank you to everyone that contributed to getting this massive effort done.
