Beats: Document fields.yml

Created on 11 Jan 2018 · 19 Comments · Source: elastic/beats

Starting with 6.0, we generate the ES template and Kibana index mapping at runtime directly from fields.yml. We also allow the user to supply another fields.yml for generating the template, but there is no actual documentation on the format and the types supported. This makes it difficult for users to reuse or modify fields.yml in order to have Beats manage the templates. Typical use cases in which users want to modify fields.yml: adding custom fields via the fields setting, JSON events in Filebeat, and custom Ingest Node pipelines.

Syntax:

# fields are configured using YAML dictionaries with `name` and `type` at least

FIELD ::=
  name: <FIELD_NAME> 
  type: <TYPE>
  [format: <FORMAT>]
  description: <TEXT>
  [fields: <FIELD_LIST>]  # `type` must be "group" if field list is used.
  [ ... ]

FIELD_LIST ::=
  [- <FIELD>]+

FIELD_NAME ::= json compatible field name

# used to set the field's type in the Elasticsearch template
TYPE ::=
   ip   # ip address
 | scaled_float
 | half_float
 | integer
 | text
 | keyword
 | object
 | array
 | group   # use group to define additional fields
 | ...

# configure custom formatter for use in Kibana
FORMAT ::= ...
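
As an illustration of the grammar above, a field list might look like the sketch below. The field names (`myapp`, `client_ip`, `status`, `retries`) are hypothetical and chosen only for this example. The types are taken from the TYPE list above.

```yaml
# Hypothetical field list following the FIELD / FIELD_LIST grammar above.
- name: myapp                # hypothetical parent field
  type: group                # `group` because it carries a nested field list
  description: Fields parsed from a custom application log.
  fields:
    - name: client_ip
      type: ip
      description: Address of the client that issued the request.
    - name: status
      type: keyword
      description: Outcome of the request.
    - name: retries
      type: integer
      description: Number of retry attempts.
```

With this layout the nested fields would be addressed as `myapp.client_ip`, `myapp.status`, and `myapp.retries`.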
Labels: devguide, docs

All 19 comments

@urso I'm thinking about where this content should go in the docs.

Maybe we should have a topic called "Manage mappings" that appears under the configuration container (for example, under Configuring Filebeat). The container topic would provide more detail about the index template and mappings (we gloss over the details currently). We could put the existing topic about loading the index template under the new container and then add a new topic called something like "Add fields to the index template." So we'd have:

Configuring <beatname>
  ...
  Manage mappings
    Load the Elasticsearch index template
    Add fields to the index template

Is that what you have in mind?

For now, I'm thinking of having this under the dev guide and later moving it next to the "loading the index template" docs or similar. The reason is that, in general, we currently don't recommend changing fields.yml, so this is more of an expert usage of Beats, and anyone who changes it should know that it could break things like dashboards.

I started some docs in the past but never completed them. One thing I focused on was the "why" behind fields.yml. We need the details of the syntax and an explanation of it, but there should also be an explanation of why the file exists, how it is used for modules, etc.

Is this the preferred way of handling cases where a date field is not recognised as a date type in ES, i.e. forcing Filebeat to load the mappings first in order to resolve that?

@Reeebuuk Yes, but I'm not sure this is related to this issue. For further questions, it's best to open a topic on Discuss: https://discuss.elastic.co/c/beats

My team is testing the upgrade to Filebeat 6.3 from version 5.6.10. Our custom mapping template (defined via a JSON file) is no longer being used and it seems that the resulting mapping that ends up being created in our Elasticsearch cluster is derived from fields.yml. We are kind of stuck now, since we don't know how to enforce our custom mapping:

  • the JSON file is ignored
  • the fields.yml format is not documented, and I read here that changing it is not even recommended (see the comment from @ruflin)

Are we missing something? What is the way to upgrade from Filebeat 5 to 6 preserving the mappings? How come this is not documented as a breaking change?

In 6.4 we are reintroducing the ability to load the template directly from a JSON file. The option is called setup.template.json.enabled: https://www.elastic.co/guide/en/beats/filebeat/master/configuration-template.html. Unfortunately, it didn't make it into 6.3. One option that is in 6.3 and could be useful for you is append_fields: https://www.elastic.co/guide/en/beats/filebeat/6.3/configuration-template.html
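
A minimal sketch of what loading a JSON template could look like in filebeat.yml on 6.4 or later. The path and template name are placeholders, not values from this thread. The linked configuration-template page has the authoritative list of settings.

```yaml
# filebeat.yml (6.4 or later): load a hand-written JSON template
# instead of generating one from fields.yml.
setup.template.json.enabled: true
setup.template.json.path: "template.json"    # placeholder path to the JSON file
setup.template.json.name: "template-name"    # placeholder name of the template
```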

Can you share a bit more detail on the modifications to your template? Did you mainly add new fields, or also modify existing ones? As the append_fields name indicates, in general I would only recommend adding fields, not modifying existing ones.

If you have a JSON template file that also works in 6.x, you can load it manually into ES. What does your index pattern look like?

Thanks @ruflin for the fast and detailed answer. We ended up using a nested field structure to isolate what ultimately are our custom application log format fields. This seems to be more in line with filebeat's general philosophy of isolating fields coming from different sources (e.g. apache, nginx, etc.) into their respective parent fields.
This solution also solved some conflicts we had with custom fields having the same name as fields that are now built-in in filebeat, e.g. host or event.
The only drawback of this restructuring is that all dashboard widgets and saved queries will need to be adapted to prepend the parent field's name as a prefix, but that can somehow be automated.

If you have a common prefix it sounds like the append_fields should work well for you.
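
A sketch of how append_fields might be used with a common prefix, assuming a hypothetical `myapp` parent field. The field names are invented for illustration. The entries use the same `name`/`type` form as fields.yml.

```yaml
# filebeat.yml: add custom fields under a common prefix without editing fields.yml.
setup.template.append_fields:
  - name: myapp.request_id     # hypothetical custom field
    type: keyword
  - name: myapp.finished_at    # hypothetical field that should be mapped as a date
    type: date
```

Keeping every custom field under one parent name also avoids collisions with built-in fields such as host or event, as described above.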

Thanks @ruflin. We have now moved the mapping management to the ES side, as this yields a more centralized approach that helps avoid a situation where two different Filebeat clients race to override the same template. I am curious about the reasons behind the design choice of letting Filebeat handle the mappings. I guess it's because Filebeat is closer to the data source, so it knows better than the server how the data should be mapped? Do you see any problem with moving this responsibility to the ES cluster?

One of the main reasons is versioning. If ES shipped with the templates, it would probably ship only one version, but the ES and Beats versions are normally not in sync. Beats creates a template and an index for each version so that new Beats can make use of new fields and so that, in case of a bug, we can fix it without having to overwrite templates. Another reason is that Beats knows best about its fields, and it feels out of scope for ES to know about other systems and keep up to date with them.

We have been thinking for some time about how to manage templates, dashboards, etc. in a more centralised way, but haven't come to a conclusion yet. What we see in large-scale deployments is that template management is turned off in all Beats and one central Beat close to Elasticsearch is used to load the templates. Would that work for your use?
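
The "turned off in all Beats" part of that setup could be sketched as follows. The central Beat would simply keep template loading at its default (enabled).

```yaml
# filebeat.yml on every shipper: leave template loading to the central Beat.
setup.template.enabled: false
```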

Thanks for all the details @ruflin

Would that work for your use?

Yes, I believe this could be a viable approach that combines the advantage of having a more centralized management without moving this knowledge away from filebeat.

@ruflin I noticed that even when the templates are loaded manually via a PUT to the ES cluster, when filebeat sends new data and generates a new index, the mapping for the new index includes extraneous mappings like apache2, auditd etc. Is this expected?

@sterago Yes, it's expected as all the data from a Beat ends up at the moment in one index and we don't know in advance which modules will be used.

I'm happy to build out some documentation for fields.yml.

I'm building a new Beat and had to resort to reading the libbeat source to find out how to add a multifield mapping, so found out quite a bit while I was digging there.
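
For reference, a multifield mapping in fields.yml might look like the sketch below. It assumes the `multi_fields` key read by libbeat's template generator. The field names are hypothetical, and the exact set of supported sub-options should be checked against the libbeat source or the devguide.

```yaml
# Hypothetical fields.yml entry: a keyword field with an additional text sub-field.
- name: title                  # hypothetical field name
  type: keyword
  description: Title of the item, indexed for exact matching.
  multi_fields:
    - name: text               # would be addressed as `title.text`
      type: text
```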

I've been referring to the Beats Developer Guide and that was the first place I looked for documentation on the file format. Would there be a better place for it?

@bestpath-gb The devguide definitely needs to have this info, so I would put it there. In the reference guides, we need to tell users how to customize the index template, but we can point to the devguide for the nitty gritty details. Thank you for offering to help!

I have put together a new documentation page that details what I would consider to be commonly used types and parameters.
Looking through the code, however, there's a bunch of other mapping parameters I'm not sure about as they don't marry up with anything in the docs. There are also Kibana-related parameters that can be applied to fields, which I'm not familiar with. I've listed some and linked to the ES mapping parameters docs for more info.
I've written the page but have not built the docs, as I'm not familiar with the procedure. Are there any details on how to do this? I take it that once I've validated the build I can simply open a PR? Or is there a procedure to follow?

Beg your pardon... I've found the contribution guidelines.

I may not have tagged it correctly (let me know if that's the case and I can get it right in future) as it's not showing up here, but I've opened a PR for this documentation: https://github.com/elastic/beats/pull/12505

Closing because this issue was resolved quite a while ago (with https://github.com/elastic/beats/pull/12505).
