With https://github.com/elastic/elasticsearch/issues/29823#issuecomment-384130831, index lifecycle management (ILM) is being added to Elasticsearch. ILM is especially useful for Beats to automate rollover. It ensures all indices have a similar size, instead of daily indices whose size varies with the number of Beats and the number of events sent.

This is a description of the possible implementation in Beats. It might still change during implementation.

The basic implementation, if ILM is enabled in Beats, will look as follows:
The Beat template will contain the ILM policy which should be used:
```
PUT _template/filebeat-6.5.0
{
  "index_patterns": ["filebeat-6.5.0-*"],
  "settings": {
    "index.lifecycle.name": "filebeat-policy",
    "index.lifecycle.rollover_alias": "filebeat-6.5.0"
  },
  "mappings": {
    "_doc": {
      ...
    }
  }
}
```
This can already be configured today through the following settings in Beats:
```
setup.template.settings.index.lifecycle.name: "filebeat-policy"
setup.template.settings.index.lifecycle.rollover_alias: "filebeat-6.5.0"
```
With ILM enabled these settings will be written automatically.
As soon as the template is loaded, the Beat will check for the existence of the write alias:
```
HEAD filebeat-6.5.0
```
In case the write alias does not exist yet, it will be created:
```
PUT filebeat-6.5.0-000001
{
  "aliases": {
    "filebeat-6.5.0": {
      "is_write_index": true
    }
  }
}
```
From here on, all data is sent to the `filebeat-6.5.0` alias and things work as usual.
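Writes against the alias are routed to the index currently flagged with `is_write_index`, which can be checked with (sketch):

```
GET _alias/filebeat-6.5.0
```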
## Configuration
The configuration of ILM belongs to the Elasticsearch output and could look as follows:
```
elasticsearch.output.ilm:
  enable: true
  write_alias: filebeat
  index: filebeat # Do we even want the version to be configurable? -> ES only and a lot of people break things with it
  pattern: 000001
  policy: filebeat-policy # What if the policy is also set in the settings?
```
The special part in the above is that the Beat version was left out. A common issue in Beats is that the version number is sometimes removed from the index name, which can cause issues on migration. To prevent this issue for ILM, the version is automatically added to the write alias, index names, and the template. We could add an additional config like `automatic_version: false` to disable this feature if we want.
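A sketch of the intended expansion (names follow the examples in this document; the `automatic_version` flag is only a proposal):

```
elasticsearch.output.ilm:
  enable: true
  # Configured without a version:
  write_alias: filebeat    # becomes alias "filebeat-6.5.0"
  index: filebeat          # first index becomes "filebeat-6.5.0-000001"
  # Proposed opt-out for the automatic versioning:
  automatic_version: false
```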
Question:

* Should ILM config be its own top-level entry instead of part of the elasticsearch output?
## Example Policy
An example policy could look as follows:
```
PUT _ilm/filebeat-policy
{
  "policy": {
    "type": "timeseries",
    "phases": {
      "hot": {
        "after": "0s",
        "actions": {
          "rollover": {
            "max_docs": "20"
          }
        }
      }
    }
  }
}
```
This will roll over to a new index every 20 documents. In a real-world example, larger numbers and other rollover criteria can be used.
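For instance, a size-based policy (a sketch using the same API style as above; `max_size` is a standard rollover condition, and 50GB is the value that eventually shipped as the Beats default, see the discussion below):

```
PUT _ilm/filebeat-policy
{
  "policy": {
    "type": "timeseries",
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50gb"
          }
        }
      }
    }
  }
}
```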
The policy is expected to be loaded with `filebeat setup ilm-policy`. A policy can be loaded at any time and is not required on template generation or data ingestion.
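Once loaded, the policy can be inspected through the same endpoint used above (sketch, using this document's API paths):

```
GET _ilm/filebeat-policy
```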
## Questions
* Do we provide a default policy?
* If yes, what is our default policy? Do we have different phases? Is it different per Beat?
* What if Beats is started against an ES instance without ILM but ILM is configured? It should not start, and should error out.
## Notes for testing
ILM policies are triggered every 10m by default. This can be changed as a cluster setting:
```
PUT /_cluster/settings
{
  "persistent" : {
    "indices.lifecycle.poll_interval": "5s"
  }
}
```
The above means policies are triggered every 5s.
For anyone that wants to play around with ILM, here is a working copy / paste example:
```
PUT /_cluster/settings
{
  "persistent" : {
    "indices.lifecycle.poll_interval": "5s"
  }
}

PUT _template/filebeat-6.5.0
{
  "index_patterns": ["filebeat-6.5.0-*"],
  "settings": {
    "index.lifecycle.name": "filebeat-policy",
    "index.lifecycle.rollover_alias": "filebeat-6.5.0"
  }
}

PUT filebeat-6.5.0-000001
{
  "aliases": {
    "filebeat-6.5.0": {
      "is_write_index": true
    }
  }
}

HEAD filebeat-6.5.0

PUT _ilm/filebeat-policy
{
  "policy": {
    "type": "timeseries",
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_docs": "3"
          }
        }
      }
    }
  }
}

POST filebeat-6.5.0/_doc/
{
  "hello" : "world"
}
```
The last POST request must be executed at least 4 times to trigger a rollover.
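To verify the rollover happened, list the indices behind the pattern; after the rollover a second generation (`filebeat-6.5.0-000002`) should appear:

```
GET _cat/indices/filebeat-6.5.0-*?v
```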
@ruflin for version 6.x I suggest the following, in order to avoid a breaking change for unaware users:
@ruflin thanks for drafting this up. Comments below...
> index: filebeat # Do we even want the version to be configurable? -> ES only and a lot of people break things with it

If we strongly recommend having the version, then let's include it by default. +1 on having a flag to turn it off. For descriptiveness, perhaps it makes sense to name it `index_version`?
> Should ILM config be its own top-level entry instead of part of the elasticsearch output?

Within the ES output feels like it makes sense, given it's an ES-specific feature.
> The policy is expected to be loaded with `filebeat setup ilm-policy`. A policy can be loaded at any time and is not required on template generation or data ingestion.
This is under the "Example Policy" section. I think it'd be worthwhile for us to recommend conducting this policy mgmt workflow completely in the ILM UI when possible. With the ILM UI in Basic, recommending the centralized and visual approach feels like a better user experience. If a user wants to use custom policies, they can create the policy in the UI and then reference it from the Beats side.
Regardless, I'd agree this filebeat setup ilm-policy feature is still of value for users who currently rely on third party config / change mgmt tools.
> Do we provide a default policy?
I think we should include a default policy, especially for the getting started experience.
> If yes, what is our default policy? Do we have different phases? Is it different per Beat?
We can probably start simple with one default policy with a 1d rollover, similar to the existing behavior. Additionally, we could also consider another default policy optimized for metrics (i.e. metricbeat, heartbeat, packetbeat) since they will naturally have a different storage profile. Thoughts? If needed, we'll have another opportunity at 7.0 to tweak these further after we get some customer feedback.
> What if Beats is started against an ES instance without ILM but ILM is configured? It should not start, and should error out.
+1 on failing startup and showing a descriptive error msg. If the ILM feature isn't available in ES, configuring it in Beats would effectively be an invalid configuration.
`index_version`: I kind of like the suggestion, will play around with a few names.

Can we please not have a default policy with a 1 day rollover. One of the reasons that ILM will be great for Beats is that it will get rid of the daily indices idea and get users to "fill" their indices more, so they avoid having 1000s of tiny indices. I think the default policy should focus on the size of the index rather than the age, at least for 7.0 onwards.
If we ship with a default policy for Beats (I assume in Kibana), would it always be there by default or would the user have to trigger it? Agree that the setup command should be there in any case.
I think the default policy can be stored on either Beats or ES/KB side, but I'd agree it feels like it would make sense to have it bundled by default in ES/KB. @yaronp68 @colings86 thoughts on this?
I was thinking more about the default policy and I think our criteria should be size, something like "max_size": "25gb". The 6.x cycles allows us to play around with these params to understand what is best for 7.0. For users with more complex use cases there is going to be the wizard. The nice thing about ILM is that the policy can be adjusted at any time and will apply to all new data.
@ruflin I'm good with having just one default policy and having it based on size. 25GB sounds reasonable to start with. I'm not sure there will be a perfect value here, but curious whether there's any science behind that suggested value? Thanks @colings86 for your input on this as well.
@acchen97 For the size: it's a bit smaller than 32 GB. No exact science here. Would be interesting to get feedback on this value from the field.
For the loading of the policy: I think it would be great if it were bundled with Kibana and a user could install it with 1 click: "Add Beats Policy". The policy would be the same one that is loaded through `metricbeat setup policy`. For now I would version the policy in Beats and copy it over to Kibana. We can still figure out an automated way to keep the two in sync later.

Note: Kibana being able to load the policy is something I see as a user experience improvement and could still happen after 6.5.
@acchen97 My justification for the 25GB size is that it's a bit smaller than what we tell users is the maximum size of a shard that they should aim for (which is 30GB). The reason I suggested it be a little lower is to give a little leeway: the rollover is run periodically, so we may be slightly over the max size by the time the rollover is actually done.
As a starting point, you could have a look at how APM dashboards and index patterns are now bundled with Kibana and can be loaded within the APM Setup instructions (cc @sqren and @nreese).
As we still have the dashboards and index patterns within the APM Server code, we set up an automated test to check that those files are in sync. Sounds like a similar setup.
I actually think we should have a default policy that's a combination of 25gb + max age -- probably something like 30d. That's still a downscaling of 30:1 from daily indices and puts in a backstop to reduce the need for delete-by-query in the event of a misconfiguration.
Sure, I would be fine with having a `max_age` of that order if we want one. I just want to avoid a `max_age` anywhere near 1 day ;)
@eskibars Not opposed to 30d but can you share some background on why you want a max age at all?
There's a balance IMO between ease of knowing what you're keeping, execution of retention policies, and the excessive number of shards we sometimes see. A default max size helps with the oversharding, but it simultaneously makes it more difficult to delete data if/when a user starts a retention policy. A user that's held onto several months or a year of data and decides they want to get rid of anything older than 90 days has to run a delete-by-query if everything's keyed off max size, and that's not going to be as pleasant a user experience as just deleting indices, either from the UI or from the command line. Delete-by-query is also just not the type of thing I think we want to encourage people to do/have to do in this use case. Many retention policies I've seen are organized by months, so a 30d max gives a simple (though admittedly rough, in terms of granularity) escape hatch for the user without having to change many expectations around not needing DBQ for the use case.
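For illustration, the 90-day retention described here maps to a delete phase in ILM; a sketch following the policy structure used earlier in this issue (released ILM uses `min_age` where the early drafts above used `after`):

```
PUT _ilm/filebeat-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "25gb",
            "max_age": "30d"
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
```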
@eskibars thanks for the explanation, it makes sense to me.
@ruflin if you're good with it, let's go with the 25GB + 30d default policy.
@eskibars Appreciate the details, makes sense.
Let's go with 25GB + 30d for now. As soon as we release it I hope we also get feedback from users on it and can still adjust.
How did we get to 25GB? We've been recommending 50GB and benchmarking shows that that is still tiny. Why are we recommending such tiny indices as a default? I'd be leaning towards 100GB.
@ruflin Configuring the write alias name under the elasticsearch output feels like the wrong place:
```
elasticsearch.output.ilm:
  enable: true
  write_alias: filebeat
  index: filebeat # Do we even want the version to be configurable? -> ES only and a lot of people break things with it
  pattern: 000001
  policy: filebeat-policy # What if the policy is also set in the settings?
```
We want to move away from indexing everything into the filebeat-YYYY.MM.dd index and instead have named indices (e.g. apache_logs..., mysql_logs..., etc.). Only being able to specify a single write alias for the whole Beat doesn't fit with that future.
@clintongormley note that we discussed the max size for the default policy since the 25GB that is listed here and the default is now 50GB. See https://github.com/elastic/beats/pull/7963/files#diff-2da066f8f6a753557247f81119034e25R44
@clintongormley We decided on 50GB, and that is what is shipped at the moment. Changing it to 100GB would be simple if needed.
For the config: Where would you expect it? The Elasticsearch output feels like the correct place to me, as ILM only applies to the Elasticsearch output and requires it. This will not prevent more dynamic indices in the future for ILM, as patterns can be used like we have today (see the sketch after this comment).

Today there is only one alias with a write index because of limitations in Beats, and changing it needs quite a few changes. @simitt already kicked off some work to allow loading multiple templates (https://github.com/elastic/beats/pull/9247), and we have options to also allow multiple hardcoded write aliases which need to be set up. It gets more challenging in dynamic environments like k8s / docker where we don't know in advance which aliases we will need.
In short: We are working towards the future of allowing multiple aliases but it requires changes on how Beats loads templates and ingests data today.
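For reference, this is the kind of dynamic index pattern Beats supports today for plain indices (the pattern shown is the 6.x Filebeat default):

```
output.elasticsearch:
  index: "filebeat-%{[beat.version]}-%{+yyyy.MM.dd}"
```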
The introduced ILM configuration does not work with multiple templates, as the ILM specifics currently cannot be configured per template/index.
Created https://github.com/elastic/beats/issues/9919 for allowing multiple indices for ILM.
Logstash ES output allows users to specify a direct configuration of the custom ILM policy used.
Beats, however, recommends changing the default policy instead (https://www.elastic.co/guide/en/beats/filebeat/6.7/ilm.html) and does not provide a direct configuration to specify the custom ILM policy under the ES output.
While `setup.template.settings.index.lifecycle.name` is documented, it is under the not-recommended/advanced config section.
Have we considered providing a direct config to point to a custom ILM policy from Beats's ES output, and providing a way for users to substitute in an app-specific identifier from custom fields/tags, so that events get routed to the right alias and use the corresponding policy? (See the hypothetical sketch below.)
Related: https://github.com/elastic/beats/issues/11347#issuecomment-476319686
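A purely hypothetical sketch of what such a direct config could look like, reusing the Beats format-string syntax for event fields (neither the option layout nor the per-event substitution exists today):

```
output.elasticsearch:
  ilm:
    # Hypothetical: route events to an app-specific alias and policy
    policy: "%{[fields.app_id]}-policy"
    write_alias: "%{[fields.app_id]}-logs"
```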