With https://github.com/elastic/elasticsearch/issues/29823#issuecomment-384130831, index lifecycle management (ILM) is being added to Elasticsearch. ILM is especially useful for Beats to automate rollover. It ensures all indices have a similar size, instead of daily indices whose size varies with the number of Beats and the number of events sent.

This is a description of the possible implementation in Beats. It might still change during implementation.

The basic implementation, if ILM is enabled in Beats, will look as follows:
The Beat template will contain the ILM policy which should be used:
```
PUT _template/filebeat-6.5.0
{
  "index_patterns": ["filebeat-6.5.0-*"],
  "settings": {
    "index.lifecycle.name": "filebeat-policy",
    "index.lifecycle.rollover_alias": "filebeat-6.5.0"
  },
  "mappings": {
    "_doc": {
      ...
    }
  }
}
```
This can already be configured today through the following settings in Beats:
```
setup.template.settings.index.lifecycle.name: "filebeat-policy"
setup.template.settings.index.lifecycle.rollover_alias: "filebeat-6.5.0"
```
With ILM enabled these settings will be written automatically.
As soon as the template is loaded, the Beat will check for the existence of the write alias:
```
HEAD filebeat-6.5.0
```
In case the write alias does not exist yet, it will be created:
```
PUT filebeat-6.5.0-000001
{
  "aliases": {
    "filebeat-6.5.0": {
      "is_write_index": true
    }
  }
}
```
From here on, all data is sent to the `filebeat-6.5.0` alias and things work as usual.
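Writes against the alias are routed to the index currently flagged with `is_write_index`, which can be checked with (sketch):

```
GET _alias/filebeat-6.5.0
```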
## Configuration
The configuration of ILM belongs to the Elasticsearch output and could look as follows:
```
elasticsearch.output.ilm:
  enable: true
  write_alias: filebeat
  index: filebeat # Do we even want the version to be configurable? -> ES only and a lot of people break things with it
  pattern: 000001
  policy: filebeat-policy # What if the policy is also set in the settings?
```
The special part in the above is that the Beat version was left out. A common issue in Beats is that the version number is sometimes removed from the index name, which can cause issues on migration. To prevent this issue for ILM, the version is automatically added to the write alias, index names, and the template. We could add an additional config like `automatic_version: false` to disable this feature if we want.
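A sketch of the intended expansion (names follow the examples in this document; the `automatic_version` flag is only a proposal):

```
elasticsearch.output.ilm:
  enable: true
  # Configured without a version:
  write_alias: filebeat    # becomes alias "filebeat-6.5.0"
  index: filebeat          # first index becomes "filebeat-6.5.0-000001"
  # Proposed opt-out for the automatic versioning:
  automatic_version: false
```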
Question:

* Should ILM config be its own top-level entry instead of part of the elasticsearch output?
## Example Policy
An example policy could look as follows:
```
PUT _ilm/filebeat-policy
{
  "policy": {
    "type": "timeseries",
    "phases": {
      "hot": {
        "after": "0s",
        "actions": {
          "rollover": {
            "max_docs": "20"
          }
        }
      }
    }
  }
}
```
This will roll over to a new index every 20 documents. In a real-world example, larger numbers and other rollover criteria can be used.
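For instance, a size-based policy (a sketch using the same API style as above; `max_size` is a standard rollover condition, and 50GB is the value that eventually shipped as the Beats default, see the discussion below):

```
PUT _ilm/filebeat-policy
{
  "policy": {
    "type": "timeseries",
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50gb"
          }
        }
      }
    }
  }
}
```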
The policy is expected to be loaded with `filebeat setup ilm-policy`. A policy can be loaded at any time and is not required on template generation or data ingestion.
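Once loaded, the policy can be inspected through the same endpoint used above (sketch, using this document's API paths):

```
GET _ilm/filebeat-policy
```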
## Questions
* Do we provide a default policy?
* If yes, what is our default policy? Do we have different phases? Is it different per Beat?
* What if Beats is started against an ES instance without ILM but ILM is configured? It should not start, and should error out.
## Notes for testing
ILM policies are triggered every 10m by default. This can be changed as a cluster setting:
```
PUT /_cluster/settings
{
  "persistent" : {
    "indices.lifecycle.poll_interval": "5s"
  }
}
```
The above means policies are triggered every 5s.
For anyone that wants to play around with ILM, here is a working copy / paste example:
```
PUT /_cluster/settings
{
  "persistent" : {
    "indices.lifecycle.poll_interval": "5s"
  }
}

PUT _template/filebeat-6.5.0
{
  "index_patterns": ["filebeat-6.5.0-*"],
  "settings": {
    "index.lifecycle.name": "filebeat-policy",
    "index.lifecycle.rollover_alias": "filebeat-6.5.0"
  }
}

PUT filebeat-6.5.0-000001
{
  "aliases": {
    "filebeat-6.5.0": {
      "is_write_index": true
    }
  }
}

HEAD filebeat-6.5.0

PUT _ilm/filebeat-policy
{
  "policy": {
    "type": "timeseries",
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_docs": "3"
          }
        }
      }
    }
  }
}

POST filebeat-6.5.0/_doc/
{
  "hello" : "world"
}
```
The last POST request must be executed at least 4 times to trigger a rollover.
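To verify the rollover happened, list the indices behind the pattern; after the rollover a second generation (`filebeat-6.5.0-000002`) should appear:

```
GET _cat/indices/filebeat-6.5.0-*?v
```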
@ruflin for version 6.x I suggest the following, in order to avoid a breaking change for unaware users:
@ruflin thanks for drafting this up. Comments below...
> index: filebeat # Do we even want the version to be configurable? -> ES only and a lot of people break things with it

If we strongly recommend having the version, then let's include it by default. +1 on having a flag to turn it off. For descriptiveness, perhaps it makes sense to name it `index_version`?
> Should ILM config be its own top-level entry instead of part of the elasticsearch output?

Within the ES output feels like it makes sense, given it's an ES-specific feature.
> The policy is expected to be loaded with `filebeat setup ilm-policy`. A policy can be loaded at any time and is not required on template generation or data ingestion.
This is under the "Example Policy" section. I think it'd be worthwhile for us to recommend conducting this policy mgmt workflow completely in the ILM UI when possible. With the ILM UI in Basic, recommending the centralized and visual approach feels like a better user experience. If a user wants to use custom policies, they can create the policy in the UI and then reference it from the Beats side.
Regardless, I'd agree this filebeat setup ilm-policy feature is still of value for users who currently rely on third party config / change mgmt tools.
> Do we provide a default policy?
I think we should include a default policy, especially for the getting started experience.
> If yes, what is our default policy? Do we have different phases? Is it different per Beat?
We can probably start simple with one default policy with a 1d rollover, similar to the existing behavior. Additionally, we could also consider another default policy optimized for metrics (i.e. metricbeat, heartbeat, packetbeat) since they will naturally have a different storage profile. Thoughts? If needed, we'll have another opportunity at 7.0 to tweak these further after we get some customer feedback.
> What if Beats is started against an ES instance without ILM but ILM is configured? It should not start, and should error out.
+1 on failing startup and showing a descriptive error msg. If the ILM feature isn't available in ES, configuring it in Beats would effectively be an invalid configuration.
`index_version`: I kind of like the suggestion, will play around with a few names.

Can we please not have a default policy with a 1 day rollover. One of the reasons that ILM will be great for Beats is that it will get rid of the daily indices idea and get users to "fill" their indices more, so they avoid having 1000s of tiny indices. I think the default policy should focus on the size of the index rather than the age, at least for 7.0 onwards.
If we ship with a default policy for Beats (I assume in Kibana), would it always be there by default or would the user have to trigger it? Agree that the setup command should be there in any case.
I think the default policy can be stored on either Beats or ES/KB side, but I'd agree it feels like it would make sense to have it bundled by default in ES/KB. @yaronp68 @colings86 thoughts on this?
I was thinking more about the default policy and I think our criteria should be size, something like "max_size": "25gb". The 6.x cycles allows us to play around with these params to understand what is best for 7.0. For users with more complex use cases there is going to be the wizard. The nice thing about ILM is that the policy can be adjusted at any time and will apply to all new data.
@ruflin I'm good with having just one default policy and having it based on size. 25GB sounds reasonable to start with. I'm not sure there will be a perfect value here, but curious whether there's any science behind that suggested value? Thanks @colings86 for your input on this as well.
@acchen97 For the size: it's a bit smaller than 32 GB. No exact science here. Would be interesting to get feedback on this value from the field.
For the loading of the policy: I think it would be great if it were bundled with Kibana and a user could install it with 1 click: "Add Beats Policy". The policy would be the same one that is loaded through `metricbeat setup policy`. For now I would version the policy in Beats and copy it over to Kibana. We can still figure out an automated way to keep the two in sync later.

Note: Kibana being able to load the policy is something I see as a user experience improvement and could still happen after 6.5.
@acchen97 My justification for the 25GB size is that it's a bit smaller than what we tell users is the maximum size of a shard that they should aim for (which is 30GB). The reason I suggested it be a little lower is to give a little leeway: the rollover is run periodically, so we may be slightly over the max size by the time the rollover is actually done.
As a starting point, you could have a look at how APM dashboards and index patterns are now bundled with Kibana and can be loaded within the APM Setup instructions (cc @sqren and @nreese).
As we still have the dashboards and index patterns within the APM Server code, we set up an automated test to check that those files are in sync. Sounds like a similar setup.
I actually think we should have a default policy that's a combination of 25gb + max age -- probably something like 30d. That's still a downscaling of 30:1 from daily indices and puts in a backstop to reduce the need for delete-by-query in the event of a misconfiguration.
Sure, I would be fine with having a `max_age` of that order if we want one. I just want to avoid a `max_age` anywhere near 1 day ;)
@eskibars Not opposed to 30d but can you share some background on why you want a max age at all?
There's a balance IMO between ease of knowing what you're keeping, execution of retention policies, and the excessive number of shards we sometimes see. A default max size helps with the oversharding, but it simultaneously makes it more difficult to delete data if/when a user starts a retention policy. A user that's held onto several months or a year of data and decides they want to get rid of anything older than 90 days has to run a delete-by-query if everything's keyed off max size, and that's not going to be as pleasant a user experience as just deleting indices, either from the UI or from the command line. Delete-by-query is also just not the type of thing I think we want to encourage people to do/have to do in this use case. Many retention policies I've seen are organized by months, so a 30d max gives a simple (though admittedly rough, in terms of granularity) escape hatch for the user without having to change many expectations around not needing DBQ for the use case.
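For illustration, the 90-day retention described here maps to a delete phase in ILM; a sketch following the policy structure used earlier in this issue (released ILM uses `min_age` where the early drafts above used `after`):

```
PUT _ilm/filebeat-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "25gb",
            "max_age": "30d"
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
```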
@eskibars thanks for the explanation, it makes sense to me.
@ruflin if you're good with it, let's go with the 25GB + 30d default policy.
@eskibars Appreciate the details, makes sense.
Let's go with 25GB + 30d for now. As soon as we release it I hope we also get feedback from users on it and can still adjust.
How did we get to 25GB? We've been recommending 50GB and benchmarking shows that that is still tiny. Why are we recommending such tiny indices as a default? I'd be leaning towards 100GB.
@ruflin Configuring the write alias name under the elasticsearch output feels like the wrong place:
```
elasticsearch.output.ilm:
  enable: true
  write_alias: filebeat
  index: filebeat # Do we even want the version to be configurable? -> ES only and a lot of people break things with it
  pattern: 000001
  policy: filebeat-policy # What if the policy is also set in the settings?
```
We want to move away from indexing everything into the filebeat-YYYY.MM.dd index and instead have named indices (e.g. apache_logs..., mysql_logs..., etc.). Only being able to specify a single write alias for the whole Beat doesn't fit with that future.
@clintongormley note that we discussed the max size for the default policy since the 25GB that is listed here and the default is now 50GB. See https://github.com/elastic/beats/pull/7963/files#diff-2da066f8f6a753557247f81119034e25R44
@clintongormley We decided on 50GB, and that is what is shipped at the moment. Changing it to 100GB would be simple if needed.
For the config: Where would you expect it? The Elasticsearch output feels like the correct place to me, as ILM only applies to the Elasticsearch output and requires it. This will not prevent more dynamic indices in the future for ILM, as patterns can be used like we have today (see the sketch after this comment).

Today there is only one alias with a write index because of limitations in Beats, and changing it needs quite a few changes. @simitt already kicked off some work to allow loading multiple templates (https://github.com/elastic/beats/pull/9247), and we have options to also allow multiple hardcoded write aliases which need to be set up. It gets more challenging in dynamic environments like k8s / docker where we don't know in advance which aliases we will need.
In short: We are working towards the future of allowing multiple aliases but it requires changes on how Beats loads templates and ingests data today.
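For reference, this is the kind of dynamic index pattern Beats supports today for plain indices (the pattern shown is the 6.x Filebeat default):

```
output.elasticsearch:
  index: "filebeat-%{[beat.version]}-%{+yyyy.MM.dd}"
```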
The introduced ILM configuration does not work with multiple templates, as the ILM specifics currently cannot be configured per template/index.
Created https://github.com/elastic/beats/issues/9919 for allowing multiple indices for ILM.
Logstash ES output allows users to specify a direct configuration of the custom ILM policy used.
Beats, however, recommends changing the default policy instead (https://www.elastic.co/guide/en/beats/filebeat/6.7/ilm.html) and does not provide a direct configuration to specify the custom ILM policy under the ES output.
While `setup.template.settings.index.lifecycle.name` is documented, it is under the not-recommended/advanced config section.
Have we considered providing a direct config to point to a custom ILM policy from Beats's ES output, and providing a way for users to substitute in an app-specific identifier from custom fields/tags, so that events get routed to the right alias and use the corresponding policy? (See the hypothetical sketch below.)
Related: https://github.com/elastic/beats/issues/11347#issuecomment-476319686
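A purely hypothetical sketch of what such a direct config could look like, reusing the Beats format-string syntax for event fields (neither the option layout nor the per-event substitution exists today):

```
output.elasticsearch:
  ilm:
    # Hypothetical: route events to an app-specific alias and policy
    policy: "%{[fields.app_id]}-policy"
    write_alias: "%{[fields.app_id]}-logs"
```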