Elasticsearch: Allow Dynamic Cluster Setting for `number_of_shards`

Created on 21 Mar 2017 · 8 Comments · Source: elastic/elasticsearch

Today, we often suggest that users create a default "global" index template to create defaults for their cluster's sharding. For example:

{
  "template": "*",
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}

However, as we look to the future of disabling template inheritance, we need to consider this very real use case of template inheritance, which was probably the most common use of it.
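To make the use case concrete, here is a minimal Python sketch of how template inheritance resolves settings for a new index. It assumes that every matching template is merged from lowest to highest `order`, with higher orders overriding lower ones. The `TEMPLATES` list, the `logs-app-*` template, and the `resolve_settings` helper are illustrative only, and `fnmatchcase` is a stand-in for Elasticsearch's own wildcard matching.

```python
from fnmatch import fnmatchcase

# Hypothetical templates: the catch-all "global" template from above,
# plus a more specific one that overrides a single setting.
TEMPLATES = [
    {"template": "*", "order": 0,
     "settings": {"number_of_shards": 1, "number_of_replicas": 1}},
    {"template": "logs-app-*", "order": 1,
     "settings": {"number_of_replicas": 2}},
]


def resolve_settings(index_name, templates):
    """Merge the settings of every matching template, lowest order first."""
    merged = {}
    for tpl in sorted(templates, key=lambda t: t.get("order", 0)):
        # fnmatchcase approximates Elasticsearch's simple "*" wildcard matching.
        if fnmatchcase(index_name, tpl["template"]):
            merged.update(tpl.get("settings", {}))  # higher order wins
    return merged


print(resolve_settings("logs-app-2017.03.21", TEMPLATES))
print(resolve_settings("metrics-2017.03.21", TEMPLATES))
```

In this sketch, an index that matches only the `*` template still picks up the cluster-wide sharding defaults, which is the behavior that disappears once inheritance is disabled.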

As such, I think we should allow certain index-level settings to be set at the cluster level to control their defaults. We should avoid doing this in the same way that we did in 1.x and 2.x, where you simply used the same setting name. So I wonder if we could introduce cluster-level index default settings:

cluster.index.defaults:
  number_of_shards: 1
  number_of_replicas: 1

Once those exist, they could ideally be set dynamically as well, and then the desire/need for a global template is mostly erased.

/cc @AlexP-Elastic @inqueue @clintongormley

:CorInfrSettings >enhancement discuss

pickypg

👍2

Most helpful comment

The reason that we are okay with removing template inheritance is the assumption that a cluster is expected to have a reasonable number of templates, and that in cases where a cluster does have a large number of templates, they would have to be programmatically managed. Given this assumption, I do not see the need to add default index settings. If there is a reasonable number of templates, the default index settings can be applied to every template, and in cases where a cluster does have a large number of templates, they can be programmatically added to every template.
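The programmatic approach described here could look roughly like the following Python sketch. It assumes templates are held as plain dictionaries keyed by name (for example, loaded from files under version control). `DEFAULT_SETTINGS`, `apply_defaults`, and the template names are hypothetical, and nothing here calls Elasticsearch itself.

```python
import copy

# Hypothetical cluster-wide defaults to stamp into every template.
DEFAULT_SETTINGS = {"number_of_shards": 1, "number_of_replicas": 1}


def apply_defaults(templates, defaults):
    """Return copies of the templates with missing settings filled in.

    Settings a template already defines are left untouched.
    """
    result = {}
    for name, body in templates.items():
        updated = copy.deepcopy(body)  # never mutate the caller's templates
        settings = updated.setdefault("settings", {})
        for key, value in defaults.items():
            settings.setdefault(key, value)
        result[name] = updated
    return result


templates = {
    "logs-app": {"template": "logs-app-*", "settings": {"number_of_shards": 3}},
    "logs-network": {"template": "logs-network-*"},
}
patched = apply_defaults(templates, DEFAULT_SETTINGS)
print(patched)
```

Each patched body would then be written back with whatever template management tooling the cluster already uses. The cost is that every new template has to pass through this step, which is the gap the issue is pointing at.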

Secondly, we try to maintain a philosophy that there should be one way to do things, and this goes against that philosophy as now settings could be added to a template, or set via these default settings.

Finally, I'm concerned about a long tail of requests to add a default index setting for every index setting, leaving us with the overhead of supporting a large number of default index settings.

jasontedor on 22 Mar 2017

👍3

All 8 comments

I think that's an unrealistic assumption (naturally, since I wrote the issue). We see time and time again that advanced users often set up the clusters, but then less advanced users will come in and create the mappings for their own purposes.

In such a case, the newer user is less likely to properly handle sharding -- if they even consider it -- and that is exactly what the "global" template solved. An advanced user will hopefully go back and fix said template, but it's probably too late at that point.

I do agree that it's a slippery slope toward the long tail problem, but there aren't really many index settings that can impact the cluster other than the sharding (compression level and refresh-related settings mostly only impact the index). If we do go down this path, the long tail can be cut off by asking: does it impact the cluster, or primarily just the index?

pickypg on 22 Mar 2017

Additionally, for the newer user, and perhaps one with a bit more awareness who had previously configured index.number_of_shards and index.number_of_replicas in elasticsearch.yml (pre-5.x), the use of a "default" (*) index template for these settings was an understood and accepted change. With inheritance removed, we'll need to re-inform those we have helped keep shard counts down with default index templates. If only we had known inheritance was going away when index settings were removed!

Dynamic cluster update settings for number_of_shards would be a welcome addition, especially if the default number of shards is to remain at 5.

inqueue on 22 Mar 2017

We discussed this during Fix-it-Friday and agree that we should not add this. In addition to the thoughts laid out above, this is effectively a path to implementing template inheritance via a backdoor through the cluster settings.

jasontedor on 24 Mar 2017

Looking at the original issue and the rationales for removing template inheritance, it seems most of the concerns are around complexity in the mapping and analysis sections of templates. Most of the examples of defaults in this issue revolve around index settings that do not introduce the complexities of analysis and mappings. With _all removed in v6.x for new indices, the need for one of the most popular mapping/analysis-related global defaults goes away.

I bring this topic up because limiting the scope to the popular requested defaults here seems to provide much value without introducing the complexities that were the drivers for the removal of template inheritance.

For anyone building an application with ES, the defaults can be managed by the application much more easily, but for folks using tools like Beats/Logstash, templates are essentially the only way to go short of introducing custom code. In these situations, having defaults for things like shard counts can go a long way toward simplifying the management of templates.

djschny on 27 Mar 2017

👍1

> by limiting the scope to the popular requested defaults here,

It never ends there. If you expose defaults like this, then soon we'll have requests for exposing refresh interval, shard routing, etc etc. Do these get copied into the custom index settings or do they remain in the cluster state? What happens if the cluster state values change? This adds complexity and action at a distance.

> for folks using tools like Beats/Logstash, templates essentially are the only way to go outside of introducing custom code

You have a single valid template which contains all the info you need in one place. That is simple and predictable. No magic.

clintongormley on 28 Mar 2017

> You have a single valid template which contains all the info you need in one place. That is simple and predictable. No magic.

Would you mind elaborating on how this is possible for non-trivial log ingestion? When ingesting different kinds of logs (security, network, app, DB, etc.) into the same cluster, these usually go into separate index patterns. For example:

logs-app-YYYY.MM.DD
logs-network-YYYY.MM.DD
logs-security-YYYY.MM.DD

Each of those will have different mappings, shard settings, etc., because their document structure, volume, and performance requirements vary. This would require three different templates. Excuse my ignorance, but I am not aware of how that is possible with a single valid template.

djschny on 28 Mar 2017