Elasticsearch: Remove pluggable shard allocation mechanisms

Created on 27 Feb 2019 · 6 comments · Source: elastic/elasticsearch

Today it is possible for a plugin to alter the behaviour of the shard allocator, either by providing custom AllocationDecider implementations or by overriding the entire ShardsAllocator with an alternative implementation. The only plugin I can find that actually does this is Tempest, and it only supports versions prior to 2.3.2, which are long past the end of their supported lives.
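For context, a minimal sketch of what this pluggability looks like: a plugin implements the ClusterPlugin interface and contributes extra deciders (or replaces the whole allocator). Class and method names below follow the 7.x ClusterPlugin API; the decider logic itself is purely illustrative, and the "tier" node attribute is a made-up example, not a real convention.

```java
import java.util.Collection;
import java.util.Collections;

import org.elasticsearch.cluster.routing.RoutingNode;
import org.elasticsearch.cluster.routing.ShardRouting;
import org.elasticsearch.cluster.routing.allocation.RoutingAllocation;
import org.elasticsearch.cluster.routing.allocation.decider.AllocationDecider;
import org.elasticsearch.cluster.routing.allocation.decider.Decision;
import org.elasticsearch.common.settings.ClusterSettings;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.plugins.ClusterPlugin;
import org.elasticsearch.plugins.Plugin;

public class MyAllocationPlugin extends Plugin implements ClusterPlugin {

    // Contribute an extra decider that runs alongside the built-in ones.
    @Override
    public Collection<AllocationDecider> createAllocationDeciders(Settings settings,
                                                                  ClusterSettings clusterSettings) {
        return Collections.singletonList(new AllocationDecider() {
            @Override
            public Decision canAllocate(ShardRouting shardRouting, RoutingNode node,
                                        RoutingAllocation allocation) {
                // Illustrative only: veto allocation to nodes tagged with a
                // hypothetical "tier: cold" attribute.
                if ("cold".equals(node.node().getAttributes().get("tier"))) {
                    return allocation.decision(Decision.NO, "my_decider", "node is in the cold tier");
                }
                return allocation.decision(Decision.YES, "my_decider", "node is acceptable");
            }
        });
    }
}
```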

Maintaining this pluggability constrains our choices for improving the shards allocator, and yet there appears to be little appetite to use this flexibility in the wild. I propose deprecating this pluggability in the 7.x series and removing it in 8.0.

This issue is to gather feedback on the idea. If you rely on being able to plug in a custom allocator, or otherwise implement the ClusterPlugin interface, please let us know here.

:Distributed/Allocation >deprecation team-discuss

All 6 comments

Pinging @elastic/es-distributed

The ability to plug in custom allocator implementations is important to us. We would like to see this retained going forward.

@vigyasharma can you give more details? What gaps are there in the default allocator that you're filling with a custom implementation? Are you overriding the whole ShardsAllocator or just providing extra AllocationDeciders?

@DaveCTurner I'm posting in regards to Tempest. While the released version only supports up to 2.3.2, we have work in progress to support up to ES 5. IIRC, the only reason it hasn't been publicly released yet is a lack of integration tests. I am no longer working with that team and am not sure of their future plans, but when I left four months ago the plan was to update and release the 5.x version.

The use case was that the default shard allocator balances only on the number of shards per node, which yields a balanced cluster only if shard sizes are homogeneous. If you have shards of heterogeneous size (log-normally distributed in our case), the default balancer produces an unbalanced cluster with hotspots. Our use case is shard routing that guarantees each customer's data lands on a single shard, ensuring data locality when searching. When a particular customer's data grows too large to sanely fit in a single shard within our default index, we create another index for that customer with the appropriate number of shards.
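To illustrate the failure mode, here is a toy simulation (plain Java, not Elasticsearch code; the size distribution and node count are made up): each node receives exactly the same number of shards, yet the data volumes end up badly skewed.

```java
import java.util.Random;

// Toy illustration: assign 30 log-normally sized shards to 3 nodes purely
// by count (round-robin, so each node gets exactly 10 shards) and compare
// the resulting per-node data volumes.
public class CountBalanceDemo {
    public static void main(String[] args) {
        Random random = new Random(42);
        int nodes = 3;
        long[] gbPerNode = new long[nodes];
        for (int shard = 0; shard < 30; shard++) {
            // Log-normal shard sizes: most shards are small, a few are huge.
            long sizeGb = Math.round(Math.exp(random.nextGaussian() * 1.5 + 2.0));
            gbPerNode[shard % nodes] += sizeGb;
        }
        for (int n = 0; n < nodes; n++) {
            System.out.println("node-" + n + ": 10 shards, ~" + gbPerNode[n] + " GB");
        }
        // Shard counts are identical, but the data volumes differ by a large
        // factor -- the hotspot problem described above.
    }
}
```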

This structuring of data/shards follows the best practices Elastic published here: https://www.elastic.co/guide/en/elasticsearch/guide/master/user-based.html

I have met and consulted with a large number of engineers/companies who have problems with shard hotspotting caused by the simplicity of the default shard balancer. Most do not know how to work around it, as there is no built-in option (hence Tempest), and as a result they run heavily over-provisioned ES clusters. This is notably expensive when running large clusters on infrastructure like AWS.

@DaveCTurner I'm one of the co-authors of Tempest. What @kennycason said is correct, but let me expand on it.

First, it is my intention to update Tempest to be compatible with newer versions. Like Kenny, I'm no longer at that particular company, so the work hasn't been prioritized yet. That said, I still work on pretty large ES installs that suffer the exact same issue. In fact, the current workaround I see is disabling the rebalancer altogether and doing manual shard moves during off hours using an external process (a capability which I think is slated to be removed too).
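For reference, that workaround roughly amounts to the following (a sketch using the low-level Java REST client; the host, index, shard, and node names are hypothetical):

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.RestClient;

// Sketch of the "disable rebalancing, move shards by hand" workaround.
public class ManualRebalance {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            // 1. Disable the built-in rebalancer.
            Request disable = new Request("PUT", "/_cluster/settings");
            disable.setJsonEntity("{\"persistent\":{\"cluster.routing.rebalance.enable\":\"none\"}}");
            client.performRequest(disable);

            // 2. Move a shard explicitly (names here are hypothetical).
            Request reroute = new Request("POST", "/_cluster/reroute");
            reroute.setJsonEntity("{\"commands\":[{\"move\":{"
                + "\"index\":\"customer-1234\",\"shard\":0,"
                + "\"from_node\":\"node-hot\",\"to_node\":\"node-idle\"}}]}");
            client.performRequest(reroute);
        }
    }
}
```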

It would be fantastic if Elastic could engineer a balancer for everyone, but frankly the level of effort there is probably huge, with the return on value isolated to your large enterprise power users. What does "balanced" even mean? For Tempest that is a really complex answer that takes a number of factors into account, including historical index sizes, modeled shard growth, primary vs. replica, capacity of the node, distribution of the cluster as a whole, and distribution of a particular index.
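To make that concrete, a custom balancer's notion of "balanced" usually reduces to some weighted cost function over factors like the ones listed above. The following is a purely hypothetical sketch; neither the factors nor the weights come from Tempest's actual model:

```java
// Purely hypothetical sketch of a multi-factor balance score; none of these
// weights or factors are taken from Tempest.
public class NodeBalanceScore {
    double shardCountWeight = 0.2;
    double diskBytesWeight = 0.5;
    double growthWeight = 0.3;

    /**
     * Higher score = more "loaded" relative to the rest of the cluster.
     * Each argument is the node's value divided by the cluster-wide mean,
     * so a perfectly balanced node scores 1.0.
     */
    double score(double relativeShardCount, double relativeDiskBytes, double relativeModeledGrowth) {
        return shardCountWeight * relativeShardCount
            + diskBytesWeight * relativeDiskBytes
            + growthWeight * relativeModeledGrowth;
    }
}
```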

I would also point out that Tempest is biased toward shard sizes, but there are certainly cases for balancers that are more sensitive to ingest rates, search rates, hot vs. cold storage, and memory footprint. Notice that there are 10 public forks of Tempest, and who knows how many private balancers exist in corporate repos (maybe some of their authors will chime in here).

Over the years I've noticed Elastic has made several decisions to remove power-user functionality. The usual justifications have been either "nobody uses it" or "it's dangerous". This has made ES in particular a hard-to-upgrade product in large-scale deployments. I for one would like to see ES continue to ship with reasonable defaults, but also provide a rich set of extension points to let power users leverage ES for maximum value.

I'll conclude by directly responding to "there appears to be little appetite to use this flexibility in the wild". You're probably right, but the counterpoint is that for the people who do use it, it's huge. I actually know of at least one large company that wouldn't upgrade to 5.x until Tempest supported it or a similar balancer took its place.

Thanks to everyone for their comments. We discussed this as a team again today and decided we will be able to continue to support custom AllocationDecider and ShardsAllocator implementations for the time being. We may reopen this in future, but for now I will close this to indicate that this is no longer something we intend to work on.
