Elasticsearch: Cluster will not automatically assign shards

Created on 12 May 2016 · 33 comments · Source: elastic/elasticsearch

Elasticsearch version: 2.2.1

JVM version: java version "1.8.0_31"
Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)

OS version: CentOS release 6.7 (Final)

Description of the problem including expected versus actual behavior:

Thanks in advance for your help.

I inherited a poorly-behaving Elasticsearch cluster that I've been tuning on and off for a few months. Some things have improved, others have not.

About a month ago, another team needed to reboot the (AWS hosted) nodes. I turned off shard allocation in the cluster with an API call and got a success response. However, it didn't take. So I also turned off allocation in kopf. But, rebooting a node caused shards to be allocated anyway. So we rebooted one box per day and let the cluster rebalance each time.

Some time after the reboots, I noticed the cluster was still yellow. It had stopped allocating shards. I figured the API call had finally taken, so I turned allocation back on (and got another success message). According to kopf, allocation is enabled. But the cluster did not automatically assign shards.

I tried toggling shard allocation on and off with both kopf and the API. But the cluster would not allocate shards.

I complained about this on Twitter and an Elastic employee recommended upgrading. So, I upgraded the cluster from 1.5.2 to 2.2.1. However, the behavior persists.

If I assign shards manually with the API, they allocate just fine.

Steps to reproduce:

  1. Turn off shard allocation with API call and kopf
  2. Turn on shard allocation with API call and kopf
  3. Cry a lot

Provide logs (if relevant): I didn't see any relevant logs, but if you tell me what to look for I can grep. I'm happy to gather diagnostics for you, but please note that I no longer own this cluster as of 3pm PDT Friday.

Labels: :Distributed, feedback_needed


All 33 comments

Hi @alicegoldfuss

Could you provide the actual command that you use to enable/disable shard allocation, along with the output of:

GET _cluster/settings

and (assuming you have unassigned shards atm):

GET _shard_stores

Also, it would be good to take an unassigned shard and try to assign it to a particular node with the cluster-reroute API, adding the ?explain flag to figure out why the shard isn't being assigned.

thanks
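The reroute-with-?explain suggestion can be sketched as follows. This is only a sketch: the index, shard, and node names are placeholders, and the body uses the ES 2.x "allocate" reroute command shape.

```python
import json

def allocate_command(index, shard, node, allow_primary=False):
    """Build the body for POST /_cluster/reroute?explain using the
    ES 2.x 'allocate' command (all names here are placeholders)."""
    return {"commands": [{"allocate": {
        "index": index,
        "shard": shard,
        "node": node,
        "allow_primary": allow_primary,
    }}]}

# The serialized body would then be sent with something like:
#   curl -XPOST 'node:9200/_cluster/reroute?explain' -d "$BODY"
body = json.dumps(allocate_command("logstash-2016.02.13", 1, "node-1"))
```

The ?explain flag makes the response include each allocation decider's yes/no decision for the requested move.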

Hi Clinton!

$ curl -XPUT http://node-1.com:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.enable": "all"}}'
{"acknowledged":true,"persistent":{},"transient":{"cluster":{"routing":{"allocation":{"enable":"all"}}}}}
$ curl -XGET http://node-1.com:9200/_cluster/settings
{"persistent":{"cluster":{"routing":{"allocation":{"enable":"all"}}}},"transient":{"cluster":{"routing":{"allocation":{"enable":"all"}}}}}

Shard stores returned a lot, so I cleaned up one of the replies:

{"indices":{"index-1":{"shards":{"2":{"stores":[{"urZEQshBQourcmigNqeG0g":{"name":"node-1","transport_address":"127.0.0.1:9300","attributes":{"ec2az":"us-east-1a","datacenter":"use1v"}},"version":40,"allocation":"primary"},{"uWmawPeUQ2KYNR1sLKvxfA":{"name":"node-2","transport_address":"127.0.0.1:9300","attributes":{"ec2az":"us-east-1d","datacenter":"use1v"}},"version":25,"allocation":"unused"}]}}},

The explain query dumped a bunch of stuff. Can I give it to you privately?

Also I should add that assigning the shard via the API worked. That has always worked. But the cluster doesn't assign automatically.
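A _shard_stores reply like the cleaned-up one above can be scanned programmatically for copies the cluster considers "unused". A minimal sketch, assuming the 2.x response shape shown, where each store entry keeps the node metadata under its node-id key:

```python
def unused_copies(shard_stores):
    """Yield (index, shard_id, node_name) for shard-store copies whose
    allocation is 'unused', i.e. on-disk copies the cluster is not using."""
    for index, idx_data in shard_stores.get("indices", {}).items():
        for shard_id, shard_data in idx_data.get("shards", {}).items():
            for store in shard_data.get("stores", []):
                if store.get("allocation") != "unused":
                    continue
                # The node's metadata sits under its node-id key, which is
                # the only dict-valued entry in the store object.
                node_meta = next(v for v in store.values() if isinstance(v, dict))
                yield index, shard_id, node_meta.get("name")
```

Run against the sample above, this would report the copy of shard 2 of index-1 sitting unused on node-2.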

The explain query dumped a bunch of stuff. Can I give it to you privately?

Sure. clinton at elastic dot co

Also I should add that assigning the shard via the API worked. That has always worked. But the cluster doesn't assign automatically.

Do you have ongoing recoveries? What is the output of:

GET _cat/recovery?v

I also see you're using node attributes, and possibly multiple data centres? (Usually this is a no-no, unless they're like EC2 availability zones, i.e. as fast as a LAN.) Could you paste the settings you're using for shard awareness/allocation?

I will send you the query dump and recovery dump.

We are using multiple AWS availability zones, not physical data centers.

@alicegoldfuss It looks like you tried to reroute an already assigned shard. Could you try one that is unassigned, eg shard 1 of index logstash-2016.02.13? (with ?explain)

Also, could you paste the settings you're using for shard awareness/allocation?

@clintongormley I assigned a shard that was unassigned. But when I reran the command to create that file, yes it was already assigned.

I will send you the new shard assignment.

Can you tell me where to find the settings for shard awareness/allocation?

@alicegoldfuss Either in index settings or in node settings, depending on what is set. Try:

GET _nodes/settings
GET logstash-2016.02.13/_settings

Okay, emailed those settings to you.

@alicegoldfuss apparently those shards can be allocated. If you just call _reroute without any body or arguments (now that I think about it, you might need to specify an empty body), will the shards get allocated? If you allocate one shard manually, will the others follow once that shard is started?

Hello!

If I manually allocate one shard, the other shards will not follow suit.

I ran the following:

curl -XPOST 'http://node-1.com:9200/_cluster/reroute'

That returned a large response, full of shard states:

{"state":"STARTED","primary":false,"node":"XXXXXXXXXXX","relocating_node":null,"shard":4,"index":"index-1","version":24,"allocation_id":{"id":"XXXXXXXXXXXX"}}

But no signs of actual allocation.

I ran it with an empty body and got the same.

Well, from here on I don't have many more ideas beyond enabling trace logging for allocation, for cluster.routing.allocation.decider as well as cluster.routing.allocation. @bleskes any ideas?

That would require either putting in a Puppet PR (I don't think I could deploy in time) or disabling Puppet and shoving the settings into the config on the boxes (not my favorite choice). So if there's a Plan B I would prefer it.

So if there's a Plan B I would prefer it.

You can do it from the settings API:

PUT /_cluster/settings
{
    "transient" : {
        "logger.cluster.routing.allocation.decider": "TRACE",
        "logger.cluster.routing.allocation": "TRACE"
    }
}

Thanks! So, put those in place and try to allocate a shard with explain enabled? Or will those settings generate log files?

Thanks! So, put those in place and try to allocate a shard with explain enabled? Or will those settings generate log files?

It will probably start generating log lines, but try to do an allocation too and share the master logs from the time you enabled the traces.

$ curl -XPUT 'http://node-1.com:9200/_cluster/settings' -d '{"transient" : {"logger.cluster.routing.allocation.decider": "TRACE","logger.cluster.routing.allocation": "TRACE"}}'
{"acknowledged":true,"persistent":{},"transient":{"logger":{"cluster":{"routing":{"allocation":"TRACE","allocation.decider":"TRACE"}}}}}

I successfully allocated two shards with the API. Elasticsearch generated no additional logs. I captured the explain dumps if you would like to see them.

Elasticsearch generated no additional logs.

Are you sure that you extracted the logs from the master node? You can check the master with GET /_cat/master against any node in the cluster.

Okay! I thought ES logs would be the same across the cluster. I have a 37 MB log file for you @jasontedor where can I send it?

Okay! I thought ES logs would be the same across the cluster.

Each node is a special snowflake, and the master is a very special snowflake and in particular is the only node making allocation decisions.

I have a 37 MB log file for you @jasontedor where can I send it?

Can you gzip compress that (if not already) and send it to my first name at the same domain as @clintongormley's email address from earlier?

Done :)

@alicegoldfuss Can you hit GET /_cluster/settings against the master node and share here?

Do you have the timestamps from when you performed the allocation? I see things like:

[2016-05-12 21:47:20,469][TRACE][cluster.routing.allocation.decider] [...] Can not allocate [...], at[2016-05-12T05:48:08.074Z], details[failed recovery, failure RecoveryFailedException[index: Recovery failed from {node}{...}{...}{...}{...} into {...}{...}{...}{...}{...} (no activity after [30m])]; nested: ElasticsearchTimeoutException[no activity after [30m]]; ]]] on node [.....] due to [DisableAllocationDecider]

Especially the DisableAllocationDecider part: it looks like allocation is still disabled at this point. Has it been turned back on? I noticed that your command for enabling allocation addresses the EnableAllocationDecider, but perhaps the DisableAllocationDecider (which is deprecated in 2.x) is still disabling allocation?

In 2.x, both are around: DisableAllocationDecider is deprecated but kept so people can move away from it gradually. However, they both take effect, so if allocation is disabled by the DisableAllocationDecider then it will still be disabled.

Additionally, I _only_ see logs from the cluster.routing.allocation.decider and cluster.routing.allocation.allocator packages in this file, what happened to the logs from the other packages?
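The two-decider situation described above can be checked offline against the settings responses. A minimal sketch, assuming the nested 2.x response shapes of GET _cluster/settings and GET _all/_settings (the function names are my own):

```python
def flatten(settings, prefix=""):
    """Flatten a nested settings object into dotted keys,
    e.g. {"cluster": {"routing": ...}} -> "cluster.routing...."""
    flat = {}
    for key, value in settings.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat

def disable_allocation_set(cluster_settings):
    """True if the deprecated disable_allocation flag is set in the
    persistent or transient cluster settings."""
    for scope in ("persistent", "transient"):
        flat = flatten(cluster_settings.get(scope, {}))
        if str(flat.get("cluster.routing.allocation.disable_allocation", "")).lower() == "true":
            return True
    return False

def indices_with_disable_allocation(index_settings):
    """Index names whose own settings still carry the deprecated flag."""
    hits = []
    for index, cfg in index_settings.items():
        flat = flatten(cfg.get("settings", {}))
        if str(flat.get("index.routing.allocation.disable_allocation", "")).lower() == "true":
            hits.append(index)
    return hits
```

Because the deprecated flag can live at either level, both checks are needed; in this thread it turned out to be set on the indices themselves.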

Ran settings against the master node:

$ curl -XGET http://node-master.com:9200/_cluster/settings
{"persistent":{"cluster":{"routing":{"allocation":{"enable":"all"}}}},"transient":{"cluster":{"routing":{"allocation":{"enable":"all"}}},"logger":{"cluster":{"routing":{"allocation":"TRACE","allocation.decider":"TRACE"}}}}}

I don't have specific timestamps, but I did give you the entire log file. Everything you see in that file is what I can see on the master. The other nodes are full of the 30 minute timeout notifications from when I was allocating hundreds of shards yesterday.

How can I toggle the DisableAllocationDecider? Give me a command and I'll try running it.

I know this is probably in your docs somewhere, but I'm pretty swamped, so sorry in advance.

Okay, it's still possible that the indices have the disable allocation decider enabled on them as I don't have the settings you emailed Clint earlier (he's offline), you can try:

curl -XPUT 'node:9200/_cluster/settings' -d'{"transient": {"cluster.routing.allocation.disable_allocation": false}}'

And additionally (in case it is set on the indices themselves):

curl -XPUT 'node:9200/_all/_settings' -d'{"index.routing.allocation.disable_allocation": false}'

YOU DID IT!

It was set on the indices themselves! I issued the second command you provided and BAM shards are initializing automatically!

No idea how that thing got toggled in the first place, but I will never touch it again.

Thank you everyone!

Glad to hear that!

I'm going to close this issue, thanks for debugging with us :)

@clintongormley @dakrone @jasontedor the reason we didn't see this in the explain output is that we explicitly ignore these allocation deciders when running allocation commands. I think this is trappy and should be optional (specified in the request, but not ignored by default) - this could have saved us and the user lots of time, and I guess it would have prevented @alicegoldfuss from writing her Python tool in the first place? I think we should change that!

@alicegoldfuss By the way, you will probably want to disable the allocation trace logging that was enabled earlier, lest your master keep over-logging (as you saw, it can produce large logs very quickly). Just run the earlier command for enabling allocation trace logging, but replace TRACE with INFO.
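Concretely, that reset could look like this (a sketch; it just rebuilds the earlier transient-settings body with INFO in place of TRACE):

```python
import json

# Same transient settings body as the earlier TRACE call, with the
# allocation loggers dialed back down to INFO.
reset_body = json.dumps({"transient": {
    "logger.cluster.routing.allocation.decider": "INFO",
    "logger.cluster.routing.allocation": "INFO",
}})
# Sent the same way as before:
#   curl -XPUT 'node:9200/_cluster/settings' -d "$reset_body"
```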

@jasontedor good call on the log setting!
