Elasticsearch version: 2.3.3
JVM version: 1.8.0_60
OS version: centos-7.0
Description of the problem including expected versus actual behavior:
My cluster consists of 10 data nodes. When one node got stuck on disk I/O (because of a hardware problem), writes across the whole cluster were stuck for several minutes (> 10 minutes), and the bad node was not removed from the cluster automatically.
Describe the feature:
There could be a thread that monitors disk I/O for timeouts; if a timeout occurs and exceeds a configured threshold, the bad node should be removed from the cluster automatically. A rough sketch of the idea follows.
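For illustration only, here is a minimal sketch of what such a watchdog might look like (this is not part of Elasticsearch; the threshold, interval, and callback are all assumptions):

```python
import os
import threading
import time

# Assumed, configurable values for this sketch.
IO_TIMEOUT_SECONDS = 5.0
CHECK_INTERVAL_SECONDS = 30.0

def probe_disk_latency(data_path):
    """Write and fsync a tiny file on the data path, returning the elapsed time."""
    probe_file = os.path.join(data_path, ".io_probe")
    start = time.monotonic()
    with open(probe_file, "wb") as f:
        f.write(b"x" * 4096)
        f.flush()
        os.fsync(f.fileno())
    os.remove(probe_file)
    return time.monotonic() - start

def io_watchdog(data_path, on_unhealthy):
    # Note: a truly stuck disk would block the probe itself, so a real
    # implementation would have to run the probe with its own timeout.
    while True:
        latency = probe_disk_latency(data_path)
        if latency > IO_TIMEOUT_SECONDS:
            # In the feature request this would remove the node from the
            # cluster; here we only invoke a callback.
            on_unhealthy(latency)
        time.sleep(CHECK_INTERVAL_SECONDS)

# Example (hypothetical data path):
# threading.Thread(target=io_watchdog,
#                  args=("/var/lib/elasticsearch", print),
#                  daemon=True).start()
```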
Hi @curu
We discussed this in FixItFriday. Removing a node because of slow disk I/O is quite an aggressive decision to make, and I would be hesitant to have Elasticsearch make it automatically: e.g. you remove one node, so the other nodes have to do shard recovery; now they're slow too, so we remove another node, and so on.
Instead, we could log warnings about things like slow disk I/O and a monitoring system could pick up on these warnings and alert the sysadmin.
Hi @clintongormley, I think we could expose the I/O wait % on the node stats API. It might be a good metric to watch for I/O problems, and the API can be easily read by external monitoring systems.
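For illustration, a rough sketch (assuming Linux and `/proc/stat`) of how an external monitor could compute such an iowait percentage today, i.e. the kind of value a node-stats field might expose:

```python
import time

def read_cpu_times():
    # First line of /proc/stat: "cpu user nice system idle iowait irq softirq steal ..."
    with open("/proc/stat") as f:
        return [int(v) for v in f.readline().split()[1:]]

def iowait_percent(interval=1.0):
    """Approximate the iowait share of total CPU time over `interval` seconds."""
    before = read_cpu_times()
    time.sleep(interval)
    after = read_cpu_times()
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total else 0.0  # index 4 = iowait

if __name__ == "__main__":
    print(f"iowait: {iowait_percent():.1f}%")
```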
It is worth mentioning that I/O wait was considered in #15915 but we ultimately pulled it out.
@clintongormley , agreed.
How should we deal with this when we run into the issue? Help!
I don't think logging is the way to go here; monitoring is. We can consider enhancing the stats to include disk I/O wait, but that is a different enhancement altogether. Therefore, I am closing this one.