Elasticsearch: Warn about slow node performance, e.g. disk I/O

Created on 8 Jul 2016 · 6 Comments · Source: elastic/elasticsearch

Elasticsearch version: 2.3.3

JVM version: 1.8.0_60

OS version: centos-7.0

Description of the problem including expected versus actual behavior:
My cluster consists of 10 data nodes. When one node got stuck on disk I/O (because of a hardware problem), writes to the whole cluster stalled for more than 10 minutes, and the bad node was not removed from the cluster automatically.

Describe the feature:
There could be a thread that monitors disk I/O timeouts; if an operation exceeds a configured threshold, the bad node would be removed from the cluster.
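A minimal sketch of such a monitoring thread, in Python rather than Elasticsearch's own code. It assumes that a small write followed by `fsync` is a reasonable proxy for disk health; every name here (`probe_disk_latency`, `timed_probe`, `watch_disk`, `threshold_s`, `hang_timeout_s`) is hypothetical and not an Elasticsearch setting or API. The probe runs on a helper thread so that a completely hung disk is reported as a timeout instead of blocking the watchdog, and the default action only logs a warning rather than evicting the node.

```python
import logging
import os
import tempfile
import threading
import time
from typing import Callable, Optional

log = logging.getLogger("disk-watchdog")  # hypothetical logger name


def probe_disk_latency(directory: str, size: int = 4096) -> float:
    """Write `size` bytes to a temp file in `directory`, fsync it, delete it,
    and return the elapsed time in seconds."""
    start = time.monotonic()
    fd, path = tempfile.mkstemp(dir=directory, prefix=".io-probe-")
    try:
        os.write(fd, b"\0" * size)
        os.fsync(fd)  # push the data to the device, not just the page cache
    finally:
        os.close(fd)
        os.remove(path)
    return time.monotonic() - start


def timed_probe(directory: str, timeout_s: float) -> Optional[float]:
    """Run the probe on a helper thread so a hung disk cannot block the caller.

    Returns the latency, or None if the probe did not finish within timeout_s
    or raised. A probe that hangs leaves its daemon thread blocked on the disk.
    """
    result: dict = {}

    def run() -> None:
        result["latency"] = probe_disk_latency(directory)

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    worker.join(timeout_s)
    return result.get("latency")


def warn_slow_disk(directory: str, latency: Optional[float]) -> None:
    """Default action: log a warning for an external monitor to pick up."""
    if latency is None:
        log.warning("disk probe in %s did not complete in time", directory)
    else:
        log.warning("slow disk I/O in %s: probe took %.3fs", directory, latency)


def watch_disk(directory: str, threshold_s: float, hang_timeout_s: float,
               interval_s: float, stop: threading.Event,
               on_slow: Callable[[str, Optional[float]], None] = warn_slow_disk) -> None:
    """Probe `directory` every `interval_s` seconds until `stop` is set,
    calling `on_slow` when a probe is slower than `threshold_s` or hangs."""
    while not stop.is_set():
        latency = timed_probe(directory, hang_timeout_s)
        if latency is None or latency > threshold_s:
            on_slow(directory, latency)
        stop.wait(interval_s)
```

In practice `watch_disk` would be started on its own daemon thread, pointed at the data directory, and stopped by setting the event. Keeping `hang_timeout_s` larger than `threshold_s` lets it distinguish "slow" (a latency is reported) from "stuck" (`None` is reported).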

:Core/Infra/Logging >enhancement help wanted

All 6 comments

Hi @curu

We discussed this in FixItFriday. Removing a node because of slow disk I/O is quite an aggressive decision to make, and I would be hesitant to have Elasticsearch make it automatically. For example, if you remove one node, the other nodes have to perform shard recovery. Now they're slow too, so we remove another node, and so on.
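The cascade can be made concrete with a toy model. This is only an illustration of the argument, not how Elasticsearch allocates or recovers shards. It assumes (hypothetically) that write load is spread evenly across nodes, that every eviction adds a fixed `recovery_overhead` of work to each survivor, and that an automatic policy evicts one more node whenever per-node load exceeds `capacity`.

```python
def simulate_auto_removal(nodes: int, total_load: float, capacity: float,
                          recovery_overhead: float) -> list[int]:
    """Toy model of an automatic "evict the slow node" policy.

    Returns the node count after each eviction, starting from the full cluster.
    """
    history = [nodes]
    nodes -= 1  # the node with the failing disk is evicted first
    history.append(nodes)
    while nodes > 0:
        # Survivors share the load evenly and also do shard recovery work.
        per_node = total_load / nodes + recovery_overhead
        if per_node <= capacity:
            break  # survivors keep up, so the cascade stops here
        nodes -= 1  # survivors now look slow too, so the policy evicts another
        history.append(nodes)
    return history


print(simulate_auto_removal(10, total_load=80, capacity=9, recovery_overhead=1))
# [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
print(simulate_auto_removal(10, total_load=80, capacity=12, recovery_overhead=1))
# [10, 9]
```

With little headroom the policy evicts every node; with enough headroom it stops after removing the genuinely bad one. The difference depends entirely on spare capacity, which the policy itself cannot see.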

Instead, we could log warnings about things like slow disk I/O and a monitoring system could pick up on these warnings and alert the sysadmin.

Hi @clintongormley, I think we could expose the I/O wait % in the node stats API. It might be a good metric to watch for I/O problems, and the API can easily be read by external monitoring systems.
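On Linux, an I/O wait percentage is commonly derived from two samples of the aggregate `cpu` line in `/proc/stat`. The sketch below assumes that approach; the function names are hypothetical and it is not the node stats API. The counters are cumulative jiffies in the order user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice. Guest time is already included in user/nice, so only the first eight fields are summed. Note that the proc(5) man page itself warns that iowait is not a reliable figure, so treat it as a coarse signal.

```python
import time
from typing import List, Sequence


def parse_cpu_line(line: str) -> List[int]:
    """Parse the aggregate "cpu" line of /proc/stat into cumulative jiffy counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    return [int(x) for x in fields[1:]]


def iowait_percent(before: Sequence[int], after: Sequence[int]) -> float:
    """Share of CPU time spent in iowait between two samples, in percent."""
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice.
    # guest/guest_nice are already included in user/nice, so sum only the first 8.
    deltas = [a - b for a, b in zip(after[:8], before[:8])]
    total = sum(deltas)
    if total <= 0:
        return 0.0  # no time elapsed between samples
    return 100.0 * deltas[4] / total


def sample_iowait(interval_s: float = 1.0, path: str = "/proc/stat") -> float:
    """Linux only: sample /proc/stat twice, `interval_s` seconds apart."""
    def read() -> List[int]:
        with open(path) as f:
            return parse_cpu_line(f.readline())

    before = read()
    time.sleep(interval_s)
    return iowait_percent(before, read())
```

For example, if 100 jiffies elapse between samples and 10 of them are iowait, `iowait_percent` returns 10.0.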

It is worth mentioning that I/O wait was considered in #15915 but we ultimately pulled it out.

@clintongormley, agreed.

How should I deal with this when I run into this issue? Help!

I don't think logging is the way to go here, instead monitoring. We can consider enhancing stats to include disk I/O wait, but that is a different enhancement altogether. Therefore, I am closing this one.
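On the monitoring side, one common way to avoid paging on a single spike is to alert only when several consecutive samples exceed a threshold. The sketch below shows that idea with hypothetical names. The samples could be an I/O wait percentage, if node stats exposed one, or any other disk latency metric. All of this runs outside Elasticsearch and is an assumption, not an existing feature.

```python
from collections import deque


class SustainedThresholdAlert:
    """Fire only when the last `window` samples all exceed `threshold`."""

    def __init__(self, threshold: float, window: int) -> None:
        if window < 1:
            raise ValueError("window must be at least 1")
        self.threshold = threshold
        self.recent: deque = deque(maxlen=window)

    def observe(self, value: float) -> bool:
        """Record one sample; return True if an alert should fire now."""
        self.recent.append(value > self.threshold)
        return len(self.recent) == self.recent.maxlen and all(self.recent)


alert = SustainedThresholdAlert(threshold=30.0, window=3)
print([alert.observe(v) for v in [50, 50, 10, 50, 50, 50]])
# [False, False, False, False, False, True]
```

The alert keeps returning True for as long as the condition persists, so a real monitor would typically also deduplicate notifications.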

