Kibana: Refreshing a dashboard results in loads of exceptions in ES log (loads of indexes being queried?)

Created on 3 Dec 2014 · 8 comments · Source: elastic/kibana

It looks like it might be related to #1272 - I think I'm seeing the same behavior too on BETA2.

I have a single dashboard, looking at the last 24 hours. Around 75,000 entries are generated per day (one index per day).

With a dashboard open, I can do loads of filters/queries and everything works fine _most of the time_ (I see some exceptions every couple of searches). However, when I refresh the whole page I see loads of exceptions in the Elasticsearch log every time: around 160 of the ones below per refresh, plus a full stack trace for each.

[2014-12-03 10:48:05,734][DEBUG][action.search.type       ] [Amelia Voght] [logstash-2014.10.19][3], node[MaPkkdRpRZiF_WW1KxC_Ug], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@40cfaf8c] lastShard [true]
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@2ce3dd0d
--
[2014-12-03 10:48:05,734][DEBUG][action.search.type       ] [Amelia Voght] [logstash-2014.10.22][3], node[MaPkkdRpRZiF_WW1KxC_Ug], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@40cfaf8c] lastShard [true]
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@406403eb
--
[2014-12-03 10:48:05,735][DEBUG][action.search.type       ] [Amelia Voght] [logstash-2014.10.24][3], node[MaPkkdRpRZiF_WW1KxC_Ug], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@40cfaf8c] lastShard [true]
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@61470de0
--
[2014-12-03 10:48:05,735][DEBUG][action.search.type       ] [Amelia Voght] [logstash-2014.10.26][3], node[MaPkkdRpRZiF_WW1KxC_Ug], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@40cfaf8c] lastShard [true]
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@64c90c2
--
[2014-12-03 10:48:05,735][DEBUG][action.search.type       ] [Amelia Voght] [logstash-2014.10.27][3], node[MaPkkdRpRZiF_WW1KxC_Ug], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@40cfaf8c] lastShard [true]
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@69e057a2
--
[2014-12-03 10:48:05,736][DEBUG][action.search.type       ] [Amelia Voght] [logstash-2014.10.29][3], node[MaPkkdRpRZiF_WW1KxC_Ug], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@40cfaf8c] lastShard [true]
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@3d036195
--
[2014-12-03 10:48:05,736][DEBUG][action.search.type       ] [Amelia Voght] [logstash-2014.10.30][3], node[MaPkkdRpRZiF_WW1KxC_Ug], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@40cfaf8c] lastShard [true]
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@1edf2937
--
[2014-12-03 10:48:05,737][DEBUG][action.search.type       ] [Amelia Voght] [logstash-2014.10.31][3], node[MaPkkdRpRZiF_WW1KxC_Ug], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@40cfaf8c] lastShard [true]
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@2e972032
--
[2014-12-03 10:48:05,737][DEBUG][action.search.type       ] [Amelia Voght] [logstash-2014.11.03][3], node[MaPkkdRpRZiF_WW1KxC_Ug], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@40cfaf8c] lastShard [true]
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@297d8136
--
[2014-12-03 10:48:05,737][DEBUG][action.search.type       ] [Amelia Voght] [logstash-2014.11.06][3], node[MaPkkdRpRZiF_WW1KxC_Ug], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@40cfaf8c] lastShard [true]
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@670600eb
--
[2014-12-03 10:48:05,738][DEBUG][action.search.type       ] [Amelia Voght] [logstash-2014.11.10][3], node[MaPkkdRpRZiF_WW1KxC_Ug], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@40cfaf8c] lastShard [true]
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@5adb48d2
--
[2014-12-03 10:48:05,738][DEBUG][action.search.type       ] [Amelia Voght] [logstash-2014.11.13][3], node[MaPkkdRpRZiF_WW1KxC_Ug], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@40cfaf8c] lastShard [true]
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@426e9b40
--
[2014-12-03 10:48:05,739][DEBUG][action.search.type       ] [Amelia Voght] [logstash-2014.11.17][3], node[MaPkkdRpRZiF_WW1KxC_Ug], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@40cfaf8c] lastShard [true]
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@6989aa2a
--
[2014-12-03 10:48:05,739][DEBUG][action.search.type       ] [Amelia Voght] [logstash-2014.11.20][3], node[MaPkkdRpRZiF_WW1KxC_Ug], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@40cfaf8c] lastShard [true]
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@78248d90
--
[2014-12-03 10:48:05,739][DEBUG][action.search.type       ] [Amelia Voght] [logstash-2014.11.24][3], node[MaPkkdRpRZiF_WW1KxC_Ug], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@40cfaf8c] lastShard [true]
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@48a98ab1
--
[2014-12-03 10:48:05,740][DEBUG][action.search.type       ] [Amelia Voght] [logstash-2014.11.29][3], node[MaPkkdRpRZiF_WW1KxC_Ug], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@40cfaf8c] lastShard [true]
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@5a8c78e0

That's 8,000 lines, or almost a megabyte, per refresh (300 MB in the last 80 minutes).

In a specific example I looked at, there were 169 exceptions, but only 83 unique indexes being queried, so something strange is going on there.
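A rough way to cross-check these numbers is to count the rejections and the distinct indices straight from the log, and to read the search thread pool counters from the _cat API. This is only a sketch: the log path and host below are assumptions, and the column names are the ES 1.x ones.

```sh
# Count rejected-execution entries in the Elasticsearch log (path is an assumption):
grep -c 'EsRejectedExecutionException' /var/log/elasticsearch/elasticsearch.log

# Count the distinct daily indices that show up in the failed-shard messages:
grep 'Failed to execute' /var/log/elasticsearch/elasticsearch.log \
  | grep -o 'logstash-[0-9.]*' | sort -u | wc -l

# The rejections mean the search thread pool queue (capacity 1000) is full;
# the live counters are visible through the _cat API:
curl 'localhost:9200/_cat/thread_pool?v&h=host,search.active,search.queue,search.rejected'
```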

Let me know if I can give you any more information; it's easily reproducible. Or am I simply doing something wrong?

notabug

All 8 comments

I'm guessing you're running with the default 5 shards for each of those indices? Because of that, your thread pools are filling very quickly. 83 indices * 5 shards = 415 Elasticsearch search threads per dashboard panel. Figure 3 dashboard panels and that's 1,245 threads plus overhead.

Given that you only have 75000 events/day you'd be better served switching to monthly, or, if you don't expect your log volume to increase quickly, even yearly indices. 15000 docs per shard is far too low.
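A minimal sketch of that suggestion, assuming Logstash's elasticsearch output and an ES 1.x/2.x index template; the template name, monthly naming, and replica count here are just examples:

```sh
# Monthly index names are set in the Logstash elasticsearch output, e.g.:
#   output { elasticsearch { index => "logstash-%{+YYYY.MM}" } }
#
# Shard counts for future indices can be set with an index template (ES 1.x/2.x syntax):
curl -XPUT 'localhost:9200/_template/logstash_monthly' -d '{
  "template": "logstash-*",
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}'
```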

Closing since this isn't a kibana issue.

Thanks for the explanation, that makes a lot of sense.

But, at the risk of sounding stupid, I still don't understand. If the above is true:

  • Why does the thread-pool exhaustion happen only on a full page refresh? As long as I don't "physically" refresh anything, I can make whatever queries I want and there's no error, all on the same dashboard and date range.
  • Why is Kibana even looking at those 83 indexes? I'm only looking at the last 24 hours, so shouldn't only those indexes be queried?

Also, what if someone had more than 15,000 docs per shard? Picking an example dashboard with 5 panels and 5 shards per index, the 1,000-slot queue allows at most 40 daily indexes; 10 panels gives 20 days. Does that mean there's a hard limit on how far back you can look with ELK?

I'll try investigating it again tomorrow and post it on a proper support board; I'm posting here hoping you might feel like teaching me some more :)

Found it!

The problem was that I had the index pattern set to logstash-*.

That meant that on every reload Kibana was sending:

{"index":"logstash-*","ignore_unavailable":true}

When I switched the pattern to [logstash-]YYYY.MM.DD and ticked "Use event times to create index names", it started working much better.

I guess I should have read this tip from the introduction:

Tip: there's an optimization in the way of the Use event times to create index names option. Since Logstash creates an index every day, Kibana uses that fact to only search indices that could possibly contain data in your selected time range.
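To make the difference concrete, this is roughly what each dashboard panel's _msearch header looks like in the two cases. A sketch only: the bodies are trimmed to a match_all, and the expanded index list is illustrative of what the dated pattern resolves to for a last-24-hours view.

```sh
# Wildcard pattern: a single header, but ES expands it to every logstash-* index
# and every one of their shards, which is what floods the search queue.
curl -s 'localhost:9200/_msearch' --data-binary '
{"index":"logstash-*","ignore_unavailable":true}
{"size":0,"query":{"match_all":{}}}
'

# Dated pattern + "Use event times to create index names": Kibana resolves the
# time picker to concrete daily indices first, so only those are queried, e.g.:
curl -s 'localhost:9200/_msearch' --data-binary '
{"index":["logstash-2014.12.02","logstash-2014.12.03"],"ignore_unavailable":true}
{"size":0,"query":{"match_all":{}}}
'
```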

@rabbbit +1 for taking the time to record his fix. I just spent about a week doing all kinds of crazy stuff: upgrading Logstash to 1.5.1, going crazy with the field data cache and breaker sizes, and all other kinds of nonsense. It was all good for my education, but I just couldn't understand why Kibana 3 was OK and Kibana 4 was not until reading this. So, thanks!

+1 - excellent description and solution for an out-of-the-box problem!

+1 - glad you made a note of this; it has helped with some of the issues I had been seeing for a month :).

+1 You saved my day ;) Thanks!

Hi @rabbbit, I am using ES 2.3.2 and Kibana 4.5 and I have started to see this issue too when I query for 'today'. I now have 3 months of daily indices and ES is querying every single daily index it has! So how do I apply your fix to my existing index pattern 'logstash-*'?

Confusingly, the 'create new index pattern' page in Kibana (under "Settings > Indices") says:
"By default, searches against any time-based index pattern that contains a wildcard will automatically be expanded to query only the indices that contain data within the currently selected time range.

Searching against the index pattern logstash-* will actually query elasticsearch for the specific matching indices (e.g. logstash-2015.12.21) that fall within the current time range."

But this isn't true since my index pattern was the default "logstash-*" with the "Index contains time-based events" checkbox ticked, and still every single index was queried for my 'today' query in Kibana!

Kibana issues one request per visualization in my dashboard, with JSON as follows:
{"fields":["@timestamp"],"index_constraints":{"@timestamp":{"max_value":{"gte":1469480400000,"format":"epoch_millis"},"min_value":{"lte":1469566799999,"format":"epoch_millis"}}}}

The gte and lte fields are for today (26th July 2016). So it looks like this is an ES bug, since I see in elasticsearch.log an entry for every index, e.g. [2016-07-26 08:40:27,877][DEBUG][action.search] [Invisible Woman] [logstash-2016.06.11], which is not 'today' (26th July 2016).
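The request body quoted above is a field_stats call with index_constraints, which is how Kibana 4.x asks ES which indices can contain data for the selected time range. It can be replayed by hand to see what ES actually answers; a sketch, with the host assumed and the epoch-millis values copied from the request above:

```sh
curl -s 'localhost:9200/logstash-*/_field_stats?level=indices' -d '{
  "fields": ["@timestamp"],
  "index_constraints": {
    "@timestamp": {
      "max_value": {"gte": 1469480400000, "format": "epoch_millis"},
      "min_value": {"lte": 1469566799999, "format": "epoch_millis"}
    }
  }
}'
# The indices listed in the response are the ones Kibana should then restrict
# its actual searches to.
```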

Also, @rashidkpc: in Kibana 4.5, when creating a new index pattern, this option says "Use event times to create index names [DEPRECATED]", as per https://github.com/elastic/kibana/issues/4472. But if I tick this box, will it make ES run more efficiently by only querying the relevant indices?

Many thanks.
