Hi,
I asked the following question at your discussion forum and it turned out that this might be an issue, so I was advised to open an issue here.
Elasticsearch version (bin/elasticsearch --version):
6.4.2
JVM version (java -version):
1.8.0_171
OS version (uname -a if on a Unix-like system):
Ubuntu 18.04 LTS
Description of the problem:
I have a big multi search query which contains pretty complicated aggregations. I recently upgraded Elasticsearch from 6.2 to 6.4, and when this query is executed using the official PHP package, I get an error message in my log file, which looks like this:
My multi search request contains 4 queries, of which only 2 fail (the two similar ones; one of them is pasted below). The error is logged every time the "option group" / "option" top hits aggregation needs to be computed.
It is important to mention that this is not the case when I try to execute this query via Kibana. When executed through Kibana I receive correct results. Previously, on 6.2 version, this was not an issue at all.
I noticed that the problem appears at the last two top_hits aggregations. If I remove them, I receive correct results.
I had 1 node and 2 shards on my local machine when the problem actually appeared. When I increased the number of shards to 3 or 5, the problem disappeared.
I checked the two indexes I am querying in the Kibana monitoring section, and this is the strange thing I noticed:
name: index1
status: green
document count: 550
data: 211.4 KB
index rate: 0 /s
search rate: 0.01 /s
unassigned shards: 0
This is strange, because I actually have only 28 indexed documents in index1, so I am confused about how these numbers are counted and what they actually represent.
I have a workaround for this problem: increasing the number of shards. But it is strange to allocate 5 shards to two indexes that store 28 and 112 documents just to make those aggregations work.
Thanks
Pinging @elastic/es-search-aggs
For reference, the forum discussion is here: https://discuss.elastic.co/t/indexoutofboundsexception-after-update-from-6-2-to-6-4/152670
This is the relevant bit of the stack trace:
Caused by: java.lang.IndexOutOfBoundsException
at java.nio.Buffer.checkIndex(Buffer.java:540) ~[?:1.8.0_171]
at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:253) ~[?:1.8.0_171]
at org.apache.lucene.store.ByteBufferGuard.getByte(ByteBufferGuard.java:118) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
at org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.readByte(ByteBufferIndexInput.java:385) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
at org.apache.lucene.codecs.lucene70.Lucene70NormsProducer$7.longValue(Lucene70NormsProducer.java:263) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
at org.apache.lucene.search.similarities.BM25Similarity$BM25DocScorer.score(BM25Similarity.java:257) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
... which is a bit scary, because it looks like a bug decoding norms in Lucene.
Does this happen even after a reindex? And if so, are you able to share the mappings and documents for this index so that we can reproduce this?
The bug is in Elasticsearch: the `nested` aggregator buffers documents, and this confuses the scorer used by the `top_hits` aggregator that runs after it.
Can you try to change the sort of your top_hits aggregation to:
```
"top_hits": {
"sort": "_doc",
"size": 1
}
```
This should fix the issue you're seeing. By default the `top_hits` aggregator uses the score of the query to sort the documents. However, in a nested context the score is always the score of the root document.
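For a request with several `top_hits` aggregations, the same change can be applied programmatically before the request is sent. The following is only a rough sketch of that idea in Python, not code from Elasticsearch or any client library: the helper `force_doc_sort` and the aggregation and field names (`option_groups`, `by_group`, `group_hit`) are hypothetical and stand in for whatever the real query uses.

```python
import copy
import json


def force_doc_sort(aggs):
    """Return a copy of an aggregation tree in which every top_hits
    aggregation without an explicit sort is sorted by "_doc"."""
    patched = copy.deepcopy(aggs)
    for body in patched.values():
        if not isinstance(body, dict):
            continue
        top_hits = body.get("top_hits")
        if isinstance(top_hits, dict):
            # Only fill in a sort when none was given explicitly.
            top_hits.setdefault("sort", "_doc")
        # Recurse into sub-aggregations ("aggs" or the long form "aggregations").
        for key in ("aggs", "aggregations"):
            if isinstance(body.get(key), dict):
                body[key] = force_doc_sort(body[key])
    return patched


# Hypothetical aggregation tree: all names below are made up for illustration.
aggs = {
    "option_groups": {
        "nested": {"path": "option_groups"},
        "aggs": {
            "by_group": {
                "terms": {"field": "option_groups.id"},
                "aggs": {"group_hit": {"top_hits": {"size": 1}}},
            }
        },
    }
}

patched = force_doc_sort(aggs)
print(json.dumps(patched, indent=2))
```

The helper leaves any explicit `sort` untouched, so a `top_hits` aggregation that is deliberately sorted by `_score` would still need scores and would have to be changed by hand.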
@jimczi Thanks, it works! However it is still a bug and I guess this is only a workaround. I wonder why this was not an issue on 6.2 version previously?
Thank you very much! It was a real headache. :smiley:
> However it is still a bug and I guess this is only a workaround.

Yes, this is why I left the issue open. We need to fix the `nested` aggregator for the case where scores are required in a sub-aggregation.