Elasticsearch version: 5.9.6 & 6.2.4
Plugins installed: []
JVM version: independent
OS version: independent
Description of the problem including expected versus actual behavior:
The _source is not being returned for hits when explain is set to true for a specific set of queries. The conditions for _source to be missing appear to be:
Steps to reproduce:
PUT /test
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "type1": {
      "properties": {
        "object1": {
          "type": "nested",
          "properties": {
            "field1": {
              "type": "text"
            }
          }
        }
      }
    }
  }
}

PUT /test/type1/1
{
  "object1": {
    "field1": "some text goes here"
  }
}
GET /test/_search
{
  "from": 0,
  "size": 10,
  "query": {
    "match_all": {}
  },
  "explain": true,
  "rescore": [
    {
      "window_size": 200,
      "query": {
        "rescore_query": {
          "nested": {
            "query": {
              "function_score": {
                "query": {
                  "match_all": {}
                },
                "functions": [
                  {
                    "script_score": {
                      "script": {
                        "source": "1",
                        "lang": "painless"
                      }
                    }
                  }
                ],
                "score_mode": "multiply"
              }
            },
            "path": "object1"
          }
        },
        "query_weight": 1.0,
        "rescore_query_weight": 1.0,
        "score_mode": "total"
      }
    }
  ]
}
Pinging @elastic/es-search-aggs
I finally got to doing some digging on this one. Indeed, it is a very peculiar bug, and I am not sure yet how to solve it, but I can now explain how it happens: this only happens when explaining a script score function that's part of a nested query inside a rescore section.
When we try to explain the score function, we run the script, which ends up calling LeafSearchLookup#setDocument. That in turn calls SourceLookup#setSegmentAndDocument, which clears the source that was previously loaded, because we are now on a different docId than in the previous run (the nested doc rather than the parent doc).
@javanna do you have a suggestion for how to make this work anyway? Is there some hack?
@fhaase2 unfortunately I can't think of a workaround that doesn't change the meaning of the request. If you could rework the request to remove the 'script score' part, that would avoid this issue.
I just opened a refactor #60179 that would fix this issue properly.
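A rough sketch of what the reworked request could look like, assuming the nested rescore query can simply match all nested documents once the function_score wrapper and its script_score function are removed. This is only an illustration based on the reproduction request above; in general it changes how the rescore is computed, so it is not a drop-in replacement.

```
GET /test/_search
{
  "from": 0,
  "size": 10,
  "query": {
    "match_all": {}
  },
  "explain": true,
  "rescore": [
    {
      "window_size": 200,
      "query": {
        "rescore_query": {
          "nested": {
            "path": "object1",
            "query": {
              "match_all": {}
            }
          }
        },
        "query_weight": 1.0,
        "rescore_query_weight": 1.0,
        "score_mode": "total"
      }
    }
  ]
}
```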
@jtibshirani great, thanks for the answer.
As a workaround I thought about making a subsequent request after the explain request, just to retrieve the source of the documents using the ids query. But this would be a hack, so I'd better wait for the refactor #60179.
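For reference, a minimal sketch of that follow-up request, assuming the hit IDs are collected from the explain response first. The ID "1" here is just the document from the reproduction steps above; a real request would list whatever IDs the first search returned.

```
GET /test/_search
{
  "query": {
    "ids": {
      "values": ["1"]
    }
  }
}
```

The hits from this second request carry the _source, which can then be joined with the explanations from the first response by _id.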