Assume an index with persons and their ages in it. The age is optional:
curl -X PUT 'http://localhost:9200/test'
curl -X PUT 'http://localhost:9200/test/person/1' -d '{ "name": "Alpha", "age": 20 }'
curl -X PUT 'http://localhost:9200/test/person/2' -d '{ "name": "Beta", "age": 30 }'
curl -X PUT 'http://localhost:9200/test/person/3' -d '{ "name": "Gamma" }'
When I now try to find people who are around the age of 22 with a Gaussian decay function, I would naturally expect Gamma _not_ to appear in the search results, or at least to appear with a low score.
However, Gamma receives the score 1:
curl -X GET 'http://localhost:9200/test/person/_search?pretty' -d '{ "query": { "function_score": { "functions": [ { "gauss": { "age": { "origin": 22, "scale": 5, "decay": 0.5 } } } ] } } }'
The query pretty printed:
{
"query": {
"function_score": {
"functions": [
{
"gauss": {
"age": {
"origin": 22,
"scale": 5,
"decay": 0.5
}
}
}
]
}
}
}
The resulting hits:
[
{
"_index" : "test",
"_type" : "person",
"_id" : "3",
"_score" : 1.0,
"_source":{ "name": "Gamma"}
}, {
"_index" : "test",
"_type" : "person",
"_id" : "1",
"_score" : 0.8950251,
"_source":{ "name": "Alpha", "age": 20 }
}, {
"_index" : "test",
"_type" : "person",
"_id" : "2",
"_score" : 0.16957554,
"_source":{ "name": "Beta", "age": 30 }
}
]
The explanation of the query gives the following formulas:
Field missing (Gamma):
exp(-0.5*pow(MIN of: [0.0],2.0)/18.033688011112044)
Field present (Alpha, age 20):
exp(-0.5*pow(MIN of: [Math.max(Math.abs(20.0(=doc value) - 22.0(=origin))) - 0.0(=offset), 0)],2.0)/18.033688011112044)
When the field is present, its absolute distance from the origin is used as the input to the decay function. However, if the field is missing, the value 0 is used, implying a perfect hit.
My expectation would be that the decay function does not even receive the input but the function score query automatically returns a score of 0 if the field is missing.
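To make the arithmetic behind these scores concrete, here is a minimal Python sketch of the Gaussian decay as it appears in the explain output. The function name `gauss_decay` and the use of `None` for a missing field are assumptions made for illustration; this is not Elasticsearch's actual code, only a reproduction of the formula with the variance derived from `scale` and `decay`.

```python
import math

def gauss_decay(value, origin, scale, decay, offset=0.0):
    """Gaussian decay score for one document value.

    A missing value (None) is treated as distance 0, mirroring the
    behaviour reported in this issue (hypothetical re-implementation).
    """
    # Variance chosen so that the score equals `decay` at distance `scale`.
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    if value is None:
        distance = 0.0
    else:
        distance = max(abs(value - origin) - offset, 0.0)
    return math.exp(-0.5 * distance ** 2 / sigma_sq)

people = [("Alpha", 20), ("Beta", 30), ("Gamma", None)]
scores = {
    name: gauss_decay(age, origin=22, scale=5, decay=0.5)
    for name, age in people
}
for name, score in scores.items():
    print(name, score)
```

With `scale` 5 and `decay` 0.5 the variance works out to the 18.0336... constant seen in the explain output, and the missing field ends up with the maximum score because its distance is taken to be 0.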
IMHO the problem lies in DecayFunctionParser returning 0 as the distance, if no fields are found:
Yes, it is like you say. It is also documented but rather hidden at the very end of the docs.
> My expectation would be that the decay function does not even receive the input but the function score query automatically returns a score of 0 if the field is missing.
I am not convinced that this is always expected. Also, there is an easy workaround: just add an additional function with a missing filter and give it whatever value you like. Would that work for you?
@brwe could you please explain what this workaround would look like? I'm not sure I understand your proposal correctly. Thanks!
@phjardas this is what @brwe means:
{
"query": {
"function_score": {
"score_mode": "first",
"functions": [
{
"filter": {
"exists": {
"field": "age"
}
},
"gauss": {
"age": {
"origin": 22,
"scale": 5,
"decay": 0.5
}
}
},
{
"script_score": {
"script": "0"
}
}
]
}
}
}
Thanks @clintongormley for the explanation!
NB: this would require dynamic scripts to be enabled, of course.
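As a rough illustration of why the workaround changes the ranking, the following Python sketch simulates the `"score_mode": "first"` behaviour on the three example documents: the first function whose filter matches supplies the score. The names `functions` and `first_matching_score` are hypothetical, the fallback value for "no function matched" is an assumption, and the real query additionally combines the function score with the query score.

```python
import math

def gauss(age, origin=22.0, scale=5.0, decay=0.5):
    # Same Gaussian decay as in the query, for a value that is present.
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    return math.exp(-0.5 * (age - origin) ** 2 / sigma_sq)

# (filter, score function) pairs in the same order as in the query.
functions = [
    (lambda doc: "age" in doc, lambda doc: gauss(doc["age"])),  # exists filter + gauss
    (lambda doc: True, lambda doc: 0.0),                        # catch-all, script "0"
]

def first_matching_score(doc):
    """Return the score of the first function whose filter matches."""
    for matches, score in functions:
        if matches(doc):
            return score(doc)
    return 1.0  # assumption: neutral score if nothing matches

people = [
    {"name": "Alpha", "age": 20},
    {"name": "Beta", "age": 30},
    {"name": "Gamma"},
]
ranked = sorted(people, key=first_matching_score, reverse=True)
for doc in ranked:
    print(doc["name"], first_matching_score(doc))
```

In this simulation Gamma falls through to the catch-all function and scores 0, so it ranks below both documents that have an age.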