Elasticsearch version:
"version": {
"number": "5.1.1",
"build_hash": "5395e21",
"build_date": "2016-12-06T12:36:15.409Z",
"build_snapshot": false,
"lucene_version": "6.3.0"
}
Plugins installed: []
JVM version (java -version):
java version "1.8.0_112"
Java(TM) SE Runtime Environment (build 1.8.0_112-b16)
Java HotSpot(TM) 64-Bit Server VM (build 25.112-b16, mixed mode)
OS version (uname -a if on a Unix-like system):
OS X (Darwin Kernel Version 16.6.0)
Description of the problem including expected versus actual behavior:
Elasticsearch Completion Suggester documentation states:
Suggestions that share the longest prefix to the query prefix will be scored higher.
A quick test shows that searching for headi with fuzziness 1 gives the same score to all three inputs:
heading - 4
head - 4
header - 4
_With fuzziness 2, max score will be 3_
After some further testing, it seems the max relevance score can only be prefix length - fuzziness, which is strange.
I would expect heading to have a higher score, since it shares 5 letters with the requested prefix.
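The observation above can be written down as a tiny model. This is only a sketch of the scores reported in this issue, not how Elasticsearch or Lucene computes them internally; the function name observedMaxScore is made up for illustration.

```javascript
// Empirical model of the reported behaviour: the best score a fuzzy
// completion query returned was the prefix length minus the fuzziness.
// observedMaxScore is a hypothetical helper, not an Elasticsearch API.
function observedMaxScore(prefix, fuzziness) {
  return prefix.length - fuzziness;
}

console.log(observedMaxScore("headi", 1)); // 4, the _score reported for fuzziness 1
console.log(observedMaxScore("headi", 2)); // 3, the max score reported for fuzziness 2
```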
Steps to reproduce:
Mapping
DELETE eg
POST eg
{
"mappings": {
"eg": {
"properties": {
"complete": {
"type": "completion"
}
}
}
},
"settings": {
"index": {
"number_of_shards": "1"
}
}
}
Data
PUT eg/eg/1
{ "complete": ["head"] }
PUT eg/eg/2
{ "complete": ["heading"] }
PUT eg/eg/3
{ "complete": ["header"] }
Query
POST eg/_search
{
"suggest": {
"autocomplete": {
"prefix": "headi",
"completion": {
"field": "complete",
"fuzzy": {
"fuzziness": 1
}
}
}
}
}
Results
{
...
"autocomplete": [
{
"text": "headi",
"offset": 0,
"length": 5,
"options": [
{
"text": "head",
"_score": 4,
...
},
{
"text": "header",
"_score": 4,
...
},
{
"text": "heading",
"_score": 4,
...
}
]
}
]
}
Reproduced as well.
I think the expected behavior should be that, all things being equal, a result item's score is the length of the longest "exact" matching prefix, regardless of the fuzziness setting, whenever that value is higher than prefix length - fuzziness.
One side effect of this is that if I query for promo and I have two documents in my index, prom and promo, the latter should be scored higher and come back as the first result (which is more intuitive).
One can then control tie-breaking, or otherwise interact with this logic, using the index-time weight as usual.
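A minimal sketch of the ranking proposed above, applied client side to the options a suggester returns. It approximates the proposal by sorting on the length of the exact shared prefix and falling back to the server-reported _score for ties. The helper names sharedPrefixLength and rerankByExactPrefix are hypothetical, and the case-insensitive comparison is an assumption of this sketch.

```javascript
// Length of the longest common leading substring of a and b.
function sharedPrefixLength(a, b) {
  const limit = Math.min(a.length, b.length);
  let i = 0;
  while (i < limit && a[i] === b[i]) i++;
  return i;
}

// Hypothetical helper: reorder suggester options so that the longest exact
// prefix match comes first; ties fall back to the original _score.
// Assumes each option has a `text` string and a numeric `_score`.
function rerankByExactPrefix(prefix, options) {
  const p = prefix.toLowerCase(); // case-insensitive match is an assumption
  return options
    .map((option) => ({
      ...option,
      exactPrefixLength: sharedPrefixLength(p, option.text.toLowerCase()),
    }))
    .sort(
      (x, y) => y.exactPrefixLength - x.exactPrefixLength || y._score - x._score
    );
}

const reranked = rerankByExactPrefix("promo", [
  { text: "prom", _score: 4 },
  { text: "promo", _score: 4 },
]);
console.log(reranked.map((o) => o.text)); // [ 'promo', 'prom' ]
```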
cc @elastic/es-search-aggs
Currently having the exact same issue. Any word on this?
Can reproduce using 6.2.3:
"version": {
"number": "6.2.3",
"build_hash": "c59ff00",
"build_date": "2018-03-13T10:06:29.741383Z",
"build_snapshot": false,
"lucene_version": "7.2.1",
"minimum_wire_compatibility_version": "5.6.0",
"minimum_index_compatibility_version": "5.0.0"
},
Hope this gets fixed soon, as it's becoming a big headache for our stakeholders and users.
Same here. For instance, I search for Bonn, but Bohlen gets returned before Bonn.
Query
{
"suggest": {
"name-fuzzy-suggest" : {
"prefix" : "Bonn",
"completion" : {
"field" : "suggest",
"fuzzy": {"fuzziness": 2}
}
}
}
}
Result
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 0,
"max_score": 0,
"hits": []
},
"suggest": {
"name-fuzzy-suggest": [
{
"text": "Bonn",
"offset": 0,
"length": 4,
"options": [
{
"text": "Bohlen",
"_index": "test_it_1540912066",
"_type": "test_it",
"_id": "e53963c2-d904-5486-b4c0-9cd7c6faf497",
"_score": 2,
"_source": {
"name": "Böhlen",
"suggest": {
"input": [
"Böhlen",
"Bohlen"
]
}
}
},
{
"text": "Boizenburg",
"_index": "test_it_1540912066",
"_type": "test_it",
"_id": "ab36d626-be7d-5ee1-a36c-7417ed5a6896",
"_score": 2,
"_source": {
"name": "Boizenburg",
"suggest": {
"input": [
"Boizenburg",
"Boizenburg"
]
}
}
},
{
"text": "Bonn",
"_index": "test_it_1540912066",
"_type": "test_it",
"_id": "8fe4dbd2-0806-5bda-8602-19ad4757f11a",
"_score": 2,
"_source": {
"name": "Bonn",
"suggest": {
"input": [
"Bonn",
"Bonn"
]
}
}
},
{
"text": "Bonnigheim",
"_index": "test_it_1540912066",
"_type": "test_it",
"_id": "4902127c-5422-5c2b-a596-97408fadbae5",
"_score": 2,
"_source": {
"name": "Bönnigheim",
"suggest": {
"input": [
"Bönnigheim",
"Bonnigheim"
]
}
}
}
]
}
]
}
Using 6.5.3 and still got the issue. It doesn't look good. Any workaround?
Same with 6.6. This issue can be traced back to 2014 (https://github.com/elastic/elasticsearch/issues/7060)
Same with 5.6
Seems to be a Lucene issue. Does anybody know if this is already patched, or is there maybe a workaround?
Thank you
I "fixed" this problem partially with a query and partially in code (JS). To continue the "Bonn" example:
First, I use two suggest query elements, each with a max size of 5 items. One element does a fuzzy search, the other a non-fuzzy search. When I get the results (max 10 items), I merge them in JS, placing the exact results before the fuzzy results and filtering out duplicates.
{
'suggest': {
'autocomplete_fuzzy': {
'prefix': 'Bonn',
'completion': {
'field': 'suggest',
'fuzzy': { 'fuzziness': 2 },
'size': 5
}
},
'autocomplete': {
'prefix': 'Bonn',
'completion': {
'field': 'suggest',
'size': 5
}
},
},
}
// Collect exact matches first, then append fuzzy matches not already seen.
const ids = [];
const suggestions = [];
results.suggest.autocomplete[0].options.forEach(function (suggestion) {
  ids.push(suggestion._id);
  suggestions.push(suggestion);
});
results.suggest.autocomplete_fuzzy[0].options.forEach(function (suggestion) {
  if (ids.indexOf(suggestion._id) === -1) {
    suggestions.push(suggestion);
  }
});
I hope it helps.
For prefix queries that involve fuzziness, the completion suggester finds all minimal prefix paths that intersect with the suggestions and computes a boost per path that is equal to the length of the shared prefix. For instance, the prefix headi will find head as the minimal prefix that matches heading, head and header with a fuzziness of 1, and hea with a fuzziness of 2. Since suggestions are always visited in order of their weight, we cannot compute the boost based on the final input; it is always computed from the minimal prefix. I agree that it can be misleading, but this limitation is needed to ensure that queries always return the best weights while remaining efficient.
One possible workaround is to run multiple suggestion queries, one per fuzziness value, and to rerank the results client side. This will be more efficient than trying to assign a specific boost per output, so I am closing this issue.
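A rough sketch of that workaround: one named suggester per fuzziness value in a single request body, then a client-side merge that keeps results from lower fuzziness first and drops duplicates by _id. The request shape follows the queries posted in this thread. The function names buildSuggestBody and mergeByFuzziness and the suggester names fuzzy_N are hypothetical, and sending level 0 as a plain non-fuzzy query is a choice made for this sketch.

```javascript
// Build a request body with one completion suggester per fuzziness level.
function buildSuggestBody(prefix, field, fuzzinessLevels, size) {
  const suggest = {};
  for (const level of fuzzinessLevels) {
    const completion = { field, size };
    // Level 0 is sent as a plain (non-fuzzy) completion query.
    if (level > 0) completion.fuzzy = { fuzziness: level };
    suggest[`fuzzy_${level}`] = { prefix, completion };
  }
  return { suggest };
}

// Merge the `suggest` part of the response: lowest fuzziness first,
// skipping documents that an earlier (stricter) suggester already returned.
function mergeByFuzziness(suggestResponse, fuzzinessLevels) {
  const seen = new Set();
  const merged = [];
  const ascending = [...fuzzinessLevels].sort((a, b) => a - b);
  for (const level of ascending) {
    const entries = suggestResponse[`fuzzy_${level}`] || [];
    const options = entries.length > 0 ? entries[0].options : [];
    for (const option of options) {
      if (!seen.has(option._id)) {
        seen.add(option._id);
        merged.push(option);
      }
    }
  }
  return merged;
}
```

For the "Bonn" example, buildSuggestBody("Bonn", "suggest", [0, 1, 2], 5) would produce the body to POST to _search, and the suggest object of the response would be passed to mergeByFuzziness with the same list of levels.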
I've worked around this by using the weight attribute on the suggest field: every time a user selects a value from the completion suggester and submits the form in which the suggester was used, I increment the weight attribute.
If you're already using the weight attribute for something else, this might not work in your case, but in my case that was no issue.
Using this workaround means I only do one query and don't need to combine multiple responses.
The biggest downside of this method is that the problem isn't solved while you don't have any 'usage' data. Depending on how many values you have in the suggester, and whether there is an expected popularity curve in the data, you might not find a benefit in this workaround.
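A small sketch of the bookkeeping this workaround implies. It only builds the new value for the completion field; how the document is then re-indexed is left out. It assumes the suggest field is stored as an object with input and an optional weight, like the documents shown earlier in this thread. The helper name bumpWeight is hypothetical, and treating a missing weight as 1 is an assumption of this sketch.

```javascript
// Hypothetical helper: return a copy of a completion field value with its
// weight increased, e.g. after a user picked this suggestion.
function bumpWeight(suggestField, increment = 1) {
  // A missing or non-integer weight is treated as 1 (assumption of this sketch).
  const current = Number.isInteger(suggestField.weight) ? suggestField.weight : 1;
  return { ...suggestField, weight: current + increment };
}

const updated = bumpWeight({ input: ["Bonn"], weight: 3 });
console.log(updated); // { input: [ 'Bonn' ], weight: 4 }
```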
Quick lodash solution for obtaining unique suggestions when running multiple suggestion queries:
const _ = require("lodash");

const { autocomplete, autocomplete_fuzzy } = responseObj.data.suggest;
// Exact matches come first so they keep priority after de-duplication.
const mergedSuggestions = _.concat(
  autocomplete[0].options,
  autocomplete_fuzzy[0].options
).map(suggestion => {
  // Drop _score (this mutates the response object) so the same hit from both queries compares equal.
  _.unset(suggestion, "_score");
  return suggestion;
});
const uniqueSuggestions = _.uniqWith(mergedSuggestions, _.isEqual);