The term suggester is much less useful than the phrase suggester as it just considers each term independently, while the phrase suggester looks at co-occurring terms.
I see people using the term suggester, but I wonder if this is just because the phrase suggester configuration looks more intimidating. Perhaps we should improve the phrase suggester and remove the term suggester.
Thoughts?
> Perhaps we should improve the phrase suggester and remove the term suggester.
At this point the phrase suggester effectively degrades into the term suggester if you don't set up the appropriate mappings. We could look for ways to make that degradation perform as well as the phrase suggester.
In 5.0 I improved the docs for the phrase suggester a bunch so we have an example of the mapping which should help.
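For readers following along, here is a minimal sketch of the kind of setup being discussed: a shingled subfield that the phrase suggester can score against. The index `test`, type `doc` and field `food` are borrowed from the example later in this thread. The filter, analyzer and subfield names (`food_shingle`, `food_shingle_analyzer`, `food.shingle`) are made up for illustration, and the typed 5.x-style mapping and `_suggest` endpoint are assumptions about the version in use (newer versions drop the type level and take suggesters under `suggest` in a `_search` request).

```console
# Hypothetical names throughout; adjust to your own index and field.
PUT test
{
  "settings": {
    "analysis": {
      "filter": {
        "food_shingle": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 3
        }
      },
      "analyzer": {
        "food_shingle_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "food_shingle"]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "food": {
          "type": "text",
          "fields": {
            "shingle": {
              "type": "text",
              "analyzer": "food_shingle_analyzer"
            }
          }
        }
      }
    }
  }
}

# Score candidates against the shingled subfield, generate them from the plain field.
GET test/_suggest
{
  "phrase_suggest": {
    "text": "carot bananna",
    "phrase": {
      "field": "food.shingle",
      "gram_size": 3,
      "direct_generator": [
        {
          "field": "food",
          "suggest_mode": "always"
        }
      ]
    }
  }
}
```

Pointing `field` at the plain `food` field instead gives the degraded, term-suggester-like behaviour described above, since there are no shingles to score word associations with.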
I did see cases where the term suggester was being used to just see a part of all terms in a field. I don't know how else to do that without relying on aggregations.
The term suggester solves a different problem than the phrase suggester, for example when somebody wants to implement "did you mean" behavior and word position does not matter. So keeping it is important IMO.
Sorry if I'm missing it but I don't understand what the problem is or what is to be gained by removing the term suggester?
@djschny it is not about word position, it is about making good suggestions by taking the association between words into account. The term suggester just can't do that. Without a shingled field, the phrase suggester falls back to behaving like the term suggester.
> Sorry if I'm missing it but I don't understand what the problem is or what is to be gained by removing the term suggester?
It's extra code (with open bugs) that can be removed. I'd rather focus on making the phrase suggester better than fixing a redundant feature.
Discussed in FixItFriday: let's deprecate the term suggester in 5.x for removal in 6.0, and work on improving the API of the phrase suggester.
> it is about making good suggestions by taking the association between words into account.
Sometimes that behavior is not desired, hence the use of the term suggester.
> Without a shingled field, the phrase suggester falls back to behaving like the term suggester.
I think the issue is the information returned and the response format in which it is delivered. I'll do my best to explain with the following example:
PUT test/doc/1
{
  "food": "apple apricot banana bread beer carrot candy"
}

GET test/_suggest
{
  "term_suggest": {
    "text": "carot bananna",
    "term": {
      "field": "food"
    }
  },
  "phrase_suggest": {
    "text": "carot bananna",
    "phrase": {
      "field": "food"
    }
  }
}

{
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "phrase_suggest": [
    {
      "text": "carot bananna",
      "offset": 0,
      "length": 13,
      "options": [
        {
          "text": "carot banana",
          "score": 0.123355635
        },
        {
          "text": "carrot bananna",
          "score": 0.12118797
        }
      ]
    }
  ],
  "term_suggest": [
    {
      "text": "carot",
      "offset": 0,
      "length": 5,
      "options": [
        {
          "text": "carrot",
          "score": 0.8,
          "freq": 1
        }
      ]
    },
    {
      "text": "bananna",
      "offset": 6,
      "length": 7,
      "options": [
        {
          "text": "banana",
          "score": 0.8333333,
          "freq": 1
        }
      ]
    }
  ]
}
In the above example both terms are misspelled and there is no desire to have the terms be related (for example a dev implementing Google's "did you mean?" style behavior). The big differences for myself as someone implementing suggestions in an app are as follows:
Unless I'm misunderstanding the extent to which the phrase suggester would be modified (I assumed the behavior would change, but not the response format), it is difficult for me to see how a modified phrase suggester would be able to solve the same problems as the term suggester. Again, sorry if I'm missing some big item here.
> Sometimes that behavior is not desired, hence the use of the term suggester.
When would this behaviour not be desired?
> In the above example both terms are misspelled and there is no desire to have the terms be related (for example a dev implementing Google's "did you mean?" style behavior).
This is exactly when you want the behaviour of the phrase suggester, not the term suggester. The phrase suggester returns meaningful suggestions.
> the suggested text from the phrase gives me two options, but neither option is desired as each one has a bad suggested term
That's because these suggestions only work with statistically significant amounts of data, not just toy examples. Also, if you use shingles (combined with real world amounts of data) you get much better suggestions. Btw, if you set max_errors to 2 (defaults to 1) then the first suggestion is the correctly spelled carrot banana. This is what I mean about improving the phrase suggester.
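As a sketch of that last point, the phrase half of the earlier request with `max_errors` raised to 2 would look roughly like this. It reuses the `test` index and `food` field from the example above and assumes the same `_suggest` endpoint. The claim that `carrot banana` then comes back first is the commenter's, not something verified here.

```console
# Same phrase request as earlier in the thread, with max_errors raised from its default of 1.
GET test/_suggest
{
  "phrase_suggest": {
    "text": "carot bananna",
    "phrase": {
      "field": "food",
      "max_errors": 2
    }
  }
}
```

`max_errors` takes either an absolute number of terms (values of 1 or more) or a fraction of the query terms (values below 1), so with the default only one of the two misspelled words can be corrected in any single suggestion.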
Based on experiments that I ran, the term suggester is more relevant than the phrase suggester for one-term searches. I made a mistake by typing r instead of t (oarh instead of oath), so I asked both suggesters what they had. As you can see below, the term suggester nailed it, while the phrase suggester is miles away from anything useful. I also tried switching the phrase suggester from a shingled field back to a simple field, but that was no help at all. I've run multiple single-term searches such as kia and tesla: the term suggester again correctly reported that the term is fine, while the phrase suggester returned crazy things like ia instead of kia (even though I have min_word_length = 3) and tells instead of tesla. I understand that with a huge dataset this might not be a problem, but we work with what we have: approx. 8 million documents.
{
  "phrase": [
    {
      "text": "oarh",
      "offset": 0,
      "length": 4,
      "options": [
        {
          "text": "each",
          "highlighted": "<em>each</em>",
          "score": 0.012882852,
          "collate_match": true
        },
        {
          "text": "sarah",
          "highlighted": "<em>sarah</em>",
          "score": 0.010489283,
          "collate_match": true
        },
        {
          "text": "oprah",
          "highlighted": "<em>oprah</em>",
          "score": 0.009785955,
          "collate_match": true
        },
        ...
      ]
    }
  ],
  "term": [
    {
      "text": "oarh",
      "offset": 0,
      "length": 4,
      "options": [
        {
          "text": "oath",
          "score": 0.75,
          "freq": 1811
        },
        {
          "text": "oanh",
          "score": 0.75,
          "freq": 30
        },
        ...
      ]
    }
  ]
}
@elastic/es-search-aggs