Steps to reproduce -
Index creation script
Observation - Searching for "big" (which is present in the synonym file) returned "<tag1>large</tag1> <tag1>forest</tag1> in <tag1>large</tag1>" rather than the expected "<tag1>large</tag1> forest in <tag1>large</tag1>".
More observations
Searching for forest, I get the highlight below -
large <tag1>forest</tag1> in large
And searching for big or large gives -
<tag1>large</tag1> <tag1>forest</tag1> in <tag1>large</tag1>
So the conclusion is that matching on a term which doesn't have a synonym works fine.
But when matching on a term that has a synonym, every indexed term gets highlighted.
A few more examples -
Text - the lake was frozen and high , all my team was at their best. Lets do the best then
Search word - lake
Result - the <tag1>lake</tag1> was frozen and high , all my team was at their best. Lets do the best then
Search word - high
Result - the lake was frozen and <tag1>high</tag1> , <tag1>all</tag1> <tag1>my</tag1> team was at their <tag1>best</tag1>. Lets do the best then
Question - How did the terms "all", "my" and "best" get highlighted?
Conclusion - It seems that not every indexed term is highlighted, but there is no obvious pattern in which terms are. Maybe all terms in the text that have a synonym are highlighted!
Search word - my
Result - the lake was frozen and high , all <tag1>my</tag1> team was at their best. Lets do the best then
Conclusion - The above conclusion is wrong. If all terms with synonyms were highlighted on any synonym match, that should have happened when searching for my as well.
Also, I don't see any issue with the analyzer or the WordNet file. You can see the analyzer output for the text "large forest" here - https://gist.github.com/Vineeth-Mohan/7165559
As far as I can see, all the tokens are correctly identified and positioned. I believe this is a bug in the highlighter.
Conclusion - I can't find a pattern to this bug.
Try indexing with term_vectors and use the FastVectorHighlighter. Using a recent version of ES (0.90.5) it should work flawlessly.
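For reference, a minimal mapping sketch for that suggestion. The type name `doc` and the field name `text` are placeholders, not taken from the original index script; the only part that matters is `term_vector`. With positions and offsets stored, Elasticsearch should pick the fast vector highlighter for that field instead of the plain one:

```json
{
  "mappings": {
    "doc": {
      "properties": {
        "text": {
          "type": "string",
          "term_vector": "with_positions_offsets"
        }
      }
    }
  }
}
```

Documents indexed before this mapping change would need to be reindexed for the term vectors to be stored.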
@synhershko - I will do that, but if the default highlighting configuration doesn't give correct results even on ES 0.90.5, shouldn't we treat it as a bug?
Hi @Vineeth-Mohan
Any chance you could provide a simpler recreation? If recreations are long and complex, nobody wants to work on them :)
thanks
I think I'm seeing this issue and I've created a small test case:
Index:
    {
      "mappings": {
        "ev": {
          "properties": {
            "description": {
              "type": "string",
              "term_vector": "with_positions_offsets",
              "analyzer": "synonym"
            },
            "id": {
              "type": "integer"
            },
            "title": {
              "type": "string"
            }
          }
        }
      },
      "settings": {
        "index": {
          "analysis": {
            "filter": {
              "synonym": {
                "type": "synonym",
                "format": "wordnet",
                "synonyms_path": "analysis/wn_s.pl"
              }
            },
            "analyzer": {
              "synonym": {
                "filter": [
                  "synonym"
                ],
                "type": "custom",
                "tokenizer": "whitespace"
              }
            }
          }
        }
      }
    }
Data:
    {
      "id": 1,
      "title": "test 1",
      "description": "you'll help to make each visit"
    }
Query:
    {
      "query": {
        "match": {
          "description": {
            "query": "help"
          }
        }
      },
      "highlight": {
        "fields": {
          "description": {}
        }
      }
    }
Result:
    {
      "took": 3,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
      },
      "hits": {
        "total": 1,
        "max_score": 0.3787904,
        "hits": [
          {
            "_index": "test",
            "_type": "ev",
            "_id": "AU3DG06KDdtspwrG66P0",
            "_score": 0.3787904,
            "_source": {
              "id": 1,
              "title": "test 1",
              "description": "you'll help to make each visit"
            },
            "highlight": {
              "description": [
                "you'll <em>help</em> <em>to</em> make <em>each</em> visit"
              ]
            }
          }
        ]
      }
    }
The problem I'm seeing is that "to" and "each" are being highlighted, even though I only searched for "help".
@davidtme the problem occurs because you are expanding synonyms both at index time and at search time, so it queries for multiple synonyms and matches all of them, then it highlights a selection of those (but admittedly in the wrong positions).
If you only expand synonyms at search or index time, then the highlighting works correctly.
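As a rough sketch of the index-time-only variant, the test index above could keep the `synonym` analyzer for indexing and use a second analyzer without the synonym filter at search time. The analyzer name `no_synonym` is made up for this example, and setting `search_analyzer` next to `analyzer` in the field mapping is an assumption to verify against the mapping documentation for your Elasticsearch version:

```json
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "synonym": {
            "type": "synonym",
            "format": "wordnet",
            "synonyms_path": "analysis/wn_s.pl"
          }
        },
        "analyzer": {
          "synonym": {
            "type": "custom",
            "tokenizer": "whitespace",
            "filter": ["synonym"]
          },
          "no_synonym": {
            "type": "custom",
            "tokenizer": "whitespace"
          }
        }
      }
    }
  },
  "mappings": {
    "ev": {
      "properties": {
        "description": {
          "type": "string",
          "term_vector": "with_positions_offsets",
          "analyzer": "synonym",
          "search_analyzer": "no_synonym"
        }
      }
    }
  }
}
```

With this split, a search for "help" is analyzed to the single token "help", which can still match documents whose indexed text contained a synonym of it, since those synonyms were written into the index.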
This appears to be fixed in 5.0, probably because of changes to the positions emitted by the synonyms token filter. Multi-word synonyms will probably still be problematic.