Describe the feature
Currently, the unified highlighter can only provide context by including the sentence that contains the highlighted word. This sometimes produces a very short highlight. For example, given text in a field like this:
Some leading context. A short sentence. Some more content. And even more context around that sentence.
Running a query for the term sentence using the unified highlighter with fragment_size set to 300 results in a highlight that includes the word we're looking for but provides little context and is nowhere close to the requested target size:
A short <em>sentence</em>.
In contrast, running the same query with the plain highlighter results in a highlight with much more useful context (and, in this case, another highlighted word!):
Some leading context. A short <em>sentence</em>. Some more content. And even more context around that <em>sentence</em>.
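For reference, the two requests compared above could be built roughly like this. This is a minimal sketch: the field name content and the helper build_highlight_request are assumptions made for illustration, and only the request body is constructed, nothing is sent to a cluster.

```python
import json


def build_highlight_request(field, term, highlighter_type, fragment_size=300):
    """Build a search body that highlights `term` in `field`.

    `field` is a placeholder name; the issue does not say which field was used.
    """
    return {
        "query": {"match": {field: term}},
        "highlight": {
            "fields": {
                field: {
                    "type": highlighter_type,        # "unified" or "plain"
                    "fragment_size": fragment_size,  # target fragment size in characters
                }
            }
        },
    }


# Same query, differing only in the highlighter type.
unified_request = build_highlight_request("content", "sentence", "unified")
plain_request = build_highlight_request("content", "sentence", "plain")

print(json.dumps(unified_request, indent=2))
```

The only difference between the two bodies is the type value, which makes it easy to compare the fragments each highlighter returns for the same document.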
The unified highlighter should include as much context as possible without going over the target fragment size. This will result in more consistently sized highlights (which is nice for visual consistency) and will provide more useful context in cases where the highlight occurs in a short sentence.
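The requested behaviour can be sketched as a greedy expansion over sentences. This is only an illustration of the idea, not the highlighter's actual implementation: the regex-based splitter and the function names split_sentences and expand_fragment are assumptions, and highlight tags are left out.

```python
import re

TEXT = (
    "Some leading context. A short sentence. Some more content. "
    "And even more context around that sentence."
)


def split_sentences(text):
    # Naive splitter for illustration: a sentence ends at '.', '!' or '?'.
    return re.findall(r"[^.!?]+[.!?]\s*", text)


def expand_fragment(text, term, fragment_size):
    """Start from the first sentence containing `term`, then add neighbouring
    sentences (right first, then left) while the total stays within
    `fragment_size` characters."""
    sentences = split_sentences(text)
    hit = next((i for i, s in enumerate(sentences) if term in s), None)
    if hit is None:
        return ""
    start, end = hit, hit + 1
    length = len(sentences[hit])
    grew = True
    while grew:
        grew = False
        # Try to add the next sentence on the right.
        if end < len(sentences) and length + len(sentences[end]) <= fragment_size:
            length += len(sentences[end])
            end += 1
            grew = True
        # Try to add the previous sentence on the left.
        if start > 0 and length + len(sentences[start - 1]) <= fragment_size:
            start -= 1
            length += len(sentences[start])
            grew = True
    return "".join(sentences[start:end]).strip()


print(expand_fragment(TEXT, "sentence", 300))
print(expand_fragment(TEXT, "sentence", 40))
```

With a target of 300 the whole example text fits, so the fragment matches what the plain highlighter returns; with a smaller target the fragment grows only as far as the budget allows.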
cc: @colings86
Thank you @jimczi!
Sorry to revive an old thread. In which version did this fix become available?
@lfplazas10 you can see the version on the linked PR:
https://github.com/elastic/elasticsearch/pull/28132
It has been available since 6.2.0.
Note that the PR only expands context to the right of the match. Any sentences to the left (i.e., leading context) are not included at the moment.
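To illustrate the asymmetry described here, the sketch below expands only to the right of the matching sentence. It is a simplified model of the behaviour as described in this comment, not code from the PR; the naive splitter and the name expand_right are assumptions.

```python
import re

TEXT = (
    "Some leading context. A short sentence. Some more content. "
    "And even more context around that sentence."
)


def expand_right(text, term, fragment_size):
    """Start at the first sentence containing `term` and append following
    sentences while the fragment stays within `fragment_size` characters.
    Sentences before the match are never added."""
    # Naive splitter for illustration: a sentence ends at '.', '!' or '?'.
    sentences = re.findall(r"[^.!?]+[.!?]\s*", text)
    hit = next((i for i, s in enumerate(sentences) if term in s), None)
    if hit is None:
        return ""
    end = hit + 1
    length = len(sentences[hit])
    while end < len(sentences) and length + len(sentences[end]) <= fragment_size:
        length += len(sentences[end])
        end += 1
    return "".join(sentences[hit:end]).strip()


print(expand_right(TEXT, "sentence", 300))
```

Even with a budget of 300 characters, the leading sentence "Some leading context." is left out, because the fragment always begins at the sentence that contains the match.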