Elasticsearch version: 2.2.2
JVM version: 1.8.0_25
OS version: OS X 10.11.5
Description of the problem including expected versus actual behavior:
Fast Vector Highlighter doesn't highlight nested fields, whereas Plain Highlighter does (but has a fragment_size bug so I can't use it). I would expect FVH to be able to highlight everything.
Steps to reproduce:
curl -XPUT 'localhost:9200/nested_fvh?pretty' -d '{
"mappings": {
"type1": {
"properties": {
"nested1": {
"type": "nested",
"properties": {
"field1": {
"type": "string",
"term_vector" : "with_positions_offsets"
}
}
}
}
}
}
}
'
curl -XPUT 'localhost:9200/nested_fvh/type1/1?pretty' -d '{
"nested1": {
"field1": "Hello World!"
}
}
'
curl -XGET 'http://localhost:9200/nested_fvh/type1/_search?pretty' -d '{
"query": {
"nested": {
"path": "nested1",
"query": {
"match": {
"nested1.field1": "hello"
}
}
}
},
"highlight": {
"fields": {
"nested1.field1": {
"type": "fvh"
}
}
}
}
'
Output (fvh):
FVH doesn't highlight nested
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.4231198,
"hits" : [ {
"_index" : "nested_fvh",
"_type" : "type1",
"_id" : "1",
"_score" : 0.4231198,
"_source" : {
"nested1" : {
"field1" : "Hello World!"
}
}
} ]
}
}
Output (plain):
Plain does highlight nested
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.4231198,
"hits" : [ {
"_index" : "nested_fvh",
"_type" : "type1",
"_id" : "1",
"_score" : 0.4231198,
"_source" : {
"nested1" : {
"field1" : "Hello World!"
}
},
"highlight" : {
"nested1.field1" : [ "<em>Hello</em> World!" ]
}
} ]
}
}
(Oooh <details> tags! 👍 )
This should have been fixed with https://issues.apache.org/jira/browse/LUCENE-5929 but apparently it is still not working in either 2.3 or master.
@martijnvg could you take a look please?
@clintongormley @martijnvg I have a similar case. It looks like highlighting doesn't work as expected when "term_vector": "with_positions_offsets" is set. Here is a full repro case that can be run in Sense:
DELETE test
PUT test
{
"mappings": {
"test_type": {
"properties": {
"nested_field": {
"type": "nested",
"properties": {
"text": {
"type": "string",
"term_vector": "with_positions_offsets",
"fields": {
"raw": {
"type": "string",
"term_vector": "with_positions_offsets"
}
}
}
}
}
}
}
}
}
POST test/test_type
{
"nested_field": [
{
"text": "text field"
}
]
}
POST test/_search
{
"query": {
"nested": {
"query": {
"query_string": {
"query": "text",
"fields": [
"nested_field.text.raw"
]
}
},
"path": "nested_field"
}
},
"highlight": {
"fields": {
"nested_field.text.raw": {}
}
},
"fielddata_fields": ["nested_field.text.raw"]
}
Can you please confirm that it's related to this bug?
Just to clarify, it's not related to multi_field. If you just use the text property you will get the same behaviour.
Indeed, if you remove "term_vector": "with_positions_offsets" then it starts working. Must be related to this.
@gmoskovicz that's because it uses the plain highlighter if term offsets are disabled.
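The fallback described in the comment above can be modelled roughly as follows. This is a toy sketch, not Elasticsearch code: the function name and its parameters are hypothetical, and the precedence (fvh, then postings, then plain) is an assumption about the 2.x defaults when the request does not name a highlighter "type".

```python
def default_highlighter(term_vector="no", index_options="positions"):
    """Toy model: which highlighter is picked when the request sets no "type".

    Hypothetical helper; the precedence below is an assumption about 2.x.
    """
    # Term vectors with positions and offsets make the field eligible for fvh.
    if term_vector == "with_positions_offsets":
        return "fvh"
    # Offsets stored in the postings list make it eligible for postings.
    if index_options == "offsets":
        return "postings"
    # Otherwise the plain highlighter re-analyzes the field value.
    return "plain"


print(default_highlighter(term_vector="with_positions_offsets"))
print(default_highlighter())
```

Under this model, removing "term_vector": "with_positions_offsets" from the mapping moves the field from the first branch to the last one, which is why highlights start to appear.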
@clintongormley @sfcgeorge @gmoskovicz I wonder if we should support highlighting on nested, has_child and has_parent queries at all? This isn't the first problem that the highlighters have had with these queries. I think we should instead promote the use of inner hits more. With inner hits, highlighting the nested object works, and on top of this it is more accurate too (since each nested object is highlighted in isolation):
{
"query": {
"nested": {
"path": "nested1",
"query": {
"match": {
"nested1.field1": "hello"
}
},
"inner_hits": {
"highlight": {
"fields": {
"nested1.field1": {
"type": "fvh"
}
}
}
}
}
}
}
@martijnvg That feels like making the programmer do something the system should be doing. I get that the implementation is hard because Lucene etc etc but ideally I'd like nested fields to be more seamless and easy to use, not require even more effort.
Using the global highlights as I currently am makes mapping the match back to the original object in my ORM very difficult. Using inner_hits would make that much easier BUT it doesn't work with searching _all. I need to be able to search _all so can't use inner_hits unfortunately. It's not ideal, but I'm stuck with global highlight, no fragment_size and plain highlighter for now.
@sfcgeorge I see. If you just query the _all field, then highlighting via inner_hits isn't very straightforward. However, I do think that if fields inside nested objects are queried specifically, then highlighting via inner_hits should be used.
@martijnvg We need a Google-like search box that searches and highlights _all, but also advanced search on specific fields and nested fields, at the same time. If _all could percolate highlighting to inner_hits that would be perfect and much easier, but I don't think it can work.
My failed attempts:
Both return the correct result but no highlighting.
Here I guess the match_all is stopping the highlighting.
curl -XGET 'http://localhost:9200/nested_fvh/type1/_search?pretty' -d '{
"query": {
"bool": {
"must": [
{
"simple_query_string": {
"query": "hello"
}
},
{
"nested": {
"path": "nested1",
"filter": {
"match_all": {}
},
"inner_hits": {
"highlight": {
"require_field_match": false,
"fields": {
"nested1.field1": {
"type": "plain"
}
}
}
}
}
}
]
}
}
}
'
Global inner hits sounded promising; this seems like it would be a great use case for it if it worked.
curl -XGET 'http://localhost:9200/nested_fvh/type1/_search?pretty' -d '{
"query": {
"simple_query_string": {
"query": "hello"
}
},
"inner_hits": {
"inner_hits_name1": {
"path": {
"nested1": {
"highlight": {
"require_field_match": false,
"fields": {
"nested1.field1": {
"type": "plain"
}
}
}
}
}
}
}
}
'
The FVH requires positions and offsets to be indexed, but highlighting is performed at the top level document and the positions and offsets for nested documents are inside the nested documents, so highlighting can't access them.
Secondly, highlighting nested documents at the top level will produce incorrect results, eg:
PUT t
{
"mappings": {
"t": {
"properties": {
"foo": {
"type": "nested",
"properties": {
"text": {
"type": "text"
},
"num": {
"type": "integer"
}
}
}
}
}
}
}
PUT t/t/1
{
"foo": [
{
"text": "brown",
"num": 1
},
{
"text": "cow",
"num": 2
}
]
}
GET t/_search
{
"query": {
"nested": {
"path": "foo",
"query": {
"bool": {
"must": [
{
"match": {
"foo.text": "brown cow"
}
},
{
"match": {
"foo.num": 1
}
}
]
}
}
}
},
"highlight": {
"fields": {
"foo.text": {}
}
}
}
This returns the highlight snippets brown and cow, although cow shouldn't have been highlighted because it belongs to the nested object with num 2. Highlighting with inner hits works correctly:
GET t/_search
{
"query": {
"nested": {
"path": "foo",
"inner_hits": {
"_source": false,
"highlight": {
"fields": {
"foo.text": {}
}
}
},
"query": {
"bool": {
"must": [
{
"match": {
"foo.text": "brown cow"
}
},
{
"match": {
"foo.num": 1
}
}
]
}
}
}
}
}
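The difference between the two requests above can be illustrated with a small in-memory model. This is only a sketch of the reasoning, not of how Elasticsearch or Lucene implement highlighting; all function names are hypothetical.

```python
import re

# The two nested objects of document 1, and the analyzed terms of "brown cow".
nested_objects = [
    {"text": "brown", "num": 1},
    {"text": "cow", "num": 2},
]
query_terms = {"brown", "cow"}


def mark(text, terms):
    """Wrap every word that is a query term in <em> tags."""
    def repl(match):
        word = match.group(0)
        return f"<em>{word}</em>" if word.lower() in terms else word
    return re.sub(r"\w+", repl, text)


def matches(obj):
    """The nested bool query: text and num must match within the SAME object."""
    words = set(re.findall(r"\w+", obj["text"].lower()))
    return bool(words & query_terms) and obj["num"] == 1


def top_level_highlight(objects):
    # The parent document sees every foo.text value, whichever object matched.
    marked = [mark(o["text"], query_terms) for o in objects]
    return [m for m in marked if "<em>" in m]


def inner_hits_highlight(objects):
    # Each nested object is highlighted in isolation, and only if it matched.
    return [mark(o["text"], query_terms) for o in objects if matches(o)]


print(top_level_highlight(nested_objects))
print(inner_hits_highlight(nested_objects))
```

In this model the top-level variant marks both values because it no longer knows which nested object satisfied the num clause, while the inner hits variant only marks the object that actually matched.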
That said, if you want to be able to use the FVH on nested fields at the top level (with the incorrect results), then you should be able to use copy_to to copy the nested values into a top-level field and highlight on that:
PUT t
{
"mappings": {
"t": {
"properties": {
"foo": {
"type": "nested",
"properties": {
"text": {
"type": "text",
"copy_to": "foo_text"
}
}
},
"foo_text": {
"type": "text",
"term_vector": "with_positions_offsets",
"store": true
}
}
}
}
}
PUT t/t/1
{
"foo": [
{
"text": "brown"
},
{
"text": "cow"
}
]
}
GET t/_search
{
"query": {
"nested": {
"path": "foo",
"query": {
"match": {
"foo.text": "brown cow"
}
}
}
},
"highlight": {
"require_field_match": false,
"fields": {
"foo_text": {
"type": "fvh"
}
}
}
}
Unfortunately, this doesn't work for some reason. It works with the plain highlighter but not with fvh. This IS a bug and @martijnvg is going to investigate.
Hi @martijnvg,
I know that this issue was resolved in #19337, but I still see exactly the same problem as described in the first comment.
However, when I put the highlight into the nested inner_hits, it worked.
Because I need multiple queries on the same nested path, I have to give each inner_hits a different name, which makes it quite inconvenient to extract all the highlight results.
I'm wondering whether this can also work with the global highlight. I'm using version 7.6.1.
Hope someone can help. Thanks in advance.
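For the extraction problem described in the comment above, one client-side option is to merge the fragments from every named inner_hits block of a hit. The sketch below assumes the usual search response layout (hit, inner_hits, name, hits.hits, highlight); the function name, the inner_hits names and the sample hit are hypothetical and not taken from the thread. It does not make the global highlight work for nested fields.

```python
def collect_inner_hit_highlights(hit):
    """Merge highlight fragments from all named inner_hits of one search hit.

    Returns {field: [(nested_offset, fragment), ...]} without duplicates.
    """
    merged = {}
    for block in hit.get("inner_hits", {}).values():
        for inner in block["hits"]["hits"]:
            # "_nested.offset" identifies which nested object produced the hit.
            offset = inner.get("_nested", {}).get("offset")
            for field, fragments in inner.get("highlight", {}).items():
                bucket = merged.setdefault(field, [])
                for fragment in fragments:
                    entry = (offset, fragment)
                    if entry not in bucket:
                        bucket.append(entry)
    return merged


# Hypothetical hit with two differently named inner_hits on the same path.
sample_hit = {
    "_id": "1",
    "inner_hits": {
        "first_query": {"hits": {"hits": [
            {"_nested": {"field": "foo", "offset": 0},
             "highlight": {"foo.text": ["<em>brown</em>"]}},
        ]}},
        "second_query": {"hits": {"hits": [
            {"_nested": {"field": "foo", "offset": 0},
             "highlight": {"foo.text": ["<em>brown</em>"]}},
            {"_nested": {"field": "foo", "offset": 1},
             "highlight": {"foo.text": ["<em>cow</em>"]}},
        ]}},
    },
}

print(collect_inner_hit_highlights(sample_hit))
```

Keeping the nested offset next to each fragment makes it possible to map a highlight back to the object in the original array, regardless of which named inner_hits block returned it.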