Elasticsearch: Highlight not working in elasticsearch 2.1

Created on 25 Nov 2015 · 19 Comments · Source: elastic/elasticsearch

After I upgraded ES from 2.0 to 2.1, the highlight feature stopped working.

I got this error message:

{
  "shard": 0,
  "index": "4odevelop_4o",
  "node": "XiZaRocLQGuBwZPg58Naqw",
  "reason": {
    "type": "illegal_state_exception",
    "reason": "can't load global ordinals for reader of type: class org.apache.lucene.search.highlight.WeightedSpanTermExtractor$DelegatingLeafReader must be a DirectoryReader"
  }
}
Labels: :Search/Highlighting, :Search/Search, >bug, v2.4.0

All 19 comments

The problem occurs when I use the following search query:

{
  "from": 0,
  "size": 20,
  "sort": [
    {
      "_score": {
        "missing": "_last",
        "order": "desc"
      }
    }
  ],
  "highlight": {
    "pre_tags": [
      "<lukituki>"
    ],
    "post_tags": [
      "</lukituki>"
    ],
    "fields": {
      "searchText": { }
    }
  },
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "authorizationToken": {
              "value": "1859"
            }
          }
        },
        {
          "term": {
            "deleted": {
              "value": false
            }
          }
        }
      ],
      "should": [
        {
          "has_child": {
            "type": "stream1boost",
            "score_mode": "sum",
            "query": {
              "function_score": {
                "query": {
                  "bool": {
                    "must": [
                      {
                        "term": {
                          "userAccountId": {
                            "boost": 0,
                            "value": 1859
                          }
                        }
                      }
                    ]
                  }
                },
                "score_mode": "sum",
                "boost_mode": "max",
                "script_score": {
                  "script": "doc['searchBoost'].value"
                }
              }
            }
          }
        }
      ],
      "minimum_should_match": "0"
    }
  }
}

@lukapor, I edited your messages to apply code formatting.

This certainly looks like a bug. It'd be super helpful if you could make a gist that recreates this against an empty index using curl. It'd make reproducing the issue locally super easy.

Hello,
unfortunately I don't have a script that generates the index, because it is created programmatically. The easiest thing is to describe where the problem is.
I have a stream1 object that has a child object, stream1boost.
When I search stream1, one of the sort options is by most used. For that I use a function_score query on the child. If this clause is used ("should": [{"has_child": { ...) the problem occurs, otherwise it does not.

I will try to write a script to reproduce the bug.

I attached scripts to create the index mapping, insert data, and then query it. The query request will fail.
insertData.txt
mapping.txt
query.txt

Reduced to the following minimal test case:

PUT /test1
{
  "mappings": {
    "stream1": {
    },
    "stream1boost": {
      "_parent": {
        "type": "stream1"
      }
    }
  }
}

PUT /test1/stream1/1
{
    "searchText": "stream1"
}


PUT /test1/stream1boost/1?parent=1
{
    "searchText": "stream1",
    "searchBoost": 1,
    "userId": 1
}


POST /test1/stream1/_search
{
  "from": 0,
  "size": 20,
  "highlight": {
    "fields": {
      "searchText": { }
    }
  },
  "query": {
    "bool": {
      "should": [
        {
          "has_child": {
            "type": "stream1boost",
            "query": {
              "match_all": {}
            }
          }
        }
      ]
    }
  }
}

@martijnvg could you take a look please

The problem here is that since 2.0 the parent/child queries require a top-level reader. During highlighting we re-execute the query for each hit using the leaf reader the hit was found in, and the parent/child queries now refuse to work with that. Before 2.0, highlighting wouldn't have worked all the time with parent/child queries either, as the child hit may be in a different leaf reader (segment) than the parent hit.

The right way for highlighting in this case would be to use inner_hits and move the highlighting part from the top level to the inner hits part:

POST /test1/stream1/_search
{
  "from": 0,
  "size": 20,
  "query": {
    "bool": {
      "should": [
        {
          "has_child": {
            "type": "stream1boost",
            "query": {
              "match": {
                "searchText": "stream1"
              }
            },
            "inner_hits": {
              "highlight": {
                "fields": {
                  "searchText": {}
                }
              }
            }
          }
        }
      ]
    }
  }
}

I think that instead of throwing an error, highlighting shouldn't try to extract terms from has_child or has_parent, so that if these queries just happen to be part of a bigger query, other highlights still get returned in the response.

I can only tell you that the query works in ES 2.0; it stopped working with version 2.1.

My use case is as follows:
stream1 holds the searchText property.
stream1boost is a user's boost for a specific stream (a stream1 has multiple stream1boost children or none); stream1boost has only the searchBoost property (a weight).

So I am searching for streams with a prefix query and different sorts (by name, by most used, ...). The results I want are stream1 documents and their highlights. When the most-used sort is selected, I add a has_child should clause with function_score, which calculates the score of the current streams.

I think that instead of throwing an error, highlighting shouldn't try to extract terms from has_child or has_parent, so that if these queries just happen to be part of a bigger query, other highlights still get returned in the response.

Makes sense. Without inner hits, you wouldn't expect docs matching a has_child or has_parent query to be returned anyway, so there shouldn't be any highlighting on these docs.

I have the same problem with 2.2. In my case, I need highlighting for the parent and the child. If I remove the top-level highlighting then it works fine.

Does not work

{
    query: {
        bool: {
            should: [
                {
                    query_string: 'google'
                },
                {
                    has_child: {
                        type: 'child_doc',
                        score_mode: 'max',
                        query: {
                            query_string: 'google'
                        },
                        inner_hits: {
                            highlight: {
                                order: 'score',
                                fields: {
                                    title: { number_of_fragments: 0 },
                                    body: { number_of_fragments: 3 }
                                }
                            },
                            from: 0,
                            size: 1
                        }
                    }
                }
            ]
        }
    },
    highlight: {
        order: 'score',
        fields: {
            description: { number_of_fragments: 0 }
        }
    }
}

Does work, but no highlighting on parent document

{
    query: {
        bool: {
            should: [
                {
                    query_string: 'google'
                },
                {
                    has_child: {
                        type: 'child_doc',
                        score_mode: 'max',
                        query: {
                            query_string: 'google'
                        },
                        inner_hits: {
                            highlight: {
                                order: 'score',
                                fields: {
                                    title: { number_of_fragments: 0 },
                                    body: { number_of_fragments: 3 }
                                }
                            },
                            from: 0,
                            size: 1
                        }
                    }
                }
            ]
        }
    }
}

I'm having the same results as @rpedela (using ES 2.1). Any fixes?

I'm having the same results as @rpedela (using ES 2.1). Any fixes?

I'd try using a highlight_query element that doesn't include parent/child.

So the search request body should look like this:

{
    query: {
        bool: {
            should: [
                {
                    query_string: 'google'
                },
                {
                    has_child: {
                        type: 'child_doc',
                        score_mode: 'max',
                        query: {
                            query_string: 'google'
                        },
                        inner_hits: {
                            highlight: {
                                order: 'score',
                                fields: {
                                    title: { number_of_fragments: 0 },
                                    body: { number_of_fragments: 3 }
                                }
                            },
                            from: 0,
                            size: 1
                        }
                    }
                }
            ]
        }
    },
    highlight: {
        order: 'score',
        fields: {
            description: { number_of_fragments: 0 }
        },
        highlight_query: {
            bool: {
                should: [
                    {
                        query_string: 'google'
                    }
                ]
            }
        }
    }
}
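
Rather than maintaining a second copy of the query by hand, the highlight_query could also be derived from the main query. The sketch below uses a hypothetical helper, stripParentChild, which is not provided by Elasticsearch or its clients. It assumes the query is a plain JavaScript object, that has_child / has_parent clauses only appear as direct clauses of a bool query (possibly nested), and it uses the object form of query_string for the example.

```javascript
// Hypothetical helper, not part of Elasticsearch or any client library.
// Returns a copy of `query` with has_child / has_parent clauses removed,
// so the result can be passed as highlight_query.
const PARENT_CHILD = ['has_child', 'has_parent'];
const OCCURS = ['must', 'should', 'must_not', 'filter'];

function stripParentChild(query) {
    const [name] = Object.keys(query);
    if (PARENT_CHILD.includes(name)) {
        return null; // drop the whole parent/child clause
    }
    if (name !== 'bool') {
        return query; // leaf query: keep unchanged
    }
    const bool = {};
    for (const [key, value] of Object.entries(query.bool)) {
        if (!OCCURS.includes(key)) {
            bool[key] = value; // e.g. minimum_should_match, copied as-is
            continue;
        }
        // A bool clause list may be a single object or an array of objects.
        const kept = [].concat(value)
            .map(stripParentChild)
            .filter(clause => clause !== null);
        if (kept.length > 0) {
            bool[key] = kept;
        }
    }
    return { bool };
}

// Example query modelled on the request above (field and type names as in the thread).
const query = {
    bool: {
        should: [
            { query_string: { query: 'google' } },
            {
                has_child: {
                    type: 'child_doc',
                    query: { query_string: { query: 'google' } }
                }
            }
        ]
    }
};

const body = {
    query,
    highlight: {
        fields: { description: { number_of_fragments: 0 } },
        highlight_query: stripParentChild(query)
    }
};

console.log(JSON.stringify(body.highlight.highlight_query, null, 2));
```

If every clause of a bool is removed, this sketch returns an empty bool, and settings such as minimum_should_match are copied unchanged, so the result is worth checking before it is sent.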

It's still not working for me.

Original code without highlights, working ok.

        {
            query:
            {
                bool:
                {
                    must:
                    [
                        {
                            query_string:
                            {
                                fields: [ 'title', 'body'],
                                query: pUserInput,

                            }
                        },
                        {
                           term: {
                            source_id: pSourceIdInput
                           }
                        },
                        {
                            has_child:
                            {
                                type: "user_item_relation",
                                query:
                                {
                                    bool:
                                    {
                                        must:
                                        [
                                            {
                                                term:
                                                {
                                                    user_id: pUserIdInput
                                                }
                                            }
                                            /*{
                                                term:
                                                {
                                                    favorite: 1
                                                }
                                            }*/

                                        ]
                                    }
                                }
                            }
                        }
                    ]
                }
            }
        }

Code with highlights, query works ok but no highlights are shown.

        {
            query:
            {
                bool:
                {
                    must:
                    [
                        {
                            query_string:
                            {
                                fields: [ 'title', 'body'],
                                query: pUserInput,

                            }
                        },
                        {
                           term: {
                            source_id: pSourceIdInput
                           }
                        },
                        {
                            has_child:
                            {
                                type: "user_item_relation",
                                query:
                                {
                                    bool:
                                    {
                                        must:
                                        [
                                            {
                                                term:
                                                {
                                                    user_id: pUserIdInput
                                                }
                                            }
                                            /*{
                                                term:
                                                {
                                                    favorite: 1
                                                }
                                            }*/

                                        ]
                                    }
                                },
                                inner_hits: {
                                   highlight: {
                                     order: 'score',
                                     fields: {
                                         title: { number_of_fragments: 0 },
                                         body: { number_of_fragments: 3 }
                                     }
                                  }
                                 }
                            }
                        }
                    ]
                }
            },
            highlight: {
                 order: 'score',
                 fields: {
                   title: { number_of_fragments: 0 }
                 },
                 highlight_query: {
                   bool: {
                     should: [
                       {
                         query_string:
                         {
                           fields: [ 'title', 'body'],
                           query: pUserInput
                         }
                       }
                     ]
                   }
                 }
            }
        }

What am I doing wrong here?
Thanks in advance.

@borjakhet Do you still get the same error? This should work; I checked it locally. Maybe try running @clintongormley's minimal reproduction via Sense / curl commands, so that it is easier to see what happens?

It works. It was my API's fault: I was reading _source.title instead of highlight.title.
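
For anyone hitting the same mix-up: highlighted fragments come back under each hit's highlight key (an object mapping field names to arrays of fragments), not inside _source. The sketch below runs against a hand-written, trimmed response with made-up values; displayTitle is a hypothetical helper, and the `<em>` tags are simply the default highlight tags.

```javascript
// Hand-written, trimmed search response; values are made up for illustration.
const response = {
    hits: {
        hits: [
            {
                _id: '1',
                _source: { title: 'Google search tips' },
                highlight: { title: ['<em>Google</em> search tips'] }
            },
            {
                _id: '2',
                // No highlight key: nothing in this hit produced a fragment.
                _source: { title: 'Untitled' }
            }
        ]
    }
};

// Hypothetical helper: prefer highlighted fragments, fall back to the raw field.
function displayTitle(hit) {
    const fragments = hit.highlight && hit.highlight.title;
    if (fragments && fragments.length > 0) {
        return fragments.join(' ... ');
    }
    return hit._source.title;
}

const titles = response.hits.hits.map(displayTitle);
console.log(titles);
```

Highlights requested inside inner_hits are nested one level deeper, under each parent hit's inner_hits entry, so they need the same treatment on the child hits.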

I also have this issue with 2.1.0, and only using highlight_query solves it; otherwise an error is thrown. Should we wait for a bugfix? The current solution looks more like a workaround, even the docs explain what highlight_query is really meant for...

+1
This was quite confusing for me as I was highlighting on parent and child documents at the same time.

Essentially it seems it will work by merely including a highlight_query clause in every highlight clause, even if it's just a match all.

Also, I was using 2.3 and thought it would be fixed by now, seeing all the references to 2.1. Looks like it won't come till 2.4.
