Elasticsearch: Field is highlighted although not matching enough clauses with minimum_should_match

Created on 11 Feb 2018 · 1 Comment · Source: elastic/elasticsearch

Elasticsearch version (bin/elasticsearch --version):
Version: 6.1.3, Build: af51318/2018-01-26T18:22:55.523Z, JVM: 1.8.0_151

Plugins installed: []

OS version (uname -a if on a Unix-like system):
Linux ubuntu 4.4.0-87-generic #110-Ubuntu SMP Tue Jul 18 12:55:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:
When a field is queried with a match query that sets the minimum_should_match parameter,
the field is highlighted in the results even though too few of the match query's clauses match that field.

Steps to reproduce:

  1. Create the test index:
PUT test_index
{
  "mappings": {
    "doc": {
      "properties": {
        "field1": {
          "type": "text"
        },
        "field2": {
          "type": "keyword"
        }
      }
    }
  }
}
  2. Index a document:
PUT test_index/doc/1
{
  "field1": "id1 id2 id3",
  "field2": "1"
}
  3. Search for the document by field1 only:
POST test_index/doc/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "field1": {
              "query": "id1 id4 id5",
              "minimum_should_match": "80%"
            }
          }
        }
      ],
      "minimum_should_match": 1
    }
  },
  "highlight": {
    "type": "unified",
    "order": "score",
    "fields": {
      "field1": {}
    }
  }
}

Result: no hits, as expected.

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}
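
Why this query returns nothing can be checked by hand. The snippet below is only a sketch of the arithmetic, not Elasticsearch code: it assumes the text is split on whitespace (which is what the default analyzer effectively does for these tokens) and that a positive percentage is rounded down, as the minimum_should_match documentation describes. The helper name `required_clauses` is made up for illustration.

```python
def required_clauses(num_terms: int, percent: int) -> int:
    """Number of optional clauses that must match for a positive percentage.

    Assumption: the value computed from the percentage is rounded down.
    """
    return (num_terms * percent) // 100


# Terms taken from the reproduction steps above.
query_terms = "id1 id4 id5".split()
doc_terms = set("id1 id2 id3".split())

required = required_clauses(len(query_terms), 80)         # 80% of 3 terms
matched = sum(1 for t in query_terms if t in doc_terms)   # only id1 is in the document

print(f"required={required} matched={matched} match_query_hits={matched >= required}")
```

With three query terms, 80% works out to two required clauses, while only one term (id1) occurs in field1, so the match query on its own cannot select the document.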
  4. Add a match on field2 to the query so that the document is returned, but highlight only field1:
POST test_index/doc/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "field1": {
              "query": "id1 id4 id5",
              "minimum_should_match": "80%"
            }
          }
        },
        {
          "term": {
            "field2": "1"
          }
        }
      ],
      "minimum_should_match": 1
    }
  },
  "highlight": {
    "type": "unified",
    "order": "score",
    "fields": {
      "field1": {}
    }
  }
}

Result: field1 is highlighted. Expected result: field1 should not be highlighted.

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "test_index",
        "_type": "doc",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "field1": "id1 id2 id3",
          "field2": "1"
        },
        "highlight": {
          "field1": [
            "<em>id1</em> id2 id3"
          ]
        }
      }
    ]
  }
}
Labels: :Search/Highlighting, :Search/Search, discuss

Most helpful comment

Thanks for opening an issue, @NadavHarnik

The Lucene highlighters give a best-effort approximation of where a query has hit, rather than exact matches. One query aspect that isn't handled at the moment is boolean combinations, including minimum should match.

This won't be fixed absent some fairly fundamental reworking of how highlighting works, unfortunately.
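
Since the highlighter will not take minimum_should_match into account, one possible client-side workaround (not something proposed in this issue) is to give each query clause a `_name` and then drop highlights for a field whose named clause is absent from the hit's `matched_queries`. The sketch below assumes the two clauses were named `field1_match` and `field2_term`; both names, the helper `drop_unmatched_highlights`, and the sample hit are hypothetical.

```python
def drop_unmatched_highlights(hit: dict, clause_name: str, field: str) -> dict:
    """Return a copy of `hit` without highlights for `field`
    unless the clause named `clause_name` actually matched."""
    if clause_name in hit.get("matched_queries", []):
        return hit

    cleaned = dict(hit)  # shallow copy, the original hit is left untouched
    remaining = {k: v for k, v in hit.get("highlight", {}).items() if k != field}
    if remaining:
        cleaned["highlight"] = remaining
    else:
        cleaned.pop("highlight", None)
    return cleaned


# Hypothetical hit, shaped like the response above, assuming the clauses
# had been sent with "_name": "field1_match" and "_name": "field2_term".
hit = {
    "_id": "1",
    "_source": {"field1": "id1 id2 id3", "field2": "1"},
    "highlight": {"field1": ["<em>id1</em> id2 id3"]},
    "matched_queries": ["field2_term"],
}

cleaned = drop_unmatched_highlights(hit, "field1_match", "field1")
print("highlight" in cleaned)
```

This only hides the misleading highlight after the fact; it does not change how the highlighter itself works.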
