Elasticsearch: Exclude fields from getting searched when using * in index.query.default_field

Created on 2 Oct 2019 · 12 comments · Source: elastic/elasticsearch

Describe the feature:
Starting from ES 6.x, it is no longer possible to use include_in_all because _all is disabled by default. It is now easy to set index.query.default_field to * to search all indexed fields when no field is specified in the query. This works well, but it would be great to be able to easily exclude one or more fields, for instance by adding exclude_from_all: true (defaulting to false) to those fields.

Note that copy_to does not solve this easily or efficiently: you need to add a new (possibly large) custom field (e.g. my_all) and add copy_to to all your fields. If you have an index with many fields (e.g. 100) and just want to exclude one of them, this is hard to manage and requires much more disk space.


raicast
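To show the bookkeeping the copy_to workaround involves, here is a minimal sketch that generates the mapping instead of writing it by hand. It assumes a typeless (7.x-style) mapping body (6.x wraps properties under a type name), assumes only text and keyword fields should be copied, and uses a hypothetical helper name `add_copy_to_all`; the field names other than my_all are made up for illustration.

```python
import json

# Field types that receive copy_to (an assumption for this sketch).
COPYABLE_TYPES = {"text", "keyword"}

def add_copy_to_all(properties, target="my_all", exclude=()):
    """Hypothetical helper: return a new properties dict in which every
    copyable field not listed in `exclude` copies its value into `target`."""
    result = {}
    for name, spec in properties.items():
        spec = dict(spec)  # shallow copy so the input mapping is left unchanged
        if spec.get("type") in COPYABLE_TYPES and name not in exclude:
            spec["copy_to"] = target
        result[name] = spec
    result[target] = {"type": "text"}  # the catch-all field itself
    return result

properties = {
    "title": {"type": "text"},
    "body": {"type": "text"},
    "crn": {"type": "keyword"},
    "created": {"type": "date"},
}
mapping = {"properties": add_copy_to_all(properties, exclude={"crn"})}
settings = {"index.query.default_field": ["my_all"]}
print(json.dumps({"settings": settings, "mappings": mapping}, indent=2))
```

Generating the mapping removes the manual bookkeeping, but not the extra disk space the request mentions: every copied value is still indexed a second time in my_all.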

All 12 comments

index.query.default_field accepts a list of fields since https://github.com/elastic/elasticsearch/pull/26320. Would that solve your issue?

jimczi on 2 Oct 2019

Pinging @elastic/es-search (:Search/Search)

elasticmachine on 2 Oct 2019

@jimczi It's better than nothing, but what about performance (time, space)? E.g. if I have 100 fields in an index, is specifying a list of all 100 (or 99) fields in index.query.default_field equivalent to setting it to *?

By the way, in many cases it would be easier to specify the few fields to exclude (often defined up front and fixed) than a long (and possibly dynamic) list of fields to include.

raicast on 2 Oct 2019

It's not clear to me (I'm experimenting with ES 6.2 because I couldn't find clear documentation): is it possible to set index.query.default_field to a special value that matches every field except those that need to be excluded (e.g. using a regular expression)?
E.g.:
"index.query.default_field": ["my*"] => to match all and only fields starting with "my"
"index.query.default_field": ["*", "-crn"] => to match all fields but excluding "crn"

Actually, I do not understand whether the full list of fields must be enumerated explicitly (with the only exception of *) or whether wildcards and/or special characters can select subsets of fields (it seems so, but there is no documentation about it).

raicast on 2 Oct 2019

> do not understand if now it is required to explicitly enumerate the full list of fields to take into account (with the only exception of *) or if there are some wild-cards and/or special characters to select subsets of fields (it seems so but there is no documentation about it).

The list accepts the same syntax as the fields option of the multi_match and query_string queries, so simple wildcards are accepted (e.g. ["a*", "*_foo_*", "title"]).

jimczi on 2 Oct 2019
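To preview which mapped fields a pattern list like ["a*", "*_foo_*", "title"] would select, a client-side approximation can help. This sketch assumes that only `*` is special (matching any run of characters, including none) and everything else matches literally; `simple_match`, `expand_fields`, and the field names are hypothetical, and Elasticsearch itself resolves the patterns server-side against the real mapping.

```python
import re

def simple_match(pattern, name):
    # Only '*' is treated as a wildcard; every other character is literal.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, name) is not None

def expand_fields(patterns, mapped_fields):
    # Keep mapping order; a field is selected if any pattern matches it.
    return [f for f in mapped_fields if any(simple_match(p, f) for p in patterns)]

mapped = ["author", "a_foo_b", "title", "body", "my_foo_x"]
print(expand_fields(["a*", "*_foo_*", "title"], mapped))
# ['author', 'a_foo_b', 'title', 'my_foo_x']
```

Note that this syntax only selects fields; nothing in it expresses "all fields except X".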


So, there is no way to exclude fields (e.g. "-crn" or "^crn")? This would be useful...

raicast on 2 Oct 2019


If I have a field with this mapping:

"myfield": {
  "type": "text",
  "index": false
}

and the index setting:

{
  "settings": {
    "index.query.default_field": ["my*"]
  }
}

then I always get an error when running any query_string query without specifying a field:

{
  "query": {
    "query_string": {
      "query": "test"
    }
  }
}

this is because:

"caused_by": {
  "type": "illegal_argument_exception",
  "reason": "Cannot search on field [myfield] since it is not indexed."
}

This is just one example (maybe a bug?) of the difficulties that arise without something similar to the old include_in_all...

raicast on 2 Oct 2019

By default, the query_string query is lenient only if no fields are provided and the default field in the settings is left unchanged (*). If you use a custom list of fields, you can control leniency per request by setting "lenient": true to ignore these errors, or for an entire index by setting the dynamic index setting index.query_string.lenient to true.

jimczi on 2 Oct 2019
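Both options from the reply above can be written out as request bodies. This minimal sketch builds them as Python dicts and serializes them to JSON; the query text "test" comes from the earlier example, while sending them (e.g. the search body to `_search` and the settings body to `_settings` of a hypothetical index) is left to whatever client you use.

```python
import json

# Per request: ignore errors from fields that cannot be searched
# (such as a field mapped with "index": false).
search_body = {
    "query": {
        "query_string": {
            "query": "test",
            "lenient": True,
        }
    }
}

# Per index: the dynamic setting named in the reply above.
settings_body = {"index.query_string.lenient": True}

print(json.dumps(search_body))
print(json.dumps(settings_body))
```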


@jimczi To understand the performance impact: is it better to set a list of 100 fields in index.query.default_field (i.e. a long array of field names), or to add copy_to: my_all to all 100 fields in the mapping and set index.query.default_field: my_all?

raicast on 3 Oct 2019

I think the feature matters if there is a significant performance difference. Can you provide some information about my last question?

raicast on 4 Oct 2019

Parsing and resolving the fields in the query shouldn't make a big difference. If the resulting query is the same, it shouldn't matter whether the fields are selected via a wildcard or via an exclusion list.
If you have a list of 100 fields, it would be faster to create one big copy field at index time.

We try to keep GitHub focused on development efforts like bug reports and feature requests, so I hope you don't mind if I close this issue. I'd be happy to continue the discussion on the forum, which is more appropriate for these kinds of questions.

jimczi on 4 Oct 2019
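A back-of-the-envelope model of why one copy field tends to be faster: a query_string query over N fields fans out into roughly one leaf term query per (term, field) pair, while a single copy_to field needs about one per term. This is a rough assumption that ignores analyzers emitting several tokens, phrase and wildcard queries, and scoring; `approx_leaf_queries` is a hypothetical name and the numbers are illustrative, not measurements.

```python
def approx_leaf_queries(num_terms, num_fields):
    # Rough model: each query term is looked up in each target field.
    return num_terms * num_fields

# A three-term query such as "quick brown fox":
over_100_fields = approx_leaf_queries(num_terms=3, num_fields=100)
over_copy_field = approx_leaf_queries(num_terms=3, num_fields=1)
print(over_100_fields, over_copy_field)  # 300 3
```

The trade-off runs the other way at index time: the copy field costs extra disk space, as the original request points out.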

@jimczi Sorry for replying here. I understand the discussion should continue on the forum, but I couldn't find the follow-up topic there.

I am facing the same issue: I need to exclude fields via "index.query.default_field". I tried "index.query.default_field": ["-entry_time"] to exclude the field "entry_time": {"type": "date", "format": "strict_date_optional_time||epoch_millis"} from queries, but it didn't seem to work. Whitelisting all the fields isn't practical because we want to exclude fewer than 10 fields out of 500+, and unfortunately copy_to doesn't work for us either.
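Since the thread shows no exclusion syntax for index.query.default_field, one possible workaround (a client-side script, not an Elasticsearch feature) is to generate the explicit include list from the mapping and exclude fields there. This sketch assumes you already have the mapping's "properties" section as a dict; `flatten_properties`, `default_fields_excluding`, and every field except entry_time are hypothetical. Exclusions use Python's fnmatch syntax, which is broader than a plain `*` wildcard.

```python
import fnmatch
import json

def flatten_properties(properties, prefix=""):
    # Walk a mapping's "properties" tree and yield dotted leaf field names.
    # Multi-fields declared under "fields" are not expanded in this sketch.
    for name, spec in properties.items():
        path = prefix + name
        if "properties" in spec:  # object field: recurse into its sub-fields
            yield from flatten_properties(spec["properties"], path + ".")
        else:
            yield path

def default_fields_excluding(properties, exclude_patterns):
    # Every leaf field except those matching one of the exclude patterns.
    return [
        name
        for name in flatten_properties(properties)
        if not any(fnmatch.fnmatchcase(name, p) for p in exclude_patterns)
    ]

# Stand-in for the "properties" section of the real index mapping.
properties = {
    "title": {"type": "text"},
    "entry_time": {"type": "date", "format": "strict_date_optional_time||epoch_millis"},
    "crn": {"type": "keyword"},
    "meta": {"properties": {"tags": {"type": "keyword"}}},
}
fields = default_fields_excluding(properties, ["entry_time", "crn"])
settings_body = {"index.query.default_field": fields}
print(json.dumps(settings_body))
# {"index.query.default_field": ["title", "meta.tags"]}
```

The generated list has to be regenerated whenever new fields appear (e.g. through dynamic mapping). Per the reply above, the resulting query should perform like any other explicit list of the same fields.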