Elasticsearch: Extend field stats to include type, searchable, aggregatable

Created on 14 Apr 2016  路  10Comments  路  Source: elastic/elasticsearch

After much discussion in https://github.com/elastic/elasticsearch/issues/15728 we have agreed to make the following changes to the field stats API:

  • ?fields will accept wildcards, so that it can match all fields
  • field stats will return the field type, and whether it is searchable (index ==true) and aggregatable (doc_values==true || (index==true && fielddata==true))

When ?level=cluster (the default), fields with the same name are collapsed into a single entry unless the fields have a different type, in which case the whole request returns an exception. This collapsing process should change as follows:

  • An entry should be returned for fields with conflicting type, but with an error key which describes the error.
  • searchable and aggregatable are collapsed to true if any of the instances of that field are searchable or aggregatable.

/cc @rashidkpc @spalger @jpountz @martijnvg

:SearcMapping >enhancement blocker v5.0.0-alpha3

Most helpful comment

in which case the whole request returns an exception.

this is what it does today, but the bullet points beneath that line explain what we need to change it to do:

  • An entry should be returned for fields with conflicting type, but with an error key which describes the error.

All 10 comments

field stats will return the field type, and whether it is searchable (index ==true) and aggregatable (doc_values==true || (index==true && fielddata==true))

This will be an issue for some virtual fields like _id, which is searchable even though it is not indexed, and _index, which is aggregatable even though it does not have doc values.

In order not to make it too complicated, I think we can check whether a field is aggregatable by checking whether MappedFieldType.fielddataBuilder() throws an exception since it is the method that throws an exception eg. when a keyword field does not have doc values or when a text field has fielddata=false.

I don't think we have anything similar for searchable fields, so we would probably have to start by checking index=true, plus a hardcoded list of fields that are known to be searchable even though not indexed like _id and _index.

index==true && fielddata==true

This made me realize that it is a bug that you can set fielddata=true on a field which is not indexed, so I opened #17747.

When ?level=cluster (the default), fields with the same name are collapsed into a single entry unless the fields have a different type, in which case the whole request returns an exception.

This won't work for us, assuming it means that the entire request to _field_stats will fail. Ideally we would just exclude the field, perhaps with a way of listing the fields that were excluded due to any exception

unless the fields have a different type, in which case the whole request returns an exception

I think this is a bit drastic. Ideally Kibana would be able to use this and not have to through some fatal error about the field types not matching. Could the conflicting field have "conflict" set for its type when types conflict? Perhaps we could do the same thing for searchable and aggregatable.

in which case the whole request returns an exception.

this is what it does today, but the bullet points beneath that line explain what we need to change it to do:

  • An entry should be returned for fields with conflicting type, but with an error key which describes the error.

field stats will return the field type, and whether it is searchable (index ==true) and aggregatable (doc_values==true || (index==true && fielddata==true))

Currently the field stats API will not return any information if the field is not searchable or if there is no document indexed with a value in that field. It makes sense as none of the stats would be available in such cases.
@clintongormley @jpountz should we add an entry for the "non-searchable/no value yet" fields ?
This could look like this:

      "fields": {
         "field_index_false": {
               "max_doc": 1000,
               "doc_count": -1, // not searchable
               "density": -1,
               "sum_doc_freq": -1,
               "sum_total_term_freq": -1,
               "searchable": false,
               "aggregatable": true,
               "min_value": null,
               "min_value_as_string": "",
               "max_value": null,
               "max_value_as_string": ""
            }
         },
          "field_index_true_no_value": {
               "max_doc": 1000,
               "doc_count": 0, // the field is searchable but no value has been indexed yet.
               "density": 0,
               "sum_doc_freq": 0,
               "sum_total_term_freq": 0,
               "searchable": true,
               "aggregatable": true,
               "min_value": null,
               "min_value_as_string": "",
               "max_value": null,
               "max_value_as_string": ""
            }
         }

Hmm good question... In fact, even if we can aggregate on a field because it has doc values, we can't return the min/max values unless it is also indexed (correct, @jpountz ?). Today, the field stats API will ignore non-indexed fields.

even if we can aggregate on a field because it has doc values, we can't return the min/max values unless it is also indexed

This is correct: we need doc values to aggregate, but these stats either come from points or from the inverted index.

If a field is mapped, I agree there should be an entry regardless of whether it exists in the index or not.

@clintongormley I started working on the Kibana code to consume this new info and I've realized there was a miss with the PR that closed this issue. Field stats still doesn't include the type of each field, and that info is critical for Kibana.

@Bargs, sorry I missed this part in the initial PR.
I submitted a new PR to fix this discrepancy:
https://github.com/elastic/elasticsearch/pull/19241
Could you please verify that it contains the information that you need ?

Was this page helpful?
0 / 5 - 0 ratings

Related issues

dadoonet picture dadoonet  路  3Comments

malpani picture malpani  路  3Comments

abrahamduran picture abrahamduran  路  3Comments

Praveen82 picture Praveen82  路  3Comments

rjernst picture rjernst  路  3Comments