Kibana: Support flattened field type from Elasticsearch

Created on 16 Nov 2018 · 17 Comments · Source: elastic/kibana

A new object field type is coming to ES. I played around with the feature branch a bit today and collected some thoughts and findings. From what I've seen so far, there are some small updates to Kibana we'll definitely want to make and some things we should discuss.

  • Need to add a JSON type to Kibana (or whatever name ES lands on for this field type). Currently it shows up in the index pattern as unknown.
  • Autocomplete on values doesn't work because the field is not aggregatable
  • Autocomplete on field names doesn't work, because we don't know what sub-fields the object has
  • In KQL we implement wildcard field names ourselves based on the fields in the index pattern, so something like head*:application/json actually works, but headers.con*:application/json does not, because we don't know about those fields. This could be pretty confusing to users.
  • Filters can't be created from the doc table (similar to our treatment of arrays of objects today)
  • Filters can't be created from the "Add Filter" UI in the filter bar (this is worse than no autocomplete in the query bar, because we don't allow free text input for the field name; currently the field name must be in the index pattern)
  • Highlighting would be nice; however, we don't currently highlight values inside arrays of objects, so a lack of highlighting in a JSON document won't be totally surprising to current users.
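To make the wildcard problem concrete, here is a minimal Python sketch of expanding a KQL-style wildcard field name such as head* or headers.con* against the fields an index pattern knows about. The field list is hypothetical, and fnmatchcase is only an approximation of KQL's own matching (it also treats ? and [...] as wildcards).

```python
from fnmatch import fnmatchcase

# Hypothetical index pattern field list. "headers" is the JSON root field;
# keys inside it (e.g. "headers.content-type") are absent because they are
# not part of the mapping.
INDEX_PATTERN_FIELDS = ["@timestamp", "message", "headers"]


def expand_field_pattern(pattern: str, known_fields: list[str]) -> list[str]:
    """Return the known field names that match a wildcard field pattern."""
    return [name for name in known_fields if fnmatchcase(name, pattern)]


print(expand_field_pattern("head*", INDEX_PATTERN_FIELDS))         # ['headers']
print(expand_field_pattern("headers.con*", INDEX_PATTERN_FIELDS))  # []
```

Because headers.content-type never appears in the field list, no wildcard can reach it, which is why the second pattern silently matches nothing.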

I only spent about an hour with it, so there may be more things I'm missing. It would definitely be good to get more eyes on it.

Feature branch here if anyone else wants to check it out: https://github.com/elastic/elasticsearch/tree/object-fields

Labels: New Field Type, AppServices, KibanaApp, enhancement


Pinging @elastic/kibana-app

Thanks @Bargs for taking a look at the branch! I had a couple of thoughts/questions.

First, from talking to the Beats team, I think it would be valuable to add support for terms aggregations, both on the root field (like headers) and the keyed values (headers.content-type). This would also allow Kibana to support autocompletion of values when adding a filter. I am not completely sure this is possible to do in a performant way, but it is something I am looking into.

Next, we had discussed whether there was a way to expose the list of subfields/keys that are available. I don't think it makes sense to return this as part of the mappings or field capabilities, because there may be a huge number of distinct subfields (and the number of field mappings is assumed to be bounded to a reasonable number). A more sensible approach might be to index the subfield names into a separate Lucene field, and allow for a terms aggregation that returns the most popular subfields. However, this doesn't fit perfectly with the current API around JSON fields, and would require storing additional information. The alternative would be to accept that we don't support autocomplete on these subfields, and when filtering allow for free text input on the field name. I'm curious as to your thoughts here.
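As a rough illustration of the separate keys field idea, the Python sketch below flattens each JSON object into dotted key paths and counts how many documents contain each key, which is roughly what a terms aggregation over such a field would return. It works on plain dictionaries rather than Elasticsearch, treats arrays as leaf values, and the helper names are hypothetical.

```python
from collections import Counter
from typing import Any, Iterator


def iter_keys(obj: dict[str, Any], prefix: str = "") -> Iterator[str]:
    """Yield the dotted path of every leaf value in a nested JSON object."""
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from iter_keys(value, path)  # descend into nested objects
        else:
            yield path  # scalars and arrays are treated as leaves here


def top_keys(values: list[dict[str, Any]], size: int) -> list[tuple[str, int]]:
    """Count how many documents contain each key and return the most common ones."""
    counts: Counter[str] = Counter()
    for value in values:
        counts.update(set(iter_keys(value)))  # count each key once per document
    return counts.most_common(size)


# Each dict is the value of one document's JSON field (e.g. "headers").
docs = [
    {"content-type": "application/json", "host": "a.example"},
    {"content-type": "text/html", "cache": {"max-age": "60"}},
    {"content-type": "application/json"},
]
print(top_keys(docs, 1))  # [('content-type', 3)]
```

Asking only for the top N keys is what keeps this bounded even when the total number of distinct keys is huge.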

Sorry for the delay in response, I was out all last week.

From a technical standpoint I think storing the subfield names in an index and doing a terms agg on them would work for autocompleting the field names in Kibana. But while chatting with @lukasolson I realized that even with the field names we would still be missing type information, and as a result would not be able to intelligently suggest query types. If we go down the path of trying to make these feel like regular searchable fields for average users, I think we need to go all the way, so we would need that type information too. I have a feeling that probably complicates things even more. If so, we might be able to do without autocomplete on JSON fields for the time being. The most important thing to me is that the autocomplete doesn't appear broken or unpredictable to a normal user, but I think we might be able to solve that by adding some warnings in the UI if the user is searching on a JSON field.

I've now started to pick up work on JSON fields again and have a couple of updates. First, we worked out how to support keyword-style aggregations like terms, so it should be possible for Kibana to autocomplete on the field values.

Second, after thinking about it more, I think it could be valuable to provide access to the possible keys in the JSON field. Even beyond Kibana, this seems generally useful as part of a search workflow on these fields: a client could first retrieve the common keys in the JSON field, present them to the user, then allow for searches on these keys. Otherwise these keys must be known in advance, or can only be discovered by encountering them within documents returned from another search.

To support this, we could index the keys into a separate field, and the common keys would then be retrieved through a terms aggregation. The API could look something like this:

  • json_field searches only field values
  • json_field.some_key searches only values belonging to the key some_key
  • json_field._keys searches only field keys (new option, not implemented currently)
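The three access paths could be exercised with request bodies like the ones below, built as Python dictionaries. The field name headers and the aggregation name common_keys are placeholders, and the json_field._keys target is the proposal from this thread, not an existing Elasticsearch feature.

```python
import json

JSON_FIELD = "headers"  # placeholder name for a JSON field in the mapping

# json_field: match the value anywhere in the object, regardless of key.
values_query = {"query": {"term": {JSON_FIELD: "application/json"}}}

# json_field.some_key: match the value only under one key.
keyed_query = {"query": {"term": {f"{JSON_FIELD}.content-type": "application/json"}}}

# json_field._keys: proposed (not implemented) keys field, queried with a terms
# aggregation to list the most common keys. "size": 0 skips returning hits.
keys_agg = {
    "size": 0,
    "aggs": {"common_keys": {"terms": {"field": f"{JSON_FIELD}._keys", "size": 20}}},
}

for body in (values_query, keyed_query, keys_agg):
    print(json.dumps(body))
```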

@Bargs @jpountz @jimczi I was curious about your thoughts on the above. The downsides are that this API isn't as elegant, and it may involve indexing more information. Note that this relates to @jimczi's question here about whether keyed JSON fields should use the _field_names field: https://github.com/elastic/elasticsearch/pull/40069#discussion_r265787443

Sounds good to me! We'd love to have a way to retrieve a list of the keys, whatever form the API takes.

How would this work in practice? Is my understanding correct that Kibana would first retrieve fields from the field capabilities API, and as a second step for each json field it would run a terms aggregation on the json_field._keys field to further populate the list of fields that can be searched or aggregated (probably with a reasonable value of terminate_after to keep performance ok)?

At first sight this sounds like a good idea to me, but I'd like to double check that we are ok with the complexity that it introduces in Kibana as well as the trade-off: because there is no upper bound on the number of fields, and making sure to collect all of them would be too slow anyway, most of the time we would only collect a subset of the subfields that exist in a JSON object.
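A minimal Python sketch of that two-step workflow, assuming a field capabilities response of the usual shape ({"fields": {name: {type: caps}}}), a JSON field type reported as flattened, and the proposed _keys field returning bare key names in its buckets. The search callable is a stand-in for a real client call, and terminate_after caps the documents collected per shard, so the result is deliberately a sample of the keys rather than the full set.

```python
from typing import Any, Callable

# Assumption: the type name the field capabilities API reports for JSON fields.
JSON_FIELD_TYPES = {"flattened"}

SearchFn = Callable[[dict[str, Any]], dict[str, Any]]


def discover_fields(field_caps: dict[str, Any], search: SearchFn,
                    max_keys: int = 100, terminate_after: int = 10_000) -> list[str]:
    """Return mapped field names plus a sample of keys found inside JSON fields."""
    names: list[str] = []
    for name, caps_by_type in field_caps["fields"].items():
        names.append(name)
        if not any(es_type in JSON_FIELD_TYPES for es_type in caps_by_type):
            continue  # regular field: nothing more to discover
        # Step 2: a bounded terms aggregation over the proposed <field>._keys field.
        body = {
            "size": 0,
            "terminate_after": terminate_after,
            "aggs": {"keys": {"terms": {"field": f"{name}._keys", "size": max_keys}}},
        }
        buckets = search(body)["aggregations"]["keys"]["buckets"]
        names.extend(f"{name}.{bucket['key']}" for bucket in buckets)
    return names


# Usage with a stubbed search function standing in for a real Elasticsearch client.
def fake_search(body: dict[str, Any]) -> dict[str, Any]:
    return {"aggregations": {"keys": {"buckets": [
        {"key": "content-type", "doc_count": 42},
        {"key": "host", "doc_count": 17},
    ]}}}


caps = {"fields": {
    "message": {"text": {"type": "text", "searchable": True, "aggregatable": False}},
    "headers": {"flattened": {"type": "flattened", "searchable": True, "aggregatable": True}},
}}
print(discover_fields(caps, fake_search))
# ['message', 'headers', 'headers.content-type', 'headers.host']
```

Keeping max_keys and terminate_after small trades completeness for predictable index pattern load time, which is exactly the trade-off raised above.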

I would like to shine a spotlight on something @Bargs mentioned earlier. Just knowing the field names (via _keys) doesn't help us that much (or only in a couple of places). It would still not allow us to build, e.g., any visualization on those fields, since we need the type information for those fields to work with them. Without proper types we cannot add them to the index pattern properly, so we could maybe build a smaller workaround for searches, but not really support them across Kibana. From a technical point of view, how are those types actually treated inside Elasticsearch? Are all values just keyword types (in which case we could hardcode that too)?

@timroes Yes they would behave almost exactly like a keyword field.

@jpountz I am a bit worried about the "almost" in that sentence :D Could you tell us what the actual differences are? We need to know whether we can simply treat them as "keyword" fields (which would then apply everywhere), or, if we can't, we would need to know about that type difference somehow.

@timroes Here are the differences to my knowledge (please review @jtibshirani):

  • produced scores will be different because field statistics such as the document count for a field would be different,
  • min_doc_count=0 on terms aggregations is unsupported (we are hoping to address it in the near future).

Other than that, they should support the same set of queries and aggregations.
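If keyed values really are keyword-like, Kibana could describe each discovered key with a fixed keyword type when adding it to the index pattern. The descriptor below is a hypothetical Python shape loosely modeled on an index pattern field entry, not Kibana's actual field spec; the from_json_field flag is an assumed hook for surfacing the small differences listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDescriptor:
    """Hypothetical index pattern field entry (not Kibana's real field spec)."""
    name: str
    type: str              # Kibana-side type that drives the UI, e.g. "string"
    es_type: str           # underlying Elasticsearch type
    searchable: bool
    aggregatable: bool
    from_json_field: bool  # lets the UI surface the few known differences


def describe_json_subfield(root: str, key: str) -> FieldDescriptor:
    # Keyed values behave almost exactly like keyword fields, so describe them
    # as searchable, aggregatable keyword strings.
    return FieldDescriptor(
        name=f"{root}.{key}",
        type="string",
        es_type="keyword",
        searchable=True,
        aggregatable=True,
        from_json_field=True,
    )


for key in ["content-type", "host"]:
    print(describe_json_subfield("headers", key))
```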

Okay, that sounds fine to me. We don't mind too much about the score (especially not about specific values), and as far as I am aware we're not using min_doc_count=0 on terms aggregations (it sounds like a weird parameter value to me anyway).

So the plans here sound rather reasonable. I would just suggest that when creating the index pattern, if it contains any JSON fields, we give the user an option to choose whether or not they want "to use those in Kibana", since it sounds to me like we could otherwise bloat the index pattern quite a lot, and users may not actually want to use them.

@timroes In addition to the aggregation limitation that @jpountz mentioned (which we hope to address), I tried to list the restrictions here: https://github.com/elastic/elasticsearch/blob/object-fields/docs/reference/mapping/types/embedded-json.asciidoc#supported-operations. You'll notice that certain query types like regexp are not supported.
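Kibana could also guard against the known gaps before sending a request. This hypothetical Python check flags only the two cases named in this thread (min_doc_count=0 on a terms aggregation and a regexp query) when they target a JSON field or one of its keys. It inspects only top-level aggregations and a top-level regexp query, and the full supported-operations list in the linked docs may contain more restrictions.

```python
from typing import Any


def targets_json_field(field: str, json_fields: set[str]) -> bool:
    """True if field is a JSON root field or one of its keys (root.key)."""
    return any(field == root or field.startswith(root + ".") for root in json_fields)


def unsupported_operations(request: dict[str, Any], json_fields: set[str]) -> list[str]:
    """List known-unsupported operations on JSON fields (top level only)."""
    problems: list[str] = []
    for agg_name, agg in request.get("aggs", {}).items():
        terms = agg.get("terms", {})
        if targets_json_field(terms.get("field", ""), json_fields) and terms.get("min_doc_count") == 0:
            problems.append(f"aggs.{agg_name}: min_doc_count=0 is unsupported on JSON fields")
    for field in request.get("query", {}).get("regexp", {}):
        if targets_json_field(field, json_fields):
            problems.append(f"query.regexp.{field}: regexp is unsupported on JSON fields")
    return problems


request = {
    "query": {"regexp": {"headers.host": "web-.*"}},
    "aggs": {"types": {"terms": {"field": "headers.content-type", "min_doc_count": 0}}},
}
for problem in unsupported_operations(request, {"headers"}):
    print(problem)
```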

How would this work in practice? ... I'd like to double check that we are ok with the complexity that it introduces in Kibana as well as the trade-off.

I'm also hoping to understand this better. Would it be possible to walk through how Kibana would load and display the keys, given the current proposal of running a terms aggregation on json_field._keys?

If I understand the purpose of embedded_json, it lets us avoid really large mappings and field lists, when typically only a handful of fields are of interest (and some fields may be very sparsely populated).

Do we expect (can we assume) these fields are known ahead of time? Or do they need to be discovered via autocomplete (which may not even be possible to do well if the number of keys is large).

Say we expect only a handful of embedded fields are of interest, and that handful doesn't change much - these are two big assumptions - then how about defining the embedded json fields as part of the index pattern (similar to how we do now for scripted fields)? It would not be as nice to work with as automatically discovered fields via autocomplete, but then the field type can be defined and there can be as many or as few as you want.
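A sketch of what user-declared embedded fields could look like as part of an index pattern, stored next to the regular field list much like scripted fields are. The classes and method names are hypothetical; the only rule enforced is that the declared root must already be a field of the index pattern.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EmbeddedFieldSpec:
    """A user-declared key inside a JSON field (hypothetical)."""
    root: str              # JSON field from the mapping, e.g. "headers"
    key: str               # key inside the JSON object, e.g. "content-type"
    type: str = "keyword"  # declared by the user, since the mapping has no keys

    @property
    def name(self) -> str:
        return f"{self.root}.{self.key}"


@dataclass
class IndexPattern:
    title: str
    fields: list[str]
    embedded_fields: list[EmbeddedFieldSpec] = field(default_factory=list)

    def add_embedded_field(self, spec: EmbeddedFieldSpec) -> None:
        # Only keys of fields that actually exist in the index pattern are accepted.
        if spec.root not in self.fields:
            raise ValueError(f"unknown root field: {spec.root}")
        self.embedded_fields.append(spec)

    def all_field_names(self) -> list[str]:
        return self.fields + [spec.name for spec in self.embedded_fields]


pattern = IndexPattern("logs-*", ["@timestamp", "headers"])
pattern.add_embedded_field(EmbeddedFieldSpec("headers", "content-type"))
print(pattern.all_field_names())  # ['@timestamp', 'headers', 'headers.content-type']
```

Since the user supplies the type, the rest of Kibana could treat these entries like any other typed field, at the cost of no automatic discovery.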

Pinging @elastic/kibana-app-arch (Team:AppArch)

Not being able to create visualizations in Kibana against fields in a flattened object is a real blocker to adopting the field type in our mappings. This flattened field type is exactly what we need, as a portion of our document is both dynamic and substantial in field count. But we have a use case that requires us to maintain search capabilities on that data.

Is there any chance this feature gets prioritized in the near future?

Any news about creating visualizations in Kibana?

Just chiming in: I agree completely with what @andrewkcarter mentioned above. We would really like to adopt the flattened object for our use case, but without support in Kibana it's tough for us to adopt. We would greatly appreciate feedback from Elastic on this.
