Elasticsearch: Deprecate `include_in_root` and `include_in_parent`?

Created on 25 Jul 2015 · 22 Comments · Source: elastic/elasticsearch

Does any purpose remain for the include_in_root and include_in_parent settings for nested fields, now that we have nested and reverse_nested aggregations, and inner hits with highlighting?

If not, we should remove them.

:Search/Mapping :Search/Search >deprecation Search stalled

clintongormley

Most helpful comment

I am in favor of keeping include_in_parent (why was it removed from the documentation?). copy_to requires new field names, in which case I might just as well ignore it, make my own scheme, and copy the data before indexing. Also, include_in_* is useful for simple search across the whole document, something that nested apparently can't do on its own (a simple string query will not search nested fields even if no field is specified...)

Enerccio on 13 Aug 2018

👍11

All 22 comments

These options also make it possible to run simple (fast) queries on the parent instead of slower nested queries. We could potentially remove these options, but that would mean relying on users to duplicate information on the parent/root document explicitly at index time.

jpountz on 26 Jul 2015

👍1
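
As a sketch of the difference in query shape (the index `my_index`, the nested field `comments`, and its `author` sub-field are hypothetical names, not taken from this thread): without include_in_parent or include_in_root, a value inside a nested object is only reachable through a nested query, whereas with one of those options enabled the same field name is also indexed on the parent or root document, so a plain query can match it.

```console
# Nested query: required when the values live only in the nested documents
GET my_index/_search
{
  "query": {
    "nested": {
      "path": "comments",
      "query": {
        "term": { "comments.author": "alice" }
      }
    }
  }
}

# Flat query: assumes "comments" is mapped with include_in_root (or include_in_parent) set to true
GET my_index/_search
{
  "query": {
    "term": { "comments.author": "alice" }
  }
}
```

The flat form matches against the root document as a whole, so it cannot tell which nested object a value came from; conditions that must hold within the same nested object still need the nested query.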

Actually, maybe they are not necessary, and users should just do it explicitly with copy_to instead of implicitly with include_in_*.

jpountz on 28 Sep 2015

👎1

2.0 throws the following exception when I try to put an existing mapping that worked with 1.7.x:
org.elasticsearch.index.mapper.MapperParsingException: Mapping definition for [title] has unsupported parameters: [include_in_root : true]
It also fails to install the mapping.
So maybe this is already merged in 2.0.0 and the issue can be closed? It would also be helpful to mention this, and the recommended alternative approach (copy_to), with an example in the documentation.
The migration tool did not catch this either.

adichad on 3 Nov 2015

@adichad I think you have the parameter in the wrong place. These params still work in 2.0 when set on the nested field itself:

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "foo": {
          "type": "nested",
          "include_in_root": true,
          "properties": {
            "bar": {
              "type": "string"
            }
          }
        }
      }
    }
  }
}

clintongormley on 8 Nov 2015

I really like the idea of removing include_in_root|parent in favour of the more explicit copy_to per field. This way there is no magic, and you only (re-)index the fields that you really want at the top level.

clintongormley on 20 Nov 2015
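
A minimal sketch of what the copy_to alternative might look like, using the same 2.x-style mapping syntax as the example above. The names `comments`, `author`, `body`, and `comment_authors` are made up for illustration, and the behaviour of copy_to from inside a nested object has differed between versions, so check it against the version you run.

```console
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "comment_authors": {
          "type": "string",
          "index": "not_analyzed"
        },
        "comments": {
          "type": "nested",
          "properties": {
            "author": {
              "type": "string",
              "index": "not_analyzed",
              "copy_to": "comment_authors"
            },
            "body": {
              "type": "string"
            }
          }
        }
      }
    }
  }
}
```

Only `author` is duplicated at the top level, under the new name `comment_authors`; `body` stays inside the nested documents. On 5.x and later, `keyword` would take the place of a not_analyzed `string`.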

Thinking about it more, one downside of explicit copy_to compared to include_in_* is that it would force users to use a different field name, which is worse in terms of index sparsity.

jpountz on 22 Dec 2015

👍1

@clintongormley @jpountz

For tracking purposes, it may be worth noting that we sometimes recommend using "include_in_parent" when someone wants to summarize data stored in nested docs within Kibana using aggregations, for instance, using the Terms aggregation to get Top N values across a field in a nested doc: https://github.com/elastic/kibana/issues/1084#issuecomment-112162113

This approach was also used to develop the Watch History dashboard, since it was based on a nested document structure: https://www.elastic.co/guide/en/watcher/current/watch-history.html

Hopefully the "copy to" functionality being discussed will be an equivalent alternative to that use case. It sounds like it will be, and in fact the added benefit is that you only have to reindex the affected fields, as opposed to all of the data. Is that correct?

cc: @rashidkpc @skearns64 @eskibars

tbragin on 10 Feb 2016
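
To make the aggregation use case concrete, here is a sketch with hypothetical names (`my_index`, a nested field `comments` with an `author` sub-field). The first request assumes `comments` is mapped with include_in_parent (or include_in_root) set to true, so a plain terms aggregation can see the values on the root document; the second is the nested aggregation that is needed when the values exist only in the nested documents.

```console
# Plain terms aggregation on the flattened copy of the field
GET my_index/_search
{
  "size": 0,
  "aggs": {
    "top_authors": {
      "terms": { "field": "comments.author" }
    }
  }
}

# Equivalent lookup through a nested aggregation
GET my_index/_search
{
  "size": 0,
  "aggs": {
    "comments": {
      "nested": { "path": "comments" },
      "aggs": {
        "top_authors": {
          "terms": { "field": "comments.author" }
        }
      }
    }
  }
}
```

The two are not interchangeable in their counts: the flat terms aggregation counts root documents, while the nested one counts nested objects.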

> Hopefully the "copy to" functionality being discussed will be an equivalent alternative to that use case. It sounds like it will be, and in fact the added benefit is that you only have to reindex the affected fields, as opposed to all of the data. Is that correct?

The copy_to functionality already exists and behaves as per your comment.

clintongormley on 13 Feb 2016

I vote in favor of keeping include_in_parent and include_in_root. For complex use cases, nested queries with inner_hits are suited much better, but I think there are use cases (like aggregations, see the Kibana examples) where it is much more convenient to grab the data from the root node.

abulhol on 18 Feb 2016

👍1

I think what Clint suggested would still maintain the functionality of include_in_parent/root, but it would be moved to copy_to. I like this simplification of the api, even if the underlying complexity is still there.

rjernst on 18 Feb 2016

I have not been able to drop-in-replace 'include_in_parent' with 'copy_to' with nested objects (maintaining the same fieldname in 'parent' as that of the nested object). This may be because I have not found detailed documentation that describes the copy_to value args, but if it is not possible I would request not to remove a feature for which there is no full replacement.

sronsiek on 26 Apr 2016

👍1

@sronsiek yeah you wouldn't be able to use the same field name in the parent with copy_to. Why is that important? What's your use case?

clintongormley on 26 Apr 2016

I dunno about that guy, but our use case is "a not very thoughtful person set include_in_parent and include_in_root on _all_ of our (many) mappings, this is now routinely used across the codebase, and we also aggregate on the nested docs". So changing the field type is not an option and changing the queries is not feasible.

smmckay on 30 Sep 2016

Looks like include_in_parent and include_in_root are undocumented settings starting from v2.0. Should we add them back to the documentation? Is it a deliberate decision to not document them?

apidruchny on 30 Jan 2017

I used to see value in include_in_root and include_in_parent due to the fact that they would reuse the same underlying field instead of creating a new sparse field. However due to upcoming improvements with sparse doc values in Lucene 7 and recent improvements to nested queries when include_in_root and include_in_parent are disabled (#23079), I am leaning towards deprecating include_in_root and include_in_parent in favour of copy_to.

jpountz on 21 Feb 2017

👍4

The only way I got nested fields to work in Kibana was using "include_in_parent", please keep it

aarnaout on 27 Jun 2017

> The only way I got nested fields to work in Kibana was using "include_in_parent", please keep it

@aarnaout For this purpose, copy_to would work just as well

clintongormley on 29 Jun 2017

Hey there.

After reading the comments on this issue, I understand that copy_to is equivalent to include_in_parent and include_in_root. I think it is worth mentioning in the copy_to section of the Elasticsearch documentation.
The classic use case for nested types is when the root document has a varied (but limited) number of nested documents in its hierarchy (something like 0 to 25 nested documents, I guess). Therefore, running a terms query or exists query on a field copied to the root document from all nested documents might be much faster than running a nested query, and in my use case it really narrows down the documents being aggregated. I think it would be really nice to mention that in the "Document modeling" section of the "Tune for search speed" documentation, if you agree.
Actually, there is also a recent question about this issue in the Elastic forum.
I know that the bool query optimizes which clauses to evaluate first. My question is: is it smart enough to first run the conditions on the root document and only then run the conditions that involve a nested query? Did the Elasticsearch 5.4 nested improvements also take that into account?

IdanWo on 1 Jul 2017

> After reading the comments on this issue, I understand that copy_to is equivalent to include_in_parent and include_in_root. I think it is worth mentioning in the copy_to section of the Elasticsearch documentation.

The only difference is that include_in_* reuses the same field name, which makes a difference to Lucene, especially with Lucene 6 and earlier versions (which map to all Elasticsearch versions up to and including 5.x), since Lucene does not like sparse fields that have norms or doc values. However, things are getting better in the upcoming Lucene 7, so we might want to consider replacing include_in_* with a regular copy_to.

> Therefore, running a terms query or exists query on a field copied to the root document from all nested documents might be much faster than running a nested query [...] I think it would be really nice to mention that in the "Document modeling" section of the "Tune for search speed" documentation

+1!

> I know that the bool query optimizes which clauses to evaluate first. My question is: is it smart enough to first run the conditions on the root document and only then run the conditions that involve a nested query? Did the Elasticsearch 5.4 nested improvements also take that into account?

It used to do that before 5.4 already, but 5.4 improved things so that it also works well in the case that a script, phrase or range query occurs under the nested query.

jpountz on 3 Jul 2017

👍2
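
A sketch of the kind of query being discussed, written for 2.x or later (where bool accepts a `filter` clause). All names are hypothetical: it assumes a root-level field `comment_authors` that holds the authors of all nested `comments` (for example populated through copy_to), plus a numeric `comments.stars` field inside the nested objects.

```console
GET my_index/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "comment_authors": "alice" } },
        {
          "nested": {
            "path": "comments",
            "query": {
              "bool": {
                "must": [
                  { "term": { "comments.author": "alice" } },
                  { "range": { "comments.stars": { "gte": 4 } } }
                ]
              }
            }
          }
        }
      ]
    }
  }
}
```

The root-level term clause can only narrow the candidate set; the nested clause is what guarantees that the author and the star rating belong to the same nested object.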

I'm in favor of keeping include_in_root and include_in_parent. copy_to may be preferred over them in many use cases, but it does not replace their functionality in all cases.

1) If we have nested data types for which we want all fields to be included in the parent, it would be cumbersome to have to specify a copy_to for each and every field AND have to come up with a new fieldname for each of them.
2) If we have customers that query our data, we want them to be able to look at the source document and know how to query each field. With copy_to, they can't know what the new copy_to fieldname is by looking at the source document.
3) If we have multiple levels of nesting, we don't want to have to make a new fieldname to copy_to for each level above the original field. Simply using include_in_parent at each level would make every field available at each level above it.

LiuJoyceC on 2 Aug 2017

👍2
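
Point 3 might look like the following sketch, again in the 2.x-style mapping syntax used earlier in this thread. The field names (`comments`, `replies`, `author`) are invented for illustration.

```console
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "comments": {
          "type": "nested",
          "include_in_parent": true,
          "properties": {
            "author": { "type": "string" },
            "replies": {
              "type": "nested",
              "include_in_parent": true,
              "properties": {
                "author": { "type": "string" }
              }
            }
          }
        }
      }
    }
  }
}
```

As described in the comment above, each level's fields would then be visible at every level above it under their original names, so `comments.replies.author` could be queried on the root document without a nested query. A copy_to version would instead need a separately named target field for each level.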

> If we have nested data types for which we want all fields to be included in the parent, it would be cumbersome to have to specify a copy_to for each and every field AND have to come up with a new fieldname for each of them.

Actually this is one of the things that I am most interested in with this change. Today the fact that both parents and children sometimes share the same field names prevents us from applying some optimizations to queries.

jpountz on 3 Aug 2017
