Does any purpose remain for the include_in_root and include_in_parent settings for nested fields, now that we have nested and reverse_nested aggregations, and inner hits with highlighting?
If not, we should remove them.
These options also make it possible to run simple (fast) queries on the parent instead of slower nested queries. We could potentially remove these options, but then we would be relying on users to duplicate information on the parent/root document explicitly at index time.
Actually, maybe they are not necessary and users should just do it explicitly with copy_to instead of implicitly with include_in_*.
2.0 gives the following exception on trying to put an existing mapping that worked with 1.7.x:
org.elasticsearch.index.mapper.MapperParsingException: Mapping definition for [title] has unsupported parameters: [include_in_root : true]
and also fails to install the mapping.
So maybe this is already merged in 2.0.0 and can be closed? It would also be helpful to mention this, along with the recommended alternative approach (copy_to) and an example, in the documentation.
The migration tool did not catch this either.
@adichad I think you have the parameter in the wrong place, these params still work in 2.0:
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "foo": {
          "type": "nested",
          "include_in_root": true,
          "properties": {
            "bar": {
              "type": "string"
            }
          }
        }
      }
    }
  }
}
I really like the idea of removing include_in_root|parent in favour of the more explicit copy_to per field. This way there is no magic, and you only (re-)index the fields that you really want at the top level.
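For illustration, a minimal sketch of what the copy_to variant of the mapping above might look like, usable as the body of PUT my_index. The root-level field name foo_bar_all is a made-up name for this example, and the 2.x-style string type simply follows the earlier snippet; treat it as a sketch under those assumptions rather than a recommended mapping.

```json
{
  "mappings": {
    "my_type": {
      "properties": {
        "foo": {
          "type": "nested",
          "properties": {
            "bar": {
              "type": "string",
              "copy_to": "foo_bar_all"
            }
          }
        },
        "foo_bar_all": {
          "type": "string"
        }
      }
    }
  }
}
```

Only foo.bar is copied to the root document here; any other fields under foo would stay in the nested documents only.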
I was thinking about it more. One downside of explicit copy_to compared to include_in_all is that it would force users to use a different field name, which is worse in terms of index sparsity.
@clintongormley @jpountz
For tracking purposes, it may be worth noting that we sometimes recommend using "include_in_parent" when someone wants to summarize data stored in nested docs within Kibana using aggregations, for instance, using the Terms aggregation to get Top N values across a field in a nested doc: https://github.com/elastic/kibana/issues/1084#issuecomment-112162113
This approach was also used to develop the Watch History dashboard, since it was based on a nested document structure: https://www.elastic.co/guide/en/watcher/current/watch-history.html
Hopefully the "copy to" functionality being discussed will be an equivalent alternative for that use case. It sounds like it will be, with the added benefit that you only have to reindex the affected fields rather than all of the data. Is that correct?
cc: @rashidkpc @skearns64 @eskibars
The copy_to functionality already exists and behaves as per your comment.
I vote in favor of keeping include_in_parent and include_in_root. For complex use cases, nested queries with inner_hits are suited much better, but I think there are use cases (like aggregations, see the Kibana examples) where it is much more convenient to grab the data from the root node.
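As a rough sketch of the convenience being described, assume the nested field foo from the earlier mapping has include_in_root (or include_in_parent) enabled. A plain terms aggregation, sent as the body of GET my_index/_search, can then read foo.bar directly from the root documents. The aggregation name top_bars is arbitrary.

```json
{
  "size": 0,
  "aggs": {
    "top_bars": {
      "terms": { "field": "foo.bar" }
    }
  }
}
```

Without the flattened copy, the same question would have to be wrapped in a nested aggregation, which is the form Kibana could not build at the time according to the comments above. The wrapper name foo_docs is also arbitrary.

```json
{
  "size": 0,
  "aggs": {
    "foo_docs": {
      "nested": { "path": "foo" },
      "aggs": {
        "top_bars": {
          "terms": { "field": "foo.bar" }
        }
      }
    }
  }
}
```

Note that the two are not strictly equivalent: the first counts root documents per term, while the second counts nested documents.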
I think what Clint suggested would still maintain the functionality of include_in_parent/root, but it would be moved to copy_to. I like this simplification of the API, even if the underlying complexity is still there.
I have not been able to drop-in-replace 'include_in_parent' with 'copy_to' with nested objects (maintaining the same fieldname in 'parent' as that of the nested object). This may be because I have not found detailed documentation that describes the copy_to value args, but if it is not possible I would request not to remove a feature for which there is no full replacement.
@sronsiek yeah you wouldn't be able to use the same field name in the parent with copy_to. Why is that important? What's your use case?
I dunno about that guy, but our use case is "a not very thoughtful person set include_in_parent and include_in_root on _all_ of our (many) mappings, this is now routinely used across the codebase, and we also aggregate on the nested docs". So changing the field type is not an option and changing the queries is not feasible.
Looks like include_in_parent and include_in_root are undocumented settings starting from v2.0. Should we add them back to the documentation? Is it a deliberate decision to not document them?
I used to see value in include_in_root and include_in_parent due to the fact that they would reuse the same underlying field instead of creating a new sparse field. However, due to upcoming improvements with sparse doc values in Lucene 7 and recent improvements to nested queries when include_in_root and include_in_parent are disabled (#23079), I am leaning towards deprecating include_in_root and include_in_parent in favour of copy_to.
The only way I got nested fields to work in Kibana was using "include_in_parent", please keep it
@aarnaout For this purpose, copy_to would work just as well.
Hey there.
After reading the comments on this issue, I understand that copy_to is equivalent to include_in_parent and include_in_root. I think it is worth mentioning in the copy_to section of the Elasticsearch documentation.
The only difference is that include_in_* reuses the same field name, which makes a difference to Lucene. This matters especially with Lucene 6 and earlier versions (which map to all Elasticsearch versions up to and including 5.x), since Lucene does not like sparse fields that have norms or doc values. However, things are getting better in the upcoming Lucene 7, so we might want to consider replacing include_in_* with a regular copy_to.
Therefore, running a terms query or exists query on a field copied to the root document from all nested documents might be much faster than running a nested query [...] I think it would be really nice to mention that in the "Document modeling" section of the "Tune for search speed" documentation
+1 !
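To make the comparison concrete, here is a sketch of the two query shapes, each usable as the body of GET my_index/_search. It assumes a root-level field, hypothetically named foo_bar_all, that is populated via copy_to from the nested field foo.bar. The first form queries the root document directly:

```json
{
  "query": {
    "exists": { "field": "foo_bar_all" }
  }
}
```

The second form asks roughly the same question through a nested query against the nested documents themselves:

```json
{
  "query": {
    "nested": {
      "path": "foo",
      "query": {
        "exists": { "field": "foo.bar" }
      }
    }
  }
}
```

Whether the first form is actually faster for a given index is something to measure rather than assume.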
I know that the bool query optimizes the order in which its clauses are evaluated. My question is: is it smart enough to first run the conditions on the root document and only then run the conditions that involve a nested query? Did the Elasticsearch 5.4 nested improvements also take that into account?
It used to do that before 5.4 already, but 5.4 improved things so that it also works well in the case that a script, phrase or range query occurs under the nested query.
I'm in favor of keeping include_in_root and include_in_parent. copy_to may be preferred over them in many use cases, but it does not replace their functionality in all cases.
1) If we have nested data types for which we want all fields to be included in the parent, it would be cumbersome to have to specify a copy_to for each and every field AND have to come up with a new fieldname for each of them.
2) If we have customers that query our data, we want them to be able to look at the source document and know how to query each field. With copy_to, they can't know what the new copy_to fieldname is by looking at the source document.
3) If we have multiple levels of nesting, we don't want to have to make a new fieldname to copy_to for each level above the original field. Simply using include_in_parent at each level would make every field available at each level above it.
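A sketch of the multi-level case from point 3, as the body of PUT my_index. The inner nested field baz is invented for this example; the rest follows the earlier 2.x-style mapping. With include_in_parent set at both levels, the expectation described above is that foo.baz.bar becomes available in the foo nested documents and in the root document under its original name, without declaring any extra fields.

```json
{
  "mappings": {
    "my_type": {
      "properties": {
        "foo": {
          "type": "nested",
          "include_in_parent": true,
          "properties": {
            "baz": {
              "type": "nested",
              "include_in_parent": true,
              "properties": {
                "bar": {
                  "type": "string"
                }
              }
            }
          }
        }
      }
    }
  }
}
```

The copy_to equivalent would presumably need one separately named target field per level, which is the overhead this comment objects to.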
If we have nested data types for which we want all fields to be included in the parent, it would be cumbersome to have to specify a copy_to for each and every field AND have to come up with a new fieldname for each of them.
Actually this is one of the things that I am most interested in with this change. Today the fact that both parents and children sometimes share the same field names prevents us from applying some optimizations to queries.
I am in favor of keeping include_in_parent (why is it removed from documentation?). copy_to requires new field names, in which case I might just as well ignore it and simply make my own scheme and copy the data before indexing. Also, include_in_* settings are useful for simple search across the whole document, something that nested apparently can't do on its own (a simple string query will not search nested fields even if no field is specified...)