The composite aggregation introduced in 6.x provides a way to scroll and paginate over terms buckets, and that's a good thing.
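To make "scroll and paginate" concrete, here is a minimal in-memory sketch of after-key pagination over a single terms source. It is not Elasticsearch code. The class and method names are hypothetical, each document is reduced to one string value, and Java's natural String order stands in for the index's term order. Each page returns buckets sorted by key and resumes strictly after the last key of the previous page, which is the role `after_key` plays in a real composite request.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory sketch of after-key pagination over ONE terms source (not Elasticsearch code).
class CompositePagingSketch {

    // Returns up to `size` buckets (key -> doc count) whose key sorts strictly after `afterKey`;
    // a null afterKey requests the first page.
    static LinkedHashMap<String, Integer> page(List<String> values, int size, String afterKey) {
        TreeMap<String, Integer> counts = new TreeMap<>(); // keys kept in ascending order
        for (String v : values) {
            counts.merge(v, 1, Integer::sum);
        }
        NavigableMap<String, Integer> remaining =
                afterKey == null ? counts : counts.tailMap(afterKey, false);
        LinkedHashMap<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : remaining.entrySet()) {
            if (result.size() == size) {
                break;
            }
            result.put(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> msisdns = List.of("33612", "33611", "33612", "33699", "33611", "33650");
        String after = null;
        while (true) {
            LinkedHashMap<String, Integer> p = page(msisdns, 2, after);
            if (p.isEmpty()) {
                break;
            }
            System.out.println(p);
            // The last key of this page acts as the after key of the next request.
            after = new ArrayList<>(p.keySet()).get(p.size() - 1);
        }
    }
}
```

The loop in `main` mirrors how a client would keep resending the request with the previous page's last key until an empty page comes back.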
However, it's not (for now?) possible to set a composite aggregation as a child of another bucket aggregation.
So it's impossible, for example, to composite-aggregate on a nested or a child entity. This request is not allowed:
{
  "metadatas" : {
    "nested" : {
      "path" : "metaDatas"
    },
    "aggs" : {
      "msisdn" : {
        "filter" : { "term": { "metaDatas.name": "msisdn" } },
        "aggs": {
          "msisdns": {
            "composite" : {
              "size": 5,
              "sources" : [{ "value": { "terms" : { "field": "metaDatas.value.raw" } } }]
            }
          }
        }
      }
    }
  }
}
I don't understand this limitation.
Furthermore, an ES 6.2 build patched to remove the check seems to work as expected:
@Override
protected AggregatorFactory<?> doBuild(SearchContext context, AggregatorFactory<?> parent,
                                       AggregatorFactories.Builder subfactoriesBuilder) throws IOException {
    // Parent check commented out in the patched 6.2 build:
    /*if (parent != null) {
        throw new IllegalArgumentException("[composite] aggregation cannot be used with a parent aggregation");
    }*/
    // ... rest of doBuild unchanged
}
Any insight?
The limitation was added because the use case was not clear but it is possible as long as all* sources in the composite are in the same nested context. I'll mark this issue with the feature label but as a low priority for now.
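One way to read the "same nested context" condition is as a validation over the composite's source fields. The sketch below is hypothetical and is not the check Elasticsearch implements. It approximates a field's nested context as the longest declared nested path that prefixes the field name, and accepts the composite only if every source field resolves to the path of the enclosing nested aggregation.

```java
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical validation sketch, not Elasticsearch's implementation.
class NestedContextCheck {

    // Innermost nested path containing `field`, approximated as the longest declared
    // nested path that prefixes the field name; null means the root document.
    static String innermostNestedPath(String field, Set<String> nestedPaths) {
        String best = null;
        for (String path : nestedPaths) {
            if (field.startsWith(path + ".") && (best == null || path.length() > best.length())) {
                best = path;
            }
        }
        return best;
    }

    // True if every composite source field lives in `context`
    // (the path of the enclosing nested aggregation).
    static boolean allSourcesInContext(String context, List<String> sourceFields, Set<String> nestedPaths) {
        for (String field : sourceFields) {
            if (!Objects.equals(innermostNestedPath(field, nestedPaths), context)) {
                return false;
            }
        }
        return true;
    }
}
```

With nested paths {metaDatas}, a composite under the metaDatas nested aggregation whose only source is metaDatas.value.raw would pass, while adding a root-level field (a hypothetical msisdn field, say) would fail.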
@elastic/es-search-aggs
This feature is really useful, since nested fields are also not allowed in the sources of a composite.
Without this feature, we just can't use composite with nested documents and must use deprecated terms aggregations with size = Infinity.
I also feel it should be possible to use composite aggregations on nested objects. I have come across a situation where using sources on a nested object is the only option, and sources do not support nested types either. I will have to look for another workaround, maybe using multiple queries to get the desired result.
I find this feature also very helpful to get all values of a specific field in a nested object, grouped by an identifier.
@jimczi Can you possibly provide an example of how to write the composite aggregation when "all* sources in the composite are in the same nested context"? I'm trying to do this, but so far have not been successful.
@justinmcp88 this is not possible currently, which is why this issue is still open. I mentioned that all fields in the composite must be in the same nested context as a way to implement the feature described in this issue.
The nested aggregation is only one of the many bucket aggregations that have this limitation. Fixing the nested aggregation scenario (making it work with inner composite aggregations) only solves part of the issue. Using a composite aggregation as a child of a filter aggregation is another very common scenario (for me, at least), and it's still forbidden after #37178.
Can we reopen this? Are all of the other bucket aggregations gonna be considered a hard limitation and be left out of the game? Did I miss some other issue that is responsible for the general scenario?
+1 for the request. It would be extremely helpful to have this capability backported to 6.3 as well, so there's an efficient way to paginate over terms buckets.
@jimczi Sorry to ping you, but I fear this could go unnoticed for 7, which would be very sad.
Composite aggregations have the incredible power to make Elasticsearch amazing at aggregation-level analytics, which is something people (me included) currently have to work around to be productive. Setting "size": Long.MAX_VALUE on terms aggregations to paginate at the application level is terrible, building intermediate rollup/derived types/indexes is not as intuitive as so many other Elasticsearch solutions, and more elaborate solutions are not maintenance-friendly.
Do we still/{at least} have plans to support all bucket aggregations as parents of composite aggregations?
No, and we never had such a plan ;) We added support for the nested aggregation because otherwise it is impossible to paginate over nested fields, but I don't see why other bucket aggregations would be useful as parents. The composite aggregation must be the root aggregation to allow pagination; that's the design. Can you explain why you'd need to use the composite as a sub-aggregation (other than switching to a nested context)?
Using a composite aggregation as a child of a filter aggregation is another very common scenario (for me, at least), and it's still forbidden after #37178.
Can you move the filter aggregation to the query? It should be equivalent. There is also a PR open to support a nested/filter combo, but for the main context the query should be preferred.
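The suggested equivalence can be illustrated with a small in-memory sketch, assuming documents are plain field-to-value maps and queries/filters are Java predicates (all names hypothetical, not Elasticsearch APIs). A root-level filter bucket only sees documents that already match the query Q, so filtering by F inside the aggregation selects the same documents as running the query Q AND F.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical in-memory model: a document is a field -> value map, a query or filter is a predicate.
class FilterVsQuerySketch {

    // terms-style counts of `field` over the given documents, sorted by key.
    static Map<String, Long> termCounts(List<Map<String, String>> docs, String field) {
        return docs.stream()
                .filter(d -> d.containsKey(field))
                .collect(Collectors.groupingBy(d -> d.get(field), TreeMap::new, Collectors.counting()));
    }

    // Query q, then a filter aggregation f, then a terms aggregation on `field`.
    static Map<String, Long> viaFilterAggregation(List<Map<String, String>> docs,
            Predicate<Map<String, String>> q, Predicate<Map<String, String>> f, String field) {
        List<Map<String, String>> hits = docs.stream().filter(q).collect(Collectors.toList());
        List<Map<String, String>> bucket = hits.stream().filter(f).collect(Collectors.toList());
        return termCounts(bucket, field);
    }

    // The same terms aggregation at the root, with f folded into the query (q AND f).
    static Map<String, Long> viaQuery(List<Map<String, String>> docs,
            Predicate<Map<String, String>> q, Predicate<Map<String, String>> f, String field) {
        List<Map<String, String>> hits = docs.stream().filter(q.and(f)).collect(Collectors.toList());
        return termCounts(hits, field);
    }
}
```

The equivalence only holds when every aggregation in the request wants the same subset of documents; sibling aggregations that need different subsets break it.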
I didn't mean to imply there was such a plan, but the original topic seemed broad enough to cover them all, my bad. But I digress.
Can you explain why you'd need to use the composite as a sub-aggregation (other than switching to a nested context)?
Sure, I'll try.
Many of our queries target several indexes at the same time, so that we can do "joins" in memory on the application side without requiring multiple steps/queries (which would add a lot of complexity on our side).
So let's say I have indexes iA and iB, both already following the 7.0 mindset of index==type. Those indexes/types have many fields in common for denormalization purposes. They relate to each other in some way (think foreign keys), and we need extra data from one another when in a bigger context.
In a sample query, we could be doing something like:
GET /iA,iB/_search
{
  "size": 0,
  "query": {
    "bool": {
      "should": [
        {
          "bool": {
            "filter": [
              { "type": { "value": "iA" } },
              /* some filter on iA */
            ]
          }
        },
        {
          "bool": {
            "filter": [
              { "type": { "value": "iB" } },
              /* some filter on iB */
            ]
          }
        }
      ]
    },
    /* there is some application logic to apply common filters here to all indexes involved */
  },
  "aggs": {
    "filtered_iA": {
      "filter": { "type": { "value": "iA" } },
      "aggs": {
        "and_then_grouped": {
          "terms": { /*========= this is the aggregation we wanted to paginate ==========*/
            "field": "ia_some_field",
            "size": 9999
          },
          "aggs": {
            /* a bunch of metric aggregations, sometimes top_hits too */
          }
        }
      }
    },
    "filtered_iB": {
      "filter": { "type": { "value": "iB" } },
      "aggs": {
        "raw": {
          "terms": {
            "field": "_id",
            /* sometimes this iB is small enough for us to not even try to paginate it */
            /* so we bring it all, since we can't join easily on elasticsearch */
            "size": 1000
          },
          "aggs": {
            /* top_hits to project the fields we need to augment iA */
          }
        }
      }
    }
  }
}
Can you move the filter aggregation to the query? It should be equivalent.
With our current approach, we cannot. It would break the aggregation because multiple indexes can be involved in the query, but "this" aggregation only wants to deal with a subset of them, hence the filtering at aggregation level.
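The objection can be checked with a toy model. Assumptions: each document is represented only by the name of its index, and the shared bool/should query is replaced by a predicate that accepts everything (all names hypothetical). Folding the iA filter into the query keeps the iA bucket intact but empties the iB bucket, which is why the filter has to stay at aggregation level in this setup.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical toy model: each document is represented only by its index name.
class SharedQuerySketch {

    // Size of a filter-aggregation bucket that keeps only documents from `index`,
    // evaluated over the documents matched by `query`.
    static long bucketSize(List<String> docs, Predicate<String> query, String index) {
        return docs.stream().filter(query).filter(index::equals).count();
    }

    public static void main(String[] args) {
        List<String> docs = List.of("iA", "iA", "iB", "iA", "iB");
        Predicate<String> sharedQuery = d -> true;                   // stands in for the bool/should query
        Predicate<String> narrowed = sharedQuery.and("iA"::equals);  // iA filter moved into the query
        System.out.println("shared query:   iA=" + bucketSize(docs, sharedQuery, "iA")
                + " iB=" + bucketSize(docs, sharedQuery, "iB"));
        System.out.println("narrowed query: iA=" + bucketSize(docs, narrowed, "iA")
                + " iB=" + bucketSize(docs, narrowed, "iB"));
    }
}
```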
I was trying to "defend" the feature for all other bucket aggregations, but truth is I would be 99% happy if the filter aggregation was added to the "allowed parents" list for the composite aggregation.
Another example where composite would be useful is sorting by string fields (the terms aggregation only supports sorting on numeric fields or single-bucket numeric sub-aggregations).
Example mapping describing a detection:
{
  "id": {
    "type": "long"
  },
  "time": {
    "type": "date"
  },
  "objectId": {
    "type": "long"
  },
  "objectName": {
    "type": "text",
    "fields": {
      "key": {
        "type": "keyword"
      }
    }
  }
}
Example request: create an auto_date_histogram aggregation on the detection time. For each bucket, return 10 unique objects, each with its number of detections, sorted by objectName.
Translated to composite sub-aggregation it would look something like this:
{
  "size": 0,
  "aggs": {
    "ranges": {
      "auto_date_histogram": {
        "field": "time",
        "buckets": 10
      },
      "aggregations": {
        "objects": {
          "composite": {
            "size": 10,
            "sources": [
              {
                "_sortBy": {
                  "terms": {
                    "field": "objectName.key",
                    "order": "desc"
                  }
                }
              },
              {
                "_groupBy": {
                  "terms": {
                    "field": "objectId"
                  }
                }
              }
            ]
          },
          "aggregations": {
            "count": {
              "value_count": {
                "field": "objectId"
              }
            }
          }
        }
      }
    }
  }
}
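For reference, the sketch below spells out what this composite sub-aggregation is meant to compute inside one auto_date_histogram bucket. It is an in-memory Java model, not Elasticsearch code, and the class and field names are hypothetical. Keys are ordered by the sources in declaration order (objectName descending, then objectId ascending). The first 10 keys are kept and detections are counted per key, mirroring the value_count sub-aggregation. Java's natural String order stands in for the keyword sort order.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of the composite sub-aggregation for ONE histogram bucket.
class CompositeOrderSketch {

    static final class Detection {
        final String objectName;   // stands in for objectName.key
        final long objectId;

        Detection(String objectName, long objectId) {
            this.objectName = objectName;
            this.objectId = objectId;
        }
    }

    // Composite keys rendered as "objectName|objectId" -> number of detections, in composite order.
    static LinkedHashMap<String, Long> firstPage(List<Detection> bucketDocs, int size) {
        // Source order: objectName descending ("order": "desc"), then objectId ascending (default).
        Comparator<Detection> order = Comparator
                .comparing((Detection d) -> d.objectName, Comparator.reverseOrder())
                .thenComparingLong(d -> d.objectId);
        // Detections with the same (objectName, objectId) compare equal and share one entry.
        TreeMap<Detection, Long> counts = new TreeMap<>(order);
        for (Detection d : bucketDocs) {
            counts.merge(d, 1L, Long::sum);
        }
        LinkedHashMap<String, Long> result = new LinkedHashMap<>();
        for (Map.Entry<Detection, Long> e : counts.entrySet()) {
            if (result.size() == size) {
                break;
            }
            result.put(e.getKey().objectName + "|" + e.getKey().objectId, e.getValue());
        }
        return result;
    }
}
```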
What is the reason for closing this?
+1 for making composite aggregations usable as a child of a filter aggregation. To show a real-world use case where this would help, let me explain ours:
We want to use composite aggregations to build hierarchical faceted search. Here is a screenshot of the final product we want:
Each document in our index has a parent "category" (e.g. "Men's Shoes") and contains several SKUs, each with its own size ("12", "13", etc.).
Now, we might be able to achieve a similar effect by using multiple terms sub-aggregations to simulate parent/child hierarchies. But we wanted these hierarchies to be extensible (i.e. deeper than a single parent-child level); an example of this can be seen here:
While recursively generating requests to Elasticsearch and parsing their responses is fun, it makes debugging a nightmare when you end up with a large list of configurable aggregations of varying depth. Composite aggregations gave us a very easy (and flat) way to express the keys of our aggregation, get our hierarchy, and parse it into a response that a UI can render by knowing where each value sits in the hierarchy array.
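The flat-to-hierarchy step can be sketched as follows. Assumption: each composite bucket key is given as the list of its source values in order (for example [category, size]) together with the bucket's doc count. The names are hypothetical. A value's position in the key is its depth in the tree. Parent counts here are summed from the leaf buckets, so a document that lands in several leaves (a product with SKUs in several sizes, for instance) is counted more than once at the parent level. Exact parent counts would need to come from separate parent-level buckets.

```java
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch: rebuild a facet tree from flat composite buckets.
class HierarchySketch {

    static final class Node {
        long count;                                          // summed doc_count of the buckets below
        final TreeMap<String, Node> children = new TreeMap<>();
    }

    // keys.get(i) holds the source values of bucket i in source order, e.g. [category, size];
    // docCounts.get(i) is that bucket's doc_count.
    static Node build(List<List<String>> keys, List<Long> docCounts) {
        Node root = new Node();
        for (int i = 0; i < keys.size(); i++) {
            long n = docCounts.get(i);
            Node current = root;
            current.count += n;
            for (String value : keys.get(i)) {   // position in the key = depth in the tree
                current = current.children.computeIfAbsent(value, v -> new Node());
                current.count += n;
            }
        }
        return root;
    }
}
```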
We built a quick proof of concept that renders these hierarchies using composite aggregations, and recently started working on the faceted part (where filters applied from an aggregation don't impact the aggregation itself). This is usually built using global aggregations, and then filtering on any filters that are not filtering on that aggregation's fields:
{
  "aggs" : {
    "brand_global_and_filtered" : {
      "global": {},
      "aggs" : {
        "brand_filtered" : {
          "filter" : { ... }, // ... filters that DON'T filter on the brand field
          "aggs" : {
            "brand" : { "terms" : { "field" : "brand" } }
          }
        }
      }
    }
  }
}
However, you cannot have a composite aggregation be the child of a filter aggregation and we're now having to reconsider our approach.
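The faceting rule behind this request, independent of whether it is expressed with terms or composite aggregations, can be sketched in memory as follows. Assumptions: documents are field-to-value maps, each facet has at most one selected value, and the names are hypothetical. Counts for a facet are computed over all documents (the role of the global aggregation), restricted by every active selection except the one on the facet's own field (the role of the filter aggregation).

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical in-memory sketch of facet counts that ignore the facet's own selection.
class FacetSketch {

    static Map<String, Long> facetCounts(List<Map<String, String>> docs,
                                         Map<String, String> activeFilters,
                                         String facetField) {
        return docs.stream()
                // Apply every selected filter that is NOT on the facet's own field.
                .filter(doc -> activeFilters.entrySet().stream()
                        .filter(f -> !f.getKey().equals(facetField))
                        .allMatch(f -> f.getValue().equals(doc.get(f.getKey()))))
                .filter(doc -> doc.containsKey(facetField))
                .collect(Collectors.groupingBy(doc -> doc.get(facetField), TreeMap::new, Collectors.counting()));
    }
}
```

With brand=nike and size=12 selected, the brand facet is counted under size=12 only, so other brands remain visible as alternatives, while the size facet is counted under brand=nike only.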