Elasticsearch version (bin/elasticsearch --version): 6.3.0 official docker image
Plugins installed: []
JVM version (java -version): 10.0.1
OS version (uname -a if on a Unix-like system):
Linux 389f11186e5b 4.9.93-linuxkit-aufs #1 SMP Wed Jun 6 16:55:56 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Description of the problem including expected versus actual behavior:
I haven't seen a GitHub issue posted for this yet, so I'm opening one to get the ball rolling, since my team has encountered the problem as well.
Pipeline aggregations do not recognize composite aggregations as multi-bucket aggregations, so any buckets_path that starts at a composite aggregation is rejected. However, the composite aggregation is a multi-bucket aggregation, so this should work.
Steps to reproduce:
PUT _template/template_default
{
"mappings": {
"_doc": {
"_all": {
"enabled": false
},
"dynamic": "strict",
"properties": {
"itemId": {
"type": "keyword",
"norms": false
},
"inputQty": {
"type": "integer",
"index": false
},
"orderQty": {
"type": "integer",
"index": false
},
"centerId": {
"type": "keyword",
"eager_global_ordinals": true,
"norms": false
},
"submittedQty": {
"type": "integer",
"index": false
},
"confirmedQty": {
"type": "integer",
"index": false
}
}
}
}
}
POST items-0*/_search?ignore_unavailable=true
{
"size": 0,
"track_total_hits": false,
"aggs" : {
"myBuckets" : {
"composite" : {
"size" : 100000,
"sources" : [
{ "center_name" : { "terms" : { "field" : "centerId"} } }
]
},
"aggs" : {
"requested_units" : { "sum": { "field" : "inputQty" } },
"approved_units" : { "sum": { "field" : "orderQty" } },
"submitted_quantity" : { "sum" : { "field" : "submittedQty"} },
"confirmed_quantity" : { "sum" : { "field" : "confirmedQty"} }
}
},
"check_pipeline_agg": {
"sum_bucket": {
"buckets_path": "fc_buckets>requested_units"
}
}
}
}
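For comparison, the same sibling pipeline is accepted when the first element of the buckets_path is a terms aggregation rather than a composite one. This is only an illustrative sketch: the by_center name and the terms size are my own choices, but the fields come from the template above.
POST items-0*/_search?ignore_unavailable=true
{
  "size": 0,
  "track_total_hits": false,
  "aggs": {
    "by_center": {
      "terms": { "field": "centerId", "size": 1000 },
      "aggs": {
        "requested_units": { "sum": { "field": "inputQty" } }
      }
    },
    "check_pipeline_agg": {
      "sum_bucket": {
        "buckets_path": "by_center>requested_units"
      }
    }
  }
}
The only difference is the aggregation the buckets_path points at, which is what makes the rejection of the composite variant surprising.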
Provide logs (if relevant):
The error that comes back will be similar to:
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "The first aggregation in buckets_path must be a multi-bucket aggregation for aggregation [avg_cardinality] found :org.elasticsearch.search.aggregations.bucket.composite.CompositeAggregationBuilder for buckets path: composite_buckets>cardinality_some_field"
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "my_index",
"node": "1_8dwXRuT565uQg11iZ_SA",
"reason": {
"type": "illegal_argument_exception",
"reason": "The first aggregation in buckets_path must be a multi-bucket aggregation for aggregation [avg_cardinality] found :org.elasticsearch.search.aggregations.bucket.composite.CompositeAggregationBuilder for buckets path: composite_buckets>cardinality_some_field"
}
}
],
"caused_by": {
"type": "illegal_argument_exception",
"reason": "The first aggregation in buckets_path must be a multi-bucket aggregation for aggregation [avg_cardinality] found :org.elasticsearch.search.aggregations.bucket.composite.CompositeAggregationBuilder for buckets path: composite_buckets>cardinality_some_field",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "The first aggregation in buckets_path must be a multi-bucket aggregation for aggregation [avg_cardinality] found :org.elasticsearch.search.aggregations.bucket.composite.CompositeAggregationBuilder for buckets path: composite_buckets>cardinality_some_field"
}
}
},
"status": 400
}
Observing the same error in v6.4.3.
Without the ability to use pipeline aggregations, the composite aggregation loses a lot of flexibility.
I'm willing to work on this missing feature. Looking forward to any hints about the "marker" that is missing as mentioned in the forum link.
There is an error (or typo) in the reproduction query above: the buckets_path refers to fc_buckets, but the composite aggregation is named myBuckets. The pipeline aggregation should be:
"check_pipeline_agg": {
  "sum_bucket": {
    "buckets_path": "myBuckets>requested_units"
  }
}
@jimczi Any updates on this?
+1
Also having the same issue. Any update on this?
I'm looking for a way to produce a report of all customers whose sum of sales is larger than some threshold X.
I'm afraid a regular aggregation would probably not scale to the number of customers I need to return.
The composite aggregation seems to be the right tool, but for now I have to fetch all of the per-customer buckets first and then filter on the sum at the application level. That isn't terrible, but the extra traffic can be wasteful.
So I'm very interested in combining these two great abilities.
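For concreteness, the query this use case calls for would look roughly like the sketch below, with a bucket_selector dropping buckets under the threshold. The index pattern sales-*, the fields customerId and saleAmount, and the threshold of 10000 are made-up placeholders, and on 6.x the request fails with the same error shown earlier.
POST sales-*/_search
{
  "size": 0,
  "aggs": {
    "by_customer": {
      "composite": {
        "size": 1000,
        "sources": [
          { "customer": { "terms": { "field": "customerId" } } }
        ]
      },
      "aggs": {
        "total_sales": { "sum": { "field": "saleAmount" } },
        "big_customers_only": {
          "bucket_selector": {
            "buckets_path": { "total": "total_sales" },
            "script": "params.total > 10000"
          }
        }
      }
    }
  }
}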
While the fix would be simple, it might be misleading for users, since pipeline aggregations are applied to the final buckets. One thing I'd like to understand is why this would be needed, since the idea of the composite aggregation is to paginate over the buckets instead of returning them all in a single response. Applying a bucket pipeline aggregation to a single page of results is not very helpful if the goal is to get the total sum across all the buckets.
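To make the pagination point concrete, a follow-up page of the reproduction query is requested with the after key returned by the previous response, roughly as sketched below (the after value center_042 is a made-up placeholder). Any sibling pipeline computed on such a request would only see the buckets of that one page, not the whole key space.
POST items-0*/_search?ignore_unavailable=true
{
  "size": 0,
  "aggs": {
    "myBuckets": {
      "composite": {
        "size": 100,
        "sources": [
          { "center_name": { "terms": { "field": "centerId" } } }
        ],
        "after": { "center_name": "center_042" }
      },
      "aggs": {
        "requested_units": { "sum": { "field": "inputQty" } }
      }
    }
  }
}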
Forgot to update this ticket. We discussed this in a team meeting. We think it might be possible to support a subset of the pipeline aggs, namely the ones that are "self-contained", like bucket_script: the pipeline agg only enriches an existing bucket and doesn't rely on data from any other buckets. This would be safe and compatible with the composite agg.
Other pipelines, like derivative, are not safe, because there is no guarantee that the page has all the required data.
I'm not entirely sure how we'd go about implementing this -- some kind of new marker interface? Or whether it's worth the effort for such limited functionality.
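As an illustration of the "self-contained" case, the sketch below enriches each composite bucket with a per-bucket ratio via bucket_script. The approval_ratio name is my own, the fields come from the reproduction template, and this still fails on 6.x since no pipeline is accepted under a composite yet.
POST items-0*/_search?ignore_unavailable=true
{
  "size": 0,
  "aggs": {
    "myBuckets": {
      "composite": {
        "size": 1000,
        "sources": [
          { "center_name": { "terms": { "field": "centerId" } } }
        ]
      },
      "aggs": {
        "requested_units": { "sum": { "field": "inputQty" } },
        "approved_units": { "sum": { "field": "orderQty" } },
        "approval_ratio": {
          "bucket_script": {
            "buckets_path": {
              "requested": "requested_units",
              "approved": "approved_units"
            },
            "script": "params.approved / params.requested"
          }
        }
      }
    }
  }
}
Because the script only reads sums that live inside the same bucket, the result would be identical however the buckets are split across pages, which is what makes this subset safe.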