Elasticsearch: Unable to sort by result of bucket script aggregation

Created on 18 Jul 2018  路  4Comments  路  Source: elastic/elasticsearch

I get result by using bucket script aggregation, but I can't sort by this aggregation value. For explame,

{
  "from": 0,
  "size": 0,
  "sort": [],
  "aggs": {
    "api_terms": {
      "terms": {
        "field": "name",
        "order": {
          "avg_time": "desc"
        }
      },
      "aggs": {
        "sum_duration": {
          "sum": {
            "field": "duration"
          }
        },
        "sum_count": {
          "sum": {
            "field": "count"
          }
        },
        "avg_time": {
          "bucket_script": {
            "buckets_path": {
              "duration": "sum_duration",
              "count": "sum_count"
            },
            "script": "params.duration / params.count"
          }
        }
      }
    }
  }
}

I hope sort by "avg_time" that is calculated by bucket_script, so I add order in term aggregation ( "order": {"avg_time": "desc"}). But it cause error. This reason of error is 鈥淚nvalid aggregator order path [avg_time]. Unknown aggregation [avg_time]".
Even more puzzling is I use add order by sum_count instead of avg_time, i can get correct value.

:AnalyticAggregations

Most helpful comment

@jtibshirani is correct, you can't sort by pipeline aggs... they are executed after regular aggs execute.

If the terms agg contains all the entries you care about, you can still do pipeline sorting using the bucket_sort pipeline agg: https://www.elastic.co/guide/en/elasticsearch/reference/6.x/search-aggregations-pipeline-bucket-sort-aggregation.html

The caveat is that this performs sorting on the final list of buckets, not while the aggregations are calculating. So it only sorts the list that is returned by the terms agg... if a term/value isn't in the list, it won't get sorted. That's in contrast to sorting on the terms agg itself, which changes the contents of the list.

It'd look something like this (untested):

{
  "from": 0,
  "size": 0,
  "sort": [],
  "aggs": {
    "api_terms": {
      "terms": {
        "field": "name",
        "order": {
          "avg_time": "desc"
        }
      },
      "aggs": {
        "sum_duration": {
          "sum": {
            "field": "duration"
          }
        },
        "sum_count": {
          "sum": {
            "field": "count"
          }
        },
        "avg_time": {
          "bucket_script": {
            "buckets_path": {
              "duration": "sum_duration",
              "count": "sum_count"
            },
            "script": "params.duration / params.count"
          }
        },
        "final_sort": {
          "bucket_sort": {
             "sort": [
               {"avg_time": {"order": "desc"}}
              ]
           }
        }
      }
    }
  }
}




All 4 comments

Pinging @elastic/es-search-aggs

Hi @ydzhu, in a terms aggregation it's unfortunately not possible to order by the result of a pipeline aggregation like bucket_script (see the note here: https://www.elastic.co/guide/en/elasticsearch/reference/6.3/search-aggregations-bucket-terms-aggregation.html#search-aggregations-bucket-terms-aggregation-order).

It's not good that we're throwing a 503, and the error message is also misleading. We're tracking the error code issue in #20003.

I don't see an easy workaround for your request, but tagging @polyfractal just in case he has an idea.

@jtibshirani is correct, you can't sort by pipeline aggs... they are executed after regular aggs execute.

If the terms agg contains all the entries you care about, you can still do pipeline sorting using the bucket_sort pipeline agg: https://www.elastic.co/guide/en/elasticsearch/reference/6.x/search-aggregations-pipeline-bucket-sort-aggregation.html

The caveat is that this performs sorting on the final list of buckets, not while the aggregations are calculating. So it only sorts the list that is returned by the terms agg... if a term/value isn't in the list, it won't get sorted. That's in contrast to sorting on the terms agg itself, which changes the contents of the list.

It'd look something like this (untested):

{
  "from": 0,
  "size": 0,
  "sort": [],
  "aggs": {
    "api_terms": {
      "terms": {
        "field": "name",
        "order": {
          "avg_time": "desc"
        }
      },
      "aggs": {
        "sum_duration": {
          "sum": {
            "field": "duration"
          }
        },
        "sum_count": {
          "sum": {
            "field": "count"
          }
        },
        "avg_time": {
          "bucket_script": {
            "buckets_path": {
              "duration": "sum_duration",
              "count": "sum_count"
            },
            "script": "params.duration / params.count"
          }
        },
        "final_sort": {
          "bucket_sort": {
             "sort": [
               {"avg_time": {"order": "desc"}}
              ]
           }
        }
      }
    }
  }
}




I opened #32522 in hopes of clarifying the error message. I'll close this out once that goes in.

Was this page helpful?
0 / 5 - 0 ratings