Elasticsearch version (bin/elasticsearch --version): 6.0.0-rc1
JVM version (java -version):
java version "1.8.0_151"
Java(TM) SE Runtime Environment (build 1.8.0_151-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.151-b12, mixed mode)
OS version (uname -a if on a Unix-like system):
Linux elasticsearch-data-hot-003 4.11.0-1013-azure #13-Ubuntu SMP Mon Oct 2 17:59:06 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Description of the problem including expected versus actual behavior:
When running a bucket_script aggregation that depends on a cumulative_sum aggregation over another sum aggregation, if the sum aggregation has no value for a bucket (because there are no docs in that time-interval bucket), the bucket_script aggregation also returns no value for that bucket, instead of using the cumulative_sum value gathered so far.
Steps to reproduce:
{
  "@timestamp": "",
  "bytes": 100
}
{
  "aggs": {
    "timeseries": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "1m",
        "min_doc_count": 0,
        "time_zone": "UTC"
      },
      "aggs": {
        "sum_bytes": {
          "sum": {
            "field": "bytes"
          }
        },
        "cumulative_bytes": {
          "cumulative_sum": {
            "buckets_path": "sum_bytes"
          }
        },
        "bucket": {
          "bucket_script": {
            "buckets_path": {
              "bytes": "cumulative_bytes"
            },
            "script": {
              "source": "params.bytes",
              "lang": "painless"
            }
          }
        }
      }
    }
  }
}
The bucket aggregation does not show the value that it's supposed to (the cumulative_bytes value).

@shaharmor Did you try setting a missing value on the date_histogram aggregation? GitHub issues are meant for bugs and feature requests, and this sounds more like a question that should be asked on the forum first.
I'm not sure how a missing value on the date_histogram would help, as there are no docs to add the missing field to...
Anyway, I created a forum thread as well: https://discuss.elastic.co/t/bucket-script-fails-when-some-docs-are-missing/107592/1
This is actually a bug.
It seems that the bucket_script aggregation is not executed on buckets that have a doc count of zero. The following recreation script highlights this:
PUT test
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "doc": {
      "properties": {
        "@timestamp": {
          "type": "date"
        },
        "bytes": {
          "type": "long"
        }
      }
    }
  }
}
PUT test/doc/1
{
  "@timestamp": "2017-01-01T00:00:00",
  "bytes": 100
}

PUT test/doc/2
{
  "@timestamp": "2017-01-01T00:05:00",
  "bytes": 100
}
GET test/_search
{
  "size": 0,
  "aggs": {
    "timeseries": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "1m",
        "min_doc_count": 0,
        "time_zone": "UTC"
      },
      "aggs": {
        "sum_bytes": {
          "sum": {
            "field": "bytes"
          }
        },
        "cumulative_bytes": {
          "cumulative_sum": {
            "buckets_path": "sum_bytes"
          }
        },
        "bucket": {
          "bucket_script": {
            "buckets_path": {
              "bytes": "cumulative_bytes"
            },
            "script": {
              "source": "params.bytes",
              "lang": "painless"
            }
          }
        }
      }
    }
  }
}
The response from the search request is:
{
  "took": 9,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "timeseries": {
      "buckets": [
        {
          "key_as_string": "2017-01-01T00:00:00.000Z",
          "key": 1483228800000,
          "doc_count": 1,
          "sum_bytes": { "value": 100 },
          "cumulative_bytes": { "value": 100 },
          "bucket": { "value": 100 }
        },
        {
          "key_as_string": "2017-01-01T00:01:00.000Z",
          "key": 1483228860000,
          "doc_count": 0,
          "sum_bytes": { "value": 0 },
          "cumulative_bytes": { "value": 100 }
        },
        {
          "key_as_string": "2017-01-01T00:02:00.000Z",
          "key": 1483228920000,
          "doc_count": 0,
          "sum_bytes": { "value": 0 },
          "cumulative_bytes": { "value": 100 }
        },
        {
          "key_as_string": "2017-01-01T00:03:00.000Z",
          "key": 1483228980000,
          "doc_count": 0,
          "sum_bytes": { "value": 0 },
          "cumulative_bytes": { "value": 100 }
        },
        {
          "key_as_string": "2017-01-01T00:04:00.000Z",
          "key": 1483229040000,
          "doc_count": 0,
          "sum_bytes": { "value": 0 },
          "cumulative_bytes": { "value": 100 }
        },
        {
          "key_as_string": "2017-01-01T00:05:00.000Z",
          "key": 1483229100000,
          "doc_count": 1,
          "sum_bytes": { "value": 100 },
          "cumulative_bytes": { "value": 200 },
          "bucket": { "value": 200 }
        }
      ]
    }
  }
}
In the response above, the buckets with keys 2017-01-01T00:01:00.000Z through 2017-01-01T00:04:00.000Z should each have a bucket sub-aggregation with a value of 100 (the same as the cumulative_sum aggregation, since the script just outputs that value).
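For comparison, here is a small Python sketch (a toy model, not Elasticsearch code) of what the pipeline should produce: a running total over the per-bucket sums, echoed by a script equivalent to "params.bytes", yields a value for every bucket, empty or not.

```python
def cumulative_sum(bucket_sums):
    """Running total over the per-bucket sum_bytes values."""
    total = 0.0
    out = []
    for s in bucket_sums:
        total += s
        out.append(total)
    return out

# sum_bytes per minute bucket, taken from the recreation above
sum_bytes = [100, 0, 0, 0, 0, 100]
cumulative = cumulative_sum(sum_bytes)

# The script "params.bytes" just echoes the cumulative value, so the
# expected "bucket" value exists for every bucket, including empty ones.
expected_bucket = list(cumulative)
# cumulative is [100.0, 100.0, 100.0, 100.0, 100.0, 200.0]; the bug is
# that the real bucket_script output is missing for the four empty buckets.
```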
Hi @colings86 Could you please provide some instructions to fix this? Thanks in advance!
I just took a look at the code around this, and I think it's going to need some thought about how we can fix this bug without adversely affecting other pipeline aggregations. The problem arises when we resolve the value of the buckets_path for an empty bucket in BucketHelpers.resolveBucketValue here (https://github.com/elastic/elasticsearch/blob/master/core/src/main/java/org/elasticsearch/search/aggregations/pipeline/BucketHelpers.java#L174):
if (Double.isInfinite(value) || Double.isNaN(value) || (bucket.getDocCount() == 0 && !isDocCountProperty)) {
    switch (gapPolicy) {
        case INSERT_ZEROS:
            return 0.0;
        case SKIP:
        default:
            return Double.NaN;
    }
} else {
    return value;
}
That if statement returns Double.NaN (because the gap policy is SKIP) for any empty bucket, where an empty bucket is defined as one with a doc count of zero.
In BucketScriptPipelineAggregator.reduce() we don't execute on buckets which resolve to Double.NaN, so the bucket is skipped.
BucketHelpers.resolveBucketValue() is used by all of the pipeline aggregations so we need to be careful that any changes do not adversely affect the other pipeline aggregations.
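To make the skip behaviour concrete, here is a rough Python rendering of that logic (names simplified; this is an illustration of the branch above, not the actual Elasticsearch code):

```python
import math

def resolve_bucket_value(value, doc_count, gap_policy="skip",
                         is_doc_count_property=False):
    """Mirror of the branch in BucketHelpers.resolveBucketValue:
    an empty bucket (doc_count == 0) is treated as a gap, even if the
    pipeline metric being resolved (e.g. cumulative_sum) has a value."""
    if math.isinf(value) or math.isnan(value) or (
            doc_count == 0 and not is_doc_count_property):
        if gap_policy == "insert_zeros":
            return 0.0
        return float("nan")  # "skip" and the default case
    return value

# A non-empty bucket passes its value through unchanged...
print(resolve_bucket_value(100.0, doc_count=1))   # 100.0
# ...but an empty bucket yields NaN even though cumulative_sum was 100,
# which is why bucket_script never runs for it.
print(resolve_bucket_value(100.0, doc_count=0))   # nan
```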
We could potentially change the bucket_script aggregation so it executes on all buckets regardless of the retrieved value, and also pass the doc count of the current bucket to the script so it can determine what value to output, but I'm worried that this might make the aggregation too unwieldy for the user.
I'll mark this issue as discuss for now as I think we need to decide on an approach before it is implemented.
Discussed in FixItFriday, and we decided that this requires more in-depth discussion within the Search and Aggregations team.
Discussed with the search and aggs team, and we decided that we should pass the actual value to the script (instead of converting it to NaN or 0.0 using the gap policy) and also pass the doc count of the bucket to the script, so the user writing the script can know whether the bucket is empty and decide how to interpret the value.
We should ensure that the solution to this issue also fixes the issue in https://github.com/elastic/elasticsearch/issues/27544
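A sketch of the agreed direction (my reading of the discussion above, not an actual patch): execute the script on every bucket, handing it the raw resolved value plus the bucket's doc count, and let the script decide how to treat empty buckets.

```python
def run_bucket_script(buckets, script):
    """Execute the script on every bucket, empty or not, passing both
    the raw resolved value and the bucket's doc count."""
    return [script(value=b["value"], doc_count=b["doc_count"])
            for b in buckets]

# Cumulative values from the recreation, including an empty middle bucket.
buckets = [
    {"value": 100.0, "doc_count": 1},
    {"value": 100.0, "doc_count": 0},  # empty bucket keeps its cumulative value
    {"value": 200.0, "doc_count": 1},
]

# A script equivalent to "params.bytes", now free to inspect doc_count
# if it wants to treat empty buckets differently.
results = run_bucket_script(buckets, lambda value, doc_count: value)
# results: [100.0, 100.0, 200.0] - no bucket is silently skipped.
```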
@colings86 Is this something that is being actively worked on? Any ETA?
@shaharmor It's not being actively worked on right now, but since we now have a way forward it is available to be picked up and worked on. There is no ETA for this currently.
@elastic/es-search-aggs
For anyone stumbling upon the same issue with bucket_selector aggregations, here is my workaround:
I will use the example from the docs:
{
  "size": 0,
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "date",
        "interval": "month"
      },
      "aggs": {
        "total_sales": {
          "sum": {
            "field": "price"
          }
        },
        "sales_bucket_filter": {
          "bucket_selector": {
            "buckets_path": {
              "totalSales": "total_sales"
            },
            "script": "params.totalSales < 200"
          }
        }
      }
    }
  }
}
The issue: the script will select only non-empty buckets. If you try something like params.totalSales == null || params.totalSales < 200, it still won't select the empty ones.
Workaround is to make use of the special _count path:
"sales_bucket_filter": {
  "bucket_selector": {
    "buckets_path": {
      "totalSales": "total_sales",
      "bucketCount": "_count"
    },
    "script": "params.bucketCount == 0 || params.totalSales < 200"
  }
}
At least in 6.3, this works for bucket_selector, but unfortunately not for bucket_script. I can't say for other pipeline aggs.
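A toy Python model (not Elasticsearch code) of why the _count workaround helps: because doc count is always available to the script, the doc-count check short-circuits before the gap-policy-affected metric value matters, so empty buckets can still be selected.

```python
def select_buckets(buckets, script):
    """Keep only the buckets for which the selector script returns True."""
    return [b for b in buckets if script(b)]

# Monthly buckets from the docs example; the middle month is empty.
buckets = [
    {"doc_count": 3, "total_sales": 150.0},
    {"doc_count": 0, "total_sales": 0.0},   # empty month
    {"doc_count": 5, "total_sales": 550.0},
]

# Equivalent of "params.bucketCount == 0 || params.totalSales < 200":
kept = select_buckets(
    buckets,
    lambda b: b["doc_count"] == 0 or b["total_sales"] < 200,
)
# kept contains the 150.0 month and the empty month; the 550.0 month
# is filtered out.
```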
Is this being fixed in 7?
Hi @shaharmor, I'm afraid there is still no progress on this issue. It's still on our radar, but we don't currently have anyone working on it at this time. We'll update this ticket when there's movement.