Elasticsearch: Date histogram aggregation is incorrect in time zone DST-shift

Created on 31 Oct 2019 · 3 Comments · Source: elastic/elasticsearch

Elasticsearch version (bin/elasticsearch --version): 7.4.1 (docker)

Plugins installed: []

JVM version (java -version): Bundled with docker

OS version (uname -a if on a Unix-like system): macOS 10.14.6 with docker engine 19.03.4 in docker for desktop

Description of the problem including expected versus actual behavior:

When we create a date histogram per day in the Europe/Oslo time zone across the DST shift, we get a bucket that is off by one hour, and the returned buckets are incorrect. The DST shift happens in this time zone at 2015-10-25T03:00+02:00[Europe/Oslo], when the clock is set back to 2015-10-25T02:00+01:00[Europe/Oslo].

The issue only happens when we have more than one shard.
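
To illustrate why the number of shards matters, here is a toy model of combining per-shard buckets by key. It is only a sketch: the `merge` helper and the per-shard dictionaries are hypothetical and are not Elasticsearch's actual reduce code. The two keys are the epoch-millisecond values from the results reported below.

```python
from collections import Counter

# Hypothetical per-shard results: each shard reports {bucket key: doc_count}.
shard_0 = {1445724000000: 1}              # document 1
shard_1_agreeing = {1445724000000: 1}     # document 2, same key as shard 0
shard_1_disagreeing = {1445727600000: 1}  # document 2, key one hour later


def merge(*shards):
    """Sum doc counts per key and return the buckets sorted by key."""
    total = Counter()
    for shard in shards:
        total.update(shard)
    return sorted(total.items())


print(merge(shard_0, shard_1_agreeing))     # [(1445724000000, 2)]
print(merge(shard_0, shard_1_disagreeing))  # [(1445724000000, 1), (1445727600000, 1)]
```

If the shards agree on the key, the counts add up to one bucket. If they disagree by an hour, the merged result contains two buckets with one document each, which matches the shape of the incorrect output.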

We expect the following result from the aggregation:

  {
    "key_as_string": "2015-10-25T00:00:00.000+02:00",
    "key": 1445724000000,
    "doc_count": 2
  }

But the actual result is sometimes:

  {
    "key_as_string": "2015-10-25T00:00:00.000+02:00",
    "key": 1445724000000,
    "doc_count": 1
  },
  {
    "key_as_string": "2015-10-25T01:00:00.000+02:00",
    "key": 1445727600000,
    "doc_count": 1
  }
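
The arithmetic behind the two keys can be checked with a short sketch. It uses fixed UTC offsets (+02:00 before the shift, +01:00 after it, as stated above) instead of a real time zone database, and the helper name `epoch_millis` is ours. It only shows where the one-hour difference could come from; it does not claim to mirror what Elasticsearch does internally.

```python
from datetime import datetime, timedelta, timezone

# Offsets taken from the description: +02:00 before the shift, +01:00 after.
CEST = timezone(timedelta(hours=2))
CET = timezone(timedelta(hours=1))


def epoch_millis(dt):
    """Convert an aware datetime to epoch milliseconds."""
    return int(dt.timestamp() * 1000)


# Local midnight of 2015-10-25 in Oslo is still in summer time (+02:00).
expected_key = epoch_millis(datetime(2015, 10, 25, 0, 0, tzinfo=CEST))

# The same wall-clock midnight read with the post-shift offset (+01:00).
shifted_key = epoch_millis(datetime(2015, 10, 25, 0, 0, tzinfo=CET))

# The two indexed documents, in UTC.
doc1 = datetime(2015, 10, 25, 0, 0, tzinfo=timezone.utc)  # _id 1
doc2 = datetime(2015, 10, 25, 1, 0, tzinfo=timezone.utc)  # _id 2

print(expected_key)                       # 1445724000000
print(shifted_key)                        # 1445727600000
print(doc1.astimezone(CEST).isoformat())  # 2015-10-25T02:00:00+02:00
print(doc2.astimezone(CET).isoformat())   # 2015-10-25T02:00:00+01:00
```

Both documents land on the local date 2015-10-25, one on each side of the shift, so they belong in the same daily bucket. The unexpected key is exactly what local midnight would be if it were computed with the post-shift offset.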

Steps to reproduce:

The following curl commands reproduce the problem. To make the search output easier to read, save it to a file or pipe it to jq, for example with jq .aggregations.byDay.buckets.

curl --silent --output /dev/null -X PUT "http://localhost:9200/test" -H 'Content-Type: application/json' -d'
{
     "settings": {
       "index": {
         "number_of_shards": 2
       }
     }
}
'

curl --silent --output /dev/null -X PUT "http://localhost:9200/test/_mapping" -H 'Content-Type: application/json' -d'
{
  "properties": {
    "from": {
      "type": "date"
    }
  }
}
'

curl --silent --output /dev/null -XPOST 'http://localhost:9200/_bulk?routing=0' -H 'Content-Type: application/json' -d '
{ "index": { "_index": "test", "_id": "1" } }
{ "from": "2015-10-25T00:00:00Z" }

'

curl --silent --output /dev/null -XPOST 'http://localhost:9200/_bulk?routing=1' -H 'Content-Type: application/json' -d '
{ "index": { "_index": "test", "_id": "2" } }
{ "from": "2015-10-25T01:00:00Z" }

'


curl --silent --output /dev/null -XPOST http://localhost:9200/test/_refresh

curl --silent -XPOST http://localhost:9200/test/_search -H 'Content-Type: application/json' -d '
{
  "size": 0,
  "aggs": {
    "byDay": {
      "date_histogram": {
        "field": "from",
        "calendar_interval": "1d",
        "time_zone": "Europe/Oslo"
      }
    }
  }
}'

curl --silent --output /dev/null -XDELETE http://localhost:9200/test/

Labels: :Analytics/Aggregations, >bug, Analytics

All 3 comments

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

It is still reproducible in master, where it triggers the following assertion:

[2020-04-01T15:56:10,623][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [runTask-0] fatal error in thread [elasticsearch[runTask-0][search][T#49]], exiting
java.lang.AssertionError: key: 1445814000000, nextBucket.key: 1445727600000
        at org.elasticsearch.search.aggregations.bucket.histogram.InternalDateHistogram.addEmptyBuckets(InternalDateHistogram.java:433) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
        at org.elasticsearch.search.aggregations.bucket.histogram.InternalDateHistogram.reduce(InternalDateHistogram.java:455) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
        at org.elasticsearch.search.aggregations.InternalAggregations.reduce(InternalAggregations.java:245) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
        at org.elasticsearch.search.aggregations.InternalAggregations.topLevelReduce(InternalAggregations.java:189) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
        at org.elasticsearch.action.search.SearchPhaseController.reduceAggs(SearchPhaseController.java:536) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
        at org.elasticsearch.action.search.SearchPhaseController.reducedQueryPhase(SearchPhaseController.java:512) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
        at org.elasticsearch.action.search.SearchPhaseController.reducedQueryPhase(SearchPhaseController.java:417) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
        at org.elasticsearch.action.search.SearchPhaseController$2.reduce(SearchPhaseController.java:784) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
        at org.elasticsearch.action.search.FetchSearchPhase.innerRun(FetchSearchPhase.java:116) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
        at org.elasticsearch.action.search.FetchSearchPhase$1.doRun(FetchSearchPhase.java:95) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
        at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:44) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:691) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
        at java.lang.Thread.run(Thread.java:830) [?:?]

It looks like you need two records on different shards, at the border of a bucket, that the time zone shift pushes into another bucket.
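
For reference, the two keys in the assertion message can be decoded with a small sketch (the `to_utc` helper is just for illustration):

```python
from datetime import datetime, timezone


def to_utc(ms):
    """Convert epoch milliseconds to an aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


key = 1445814000000              # "key" in the assertion message
next_bucket_key = 1445727600000  # "nextBucket.key" in the assertion message

print(to_utc(key).isoformat())               # 2015-10-25T23:00:00+00:00
print(to_utc(next_bucket_key).isoformat())   # 2015-10-24T23:00:00+00:00
print((key - next_bucket_key) // 3_600_000)  # 24
```

So the key reported by the assertion is exactly 24 hours after nextBucket.key, and nextBucket.key is the same one-hour-off value (1445727600000) that shows up in the incorrect result in the original report.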

I believe this is resolved, happily :) I think the recent work on optimizing date handling also had the side-effect of fixing this bug. It doesn't reproduce on master anymore for me, and I confirmed that each shard has one of the documents. :tada:
