Kibana version:
7.2 BC6
Elasticsearch version:
7.2 BC6
Server OS version:
macOS v10
Browser version:
Chrome Version 74.0.3729.169
Original install method (e.g. download page, yum, from source, etc.):
Download from build candidate.
Describe the bug:
When I create a data frame transform that generates a date field in the destination index, the date is interpreted incorrectly in the Discover page.
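Steps to reproduce:
1. Create a data frame transform with a date_histogram group_by. "format": "yyyy" is automatically added to the date_histogram.
2. The destination index gets a mapping of type date and a format of strict_date_optional_time||epoch_millis||yyyy.
3. View the destination index in Discover. The date is displayed as Dec 31, 1969 @ 16:00:02.019.
Expected behavior: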
I would expect the value to be "2019" in the Discover tab too.
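For reference, the Discover value is what you get when 2019 is read as milliseconds since the epoch rather than as a year. Here is a minimal Python sketch of that arithmetic. It assumes the browser was displaying times at UTC-8, which is inferred from the 16:00 in the displayed value and is not stated in the report:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Assumption: the browser time zone was UTC-8. This is inferred from the
# value shown in Discover and is not stated anywhere in the report.
ASSUMED_BROWSER_TZ = timezone(timedelta(hours=-8))


def millis_to_local(millis: int, tz: timezone) -> datetime:
    """Interpret an integer as milliseconds since the epoch, shown in tz."""
    return (EPOCH + timedelta(milliseconds=millis)).astimezone(tz)


as_millis = millis_to_local(2019, ASSUMED_BROWSER_TZ)
print(as_millis.isoformat(timespec="milliseconds"))
# 1969-12-31T16:00:02.019-08:00
```

So the year 2019 ends up 2.019 seconds after the epoch, which matches the Dec 31, 1969 @ 16:00:02.019 seen in Discover.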
Screenshots (if relevant):
Creating the data frame transform (note the correct date value in the preview):

Correct date value in the destination index:

Incorrect date value in Discover:

Errors in browser console (if relevant):
Provide logs and/or server output (if relevant):
Any additional context:
This is the JSON for my data frame:
{
  "id": "retest",
  "source": {
    "index": [
      "kibana_sample_data_ecommerce"
    ],
    "query": {
      "match_all": {}
    }
  },
  "dest": {
    "index": "retest-index"
  },
  "pivot": {
    "group_by": {
      "order_date": {
        "date_histogram": {
          "field": "order_date",
          "calendar_interval": "1y",
          "format": "yyyy"
        }
      }
    },
    "aggregations": {
      "order_id.cardinality": {
        "cardinality": {
          "field": "order_id"
        }
      }
    }
  }
}
It seems to me that automatically adding the format field to the date_histogram group_by for the user could cause more problems than necessary. (See related issue: https://github.com/elastic/elasticsearch/issues/43068)
Why not default to not putting in a format but still allow the end user to provide one if they wish?
Pinging @elastic/ml-ui
I remember @pheyos brought this up already, and we discussed whether this line could be changed: https://github.com/elastic/elasticsearch/pull/41703/files#diff-5e5072b73a7dc03174c4b104a8dfb219R84
The line says
builder.field(FORMAT, DEFAULT_TIME_FORMAT + "||" + format);
With yyyy, a value such as 2019 can be interpreted as epoch ms, because DEFAULT_TIME_FORMAT is tried first and overrules yyyy. Could we change this to:
builder.field(FORMAT, format + "||" + DEFAULT_TIME_FORMAT);
Then DEFAULT_TIME_FORMAT would be more like a fallback and yyyy would be tried first.
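To illustrate why the order matters, here is a toy Python model of "first matching format wins". It is a sketch only and not Elasticsearch's actual parser: the two parser functions and `parse_first_match` are hypothetical names, and strict_date_optional_time is left out on the assumption (taken from the description above) that a bare 2019 falls through to epoch_millis.

```python
from datetime import datetime, timezone


def parse_epoch_millis(text):
    """Toy epoch_millis: any run of digits is milliseconds since the epoch."""
    return int(text) if text.isdigit() else None


def parse_yyyy(text):
    """Toy yyyy: exactly four digits are a year, resolved to Jan 1 UTC."""
    if len(text) == 4 and text.isdigit():
        start = datetime(int(text), 1, 1, tzinfo=timezone.utc)
        return int(start.timestamp()) * 1000
    return None


# Hypothetical registry for this sketch; only two formats are modelled.
TOY_PARSERS = {"epoch_millis": parse_epoch_millis, "yyyy": parse_yyyy}


def parse_first_match(value, format_string):
    """Try each `||`-separated format in order; the first match wins."""
    for name in format_string.split("||"):
        millis = TOY_PARSERS[name](str(value))
        if millis is not None:
            return millis
    raise ValueError(f"{value!r} matches none of {format_string!r}")


# Current order: the default format is tried first, so 2019 becomes 2019 ms.
print(parse_first_match(2019, "epoch_millis||yyyy"))  # 2019
# Proposed order: yyyy is tried first, so 2019 becomes Jan 1, 2019 UTC.
print(parse_first_match(2019, "yyyy||epoch_millis"))  # 1546300800000
```

In this model the default format only acts as a fallback once the custom format is listed first, which is the behavior the proposed change is after.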
If format causes problems in general, I think we should decide whether we want to support it at all in the UI. If we remove it from the default configs but add a custom field to override it, that might also give users headaches and will be harder for us to support. What do you think?
I am fine removing support for it from the UI entirely, but keeping it as a valid option in the API.
@sophiec20 what do you think?
I looked into this a bit more. Here's a set of Kibana Dev Console statements which should demonstrate the underlying issue:
GET _cat/indices
# Create an index with a custom mapping featuring different date formats
# date_raw: default
# date_yyyy: custom format `yyyy`, ES also offers `strict_year` to do exactly that
# date_override: This is the format the data frame backend is currently creating
# date_fallback: This is what I suggest the data frame backend should be creating
PUT date_test
{
  "settings" : {
    "number_of_shards" : 1
  },
  "mappings" : {
    "properties" : {
      "date_raw" : { "type" : "date" },
      "date_yyyy" : { "type" : "date", "format": "yyyy" },
      "date_override" : { "type" : "date", "format": "strict_date_optional_time||epoch_millis||yyyy" },
      "date_fallback" : { "type" : "date", "format": "yyyy||strict_date_optional_time||epoch_millis" }
    }
  }
}
POST date_test/_doc/test_doc_2019
{
  "date_raw": 2019,
  "date_yyyy": 2019,
  "date_override": 2019,
  "date_fallback": 2019
}
POST date_test/_doc/test_doc_2018
{
  "date_raw": 2018,
  "date_yyyy": 2018,
  "date_override": 2018,
  "date_fallback": 2018
}
POST date_test/_doc/test_doc_2017
{
  "date_raw": 2017,
  "date_yyyy": 2017,
  "date_override": 2017,
  "date_fallback": 2017
}
GET date_test/_search
{
  "aggs": {
    "date_raw": {
      "date_histogram": {
        "field": "date_raw",
        "calendar_interval": "1y"
      }
    },
    "date_yyyy": {
      "date_histogram": {
        "field": "date_yyyy",
        "calendar_interval": "1y"
      }
    },
    "date_override": {
      "date_histogram": {
        "field": "date_override",
        "calendar_interval": "1y"
      }
    },
    "date_fallback": {
      "date_histogram": {
        "field": "date_fallback",
        "calendar_interval": "1y"
      }
    }
  },
  "size": 0
}
This is the result of the aggregation search from the snippet above:
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "date_raw" : {
      "buckets" : [
        {
          "key_as_string" : "1970-01-01T00:00:00.000Z",
          "key" : 0,
          "doc_count" : 3
        }
      ]
    },
    "date_override" : {
      "buckets" : [
        {
          "key_as_string" : "1970-01-01T00:00:00.000Z",
          "key" : 0,
          "doc_count" : 3
        }
      ]
    },
    "date_yyyy" : {
      "buckets" : [
        {
          "key_as_string" : "2017",
          "key" : 1483228800000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2018",
          "key" : 1514764800000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2019",
          "key" : 1546300800000,
          "doc_count" : 1
        }
      ]
    },
    "date_fallback" : {
      "buckets" : [
        {
          "key_as_string" : "2017",
          "key" : 1483228800000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2018",
          "key" : 1514764800000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2019",
          "key" : 1546300800000,
          "doc_count" : 1
        }
      ]
    }
  }
}
You can see that date_raw and date_override treat 2019 as ms since epoch, whereas date_yyyy and date_fallback correctly identify 2019 as a year.
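The bucket keys in that response can be cross-checked with a few lines of Python. This is a rough sketch that models a 1y calendar interval as rounding down to Jan 1 in UTC; `year_start_millis` and `yearly_bucket_key` are hypothetical helper names, not Elasticsearch functions.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def year_start_millis(year: int) -> int:
    """Epoch milliseconds of Jan 1, 00:00 UTC for the given year."""
    return int(datetime(year, 1, 1, tzinfo=timezone.utc).timestamp()) * 1000


def yearly_bucket_key(millis: int) -> int:
    """Round an epoch-millis value down to the start of its UTC year."""
    moment = EPOCH + timedelta(milliseconds=millis)
    return year_start_millis(moment.year)


# Values read as years land on the keys of date_yyyy and date_fallback.
for year in (2017, 2018, 2019):
    print(year, year_start_millis(year))

# Values read as epoch millis all fall into the single 1970 bucket (key 0),
# as seen for date_raw and date_override.
print({yearly_bucket_key(value) for value in (2017, 2018, 2019)})  # {0}
```

The first loop prints 1483228800000, 1514764800000 and 1546300800000, which are the three keys reported for date_yyyy and date_fallback.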
This problem will be solved (for transforms created by 7.3 or above) if we implement the proposal outlined in https://github.com/elastic/kibana/issues/39250#issuecomment-503608244.
Since 7.2 is so close to release we should probably just document the problem raised in this issue as a known bug.
For 7.2.1 we could change yyyy to a format that couldn't be mixed up with epoch_millis, for example yyyy-01-01.
Would yyyy-MM-dd give different outputs to yyyy-01-01? If so I think that's hiding another bug, because the aggregation should be rounding the dates to the beginning of buckets. Maybe use of hardcoded 00 and 01 in date formats has disguised a timezone handling problem for example.
But yes, a temporary workaround could be to use yyyy-MM-dd as the minimum granularity.
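To make the yyyy-MM-dd versus yyyy-01-01 question concrete, here is a small Python sketch. It formats the 2019 bucket key from the search result above with both patterns, first in UTC and then with a deliberately wrong UTC-8 offset. The offset is a hypothetical mismatch chosen for illustration, not something observed in this issue, and `format_key` is a made-up helper.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
KEY_2019 = 1546300800000  # bucket key for 2019 from the search result above


def format_key(key_millis: int, pattern: str, tz: timezone = timezone.utc) -> str:
    """Format an epoch-millis bucket key with a strftime pattern in tz."""
    moment = (EPOCH + timedelta(milliseconds=key_millis)).astimezone(tz)
    return moment.strftime(pattern)


# In UTC the two formats agree, because the key is already the year start.
full_utc = format_key(KEY_2019, "%Y-%m-%d")       # yyyy-MM-dd
hardcoded_utc = format_key(KEY_2019, "%Y-01-01")  # yyyy-01-01
print(full_utc, hardcoded_utc)  # 2019-01-01 2019-01-01

# Hypothetical mismatch: the same key formatted at UTC-8.
shifted = timezone(timedelta(hours=-8))
full_shifted = format_key(KEY_2019, "%Y-%m-%d", shifted)
hardcoded_shifted = format_key(KEY_2019, "%Y-01-01", shifted)
print(full_shifted, hardcoded_shifted)  # 2018-12-31 2018-01-01
```

With the full pattern the shift shows up as 2018-12-31, while the hardcoded 01-01 still looks like a plausible bucket start, so it could disguise this kind of problem. Either way, the hyphenated output is not a bare number, so it cannot be mistaken for epoch_millis.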
Honestly, if we are talking about increasing the fidelity of the format, we should just make it a full fidelity format.
How about changing:
builder.field(FORMAT, DEFAULT_TIME_FORMAT + "||" + format);
to:
builder.field(FORMAT, format + "||" + DEFAULT_TIME_FORMAT);
in the back end code. No UI changes for 7.2.1.
After that, drop format in the config and always use epoch millis in the indexed data, with UI translation to a human-readable format, as per https://github.com/elastic/kibana/issues/39250#issuecomment-503608244.
Since we are covering the removal of auto-formatting in another issue (#39250) and the backend change to swap DEFAULT_TIME_FORMAT priorities is already in, I'm closing this issue.