Elasticsearch: copy_to breaks on date_range type

Created on 20 Nov 2019 · 7 Comments · Source: elastic/elasticsearch

Elasticsearch version (bin/elasticsearch --version): 7.4.2

Description of the problem including expected versus actual behavior:

Using copy_to with a date_range field causes errors on ingest. Interestingly (worse?), if you define a field that is not mapped and mappings are disabled, then it silently ignores the copy_to request.

Steps to reproduce:

1. Create the index

PUT /test
{
  "mappings": {
    "properties": {
      "date_copy": {
        "type": "date_range"
      },
      "date": {
        "type": "date_range",
        "copy_to": "date_copy"
      }
    }
  }
}

2. Attempt to index a document.

PUT /test/_doc/1
{
  "date": {
    "gte": "2019-11-10T01:00:00.000Z",
    "lt": "2020-01-01"
  }
}

3. Observe error.

{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "error parsing field [date_copy], expected an object but got date"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "failed to parse field [date_copy] of type [date_range] in document with id '1'. Preview of field's value: 'null'",
    "caused_by": {
      "type": "mapper_parsing_exception",
      "reason": "error parsing field [date_copy], expected an object but got date"
    }
  },
  "status": 400
}
:Search/Mapping >bug Search

All 7 comments

Pinging @elastic/es-search (:Search/Mapping)

I've got a test running this example, and the (slightly abbreviated) stacktrace of the moment we bail here is:

org.elasticsearch.index.mapper.MapperParsingException: failed to parse field [date_copy] of type [date_range] in document with id '1'. Preview of field's value: 'null'
    at __randomizedtesting.SeedInfo.seed([6490F1302D0E7075:658F1CB2ADA8D8DA]:0)
    at org.elasticsearch.index.mapper.FieldMapper.parse(FieldMapper.java:297)
    at org.elasticsearch.index.mapper.DocumentParser.parseCopy(DocumentParser.java:855)
    at org.elasticsearch.index.mapper.DocumentParser.parseCopyFields(DocumentParser.java:845)
    at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:473)
    at org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:489)
    at org.elasticsearch.index.mapper.DocumentParser.innerParseObject(DocumentParser.java:402)
    at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrNested(DocumentParser.java:379)
    at org.elasticsearch.index.mapper.DocumentParser.internalParseDocument(DocumentParser.java:109)
    at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:68)
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:254)
    at org.elasticsearch.index.shard.IndexShard.prepareIndex(IndexShard.java:768)
[...]
Caused by: org.elasticsearch.index.mapper.MapperParsingException: error parsing field [date_copy], expected an object but got date
    at org.elasticsearch.index.mapper.RangeFieldMapper.parseCreateField(RangeFieldMapper.java:395)
    at org.elasticsearch.index.mapper.FieldMapper.parse(FieldMapper.java:277)
    ... 40 more

As I understand it, with my limited knowledge of how copy_to works internally, DocumentParser#parseObject() reads past the entire object field the first time it parses the input document. On the second pass, for the copy field, the parser already points to the end of the "date" field object, so RangeFieldMapper#parseCreateField() cannot read the range object when it is called the second time.
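The behaviour described above can be illustrated with a toy model. This is only a sketch in Python, not the Elasticsearch code: `ForwardOnlyParser`, `parse_range`, and the token names are hypothetical stand-ins for a streaming parser that can only move forward, and the error text merely imitates the message in the stack trace.

```python
class ForwardOnlyParser:
    """Toy stand-in for a streaming parser: tokens are read once, in order."""

    def __init__(self, tokens):
        self._tokens = iter(tokens)
        self.current = next(self._tokens)

    def advance(self):
        # Move to the next token; there is no way to rewind.
        self.current = next(self._tokens, ("END", None))
        return self.current


def parse_range(parser):
    """Read a range object; the parser must be positioned at START_OBJECT."""
    kind, _ = parser.current
    if kind != "START_OBJECT":
        raise ValueError(f"expected an object but got {kind}")
    bounds = {}
    kind, value = parser.advance()
    while kind != "END_OBJECT":
        name = value                    # current token is a FIELD_NAME
        _, bound = parser.advance()     # next token is its VALUE
        bounds[name] = bound
        kind, value = parser.advance()
    return bounds


def demo():
    tokens = [
        ("START_OBJECT", None),
        ("FIELD_NAME", "gte"), ("VALUE", "2019-11-10T01:00:00.000Z"),
        ("FIELD_NAME", "lt"), ("VALUE", "2020-01-01"),
        ("END_OBJECT", None),
    ]
    parser = ForwardOnlyParser(tokens)
    first = parse_range(parser)         # parses the "date" field itself
    second = None
    try:
        parse_range(parser)             # same parser reused for the copy target
    except ValueError as err:
        second = str(err)
    return first, second
```

In this model the first call consumes the whole object, so the second call starts at the closing token and fails. Making the second call work would mean buffering or re-reading the object, which is the design question raised in the following comments.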

Building on @cbuescher's investigation, it seems as though copy_to doesn't work with fields that take the form of objects. For example, using copy_to with geo_point fields fails as well.

I think it would be helpful to discuss generally whether we would like to extend copy_to to work with field types whose values are JSON objects. If we decide not to, we should at least return a clear error message in these cases.

@pickypg I was also wondering what copy_to for a date_range would achieve? Where did this come up? In general I think copy_to makes most sense for text fields where different analyzers are specified, or where several fields should be combined into one (the classic copy-to-all case and similar)?

In my current example, I have 3 separate reasons that all distinctly represent types of "away" statuses (e.g., holidays, PTO, and leave). I'd like to just search across one value, but still be able to determine (and display) the separate values without manually combining the date ranges.

We discussed this in a larger group and are still not sure whether we should support copy_to on fields that take the form of objects, especially for the date_range type. There was some indication that the same behaviour might be possible in an ingest pipeline, but I haven't really verified that. Currently there seems to be no strong opinion on whether we should try to support this, or instead document more clearly that copy_to doesn't work in this case (and throw a proper error already in the mapping definition).
I will do some more digging into possible workarounds in ingest and into how complex supporting this would be otherwise.

In discussion it was suggested to use an ingest pipeline to copy the dates to a shared field as a workaround, which can be done with an append processor. The following works, for example, when the "date" fields are mapped as keyword fields:

PUT _ingest/pipeline/append_date_range
{
  "description": "appends the value of date to date_copy ",
  "processors": [
    {
      "append": {
        "field": "date_copy",
        "value": "{{date}}"
      }
    },{
      "append": {
        "field": "date_copy",
        "value": "{{another_date}}"
      }
    }
  ]
}

However, when I just tried the same with an index where all three fields are mapped to date_range, running this doc through the pipeline fails:

PUT /test/_doc/1?pipeline=append_date_range
{
  "date": {
    "gte": "2019-11-10T01:00:00.000Z",
    "lt": "2020-01-01"
  }
}

The failure looks pretty similar to the one above:

"type": "mapper_parsing_exception",
"reason": "failed to parse field [date_copy] of type [date_range] in document with id '1'. Preview of field's value: '{lt=2020-01-01, gte=2019-11-10T01:00:00.000Z}'",
"caused_by": {
  "type": "mapper_parsing_exception",
  "reason": "error parsing field [date_copy], expected an object but got null"
}

But in this case it looks like the problem lies in the new document source that the append processor creates, which sends an empty string along with the copy of the range object:

failed to execute bulk item (index) index {[test][_doc][1], source[{"date":{"lt":"2020-01-01","gte":"2019-11-10T01:00:00.000Z"},"date_copy":["{lt=2020-01-01, gte=2019-11-10T01:00:00.000Z}",""]}]}

I'll do some digging here as well and maybe open a separate issue around the ingest processors behaviour.

Edit: The empty string isn't the main problem here. Rather, the append processor stringifies the original JSON object to "{lt=2020-01-01, gte=2019-11-10T01:00:00.000Z}".
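To make the stringification concrete, here is a small Python sketch that reproduces the shape of the date_copy value from the failed bulk item. It is not the ingest implementation: `java_style_to_string`, `render_template`, and `append` are hypothetical helpers, and the assumption is that a "{{field}}" template always renders to a string, with an object rendered in a {key=value} form and a missing field rendered as an empty string.

```python
def java_style_to_string(value):
    """Mimic the {key=value, ...} form seen in the failed bulk item (assumed)."""
    if isinstance(value, dict):
        inner = ", ".join(
            f"{key}={java_style_to_string(val)}" for key, val in value.items()
        )
        return "{" + inner + "}"
    # A missing field is assumed to render as an empty string.
    return "" if value is None else str(value)


def render_template(template, doc):
    """Toy renderer for a single {{field}} placeholder; always returns a string."""
    field = template.strip("{}")        # "{{date}}" -> "date"
    return java_style_to_string(doc.get(field))


def append(doc, field, template):
    """Append the rendered template to a list-valued field, creating it if needed."""
    doc.setdefault(field, []).append(render_template(template, doc))
    return doc


doc = {"date": {"lt": "2020-01-01", "gte": "2019-11-10T01:00:00.000Z"}}
append(doc, "date_copy", "{{date}}")
append(doc, "date_copy", "{{another_date}}")
print(doc["date_copy"])
# ['{lt=2020-01-01, gte=2019-11-10T01:00:00.000Z}', '']
```

Under these assumptions the target field ends up holding two strings rather than a range object, which is consistent with the date_range mapper rejecting the value.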
