Elasticsearch: Remote reindex fails due to long to keyword conversion

Created on 8 Feb 2017 · 5 Comments · Source: elastic/elasticsearch

Elasticsearch version: 5.1.2

Plugins installed: []

JVM version: 1.7.0_111

OS version: Ubuntu 14.04

Description of the problem including expected versus actual behavior:
I want to migrate some documents from ES 1.x to ES 5.1.2 using the remote reindex API.
The reindex request is:
{
  "waitForCompletion": false,
  "body": {
    "conflicts": "proceed",
    "source": {
      "remote": { "host": "host" },
      "index": "index",
      "size": 1000,
      "type": "default"
    },
    "dest": { "index": "index" }
  }
}

The response for the specified task is:
{...
  "response": {...
    "failures": [
      {
        "index": "index",
        "type": "default",
        "id": "AU0TIvm5KBq_GySoOJRi",
        "cause": {
          "type": "illegal_argument_exception",
          "reason": "mapper [field1] of different type, current_type [keyword], merged_type [long]"
        },
        "status": 400
      },
      {
        "index": "index",
        "type": "default",
        "id": "AU0tz1N8KBq_GySoDvr9",
        "cause": {
          "type": "illegal_argument_exception",
          "reason": "mapper [field2] cannot be changed from type [float] to [long]"
        },
        "status": 400
      }
    ]
  }
}}
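When a reindex task reports many failures, it can help to group them by the field named in each reason before looking at individual documents. The following is only a sketch in Python: the function name `group_failures_by_field` and the regular expression are my own assumptions, and it relies solely on the failure structure shown in the response above.

```python
import re
from collections import defaultdict

# Assumption: mapping failures name the field as "mapper [<field>] ...",
# as in the two reasons quoted in this thread.
MAPPER_FIELD = re.compile(r"mapper \[([^\]]+)\]")


def group_failures_by_field(failures):
    """Group failed document ids by the field named in each failure reason."""
    grouped = defaultdict(list)
    for failure in failures:
        reason = failure.get("cause", {}).get("reason", "")
        match = MAPPER_FIELD.search(reason)
        # Failures that do not follow the "mapper [...]" pattern are kept together.
        field = match.group(1) if match else "<unknown>"
        grouped[field].append(failure.get("id"))
    return dict(grouped)


# The two failures reported in the task response above.
failures = [
    {
        "index": "index",
        "type": "default",
        "id": "AU0TIvm5KBq_GySoOJRi",
        "cause": {
            "type": "illegal_argument_exception",
            "reason": "mapper [field1] of different type, "
                      "current_type [keyword], merged_type [long]",
        },
        "status": 400,
    },
    {
        "index": "index",
        "type": "default",
        "id": "AU0tz1N8KBq_GySoDvr9",
        "cause": {
            "type": "illegal_argument_exception",
            "reason": "mapper [field2] cannot be changed from type [float] to [long]",
        },
        "status": 400,
    },
]

print(group_failures_by_field(failures))
```

Grouping by field makes it easier to see whether the failures come from one problematic mapping or from several unrelated ones.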

I'm trying to reindex the following documents:
id: AU0TIvm5KBq_GySoOJRi: {"field1":1}
id: AU0tz1N8KBq_GySoDvr9: {"field2":2}

The mappings for the index in ES 5.1.2 are:
{"field1":{"type":"keyword"},"field2":{"type":"float"}}

The mappings for the index in ES 1.x are:
{"field1":{"type":"string","index": "not_analyzed"},"field2":{"type":"double"}}
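The two mappings above can be compared mechanically to spot fields whose destination type differs from the source. This is a rough sketch in Python, not an Elasticsearch tool: `expected_5x_type` and `mapping_differences` are hypothetical helper names, and the only translation rule encoded is that a 1.x `string` corresponds to `keyword` when `not_analyzed` and to `text` otherwise. Every other type is assumed to carry over unchanged, which is a simplification.

```python
def expected_5x_type(field_1x):
    """Return the 5.x type that most directly corresponds to a 1.x field definition."""
    field_type = field_1x.get("type")
    if field_type == "string":
        # 5.x replaced "string" with "keyword" (not analyzed) and "text" (analyzed).
        return "keyword" if field_1x.get("index") == "not_analyzed" else "text"
    # Assumption: all other types keep their name across versions.
    return field_type


def mapping_differences(mapping_1x, mapping_5x):
    """Map each differing field name to an (expected, actual) pair of 5.x types."""
    differences = {}
    for name, field_1x in mapping_1x.items():
        expected = expected_5x_type(field_1x)
        actual = mapping_5x.get(name, {}).get("type")
        if actual != expected:
            differences[name] = (expected, actual)
    return differences


# The mappings quoted in this report.
mapping_1x = {
    "field1": {"type": "string", "index": "not_analyzed"},
    "field2": {"type": "double"},
}
mapping_5x = {
    "field1": {"type": "keyword"},
    "field2": {"type": "float"},
}

print(mapping_differences(mapping_1x, mapping_5x))
```

For the mappings in this report the sketch flags only `field2`, whose type was narrowed from `double` to `float` in the destination index.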

Creating a document with exactly the same content using the POST API works, but reindex fails.
I don't have any coerce settings at the index level or the field level.

Provide logs (if relevant):
[2017-02-07T16:35:15,952][INFO ][o.e.t.LoggingTaskListener] 2919621 finished with response BulkIndexByScrollResponse[took=2.9s,timed_out=false,sliceId=null,updated=0,created=4998,deleted=0,batches=1,versionConflicts=0,noops=0,retries=0,throttledUntil=0s,bulk_failures=[{"index":"index","type":"default","id":"AU0TIvm5KBq_GySoOJRi","cause":{"type":"illegal_argument_exception","reason":"mapper [field1] of different type, current_type [keyword], merged_type [long]"},"status":400},{"index":"index","type":"default","id":"AU0tz1N8KBq_GySoDvr9","cause":{"type":"illegal_argument_exception","reason":"mapper [field2] cannot be changed from type [float] to [long]"},"status":400}],search_failures=[]]

Later edit:
The second error is resolved by using the "field2":{"type":"double"} mapping in ES 5.1.2. It seems that there is no coercion from long to float, but coercion from long to double exists. (Of course, I'm speaking about remote reindex.)

But the main problem remains: there is no coercion from long to keyword, and this is a very important problem for me (I can't move all the data from ES 1.x to ES 5.1.2).

All 5 comments

You have fields with the same name in different mapping types, so for instance you have a field called field1 in one mapping type which is of type keyword, then your reindex process is trying to add a field called field1 to a different mapping type, but with type long.

Fields with the same name in different types in the same index must have the same data type.

Hello!
Thank you for your quick response, but I only have one type in my index ("default"), and I'm trying to reindex only that type.

Also, as I already said, creating a document with exactly the same content using POST API works, but reindex fails.

And maybe I was not clear enough: reindex works, but only for a few documents, where "field1" is a string. When "field1" is a long, the reindex process stops and throws that error.

There is something you're leaving out, because the scenario you describe works just fine:

Do the following on a 1.x cluster:

PUT index
{
  "mappings": {
    "default": {
      "properties": {
        "field1": {
          "type": "string",
          "index": "not_analyzed"
        },
        "field2": {
          "type": "double"
        }
      }
    }
  }
}

PUT index/default/AU0TIvm5KBq_GySoOJRi
{"field1":1}

PUT index/default/AU0tz1N8KBq_GySoDvr9
{"field2":2}

then do the following on a 5.x cluster:

PUT index
{
  "mappings": {
    "default": {
      "properties": {
        "field1": {
          "type": "keyword"
        },
        "field2": {
          "type": "float"
        }
      }
    }
  }
}

POST _reindex
{
  "source": {
    "remote": {
      "host": "http://ip.for.1.x.cluster:9200"
    },
    "index": "index",
    "type": "default",
    "_source": true
  },
  "dest": {
    "index": "index"
  }
}

The above works just fine.

Ok. Thank you for your quick response. I will investigate deeper and I will update this thread.

Thank you again for such a quick response.

Hello!

I deleted the two problematic docs from ES 1.x (id: AU0TIvm5KBq_GySoOJRi and id: AU0tz1N8KBq_GySoDvr9), and now the reindex works.

Anyway, there is one problem with reindex, and I have a suggestion: the reindex task stops when it encounters a mapping error, like the ones above or, for example, ' 0.0.0.0' is not an IP string literal.

There should be an option, just like "conflicts": "proceed", so that if a mapping error occurs, the affected documents are simply skipped. This would minimize the overall time spent in the reindexing process.

Thank you for your help.
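To make the suggestion concrete, the skip-and-continue behavior can be sketched on the client side in Python. Nothing here is an Elasticsearch API: `MappingError`, `index_skipping_failures`, `fake_index_one`, and the `ip` field name are all hypothetical stand-ins, and a real implementation would replace `fake_index_one` with an actual indexing call.

```python
class MappingError(Exception):
    """Hypothetical stand-in for a 400 mapping failure returned by the cluster."""


def index_skipping_failures(docs, index_one):
    """Index each (id, source) pair, skipping documents that raise MappingError.

    Returns the number of indexed documents and a list of (id, reason) pairs
    for the documents that were skipped.
    """
    indexed = 0
    skipped = []
    for doc_id, source in docs:
        try:
            index_one(doc_id, source)
            indexed += 1
        except MappingError as error:
            # Record the failure and carry on instead of aborting the whole run.
            skipped.append((doc_id, str(error)))
    return indexed, skipped


def fake_index_one(doc_id, source):
    """Stand-in indexer: rejects an IP value with leading whitespace."""
    ip = source.get("ip", "")
    if ip.startswith(" "):
        raise MappingError("'%s' is not an IP string literal." % ip)


docs = [
    ("1", {"ip": "10.0.0.1"}),
    ("2", {"ip": " 0.0.0.0"}),  # malformed value like the example in this comment
    ("3", {"field1": 1}),
]

print(index_skipping_failures(docs, fake_index_one))
```

The skipped list gives a record of which documents need manual attention once the bulk of the data has been moved.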
