Elasticsearch: Sometimes MapperParsingException and sometimes not

Created on 24 Oct 2012 · 17 Comments · Source: elastic/elasticsearch

Hi,

On an empty single-node Elasticsearch v0.19.10 cluster, I am trying to reproduce a MapperParsingException[object mapping for [my_type] tried to parse as object, but got EOF, has a concrete value been provided to it?]

But it is not that easy. Here is the script I run in a loop:

curl -s -XDELETE http://localhost:9200/my_index/?pretty=true;

curl -s -XPUT 'http://localhost:9200/_bulk?pretty=true' --data-binary '
{"index":{"_index":"my_index","_type":"my_type"}}
{"obj":{"key":"value"}}
{"index":{"_index":"my_index","_type":"my_type"}}
{"obj":"value"}
'

Sometimes I get the exception, and sometimes not:

[root@dahu share]# curl -s -XDELETE http://localhost:9200/my_index/?pretty=true;

curl -s -XPUT 'http://localhost:9200/_bulk?pretty=true' --data-binary '
{"index":{"_index":"my_index","_type":"my_type"}}
{"obj":{"key":"value"}}
{"index":{"_index":"my_index","_type":"my_type"}}
{"obj":"value"}
'{
  "ok" : true,
  "acknowledged" : true
}[root@dahu share]#
[root@dahu share]# curl -s -XPUT 'http://localhost:9200/_bulk?pretty=true' --data-binary '
> {"index":{"_index":"my_index","_type":"my_type"}}
> {"obj":{"key":"value"}}
> {"index":{"_index":"my_index","_type":"my_type"}}
> {"obj":"value"}
> '
{
  "took" : 221,
  "items" : [ {
    "create" : {
      "_index" : "my_index",
      "_type" : "my_type",
      "_id" : "QJz1_TNWT9yhf7Mu5cRlDw",
      "_version" : 1,
      "ok" : true
    }
  }, {
    "create" : {
      "_index" : "my_index",
      "_type" : "my_type",
      "_id" : "ibUrM7JzRaGfalbgaF-aTA",
      "error" : "MapperParsingException[object mapping for [my_type] tried to parse as object, but got EOF, has a concrete value been provided to it?]"
    }
  } ]
}[root@dahu share]#


[root@dahu share]# curl -s -XDELETE http://localhost:9200/my_index/?pretty=true;
curl -s -XPUT 'http://localhost:9200/_bulk?pretty=true' --data-binary '
{"index":{"_index":"my_index","_type":"my_type"}}
{"obj":{"key":"value"}}
{"index":{"_index":"my_index","_type":"my_type"}}
{"obj":"value"}
'{
  "ok" : true,
  "acknowledged" : true
}[root@dahu share]#
[root@dahu share]# curl -s -XPUT 'http://localhost:9200/_bulk?pretty=true' --data-binary '
> {"index":{"_index":"my_index","_type":"my_type"}}
> {"obj":{"key":"value"}}
> {"index":{"_index":"my_index","_type":"my_type"}}
> {"obj":"value"}
> '
{
  "took" : 251,
  "items" : [ {
    "create" : {
      "_index" : "my_index",
      "_type" : "my_type",
      "_id" : "N2PS-hoPQv2mLN7wzkZFHg",
      "_version" : 1,
      "ok" : true
    }
  }, {
    "create" : {
      "_index" : "my_index",
      "_type" : "my_type",
      "_id" : "-G9gf7kYRTi3Jb2A4-OsVQ",
      "_version" : 1,
      "ok" : true
    }
  } ]
}

Quite strange...

>breaking >enhancement v0.20.3 v0.90.0.Beta1

All 17 comments

Your data is inconsistent.

{"obj":{"key":"value"}} tells Elasticsearch to interpret 'obj' as an object field.

{"obj":"value"} tells Elasticsearch that 'obj' is not an object field, but a value field.

Elasticsearch is smart: it can upgrade from a value field to an object field automatically, but not vice versa. Therefore, you receive the exception only if the object-style 'obj' from your concurrent bulk request reaches the mapping before the value-style 'obj', because downgrading is impossible.

So, take care of your data, decide what style the 'obj' field should have, and you'll be safe.
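One way to act on this advice before sending a bulk request is to scan the batch for fields that appear in both styles. The following is a minimal Python sketch, not part of Elasticsearch: the helper names `field_shapes` and `conflicting_fields` are hypothetical, and it only inspects the top-level fields of documents that have already been parsed into dicts.

```python
from collections import defaultdict


def field_shapes(docs):
    """Map each top-level field name to the set of shapes seen across docs."""
    shapes = defaultdict(set)
    for doc in docs:
        for name, value in doc.items():
            # A JSON object becomes a dict; everything else counts as a plain value.
            shapes[name].add("object" if isinstance(value, dict) else "value")
    return shapes


def conflicting_fields(docs):
    """Return the sorted names of fields seen both as an object and as a value."""
    return sorted(
        name for name, seen in field_shapes(docs).items() if len(seen) > 1
    )
```

For the two documents in this issue, `conflicting_fields([{"obj": {"key": "value"}}, {"obj": "value"}])` returns `["obj"]`.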

@jprante Note that ES doesn't upgrade a value field to an object field. It just ignores the object.

@clintongormley Ah, I see (https://gist.github.com/4090274): there's no upgrade magic. Sad, I had hoped there was :) Probably a missing feature?

But how would you upgrade it? What can you reliably convert it to?

One thought is promoting the value-style mapping

{
  "test" : {
    "test" : {
      "properties" : {
        "obj" : {
          "type" : "string"
        }
      }
    }
  }
}

to the object-style mapping

{
  "test" : {
    "test" : {
      "properties" : {
        "obj" : {
          "dynamic" : "true",
          "properties" : {
            "obj" : {
              "type" : "string"
            }
          }
        }
      }
    }
  }
}

so that { "obj" : { "key" : "value" }} could be processed, giving the mapping

{
  "test" : {
    "test" : {
      "properties" : {
        "obj" : {
          "dynamic" : "true",
          "properties" : {
            "obj" : {
              "type" : "string"
            },
            "key" : {
              "type" : "string"
            }
          }
        }
      }
    }
  }
}

But then it's tricky: what is "obj"? Is it a value or an object-level mapping? We could potentially support it, but then it's weird when it comes to search behavior...

I'm not convinced that you can reliably guess the right thing to do. It is better to throw an error and make people fix their data. Otherwise it just leads to weird debugging sessions later on.

OK, so if it's not possible (that's fine with me), I'd like to ALWAYS get an error, because sometimes

curl -s -XDELETE http://localhost:9200/my_index/?pretty=true;

curl -s -XPUT 'http://localhost:9200/_bulk?pretty=true' --data-binary '
{"index":{"_index":"my_index","_type":"my_type"}}
{"obj":{"key":"value"}}
{"index":{"_index":"my_index","_type":"my_type"}}
{"obj":"value"}
'

succeeds and sometimes it fails. I'd like it to always fail...

Is there a workaround for this? This inconsistency causes very big problems here, because we cannot predict which documents will be indexed...

Indexing an object as a string field doesn't throw any errors because the field mapper tries to interpret the object as a field boost construct. It sees an object and expects something like this to follow:

{"obj":{"_value":"value", "_boost":2.0}} 

When the mapper doesn't find the "_value" property, it just ignores the field. What we could do, perhaps, is throw an exception if the "_value" property is not present, or if the mapper finds unknown properties such as "key" in the example. I think the latter might be a nice solution for this issue.

@imotov Agreed, just fail if we get a field that we don't expect in the object notation of a string-mapped field.
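To illustrate the behavior proposed here, the sketch below models it in Python. It is not the Elasticsearch field mapper: `parse_string_field`, `StrictFieldError`, and the default boost of 1.0 are assumptions made for the example. Only the "_value" and "_boost" keys come from the comment above.

```python
ALLOWED_KEYS = {"_value", "_boost"}  # keys of the boost construct described above


class StrictFieldError(ValueError):
    """Raised when a string-mapped field receives an object it cannot interpret."""


def parse_string_field(name, raw):
    """Return (value, boost) for a string-mapped field, or raise StrictFieldError."""
    if not isinstance(raw, dict):
        # Plain value: no boost given, so use the assumed default of 1.0.
        return raw, 1.0
    unknown = sorted(set(raw) - ALLOWED_KEYS)
    if unknown:
        # Fail loudly instead of silently ignoring the field.
        raise StrictFieldError(f"field [{name}]: unexpected properties {unknown}")
    if "_value" not in raw:
        raise StrictFieldError(f"field [{name}]: missing _value")
    return raw["_value"], float(raw.get("_boost", 1.0))
```

Under these assumptions, `{"_value": "value", "_boost": 2.0}` is accepted, while `{"key": "value"}` raises an error that names the field, rather than being dropped.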

It would be nice if the EOF error would say which field is causing the error

+1 on more detail about the EOF and which field is causing the error. Data is dirty. Fact of life. So anything to help us out would be great.

...And on that note I also don't agree with clintongormley's theory of "make people fix their data." It's not always "their" data...And you can't predict what you might scrape from the internet AND the more schemaless databases and data sources we get the more we run into this issue.

In fact, people have commonly seen this EOF error with Twitter data. So when you work with a very popular data source / API (even if it's "incorrect" or "bad practice"; I'm not disputing that dirty data sucks), it's quite easy to see this error.

You can set the ignore_malformed setting in your index mapping, but it doesn't seem to help in this situation, even though this is exactly where I would expect it to. While we can't reasonably convert data types back and forth...We can choose to ignore them. Otherwise, we have this failure going on which stops initial indexing and it's just bad.

EVEN IF it was at the cost of skipping the entire document from being indexed...That would at least let us get past some of these errors which are typically edge cases to begin with (another reason why people "just can't clean their data").

A setting that would simply ignore these problem documents or a clear error message about what field is the actual problem would be so helpful. How it currently works is not helpful.
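Until something like that exists, the per-item "error" entries in the bulk response (as in the output at the top of this issue) can at least be matched back to the documents that were sent. Here is a minimal Python sketch, assuming the response has already been parsed into a dict and that its items come back in the same order as the submitted documents. `failed_items` is a hypothetical helper, and it reports the failing document, not the failing field.

```python
def failed_items(response, docs):
    """Pair every failed bulk item with the source document at the same position."""
    failures = []
    for position, (item, doc) in enumerate(zip(response["items"], docs)):
        # Each item is a single-key dict such as {"create": {...}}.
        action, result = next(iter(item.items()))
        if "error" in result:
            failures.append(
                {
                    "position": position,
                    "action": action,
                    "error": result["error"],
                    "doc": doc,
                }
            )
    return failures
```

Logging the returned entries shows which submitted documents were rejected, which narrows down the search for the offending field.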

Maybe someone will find this tool useful for tracking down inconsistent data: https://github.com/atott/es-mapping-validator.

I wish this tool was not needed, but since it is needed, this looks like a great way to find out more about the error. Thank you for pointing this out and creating this tool!

It's maddening that Elasticsearch will not indicate which field is causing the problem.
