Elasticsearch: [Bug] 'ignore_malformed' does not always work, and occasionally throws an exception

Created on 12 Mar 2015 · 6 Comments · Source: elastic/elasticsearch

The idea behind the ignore_malformed setting is to ensure that every record gets added to the index, even if some of its JSON attributes use an invalid data type (compared to the index's expected mapping). This is particularly useful with dynamic JSON documents, where it is most important that all objects are added rather than rejected with an exception.

However, there are some cases where malformed JSON objects are rejected with an exception instead of being accepted. Repro steps:

  • Make sure the ignore_malformed setting is enabled
  • Add a first record
~ $ curl -XPUT 'http://127.0.0.1:9200/tweets/tweet/123' -d '{ "value" : 12345 }'
{"_index":"tweets","_type":"tweet","_id":"123","_version":1,"created":true}

The index now expects a long value for the "value" field.

  • Add a malformed record
~ $ curl -XPUT 'http://127.0.0.1:9200/tweets/tweet/124' -d '{ "value" : "Hello" }'
{"_index":"tweets","_type":"tweet","_id":"124","_version":1,"created":true}

The malformed record is accepted, as expected.

  • Add another malformed record
~ $ curl -XPUT 'http://127.0.0.1:9200/tweets/tweet/125' -d '{ "value" : 123123123123123123123 }'
{"error":"MapperParsingException[failed to parse [value]]; nested: JsonParseException[Numeric value (123123123123123123123) out of range of long (-9223372036854775808 - 9223372036854775807)\n at [Source: [B@52259927; line: 1, column: 34]]; ","status":400}

Expected: This malformed record should still be added, with the 123123123123123123123 value coerced into a string.
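
The first repro step does not show how the setting was enabled. One possible way, sketched here under the assumption that the index-level `index.mapping.ignore_malformed` setting is used and that the `tweets` index is created before the first document is added, is shown below. The `ES_URL` and `ES_RUN` variables and the helper function names are illustrative and not part of the original report; the script only sends a request when `ES_RUN=1`.

```shell
#!/bin/sh
# Assumption: Elasticsearch is listening on 127.0.0.1:9200, as in the report.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# Request body that enables ignore_malformed at the index level
# (assumed setting name: index.mapping.ignore_malformed).
index_settings() {
  printf '%s\n' '{ "settings": { "index.mapping.ignore_malformed": true } }'
}

# Create the "tweets" index with that setting applied.
create_index() {
  curl -XPUT "$ES_URL/tweets" -d "$(index_settings)"
}

# Only contact the server when explicitly asked to (ES_RUN is a
# hypothetical guard variable introduced for this sketch).
if [ "${ES_RUN:-0}" = "1" ]; then
  create_index
fi
```

Running it with `ES_RUN=1` before the first `curl -XPUT` above would create the index with the setting in place; whether the original reporter enabled it this way or through the node configuration is not stated.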

:Search/Mapping >enhancement help wanted

All 6 comments

Hi @Chetane

I agree that ignore_malformed should just ignore the illegal value in your last document instead of throwing an exception. (It shouldn't try to coerce the value to a string, which makes no sense for a numeric field; it should just ignore it.)

Closing in favour of #11513

This bug is NOT a duplicate of #11513. This bug is about the fact that the documentation for ignore_malformed is incorrect and misleading. Just because the particular example referenced here (long vs. other numerics) was addressed, doesn't mean the problem went away (for example, it remains with strings vs. objects, etc). At minimum, there should be a warning in the documentation for ignore_malformed that it only applies in a narrow range of circumstances and does not ensure the document will get indexed.

Agreed, it should not have been closed.

+1 I was burned by the fact that there are a lot of situations that ignore_malformed doesn't apply to (fields containing consecutive dots in the field name, objects vs. concrete values)
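
A request of the kind described above (an object sent to a field that is already mapped as a concrete value) might look like the following sketch. The document ID `126`, the document body, and the `ES_URL`/`ES_RUN` variables are hypothetical and chosen only to continue the repro steps from the report. No response is shown, since it depends on the Elasticsearch version; according to the comments in this thread, ignore_malformed does not cover this case.

```shell
#!/bin/sh
# Assumption: same local node and "tweets" index as in the original repro steps.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# Hypothetical document: "value" is an object, although the field
# was mapped as a long by the first record in the report.
object_doc() {
  printf '%s\n' '{ "value" : { "text" : "Hello" } }'
}

# Try to index the document under a new, made-up ID (126).
index_object_doc() {
  curl -XPUT "$ES_URL/tweets/tweet/126" -d "$(object_doc)"
}

# Only contact the server when explicitly asked to (ES_RUN is a
# hypothetical guard variable introduced for this sketch).
if [ "${ES_RUN:-0}" = "1" ]; then
  index_object_doc
fi
```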

+1
