Elasticsearch version (bin/elasticsearch --version):
Version: 7.9.0, Build: default/tar/a479a2a7fce0389512d6a9361301708b92dff667/2020-08-11T21:36:48.204330Z, JVM: 11.0.7
Plugins installed: []
None
JVM version (java -version):
openjdk version "11.0.7" 2020-04-14
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.7+10)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 11.0.7+10, mixed mode)
OS version (uname -a if on a Unix-like system):
Darwin myron3.local 19.5.0 Darwin Kernel Version 19.5.0: Tue May 26 20:41:44 PDT 2020; root:xnu-6153.121.2~2/RELEASE_X86_64 x86_64
Description of the problem including expected versus actual behavior:
In 7.8, this worked:
1) Create an index with a string property in the mapping with no null_value
2) Use PUT mapping to set the null value on the string property
In 7.9, the PUT mapping call does not explicitly fail, but it does not update the mapping, either.
Steps to reproduce:
Here's a bash script that reproduces it:
#!/usr/bin/env bash
port="${1:?usage: $0 <port>}"
url_root="http://localhost:$port"
echo 'Elasticversion:'
curl -is $url_root | grep number
echo
echo 'Deleting example_index to have a clean slate...'
curl -X DELETE -H 'Content-type: application/json' "$url_root/example_index?ignore_unavailable=true"
echo
echo
echo 'Creating example_index with mapping property name with no null_value parameter'
curl -X PUT -d '{"aliases":{},"mappings":{"dynamic":"strict","properties":{"id":{"type":"keyword"},"name":{"type":"keyword"}}}}' -H 'Content-type: application/json' $url_root/example_index
echo
echo
echo 'The current index mapping is:'
curl -X GET -H 'Content-type: application/json' $url_root/example_index
echo
echo
echo 'Setting `null_value: Anonymous` on the `name` property'
curl -X PUT -d '{"dynamic":"strict","properties":{"id":{"type":"keyword"},"name":{"type":"keyword","null_value":"Anonymous"}}}' -H 'Content-type: application/json' $url_root/example_index/_mappings
echo
echo
echo 'The current index mapping is:'
curl -X GET -H 'Content-type: application/json' $url_root/example_index
echo
I am running Elasticsearch 7.9.0 on port 9234 and Elasticsearch 7.8.0 on port 9734. Here's the output from running my test script against 7.9 and 7.8.
First, against 7.8:
$ script/test_es_put_mapping 9734
Elasticversion:
"number" : "7.8.0",
Deleting example_index to have a clean slate...
{"acknowledged":true}
Creating example_index with mapping property name with no null_value parameter
{"acknowledged":true,"shards_acknowledged":true,"index":"example_index"}
The current index mapping is:
{"example_index":{"aliases":{},"mappings":{"dynamic":"strict","properties":{"id":{"type":"keyword"},"name":{"type":"keyword"}}},"settings":{"index":{"creation_date":"1597960745381","number_of_shards":"1","number_of_replicas":"1","uuid":"QG7QFLspQzGPKgUlwsqb4Q","version":{"created":"7080099"},"provided_name":"example_index"}}}}
Setting `null_value: Anonymous` on the `name` property
{"acknowledged":true}
The current index mapping is:
{"example_index":{"aliases":{},"mappings":{"dynamic":"strict","properties":{"id":{"type":"keyword"},"name":{"type":"keyword","null_value":"Anonymous"}}},"settings":{"index":{"creation_date":"1597960745381","number_of_shards":"1","number_of_replicas":"1","uuid":"QG7QFLspQzGPKgUlwsqb4Q","version":{"created":"7080099"},"provided_name":"example_index"}}}}
As you can see, the mapping gets updated with "null_value":"Anonymous" on the name property.
Here's the result on 7.9.0:
$ script/test_es_put_mapping 9234
Elasticversion:
"number" : "7.9.0",
Deleting example_index to have a clean slate...
{"acknowledged":true}
Creating example_index with mapping property name with no null_value parameter
{"acknowledged":true,"shards_acknowledged":true,"index":"example_index"}
The current index mapping is:
{"example_index":{"aliases":{},"mappings":{"dynamic":"strict","properties":{"id":{"type":"keyword"},"name":{"type":"keyword"}}},"settings":{"index":{"creation_date":"1597960791597","number_of_shards":"1","number_of_replicas":"1","uuid":"bClFvT8yRbSD8in6M1VKhg","version":{"created":"7090099"},"provided_name":"example_index"}}}}
Setting `null_value: Anonymous` on the `name` property
{"acknowledged":true}
The current index mapping is:
{"example_index":{"aliases":{},"mappings":{"dynamic":"strict","properties":{"id":{"type":"keyword"},"name":{"type":"keyword"}}},"settings":{"index":{"creation_date":"1597960791597","number_of_shards":"1","number_of_replicas":"1","uuid":"bClFvT8yRbSD8in6M1VKhg","version":{"created":"7090099"},"provided_name":"example_index"}}}}
As you can see, on 7.9, the PUT mapping call resulted in a successful {"acknowledged":true} response but it did not actually update the mapping.
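Because 7.9.0 returns {"acknowledged":true} without applying the change, a script cannot rely on the acknowledgement alone. Below is a minimal sketch of a defensive check, assuming a keyword field and the default compact (non-?pretty) JSON output. It sends the update, reads the field back with the get field mapping API (GET <index>/_mapping/field/<field>), and fails if null_value is missing. ES_URL and the function names are hypothetical, and the substring match is not a real JSON parser, so values that need JSON escaping are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: verify that a null_value mapping update really took
# effect, since 7.9.0 acknowledges the request but drops the change.
# ES_URL is an assumption; point it at the cluster under test.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print a PUT _mapping body that sets null_value on a keyword field.
# Assumes the field name and value need no JSON escaping.
null_value_mapping_body() {
  printf '{"properties":{"%s":{"type":"keyword","null_value":"%s"}}}' "$1" "$2"
}

# Succeed only if the mapping JSON in $1 contains "null_value":"<$2>".
# Plain substring match, so it assumes compact (non-?pretty) output.
mapping_has_null_value() {
  case "$1" in
    *"\"null_value\":\"$2\""*) return 0 ;;
    *) return 1 ;;
  esac
}

# Send the update, read the field mapping back, and return non-zero if the
# update was acknowledged but not applied. $1 = index, $2 = field, $3 = value.
set_null_value_checked() {
  local index="$1" field="$2" value="$3" current
  curl -sf -X PUT -H 'Content-Type: application/json' \
    -d "$(null_value_mapping_body "$field" "$value")" \
    "$ES_URL/$index/_mapping" > /dev/null || return 1
  current="$(curl -sf "$ES_URL/$index/_mapping/field/$field")" || return 1
  if ! mapping_has_null_value "$current" "$value"; then
    echo "null_value update on $index/$field was acknowledged but not applied" >&2
    return 1
  fi
}
```

After sourcing the file, ES_URL=http://localhost:9234 set_null_value_checked example_index name Anonymous would be expected to fail against the 7.9.0 node above and succeed against the 7.8 node.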
Pinging @elastic/es-search (:Search/Mapping)
Thanks for opening this @myronmarston.
It looks like I inadvertently removed the ability to update null_value in https://github.com/elastic/elasticsearch/pull/57666, and then in subsequent refactorings just assumed that it should not be updateable because it's an index-time setting and we don't have any tests that check for it. In the current branch you'll get an error if you try to update null_value, but in 7.9 it's silently ignored. I think we need to do two, possibly three things here:
a) a bugfix in the 7.9.x branch to restore the ability to update null_value on the keyword, number, ip, date, boolean, icu_keyword, scaled_float, flattened and wildcard field types
b) double-check in the 7.10 branch that these are all either updateable or that the fix in 7.9 forward ports correctly
c) consider whether to deprecate updating null_value at all and disallow it in 8.0: null_value is an index-time setting, and you'll get oddly inconsistent search results if you change it after you've indexed a bunch of documents.
We also need to improve our tests and documentation around this - I don't think we document anywhere whether or not a particular mapping parameter is updateable.
> I don't think we document anywhere whether or not a particular mapping parameter is updateable.
In at least some cases, the docs do state this. For example, norms:
> Norms can be disabled (but not reenabled after the fact), using the PUT mapping API like so:
And meta:
> Field metadata is updatable by submitting a mapping update. The metadata of the update will override the metadata of the existing field.
But yeah, it would be good for each mapping parameter to explicitly state if it is updatable or not. The null_value docs definitely don't say either way.
Thinking about this more: I think the fact that null_value can be updated at all is a bug. If you update it after documents have already been indexed then it's going to produce inconsistent results. So I think the actual change here ought to be that instead of silently ignoring the update, we throw an error - in 7.9.x, 7.10 and onwards. This is a breaking change, but I think it's a reasonable one, in that users shouldn't be relying on this behaviour in the first place and you can easily work around it by ensuring that the setting is present in your initial mappings.
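To make the index-time point concrete: null_value is substituted when a document is indexed, so changing it only affects documents indexed afterwards, while the _source of older documents still holds null. Here is a rough sketch of a demonstration, assuming a version that accepts the update (the 7.8 behavior from the report) and a throwaway index named null_value_demo (a hypothetical name). One document indexed before the change should match only the old value, and one indexed after should match only the new value.

```shell
#!/usr/bin/env bash
# Hypothetical demo: documents indexed before and after a null_value change
# are found under different substitute values, because null_value is applied
# at index time. Assumes a cluster at ES_URL that accepts the mapping update
# (as 7.8 did). Run it as: ./null_value_demo.sh run
ES_URL="${ES_URL:-http://localhost:9200}"
INDEX="null_value_demo"   # throwaway index name (assumption)

# Send a request; $1 = method, $2 = path, $3 = optional JSON body.
es() {
  if [ -n "${3:-}" ]; then
    curl -s -X "$1" -H 'Content-Type: application/json' -d "$3" "$ES_URL/$2"
  else
    curl -s -X "$1" "$ES_URL/$2"
  fi
  echo
}

# Body for a _count request with a term query on one field.
term_query_body() {
  printf '{"query":{"term":{"%s":"%s"}}}' "$1" "$2"
}

run_demo() {
  es DELETE "$INDEX?ignore_unavailable=true"
  es PUT "$INDEX" '{"mappings":{"properties":{"name":{"type":"keyword","null_value":"Anonymous"}}}}'
  # Indexed while null_value is "Anonymous": the term "Anonymous" is indexed.
  es PUT "$INDEX/_doc/1" '{"name":null}'
  # Change the substitute value for documents indexed from now on.
  es PUT "$INDEX/_mapping" '{"properties":{"name":{"type":"keyword","null_value":"Unknown"}}}'
  # Indexed after the change: the term "Unknown" is indexed.
  es PUT "$INDEX/_doc/2" '{"name":null}'
  es POST "$INDEX/_refresh"
  # If the update was applied, each count should be 1 rather than 2,
  # because document 1 was never re-indexed with the new value.
  es GET "$INDEX/_count" "$(term_query_body name Anonymous)"
  es GET "$INDEX/_count" "$(term_query_body name Unknown)"
}

if [ "${1:-}" = "run" ]; then
  run_demo
fi
```

On 7.9.0, where the update is silently dropped, both documents would presumably be counted under Anonymous instead.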
That's reasonable. It was unclear to me as a user whether null_value is applied at indexing time or at query time. It's easy to imagine an implementation that applies it at query time, in which case there wouldn't be a problem with changing it on an existing index. But given that it's applied at indexing time and won't work consistently when it's changed, disallowing the update is reasonable.
That said, it's conceivable that a user might be OK with older documents having a different null value than newly indexed ones, and may still want to change the null_value mapping parameter on an existing index. You mention that this can easily be worked around by ensuring the setting is present in the initial mappings, but if you change your mind about what you want the null value to be, there's no workaround. You just have to create a new index and reindex, which would be pretty annoying if you are OK with old and newly indexed documents having different null values.
Given that (plus the fact that it would be a breaking change to disallow it), would it make more sense to continue to maintain support for changing the null_value parameter, but document the behavior so users are aware of the inconsistency?
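For the reindex route mentioned above, here is a minimal sketch, assuming the application already reads and writes through an alias. The names people, people_v1 and people_v2 are hypothetical. It creates a new index whose mapping carries the desired null_value, copies the documents with the _reindex API, then moves the alias in a single _aliases request. Since _reindex indexes each document again from its _source, where the original null is still stored, every copied document should pick up the new null_value. Writes that reach the old index during the copy are not handled here.

```shell
#!/usr/bin/env bash
# Hypothetical workaround sketch: change null_value by reindexing into a new
# index and swapping an alias. All names below are assumptions.
# Run it as: ./reindex_null_value.sh run
ES_URL="${ES_URL:-http://localhost:9200}"
ALIAS="people"
OLD_INDEX="people_v1"
NEW_INDEX="people_v2"
NEW_NULL_VALUE="Unknown"

# Send a JSON request; $1 = method, $2 = path, $3 = body.
es() {
  curl -s -X "$1" -H 'Content-Type: application/json' -d "$3" "$ES_URL/$2"
  echo
}

# Mapping for the new index. Only "name" is shown; a real migration would
# copy the rest of the old index's mapping as well.
new_index_body() {
  printf '{"mappings":{"properties":{"name":{"type":"keyword","null_value":"%s"}}}}' "$1"
}

# _reindex body: copy every document from index $1 into index $2.
reindex_body() {
  printf '{"source":{"index":"%s"},"dest":{"index":"%s"}}' "$1" "$2"
}

# _aliases body: move alias $1 from index $2 to index $3 in one request.
alias_swap_body() {
  printf '{"actions":[{"remove":{"index":"%s","alias":"%s"}},{"add":{"index":"%s","alias":"%s"}}]}' \
    "$2" "$1" "$3" "$1"
}

migrate() {
  es PUT "$NEW_INDEX" "$(new_index_body "$NEW_NULL_VALUE")"
  # Block until the copy finishes, then refresh so the copies are searchable.
  es POST "_reindex?wait_for_completion=true&refresh=true" "$(reindex_body "$OLD_INDEX" "$NEW_INDEX")"
  # Both alias actions are applied together, so readers never see a gap.
  es POST "_aliases" "$(alias_swap_body "$ALIAS" "$OLD_INDEX" "$NEW_INDEX")"
}

if [ "${1:-}" = "run" ]; then
  migrate
fi
```

This is the "annoying" path described above: it is consistent, but it costs a full copy of the index.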