Elasticsearch version (bin/elasticsearch --version):
"version" : {
"number" : "6.2.3",
"build_hash" : "c59ff00",
"build_date" : "2018-03-13T10:06:29.741383Z",
"build_snapshot" : false,
"lucene_version" : "7.2.1",
"minimum_wire_compatibility_version" : "5.6.0",
"minimum_index_compatibility_version" : "5.0.0"
}
Plugins installed:
repository-s3 6.2.3
x-pack-core 6.2.3
x-pack-deprecation 6.2.3
x-pack-graph 6.2.3
x-pack-logstash 6.2.3
x-pack-ml 6.2.3
x-pack-monitoring 6.2.3
x-pack-security 6.2.3
x-pack-upgrade 6.2.3
x-pack-watcher 6.2.3
JVM version (java -version):
java version "1.8.0_181"
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)
OS version (uname -a if on a Unix-like system):
Linux 4.4.0-1060-aws #69-Ubuntu SMP Sun May 20 13:42:07 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Description of the problem including expected versus actual behavior:
The index.number_of_routing_shards value is not visible in the index settings after index creation.
If you add include_defaults, it always shows a value of 5.
The expected result is that the defined index.number_of_routing_shards should show up under http://localhost:9200/<index>/_settings
Steps to reproduce:
Create an index with index.number_of_routing_shards set to something other than 5.
Look up the settings for the new index to confirm it is set to the value on creation.
To verify this is only a visual issue, try to split the newly created index.
Lock index for writes
Split index to a new index.
Please include a minimal but complete recreation of the problem, including
(e.g.) index creation, mappings, settings, query etc. The easier you make it for
us to reproduce it, the more likely that somebody will take the time to look at it.
Create Index:
$ curl -s -X PUT "http://localhost:9200/split_test_1/" -H 'Content-Type: application/json' -d '{ "settings": { "index.number_of_shards" : "10", "index.number_of_routing_shards" : "20" } }'
{"acknowledged":true,"shards_acknowledged":true,"index":"split_test_1"}
Verify that the number_of_routing_shards setting on the index is set to 20 (note: it does not show up in the settings, so I am adding include_defaults to the _settings call):
curl -s 'http://localhost:9200/split_test_1/_settings?include_defaults&pretty' | grep number_of_routing_shards
"number_of_routing_shards" : "5",
Block writes to index
curl -s -X PUT "http://localhost:9200/split_test_1/_settings" -H 'Content-Type: application/json' -d '{ "settings": { "index.blocks.write" : true } }'
{"acknowledged":true}
curl -s -X POST "http://localhost:9200/split_test_1/_split/split_test_2/" -H 'Content-Type: application/json' -d '{ "settings": { "index.number_of_shards" : 20 } }'
Provide logs (if relevant):
The index.number_of_routing_shards setting is an unusual setting in that it is consumed once at index-creation time and is not registered like a regular index setting.
This is why the setting is not visible in the get index settings API.
Will this always be this way? It makes it challenging to confirm that our template has applied the correct setting if we cannot query that setting via the API.
@tmortensen Heya, I am so sorry for the lack of clarity. The fact that I left the issue open means that I think we should do something about this, yet it is not clear to me exactly what that is. We will expose this because I agree with you that it is important.
Pinging @elastic/es-core-infra
I found that calling the _split API with an incorrect number of destination shards returns the number_of_routing_shards value in the error message. Far from foolproof, but it seems to be the only workaround I could come up with.
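To sketch that workaround against the reproduction index from this issue (the target index name split_test_err is hypothetical, and the exact error wording may differ by version): request a split into a shard count that number_of_routing_shards cannot be evenly divided by, and the rejection message reports the routing-shard count.

```shell
# 30 is a multiple of the 10 source shards, but 20 routing shards cannot be
# split across 30 targets, so the request is rejected; the error body is
# expected to mention the number of routing shards (20) for this index.
curl -s -X POST "http://localhost:9200/split_test_1/_split/split_test_err" \
  -H 'Content-Type: application/json' \
  -d '{ "settings": { "index.number_of_shards": 30 } }'
```

This is read-only in effect (no target index is created on failure), so it is safe to probe with, but it remains a workaround rather than a supported way to read the setting.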
We restructure (group and partition etc.) our data before upload to optimize bulk uploads and to control/throttle them. We restructure it according to where the data will eventually land, and for some cases we use the routing-to-shard mapping for this. Currently, due to the lack of access to number_of_routing_shards, we resort to using the _search_shards API to get this mapping instead of using the OperationRouting facility to generate it offline. This becomes costly when we are not using explicit routing and the document id is used as the routing value: we have to compute this for every document instead of for a limited set of explicit routing values.
It would be nice to either have the routing-to-shard mapping algorithm made public (which it essentially is, being a simple hash-and-mod computation), or provide access to number_of_routing_shards so we can compute it ourselves using the same method that OperationRouting uses.
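The hash-and-mod computation mentioned above can be sketched offline once number_of_routing_shards is known. A minimal sketch, assuming the shard-id formula used by OperationRouting in 6.x (hash the routing value, mod by the routing-shard count, divide by the routing factor); note that Elasticsearch uses a Murmur3 hash of the routing value, and crc32 below is only a dependency-free stand-in, so real shard assignments will differ unless the same Murmur3 variant is swapped in:

```python
import zlib


def shard_id(routing: str, num_shards: int, num_routing_shards: int) -> int:
    """Sketch of the routing-to-shard computation (6.x OperationRouting shape)."""
    if num_routing_shards % num_shards != 0:
        raise ValueError("number_of_routing_shards must be a multiple of number_of_shards")
    # routing_factor maps the routing-shard space down onto the actual shards
    routing_factor = num_routing_shards // num_shards
    # STAND-IN hash: Elasticsearch actually uses Murmur3 (Murmur3HashFunction)
    h = zlib.crc32(routing.encode("utf-8"))
    return (h % num_routing_shards) // routing_factor


# With the settings from the reproduction (10 shards, 20 routing shards),
# every routing value must land on a shard in [0, 10).
for routing in ("doc-1", "doc-2", "doc-3"):
    assert 0 <= shard_id(routing, num_shards=10, num_routing_shards=20) < 10
```

This illustrates why exposing number_of_routing_shards matters for the use case above: the formula itself is trivial, but it is unusable offline without that one value.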
I can elaborate more on why we do the restructuring, but essentially it is to ensure that we impose a predictable indexing load on each node in the cluster, which is hard to control with randomly distributed data that sprays index writes all over the cluster. Our intent is basically to control how many concurrent writes we issue to each node, and we do this by using shard-node allocation information and grouping our data accordingly.