Elasticsearch version (bin/elasticsearch --version):
Version: 5.5.1, Build: 19c13d0/2017-07-18T20:44:24.823Z, JVM: 1.8.0_131
JVM version (java -version):
openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
OS version (uname -a if on a Unix-like system):
Linux zbook-15-G3 4.4.0-53-generic #74-Ubuntu SMP Fri Dec 2 15:59:10 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Description of the problem including expected versus actual behavior:
There are 3 Elasticsearch clusters, and cluster_1 is configured for cross-cluster search against cluster_2 and cluster_3.
When cluster_3 is down (cluster_1 and cluster_2 are up), a wildcard cross-cluster search returns a failure:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "connect_transport_exception",
        "reason" : "[][127.0.0.1:9303] connect_timeout[30s]"
      }
    ],
    "type" : "transport_exception",
    "reason" : "unable to communicate with remote cluster [cluster_3]",
    "caused_by" : {
      "type" : "connect_transport_exception",
      "reason" : "[][127.0.0.1:9303] connect_timeout[30s]",
      "caused_by" : {
        "type" : "annotated_connect_exception",
        "reason" : "Connection refused: /127.0.0.1:9303",
        "caused_by" : {
          "type" : "connect_exception",
          "reason" : "Connection refused"
        }
      }
    }
  },
  "status" : 500
}
I would expect the results from all clusters that are up to be returned.
Steps to reproduce:
curl -XPUT 'localhost:9202/test-2/data/1?pretty' -H 'Content-Type: application/json' -d'
{
"user" : "user2",
"message" : "test2"
}
'
curl -XPUT 'localhost:9203/test-3/data/1?pretty' -H 'Content-Type: application/json' -d'
{
"user" : "user3",
"message" : "test3"
}
'
curl -XPUT 'localhost:9200/_cluster/settings?pretty' -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "search": {
      "remote": {
        "cluster_2": {
          "seeds": [
            "127.0.0.1:9302"
          ]
        },
        "cluster_3": {
          "seeds": [
            "127.0.0.1:9303"
          ]
        }
      }
    }
  }
}
'
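Note: to actually hit the failure, shut down the cluster_3 node (e.g. kill it or Ctrl+C in its terminal) before running the wildcard search below. A quick check that it is really unreachable (assuming localhost:9203 is cluster_3's HTTP port, as in the indexing step above):
curl -sS 'localhost:9203/_cluster/health?pretty' || echo 'cluster_3 is down'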
curl -XPOST 'localhost:9200/*:test-*/data/_search?pretty' -H 'Content-Type: application/json' -d'
{
"query": {
"match_all": {}
}
}
'
Please also see console output from the test case:
test.console.txt
The same question was recently asked in our forums.
The problem you are facing is that none of the nodes are available in one of the remote clusters.
When doing a cross-cluster search, one node per remote cluster is initially contacted to find out where the shards for the request are located. If this step cannot be completed, the whole search request fails. If at least one node were available in each cluster, the search would proceed even if some other nodes, potentially ones holding relevant shards, were down; that would cause partial results rather than a total failure.
I don't see a way to work around this other than checking upfront which remote clusters are online and avoiding querying the ones that are completely down. In fact, cross-cluster search wasn't designed with the expectation that some of the queried clusters could be shut down completely. Not too sure whether we want to change this; @s1monw what do you think?
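For now, a client-side workaround along those lines could look like the sketch below: probe each remote cluster's HTTP endpoint first and only include the reachable ones in the cross-cluster index pattern. The cluster-name-to-port mapping is taken from the reproduction steps above and is purely illustrative; it also assumes at least one remote is reachable:
# probe each remote cluster over HTTP and keep only the ones that answer
indices=""
for entry in cluster_2:9202 cluster_3:9203; do
  name=${entry%%:*}; port=${entry##*:}
  if curl -sf "localhost:$port/_cluster/health" > /dev/null; then
    indices="$indices${indices:+,}$name:test-*"
  fi
done
# search only the reachable remote clusters from cluster_1
curl -XPOST "localhost:9200/$indices/data/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": { "match_all": {} }
}
'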
I think the problem is not as simple as it's presented here. What happens today if a cluster is not connected is that we try to reconnect. If we _check_ up-front whether a cluster is connected, do we then skip that step? What should the behavior be? I think we can look into something like an additional setting that simply treats a cluster as not present if it can't be reconnected, but I want to make sure that such a setting doesn't complicate the code too much. I think it would be confusing if we didn't fail here by default; it's really a different issue than an unavailable shard.
Another, maybe more consistent, option would be to skip those clusters when we resolve wildcards. We would still need to figure out when we try to reconnect to them, since we usually do that as part of the search request.
A use case would be giving a user access to disjoint data sets in different data centers.
For this use case we would like to be able to query the clusters that are reachable and ignore the rest.
Also, from a DevOps standpoint it would not be nice to have dependencies between data centers.
Automation can be done in many different ways: configuration-management tool runs (Chef, Ansible, etc.) and/or cloud or container-based deployments (OpenStack Heat stacks, Docker Swarm, etc.). It should be possible to start all Elasticsearch nodes local to a data center and have those nodes accept writes, local reads, and hopefully cross-cluster searches across all reachable clusters.
@acristu sorry for coming back to this so late. I took a closer look at it and my current plan is to add a setting, search.remote.$clustername.ignore_disconnected, that defaults to false and can be set per remote cluster you are connecting to. That way the default behavior stays unchanged, but it allows you to basically skip clusters entirely if they are optional. This also allows for mandatory clusters, which I personally see as a valid use case here as well. WDYT? /cc @javanna
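If such a setting were added, marking cluster_3 as optional might look like the request below. The setting name is the one proposed in this comment; it does not exist in 5.5.1, so this is a sketch of the proposal rather than a documented API:
curl -XPUT 'localhost:9200/_cluster/settings?pretty' -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "search": {
      "remote": {
        "cluster_3": {
          "ignore_disconnected": true
        }
      }
    }
  }
}
'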
@s1monw thank you, sounds good. This new setting will hopefully allow us to:
- have a way to check if the query results contain data from the "optional" remote clusters
@s1monw ++ I like the idea
I think we can do something about it. For instance, we can maybe even make this the default and add a _clusters part to the response, i.e.:
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"_clusters" : {
"total" : 2,
"successful": 1,
"skipped": 1
}
This way we can simply let the user decide what they want to do, and it's consistent with the _shards section. @javanna WDYT?
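If the response carried such a _clusters section, a caller could decide on its own how to treat partial results, for example (this assumes the proposed _clusters field and uses jq purely for illustration):
# warn when any remote cluster was skipped instead of failing the whole search
skipped=$(curl -s -XPOST 'localhost:9200/*:test-*/data/_search' \
  -H 'Content-Type: application/json' \
  -d '{"query":{"match_all":{}}}' | jq '._clusters.skipped')
if [ "$skipped" -gt 0 ]; then
  echo "warning: $skipped remote cluster(s) skipped, results may be partial"
fi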