Elasticsearch: [CI] MixedClusterTest are failing while waiting for a 4-node cluster to form

Created on 2 Nov 2017  路  16Comments  路  Source: elastic/elasticsearch

There currently are many instances of build failures where the Mixed Cluster Tests are failing because the test is waiting on a connection to check the cluster health but this fails with:

12:48:36 [ant:get] Error opening connection java.io.IOException: Server returned HTTP response code: 503 for URL: http://[::1]:40959/_cluster/health?wait_for_nodes=>=4&wait_for_status=yellow

Example for failures: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.0+java9-periodic/1185/consoleFull

12:47:52   Task has not declared any outputs.
12:48:06 [ant:echo] ==> [Thu Nov 02 12:48:06 UTC 2017] checking health: http://[::1]:40959/_cluster/health?wait_for_nodes=>=4&wait_for_status=yellow
12:48:06      [echo] ==> [Thu Nov 02 12:48:06 UTC 2017] checking health: http://[::1]:40959/_cluster/health?wait_for_nodes=>=4&wait_for_status=yellow
12:48:06 [ant:get] Getting: http://[::1]:40959/_cluster/health?wait_for_nodes=>=4&wait_for_status=yellow
12:48:06       [get] Getting: http://[::1]:40959/_cluster/health?wait_for_nodes=>=4&wait_for_status=yellow
12:48:06 [ant:get] To: /var/lib/jenkins/workspace/elastic+elasticsearch+6.0+java9-periodic/qa/mixed-cluster/build/cluster/v5.6.4-SNAPSHOT#mixedClusterTestCluster node0/cwd/wait.success
12:48:06       [get] To: /var/lib/jenkins/workspace/elastic+elasticsearch+6.0+java9-periodic/qa/mixed-cluster/build/cluster/v5.6.4-SNAPSHOT#mixedClusterTestCluster node0/cwd/wait.success
12:48:36 [ant:get] Error opening connection java.io.IOException: Server returned HTTP response code: 503 for URL: http://[::1]:40959/_cluster/health?wait_for_nodes=>=4&wait_for_status=yellow
12:48:36       [get] Error opening connection java.io.IOException: Server returned HTTP response code: 503 for URL: http://[::1]:40959/_cluster/health?wait_for_nodes=>=4&wait_for_status=yellow
12:48:36 [ant:get] Error opening connection java.io.IOException: Server returned HTTP response code: 503 for URL: http://[::1]:40959/_cluster/health?wait_for_nodes=>=4&wait_for_status=yellow
12:48:36       [get] Error opening connection java.io.IOException: Server returned HTTP response code: 503 for URL: http://[::1]:40959/_cluster/health?wait_for_nodes=>=4&wait_for_status=yellow
12:48:36 [ant:get] Error opening connection java.io.IOException: Server returned HTTP response code: 503 for URL: http://[::1]:40959/_cluster/health?wait_for_nodes=>=4&wait_for_status=yellow
12:48:36       [get] Error opening connection java.io.IOException: Server returned HTTP response code: 503 for URL: http://[::1]:40959/_cluster/health?wait_for_nodes=>=4&wait_for_status=yellow
12:48:36 [ant:get] Error opening connection java.io.IOException: Server returned HTTP response code: 503 for URL: http://[::1]:40959/_cluster/health?wait_for_nodes=>=4&wait_for_status=yellow
12:48:36       [get] Error opening connection java.io.IOException: Server returned HTTP response code: 503 for URL: http://[::1]:40959/_cluster/health?wait_for_nodes=>=4&wait_for_status=yellow
12:48:36 [ant:get] Error opening connection java.io.IOException: Server returned HTTP response code: 503 for URL: http://[::1]:40959/_cluster/health?wait_for_nodes=>=4&wait_for_status=yellow
12:48:36       [get] Error opening connection java.io.IOException: Server returned HTTP response code: 503 for URL: http://[::1]:40959/_cluster/health?wait_for_nodes=>=4&wait_for_status=yellow
12:48:36 [ant:get] Error opening connection java.io.IOException: Server returned HTTP response code: 503 for URL: http://[::1]:40959/_cluster/health?wait_for_nodes=>=4&wait_for_status=yellow
12:48:36       [get] Error opening connection java.io.IOException: Server returned HTTP response code: 503 for URL: http://[::1]:40959/_cluster/health?wait_for_nodes=>=4&wait_for_status=yellow
12:48:36 Node 0 output:
12:48:36 [ant:get] Error opening connection java.io.IOException: Server returned HTTP response code: 503 for URL: http://[::1]:40959/_cluster/health?wait_for_nodes=>=4&wait_for_status=yellow
12:48:36 |-----------------------------------------
12:48:36       [get] Error opening connection java.io.IOException: Server returned HTTP response code: 503 for URL: http://[::1]:40959/_cluster/health?wait_for_nodes=>=4&wait_for_status=yellow
12:48:36 [ant:get] Error opening connection java.io.IOException: Server returned HTTP response code: 503 for URL: http://[::1]:40959/_cluster/health?wait_for_nodes=>=4&wait_for_status=yellow
12:48:36       [get] Error opening connection java.io.IOException: Server returned HTTP response code: 503 for URL: http://[::1]:40959/_cluster/health?wait_for_nodes=>=4&wait_for_status=yellow
12:48:36 [ant:get] Error opening connection java.io.IOException: Server returned HTTP response code: 503 for URL: http://[::1]:40959/_cluster/health?wait_for_nodes=>=4&wait_for_status=yellow
12:48:36       [get] Error opening connection java.io.IOException: Server returned HTTP response code: 503 for URL: http://[::1]:40959/_cluster/health?wait_for_nodes=>=4&wait_for_status=yellow
12:48:36 [ant:get] Error opening connection java.io.IOException: Server returned HTTP response code: 503 for URL: http://[::1]:40959/_cluster/health?wait_for_nodes=>=4&wait_for_status=yellow
12:48:36 |  failure marker exists: false
12:48:36 |  pid file exists: true
12:48:36 |  http ports file exists: true
12:48:36       [get] Error opening connection java.io.IOException: Server returned HTTP response code: 503 for URL: http://[::1]:40959/_cluster/health?wait_for_nodes=>=4&wait_for_status=yellow
12:48:36 [ant:get] Can't get http://[::1]:40959/_cluster/health?wait_for_nodes=>=4&wait_for_status=yellow to /var/lib/jenkins/workspace/elastic+elasticsearch+6.0+java9-periodic/qa/mixed-cluster/build/cluster/v5.6.4-SNAPSHOT#mixedClusterTestCluster node0/cwd/wait.success
12:48:36       [get] Can't get http://[::1]:40959/_cluster/health?wait_for_nodes=>=4&wait_for_status=yellow to /var/lib/jenkins/workspace/elastic+elasticsearch+6.0+java9-periodic/qa/mixed-cluster/build/cluster/v5.6.4-SNAPSHOT#mixedClusterTestCluster node0/cwd/wait.success

The above failure also has:

12:48:36 |    [2017-11-02T12:48:36,421][WARN ][r.suppressed             ] path: /_cluster/health, params: {wait_for_status=yellow, wait_for_nodes=>=4}
12:48:36 |    org.elasticsearch.discovery.MasterNotDiscoveredException: null
12:48:36 |      at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$4.onTimeout(TransportMasterNodeAction.java:209) [elasticsearch-5.6.4-SNAPSHOT.jar:5.6.4-SNAPSHOT]
12:48:36 |      at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:311) [elasticsearch-5.6.4-SNAPSHOT.jar:5.6.4-SNAPSHOT]
12:48:36 |      at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:238) [elasticsearch-5.6.4-SNAPSHOT.jar:5.6.4-SNAPSHOT]
12:48:36 |      at org.elasticsearch.cluster.service.ClusterService$NotifyTimeout.run(ClusterService.java:1056) [elasticsearch-5.6.4-SNAPSHOT.jar:5.6.4-SNAPSHOT]
12:48:36 |      at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.6.4-SNAPSHOT.jar:5.6.4-SNAPSHOT]
12:48:36 |      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167) [?:?]
12:48:36 |      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641) [?:?]
12:48:36 |      at java.lang.Thread.run(Thread.java:844) [?:?]

In one of the node logs.

:DeliverBuild >test >test-failure Delivery

All 16 comments

This one on master looks very similar:

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-darwin-compatibility/1301/consoleFull

13:33:58 Executing task ':qa:mixed-cluster:v6.1.0-SNAPSHOT#mixedClusterTestCluster#wait' (up-to-date check took 0.0 secs) due to:
13:33:58   Task has not declared any outputs.
13:34:12 [ant:echo] ==> [Thu Nov 02 06:34:12 PDT 2017] checking health: http://[::1]:54323/_cluster/health?wait_for_nodes=>=4&wait_for_status=yellow
13:34:12      [echo] ==> [Thu Nov 02 06:34:12 PDT 2017] checking health: http://[::1]:54323/_cluster/health?wait_for_nodes=>=4&wait_for_status=yellow
13:34:12 [ant:get] Getting: http://[::1]:54323/_cluster/health?wait_for_nodes=>=4&wait_for_status=yellow
13:34:12       [get] Getting: http://[::1]:54323/_cluster/health?wait_for_nodes=>=4&wait_for_status=yellow
13:34:12 [ant:get] To: /private/var/lib/jenkins/workspace/elastic+elasticsearch+master+multijob-darwin-compatibility/qa/mixed-cluster/build/cluster/v6.1.0-SNAPSHOT#mixedClusterTestCluster node0/cwd/wait.success
13:34:12       [get] To: /private/var/lib/jenkins/workspace/elastic+elasticsearch+master+multijob-darwin-compatibility/qa/mixed-cluster/build/cluster/v6.1.0-SNAPSHOT#mixedClusterTestCluster node0/cwd/wait.success
13:34:42 [ant:get] Error opening connection java.io.IOException: Server returned HTTP response code: 503 for URL: http://[::1]:54323/_cluster/health?wait_for_nodes=>=4&wait_for_status=yellow
13:34:42       [get] Error opening connection java.io.IOException: Server returned HTTP response code: 503 for URL: http://[::1]:54323/_cluster/health?wait_for_nodes=>=4&wait_for_status=yellow
13:34:42 [ant:get] Error opening connection java.io.IOException: Server returned HTTP response code: 503 for URL: http://[::1]:54323/_cluster/health?wait_for_nodes=>=4&wait_for_status=yellow
13:34:42       [get] Error opening connection java.io.IOException: Server returned HTTP response code: 503 for URL: http://[::1]:54323/_cluster/health?wait_for_nodes=>=4&wait_for_status=yellow
13:34:42 [ant:get] Error opening connection java.io.IOException: Server returned HTTP response code: 503 for URL: http://[::1]:54323/_cluster/health?wait_for_nodes=>=4&wait_for_status=yellow
...
13:34:42 Node 0 output:
[...]
13:34:42 |-----------------------------------------
13:34:42 |  failure marker exists: false
13:34:42 |  pid file exists: true
13:34:42 |  http ports file exists: true
13:34:42 |  transport ports file exists: true
13:34:42 |
13:34:42 |  [ant output]
13:34:42 |
13:34:42 |  [log]
13:34:42 |    warning: ignoring JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF8
13:34:42 |    [2017-11-02T06:33:52,707][INFO ][o.e.n.Node               ] [node-0] initializing ...
13:34:42 |    [2017-11-02T06:33:52,785][INFO ][o.e.e.NodeEnvironment    ] [node-0] using [1] data paths, mounts [[/ (/dev/disk0s2)]], net usable_space [857.4gb], net total_space [930.5gb], types [hfs]
13:34:42 |    [2017-11-02T06:33:52,785][INFO ][o.e.e.NodeEnvironment    ] [node-0] heap size [494.9mb], compressed ordinary object pointers [true]
13:34:42 |    [2017-11-02T06:33:52,787][INFO ][o.e.n.Node               ] [node-0] node name [node-0], node ID [MVV26-YjRoeo3tgRJ-vBTA]
13:34:42 |    [2017-11-02T06:33:52,788][INFO ][o.e.n.Node               ] [node-0] version[6.1.0-SNAPSHOT], pid[29965], build[f928d61/2017-11-02T13:31:46.401Z], OS[Mac OS X/10.12.6/x86_64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_141/25.141-b15]
13:34:42 |    [2017-11-02T06:33:52,788][INFO ][o.e.n.Node               ] [node-0] JVM arguments [-Xms1g, -Xmx1g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -XX:+HeapDumpOnOutOfMemoryError, -Xms512m, -Xmx512m, -ea, -esa, -Des.path.home=/private/var/lib/jenkins/workspace/elastic+elasticsearch+master+multijob-darwin-compatibility/qa/mixed-cluster/build/cluster/v6.1.0-SNAPSHOT#mixedClusterTestCluster node0/elasticsearch-6.1.0-SNAPSHOT, -Des.path.conf=/private/var/lib/jenkins/workspace/elastic+elasticsearch+master+multijob-darwin-compatibility/qa/mixed-cluster/build/cluster/v6.1.0-SNAPSHOT#mixedClusterTestCluster node0/elasticsearch-6.1.0-SNAPSHOT/config]
13:34:42 |    [2017-11-02T06:33:52,788][WARN ][o.e.n.Node               ] [node-0] version [6.1.0-SNAPSHOT] is a pre-release version of Elasticsearch and is not suitable for production
13:34:42 |    [2017-11-02T06:33:53,847][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [aggs-matrix-stats]
13:34:42 |    [2017-11-02T06:33:53,847][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [analysis-common]
13:34:42 |    [2017-11-02T06:33:53,847][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [ingest-common]
13:34:42 |    [2017-11-02T06:33:53,847][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [lang-expression]
13:34:42 |    [2017-11-02T06:33:53,848][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [lang-mustache]
13:34:42 |    [2017-11-02T06:33:53,848][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [lang-painless]
13:34:42 |    [2017-11-02T06:33:53,848][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [mapper-extras]
13:34:42 |    [2017-11-02T06:33:53,848][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [parent-join]
13:34:42 |    [2017-11-02T06:33:53,848][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [percolator]
13:34:42 |    [2017-11-02T06:33:53,848][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [reindex]
13:34:42 |    [2017-11-02T06:33:53,848][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [repository-url]
13:34:42 |    [2017-11-02T06:33:53,848][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [transport-netty4]
13:34:42 |    [2017-11-02T06:33:53,848][INFO ][o.e.p.PluginsService     ] [node-0] loaded module [tribe]
13:34:42 |    [2017-11-02T06:33:53,849][INFO ][o.e.p.PluginsService     ] [node-0] no plugins loaded
13:34:42 |    [2017-11-02T06:33:55,503][INFO ][o.e.d.DiscoveryModule    ] [node-0] using discovery type [zen]
13:34:42 |    [2017-11-02T06:33:56,141][INFO ][o.e.n.Node               ] [node-0] initialized
13:34:42 |    [2017-11-02T06:33:56,141][INFO ][o.e.n.Node               ] [node-0] starting ...
13:34:42 |    [2017-11-02T06:33:56,316][INFO ][o.e.t.TransportService   ] [node-0] publish_address {127.0.0.1:54321}, bound_addresses {[fe80::1]:54319}, {[::1]:54320}, {127.0.0.1:54321}
13:34:42 |    [2017-11-02T06:33:56,347][INFO ][o.e.h.n.Netty4HttpServerTransport] [node-0] publish_address {127.0.0.1:54324}, bound_addresses {[fe80::1]:54322}, {[::1]:54323}, {127.0.0.1:54324}
13:34:42 |    [2017-11-02T06:33:56,350][INFO ][o.e.n.Node               ] [node-0] started
13:34:42 |    [2017-11-02T06:33:59,358][WARN ][o.e.d.z.ZenDiscovery     ] [node-0] not enough master nodes discovered during pinging (found [[Candidate{node={node-0}{MVV26-YjRoeo3tgRJ-vBTA}{EfKwddXlRR20DNZwVtHdpA}{127.0.0.1}{127.0.0.1:54321}{testattr=test}, clusterStateVersion=-1}]], but needed [3]), pinging again
13:34:42 |    [2017-11-02T06:34:02,423][WARN ][o.e.d.z.ZenDiscovery     ] [node-0] not enough master nodes discovered during pinging (found [[Candidate{node={node-0}{MVV26-YjRoeo3tgRJ-vBTA}{EfKwddXlRR20DNZwVtHdpA}{127.0.0.1}{127.0.0.1:54321}{testattr=test}, clusterStateVersion=-1}]], but needed [3]), pinging again
13:34:42 |    [2017-11-02T06:34:05,441][WARN ][o.e.d.z.ZenDiscovery     ] [node-0] not enough master nodes discovered during pinging (found [[Candidate{node={node-0}{MVV26-YjRoeo3tgRJ-vBTA}{EfKwddXlRR20DNZwVtHdpA}{127.0.0.1}{127.0.0.1:54321}{testattr=test}, clusterStateVersion=-1}]], but needed [3]), pinging again
13:34:42 |    [2017-11-02T06:34:08,499][WARN ][o.e.d.z.ZenDiscovery     ] [node-0] not enough master nodes discovered during pinging (found [[Candidate{node={node-0}{MVV26-YjRoeo3tgRJ-vBTA}{EfKwddXlRR20DNZwVtHdpA}{127.0.0.1}{127.0.0.1:54321}{testattr=test}, clusterStateVersion=-1}]], but needed [3]), pinging again
13:34:42 |    [2017-11-02T06:34:12,341][DEBUG][o.e.a.a.c.h.TransportClusterHealthAction] [node-0] no known master node, scheduling a retry
13:34:42 |    [2017-11-02T06:34:42,403][DEBUG][o.e.a.a.c.h.TransportClusterHealthAction] [node-0] timed out while retrying [cluster:monitor/health] after failure (timeout [30s])
13:34:42 |    [2017-11-02T06:34:42,418][WARN ][r.suppressed             ] path: /_cluster/health, params: {wait_for_status=yellow, wait_for_nodes=>=4}
13:34:42 |    org.elasticsearch.discovery.MasterNotDiscoveredException: null
13:34:42 |      at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$4.onTimeout(TransportMasterNodeAction.java:209) [elasticsearch-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
13:34:42 |      at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:317) [elasticsearch-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
13:34:42 |      at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:244) [elasticsearch-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
13:34:42 |      at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:578) [elasticsearch-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
13:34:42 |      at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:568) [elasticsearch-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
13:34:42 |      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_141]
13:34:42 |      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_141]
13:34:42 |      at java.lang.Thread.run(Thread.java:748) [?:1.8.0_141]

And another one on master: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-unix-compatibility/os=ubuntu/1665/console

This one has HTTP 408 errors instead the 503 from above, and I cannot find MasterNotDiscoveredException, but the rest looks very similar.

13:26:08 [ant:get] Error opening connection java.io.IOException: Server returned HTTP response code: 408 for URL: http://[::1]:35632/_cluster/health?wait_for_nodes=>=4&wait_for_status=yellow

This has been failing for quite a while now.
The issue here is that we have a 4 node cluster. minimum_master_nodes is 3 and the first node of the cluster (node-0) is the only host on the unicast.hosts list.
I'm not sure why, but node-1, node-2 and node-3, learning about each other through node-0, form a cluster. Once they've formed a cluster, they stop pinging node-0, which has now no way of getting to know about the other nodes. The test fails while waiting for a 4-node cluster to form.
We could either set min-master-nodes to 4, or extend the unicast hosts lists with more than just the first node.

@rjernst Would you be able to look at this and see if we can change something in the build, or reassign to anybody who might be better to move this forward?

':qa:mixed-cluster:v5.6.4-SNAPSHOT#mixedClusterTestCluster#wait'
HTTP response code: 503 for URL: http://127.0.0.1:53945/_cluster/health?wait_for_nodes=>=4&wait_for_status=yellow
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.0+multijob-windows-compatibility/322/console

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+intake/753/ looks related, but different. In that run, nodes 0, 2 and 3 formed a cluster and node 1 was left out in the cold. Discussed this with @ywelsch and I think we'll try increasing the logging level, as at the moment there's not much to go on.

I've pushed e04e5ab037f6598521cd3707cb57a26a85f364d8

Meanwhile, I managed to reproduce locally using

while gradle :qa:mixed-cluster:v6.1.0-SNAPSHOT#mixedClusterTestRunner -Dtests.seed=516D102959ABE203 -Dtests.class=org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT -Dtests.method="test {p0=indices.shrink/20_source_mapping/Shrink index ignores target template mapping}" -Dtests.security.manager=true -Dtests.locale=et -Dtests.timezone=Asia/Beirut ; do :; done

The failure, like in https://github.com/elastic/elasticsearch/issues/27233#issuecomment-343154332, was that nodes 0, 2 and 3 formed a cluster and node 1 was left out. Here is the tail of node 1's log:

[2017-11-09T14:09:44,243][TRACE][o.e.d.z.UnicastZenPing   ] [node-1] [1] sending to {#zen_unicast_[::1]:63689_0#}{j39CE0vISIeTHh2XbSU3xw}{0:0:0:0:0:0:0:1}{[::1]:63689}
[2017-11-09T14:09:44,251][TRACE][o.e.d.z.UnicastZenPing   ] [node-1] [1] received response from {#zen_unicast_[::1]:63689_0#}{j39CE0vISIeTHh2XbSU3xw}{0:0:0:0:0:0:0:1}{[::1]:63689}: [ping_response{node [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}], id[3], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}], id[5], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}], id[8], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}], id[10], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}], id[12], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}], id[15], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}], id[1], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}], id[17], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}]
[2017-11-09T14:09:45,077][TRACE][o.e.d.z.UnicastZenPing   ] [node-1] [1] sending to {#zen_unicast_[::1]:63689_0#}{j39CE0vISIeTHh2XbSU3xw}{0:0:0:0:0:0:0:1}{[::1]:63689}
[2017-11-09T14:09:45,077][TRACE][o.e.d.z.UnicastZenPing   ] [node-1] [1] opening connection to [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}]
[2017-11-09T14:09:45,077][TRACE][o.e.d.z.UnicastZenPing   ] [node-1] [1] sending to {node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}
[2017-11-09T14:09:45,080][TRACE][o.e.d.z.UnicastZenPing   ] [node-1] [1] received response from {node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}: [ping_response{node [{node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}], id[1], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}], id[18], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}], id[4], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}], id[5], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}]
[2017-11-09T14:09:45,083][TRACE][o.e.d.z.UnicastZenPing   ] [node-1] [1] received response from {#zen_unicast_[::1]:63689_0#}{j39CE0vISIeTHh2XbSU3xw}{0:0:0:0:0:0:0:1}{[::1]:63689}: [ping_response{node [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}], id[5], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}], id[8], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}], id[10], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}], id[12], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}], id[15], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}], id[1], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}], id[18], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}], id[4], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}], id[20], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}]
[2017-11-09T14:09:45,131][TRACE][o.e.d.z.UnicastZenPing   ] [node-1] [1] sending to {node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}
[2017-11-09T14:09:45,133][TRACE][o.e.d.z.UnicastZenPing   ] [node-1] [1] received response from {node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}: [ping_response{node [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}], id[5], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}], id[8], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}], id[10], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}], id[12], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}], id[15], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}], id[1], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}], id[18], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}], id[4], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}], id[4], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}], id[21], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}]
[2017-11-09T14:09:46,078][TRACE][o.e.d.z.UnicastZenPing   ] [node-1] [1] sending to {node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}
[2017-11-09T14:09:46,078][TRACE][o.e.d.z.UnicastZenPing   ] [node-1] [1] sending to {#zen_unicast_[::1]:63689_0#}{j39CE0vISIeTHh2XbSU3xw}{0:0:0:0:0:0:0:1}{[::1]:63689}
[2017-11-09T14:09:46,079][TRACE][o.e.d.z.UnicastZenPing   ] [node-1] [1] sending to {node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}
[2017-11-09T14:09:46,079][TRACE][o.e.d.z.UnicastZenPing   ] [node-1] [1] received response from {node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}: [ping_response{node [{node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}], id[1], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}], id[18], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}], id[4], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}], id[23], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}], id[7], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}], id[8], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}]
[2017-11-09T14:09:46,093][TRACE][o.e.d.z.UnicastZenPing   ] [node-1] [1] received response from {#zen_unicast_[::1]:63689_0#}{j39CE0vISIeTHh2XbSU3xw}{0:0:0:0:0:0:0:1}{[::1]:63689}: [ping_response{node [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}], id[8], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}], id[10], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}], id[12], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}], id[15], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}], id[1], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}], id[18], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}], id[4], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}], id[4], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-2}{OJm_zDfrTueCYDF95mGUKw}{fisftQAhQb-qeVStt3vfMQ}{127.0.0.1}{127.0.0.1:63721}{testattr=test}], id[1], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}], id[23], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}], id[7], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}], id[7], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}], id[26], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}]
[2017-11-09T14:09:46,093][TRACE][o.e.d.z.UnicastZenPing   ] [node-1] [1] received response from {node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}: [ping_response{node [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}], id[8], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}], id[10], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}], id[12], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}], id[15], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}], id[1], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}], id[18], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}], id[4], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}], id[4], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-2}{OJm_zDfrTueCYDF95mGUKw}{fisftQAhQb-qeVStt3vfMQ}{127.0.0.1}{127.0.0.1:63721}{testattr=test}], id[1], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}], id[23], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}], id[7], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}], id[25], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}]
[2017-11-09T14:09:47,079][TRACE][o.e.d.z.ZenDiscovery     ] [node-1] full ping responses:
        --> ping_response{node [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}], id[26], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}
        --> ping_response{node [{node-2}{OJm_zDfrTueCYDF95mGUKw}{fisftQAhQb-qeVStt3vfMQ}{127.0.0.1}{127.0.0.1:63721}{testattr=test}], id[1], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}
[2017-11-09T14:09:47,081][DEBUG][o.e.d.z.ZenDiscovery     ] [node-1] filtered ping responses: (ignore_non_masters [false])
        --> ping_response{node [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}], id[26], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}
        --> ping_response{node [{node-2}{OJm_zDfrTueCYDF95mGUKw}{fisftQAhQb-qeVStt3vfMQ}{127.0.0.1}{127.0.0.1:63721}{testattr=test}], id[1], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}
        --> ping_response{node [{node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}], id[9], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}
[2017-11-09T14:09:47,082][TRACE][o.e.d.z.ZenDiscovery     ] [node-1] candidate Candidate{node={node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}, clusterStateVersion=-1} won election
[2017-11-09T14:09:47,082][DEBUG][o.e.d.z.ZenDiscovery     ] [node-1] elected as master, waiting for incoming joins ([2] needed)
[2017-11-09T14:09:47,083][TRACE][o.e.d.z.NodeJoinController] [node-1] not enough joins for election. Got [0], required [2]
[2017-11-09T14:10:17,086][TRACE][o.e.d.z.NodeJoinController] [node-1] timed out waiting to be elected. waited [30s]. pending master node joins [0]
[2017-11-09T14:10:17,092][TRACE][o.e.d.z.NodeJoinController] [node-1] stopping election ([timed out waiting to be elected])
[2017-11-09T14:10:17,099][DEBUG][o.e.c.s.MasterService    ] [node-1] processing [zen-disco-election-stop [timed out waiting to be elected]]: execute
[2017-11-09T14:10:17,102][TRACE][o.e.d.z.ZenDiscovery     ] [node-1] failed while waiting for nodes to join, rejoining
org.elasticsearch.cluster.NotMasterException: election stopped [zen-disco-election-stop [timed out waiting to be elected]]
[2017-11-09T14:10:17,104][DEBUG][o.e.c.s.MasterService    ] [node-1] processing [zen-disco-election-stop [timed out waiting to be elected]]: took [4ms] no change in cluster state
[2017-11-09T14:10:17,104][TRACE][o.e.d.z.NodeJoinController] [node-1] starting an election context, will accumulate joins
[2017-11-09T14:10:17,104][TRACE][o.e.d.z.ZenDiscovery     ] [node-1] starting to ping
[2017-11-09T14:10:17,105][TRACE][o.e.d.z.UnicastZenPing   ] [node-1] resolved host [[::1]:63689] to [[::1]:63689]
[2017-11-09T14:10:17,105][TRACE][o.e.d.z.UnicastZenPing   ] [node-1] [2] sending to {node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}
[2017-11-09T14:10:17,105][TRACE][o.e.d.z.UnicastZenPing   ] [node-1] [2] opening connection to [{#zen_unicast_[::1]:63689_0#}{ZyrLye1YRvKyE6Yhlj0cgg}{0:0:0:0:0:0:0:1}{[::1]:63689}]
[2017-11-09T14:10:17,105][TRACE][o.e.d.z.UnicastZenPing   ] [node-1] [2] received response from {node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}: [ping_response{node [{node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}], id[10], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}], id[11], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}]
[2017-11-09T14:10:17,120][TRACE][o.e.d.z.UnicastZenPing   ] [node-1] [2] sending to {#zen_unicast_[::1]:63689_0#}{ZyrLye1YRvKyE6Yhlj0cgg}{0:0:0:0:0:0:0:1}{[::1]:63689}
[2017-11-09T14:10:17,121][TRACE][o.e.d.z.UnicastZenPing   ] [node-1] [2] received response from {#zen_unicast_[::1]:63689_0#}{ZyrLye1YRvKyE6Yhlj0cgg}{0:0:0:0:0:0:0:1}{[::1]:63689}: [ping_response{node [{node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}], id[10], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}], id[35], master [{node-3}{0DHD1bJzSpKIJTWbKoyO-g}{-vvW1WIfTyC5L-QO2p_uWQ}{127.0.0.1}{127.0.0.1:63731}{testattr=test}],cluster_state_version [2], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}]
[2017-11-09T14:10:18,106][TRACE][o.e.d.z.UnicastZenPing   ] [node-1] [2] sending to {node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}
[2017-11-09T14:10:18,106][TRACE][o.e.d.z.UnicastZenPing   ] [node-1] [2] sending to {#zen_unicast_[::1]:63689_0#}{ZyrLye1YRvKyE6Yhlj0cgg}{0:0:0:0:0:0:0:1}{[::1]:63689}
[2017-11-09T14:10:18,106][TRACE][o.e.d.z.UnicastZenPing   ] [node-1] [2] received response from {node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}: [ping_response{node [{node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}], id[10], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}], id[12], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}], id[13], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}]
[2017-11-09T14:10:18,108][TRACE][o.e.d.z.UnicastZenPing   ] [node-1] [2] received response from {#zen_unicast_[::1]:63689_0#}{ZyrLye1YRvKyE6Yhlj0cgg}{0:0:0:0:0:0:0:1}{[::1]:63689}: [ping_response{node [{node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}], id[10], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}], id[12], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}], id[36], master [{node-3}{0DHD1bJzSpKIJTWbKoyO-g}{-vvW1WIfTyC5L-QO2p_uWQ}{127.0.0.1}{127.0.0.1:63731}{testattr=test}],cluster_state_version [2], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}]
[2017-11-09T14:10:19,105][TRACE][o.e.d.z.UnicastZenPing   ] [node-1] [2] sending to {node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}
[2017-11-09T14:10:19,105][TRACE][o.e.d.z.UnicastZenPing   ] [node-1] [2] sending to {#zen_unicast_[::1]:63689_0#}{ZyrLye1YRvKyE6Yhlj0cgg}{0:0:0:0:0:0:0:1}{[::1]:63689}
[2017-11-09T14:10:19,106][TRACE][o.e.d.z.UnicastZenPing   ] [node-1] [2] received response from {node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}: [ping_response{node [{node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}], id[10], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}], id[12], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}], id[14], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}], id[15], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}]
[2017-11-09T14:10:19,107][TRACE][o.e.d.z.UnicastZenPing   ] [node-1] [2] received response from {#zen_unicast_[::1]:63689_0#}{ZyrLye1YRvKyE6Yhlj0cgg}{0:0:0:0:0:0:0:1}{[::1]:63689}: [ping_response{node [{node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}], id[10], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}], id[12], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-1}{3pVseLUbRUe1q83D8CUIjg}{U_xJZxXER229ZSWkoPc-Mg}{127.0.0.1}{127.0.0.1:63710}{testattr=test}], id[14], master [null],cluster_state_version [-1], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}, ping_response{node [{node-0}{lKGBv8BlQc2Xxf0ByfEXgw}{5Y6tZSVaQOOGiPGe_FEZ-A}{127.0.0.1}{127.0.0.1:63690}{testattr=test}], id[37], master [{node-3}{0DHD1bJzSpKIJTWbKoyO-g}{-vvW1WIfTyC5L-QO2p_uWQ}{127.0.0.1}{127.0.0.1:63731}{testattr=test}],cluster_state_version [2], cluster_name[qa_mixed-cluster_v6.1.0-SNAPSHOT#mixedClusterTestCluster]}]

So node 1 received pings from nodes 0 and 2 and hence started an election. Meanwhile node 3 also started an election with nodes 0 and 2 and won the race, so they do not vote for node 1, so after 30 seconds it times out...

[2017-11-09T14:10:17,086][TRACE][o.e.d.z.NodeJoinController] [node-1] timed out waiting to be elected. waited [30s]. pending master node joins [0]

... and starts another round of pinging. If this pinging had succeeded then everything would have been fine, but the test only waits for 40 seconds for the cluster to start, and two rounds of pinging plus 30 seconds plus various other overheads manages to exceed the 40-second test timeout.

Two simple resolutions would be to increase the 40-second timeout in the test or to reduce discovery.zen.master_election.wait_for_joins_timeout to something much less than 30 seconds. I don't especially like either: bumping the 40 seconds will increase the test runtime accordingly, and reducing the timeout takes us further away from a "standard" configuration, but of the two I think I prefer the latter. @ywelsch just opened https://github.com/elastic/elasticsearch/pull/27332 to do this.

A further possibility would be to stagger the startup of the nodes enough to reduce the frequency with which this race occurs.

@albertzaharovits I'm not sure about https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.0+multijob-windows-compatibility/322/console - that just looks like a complete failure to form a cluster. Nodes 0 and 1 managed to find each other, but 2 and 3 never got in touch with anyone else. Could also be timeout related, if the CI machine was a slow one.

I've pushed #27344 which should fix the test failures seen here. I would prefer to set minimum_master_nodes to a more realistic value in the REST tests, but that's going to require more work.

@ywelsch it looks like it failed again in https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.3+intake/131/console in a very similair way. Could you take a look?

That's a completely different test suite. Also, just because cluster formation did not work, does not mean it's the same failure, there could be N many reasons :)
Can you open a new issue and assign @nik9000. He worked last on this.

Was this page helpful?
0 / 5 - 0 ratings