Elasticsearch: Error in cluster deployment: failed to send join request to master

Created on 20 Jun 2017 · 12 comments · Source: elastic/elasticsearch

I have three virtual machines (192.168.245.128, 192.168.245.129, 192.168.245.130), each with ES 5.1.2 installed. While configuring the cluster environment I ran into errors, and the errors on the three machines are similar.

The three machines can ping each other, and I can telnet between them.
The following information is displayed for each machine.

192.168.245.128

[2017-06-20T10:04:25,434][INFO ][o.e.t.TransportService   ] [node-1] publish_address {192.168.245.128:9300}, bound_addresses {192.168.245.128:9300}
[2017-06-20T10:04:25,440][INFO ][o.e.b.BootstrapCheck     ] [node-1] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-06-20T10:04:33,811][INFO ][o.e.d.z.ZenDiscovery     ] [node-1] failed to send join request to master [{node-2}{X-m7gPTMQn2TsdlByavfEg}{S2ucttQDSXCqLjcyi7wjKA}{192.168.245.129}{192.168.245.129:9300}], reason [RemoteTransportException[[node-2][192.168.245.129:9300][internal:discovery/zen/join]]; nested: NotMasterException[Node [{node-2}{X-m7gPTMQn2TsdlByavfEg}{S2ucttQDSXCqLjcyi7wjKA}{192.168.245.129}{192.168.245.129:9300}] not master for join request]; ], tried [3] times

192.168.245.129

[2017-06-20T10:04:30,429][INFO ][o.e.t.TransportService   ] [node-2] publish_address {192.168.245.129:9300}, bound_addresses {192.168.245.129:9300}
[2017-06-20T10:04:30,435][INFO ][o.e.b.BootstrapCheck     ] [node-2] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-06-20T10:04:33,813][INFO ][o.e.d.z.ZenDiscovery     ] [node-2] failed to send join request to master [{node-1}{X-m7gPTMQn2TsdlByavfEg}{YFmPk0dUTh-S1Ef9of5RTA}{192.168.245.128}{192.168.245.128:9300}], reason [RemoteTransportException[[node-1][192.168.245.128:9300][internal:discovery/zen/join]]; nested: NotMasterException[Node [{node-1}{X-m7gPTMQn2TsdlByavfEg}{YFmPk0dUTh-S1Ef9of5RTA}{192.168.245.128}{192.168.245.128:9300}] not master for join request]; ], tried [3] times

192.168.245.130

[2017-06-20T10:04:33,983][INFO ][o.e.t.TransportService   ] [node-3] publish_address {192.168.245.130:9300}, bound_addresses {192.168.245.130:9300}
[2017-06-20T10:04:33,991][INFO ][o.e.b.BootstrapCheck     ] [node-3] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-06-20T10:04:37,354][INFO ][o.e.d.z.ZenDiscovery     ] [node-3] failed to send join request to master [{node-1}{X-m7gPTMQn2TsdlByavfEg}{YFmPk0dUTh-S1Ef9of5RTA}{192.168.245.128}{192.168.245.128:9300}], reason [RemoteTransportException[[node-1][192.168.245.128:9300][internal:discovery/zen/join]]; nested: NotMasterException[Node [{node-1}{X-m7gPTMQn2TsdlByavfEg}{YFmPk0dUTh-S1Ef9of5RTA}{192.168.245.128}{192.168.245.128:9300}] not master for join request]; ], tried [3] times

Here is the configuration file from one of the machines:

cluster.name: my-test
node.name: node-1
network.host: 192.168.245.128
discovery.zen.ping.unicast.hosts: ["192.168.245.129", "192.168.245.130"]
discovery.zen.minimum_master_nodes: 2
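
For reference, a minimal sketch of what a consistent elasticsearch.yml could look like on node-1 with the same ES 5.x settings (listing all three hosts in discovery.zen.ping.unicast.hosts is a convention assumed here, not taken from the original files; node-2 and node-3 would differ only in node.name and network.host). Note also that in the logs above node-1 and node-2 report the same node ID ({X-m7gPTMQn2TsdlByavfEg}), which usually means the data directory was cloned between machines; that is exactly what the accepted fix below addresses.

# node-1 (192.168.245.128); node-2 and node-3 change only node.name and network.host
cluster.name: my-test
node.name: node-1
network.host: 192.168.245.128
discovery.zen.ping.unicast.hosts: ["192.168.245.128", "192.168.245.129", "192.168.245.130"]
discovery.zen.minimum_master_nodes: 2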

Most helpful comment

@bobby259 so basically the issue was as follows (the problem is in the last bullet):

  • we set up images on AWS (one image for master/ingest nodes, one image for data nodes), all using the discovery-ec2 plugin
  • we tested the images and built AMIs for the autoscaling groups
  • after launching 3 master and 3 data nodes, the data nodes were fine, but the master nodes still contained the old /data folder from the testing step (during testing the nodes had obviously not been reachable by the data nodes because of a wrong security group, etc.)

So the fix is: if you encounter this problem, shut down your cluster's master nodes and delete the /data folders. Start the nodes again and it works!

Hope this helps you! Please comment/confirm.

All 12 comments

Elastic provides a forum for asking general questions and prefers to keep GitHub for verified bug reports and feature requests. There's an active community there that should be able to help you get an answer to your question. As such, I hope you don't mind that I close this.

@xiaoxing598 You asked that 1 day ago. The forums are offered on a best-efforts basis, and it sometimes takes a few days before anyone is available to respond.

In any case, the people who read and process these github issues are also active on the forums. If we haven't had time to answer it on the forum, then you're not going to improve the situation by ignoring our issue guidelines and asking it here.

@tvernum I am sorry for the inconvenience; I just need to finish this task for my boss.

Awesome. I've stumbled upon a similar problem (using ec2-discovery) just now. On the forum the discussion on this was closed unanswered. OK, let's see how many hours I will spend on this :(

@mlasak Please open a discussion in the forum and we can probably help there.

Ok, I'll do that. But meanwhile I found the reason for the issue. I'll document the problem and the solution on the forum so people can benefit from it. Thanks.

@mlasak Can you share how you solved it? Or the link to the forum thread where you posted the solution?

@bobby259 so basically the issue was as follows (the problem is in the last bullet):

  • we set up images on AWS (one image for master/ingest nodes, one image for data nodes), all using the discovery-ec2 plugin
  • we tested the images and built AMIs for the autoscaling groups
  • after launching 3 master and 3 data nodes, the data nodes were fine, but the master nodes still contained the old /data folder from the testing step (during testing the nodes had obviously not been reachable by the data nodes because of a wrong security group, etc.)

So the fix is: if you encounter this problem, shut down your cluster's master nodes and delete the /data folders (a shell sketch follows below). Start the nodes again and it works!

Hope this helps you! Please comment/confirm.
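
For reference, a minimal sketch of that fix as shell commands, assuming a systemd-managed package installation with the default data path /var/lib/elasticsearch (a tarball install keeps its data under $ES_HOME/data instead; adjust the path and service name to your setup):

# run on each affected master node
sudo systemctl stop elasticsearch       # stop the node before touching its data
sudo rm -rf /var/lib/elasticsearch/*    # delete the stale data folder (this destroys local cluster state!)
sudo systemctl start elasticsearch      # restart; the node rejoins with a fresh node ID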

@mlasak Thank you for the quick response.

@mlasak many thanks for the information: I had copied a test environment, including the data folder, to prepare the production system, and the cluster wasn't working.
Removing the data folder and restarting Elasticsearch did the job.


Thanks, I solved my problem: I had copied my VM from node-1 to node-2; after I deleted the copied data, it's OK now.
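
As a quick check after applying the fix, each node should report a distinct node ID once the cluster has formed. A sketch using the _cat API (any node's HTTP port works; the IP below is just the first node from this thread):

curl -s 'http://192.168.245.128:9200/_cat/nodes?v&h=id,name,ip,master&full_id=true'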
