Elasticsearch: Error in cluster deployment: failed to send join request to master

Created on 20 Jun 2017 · 12 comments · Source: elastic/elasticsearch

I have three virtual machines (192.168.245.128, 192.168.245.129, 192.168.245.130), each with ES 5.1.2 installed. While configuring the cluster environment I ran into errors, and the errors on the three machines are similar.

The three machines can ping each other, and I can telnet between them.
The following information is displayed for each machine.

192.168.245.128

[2017-06-20T10:04:25,434][INFO ][o.e.t.TransportService   ] [node-1] publish_address {192.168.245.128:9300}, bound_addresses {192.168.245.128:9300}
[2017-06-20T10:04:25,440][INFO ][o.e.b.BootstrapCheck     ] [node-1] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-06-20T10:04:33,811][INFO ][o.e.d.z.ZenDiscovery     ] [node-1] failed to send join request to master [{node-2}{X-m7gPTMQn2TsdlByavfEg}{S2ucttQDSXCqLjcyi7wjKA}{192.168.245.129}{192.168.245.129:9300}], reason [RemoteTransportException[[node-2][192.168.245.129:9300][internal:discovery/zen/join]]; nested: NotMasterException[Node [{node-2}{X-m7gPTMQn2TsdlByavfEg}{S2ucttQDSXCqLjcyi7wjKA}{192.168.245.129}{192.168.245.129:9300}] not master for join request]; ], tried [3] times

192.168.245.129

[2017-06-20T10:04:30,429][INFO ][o.e.t.TransportService   ] [node-2] publish_address {192.168.245.129:9300}, bound_addresses {192.168.245.129:9300}
[2017-06-20T10:04:30,435][INFO ][o.e.b.BootstrapCheck     ] [node-2] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-06-20T10:04:33,813][INFO ][o.e.d.z.ZenDiscovery     ] [node-2] failed to send join request to master [{node-1}{X-m7gPTMQn2TsdlByavfEg}{YFmPk0dUTh-S1Ef9of5RTA}{192.168.245.128}{192.168.245.128:9300}], reason [RemoteTransportException[[node-1][192.168.245.128:9300][internal:discovery/zen/join]]; nested: NotMasterException[Node [{node-1}{X-m7gPTMQn2TsdlByavfEg}{YFmPk0dUTh-S1Ef9of5RTA}{192.168.245.128}{192.168.245.128:9300}] not master for join request]; ], tried [3] times

192.168.245.130

[2017-06-20T10:04:33,983][INFO ][o.e.t.TransportService   ] [node-3] publish_address {192.168.245.130:9300}, bound_addresses {192.168.245.130:9300}
[2017-06-20T10:04:33,991][INFO ][o.e.b.BootstrapCheck     ] [node-3] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-06-20T10:04:37,354][INFO ][o.e.d.z.ZenDiscovery     ] [node-3] failed to send join request to master [{node-1}{X-m7gPTMQn2TsdlByavfEg}{YFmPk0dUTh-S1Ef9of5RTA}{192.168.245.128}{192.168.245.128:9300}], reason [RemoteTransportException[[node-1][192.168.245.128:9300][internal:discovery/zen/join]]; nested: NotMasterException[Node [{node-1}{X-m7gPTMQn2TsdlByavfEg}{YFmPk0dUTh-S1Ef9of5RTA}{192.168.245.128}{192.168.245.128:9300}] not master for join request]; ], tried [3] times

Here is the configuration file from one of the machines:

cluster.name: my-test
node.name: node-1
network.host: 192.168.245.128
discovery.zen.ping.unicast.hosts: ["192.168.245.129", "192.168.245.130"]
discovery.zen.minimum_master_nodes: 2
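
For reference, a minimal sketch of what a consistent elasticsearch.yml could look like on node-1 with the same ES 5.x settings (listing all three hosts in discovery.zen.ping.unicast.hosts is a convention assumed here, not taken from the original files; node-2 and node-3 would differ only in node.name and network.host). Note also that in the logs above node-1 and node-2 report the same node ID ({X-m7gPTMQn2TsdlByavfEg}), which usually means the data directory was cloned between machines; that is exactly what the accepted fix below addresses.

# node-1 (192.168.245.128); node-2 and node-3 change only node.name and network.host
cluster.name: my-test
node.name: node-1
network.host: 192.168.245.128
discovery.zen.ping.unicast.hosts: ["192.168.245.128", "192.168.245.129", "192.168.245.130"]
discovery.zen.minimum_master_nodes: 2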

Most helpful comment

@bobby259 so basically the issue was as follows (the problem is in the last bullet):

  • we set up images on AWS (one image for master/ingest nodes, one image for data nodes), all using the discovery-ec2 plugin
  • we tested the images and built AMIs for the autoscaling groups
  • after launching 3 master and 3 data nodes, the data nodes were fine, but the master nodes still contained the old /data folder from the testing step (during testing the nodes had obviously not been reachable by the data nodes because of a wrong security group, etc.)

So the fix is: if you encounter this problem, shut down your cluster's master nodes and delete the /data folders. Start the nodes again and it works!

Hope this helps you! Please comment/confirm.

All 12 comments

Elastic provides a forum for asking general questions and prefers to keep GitHub for verified bug reports and feature requests. There's an active community there that should be able to help you get an answer to your question. As such, I hope you don't mind that I close this.

@xiaoxing598 You asked that 1 day ago. The forums are offered on a best-efforts basis, and it sometimes takes a few days before anyone is available to respond.

In any case, the people who read and process these github issues are also active on the forums. If we haven't had time to answer it on the forum, then you're not going to improve the situation by ignoring our issue guidelines and asking it here.

@tvernum I am sorry for the inconvenience; I just need to finish this task for my boss.

Awesome. I've stumbled upon a similar problem (using ec2-discovery) just now. On the forum the discussion on this was closed unanswered. OK, let's see how many hours I will spend on this :(

@mlasak Please open a discussion in the forum and we can probably help there.

Ok, I'll do that. But meanwhile I found the reason for the issue. I'll document the problem and the solution on the forum so people can benefit from it. Thanks.

@mlasak Can you share how you solved it? Or the link to the forum thread where you posted the solution?

@bobby259 so basically the issue was as follows (the problem is in the last bullet):

  • we set up images on AWS (one image for master/ingest nodes, one image for data nodes), all using the discovery-ec2 plugin
  • we tested the images and built AMIs for the autoscaling groups
  • after launching 3 master and 3 data nodes, the data nodes were fine, but the master nodes still contained the old /data folder from the testing step (during testing the nodes had obviously not been reachable by the data nodes because of a wrong security group, etc.)

So the fix is: if you encounter this problem, shut down your cluster's master nodes and delete the /data folders (a shell sketch follows below). Start the nodes again and it works!

Hope this helps you! Please comment/confirm.
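
For reference, a minimal sketch of that fix as shell commands, assuming a systemd-managed package installation with the default data path /var/lib/elasticsearch (a tarball install keeps its data under $ES_HOME/data instead; adjust the path and service name to your setup):

# run on each affected master node
sudo systemctl stop elasticsearch       # stop the node before touching its data
sudo rm -rf /var/lib/elasticsearch/*    # delete the stale data folder (this destroys local cluster state!)
sudo systemctl start elasticsearch      # restart; the node rejoins with a fresh node ID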

@mlasak Thank you for the quick response.

@mlasak many thanks for the information: I had copied a test environment, including the data folder, to prepare the production system, and the cluster wasn't working.
Removing the data folder and restarting Elasticsearch did the job.


Thanks, I solved my problem: I had copied my VM from node-1 to node-2; after I deleted the copied data, it's OK now.
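
As a quick check after applying the fix, each node should report a distinct node ID once the cluster has formed. A sketch using the _cat API (any node's HTTP port works; the IP below is just the first node from this thread):

curl -s 'http://192.168.245.128:9200/_cat/nodes?v&h=id,name,ip,master&full_id=true'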
