Elasticsearch: Master not discovered exception on elasticsearch 5.0.0

Created on 14 Nov 2016 · 9 Comments · Source: elastic/elasticsearch

Hello,
I am trying to create a 3-node Elasticsearch 5.0 cluster on DigitalOcean. I am using systemd to run Elasticsearch, and each instance runs inside a Docker container (the official elasticsearch container). This is the Elasticsearch config file:

https://gist.github.com/girishramnani/f5f9a425e7ad986ce746117da62dd831

And this is the ExecStart portion of the unit file

ExecStart=/bin/sh -c '/usr/bin/docker run -v /home/core/esdata:/usr/share/elasticsearch/data --rm --name elasticsearch-${instance_index}  \
             -e ES_HOSTS="elasticsearch-3,elasticsearch-2,elasticsearch-1," \
             -e MIN_MASTER_NODES="2" \
             -e NUM_REPLICAS="1" \
             -p 9200:9200 -p 9300:9300 elasticsearch:5 \
            /bin/sh -c \'chown -R elasticsearch:elasticsearch /usr/share/elasticsearch/data ; \
            gosu elasticsearch elasticsearch \''

After all 3 nodes start, none of them can connect to the others, and a health check shows a master not discovered error. This is the error trace:

https://gist.github.com/girishramnani/0dabde87ffada26df7f658ff34f09ba1

The ES_HOSTS variable is set in the env as elasticsearch-3,elasticsearch-2,elasticsearch-1,.
This is the complete env inside the elasticsearch containers.

HOSTNAME=elasticsearch_1
ES_HOSTS=elasticsearch-3,elasticsearch-2,elasticsearch-1,
NUM_REPLICAS=1
CA_CERTIFICATES_JAVA_VERSION=20140324
PATH=/usr/share/elasticsearch/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
WEAVE_CIDR=192.168.16.1/24
PWD=/usr/share/elasticsearch
JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre
LANG=C.UTF-8
JAVA_VERSION=8u111
SHLVL=1
HOME=/root
JAVA_DEBIAN_VERSION=8u111-b14-2~bpo8+1
MIN_MASTER_NODES=2
ELASTICSEARCH_VERSION=5.0.0
GOSU_VERSION=1.7
_=/usr/bin/env

From inside the container I can reach the other Elasticsearch instances, e.g. from elasticsearch-1 I can reach elasticsearch-2:

curl http://elasticsearch-2:9200
{
  "name" : "xGIB2_i",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "_na_",
  "version" : {
    "number" : "5.0.0",
    "build_hash" : "253032b",
    "build_date" : "2016-10-26T05:11:34.737Z",
    "build_snapshot" : false,
    "lucene_version" : "6.2.0"
  },
  "tagline" : "You Know, for Search"
}

OS specification

OS - CoreOS
version - stable (1185.3)

girishramnani

👍1

All 9 comments

Can you provide the output of curl http://elasticsearch-1:9200/_nodes/transport?pretty=1?

jasontedor on 15 Nov 2016

elasticsearch:5

Also, one important note: you are not using the official Docker image. You are using an image that Docker calls the official image, but it is neither affiliated with nor supported by Elastic in any form. We do provide our own image, which we do support; you can check out the repository too.

jasontedor on 15 Nov 2016

I docker exec'ed into elasticsearch-2 and from there ran:
curl http://elasticsearch-1:9200/_nodes/transport?pretty=1

This is the output that I get:

{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "elasticsearch",
  "nodes" : {
    "4vAivtSbSTelIO4vDm51BA" : {
      "name" : "4vAivtS",
      "transport_address" : "192.168.16.1:9300",
      "host" : "192.168.16.1",
      "ip" : "192.168.16.1",
      "version" : "5.0.1",
      "build_hash" : "080bb47",
      "roles" : [
        "master",
        "data",
        "ingest"
      ],
      "transport" : {
        "bound_address" : [
          "[::]:9300"
        ],
        "publish_address" : "192.168.16.1:9300",
        "profiles" : { }
      }
    }
  }
}

girishramnani on 19 Nov 2016

What is 192.168.16.1? What does elasticsearch-2 resolve to from elasticsearch-1?

jasontedor on 21 Nov 2016
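
The question above comes down to whether each node publishes a transport address that the other nodes can actually reach and that is unique to it. A minimal sketch of that comparison, assuming you have each node's `/_nodes/transport` response as parsed JSON; `publish_addresses` and `shared_publish_addresses` are hypothetical helper names (not part of any Elasticsearch client), and the sample is trimmed from the response posted above:

```python
# Trimmed copy of the /_nodes/transport response posted earlier in this thread.
SAMPLE_RESPONSE = {
    "cluster_name": "elasticsearch",
    "nodes": {
        "4vAivtSbSTelIO4vDm51BA": {
            "name": "4vAivtS",
            "transport": {
                "bound_address": ["[::]:9300"],
                "publish_address": "192.168.16.1:9300",
            },
        }
    },
}


def publish_addresses(nodes_response):
    """Map node name -> transport publish address for one response."""
    return {
        info["name"]: info["transport"]["publish_address"]
        for info in nodes_response["nodes"].values()
    }


def shared_publish_addresses(responses_by_host):
    """Return publish addresses that more than one queried host reports.

    responses_by_host maps a hostname (e.g. "elasticsearch-1") to the parsed
    JSON that host returned for /_nodes/transport.
    """
    hosts_by_address = {}
    for host, response in responses_by_host.items():
        for address in publish_addresses(response).values():
            hosts_by_address.setdefault(address, []).append(host)
    return {
        address: hosts
        for address, hosts in hosts_by_address.items()
        if len(hosts) > 1
    }


if __name__ == "__main__":
    print(publish_addresses(SAMPLE_RESPONSE))
```

If two hosts turn out to report the same publish address, the other nodes have no distinct address to connect back to. Whether that is what happened here (for example because every container was given the same WEAVE_CIDR) is not confirmed anywhere in this thread.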

Hello guys.
I was going to create a separate thread, but found this one, which is very similar to my issue.
I'm using ES 5.0.1:

root@cookie-dev-es-0:~# curl localhost:9200
{
  "name" : "dev-0",
  "cluster_name" : "cookie",
  "cluster_uuid" : "x6c1d37ZSdyqUgD0R61gbQ",
  "version" : {
    "number" : "5.0.1",
    "build_hash" : "080bb47",
    "build_date" : "2016-11-11T22:08:49.812Z",
    "build_snapshot" : false,
    "lucene_version" : "6.2.1"
  },
  "tagline" : "You Know, for Search"
}

I'm trying to create a 2-node cluster on DigitalOcean. I've set up Elasticsearch on 2 droplets, and each works fine on its own. I've configured both to discover each other. The node that is started first works fine, but the 2nd one returns "master_not_discovered_exception" when I check its state/health.
Here is the log output: https://gist.github.com/flamewow/7a1958140c39cc06ad3a164675c042d8

Before creating the DigitalOcean droplets I tested on my local machine (simulating the 2nd node with Ubuntu in VirtualBox) and it worked fine.

flamewow on 22 Nov 2016

@flamewow Your issue is here:

found existing node {dev-0}{ETLvUToNSpabEBKkLPLTAQ}{I1sytZy8RQyozjYQDbvwwA}{104.236.138.157}{104.236.138.157:9300} with the same id but is a different node instance]; ]

I think that you copied the data folder from one to the other. In particular, this means that the node ID was copied along with it, and we do not allow two nodes with the same ID to join the cluster.

If you have additional questions, please open a topic on the Elastic Discourse forum. We use GitHub for verified bug reports and feature requests.

jasontedor on 22 Nov 2016

👍3
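
The log message quoted above can be read field by field. A rough sketch, assuming the five brace-delimited fields are name, persistent node ID, ephemeral ID, host, and transport address (this is my reading of the 5.x log format, not something stated in this thread); the helper names and the second descriptor are hypothetical:

```python
import re

# Assumed order of the {...} fields in a node descriptor (not confirmed here).
FIELD_NAMES = ("name", "node_id", "ephemeral_id", "host", "address")


def parse_node_descriptor(text):
    """Extract the first five {...} fields from a node descriptor in a log line."""
    fields = re.findall(r"\{([^{}]*)\}", text)
    if len(fields) < len(FIELD_NAMES):
        raise ValueError("expected at least five {...} fields, got %d" % len(fields))
    return dict(zip(FIELD_NAMES, fields))


def looks_like_copied_data_dir(a, b):
    """True when two descriptors share a persistent ID but not an ephemeral ID."""
    return a["node_id"] == b["node_id"] and a["ephemeral_id"] != b["ephemeral_id"]


# Taken from the log excerpt quoted in the comment above.
LOG_LINE = (
    "found existing node {dev-0}{ETLvUToNSpabEBKkLPLTAQ}"
    "{I1sytZy8RQyozjYQDbvwwA}{104.236.138.157}{104.236.138.157:9300} "
    "with the same id but is a different node instance"
)

# Hypothetical descriptor for the joining node: same persistent ID, with
# placeholder values everywhere else (none of these come from the thread).
JOINING_NODE = (
    "{dev-1}{ETLvUToNSpabEBKkLPLTAQ}{PLACEHOLDER_EPHEMERAL_ID}"
    "{192.0.2.10}{192.0.2.10:9300}"
)

if __name__ == "__main__":
    existing = parse_node_descriptor(LOG_LINE)
    joining = parse_node_descriptor(JOINING_NODE)
    print(looks_like_copied_data_dir(existing, joining))
```

The persistent ID is the part that travels with a copied data folder, which is why comparing it across nodes is a quick way to spot this situation.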

On Red Hat with Elasticsearch 5.3.1:
Deleting the contents of the data folder (/var/lib/elasticsearch/nodes/0) and restarting worked for me!

pkapil on 9 May 2017

👍5

@pkapil you have saved my day. :)