Elasticsearch: Unicast hostname failing to resolve blocks cluster from starting

Created on 2 Nov 2015 · 38 comments · Source: elastic/elasticsearch

Hi folks,

there seems to be a regression from 1.7.3 to 2.0.0 regarding the gateway.recover_after_time setting.

docker run --name="elasticsearch" -p 9200:9200 -p 9300:9300 \
  -e "SERVICE_ID=$hostname" -e "SERVICE_NAME=elasticsearch" \
  elasticsearch:2.0.0 \
  -Des.index.number_of_shards=3 -Des.index.number_of_replicas=2 \
  -Des.gateway.recover_after_nodes=2 -Des.gateway.expected_nodes=3 -Des.gateway.recover_after_time=5m \
  -Des.discovery.zen.ping.multicast.enabled=false \
  -Des.discovery.zen.ping.unicast.hosts="01.elasticsearch,02.elasticsearch,03.elasticsearch" \
  -Des.cluster.name="logging-test" -Des.network.publish_host="${PUBLIC_IP}"

I run Elasticsearch on three nodes (Docker hosts). The DNS names are not immediately available. In 1.7.3 the cluster stabilizes after a couple of minutes. In 2.0.0 the container crashes right away.

Exception in thread "main" java.lang.IllegalArgumentException: Failed to resolve address for [02.elasticsearch]
Likely root cause: java.net.UnknownHostException: 02.elasticsearch: unknown error
at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
at java.net.InetAddress.getAllByName(InetAddress.java:1192)
at java.net.InetAddress.getAllByName(InetAddress.java:1126)
at org.elasticsearch.transport.netty.NettyTransport.parse(NettyTransport.java:668)
at org.elasticsearch.transport.netty.NettyTransport.addressesFromString(NettyTransport.java:620)
at org.elasticsearch.transport.TransportService.addressesFromString(TransportService.java:398)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.<init>(UnicastZenPing.java:141)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at <<<guice>>>
at org.elasticsearch.node.Node.<init>(Node.java:198)
at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:145)
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:170)
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:270)
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:35)
Labels: :DistributeNetwork, >bug, blocker, discuss, v5.1.1

Most helpful comment

This needs to be fixed. This really is a problem for my elastic dockerization attempt.

All 38 comments

@wkruse it looks like this doesn't have anything to do with the gateway.recover_after_time setting. In 2.0.0, Elasticsearch is stricter about networking settings. In your case, one of your unicast hosts, 02.elasticsearch, fails to resolve to an IP address.

Is there an option to turn off the exception that occurs when host resolution fails? In 1.7 I used to set up a list of hosts and then add them as needed (it is "elastic" search, after all). But now in 2.0 I have to have all of my hosts preconfigured in DNS or I get an exception and ES won't even start :(

I've hit a similar issue to @meatheadmike: a CoreOS cluster with DNS service discovery per Elasticsearch container. My current solution is to use the hostnames for unicast.

I think the unicast hosts not resolving shouldn't fail the node starting up, what do you think @bleskes ?

Here's the current scenario and why it's an issue to me...

I spin up my nodes in OpenStack. When I spin up a new node, I name it right then and there, so the DNS name doesn't exist beforehand. Neither does the IP address.

My current solution is to feed a list of potential node names into a bash script, cycle through them, and remove the ones that don't respond prior to creating the discovery.zen.ping.unicast.hosts entry in elasticsearch.yml.

It's an ugly hack, but it works. It allows me to spin up new nodes without worrying about what currently exists and what doesn't. But this didn't use to be necessary :(
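
A minimal sketch of that kind of filtering script, assuming getent is available and checking which names resolve (the hostnames and config path here are just placeholders):

#!/bin/bash
# Sketch: keep only the candidate hostnames that currently resolve, then write
# discovery.zen.ping.unicast.hosts into elasticsearch.yml. Hostnames and the
# config path are placeholder assumptions.

CANDIDATES="es-node-01 es-node-02 es-node-03"
CONFIG=/etc/elasticsearch/elasticsearch.yml
RESOLVED=()

for host in $CANDIDATES; do
  # getent exits non-zero when the name does not resolve
  if getent hosts "$host" > /dev/null 2>&1; then
    RESOLVED+=("$host")
  fi
done

# Build a YAML list such as ["es-node-01","es-node-03"] from whatever survived
HOSTS_LIST=$(printf '"%s",' "${RESOLVED[@]}")
HOSTS_LIST="[${HOSTS_LIST%,}]"

# Drop any existing entry and append the freshly built one
sed -i '/^discovery.zen.ping.unicast.hosts:/d' "$CONFIG"
echo "discovery.zen.ping.unicast.hosts: ${HOSTS_LIST}" >> "$CONFIG"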

I think the unicast hosts not resolving shouldn't fail the node starting up, what do you think @bleskes ?

I'm a bit wary of this, as it may also mean a configuration typo, which will prevent the node from joining the cluster, and one then needs to go grep the logs to find out what's going on...

Normally the answer would be to make the unicast hosts setting dynamically updatable. This doesn't work cleanly here because those settings cannot be persisted in the cluster state - they are used before a node joins the cluster, while it has no access to it.

I'll munch on this some more. Ideas welcome.

This poses a problem for anyone using configuration management (CM) such as Chef or Puppet. The way it currently is, if a node is rebooting you must deregister it from CM and bootstrap it on launch again, and every other node has to converge to remove it from its host list. Otherwise the ES service on any node can't be (re)started until the machine comes back up. This seems to me, at least, to be a far more common issue than a misconfiguration of a list of hosts or IPs.

The same logic to fail on an invalid host was in 1.7, but it was just buggy:

https://github.com/elastic/elasticsearch/blob/1.7/src/main/java/org/elasticsearch/discovery/zen/ping/unicast/UnicastZenPing.java#L141-L143

From what I remember, hostname resolution at a lower level had several bugs, such that unknown hosts wouldn't always produce exceptions, e.g. they could be silently treated as the wildcard address (0.0.0.0) instead.

It might have been a silent conversion to 127.0.0.1 instead, I can't remember which. The problems were something sneaky, like an unexpected null value -> InetAddress.getByName(null) -> 127.0.0.1, or something like that. http://docs.oracle.com/javase/7/docs/api/java/net/InetAddress.html#getByName%28java.lang.String%29

The other big problem was the use of https://docs.oracle.com/javase/7/docs/api/java/net/InetSocketAddress.html#InetSocketAddress%28java.lang.String,%20int%29, while not dealing with the 'unresolved' case correctly, or at all, across the codebase. The null returned for an unresolved address would then be treated the same as 0.0.0.0 by some methods.

If we want to change the behavior for pings, let's please not go back to using that trappy ctor (which is banned in forbidden-apis for a reason). I would also greatly prefer if we didn't create unresolved InetSocketAddresses in general (so let's ban https://docs.oracle.com/javase/7/docs/api/java/net/InetSocketAddress.html#createUnresolved%28java.lang.String,%20int%29 too).

The reason is that I think we should always do the DNS lookup -> addresses step explicitly ourselves; this is an important operation and should not be "mixed in" with a convenience holder class that combines an IP address with a port number (InetSocketAddress), especially given the very sneaky, lenient, and trappy way it handles the unresolved case.

So for pings, it would mean that we'd just keep Strings, and the ping would explicitly do the DNS lookup itself, if that is what we want to do.

Does anybody know whether it is a good idea to keep hostnames in the unicast config instead of IPs?
Is there any performance degradation while the ES cluster is up and running, or does hostname lookup happen only on start? Does a DNS failure affect an existing cluster?

To my knowledge it only has an effect during cluster boot.

@jsirex hostname lookup is currently done only on startup, which is also the source of the trouble reported here - if it fails, the node won't start. Note though - this is not fault-detection pinging, but just the mechanism a node uses to find the cluster once it starts up or after it was isolated from it. You shouldn't worry about performance here.

It seems the consensus is to not resolve the host config on node start but rather keep it as an array of strings. The resolution should be done when starting to ping, ignoring errors (but logging them at WARN!). The issue is already marked as adopt me, so people can pick it up. I'll add a "low hanging fruit" label.

+1 👍

This needs to be fixed. This really is a problem for my elastic dockerization attempt.

+1

👍 +1

+1. Seriously, this is a massive breaking change. I have a cluster down because of this.

Seriously, this is a massive breaking change. I have a cluster down because of this

This is not a breaking change. It was always like this but, because of other networking bugs, might have been hidden. See https://github.com/elastic/elasticsearch/issues/14441#issuecomment-153784620

@clintongormley It might have been intended to work this way before, but de facto it didn't. Fixing this created a breaking change. And to be fair, the old behavior made more sense (especially in scalable environments): you tell the node that the other nodes are X, Y, Z, and if one is down, it moves on to the next one.

I just ran into this with Docker swarm as well. I'm trying to bootstrap a cluster with two swarm services that use each other to find the master via the swarm-registered DNS name. The problem is that the DNS names don't exist until after the service finishes starting, so there's a chicken-and-egg problem. I also noticed that even if there are valid hosts on the unicast list, it still crashes because one of them is not reachable: they all have to be reachable.

In general, DNS entries temporarily not resolving, not being available, or changing over time should all be treated as expected DNS behavior. In each of these cases the desirable/expected behavior would be to move on to the next unicast address and keep trying until something succeeds (it would be nice to have an optional timeout on that). Anything that assumes otherwise is fundamentally going to be flaky.

@clintongormley the behavior of exiting with an error code is new, I believe, even though the previous behavior was hardly correct or desirable.

IMHO elasticsearch could be a lot more forgiving with networking than it currently is.

BTW, so far I've failed to get Elasticsearch to form a cluster when run as a Docker swarm service. I've found no way around the combination of the many networking limitations in Docker swarm and Elasticsearch being very strict about its config.

+1 same situation as @jillesvangurp

Running into the same issue on 5.0.0 in a Mesos environment.

My workaround/hack is to introduce a delay before the elasticsearch process starts up so that the DNS entries have time to get established.

I see there's already a "blocker" label on the issue, but I'll throw in that it's a blocker for the stuff I'm working on, too.

For whoever is interested: you actually can start a cluster from scratch in a dynamic environment. When I tried Elasticsearch with Docker swarm mode, it worked. The main concern here is the resolvability of the hosts provided. If you are running in Docker swarm mode, or Consul with registrator etc., then after the container starts and right before the Elasticsearch process is fired up, the DNS name is already present in the related service (swarm/Consul/etc.). For instance:

docker service create --name elasticsearch-cluster1 --mode global \
  --mount src=logs-es-data,dst=/usr/share/elasticsearch/data --network test-net \
  encom/elasticsearch -Ecluster.name=test1 -Ediscovery.zen.ping.unicast.hosts=tasks.elasticsearch-cluster1

works fine.

In Docker swarm mode, tasks.elasticsearch-cluster1 represents a DNS entry listing all containers under the service. For Consul and registrator it would be elasticsearch.service.dc1. It is worth mentioning that in Docker swarm a service also has a VIP (depending on your configuration) and a related DNS entry, elasticsearch-cluster1. Since that is a VIP and behaves like a single host, it is a bad idea to use it.
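
As a quick sanity check (a sketch, assuming the overlay network was created with --attachable and using the busybox image, which ships nslookup), you can resolve the tasks entry from a throwaway container on the same network:

# Each running container of the service should show up as a separate A record
docker run --rm --network test-net busybox nslookup tasks.elasticsearch-cluster1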

For folks that have been waiting for this, the forthcoming 5.1.0 (no date, sorry) will no longer fail on startup if a host fails to resolve. Instead, during discovery we will ping the hosts that do resolve. If another round of pinging is needed, we will resolve again and so on for each subsequent round of pinging. Please consult the Zen discovery docs for the interplay between this, the JVM DNS cache, and the Java security manager.

@jasontedor it might be helpful to link to the specific documentation you speaking of while we patiently wait. thanks again!

@JarenGlover At this time, permalinks for the 5.1.0 docs are not available; as soon as such a permalink is available, I will link here.

Version 5.1.1 is released and the aforementioned docs are available now.

Ran into this issue with my own dockerization attempt. Alas, we are stuck with 2.4.x for the time being.
Any chance that this gets backported?
BTW: I ran into a similar issue when I dockerized ZooKeeper (where cluster "discovery" works in a similar fashion - by providing the list of cluster member hostnames & ports to each instance).
The workaround for me was to create a script that goes over the list of hostnames, checks whether they are available in DNS, and retries until they all are (so basically doing externally what ZooKeeper/Elasticsearch should be doing from the start - treating a DNS lookup error like any other network problem and retrying); a rough sketch of such a wrapper follows below.
This approach also has the drawback that, at least for startup, your cluster needs to be fully available.
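
A rough sketch of such a wait-for-DNS wrapper, assuming getent is available in the image (the hostnames and the command it finally execs are placeholders):

#!/bin/bash
# Sketch: block until every expected cluster hostname resolves in DNS, then
# exec the real process. Hostnames and the final command are placeholders.
set -e

HOSTS="es-node-01 es-node-02 es-node-03"

for host in $HOSTS; do
  # Retry each name until DNS answers; getent exits non-zero on failure.
  until getent hosts "$host" > /dev/null 2>&1; do
    echo "waiting for DNS entry: $host"
    sleep 5
  done
done

# Every name resolves; hand over to the actual service (placeholder path).
exec /usr/share/elasticsearch/bin/elasticsearch "$@"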

@jgoeres this will not be backported to 2.4 as we only backport major (security) fixes. It is also extremely unlikely there will be another 2.4 release. I suggest you focus your efforts on 5.x (where this is solved) or even on the coming 6.0 release.

@bleskes Thanks for the info.
Alas, the update to 5.x affects many components, and thus many teams that are all busy with other things, so it is not possible for our upcoming release.
I will see if the "wait for DNS" script approach works until we can do the migration.

+1 same situation as @jillesvangurp

Hi, everyone!
It was quite a pain to get past this hostname issue while using ES (2.4.6) with Docker swarm, but I finally found a way to get it working as a swarm service.

Before explaining the solution, here is my create.sh script.

#!/bin/bash

echo "!!! Manual step on target hosts: mkdir -p /persistent/docker/es-logs/data"
echo -e "Instance no (ex: 01): \c"
read -e v_no

docker service create \
--name es-logs$v_no \
--detach=false \
--restart-condition=any \
--restart-delay 5s \
--restart-window 120s \
--stop-grace-period 30s \
--network my-network01 \
--publish 93$v_no:9300 \
--replicas 1 \
--constraint engine.labels.name==es-logs$v_no \
--reserve-memory 7g \
--limit-memory 7g \
--mount type=bind,src=/persistent/docker/es-logs/data,dst=/usr/share/elasticsearch/data \
registry.internal:5000/elasticsearch \
-Des.cluster.name=es-logs \
-Des.discovery.zen.minimum_master_nodes=2 \
-Des.discovery.zen.ping.unicast.hosts=xxx.xxx.xxx.xxx:9301,yyy.yyy.yyy.yyy:9302,zzz.zzz.zzz.zzz:9303 \
-Des.node.master=true \
-Des.node.max_local_storage_nodes=1 \
-Des.node.name=es-logs$v_no

Explanation:
I've published port 9300 as 9301, 9302 and 9303 in my swarm for the 3 nodes which will be the core ES servers. Because of the publishing, I could then point to the IP (where Docker is listening) of each of these hosts. This way I'm not writing hostnames anymore (they would be es-logs01, es-logs02 and es-logs03 in my case), because they can fail to resolve due to the same issue explained by @jillesvangurp.

So as long as you restrict the first few nodes of ES to run on specific swarm nodes, you can publish those IPs and use them for unicast in the entire cluster.

I'm using the $v_no shell variable to change the published port because each ES node is a different service, and, as you know, we can't have the same published port for multiple services.
The IP of each host is used just for reaching the other nodes; after that, all connections seem to go over the private IP assigned by Docker swarm (ES is configured to listen on 0.0.0.0, which then binds to the private IP given inside the swarm on network "my-network01"; the custom network was created with: docker network create --attachable --driver overlay --subnet 10.0.9.0/24 my-network01).

It took a few days to work out ways of getting the hostname version to work, but I finally settled on this. Hope it will inspire/help others too. Cheers :)

@zmirc, that sounds like an awesome approach. In general it would be nice to get more native support in ES for clustering in a responsible way in the mainstream clustering environments. Consul, Kubernetes, and Docker swarm all work in similar ways. Having a Kubernetes- and Docker-swarm-ready container that self-configures in a sane way would be awesome. From what I've seen, most of the blockers here should be fairly easy to address. IMHO, manual configuration of IPs should not be needed in most cases.

One thing I've experimented with in AWS is to run ES as three separate cloudformation stacks (one per AZ). I started doing this after seeing a couple of examples of cloudformation stacks ending up in an unusable state. Technically they are a single point of failure because your only way to recover when this happens is to rebuild the stack. Having three separate stacks makes things like rolling restarts easy to orchestrate as well since you can just replace them one by one. This might be a pattern that you'd want to support in a clustered environment as well. Automated rolling restarts remain a challenge in most environments.

From what I've seen, most of the blockers here should be fairly easy to address.

@jillesvangurp perhaps I'm missing something you need, but this issue has been fixed in 5.1.1. We now resolve the unicast host list with every ping (there is some JVM caching, and there are restrictions on positively resolved hosts). See https://github.com/elastic/elasticsearch/pull/21630

@bleskes 5.1.1... but many projects are using older versions (like 2.4.x in our case) because we can't afford to move a large system to a new major version. So for us this fix doesn't exist, and probably for others as well. It would be nice to have the fix ported to 2.x as well, or at least to 2.4.x.

Sorry @zmirc, but the enhancement will not be ported to 2.4, as that version has reached end-of-life for major changes; instead, it only receives critical fixes (e.g., security) or small bug fixes, as our focus for that version is now on stability. You can read more about our maintenance policy.
