Is this a request for help?: No
BUG REPORT:
Version of Helm and Kubernetes:
Client: &version.Version{SemVer:"v2.7.2", GitCommit:"8478fb4fc723885b155c924d1c8c410b7a9444e6", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.7.2", GitCommit:"8478fb4fc723885b155c924d1c8c410b7a9444e6", GitTreeState:"clean"}
Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.0", GitCommit:"0b9efaeb34a2fc51ff8e4d34ad9bc6375459c4a4", GitTreeState:"clean", BuildDate:"2017-09-29T05:56:06Z", GoVersion:"go1.9", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"8+", GitVersion:"v1.8.4+coreos.0", GitCommit:"4292f9682595afddbb4f8b1483673449c74f9619", GitTreeState:"clean", BuildDate:"2017-11-21T17:22:25Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Which chart:
incubator/elasticsearch
What happened:
The master nodes cannot form a cluster as they cannot see each other; each master node can only see itself (i.e., my-es-test-elasticsearch-master-0 sees only my-es-test-elasticsearch-master-0, etc.).
[2018-01-08T20:05:06,677][INFO ][o.e.p.PluginsService ] [my-es-test-elasticsearch-master-2] loaded plugin [x-pack]
[2018-01-08T20:05:12,693][INFO ][o.e.d.DiscoveryModule ] [my-es-test-elasticsearch-master-2] using discovery type [zen]
[2018-01-08T20:05:14,173][INFO ][o.e.n.Node ] [my-es-test-elasticsearch-master-2] initialized
[2018-01-08T20:05:14,173][INFO ][o.e.n.Node ] [my-es-test-elasticsearch-master-2] starting ...
[2018-01-08T20:05:14,555][INFO ][o.e.t.TransportService ] [my-es-test-elasticsearch-master-2] publish_address {10.2.6.150:9300}, bound_addresses {0.0.0.0:9300}
[2018-01-08T20:05:14,570][INFO ][o.e.b.BootstrapChecks ] [my-es-test-elasticsearch-master-2] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
[2018-01-08T20:05:25,566][WARN ][i.f.e.d.k.KubernetesUnicastHostsProvider] [my-es-test-elasticsearch-master-2] Exception caught during discovery: io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred.
[2018-01-08T20:05:28,642][WARN ][o.e.d.z.ZenDiscovery ] [my-es-test-elasticsearch-master-2] not enough master nodes discovered during pinging (found [[Candidate{node={my-es-test-elasticsearch-master-2}{GlIuJEuuT1uO2mC9OfLrDw}{VB1o749VQTe2vZ4Gqg9i2w}{10.2.6.150}{10.2.6.150:9300}, clusterStateVersion=-1}]], but needed [2]), pinging again
[2018-01-08T20:05:38,755][WARN ][i.f.e.d.k.KubernetesUnicastHostsProvider] [my-es-test-elasticsearch-master-2] Exception caught during discovery: io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred.
[2018-01-08T20:05:41,756][WARN ][o.e.d.z.ZenDiscovery ] [my-es-test-elasticsearch-master-2] not enough master nodes discovered during pinging (found [[Candidate{node={my-es-test-elasticsearch-master-2}{GlIuJEuuT1uO2mC9OfLrDw}{VB1o749VQTe2vZ4Gqg9i2w}{10.2.6.150}{10.2.6.150:9300}, clusterStateVersion=-1}]], but needed [2]), pinging again
[2018-01-08T20:05:44,657][WARN ][o.e.n.Node ] [my-es-test-elasticsearch-master-2] timed out while waiting for initial discovery state - timeout: 30s
[2018-01-08T20:05:44,748][INFO ][o.e.h.n.Netty4HttpServerTransport] [my-es-test-elasticsearch-master-2] publish_address {10.2.6.150:9200}, bound_addresses {0.0.0.0:9200}
[2018-01-08T20:05:44,753][INFO ][o.e.n.Node ] [my-es-test-elasticsearch-master-2] started
Also, the client nodes cannot see any of the master nodes:
[2018-01-08T20:16:17,609][WARN ][i.f.e.d.k.KubernetesUnicastHostsProvider] [my-es-test-elasticsearch-client-7cb4f84476-hbcll] Exception caught during discovery: io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred.
[2018-01-08T20:16:17,614][WARN ][o.e.d.z.ZenDiscovery ] [my-es-test-elasticsearch-client-7cb4f84476-hbcll] Ping execution failed
java.util.concurrent.ExecutionException: EsRejectedExecutionException[rejected execution of java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@4a6931e3 on java.util.concurrent.ScheduledThreadPoolExecutor@2f7459cc[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 187]]
at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) ~[?:1.8.0_131]
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) ~[?:1.8.0_131]
at org.elasticsearch.discovery.zen.ZenDiscovery.pingAndWait(ZenDiscovery.java:1026) [elasticsearch-5.4.2.jar:5.4.2]
at org.elasticsearch.discovery.zen.ZenDiscovery.findMaster(ZenDiscovery.java:878) [elasticsearch-5.4.2.jar:5.4.2]
at org.elasticsearch.discovery.zen.ZenDiscovery.innerJoinCluster(ZenDiscovery.java:387) [elasticsearch-5.4.2.jar:5.4.2]
at org.elasticsearch.discovery.zen.ZenDiscovery.access$4100(ZenDiscovery.java:83) [elasticsearch-5.4.2.jar:5.4.2]
at org.elasticsearch.discovery.zen.ZenDiscovery$JoinThreadControl$1.run(ZenDiscovery.java:1197) [elasticsearch-5.4.2.jar:5.4.2]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.4.2.jar:5.4.2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@4a6931e3 on java.util.concurrent.ScheduledThreadPoolExecutor@2f7459cc[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 187]
at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:50) ~[elasticsearch-5.4.2.jar:5.4.2]
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823) ~[?:1.8.0_131]
at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:326) ~[?:1.8.0_131]
at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:533) ~[?:1.8.0_131]
at org.elasticsearch.threadpool.ThreadPool.schedule(ThreadPool.java:358) ~[elasticsearch-5.4.2.jar:5.4.2]
at org.elasticsearch.discovery.zen.UnicastZenPing.ping(UnicastZenPing.java:336) ~[elasticsearch-5.4.2.jar:5.4.2]
at org.elasticsearch.discovery.zen.UnicastZenPing.ping(UnicastZenPing.java:287) ~[elasticsearch-5.4.2.jar:5.4.2]
at org.elasticsearch.discovery.zen.ZenDiscovery.pingAndWait(ZenDiscovery.java:1019) ~[elasticsearch-5.4.2.jar:5.4.2]
... 8 more
[2018-01-08T20:16:17,617][WARN ][o.e.d.z.ZenDiscovery ] [my-es-test-elasticsearch-client-7cb4f84476-hbcll] not enough master nodes discovered during pinging (found [[]], but needed [2]), pinging again
[2018-01-08T20:16:17,697][INFO ][o.e.n.Node ] [my-es-test-elasticsearch-client-7cb4f84476-hbcll] closed
What you expected to happen:
I expected the master nodes to form a cluster, and the client nodes to be able to talk to the master nodes.
How to reproduce it (as minimally and precisely as possible):
helm install --name my-es-test incubator/elasticsearch --set data.persistence.enabled=false,master.persistence.enabled=false
cc @simonswine @icereval
@tiago-loureiro, if your cluster has RBAC enabled you'll want to run the helm install command with one additional parameter: --set rbac.create=true.
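For example, combining the reproduce command from above with that extra flag:

# same install command as in the reproduce step, plus rbac.create=true
helm install --name my-es-test incubator/elasticsearch \
  --set data.persistence.enabled=false,master.persistence.enabled=false,rbac.create=true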
I have the same problem.
rbac.create was enabled via values.yaml.
Hopefully https://github.com/kubernetes/charts/pull/2889 will help.
I am also stuck here...
Is there any hint whether it is related to a specific version (or combination of versions)?
Chart elasticsearch-0.4.7
kubectl
Client Version: v1.9.3
Server Version: v1.9.2
helm
Client: v2.8.1+g6af75a8
Server: v2.8.1+g6af75a8
Having the same issue. RBAC is enabled, though.
Chart 0.4.9
kubectl
Client Version: v1.9.3+coreos.0
Server Version: v1.9.3+coreos.0
helm
Client: v2.8.2+ga802316
Server: v2.8.2+ga802316
We found two problems:
1) A DNS problem: master-0 does not see master-1 because the nodes try to discover each other by short hostname, but they are only reachable by FQDN. After we added the zone {service}.{namespace}.cluster.local to /etc/resolv.conf, we ran into the second problem.
2) We had to add all master hosts to elasticsearch.yml (this can be set in the ConfigMap via discovery.zen.ping.unicast.hosts); a configuration sketch follows this comment.
Only after these manipulations did all hosts see each other and the cluster form.
Any suggestions?
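To make the second workaround concrete, here is a minimal elasticsearch.yml sketch of the kind of unicast host list described above (it could go into the chart's ConfigMap). The headless service name my-es-test-elasticsearch-master, the default namespace, and the cluster.local domain are assumptions for illustration, not values taken from the chart's templates:

# hypothetical pod FQDNs: <pod>.<headless-service>.<namespace>.svc.<cluster-domain>
discovery.zen.ping.unicast.hosts:
  - my-es-test-elasticsearch-master-0.my-es-test-elasticsearch-master.default.svc.cluster.local
  - my-es-test-elasticsearch-master-1.my-es-test-elasticsearch-master.default.svc.cluster.local
  - my-es-test-elasticsearch-master-2.my-es-test-elasticsearch-master.default.svc.cluster.local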
@ppcololo Don't hardcode cluster.local anywhere, as the cluster domain is customizable! Just use {service}.{namespace}, as it resolves to the proper DNS name regardless of the cluster domain.
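A sketch of that shorter form, reusing the same hypothetical service and namespace names as in the snippet above:

# <service>.<namespace> resolves via the pod's DNS search path, so no cluster domain is hardcoded
discovery.zen.ping.unicast.hosts: ["my-es-test-elasticsearch-master.default"]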
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.
This issue is being automatically closed due to inactivity.