Openshift-ansible: Fresh install or upgrade of logging stack to v3.6.0 === Unknown Discovery type [kubernetes]

Created on 21 Sep 2017 · 21 Comments · Source: openshift/openshift-ansible

Description

Upgrading our logging stack from v1.5.1 to v3.6.0 completes successfully in Ansible, but the ES containers do not deploy successfully.

I thought the data might be corrupted, so I wiped the storage for the ES containers and tried again, which made no difference. I then tried a fresh install and the same problem occurs. v1.5.1 works fine, but I would like to keep the logging stack aligned with the cluster version.

Not sure if this is the right place to put this or if there's someone else that maintains the v3.6.0 logging images (if the problem lies there) -- Any help would be appreciated.

Version

Your ansible version per ansible --version

ansible 2.3.2.0
  config file = /Users/hef/work/openshift-ansible/ansible.cfg
  configured module search path = Default w/o overrides
  python version = 2.7.13 (default, Jul 18 2017, 09:17:00) [GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.42)]

Steps To Reproduce

Upgrade with/without existing data from v1.5.1 to v3.6.0 || Fresh install of v3.6.0

In Ansible repo, git checkout release-3.6
git pull --rebase to update
ansible-playbook playbooks/byo/openshift-cluster/openshift-logging.yml

(also tried on master branch with no luck)

Expected Results

Successful install and/or upgrade of the container images in the logging project to v3.6.0, plus any other changes necessary to rev up to a v3.6.0 cluster.

Observed Results

ES containers do not come up (in crash loop) with the following output:

[2017-09-21 19:09:40,650][INFO ][container.run            ] Begin Elasticsearch startup script
[2017-09-21 19:09:40,663][INFO ][container.run            ] Comparing the specified RAM to the maximum recommended for Elasticsearch...
[2017-09-21 19:09:40,664][INFO ][container.run            ] Inspecting the maximum RAM available...
[2017-09-21 19:09:40,668][INFO ][container.run            ] ES_HEAP_SIZE: '1024m'
[2017-09-21 19:09:40,669][INFO ][container.run            ] Setting heap dump location /elasticsearch/persistent/heapdump.hprof
[2017-09-21 19:09:40,672][INFO ][container.run            ] Checking if Elasticsearch is ready on https://localhost:9200
Exception in thread "main" java.lang.IllegalArgumentException: Unknown Discovery type [kubernetes]
    at org.elasticsearch.discovery.DiscoveryModule.configure(DiscoveryModule.java:100)
    at <<<guice>>>
    at org.elasticsearch.node.Node.<init>(Node.java:213)
    at org.elasticsearch.node.Node.<init>(Node.java:140)
    at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:143)
    at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:194)
    at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:286)
    at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:45)
Refer to the log for complete error details.

Additional Information

Your operating system and version, ie: RHEL 7.2, Fedora 23 ($ cat /etc/redhat-release)
CentOS Linux release 7.3.1611 (Core)
Your inventory file (especially any non-standard configuration parameters)
https://gist.github.com/rhefner/f50283b1e01ba1ac8d5208c03b5bc2b7

Source

rhefner

Most helpful comment

You have an updated openshift-ansible but an old ES image. If you are getting the ES image from https://hub.docker.com/r/openshift/origin-logging-elasticsearch/tags/, it looks like only latest has been updated. I recommend not changing anything in the ES config map and pulling the latest ES image.
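
To confirm which image tag each ES DeploymentConfig actually runs, something like the following sketch may help. It assumes the JSON comes from `oc get dc -o json` in the logging project; the function name and the sample object are hypothetical, and only the standard `spec.template.spec.containers[].image` path is relied on.

```python
import json


def container_images(dc_json: str) -> dict:
    """Map each DeploymentConfig name to the list of images its containers use."""
    doc = json.loads(dc_json)
    # A multi-object `oc get ... -o json` result wraps objects in "items";
    # a single object has no such key, so treat it as a one-element list.
    items = doc.get("items", [doc])
    images = {}
    for dc in items:
        name = dc["metadata"]["name"]
        containers = dc["spec"]["template"]["spec"]["containers"]
        images[name] = [c["image"] for c in containers]
    return images


# Hypothetical sample, shaped like a trimmed `oc get dc -o json` result.
SAMPLE = json.dumps({
    "items": [{
        "metadata": {"name": "logging-es-data-master-example"},
        "spec": {"template": {"spec": {"containers": [
            {"name": "elasticsearch",
             "image": "docker.io/openshift/origin-logging-elasticsearch:v3.6.0"},
        ]}}},
    }]
})

for dc_name, dc_images in container_images(SAMPLE).items():
    print(dc_name, dc_images)
```

If the tag shown is anything other than latest, the image probably predates the discovery change described in this comment.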

If you want some background on what exactly is happening, or a solution other than updating the ES images, read on. In September, we introduced a new type of master discovery algorithm in the ES images, by label and port, because discovering by service didn't work well with the readiness probe.

It has relevant changes in:
1) openshift-ansible - https://github.com/openshift/openshift-ansible/pull/5209

   - turning back on readiness probe
   - changing the discovery algorithm in ES configmap

2) ES image - https://github.com/openshift/origin-aggregated-logging/pull/609

   - new library supporting the new discovery algorithm

If you don't want to update the ES image, then you need to:

1) disable the readiness probe: run oc edit dc logging-es-data-master-... for each ES DeploymentConfig and remove the part starting with readinessProbe:
2) revert the master discovery algorithm: run oc edit cm logging-elasticsearch and change

cloud:
   kubernetes:
     pod_label: ${POD_LABEL}
     pod_port: 9300
     namespace: ${NAMESPACE}

to

cloud:
   kubernetes:
     service: ${SERVICE_DNS}
     namespace: ${NAMESPACE}
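
As a rough sketch of the two manual steps above, assuming the config map text and a DeploymentConfig have already been exported (for example as YAML text and as JSON), the edits amount to the following. The function names are hypothetical; only the keys pod_label, pod_port, service and readinessProbe come from this thread.

```python
import copy


def revert_discovery(config_text: str) -> str:
    """Swap label/port discovery settings back to service-based discovery."""
    out = []
    for line in config_text.splitlines():
        stripped = line.lstrip()
        indent = line[: len(line) - len(stripped)]
        if stripped.startswith("pod_label:"):
            # Replace the label selector with the service lookup, same indent.
            out.append(indent + "service: ${SERVICE_DNS}")
        elif stripped.startswith("pod_port:"):
            # The service-based block has no port entry, so drop this line.
            continue
        else:
            out.append(line)
    trailing = "\n" if config_text.endswith("\n") else ""
    return "\n".join(out) + trailing


def drop_readiness_probe(dc: dict) -> dict:
    """Return a copy of a DeploymentConfig dict with readinessProbe removed."""
    result = copy.deepcopy(dc)
    for container in result["spec"]["template"]["spec"]["containers"]:
        container.pop("readinessProbe", None)
    return result


NEW_BLOCK = (
    "cloud:\n"
    "   kubernetes:\n"
    "     pod_label: ${POD_LABEL}\n"
    "     pod_port: 9300\n"
    "     namespace: ${NAMESPACE}\n"
)

print(revert_discovery(NEW_BLOCK))
```

This only illustrates the text change; the result still has to be written back with oc edit (or an equivalent), and the ES pods redeployed, for it to take effect.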

wozniakjan on 22 Sep 2017


All 21 comments

@rhefner Can you provide the output from oc get configmap/logging-elasticsearch -o yaml ?

cc @portante @richm

ewolinetz on 21 Sep 2017

@ewolinetz: Yes, sir. Here you go: https://gist.github.com/rhefner/3d949f7b4074920636a69f6688c121bf

rhefner on 21 Sep 2017

By the way, once it crashes for a while, the pods output this:

Comparing the specificed RAM to the maximum recommended for ElasticSearch...
Inspecting the maximum RAM available...
ES_JAVA_OPTS: '-Dmapper.allow_dots_in_name=true -Xms128M -Xmx1024m'
Exception in thread "main" java.lang.IllegalArgumentException: Could not resolve placeholder 'HAS_DATA'
    at org.elasticsearch.common.property.PropertyPlaceholder.parseStringValue(PropertyPlaceholder.java:128)
    at org.elasticsearch.common.property.PropertyPlaceholder.replacePlaceholders(PropertyPlaceholder.java:81)
    at org.elasticsearch.common.settings.Settings$Builder.replacePropertyPlaceholders(Settings.java:1179)
    at org.elasticsearch.node.internal.InternalSettingsPreparer.initializeSettings(InternalSettingsPreparer.java:131)
    at org.elasticsearch.node.internal.InternalSettingsPreparer.prepareEnvironment(InternalSettingsPreparer.java:100)
    at org.elasticsearch.common.cli.CliTool.<init>(CliTool.java:107)
    at org.elasticsearch.common.cli.CliTool.<init>(CliTool.java:100)
    at org.elasticsearch.bootstrap.BootstrapCLIParser.<init>(BootstrapCLIParser.java:48)
    at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:242)
    at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:45)
Refer to the log for complete error details.
Checking if Elasticsearch is ready on https://localhost:9200 ..

rhefner on 21 Sep 2017

Having the same issue here. From what I know, this line in elasticsearch.yml:

discovery.type: kubernetes

should be:

discovery.zen.hosts_provider: kubernetes

I believe this comes from a change in the kubernetes plugin.

After changing this I got further, but now have searchguard issues saying it was not initialized.

ttindell2 on 21 Sep 2017


@ttindell2: Good call, changing the discovery type to zen.hosts_provider got me further as well. Now, ES is just timing out for me:

[2017-09-21 21:56:36,810][INFO ][container.run            ] Begin Elasticsearch startup script
[2017-09-21 21:56:36,826][INFO ][container.run            ] Comparing the specified RAM to the maximum recommended for Elasticsearch...
[2017-09-21 21:56:36,827][INFO ][container.run            ] Inspecting the maximum RAM available...
[2017-09-21 21:56:36,831][INFO ][container.run            ] ES_HEAP_SIZE: '1024m'
[2017-09-21 21:56:36,833][INFO ][container.run            ] Setting heap dump location /elasticsearch/persistent/heapdump.hprof
[2017-09-21 21:56:36,837][INFO ][container.run            ] Checking if Elasticsearch is ready on https://localhost:9200
[2017-09-21 22:02:06,481][ERROR][container.run            ] Timed out waiting for Elasticsearch to be ready
cat: elasticsearch_connect_log.txt: No such file or directory

rhefner on 22 Sep 2017

@rhefner
A very useful log is on the pod at /elasticsearch/${CLUSTER_NAME}/logs/

There are a few logs in there, but only one will have useful info. Could you see what that log says?
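
If the log is long, a small script can help spot which component is complaining. The sketch below assumes the log uses the `[timestamp][LEVEL][logger] message` layout seen in the startup output earlier in this thread; the function name is made up for illustration, and stack trace lines are simply skipped.

```python
import re
from collections import Counter

# Matches lines such as:
# [2017-09-21 22:02:06,481][ERROR][container.run            ] Timed out ...
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]"          # timestamp
    r"\[(?P<level>[A-Z]+)\s*\]"     # level, possibly space-padded
    r"\[(?P<logger>[^\]]+?)\s*\]"   # logger name, possibly space-padded
    r"\s*(?P<msg>.*)$"
)


def summarize(log_text: str, levels=("WARN", "ERROR")) -> Counter:
    """Count log entries per (level, logger) for the given levels."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        # Stack trace frames and other continuation lines do not match.
        if match and match.group("level") in levels:
            counts[(match.group("level"), match.group("logger"))] += 1
    return counts


SAMPLE_LOG = "\n".join([
    "[2017-09-21 21:56:36,810][INFO ][container.run            ] Begin Elasticsearch startup script",
    "[2017-09-21 22:02:06,481][ERROR][container.run            ] Timed out waiting for Elasticsearch to be ready",
    "cat: elasticsearch_connect_log.txt: No such file or directory",
])

for (level, logger), n in summarize(SAMPLE_LOG).most_common():
    print(level, logger, n)
```

Feeding it the contents of the file under /elasticsearch/${CLUSTER_NAME}/logs/ should give a quick view of which loggers produce the most warnings and errors.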

ttindell2 on 22 Sep 2017

@ttindell2 Seems like some searchguard issues, not sure if it's the same as yours..?

sh-4.2$ cat /elasticsearch/logging-es/logs/logging-es.log
[2017-09-22 00:41:40,797][INFO ][node                     ] [logging-es-data-master-bk8ocbgu] version[2.4.4], pid[1], build[fcbb46d/2017-01-03T11:33:16Z]
[2017-09-22 00:41:40,798][INFO ][node                     ] [logging-es-data-master-bk8ocbgu] initializing ...
[2017-09-22 00:41:42,415][INFO ][plugins                  ] [logging-es-data-master-bk8ocbgu] modules [reindex, lang-expression, lang-groovy], plugins [openshift-elasticsearch, cloud-kubernetes], sites []
[2017-09-22 00:41:42,530][INFO ][env                      ] [logging-es-data-master-bk8ocbgu] using [1] data paths, mounts [[/elasticsearch/persistent (/dev/loop0)]], net usable_space [99.9gb], net total_space [99.9gb], spins? [possibly], types [xfs]
[2017-09-22 00:41:42,530][INFO ][env                      ] [logging-es-data-master-bk8ocbgu] heap size [989.8mb], compressed ordinary object pointers [true]
[2017-09-22 00:41:43,545][INFO ][http                     ] [logging-es-data-master-bk8ocbgu] Using [org.elasticsearch.http.netty.NettyHttpServerTransport] as http transport, overridden by [search-guard2]
[2017-09-22 00:41:43,816][INFO ][transport                ] [logging-es-data-master-bk8ocbgu] Using [com.floragunn.searchguard.transport.SearchGuardTransportService] as transport service, overridden by [search-guard2]
[2017-09-22 00:41:43,817][INFO ][transport                ] [logging-es-data-master-bk8ocbgu] Using [com.floragunn.searchguard.ssl.transport.SearchGuardSSLNettyTransport] as transport, overridden by [search-guard-ssl]
[2017-09-22 00:41:48,511][INFO ][io.fabric8.elasticsearch.plugin.kibana.IndexMappingLoader] Trying to load Kibana mapping for io.fabric8.elasticsearch.kibana.mapping.app from plugin: /usr/share/elasticsearch/index_patterns/com.redhat.viaq-openshift.index-pattern.json
[2017-09-22 00:41:48,516][INFO ][io.fabric8.elasticsearch.plugin.kibana.IndexMappingLoader] Trying to load Kibana mapping for io.fabric8.elasticsearch.kibana.mapping.ops from plugin: /usr/share/elasticsearch/index_patterns/com.redhat.viaq-openshift.index-pattern.json
[2017-09-22 00:41:48,517][INFO ][io.fabric8.elasticsearch.plugin.kibana.IndexMappingLoader] Trying to load Kibana mapping for io.fabric8.elasticsearch.kibana.mapping.empty from plugin: /usr/share/elasticsearch/index_patterns/com.redhat.viaq-openshift.index-pattern.json
[2017-09-22 00:41:48,698][INFO ][node                     ] [logging-es-data-master-bk8ocbgu] initialized
[2017-09-22 00:41:48,698][INFO ][node                     ] [logging-es-data-master-bk8ocbgu] starting ...
[2017-09-22 00:41:48,921][INFO ][discovery                ] [logging-es-data-master-bk8ocbgu] logging-es/ENzlPG2kTy2jfumf_9u80w
[2017-09-22 00:42:18,922][WARN ][discovery                ] [logging-es-data-master-bk8ocbgu] waited for 30s and no initial state was set by the discovery
[2017-09-22 00:42:19,105][INFO ][http                     ] [logging-es-data-master-bk8ocbgu] publish_address {10.1.0.245:9200}, bound_addresses {[::]:9200}
[2017-09-22 00:42:19,105][INFO ][node                     ] [logging-es-data-master-bk8ocbgu] started
[2017-09-22 00:42:19,125][WARN ][discovery.zen.ping.unicast] [logging-es-data-master-bk8ocbgu] failed to send ping to [{#zen_unicast_6#}{::1}{[::1]:9300}]
SendRequestTransportException[[][[::1]:9300][internal:discovery/zen/unicast]]; nested: NodeNotConnectedException[[][[::1]:9300] Node not connected];
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:340)
    at com.floragunn.searchguard.transport.SearchGuardTransportService.sendRequest(SearchGuardTransportService.java:88)
    at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.sendPingRequestToNode(UnicastZenPing.java:440)
    at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.sendPings(UnicastZenPing.java:426)
    at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.ping(UnicastZenPing.java:240)
    at org.elasticsearch.discovery.zen.ping.ZenPingService.ping(ZenPingService.java:106)
    at org.elasticsearch.discovery.zen.ping.ZenPingService.pingAndWait(ZenPingService.java:84)
    at org.elasticsearch.discovery.zen.ZenDiscovery.findMaster(ZenDiscovery.java:945)
    at org.elasticsearch.discovery.zen.ZenDiscovery.innerJoinCluster(ZenDiscovery.java:360)
    at org.elasticsearch.discovery.zen.ZenDiscovery.access$4400(ZenDiscovery.java:96)
    at org.elasticsearch.discovery.zen.ZenDiscovery$JoinThreadControl$1.run(ZenDiscovery.java:1296)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: NodeNotConnectedException[[][[::1]:9300] Node not connected]
    at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:1141)
    at org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:830)
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:329)
    ... 13 more
[2017-09-22 00:42:31,213][WARN ][discovery.zen.ping.unicast] [logging-es-data-master-bk8ocbgu] failed to send ping to [{#zen_unicast_6#}{::1}{[::1]:9300}]
SendRequestTransportException[[][[::1]:9300][internal:discovery/zen/unicast]]; nested: NodeNotConnectedException[[][[::1]:9300] Node not connected];
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:340)
    at com.floragunn.searchguard.transport.SearchGuardTransportService.sendRequest(SearchGuardTransportService.java:88)
    at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.sendPingRequestToNode(UnicastZenPing.java:440)
    at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.sendPings(UnicastZenPing.java:426)
    at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.ping(UnicastZenPing.java:240)
    at org.elasticsearch.discovery.zen.ping.ZenPingService.ping(ZenPingService.java:106)
    at org.elasticsearch.discovery.zen.ping.ZenPingService.pingAndWait(ZenPingService.java:84)
    at org.elasticsearch.discovery.zen.ZenDiscovery.findMaster(ZenDiscovery.java:945)
    at org.elasticsearch.discovery.zen.ZenDiscovery.innerJoinCluster(ZenDiscovery.java:360)
    at org.elasticsearch.discovery.zen.ZenDiscovery.access$4400(ZenDiscovery.java:96)
    at org.elasticsearch.discovery.zen.ZenDiscovery$JoinThreadControl$1.run(ZenDiscovery.java:1296)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: NodeNotConnectedException[[][[::1]:9300] Node not connected]
    at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:1141)
    at org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:830)
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:329)
    ... 13 more
[2017-09-22 00:42:48,922][ERROR][com.floragunn.searchguard.action.configupdate.TransportConfigUpdateAction] [logging-es-data-master-bk8ocbgu] Failure while checking .searchguard.logging-es-data-master-bk8ocbgu index MasterNotDiscoveredException[null]
MasterNotDiscoveredException[null]
    at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$5.onTimeout(TransportMasterNodeAction.java:234)
    at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:236)
    at org.elasticsearch.cluster.service.InternalClusterService$NotifyTimeout.run(InternalClusterService.java:816)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
[2017-09-22 00:43:10,305][ERROR][com.floragunn.searchguard.ssl.transport.SearchGuardSSLNettyTransport] [logging-es-data-master-bk8ocbgu] SSL Problem renegotiation attempted by peer; closing the connection
javax.net.ssl.SSLException: renegotiation attempted by peer; closing the connection
    at org.jboss.netty.handler.ssl.SslHandler.handleRenegotiation(SslHandler.java:1368)
    at org.jboss.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1248)
    at org.jboss.netty.handler.ssl.SslHandler.decode(SslHandler.java:852)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:425)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
[2017-09-22 00:43:18,927][WARN ][com.floragunn.searchguard.action.configupdate.TransportConfigUpdateAction] [logging-es-data-master-bk8ocbgu] index '.searchguard.logging-es-data-master-bk8ocbgu' not healthy yet, we try again ... (Reason: no response)
[2017-09-22 00:43:19,323][WARN ][discovery.zen.ping.unicast] [logging-es-data-master-bk8ocbgu] failed to send ping to [{#zen_unicast_6#}{::1}{[::1]:9300}]
SendRequestTransportException[[][[::1]:9300][internal:discovery/zen/unicast]]; nested: NodeNotConnectedException[[][[::1]:9300] Node not connected];
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:340)
    at com.floragunn.searchguard.transport.SearchGuardTransportService.sendRequest(SearchGuardTransportService.java:88)
    at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.sendPingRequestToNode(UnicastZenPing.java:440)
    at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.sendPings(UnicastZenPing.java:426)
    at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.ping(UnicastZenPing.java:240)
    at org.elasticsearch.discovery.zen.ping.ZenPingService.ping(ZenPingService.java:106)
    at org.elasticsearch.discovery.zen.ping.ZenPingService.pingAndWait(ZenPingService.java:84)
    at org.elasticsearch.discovery.zen.ZenDiscovery.findMaster(ZenDiscovery.java:945)
    at org.elasticsearch.discovery.zen.ZenDiscovery.innerJoinCluster(ZenDiscovery.java:360)
    at org.elasticsearch.discovery.zen.ZenDiscovery.access$4400(ZenDiscovery.java:96)
    at org.elasticsearch.discovery.zen.ZenDiscovery$JoinThreadControl$1.run(ZenDiscovery.java:1296)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: NodeNotConnectedException[[][[::1]:9300] Node not connected]
    at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:1141)
    at org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:830)
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:329)
    ... 13 more
[2017-09-22 00:43:51,929][WARN ][com.floragunn.searchguard.action.configupdate.TransportConfigUpdateAction] [logging-es-data-master-bk8ocbgu] index '.searchguard.logging-es-data-master-bk8ocbgu' not healthy yet, we try again ... (Reason: no response)
[2017-09-22 00:44:24,931][WARN ][com.floragunn.searchguard.action.configupdate.TransportConfigUpdateAction] [logging-es-data-master-bk8ocbgu] index '.searchguard.logging-es-data-master-bk8ocbgu' not healthy yet, we try again ... (Reason: no response)
[2017-09-22 00:44:46,504][ERROR][com.floragunn.searchguard.ssl.transport.SearchGuardSSLNettyTransport] [logging-es-data-master-bk8ocbgu] SSL Problem renegotiation attempted by peer; closing the connection
javax.net.ssl.SSLException: renegotiation attempted by peer; closing the connection
    at org.jboss.netty.handler.ssl.SslHandler.handleRenegotiation(SslHandler.java:1368)
    at org.jboss.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1248)
    at org.jboss.netty.handler.ssl.SslHandler.decode(SslHandler.java:852)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:425)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
[2017-09-22 00:44:57,933][WARN ][com.floragunn.searchguard.action.configupdate.TransportConfigUpdateAction] [logging-es-data-master-bk8ocbgu] index '.searchguard.logging-es-data-master-bk8ocbgu' not healthy yet, we try again ... (Reason: no response)
[2017-09-22 00:45:01,527][ERROR][com.floragunn.searchguard.ssl.transport.SearchGuardSSLNettyTransport] [logging-es-data-master-bk8ocbgu] SSL Problem renegotiation attempted by peer; closing the connection
javax.net.ssl.SSLException: renegotiation attempted by peer; closing the connection
    at org.jboss.netty.handler.ssl.SslHandler.handleRenegotiation(SslHandler.java:1368)
    at org.jboss.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1248)
    at org.jboss.netty.handler.ssl.SslHandler.decode(SslHandler.java:852)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:425)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
[2017-09-22 00:45:30,934][WARN ][com.floragunn.searchguard.action.configupdate.TransportConfigUpdateAction] [logging-es-data-master-bk8ocbgu] index '.searchguard.logging-es-data-master-bk8ocbgu' not healthy yet, we try again ... (Reason: no response)
[2017-09-22 00:46:03,936][WARN ][com.floragunn.searchguard.action.configupdate.TransportConfigUpdateAction] [logging-es-data-master-bk8ocbgu] index '.searchguard.logging-es-data-master-bk8ocbgu' not healthy yet, we try again ... (Reason: no response)
[2017-09-22 00:46:16,681][ERROR][com.floragunn.searchguard.ssl.transport.SearchGuardSSLNettyTransport] [logging-es-data-master-bk8ocbgu] SSL Problem renegotiation attempted by peer; closing the connection
javax.net.ssl.SSLException: renegotiation attempted by peer; closing the connection
    at org.jboss.netty.handler.ssl.SslHandler.handleRenegotiation(SslHandler.java:1368)
    at org.jboss.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1248)
    at org.jboss.netty.handler.ssl.SslHandler.decode(SslHandler.java:852)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:425)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
[2017-09-22 00:46:36,937][WARN ][com.floragunn.searchguard.action.configupdate.TransportConfigUpdateAction] [logging-es-data-master-bk8ocbgu] index '.searchguard.logging-es-data-master-bk8ocbgu' not healthy yet, we try again ... (Reason: no response)
[2017-09-22 00:46:52,738][ERROR][com.floragunn.searchguard.ssl.transport.SearchGuardSSLNettyTransport] [logging-es-data-master-bk8ocbgu] SSL Problem renegotiation attempted by peer; closing the connection
javax.net.ssl.SSLException: renegotiation attempted by peer; closing the connection
    at org.jboss.netty.handler.ssl.SslHandler.handleRenegotiation(SslHandler.java:1368)
    at org.jboss.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1248)
    at org.jboss.netty.handler.ssl.SslHandler.decode(SslHandler.java:852)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:425)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
[2017-09-22 00:47:09,939][WARN ][com.floragunn.searchguard.action.configupdate.TransportConfigUpdateAction] [logging-es-data-master-bk8ocbgu] index '.searchguard.logging-es-data-master-bk8ocbgu' not healthy yet, we try again ... (Reason: no response)
[2017-09-22 00:47:25,788][ERROR][com.floragunn.searchguard.ssl.transport.SearchGuardSSLNettyTransport] [logging-es-data-master-bk8ocbgu] SSL Problem renegotiation attempted by peer; closing the connection
javax.net.ssl.SSLException: renegotiation attempted by peer; closing the connection
    at org.jboss.netty.handler.ssl.SslHandler.handleRenegotiation(SslHandler.java:1368)
    at org.jboss.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1248)
    at org.jboss.netty.handler.ssl.SslHandler.decode(SslHandler.java:852)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:425)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
[2017-09-22 00:47:28,792][ERROR][com.floragunn.searchguard.ssl.transport.SearchGuardSSLNettyTransport] [logging-es-data-master-bk8ocbgu] SSL Problem renegotiation attempted by peer; closing the connection
javax.net.ssl.SSLException: renegotiation attempted by peer; closing the connection
    at org.jboss.netty.handler.ssl.SslHandler.handleRenegotiation(SslHandler.java:1368)
    at org.jboss.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1248)
    at org.jboss.netty.handler.ssl.SslHandler.decode(SslHandler.java:852)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:425)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
[2017-09-22 00:47:31,796][ERROR][com.floragunn.searchguard.ssl.transport.SearchGuardSSLNettyTransport] [logging-es-data-master-bk8ocbgu] SSL Problem renegotiation attempted by peer; closing the connection
javax.net.ssl.SSLException: renegotiation attempted by peer; closing the connection
    at org.jboss.netty.handler.ssl.SslHandler.handleRenegotiation(SslHandler.java:1368)
    at org.jboss.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1248)
    at org.jboss.netty.handler.ssl.SslHandler.decode(SslHandler.java:852)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:425)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
[2017-09-22 00:47:42,940][WARN ][com.floragunn.searchguard.action.configupdate.TransportConfigUpdateAction] [logging-es-data-master-bk8ocbgu] index '.searchguard.logging-es-data-master-bk8ocbgu' not healthy yet, we try again ... (Reason: no response)
[2017-09-22 00:47:46,822][ERROR][com.floragunn.searchguard.ssl.transport.SearchGuardSSLNettyTransport] [logging-es-data-master-bk8ocbgu] SSL Problem renegotiation attempted by peer; closing the connection
javax.net.ssl.SSLException: renegotiation attempted by peer; closing the connection
    at org.jboss.netty.handler.ssl.SslHandler.handleRenegotiation(SslHandler.java:1368)
    at org.jboss.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1248)
    at org.jboss.netty.handler.ssl.SslHandler.decode(SslHandler.java:852)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:425)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
[2017-09-22 00:48:04,846][ERROR][com.floragunn.searchguard.ssl.transport.SearchGuardSSLNettyTransport] [logging-es-data-master-bk8ocbgu] SSL Problem renegotiation attempted by peer; closing the connection
javax.net.ssl.SSLException: renegotiation attempted by peer; closing the connection
    at org.jboss.netty.handler.ssl.SslHandler.handleRenegotiation(SslHandler.java:1368)
    at org.jboss.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1248)
    at org.jboss.netty.handler.ssl.SslHandler.decode(SslHandler.java:852)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:425)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
[2017-09-22 00:48:15,944][WARN ][com.floragunn.searchguard.action.configupdate.TransportConfigUpdateAction] [logging-es-data-master-bk8ocbgu] index '.searchguard.logging-es-data-master-bk8ocbgu' not healthy yet, we try again ... (Reason: no response)
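
Since the paste above repeats the same few messages many times, a small script can condense it. The following is only a sketch: the function name `summarize` and the regular expression are my own assumptions about the `[timestamp][LEVEL][logger]` prefix seen in these logs, not part of any OpenShift or Elasticsearch tooling. Stack-trace lines that lack that prefix are skipped.

```python
import re
from collections import Counter

# Assumed line layout, taken from the log lines in this thread:
# [timestamp][LEVEL ][logger.name] rest of message
LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[(?P<level>[A-Z]+)\s*\]\[(?P<logger>[^\]]+?)\s*\]\s*(?P<msg>.*)$"
)


def summarize(log_text):
    """Count log entries per (level, short logger name); skip stack-trace lines."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE.match(line)
        if match:
            short_logger = match.group("logger").rsplit(".", 1)[-1]
            counts[(match.group("level"), short_logger)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python summarize_es_log.py < logging-es.log
    for (level, logger), n in summarize(sys.stdin.read()).most_common():
        print(f"{n:5d}  {level:5s}  {logger}")
```

Run against a log like the one above, this gives one line per level and logger, which makes it easier to see that the output is dominated by the SSL renegotiation error and the "not healthy yet" warning.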

rhefner on 22 Sep 2017

@rhefner
Yep, same exact issue here. I haven't found any solution to it yet. Rolling back to v1.5.1 didn't help either.

ttindell2 on 22 Sep 2017

@wozniakjan any ideas?

richm on 22 Sep 2017

It has relevant changes in:
1) openshift-ansible - https://github.com/openshift/openshift-ansible/pull/5209

   - turning back on readiness probe
   - changing the discovery algorithm in ES configmap

2) ES image - https://github.com/openshift/origin-aggregated-logging/pull/609

   - new library supporting the new discovery algorithm

If you don't want to update the ES image, then you need to:

- disable the readiness probe: run oc edit dc logging-es-data-master-... for each ES DeploymentConfig and remove the part starting with readinessProbe:
- revert the master discovery algorithm: run oc edit cm logging-elasticsearch and change

cloud:
   kubernetes:
     pod_label: ${POD_LABEL}
     pod_port: 9300
     namespace: ${NAMESPACE}

to

cloud:
   kubernetes:
     service: ${SERVICE_DNS}
     namespace: ${NAMESPACE}
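
As a rough illustration of the configmap change described above, here is a Python sketch that rewrites the discovery settings in a copy of the elasticsearch.yml text. The function name `revert_discovery` is hypothetical, and the sketch assumes the pod_label and pod_port lines appear exactly as quoted above, on consecutive lines with the same indentation. It does plain text substitution rather than YAML parsing and does not talk to the cluster; the result would still have to be applied with oc edit cm logging-elasticsearch.

```python
import re

# Assumption: the pod_label-based settings sit on two consecutive lines
# with identical indentation, as in the fragment quoted in this comment.
POD_LABEL_LINES = re.compile(
    r"^(?P<indent>[ \t]*)pod_label:[^\n]*\n"
    r"(?P=indent)pod_port:[^\n]*\n",
    re.MULTILINE,
)


def revert_discovery(es_yml):
    """Swap the pod_label/pod_port lines for the service-based setting."""
    return POD_LABEL_LINES.sub(
        lambda m: m.group("indent") + "service: ${SERVICE_DNS}\n",
        es_yml,
        count=1,
    )


if __name__ == "__main__":
    sample = (
        "cloud:\n"
        "   kubernetes:\n"
        "     pod_label: ${POD_LABEL}\n"
        "     pod_port: 9300\n"
        "     namespace: ${NAMESPACE}\n"
    )
    print(revert_discovery(sample))
```

Text that does not contain the pod_label lines is returned unchanged, so running it on an already reverted configmap is harmless.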

wozniakjan on 22 Sep 2017

Reverted elasticsearch.yml back to what it was and pulled the latest ES container. The cluster is now in yellow state and is OK.

ttindell2 on 22 Sep 2017

@rhefner does the suggestion above resolve the issue you are seeing as well?

ewolinetz on 22 Sep 2017

@ewolinetz Yes, it looks like it did, indeed.

rhefner on 23 Sep 2017

Closing, since pulling the latest image resolved the issue.

ewolinetz on 25 Sep 2017

@ewolinetz will there be an updated tagged image? I'm not happy about running latest in production :(

mhutter on 27 Oct 2017

@mhutter yes, there will be. you will not be expected to run with the latest tag in production.

ewolinetz on 27 Oct 2017

@mhutter you can now use v3.6.1 https://hub.docker.com/r/openshift/origin-logging-elasticsearch/tags/

wozniakjan on 31 Oct 2017

Just tried a new origin deployment switching to the v3.6.1 images, and ES is failing to start.
This was done with the ansible installer with these definitions:

openshift_release=v3.6
openshift_hosted_logging_deployer_version=v3.6.1

This is what is seen in the logs of the logging-es-data-master pod:

[2017-11-01 15:10:02,491][INFO ][container.run            ] Begin Elasticsearch startup script
--
  | [2017-11-01 15:10:02,498][INFO ][container.run            ] Comparing the specified RAM to the maximum recommended for Elasticsearch...
  | [2017-11-01 15:10:02,499][INFO ][container.run            ] Inspecting the maximum RAM available...
  | [2017-11-01 15:10:02,503][INFO ][container.run            ] ES_HEAP_SIZE: '4096m'
  | [2017-11-01 15:10:02,506][INFO ][container.run            ] Setting heap dump location /elasticsearch/persistent/heapdump.hprof
  | [2017-11-01 15:10:02,509][INFO ][container.run            ] Checking if Elasticsearch is ready on https://localhost:9200
  | Exception in thread "main" java.lang.IllegalArgumentException: Unknown Discovery type [kubernetes]
  | at org.elasticsearch.discovery.DiscoveryModule.configure(DiscoveryModule.java:100)
  | at <<<guice>>>
  | at org.elasticsearch.node.Node.<init>(Node.java:213)
  | at org.elasticsearch.node.Node.<init>(Node.java:140)
  | at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:143)
  | at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:194)
  | at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:286)
  | at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:45)
  | Refer to the log for complete error details.

tdudgeon on 1 Nov 2017

@tdudgeon thanks! Two details:
1) openshift_hosted_logging_deployer_version was deprecated in https://github.com/openshift/openshift-ansible/pull/5176; please try openshift_logging_image_version=v3.6.1 instead
2) unfortunately, our release engineers may have pushed the 3.6.0 image under the 3.6.1 tag (judging from the identical sha256), so the only usable tag remains latest
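
The "same sha256" check can be sketched as follows. This is a minimal illustration, not how the release tooling works: the function name `tags_sharing_digest` is hypothetical and the digests in the example are made-up placeholders, not the real digests of these images. You would fill in the mapping from whatever you normally use to inspect image digests.

```python
from collections import defaultdict


def tags_sharing_digest(tag_digests):
    """Group tags by digest and return only digests used by more than one tag."""
    by_digest = defaultdict(list)
    for tag, digest in tag_digests.items():
        by_digest[digest].append(tag)
    return {d: sorted(tags) for d, tags in by_digest.items() if len(tags) > 1}


if __name__ == "__main__":
    # Placeholder digests for illustration only (assumed, not real values).
    observed = {
        "v3.6.0": "sha256:aaaa",
        "v3.6.1": "sha256:aaaa",
        "latest": "sha256:bbbb",
    }
    for digest, tags in tags_sharing_digest(observed).items():
        print(digest, "->", ", ".join(tags))
```

Two tags listed under one digest point at the same image, which is what suggests that v3.6.1 does not yet contain the fix.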

wozniakjan on 1 Nov 2017

We have introduced and released a new tag, v3.6. This tag will be updated regularly, so there is no longer any need to wait for release engineers to push a new image. More info here: https://github.com/openshift/origin-aggregated-logging/pull/758

This should be working now:
openshift_logging_image_version=v3.6

wozniakjan on 9 Nov 2017

Reproducible on v3.7:

[root@master ~]# oc version
oc v3.7.0+7ed6862
kubernetes v1.7.6+a08f5eeb62
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://openshift2.example.com
openshift v3.7.0+7ed6862
kubernetes v1.7.6+a08f5eeb62

The Elasticsearch config map was not showing the latest discovery algorithm (cloud: (kubernetes: (service: ${SERVICE_DNS}, namespace: ${NAMESPACE}))).

After manually updating the config map, the container had the following logs:

sh-4.2$ tail -f /elasticsearch/logging-es/logs/logging-es.log
[2018-01-23 10:28:35,897][INFO ][node                     ] [logging-es-data-master-96f1ifqf] started
[2018-01-23 10:29:05,798][ERROR][com.floragunn.searchguard.action.configupdate.TransportConfigUpdateAction] [logging-es-data-master-96f1ifqf] Failure while checking .searchguard.logging-es-data-master-96f1ifqf index MasterNotDiscoveredException[null]
MasterNotDiscoveredException[null]
        at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$5.onTimeout(TransportMasterNodeAction.java:234)
        at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:236)
        at org.elasticsearch.cluster.service.InternalClusterService$NotifyTimeout.run(InternalClusterService.java:816)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
[2018-01-23 10:29:35,807][WARN ][com.floragunn.searchguard.action.configupdate.TransportConfigUpdateAction] [logging-es-data-master-96f1ifqf] index '.searchguard.logging-es-data-master-96f1ifqf' not healthy yet, we try again ... (Reason: no response)
[2018-01-23 10:30:08,808][WARN ][com.floragunn.searchguard.action.configupdate.TransportConfigUpdateAction] [logging-es-data-master-96f1ifqf] index '.searchguard.logging-es-data-master-96f1ifqf' not healthy yet, we try again ... (Reason: no response)
[2018-01-23 10:30:41,810][WARN ][com.floragunn.searchguard.action.configupdate.TransportConfigUpdateAction] [logging-es-data-master-96f1ifqf] index '.searchguard.logging-es-data-master-96f1ifqf' not healthy yet, we try again ... (Reason: no response)
[2018-01-23 10:31:03,307][WARN ][discovery.zen.ping.unicast] [logging-es-data-master-96f1ifqf] failed to send ping to [{#zen_unicast_6#}{::1}{[::1]:9300}]
SendRequestTransportException[[][[::1]:9300][internal:discovery/zen/unicast]]; nested: NodeNotConnectedException[[][[::1]:9300] Node not connected];
        at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:340)
        at com.floragunn.searchguard.transport.SearchGuardTransportService.sendRequest(SearchGuardTransportService.java:88)
        at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.sendPingRequestToNode(UnicastZenPing.java:440)
        at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.sendPings(UnicastZenPing.java:426)
        at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.ping(UnicastZenPing.java:240)
        at org.elasticsearch.discovery.zen.ping.ZenPingService.ping(ZenPingService.java:106)
        at org.elasticsearch.discovery.zen.ping.ZenPingService.pingAndWait(ZenPingService.java:84)
        at org.elasticsearch.discovery.zen.ZenDiscovery.findMaster(ZenDiscovery.java:945)
        at org.elasticsearch.discovery.zen.ZenDiscovery.innerJoinCluster(ZenDiscovery.java:360)
        at org.elasticsearch.discovery.zen.ZenDiscovery.access$4400(ZenDiscovery.java:96)
        at org.elasticsearch.discovery.zen.ZenDiscovery$JoinThreadControl$1.run(ZenDiscovery.java:1296)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: NodeNotConnectedException[[][[::1]:9300] Node not connected]
        at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:1141)
        at org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:830)
        at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:329)
        ... 13 more
[2018-01-23 10:31:14,812][WARN ][com.floragunn.searchguard.action.configupdate.TransportConfigUpdateAction] [logging-es-data-master-96f1ifqf] index '.searchguard.logging-es-data-master-96f1ifqf' not healthy yet, we try again ... (Reason: no response)
[2018-01-23 10:31:47,814][WARN ][com.floragunn.searchguard.action.configupdate.TransportConfigUpdateAction] [logging-es-data-master-96f1ifqf] index '.searchguard.logging-es-data-master-96f1ifqf' not healthy yet, we try again ... (Reason: no response)