Elasticsearch version (bin/elasticsearch --version): 6.1.3
Plugins installed: [ingest-geoip,ingest-user-agent,repository-s3,x-pack]
JVM version (java -version):
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
OS version (uname -a if on a Unix-like system): Linux bd9a03495c76 4.4.0-66-generic #87~14.04.1-Ubuntu SMP Fri Mar 3 17:32:36 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Description of the problem including expected versus actual behavior:
Node suffers an OOM error and the following message is logged:
[2018-03-09T04:18:01,808][ERROR][org.elasticsearch.bootstrap.ElasticsearchUncaughtExceptionHandler] fatal error in thread [elasticsearch[instance-0000000008][bulk][T#739]], exiting java.lang.OutOfMemoryError: Java heap space
However, the instance does not die; it keeps throwing OOMs from time to time and remains totally unusable.
Steps to reproduce:
Provide logs (if relevant):
There are lots of similar log entries from the last 24h:
[2018-03-09T11:21:55,312][ERROR][org.elasticsearch.index.engine.Engine] tragic event in index writer java.lang.OutOfMemoryError: Java heap space
This is the full stacktrace of one of the errors:
This is the first OOM that was thrown and did not result in the instance dying:
[2018-03-08T12:31:30,448][ERROR][org.elasticsearch.bootstrap.ElasticsearchUncaughtExceptionHandler] fatal error in thread [Thread-35], exiting
java.lang.OutOfMemoryError: Java heap space
at io.netty.buffer.UnpooledHeapByteBuf.allocateArray(UnpooledHeapByteBuf.java:85) ~[?:?]
at io.netty.buffer.UnpooledByteBufAllocator$InstrumentedUnpooledHeapByteBuf.allocateArray(UnpooledByteBufAllocator.java:147) ~[?:?]
at io.netty.buffer.UnpooledHeapByteBuf.<init>(UnpooledHeapByteBuf.java:58) ~[?:?]
at io.netty.buffer.UnpooledByteBufAllocator$InstrumentedUnpooledHeapByteBuf.<init>(UnpooledByteBufAllocator.java:142) ~[?:?]
at io.netty.buffer.UnpooledByteBufAllocator.newHeapBuffer(UnpooledByteBufAllocator.java:64) ~[?:?]
at io.netty.buffer.AbstractByteBufAllocator.heapBuffer(AbstractByteBufAllocator.java:162) ~[?:?]
at io.netty.buffer.AbstractByteBufAllocator.heapBuffer(AbstractByteBufAllocator.java:153) ~[?:?]
at io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:135) ~[?:?]
at io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:80) ~[?:?]
at io.netty.channel.nio.AbstractNioByteChannel.read(AbstractNioByteChannel.java:122) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:544) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498) ~[?:?]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458) ~[?:?]
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) ~[?:?]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
Another, later error I've seen that does not result in the instance dying:
[2018-03-09T06:34:19,412][WARN ][org.elasticsearch.cluster.action.shard.ShardStateAction] [filebeat-2018.03.06][3] received shard failed for shard id [[filebeat-2018.03.06][3]], allocation id [2-mEAryVT_2ElzJAYhZWcQ], primary term [0], message [shard failure, reason [lucene commit failed]], failure [AlreadyClosedException[this IndexWriter is closed]; nested: OutOfMemoryError[Java heap space]; ]
org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:896) ~[lucene-core-7.1.0.jar:7.1.0 84c90ad2c0218156c840e19a64d72b8a38550659 - ubuntu - 2017-10-13 16:12:42]
at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:910) ~[lucene-core-7.1.0.jar:7.1.0 84c90ad2c0218156c840e19a64d72b8a38550659 - ubuntu - 2017-10-13 16:12:42]
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3377) ~[lucene-core-7.1.0.jar:7.1.0 84c90ad2c0218156c840e19a64d72b8a38550659 - ubuntu - 2017-10-13 16:12:42]
at org.elasticsearch.index.engine.InternalEngine.commitIndexWriter(InternalEngine.java:2113) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.index.engine.InternalEngine.commitIndexWriter(InternalEngine.java:2106) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.index.engine.InternalEngine.flush(InternalEngine.java:1493) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.index.shard.IndexShard.flush(IndexShard.java:1024) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.indices.flush.SyncedFlushService.performPreSyncedFlush(SyncedFlushService.java:414) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.indices.flush.SyncedFlushService.access(SyncedFlushService.java:70) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.indices.flush.SyncedFlushService.messageReceived(SyncedFlushService.java:696) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.indices.flush.SyncedFlushService.messageReceived(SyncedFlushService.java:692) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor.doRun(SecurityServerTransportInterceptor.java:258) ~[?:?]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.common.util.concurrent.EsExecutors.execute(EsExecutors.java:135) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor.lambda-zsh(SecurityServerTransportInterceptor.java:307) ~[?:?]
at org.elasticsearch.action.ActionListener.onResponse(ActionListener.java:60) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.xpack.security.transport.ServerTransportFilter.lambda(ServerTransportFilter.java:165) ~[?:?]
at org.elasticsearch.xpack.security.authz.AuthorizationUtils.maybeRun(AuthorizationUtils.java:182) ~[?:?]
at org.elasticsearch.xpack.security.authz.AuthorizationUtils.setRunAsRoles(AuthorizationUtils.java:176) ~[?:?]
at org.elasticsearch.xpack.security.authz.AuthorizationUtils.authorize(AuthorizationUtils.java:158) ~[?:?]
at org.elasticsearch.xpack.security.transport.ServerTransportFilter.lambda(ServerTransportFilter.java:167) ~[?:?]
at org.elasticsearch.action.ActionListener.onResponse(ActionListener.java:60) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.xpack.security.authc.AuthenticationService.lambda(AuthenticationService.java:195) ~[?:?]
at org.elasticsearch.xpack.security.authc.AuthenticationService.lambda(AuthenticationService.java:228) ~[?:?]
at org.elasticsearch.xpack.security.authc.AuthenticationService.lookForExistingAuthentication(AuthenticationService.java:239) ~[?:?]
at org.elasticsearch.xpack.security.authc.AuthenticationService.authenticateAsync(AuthenticationService.java:193) ~[?:?]
at org.elasticsearch.xpack.security.authc.AuthenticationService.access-zsh(AuthenticationService.java:147) ~[?:?]
at org.elasticsearch.xpack.security.authc.AuthenticationService.authenticate(AuthenticationService.java:116) ~[?:?]
at org.elasticsearch.xpack.security.transport.ServerTransportFilter.inbound(ServerTransportFilter.java:141) ~[?:?]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor.messageReceived(SecurityServerTransportInterceptor.java:314) ~[?:?]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.transport.TransportService.doRun(TransportService.java:652) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.common.util.concurrent.ThreadContext.doRun(ThreadContext.java:637) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.1.3.jar:6.1.3]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_144]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_144]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
Caused by: java.lang.OutOfMemoryError: Java heap space
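For context on the shape of this trace: when an OutOfMemoryError strikes inside Lucene's IndexWriter, Lucene records it as a "tragic event" (hence the "tragic event in index writer" log line above) and permanently closes the writer; every later commit or flush then fails with AlreadyClosedException carrying the original OOM as its cause. A minimal sketch of how that surfaces through the public Lucene 7.1 API (hypothetical in-memory index; needs lucene-core and lucene-analyzers-common on the classpath):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.AlreadyClosedException;
import org.apache.lucene.store.RAMDirectory;

public class TragicEventShape {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter(
                new RAMDirectory(), new IndexWriterConfig(new StandardAnalyzer()));
        try {
            writer.commit();
        } catch (AlreadyClosedException e) {
            // After a tragic event (e.g. an OOM during an earlier operation) the
            // writer stays closed and preserves the fatal error that killed it:
            Throwable tragedy = writer.getTragicException();
            System.err.println("writer closed tragically by: " + tragedy);
            System.err.println("commit failed, caused by: " + e.getCause());
        }
        writer.close();
    }
}
```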
Pinging @elastic/es-core-infra
Relates to #21721. /cc @jasontedor @s1monw
@nachogiljaldo Can you attach your complete log (in a gist)? This should not be possible. After the error message "fatal error in thread [Thread-35], exiting" is logged, we call Runtime.halt(). This should cause the JVM to shut down.
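For readers unfamiliar with the mechanism being described: during bootstrap Elasticsearch installs a default uncaught-exception handler that halts the JVM when a thread dies with a fatal error, so a logged fatal error should be the process's last words. A minimal sketch of that halt-on-fatal-error pattern (just the shape of the mechanism, not the actual ElasticsearchUncaughtExceptionHandler source):

```java
// Sketch: halt the JVM whenever a fatal Error escapes any thread.
public class HaltOnFatalError implements Thread.UncaughtExceptionHandler {
    @Override
    public void uncaughtException(Thread t, Throwable e) {
        if (e instanceof Error) {
            // Fatal (e.g. OutOfMemoryError): the JVM can no longer be trusted.
            System.err.println("fatal error in thread [" + t.getName() + "], exiting");
            e.printStackTrace();
            // halt(), unlike exit(), skips shutdown hooks; the process dies immediately.
            Runtime.getRuntime().halt(1);
        } else {
            e.printStackTrace(); // non-fatal: log it and keep the node alive
        }
    }

    public static void install() {
        Thread.setDefaultUncaughtExceptionHandler(new HaltOnFatalError());
    }
}
```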
The first one is a bug and I will open a fix soon. The second one I do not believe without sufficiently strong evidence. The third one I am still investigating.
This is the first OOM that was thrown and did not result in the instance dying:
[2018-03-08T12:31:30,448][ERROR][org.elasticsearch.bootstrap.ElasticsearchUncaughtExceptionHandler] fatal error in thread [Thread-35], exiting
java.lang.OutOfMemoryError: Java heap space
@nachogiljaldo Would you please provide evidence that this one did not result in the node dying?
Another, later error I've seen that does not result in the instance dying:
[2018-03-09T06:34:19,412][WARN ][org.elasticsearch.cluster.action.shard.ShardStateAction] [filebeat-2018.03.06][3] received shard failed for shard id [[filebeat-2018.03.06][3]], allocation id [2-mEAryVT_2ElzJAYhZWcQ], primary term [0], message [shard failure, reason [lucene commit failed]], failure [AlreadyClosedException[this IndexWriter is closed]; nested: OutOfMemoryError[Java heap space]; ]
org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:896) ~[lucene-core-7.1.0.jar:7.1.0 84c90ad2c0218156c840e19a64d72b8a38550659 - ubuntu - 2017-10-13 16:12:42]
...
Caused by: java.lang.OutOfMemoryError: Java heap space
@nachogiljaldo Do you have any more to this stack trace?
@jasontedor / @rjernst
@nachogiljaldo Would you please provide evidence that this one did not result in the node dying?
[2018-03-09T17:49:16,336][WARN ][io.netty.channel.AbstractChannelHandlerContext] An exception 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following exception:
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
org.elasticsearch.ElasticsearchException: java.lang.OutOfMemoryError: Java heap space
Caused by: java.lang.OutOfMemoryError: Java heap space
org.elasticsearch.ElasticsearchException: java.lang.OutOfMemoryError: Java heap space
Caused by: java.lang.OutOfMemoryError: Java heap space
root@aadf270ee991:/# ps -ef | grep -i java
foundus+ 929 16 28 Jan31 ? 10-12:49:11 /usr/bin/java -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+AlwaysPreTouch -server -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Des.allow_insecure_settings=true -XX:ParallelGCThreads=2 -XX:ConcGCThreads=1 -Xms409M -Xmx409M -Dio.netty.allocator.type=unpooled -Djava.nio.file.spi.DefaultFileSystemProvider=co.elastic.cloud.quotaawarefs.QuotaAwareFileSystemProvider -Dio.netty.recycler.maxCapacityPerThread=0 -Djava.security.policy=file:///app/config/gelf.policy -Des.cgroups.hierarchy.override=/ -Des.path.home=/elasticsearch -Des.path.conf=/app/config -cp /elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch -p /app/es.pid -Epath.logs=/app/logs -Epath.data=/app/data
root 103337 103318 0 18:19 ? 00:00:00 grep --color=auto -i java
@nachogiljaldo Do you have any more to this stack trace?
I have more examples of those, if that's what you mean. If you're asking whether there's a stack trace following it, there isn't.
[2018-03-09T06:34:19,412][WARN ][org.elasticsearch.cluster.action.shard.ShardStateAction] [filebeat-2018.03.06][3] received shard failed for shard id [[filebeat-2018.03.06][3]], allocation id [2-mEAryVT_2ElzJAYhZWcQ], primary term [0], message [shard failure, reason [lucene commit failed]], failure [AlreadyClosedException[this IndexWriter is closed]; nested: OutOfMemoryError[Java heap space]; ]
org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:896) ~[lucene-core-7.1.0.jar:7.1.0 84c90ad2c0218156c840e19a64d72b8a38550659 - ubuntu - 2017-10-13 16:12:42]
at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:910) ~[lucene-core-7.1.0.jar:7.1.0 84c90ad2c0218156c840e19a64d72b8a38550659 - ubuntu - 2017-10-13 16:12:42]
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3377) ~[lucene-core-7.1.0.jar:7.1.0 84c90ad2c0218156c840e19a64d72b8a38550659 - ubuntu - 2017-10-13 16:12:42]
at org.elasticsearch.index.engine.InternalEngine.commitIndexWriter(InternalEngine.java:2113) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.index.engine.InternalEngine.commitIndexWriter(InternalEngine.java:2106) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.index.engine.InternalEngine.flush(InternalEngine.java:1493) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.index.shard.IndexShard.flush(IndexShard.java:1024) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.indices.flush.SyncedFlushService.performPreSyncedFlush(SyncedFlushService.java:414) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.indices.flush.SyncedFlushService.access(SyncedFlushService.java:70) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.indices.flush.SyncedFlushService.messageReceived(SyncedFlushService.java:696) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.indices.flush.SyncedFlushService.messageReceived(SyncedFlushService.java:692) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor.doRun(SecurityServerTransportInterceptor.java:258) ~[?:?]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.common.util.concurrent.EsExecutors.execute(EsExecutors.java:135) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor.lambda-zsh(SecurityServerTransportInterceptor.java:307) ~[?:?]
at org.elasticsearch.action.ActionListener.onResponse(ActionListener.java:60) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.xpack.security.transport.ServerTransportFilter.lambda(ServerTransportFilter.java:165) ~[?:?]
at org.elasticsearch.xpack.security.authz.AuthorizationUtils.maybeRun(AuthorizationUtils.java:182) ~[?:?]
at org.elasticsearch.xpack.security.authz.AuthorizationUtils.setRunAsRoles(AuthorizationUtils.java:176) ~[?:?]
at org.elasticsearch.xpack.security.authz.AuthorizationUtils.authorize(AuthorizationUtils.java:158) ~[?:?]
at org.elasticsearch.xpack.security.transport.ServerTransportFilter.lambda(ServerTransportFilter.java:167) ~[?:?]
at org.elasticsearch.action.ActionListener.onResponse(ActionListener.java:60) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.xpack.security.authc.AuthenticationService.lambda(AuthenticationService.java:195) ~[?:?]
at org.elasticsearch.xpack.security.authc.AuthenticationService.lambda(AuthenticationService.java:228) ~[?:?]
at org.elasticsearch.xpack.security.authc.AuthenticationService.lookForExistingAuthentication(AuthenticationService.java:239) ~[?:?]
at org.elasticsearch.xpack.security.authc.AuthenticationService.authenticateAsync(AuthenticationService.java:193) ~[?:?]
at org.elasticsearch.xpack.security.authc.AuthenticationService.access-zsh(AuthenticationService.java:147) ~[?:?]
at org.elasticsearch.xpack.security.authc.AuthenticationService.authenticate(AuthenticationService.java:116) ~[?:?]
at org.elasticsearch.xpack.security.transport.ServerTransportFilter.inbound(ServerTransportFilter.java:141) ~[?:?]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor.messageReceived(SecurityServerTransportInterceptor.java:314) ~[?:?]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.transport.TransportService.doRun(TransportService.java:652) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.common.util.concurrent.ThreadContext.doRun(ThreadContext.java:637) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.1.3.jar:6.1.3]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_144]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_144]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
Caused by: java.lang.OutOfMemoryError: Java heap space
Btw, this is printed to stdout:
Exception: java.lang.SecurityException thrown from the UncaughtExceptionHandler in thread "Thread-428833"
@henrikno Is there a stack trace?
No, that's the weird thing. Just that line, several times, with different thread names.
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "java-sdk-http-connection-reaper"
Exception: java.lang.SecurityException thrown from the UncaughtExceptionHandler in thread "Thread-233"
Exception: java.lang.SecurityException thrown from the UncaughtExceptionHandler in thread "elasticsearch[instance-0000000008][generic][T#9274]"
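That bare one-liner is what the JVM itself prints when the uncaught-exception handler throws while handling an uncaught exception; the message comes from HotSpot's native code, which is why no stack trace accompanies it. A small self-contained demo (the thread name and messages are made up to mirror the log):

```java
public class HandlerThrowsDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread t = new Thread(() -> {
            throw new OutOfMemoryError("simulated"); // the original uncaught throwable
        }, "Thread-233");
        // A handler that itself blows up while handling the uncaught error:
        t.setUncaughtExceptionHandler((thread, e) -> {
            throw new SecurityException("simulated denial");
        });
        t.start();
        t.join();
        // The JVM prints, with no stack trace:
        //   Exception: java.lang.SecurityException thrown from the UncaughtExceptionHandler in thread "Thread-233"
    }
}
```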
Okay, no worries; I have a gut feeling about what is occurring. I will try to reproduce and validate. This will knock out the first two of these. I will still need more information on the third one.
Okay, here is the problem. The change #27482 introduced a bug that causes a security manager exception when we attempt to exit after a fatal error. This was immediately detected, and #27518 was introduced as a fix. The bug from #27482 was supposed to have never been released, yet the backport of #27518 to 6.1 was missed; it never happened. This means that #27482 is in 6.1 without the corresponding fix. This bug preventing exit in situations like this occurs in 6.1 only; upgrading to 6.2 will address the issue.
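To illustrate the failure mode: Runtime.halt() first consults the installed SecurityManager via checkExit(), so if the effective policy denies exit, halt() throws a SecurityException from inside the uncaught-exception handler and the JVM stays up, OOMs and all. A minimal reproduction of the mechanism (a simulated deny-exit policy, not the actual change from #27482):

```java
public class ExitDeniedDemo {
    public static void main(String[] args) {
        System.setSecurityManager(new SecurityManager() {
            @Override
            public void checkExit(int status) {
                // Simulates a policy that does not grant the exitVM permission.
                throw new SecurityException("simulated: exitVM." + status + " denied");
            }

            @Override
            public void checkPermission(java.security.Permission perm) {
                // Permit everything else so the demo can run.
            }
        });
        try {
            Runtime.getRuntime().halt(1); // checkExit runs first and throws
        } catch (SecurityException e) {
            System.err.println("halt blocked: " + e.getMessage());
        }
        System.err.println("JVM is still running"); // the process never died
    }
}
```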
I understand the situation with the third exception and will work on a fix.
I opened #28973.