Elasticsearch: java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached

Created on 12 Jul 2018 · 6 comments · Source: elastic/elasticsearch

Elasticsearch version (bin/elasticsearch --version):
6.3.1
Plugins installed: [xpack]

JVM version (java -version): 10

OS version (uname -a if on a Unix-like system): CentOS

Description of the problem including expected versus actual behavior:

java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
but my OS has 80 GB of free memory.

I used the docker.elastic.co/elasticsearch/elasticsearch:6.3.1 image with this JVM config:
-Xms32g
-Xmx32g

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 514880
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65536
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
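The ulimit -a output above describes the limits of the shell it was run in, which are not necessarily the limits of the running Elasticsearch JVM (for example, when the JVM runs inside a Docker container). A minimal Python sketch for reading the limits the kernel actually applies to a given process from /proc/<pid>/limits, assuming a Linux host; the script name check_limits.py is hypothetical:

```python
import sys


def parse_proc_limits(text):
    """Parse the table in /proc/<pid>/limits into {name: (soft, hard, units)}."""
    lines = text.splitlines()
    header = lines[0]
    # The kernel prints fixed-width columns and limit names contain spaces,
    # so slice each row at the offsets where the header's titles start.
    soft_at = header.index("Soft Limit")
    hard_at = header.index("Hard Limit")
    units_at = header.index("Units")
    limits = {}
    for row in lines[1:]:
        if not row.strip():
            continue
        name = row[:soft_at].strip()
        soft = row[soft_at:hard_at].strip()
        hard = row[hard_at:units_at].strip()
        units = row[units_at:].strip()  # empty for rows without units
        limits[name] = (soft, hard, units)
    return limits


def main():
    # Usage: python3 check_limits.py <pid>  (defaults to this script's own process)
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    try:
        with open(f"/proc/{pid}/limits") as f:
            limits = parse_proc_limits(f.read())
    except FileNotFoundError:
        print(f"/proc/{pid}/limits not found (not Linux, or no such process)",
              file=sys.stderr)
        return
    soft, hard, units = limits["Max processes"]
    print(f"Max processes for {pid}: soft={soft} hard={hard} {units}")


if __name__ == "__main__":
    main()
```

For example, passing the JVM's PID (9566 in the top output further down) would show the "Max processes" limit that JVM is really running with, which may differ from the "max user processes" line above.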

Provide logs (if relevant):

[67329.555s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
[67329.557s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.

[2018-07-12T02:47:53,549][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [e11redis28.mercury.corp] fatal error in thread [elasticsearch[e11redis28.mercury.corp][refresh][T#2]], exiting
java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
    at java.lang.Thread.start0(Native Method) ~[?:?]
    at java.lang.Thread.start(Thread.java:813) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:944) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1012) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
    at java.lang.Thread.run(Thread.java:844) [?:?]


Reported by TrumanDu

Most helpful comment

@vladimirdolzhenko @danielmitterdorfer Thanks for writing this! It helped me debug a similar (but likely unrelated) issue in our app, which uses the ES client. For whatever reason, the app had gone berserk over the weekend, spawning 9400 threads, which caused new thread creation on the machine to fail for the same user account.

ps -o nlwp,pid -fe helped me spot this, so I could kill the bad process and get the system back to a usable state. Greatly appreciated!

perlun on 18 Feb 2019

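The ps -o nlwp,pid -fe check above can also be reproduced by reading the Threads: field of /proc/<pid>/status for every process. A minimal Python sketch, assuming Linux; it prints the ten processes with the most threads:

```python
import os


def parse_status(text):
    """Turn the 'Key:<tab>value' lines of /proc/<pid>/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value.strip()
    return fields


def thread_counts(proc="/proc"):
    """Return (threads, pid, name) for every process visible under proc."""
    counts = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc, entry, "status")) as f:
                fields = parse_status(f.read())
        except OSError:
            continue  # process exited (or is unreadable) while scanning
        counts.append((int(fields["Threads"]), int(entry), fields.get("Name", "?")))
    return counts


if __name__ == "__main__" and os.path.isdir("/proc"):
    # Print the ten processes with the most threads, highest count first.
    for threads, pid, name in sorted(thread_counts(), reverse=True)[:10]:
        print(f"{threads:>6} {pid:>8} {name}")
```

A single process at the top of this list with thousands of threads, like the 9400-thread app described above, is a likely candidate for exhausting a per-user limit.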

All 6 comments

@TrumanDu can you please provide stats about that specific JVM, such as
the number of threads, via ps -o nlwp -p $PID (where $PID is the PID of that JVM; you can find it with jps -lvm),
and the memory details of the process, via top -p $PID?

vladimirdolzhenko on 12 Jul 2018
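Since the node only fails after running for a while, a single thread count may not be enough. The following Python sketch, assuming Linux, samples the count periodically by counting the entries in /proc/<pid>/task (one per thread, the same figure ps reports as nlwp); the script name watch_threads.py and the 60-second default interval are arbitrary choices for illustration:

```python
import os
import sys
import time


def count_threads(pid="self"):
    """Number of threads in a process: one /proc/<pid>/task entry per thread."""
    return len(os.listdir(f"/proc/{pid}/task"))


def watch(pid, interval_s=60.0, samples=10):
    """Print a timestamped thread count every interval_s seconds."""
    for i in range(samples):
        print(time.strftime("%Y-%m-%d %H:%M:%S"), pid, count_threads(pid), flush=True)
        if i < samples - 1:
            time.sleep(interval_s)  # no need to sleep after the last sample


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 watch_threads.py <pid> [interval-seconds] [samples]
    interval = float(sys.argv[2]) if len(sys.argv) > 2 else 60.0
    samples = int(sys.argv[3]) if len(sys.argv) > 3 else 10
    watch(sys.argv[1], interval, samples)
```

A count that keeps climbing between samples would suggest threads are being leaked, whereas a stable count that still hits the error would point more towards a limit that is set too low.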

@vladimirdolzhenko thanks for your reply!

top

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
 9566 1000      20   0  2.624t 0.036t 2.631g S 120.3 29.3  20:04.17 java

ps -o nlwp -p

This is the current situation.

After running for a while, the node goes down.

The data size on this node is roughly 2.6 TB.

TrumanDu on 12 Jul 2018

Pinging @elastic/es-core-infra

elasticmachine on 13 Jul 2018

I think OutOfMemoryError is a somewhat misleading choice by the JVM in this case. You are hitting a limit on the number of processes this user is allowed to run (on Linux, every thread counts towards that limit), which is almost certainly unrelated to the amount of free memory in your case. As limits can be configured in a variety of places, it can take a while to find out what exactly is causing it.

A while back we had a similar issue in our CI environment, and our blog post "We are out of memory" provides pointers on what you should check.

That said, this issue is related to the configuration of the environment, so I think we should close it and move further discussion to our Discuss forum.
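To narrow down which limit is being hit, one option is to compare the number of threads owned by the JVM's user with the limits that can cap thread creation: the per-user RLIMIT_NPROC (what ulimit -u shows), the kernel-wide kernel.threads-max and kernel.pid_max, and a cgroup pids.max if one applies. A minimal Python sketch, assuming Linux, that the script runs as the same user as the JVM, and that a container's own cgroup is mounted at /sys/fs/cgroup (typical inside Docker, but not guaranteed); it only sees processes visible in its own PID namespace:

```python
import os
import resource


def read_limit(path):
    """Return a limit file's value as an int, non-numeric text (e.g. "max") as-is,
    or None if the file is absent or unreadable."""
    try:
        with open(path) as f:
            value = f.read().strip()
    except OSError:
        return None
    return int(value) if value.isdigit() else value


def threads_of_uid(uid, proc="/proc"):
    """Sum the thread counts of all visible processes whose real UID is uid."""
    total = 0
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "status")) as f:
                fields = dict(line.split(":", 1) for line in f if ":" in line)
        except OSError:
            continue  # process exited or is unreadable
        # The Uid line lists real, effective, saved and filesystem UIDs.
        if int(fields["Uid"].split()[0]) == uid:
            total += int(fields["Threads"])
    return total


def fmt(value):
    """Render an rlimit value the way ulimit does."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def report():
    uid = os.getuid()
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    print(f"threads owned by uid {uid}: {threads_of_uid(uid)}")
    print(f"RLIMIT_NPROC (ulimit -u): soft={fmt(soft)} hard={fmt(hard)}")
    print("kernel.threads-max:", read_limit("/proc/sys/kernel/threads-max"))
    print("kernel.pid_max:", read_limit("/proc/sys/kernel/pid_max"))
    # cgroup v2 path first, then the cgroup v1 pids controller path.
    for path in ("/sys/fs/cgroup/pids.max", "/sys/fs/cgroup/pids/pids.max"):
        limit = read_limit(path)
        if limit is not None:
            print(f"{path}: {limit}")


if __name__ == "__main__":
    report()
```

Whichever limit sits closest to the thread count at the time of the failure is the first one worth investigating; the script only reports numbers and does not decide which limit triggered the EAGAIN.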