Elasticsearch: new generation too small / excessive garbage collection of young generation

Created on 13 Apr 2018 · 9 comments · Source: elastic/elasticsearch

First reported in https://github.com/elastic/elasticsearch-docker/issues/155 by mistake; adding it here instead.

The sizing of the young generation with the standard jvm.options seems off / not very well chosen. Elastic's docs and guidelines say not to change the GC settings and to only change the heap size. I am using the standard Elastic image: https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html (or rather https://hub.docker.com/r/davidkarlsen/elasticsearchhq/, which adds a plugin, but that's the only change).

fgrep MaxNewSize *
gc.log.0.current:CommandLine flags: -XX:+AlwaysPreTouch -XX:CMSInitiatingOccupancyFraction=75 -XX:GCLogFileSize=67108864 -XX:+HeapDumpOnOutOfMemoryError -XX:InitialHeapSize=19327352832 -XX:MaxHeapSize=19327352832 -XX:MaxNewSize=174485504 -XX:MaxTenuringThreshold=6 -XX:NewSize=174485504 -XX:NumberOfGCLogFiles=32 -XX:OldPLABSize=16 -XX:OldSize=348971008 -XX:-OmitStackTraceInFastThrow -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:ThreadStackSize=1024 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseGCLogFileRotation -XX:+UseParNewGC

gc.log excerpt:

2018-04-13T09:29:39.897+0200: 130451.177: [GC (Allocation Failure) 2018-04-13T09:29:39.897+0200: 130451.178: [ParNew
Desired survivor size 8716288 bytes, new threshold 2 (max 6)
- age   1:    6692208 bytes,    6692208 total
- age   2:    3618576 bytes,   10310784 total
- age   3:    3794016 bytes,   14104800 total
: 145061K->16167K(153344K), 0.0685724 secs] 11959807K->11830914K(18857344K), 0.0686832 secs] [Times: user=0.12 sys=0.00, real=0.07 secs] 
2018-04-13T09:29:39.966+0200: 130451.246: Total time for which application threads were stopped: 0.0691450 seconds, Stopping threads took: 0.0001155 seconds
2018-04-13T09:29:41.634+0200: 130452.914: [GC (Allocation Failure) 2018-04-13T09:29:41.634+0200: 130452.914: [ParNew
Desired survivor size 8716288 bytes, new threshold 2 (max 6)
- age   1:    6921680 bytes,    6921680 total
- age   2:    6343104 bytes,   13264784 total
: 152487K->17024K(153344K), 0.0926382 secs] 11967234K->11839050K(18857344K), 0.0927670 secs] [Times: user=0.15 sys=0.00, real=0.10 secs] 
2018-04-13T09:29:41.726+0200: 130453.006: Total time for which application threads were stopped: 0.0931676 seconds, Stopping threads took: 0.0000598 seconds
2018-04-13T09:29:45.003+0200: 130456.283: [GC (Allocation Failure) 2018-04-13T09:29:45.003+0200: 130456.283: [ParNew
Desired survivor size 8716288 bytes, new threshold 2 (max 6)
- age   1:    7243696 bytes,    7243696 total
- age   2:    5887968 bytes,   13131664 total
: 153344K->17024K(153344K), 0.1166746 secs] 11975370K->11845278K(18857344K), 0.1167974 secs] [Times: user=0.14 sys=0.00, real=0.12 secs] 
2018-04-13T09:29:45.120+0200: 130456.400: Total time for which application threads were stopped: 0.1172260 seconds, Stopping threads took: 0.0000525 seconds
2018-04-13T09:29:45.289+0200: 130456.569: [GC (Allocation Failure) 2018-04-13T09:29:45.289+0200: 130456.569: [ParNew
Desired survivor size 8716288 bytes, new threshold 2 (max 6)
- age   1:    7422480 bytes,    7422480 total
- age   2:    5845600 bytes,   13268080 total
: 152937K->17024K(153344K), 0.1204861 secs] 11981191K->11850982K(18857344K), 0.1206075 secs] [Times: user=0.15 sys=0.00, real=0.12 secs] 
2018-04-13T09:29:45.410+0200: 130456.690: Total time for which application threads were stopped: 0.1210332 seconds, Stopping threads took: 0.0000611 seconds
2018-04-13T09:29:45.504+0200: 130456.784: [GC (Allocation Failure) 2018-04-13T09:29:45.504+0200: 130456.784: [ParNew
Desired survivor size 8716288 bytes, new threshold 2 (max 6)
- age   1:    7577560 bytes,    7577560 total
- age   2:    7010416 bytes,   14587976 total
: 153344K->17024K(153344K), 0.0757820 secs] 11987302K->11859466K(18857344K), 0.0759058 secs] [Times: user=0.14 sys=0.00, real=0.07 secs] 
2018-04-13T09:29:45.580+0200: 130456.860: Total time for which application threads were stopped: 0.0773335 seconds, Stopping threads took: 0.0010613 seconds
2018-04-13T09:29:45.715+0200: 130456.995: [GC (Allocation Failure) 2018-04-13T09:29:45.715+0200: 130456.995: [ParNew
Desired survivor size 8716288 bytes, new threshold 2 (max 6)
- age   1:    4911008 bytes,    4911008 total
- age   2:    6793904 bytes,   11704912 total
: 153344K->13913K(153344K), 0.0846712 secs] 11995786K->11863057K(18857344K), 0.0848011 secs] [Times: user=0.15 sys=0.00, real=0.08 secs] 
2018-04-13T09:29:45.800+0200: 130457.080: Total time for which application threads were stopped: 0.0852069 seconds, Stopping threads took: 0.0000540 seconds
2018-04-13T09:29:45.960+0200: 130457.240: [GC (Allocation Failure) 2018-04-13T09:29:45.960+0200: 130457.240: [ParNew
Desired survivor size 8716288 bytes, new threshold 6 (max 6)
- age   1:    2795432 bytes,    2795432 total
- age   2:    4395024 bytes,    7190456 total
: 150189K->12051K(153344K), 0.0770999 secs] 11999332K->11867784K(18857344K), 0.0772358 secs] [Times: user=0.13 sys=0.00, real=0.08 secs] 

The gc-monitor complains:

sudo tail -f /var/log/messages|grep -i monitor
Apr 13 09:30:23 alp-aot-ccm03 docker/fa5c9da092fe[25991]: [2018-04-13T07:30:23,968][INFO ][o.e.m.j.JvmGcMonitorService] [HUCBRNR] [gc][129959] overhead, spent [268ms] collecting in the last [1s]
Apr 13 09:30:29 alp-aot-ccm03 docker/fa5c9da092fe[25991]: [2018-04-13T07:30:29,971][INFO ][o.e.m.j.JvmGcMonitorService] [HUCBRNR] [gc][129965] overhead, spent [346ms] collecting in the last [1s]
Apr 13 09:30:41 alp-aot-ccm03 docker/fa5c9da092fe[25991]: [2018-04-13T07:30:41,047][INFO ][o.e.m.j.JvmGcMonitorService] [HUCBRNR] [gc][129976] overhead, spent [286ms] collecting in the last [1s]

I am seeing that the old gen is collected roughly every 40 minutes, but the young gen is collecting like crazy (the number is around "30" in the X-Pack monitoring view in the Kibana monitoring app - it is not easy to tell what that figure means, probably how frequently young GCs happen).

From the stats API after running for > 24 hrs:

"pools" : {
            "young" : {
              "used_in_bytes" : 139303520,
              "max_in_bytes" : 139591680, <-- only 139 Mb
              "peak_used_in_bytes" : 139591680,
              "peak_max_in_bytes" : 139591680
            },
            "survivor" : {
              "used_in_bytes" : 17432576,
              "max_in_bytes" : 17432576, <-- only 17 Mb!
              "peak_used_in_bytes" : 17432576,
              "peak_max_in_bytes" : 17432576
            },
            "old" : {
              "used_in_bytes" : 4988199848,
              "max_in_bytes" : 19152896000,
              "peak_used_in_bytes" : 14391151600,
              "peak_max_in_bytes" : 19152896000
            }
          }
        },
        "threads" : {
          "count" : 57,
          "peak_count" : 61
        },
        "gc" : {
          "collectors" : {
            "young" : {
              "collection_count" : 110044,
              "collection_time_in_millis" : 8392102
            },
            "old" : {
              "collection_count" : 41,
              "collection_time_in_millis" : 4808
            }
          }
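
A quick way to pull just these JVM figures (assuming the node listens on localhost:9200) is:

curl -s 'localhost:9200/_nodes/stats/jvm?pretty'

Dividing collection_time_in_millis by collection_count above gives 8392102 / 110044 ≈ 76 ms per young collection, which matches the pause times in the gc.log excerpt.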

Elasticsearch version (bin/elasticsearch --version):
elasticsearch --version
Version: 6.2.3, Build: c59ff00/2018-03-13T10:06:29.741383Z, JVM: 1.8.0_161

Plugins installed: []

JVM version (java -version):
/usr/lib/jvm/jre-1.8.0-openjdk/bin/java -version
openjdk version "1.8.0_161"
OpenJDK Runtime Environment (build 1.8.0_161-b14)
OpenJDK 64-Bit Server VM (build 25.161-b14, mixed mode)

OS version (uname -a if on a Unix-like system):
uname -a
Linux fa5c9da092fe 3.10.0-693.1.1.el7.x86_64 #1 SMP Thu Aug 3 08:15:31 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:
See discussion (and log-excerpts) from
https://discuss.elastic.co/t/excessive-garbage-collection/127527/6

Steps to reproduce:

Please include a minimal but complete recreation of the problem, including
(e.g.) index creation, mappings, settings, query etc. The easier you make for
us to reproduce it, the more likely that somebody will take the time to look at it.

  1. use standard logstash index mapping
  2. add -Xmx18g -Xms18g (see the example below)
  3. push in events at a rate of 300e/s
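
For step 2, a minimal sketch (assuming the official Docker image described above) is to pass the heap via ES_JAVA_OPTS:

docker run -e ES_JAVA_OPTS="-Xms18g -Xmx18g" docker.elastic.co/elasticsearch/elasticsearch:6.2.3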

Provide logs (if relevant):
See the forum for output, or ask for anything and I can add more info.

:Core/Infra/Core


All 9 comments

Pinging @elastic/es-core-infra

You are right: your young generation size is quite small. However, that is an ergonomic choice of the JVM and we do not want to override its default choice. Hence I'll close the ticket, because we will not change anything in Elasticsearch.

However, I want to explain what (I think) is going on.

Reproduction

I tried to reproduce your scenario on a machine with 32GB of RAM and 4 physical cores (8 with hyperthreading).

I ran:

docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" -e ES_JAVA_OPTS="-Xms18g -Xmx18g" docker.elastic.co/elasticsearch/elasticsearch:6.2.3

In the gc.log we see:

CommandLine flags: -XX:+AlwaysPreTouch -XX:CMSInitiatingOccupancyFraction=75 -XX:GCLogFileSize=67108864 -XX:+HeapDumpOnOutOfMemoryError -XX:InitialHeapSize=19327352832 -XX:MaxHeapSize=19327352832 -XX:MaxNewSize=697933824 -XX:MaxTenuringThreshold=6 -XX:NewSize=697933824 -XX:NumberOfGCLogFiles=32 -XX:OldPLABSize=16 -XX:OldSize=1395867648 -XX:-OmitStackTraceInFastThrow -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:ThreadStackSize=1024 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseGCLogFileRotation -XX:+UseParNewGC

The node stats show:

{
        "mem" : {
          "heap_used_in_bytes" : 494350496,
          "heap_used_percent" : 2,
          "heap_committed_in_bytes" : 19257622528,
          "heap_max_in_bytes" : 19257622528,
          "non_heap_used_in_bytes" : 99471720,
          "non_heap_committed_in_bytes" : 107511808,
          "pools" : {
            "young" : {
              "used_in_bytes" : 413565824,
              "max_in_bytes" : 558432256,
              "peak_used_in_bytes" : 558432256,
              "peak_max_in_bytes" : 558432256
            },
            "survivor" : {
              "used_in_bytes" : 36792080,
              "max_in_bytes" : 69730304,
              "peak_used_in_bytes" : 60211504,
              "peak_max_in_bytes" : 69730304
            },
            "old" : {
              "used_in_bytes" : 43992592,
              "max_in_bytes" : 18629459968,
              "peak_used_in_bytes" : 43992592,
              "peak_max_in_bytes" : 18629459968
            }
          }
        }
}

Adding young (more precisely: Eden) + 2 * survivor (there are two survivor spaces) gives 558432256 + 2*69730304 = 697892864 bytes ≈ 665 MB.
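
A trivial check of that arithmetic:

echo $((558432256 + 2 * 69730304))    # 697892864 bytes ≈ 665 MB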

If I try the following on the command line:

java -Xms18G -Xmx18G -XX:+UseConcMarkSweepGC -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal -version | egrep -i "(NewSize|OldSize)"
    uintx MaxNewSize                               := 697892864                           {product}
    uintx NewSize                                  := 697892864                           {product}
    uintx NewSizeThreadIncrease                     = 5320                                {pd product}
    uintx OldSize                                  := 18629459968                         {product}

I get the same value for NewSize; this matches the calculation and the output in the gc.log file.

The question is now: Why do you get a (vastly) different value? Let's turn to the JVM source code. The relevant piece of code is in /src/share/vm/runtime/arguments.cpp#l1265:

const size_t preferred_max_new_size_unaligned =
    MIN2(max_heap/(NewRatio+1), ScaleForWordSize(young_gen_per_worker * parallel_gc_threads));

With such a large heap, only the term ScaleForWordSize(young_gen_per_worker * parallel_gc_threads) matters, as the JVM uses the minimum of the two. The parameters for that term are as follows (worked numbers below):

  • young_gen_per_worker is 64MB for x86_64.
  • parallel_gc_threads is (again) chosen ergonomically by the JVM.
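
Plugging in concrete numbers (a back-of-the-envelope sketch; my reading of the HotSpot macro is that ScaleForWordSize scales by roughly 13/10 on 64-bit, and the exact flag values differ slightly because of alignment):

echo $((64 * 1024 * 1024 * 2 * 13 / 10))    # 174483046, close to the 174485504 bytes from the original report
echo $((64 * 1024 * 1024 * 8 * 13 / 10))    # 697932185, close to the 697933824 bytes seen above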

You can check the latter for your environment with:

java -Xms18G -Xmx18G -XX:+UseConcMarkSweepGC -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal -version | grep -i "ParallelGCThreads"

On my machine it says:

    uintx ParallelGCThreads                         = 8                                   {product}

If I explicitly set ParallelGCThreads to 2, I get a size that is very similar to yours (yours was 174485504 bytes, mine is 174456832 bytes):

java -Xms18G -Xmx18G -XX:ParallelGCThreads=2 -XX:+UseConcMarkSweepGC -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal -version | egrep -i "(NewSize|OldSize)"
    uintx MaxNewSize                               := 174456832                           {product}
    uintx NewSize                                  := 174456832                           {product}
    uintx NewSizeThreadIncrease                     = 5320                                {pd product}
    uintx OldSize                                  := 19152896000                         {product}

Next steps for you

You should check whether ParallelGCThreads is indeed as low as I think it is:

java -Xms18G -Xmx18G -XX:+UseConcMarkSweepGC -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal -version | grep -i "ParallelGCThreads"

This is probably due to the number of CPUs on the system.

You can now either use a different system with more CPUs or manually increase the size of the young generation via an explicit JVM flag. However, you should keep in mind that due to the low number of GC threads you might get into trouble later (I'd not be surprised about long GC pauses because of that).
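
For example (just a sketch, not an official recommendation; the value is illustrative only), the young generation could be pinned explicitly in jvm.options next to the heap settings:

-Xms18g
-Xmx18g
# explicit young generation size (illustrative value)
-Xmn1g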

Btw, I saw your WTF comment in https://hub.docker.com/r/davidkarlsen/elasticsearchhq/~/dockerfile/. According to the docs, the setting you're referring to (xpack.reporting.enabled) is intended for kibana.yml and is unknown to Elasticsearch; that's why Elasticsearch complains on startup.

Thank you very much for the thorough and knowledgeable response.

The number of GC threads was indeed 2 on my 4-core (non-hyperthreading) box. The reason is that CPUs had been hot-plugged and the Docker daemon was not restarted afterwards, so I went to /proc...cpuset and enabled all CPUs by echoing 0-3 into the cpuset, then recreated the container to get a fresh Java process with an updated view of the world. I now have 4 GC threads (along with larger regions). The effect is very apparent: young-gen GC immediately halved. Additionally I have requested flipping the CPUs over to hyperthreading (as this is a virtual environment), so I expect even better figures after that.

The GC monitor can still kick in when I do a search over 24 hrs worth of data, but the searches complete and it stabilizes (this is a mostly-write environment).

Again - thanks a lot for the explanation!

You're welcome, glad that it helped you.

So on a 4-core machine it is not recommended to run Elasticsearch with a heap size of 30 GB?

So on a 4-core machine it is not recommended to run Elasticsearch with a heap size of 30 GB?

I think that conclusion is oversimplifying my analysis. The JVM chooses the number of GC threads ergonomically and also bases the young generation size on this decision. With a small number of cores you end up with fewer GC threads and thus a smaller young generation. Consequently, it will promote objects earlier from the young generation to the old generation (because there is less space available for the objects that are still alive in the young generation). You need to measure the performance impact in your scenario, decide whether you are satisfied, and if not, potentially tune the GC / heap settings.

Yes, but what we see is that the JVM spends 65% of its time doing GC on the young generation, as our data nodes are under heavy write load most of the time, which makes them very slow.

I'm observing the opposite behavior while trying to solve high CPU usage. The server is huge: 48 CPUs, 256 GB RAM (https://www.hetzner.com/dedicated-rootserver/ax160/configurator) with fast NVMe SSDs.

Here are the stats of the 5 nodes (they all live on that single server for now, with 20 GB RAM each). They will eventually migrate to separate smaller nodes; I'm trying to solve the problem asap.

$ curl -X GET "localhost:9200/_cat/nodes?v"
ip        heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
127.0.0.1           23          87  93   69.45   64.85    56.70 mdi       *      node1
127.0.0.1           12          87  85   69.45   64.85    56.70 mdi       -      node4
127.0.0.1           17          87  85   69.45   64.85    56.70 mdi       -      node5
127.0.0.1           14          87  85   69.45   64.85    56.70 mdi       -      node3
127.0.0.1           11          87  85   69.45   64.85    56.70 mdi       -      node2


$ curl -X GET "localhost:9200/_cat/allocation?v"
shards disk.indices disk.used disk.avail disk.total disk.percent host      ip        node
     9       10.7gb   314.6gb    560.9gb    875.5gb           35 127.0.0.1 127.0.0.1 node5
     9       10.4gb   314.6gb    560.9gb    875.5gb           35 127.0.0.1 127.0.0.1 node3
     9       10.3gb   314.6gb    560.9gb    875.5gb           35 127.0.0.1 127.0.0.1 node2
     9       10.2gb   314.6gb    560.9gb    875.5gb           35 127.0.0.1 127.0.0.1 node4
     9       10.4gb   314.6gb    560.8gb    875.5gb           35 127.0.0.1 127.0.0.1 node1

$ curl -X GET "localhost:9200/_cat/health?v"
epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1532135075 01:04:35  elasticsearch green           5         5     45  45    0    0        0             0                  -                100.0%

# root @ cmc-web-1 in /var/log/elasticsearch [0:55:28]
$ java -Xms18G -Xmx18G -XX:+UseConcMarkSweepGC -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal -version | grep -i "ParallelGCThreads"
    uintx ParallelGCThreads                         = 33                                  {product}
openjdk version "1.8.0_171"
OpenJDK Runtime Environment (build 1.8.0_171-8u171-b11-0ubuntu0.16.04.1-b11)
OpenJDK 64-Bit Server VM (build 25.171-b11, mixed mode)

# root @ cmc-web-1 in /var/log/elasticsearch [0:56:15]
$ java -Xms18G -Xmx18G -XX:ParallelGCThreads=2 -XX:+UseConcMarkSweepGC -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal -version | egrep -i "(NewSize|OldSize)"
    uintx MaxNewSize                               := 174456832                           {product}
    uintx NewSize                                  := 174456832                           {product}
    uintx NewSizeThreadIncrease                     = 5320                                {pd product}
    uintx OldSize                                  := 19152896000                         {product}
openjdk version "1.8.0_171"
OpenJDK Runtime Environment (build 1.8.0_171-8u171-b11-0ubuntu0.16.04.1-b11)
OpenJDK 64-Bit Server VM (build 25.171-b11, mixed mode)

# root @ cmc-web-1 in /var/log/elasticsearch [1:04:35]
$ curl -X GET "localhost:9200/"
{
  "name" : "node1",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "gva6U2gLR6mozUzJHkUR5g",
  "version" : {
    "number" : "6.1.4",
    "build_hash" : "d838f2d",
    "build_date" : "2018-03-14T08:28:22.470Z",
    "build_snapshot" : false,
    "lucene_version" : "7.1.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

htop constantly shows each node eating 300+% CPU. The logs of each node show that GC runs 3-4 times a minute (not always; it may occupy one node and leave the others in peace for a while). As you can see, I've allocated more than twice as much RAM as the size of the data.

It's not a high-traffic website yet, 6-10K pageviews per day; each pageview triggers a search across an e-commerce products database with lots of aggregations. A portion of the queries uses random_score sorting.

I would highly appreciate any directions - what logs should I investigate to find out what causes this problem? Thank you!
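
One thing worth checking first (a sketch; jcmd ships with the JDK, and <es-pid> is a placeholder for the PID of one of the node processes) is what young-generation size and GC thread count the running JVMs actually ended up with, given the analysis earlier in this thread:

jcmd <es-pid> VM.flags | tr ' ' '\n' | egrep 'NewSize|ParallelGCThreads'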

Typical Query:

{
    "from": 0,
    "size": 248,
    "sort": [
        {
            "meta._product_popularity.long": {
                "order": "desc"
            }
        }
    ],
    "query": {
        "bool": {
            "must": [
                {
                    "bool": {
                        "must": [
                            {
                                "terms": {
                                    "terms.product_cat.slug": [
                                        "majtki-damskie-koronkowe",
                                        "majtki-damskie-szorty",
                                        "panty-damskie",
                                        "majtki-damskie",
                                        "bokserki-damskie",
                                        "figi",
                                        "stringi"
                                    ]
                                }
                            }
                        ],
                        "must_not": [
                            {
                                "terms": {
                                    "terms.product_visibility.term_taxonomy_id": [
                                        3293
                                    ]
                                }
                            }
                        ]
                    }
                },
                {
                    "term": {
                        "post_type.raw": "product"
                    }
                },
                {
                    "terms": {
                        "post_status": [
                            "publish",
                            "acf-disabled"
                        ]
                    }
                }
            ]
        }
    },
    "post_filter": {
        "bool": {
            "must": [
                {
                    "bool": {
                        "must": [
                            {
                                "terms": {
                                    "terms.product_cat.slug": [
                                        "majtki-damskie-koronkowe",
                                        "majtki-damskie-szorty",
                                        "panty-damskie",
                                        "majtki-damskie",
                                        "bokserki-damskie",
                                        "figi",
                                        "stringi"
                                    ]
                                }
                            }
                        ],
                        "must_not": [
                            {
                                "terms": {
                                    "terms.product_visibility.term_taxonomy_id": [
                                        3293
                                    ]
                                }
                            }
                        ]
                    }
                },
                {
                    "term": {
                        "post_type.raw": "product"
                    }
                },
                {
                    "terms": {
                        "post_status": [
                            "publish",
                            "acf-disabled"
                        ]
                    }
                }
            ]
        }
    },
    "aggs": {
        "pa_brand": {
            "filter": {
                "bool": {
                    "must": [
                        {
                            "terms": {
                                "terms.product_cat.slug": [
                                    "majtki-damskie-koronkowe",
                                    "majtki-damskie-szorty",
                                    "panty-damskie",
                                    "majtki-damskie",
                                    "bokserki-damskie",
                                    "figi",
                                    "stringi"
                                ]
                            }
                        }
                    ]
                }
            },
            "aggs": {
                "pa_brand": {
                    "terms": {
                        "size": "6952",
                        "field": "terms.pa_brand.slug",
                        "order": {
                            "_term": "asc"
                        }
                    },
                    "aggs": {
                        "pa_brand": {
                            "top_hits": {
                                "size": 1,
                                "_source": [
                                    "terms.pa_brand"
                                ]
                            }
                        }
                    }
                }
            }
        },
        "pa_color": {
            "filter": {
                "bool": {
                    "must": [
                        {
                            "terms": {
                                "terms.product_cat.slug": [
                                    "majtki-damskie-koronkowe",
                                    "majtki-damskie-szorty",
                                    "panty-damskie",
                                    "majtki-damskie",
                                    "bokserki-damskie",
                                    "figi",
                                    "stringi"
                                ]
                            }
                        }
                    ]
                }
            },
            "aggs": {
                "pa_color": {
                    "terms": {
                        "size": "11",
                        "field": "terms.pa_color.slug",
                        "order": {
                            "_term": "asc"
                        }
                    },
                    "aggs": {
                        "pa_color": {
                            "top_hits": {
                                "size": 1,
                                "_source": [
                                    "terms.pa_color"
                                ]
                            }
                        }
                    }
                }
            }
        },
        "pa_merchant": {
            "filter": {
                "bool": {
                    "must": [
                        {
                            "terms": {
                                "terms.product_cat.slug": [
                                    "majtki-damskie-koronkowe",
                                    "majtki-damskie-szorty",
                                    "panty-damskie",
                                    "majtki-damskie",
                                    "bokserki-damskie",
                                    "figi",
                                    "stringi"
                                ]
                            }
                        }
                    ]
                }
            },
            "aggs": {
                "pa_merchant": {
                    "terms": {
                        "size": "116",
                        "field": "terms.pa_merchant.slug",
                        "order": {
                            "_term": "asc"
                        }
                    },
                    "aggs": {
                        "pa_merchant": {
                            "top_hits": {
                                "size": 1,
                                "_source": [
                                    "terms.pa_merchant"
                                ]
                            }
                        }
                    }
                }
            }
        },
        "pa_wzor": {
            "filter": {
                "bool": {
                    "must": [
                        {
                            "terms": {
                                "terms.product_cat.slug": [
                                    "majtki-damskie-koronkowe",
                                    "majtki-damskie-szorty",
                                    "panty-damskie",
                                    "majtki-damskie",
                                    "bokserki-damskie",
                                    "figi",
                                    "stringi"
                                ]
                            }
                        }
                    ]
                }
            },
            "aggs": {
                "pa_wzor": {
                    "terms": {
                        "size": "24",
                        "field": "terms.pa_wzor.slug",
                        "order": {
                            "_term": "asc"
                        }
                    },
                    "aggs": {
                        "pa_wzor": {
                            "top_hits": {
                                "size": 1,
                                "_source": [
                                    "terms.pa_wzor"
                                ]
                            }
                        }
                    }
                }
            }
        },
        "pa_material": {
            "filter": {
                "bool": {
                    "must": [
                        {
                            "terms": {
                                "terms.product_cat.slug": [
                                    "majtki-damskie-koronkowe",
                                    "majtki-damskie-szorty",
                                    "panty-damskie",
                                    "majtki-damskie",
                                    "bokserki-damskie",
                                    "figi",
                                    "stringi"
                                ]
                            }
                        }
                    ]
                }
            },
            "aggs": {
                "pa_material": {
                    "terms": {
                        "size": "54",
                        "field": "terms.pa_material.slug",
                        "order": {
                            "_term": "asc"
                        }
                    },
                    "aggs": {
                        "pa_material": {
                            "top_hits": {
                                "size": 1,
                                "_source": [
                                    "terms.pa_material"
                                ]
                            }
                        }
                    }
                }
            }
        },
        "pa_rozmiar": {
            "filter": {
                "bool": {
                    "must": [
                        {
                            "terms": {
                                "terms.product_cat.slug": [
                                    "majtki-damskie-koronkowe",
                                    "majtki-damskie-szorty",
                                    "panty-damskie",
                                    "majtki-damskie",
                                    "bokserki-damskie",
                                    "figi",
                                    "stringi"
                                ]
                            }
                        }
                    ]
                }
            },
            "aggs": {
                "pa_rozmiar": {
                    "terms": {
                        "size": "9",
                        "field": "terms.pa_rozmiar.slug",
                        "order": {
                            "_term": "asc"
                        }
                    },
                    "aggs": {
                        "pa_rozmiar": {
                            "top_hits": {
                                "size": 1,
                                "_source": [
                                    "terms.pa_rozmiar"
                                ]
                            }
                        }
                    }
                }
            }
        },
        "product_cat": {
            "filter": {
                "match_all": {
                    "boost": 1
                }
            },
            "aggs": {
                "product_cat": {
                    "terms": {
                        "size": "750",
                        "field": "terms.product_cat.slug",
                        "order": {
                            "_term": "asc"
                        }
                    },
                    "aggs": {
                        "product_cat": {
                            "top_hits": {
                                "size": 1,
                                "_source": [
                                    "terms.product_cat"
                                ]
                            }
                        }
                    }
                }
            }
        },
        "price": {
            "aggs": {
                "percentiles": {
                    "percentiles": {
                        "field": "meta._price.long"
                    }
                },
                "min": {
                    "min": {
                        "field": "meta._price.long"
                    }
                },
                "max": {
                    "max": {
                        "field": "meta._price.long"
                    }
                }
            },
            "filter": {
                "bool": {
                    "must": [
                        {
                            "terms": {
                                "terms.product_cat.slug": [
                                    "majtki-damskie-koronkowe",
                                    "majtki-damskie-szorty",
                                    "panty-damskie",
                                    "majtki-damskie",
                                    "bokserki-damskie",
                                    "figi",
                                    "stringi"
                                ]
                            }
                        }
                    ]
                }
            }
        },
        "discount": {
            "filter": {
                "bool": {
                    "must": [
                        {
                            "range": {
                                "meta._sale_price.long": {
                                    "gt": 0
                                }
                            }
                        },
                        {
                            "terms": {
                                "terms.product_cat.slug": [
                                    "majtki-damskie-koronkowe",
                                    "majtki-damskie-szorty",
                                    "panty-damskie",
                                    "majtki-damskie",
                                    "bokserki-damskie",
                                    "figi",
                                    "stringi"
                                ]
                            }
                        }
                    ]
                }
            }
        }
    }
}

Leaving this here for anyone seeing excessive GC after upgrading to ES6 on JDK 8. In my searching I came across this GitHub issue but needed more to find our solution. We have a 5-node cluster, each node with 32 GB RAM and a 16 GB heap, and the young gen was only 600 MB. I'm not sure what our young-gen setting was before the upgrade. ES6, for us anyway, uses a ton of young gen but not so much old gen, so setting NewRatio=1 along with the other changes mentioned in the article below gave us our solution.

The first step would be to upload your current GC log to gceasy.io and see what's going on. We were getting about 4-5 full GCs per day, and the graph showed that we were maxing out the young gen immediately while the old gen was fairly full. Our issue in a nutshell:

"When the young generation is undersized, objects will be prematurely promoted to the old generation. Collecting garbage from the old generation takes more time than collecting it from the young generation. Thus, increasing the young generation size has the potential to reduce long GC pauses."

Changes to make per node (you should also uncomment the GC logging entries in jvm.options):

  • jvm.options

GC configuration

-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=50 # <= changed from 75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:NewRatio=1 # <= set to 1
-XX:ParallelGCThreads=8 # <= set to match the number of CPUs

(With NewRatio set to 1, this gave our young and old gen each 50% of the heap; young gen went from 600 MB to 8 GB for us.)

  • elasticsearch.yml: add this
    indices.memory.index_buffer_size: 20%

And using a curl command, change this setting for the cluster:
index.refresh_interval to 30s

curl -XPUT -u es_admin:your_passwd -H "Content-Type: application/json" 'https://127.0.0.1:9200/_all/_settings?preserve_existing=true' -d '{"index.refresh_interval" : "30s"}' -k

It would be good to mv or rm your old GC log files so you're sure to have a fresh one to upload for analysis after the node restart.

After restarting your cluster, let it run for at least an hour with traffic, then upload your gc.log to gceasy.io and see how it is doing.

Article that saved us: https://e-mc2.net/elasticsearch-garbage-collection-hell
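
As a sanity check on the NewRatio=1 change above, the PrintFlagsFinal trick from earlier in this thread can be reused to see what young-generation size it yields before touching a production node:

java -Xms16G -Xmx16G -XX:+UseConcMarkSweepGC -XX:NewRatio=1 -XX:+PrintFlagsFinal -version | egrep -i "(NewSize|NewRatio)"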
