Elasticsearch version:
2.4.1
Plugins installed:
[head]
JVM version:
java version "1.8.0_65"
Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)
OS version:
Amazon Linux 2015.09
Linux ip-10-0-3-95 3.14.48-33.39.amzn1.x86_64 #1 SMP Tue Jul 14 23:43:07 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
Description of the problem including expected versus actual behavior:
Our real-time incremental indexing depends on Groovy script updates. I know the Groovy script memory leak was fixed in ES 2.4, so last week our company upgraded Elasticsearch from 2.3.5 to 2.4.1, but the problem is still there. We run ES on EC2. ES_HEAP_SIZE is set to 12g (30g total on the machine), and we use the default JAVA_OPTS: -Xms12g -Xmx12g -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:+DisableExplicitGC -Dfile.encoding=UTF-8 -Djna.nosys=true. Here is the jmap -histo:live result:
num #instances #bytes class name
----------------------------------------------
1: 1859295 3119887784 [C
2: 535574 1873825064 [B
3: 31419182 1759474192 org.codehaus.groovy.runtime.metaclass.MetaMethodIndex$Entry
4: 24807473 708769280 [Ljava.lang.Object;
5: 22751538 546036912 org.codehaus.groovy.util.FastArray
6: 12738630 407636160 java.util.HashMap$Node
7: 2527999 364032752 [Lorg.codehaus.groovy.util.ComplexKeyHashMap$Entry;
8: 4634544 222458112 java.lang.invoke.MethodHandleImpl$CountingWrapper
9: 5958956 190686592 org.codehaus.groovy.util.SingleKeyHashMap$Entry
10: 180570 187793824 [Lorg.codehaus.groovy.runtime.metaclass.MetaMethodIndex$Entry;
11: 1279161 174849808 [Ljava.util.HashMap$Node;
12: 3865263 154610520 java.lang.invoke.BoundMethodHandle$Species_LL
13: 3095493 148583664 java.util.HashMap
14: 2317277 111229296 java.lang.invoke.BoundMethodHandle$Species_L4
15: 1088514 95789232 java.lang.reflect.Method
16: 2708544 86673408 groovy.lang.MetaBeanProperty
17: 906208 72496640 java.lang.reflect.Constructor
18: 1407429 67556592 java.lang.invoke.BoundMethodHandle$Species_L5
19: 1264365 60689520 org.codehaus.groovy.util.ReferenceType$SoftRef
20: 909843 58229952 java.lang.invoke.BoundMethodHandle$Species_L4IL4
21: 1815072 58082304 java.lang.ref.WeakReference
22: 2325497 56798168 [I
23: 722541 52022952 java.lang.reflect.Field
24: 2166858 52004592 org.codehaus.groovy.util.SingleKeyHashMap
25: 1672177 40132248 java.lang.String
This caused an old-gen GC roughly every minute, with each collection taking nearly 30 seconds. While a node is collecting it cannot serve requests, yet it is not removed from the cluster, so requests routed to the GC-ing node time out. Since we shard requests by user id across the 5 nodes in the cluster, nearly 20% of users cannot be served.
Steps to reproduce:
We update the index like this (Scala), using the Java native transport client:
// NOTE: $ts is interpolated directly into the script source, so every
// distinct timestamp produces a distinct script string.
val script = new Script(s"""ctx._source.available_dates = ctx._source.available_dates.minus("$ts")""")
val resp = client
  .prepareUpdate(idx, datatype, teacherId)
  .setScript(script)
  .get()
available_dates is an array of strings; we use a Groovy script to add or remove a specific element.
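The "add" case is built the same way with Groovy's symmetric call (illustrative only; the report shows only the remove):

// Hypothetical "add" counterpart, using Groovy's plus the same way as minus:
val addScript = new Script(s"""ctx._source.available_dates = ctx._source.available_dates.plus("$ts")""")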
So, my question is: do we use the Groovy script correctly?
Provide logs (if relevant):
GC logs:
[2016-11-29 07:29:24,714][WARN ][monitor.jvm ] [vipkid_node_1] [gc][old][283912][4360] duration [47.6s], collections [2]/[48.4s], total [47.6s]/[36m], memory [11.8gb]->[11gb]/[11.8gb], all_pools {[young] [865.3mb]->[105.3mb]/[865.3mb]}{[survivor] [24.4mb]->[0b]/[108.1mb]}{[old] [10.9gb]->[10.9gb]/[10.9gb]}
[2016-11-29 07:30:12,784][WARN ][monitor.jvm ] [vipkid_node_1] [gc][old][283935][4362] duration [25.8s], collections [1]/[26s], total [25.8s]/[36.5m], memory [11.8gb]->[10.9gb]/[11.8gb], all_pools {[young] [865.3mb]->[4.4mb]/[865.3mb]}{[survivor] [92.6mb]->[0b]/[108.1mb]}{[old] [10.9gb]->[10.9gb]/[10.9gb]}
[2016-11-29 07:30:41,706][WARN ][monitor.jvm ] [vipkid_node_1] [gc][old][283938][4363] duration [26s], collections [1]/[26.7s], total [26s]/[36.9m], memory [11.5gb]->[10.9gb]/[11.8gb], all_pools {[young] [703.8mb]->[31.5mb]/[865.3mb]}{[survivor] [0b]->[0b]/[108.1mb]}{[old] [10.9gb]->[10.9gb]/[10.9gb]}
[2016-11-29 07:31:20,277][WARN ][monitor.jvm ] [vipkid_node_1] [gc][old][283948][4364] duration [29.1s], collections [1]/[29.3s], total [29.1s]/[37.4m], memory [11.8gb]->[10.9gb]/[11.8gb], all_pools {[young] [865.3mb]->[2.5mb]/[865.3mb]}{[survivor] [55.9mb]->[0b]/[108.1mb]}{[old] [10.9gb]->[10.9gb]/[10.9gb]}
[2016-11-29 07:31:49,872][WARN ][monitor.jvm ] [vipkid_node_1] [gc][old][283951][4365] duration [26.7s], collections [1]/[27.4s], total [26.7s]/[37.8m], memory [11.7gb]->[10.9gb]/[11.8gb], all_pools {[young] [784mb]->[32mb]/[865.3mb]}{[survivor] [0b]->[0b]/[108.1mb]}{[old] [10.9gb]->[10.9gb]/[10.9gb]}
[2016-11-29 07:32:21,285][WARN ][monitor.jvm ] [vipkid_node_1] [gc][old][283954][4366] duration [28.5s], collections [1]/[29.2s], total [28.5s]/[38.3m], memory [11.5gb]->[11gb]/[11.8gb], all_pools {[young] [671.3mb]->[87.2mb]/[865.3mb]}{[survivor] [0b]->[0b]/[108.1mb]}{[old] [10.9gb]->[10.9gb]/[10.9gb]}
[2016-11-29 07:32:50,411][WARN ][monitor.jvm ] [vipkid_node_1] [gc][old][283958][4367] duration [25.6s], collections [1]/[25.9s], total [25.6s]/[38.7m], memory [11.8gb]->[11gb]/[11.8gb], all_pools {[young] [865.3mb]->[80.6mb]/[865.3mb]}{[survivor] [30.9mb]->[0b]/[108.1mb]}{[old] [10.9gb]->[10.9gb]/[10.9gb]}
[2016-11-29 07:33:19,505][WARN ][monitor.jvm ] [vipkid_node_1] [gc][old][283961][4368] duration [26.8s], collections [1]/[26.9s], total [26.8s]/[39.2m], memory [11.8gb]->[11gb]/[11.8gb], all_pools {[young] [865.3mb]->[102mb]/[865.3mb]}{[survivor] [101.7mb]->[0b]/[108.1mb]}{[old] [10.9gb]->[10.9gb]/[10.9gb]}
[2016-11-29 07:33:48,545][WARN ][monitor.jvm ] [vipkid_node_1] [gc][old][283964][4369] duration [26s], collections [1]/[26.8s], total [26s]/[39.6m], memory [11.6gb]->[11gb]/[11.8gb], all_pools {[young] [750.8mb]->[101.6mb]/[865.3mb]}{[survivor] [0b]->[0b]/[108.1mb]}{[old] [10.9gb]->[10.9gb]/[10.9gb]}
[2016-11-29 07:34:52,952][WARN ][monitor.jvm ] [vipkid_node_1] [gc][old][283973][4371] duration [56s], collections [2]/[56.2s], total [56s]/[40.6m], memory [11.8gb]->[11gb]/[11.8gb], all_pools {[young] [865.3mb]->[124.9mb]/[865.3mb]}{[survivor] [71.7mb]->[0b]/[108.1mb]}{[old] [10.9gb]->[10.9gb]/[10.9gb]}
Java transport client log (10.0.3.96 is the machine stuck in GC):
org.elasticsearch.transport.ReceiveTimeoutTransportException: [][10.0.3.96:9300][cluster:monitor/nodes/liveness] request_id [978567] timed out after [5001ms]
2016-11-29 09:46:12.043 [elasticsearch[Armadillo][generic][T#60]] INFO org.elasticsearch.client.transport - [Armadillo] failed to get node info for {#transport#-2}{10.0.3.96}{10.0.3.96:9300}, disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException: [][10.0.3.96:9300][cluster:monitor/nodes/liveness] request_id [978651] timed out after [5000ms]
2016-11-29 09:46:22.049 [elasticsearch[Armadillo][generic][T#60]] INFO org.elasticsearch.client.transport - [Armadillo] failed to get node info for {#transport#-2}{10.0.3.96}{10.0.3.96:9300}, disconnecting...
Do we use the Groovy script correctly?
No, you're hard-coding a changing parameter into the script, which causes the script to be recompiled every time you use it (each unique script source is compiled into its own class, which is what fills the heap with the Groovy metaclass entries in your histogram). Instead, pass in a named parameter, e.g.:
{
  "script": {
    "lang": "groovy",
    "inline": "ctx._source.available_dates = ctx._source.available_dates.minus(ts)",
    "params": {
      "ts": 12345
    }
  }
}
This way the script is only compiled once.
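In your Scala code that would look something like the following rough, untested sketch against the 2.x transport client (it assumes the 2.x Script(source, type, lang, params) constructor and ScriptService.ScriptType; passing null for the lang falls back to the default, which is groovy on 2.x):

import scala.collection.JavaConverters._
import org.elasticsearch.script.{Script, ScriptService}

// The script source is now a constant string with no interpolation, so it is
// compiled once and served from the script cache afterwards; only the "ts"
// parameter varies per request.
val source = "ctx._source.available_dates = ctx._source.available_dates.minus(ts)"
val params = Map[String, AnyRef]("ts" -> ts).asJava
val script = new Script(source, ScriptService.ScriptType.INLINE, null, params)

val resp = client
  .prepareUpdate(idx, datatype, teacherId)
  .setScript(script)
  .get()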
I've run into the same problem; I'll try params and see if it helps.
@hailin0 I ran into this too. How did you solve the problem?
@jarvisxiong By replacing the variables in the script with parameter placeholders, e.g.:
$ curl -XPOST 'localhost:9200/xiaorui.cc/blog/1/_update' -d '{
  "script" : "ctx._source.counter += count",
  "params" : {
    "count" : 4
  }
}'