Elasticsearch version:
2.4.1
Plugins installed:
[head]
JVM version:
java version "1.8.0_65"
Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)
OS version:
Amazon Linux 2015.09
Linux ip-10-0-3-95 3.14.48-33.39.amzn1.x86_64 #1 SMP Tue Jul 14 23:43:07 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
Description of the problem including expected versus actual behavior:
Our real-time incremental indexing depends on Groovy script updates. I know the Groovy script memory leak was fixed in ES 2.4, so last week our company upgraded Elasticsearch from 2.3.5 to 2.4.1, but the problem is still there. We run ES on EC2. ES_HEAP_SIZE is set to 12g (30g total on the machine), and we use the default JAVA_OPTS: -Xms12g -Xmx12g -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:+DisableExplicitGC -Dfile.encoding=UTF-8 -Djna.nosys=true. Here is the jmap -histo:live result:
num #instances #bytes class name
----------------------------------------------
1: 1859295 3119887784 [C
2: 535574 1873825064 [B
3: 31419182 1759474192 org.codehaus.groovy.runtime.metaclass.MetaMethodIndex$Entry
4: 24807473 708769280 [Ljava.lang.Object;
5: 22751538 546036912 org.codehaus.groovy.util.FastArray
6: 12738630 407636160 java.util.HashMap$Node
7: 2527999 364032752 [Lorg.codehaus.groovy.util.ComplexKeyHashMap$Entry;
8: 4634544 222458112 java.lang.invoke.MethodHandleImpl$CountingWrapper
9: 5958956 190686592 org.codehaus.groovy.util.SingleKeyHashMap$Entry
10: 180570 187793824 [Lorg.codehaus.groovy.runtime.metaclass.MetaMethodIndex$Entry;
11: 1279161 174849808 [Ljava.util.HashMap$Node;
12: 3865263 154610520 java.lang.invoke.BoundMethodHandle$Species_LL
13: 3095493 148583664 java.util.HashMap
14: 2317277 111229296 java.lang.invoke.BoundMethodHandle$Species_L4
15: 1088514 95789232 java.lang.reflect.Method
16: 2708544 86673408 groovy.lang.MetaBeanProperty
17: 906208 72496640 java.lang.reflect.Constructor
18: 1407429 67556592 java.lang.invoke.BoundMethodHandle$Species_L5
19: 1264365 60689520 org.codehaus.groovy.util.ReferenceType$SoftRef
20: 909843 58229952 java.lang.invoke.BoundMethodHandle$Species_L4IL4
21: 1815072 58082304 java.lang.ref.WeakReference
22: 2325497 56798168 [I
23: 722541 52022952 java.lang.reflect.Field
24: 2166858 52004592 org.codehaus.groovy.util.SingleKeyHashMap
25: 1672177 40132248 java.lang.String
This caused an old-gen GC roughly every minute, with each collection taking nearly 30 seconds. While a node is collecting it cannot serve requests, yet it is not removed from the cluster, so requests routed to the GC-ing node time out. Since we shard requests by user id across the 5 nodes in the cluster, nearly 20% of users cannot be served.
Steps to reproduce:
We update the index like this (Scala), using the Java native transport client:
// NOTE: $ts is interpolated directly into the script source, so every
// distinct timestamp produces a distinct script string.
val script = new Script(s"""ctx._source.available_dates = ctx._source.available_dates.minus("$ts")""")
val resp = client
  .prepareUpdate(idx, datatype, teacherId)
  .setScript(script)
  .get()
available_dates is an array of strings; we use a Groovy script to add or remove a specific element.
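The "add" case is built the same way with Groovy's symmetric call (illustrative only; the report shows only the remove):

// Hypothetical "add" counterpart, using Groovy's plus the same way as minus:
val addScript = new Script(s"""ctx._source.available_dates = ctx._source.available_dates.plus("$ts")""")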
So, my question is: do we use the Groovy script correctly?
Provide logs (if relevant):
GC logs:
[2016-11-29 07:29:24,714][WARN ][monitor.jvm ] [vipkid_node_1] [gc][old][283912][4360] duration [47.6s], collections [2]/[48.4s], total [47.6s]/[36m], memory [11.8gb]->[11gb]/[11.8gb], all_pools {[young] [865.3mb]->[105.3mb]/[865.3mb]}{[survivor] [24.4mb]->[0b]/[108.1mb]}{[old] [10.9gb]->[10.9gb]/[10.9gb]}
[2016-11-29 07:30:12,784][WARN ][monitor.jvm ] [vipkid_node_1] [gc][old][283935][4362] duration [25.8s], collections [1]/[26s], total [25.8s]/[36.5m], memory [11.8gb]->[10.9gb]/[11.8gb], all_pools {[young] [865.3mb]->[4.4mb]/[865.3mb]}{[survivor] [92.6mb]->[0b]/[108.1mb]}{[old] [10.9gb]->[10.9gb]/[10.9gb]}
[2016-11-29 07:30:41,706][WARN ][monitor.jvm ] [vipkid_node_1] [gc][old][283938][4363] duration [26s], collections [1]/[26.7s], total [26s]/[36.9m], memory [11.5gb]->[10.9gb]/[11.8gb], all_pools {[young] [703.8mb]->[31.5mb]/[865.3mb]}{[survivor] [0b]->[0b]/[108.1mb]}{[old] [10.9gb]->[10.9gb]/[10.9gb]}
[2016-11-29 07:31:20,277][WARN ][monitor.jvm ] [vipkid_node_1] [gc][old][283948][4364] duration [29.1s], collections [1]/[29.3s], total [29.1s]/[37.4m], memory [11.8gb]->[10.9gb]/[11.8gb], all_pools {[young] [865.3mb]->[2.5mb]/[865.3mb]}{[survivor] [55.9mb]->[0b]/[108.1mb]}{[old] [10.9gb]->[10.9gb]/[10.9gb]}
[2016-11-29 07:31:49,872][WARN ][monitor.jvm ] [vipkid_node_1] [gc][old][283951][4365] duration [26.7s], collections [1]/[27.4s], total [26.7s]/[37.8m], memory [11.7gb]->[10.9gb]/[11.8gb], all_pools {[young] [784mb]->[32mb]/[865.3mb]}{[survivor] [0b]->[0b]/[108.1mb]}{[old] [10.9gb]->[10.9gb]/[10.9gb]}
[2016-11-29 07:32:21,285][WARN ][monitor.jvm ] [vipkid_node_1] [gc][old][283954][4366] duration [28.5s], collections [1]/[29.2s], total [28.5s]/[38.3m], memory [11.5gb]->[11gb]/[11.8gb], all_pools {[young] [671.3mb]->[87.2mb]/[865.3mb]}{[survivor] [0b]->[0b]/[108.1mb]}{[old] [10.9gb]->[10.9gb]/[10.9gb]}
[2016-11-29 07:32:50,411][WARN ][monitor.jvm ] [vipkid_node_1] [gc][old][283958][4367] duration [25.6s], collections [1]/[25.9s], total [25.6s]/[38.7m], memory [11.8gb]->[11gb]/[11.8gb], all_pools {[young] [865.3mb]->[80.6mb]/[865.3mb]}{[survivor] [30.9mb]->[0b]/[108.1mb]}{[old] [10.9gb]->[10.9gb]/[10.9gb]}
[2016-11-29 07:33:19,505][WARN ][monitor.jvm ] [vipkid_node_1] [gc][old][283961][4368] duration [26.8s], collections [1]/[26.9s], total [26.8s]/[39.2m], memory [11.8gb]->[11gb]/[11.8gb], all_pools {[young] [865.3mb]->[102mb]/[865.3mb]}{[survivor] [101.7mb]->[0b]/[108.1mb]}{[old] [10.9gb]->[10.9gb]/[10.9gb]}
[2016-11-29 07:33:48,545][WARN ][monitor.jvm ] [vipkid_node_1] [gc][old][283964][4369] duration [26s], collections [1]/[26.8s], total [26s]/[39.6m], memory [11.6gb]->[11gb]/[11.8gb], all_pools {[young] [750.8mb]->[101.6mb]/[865.3mb]}{[survivor] [0b]->[0b]/[108.1mb]}{[old] [10.9gb]->[10.9gb]/[10.9gb]}
[2016-11-29 07:34:52,952][WARN ][monitor.jvm ] [vipkid_node_1] [gc][old][283973][4371] duration [56s], collections [2]/[56.2s], total [56s]/[40.6m], memory [11.8gb]->[11gb]/[11.8gb], all_pools {[young] [865.3mb]->[124.9mb]/[865.3mb]}{[survivor] [71.7mb]->[0b]/[108.1mb]}{[old] [10.9gb]->[10.9gb]/[10.9gb]}
Java transport client log (10.0.3.96 is the machine stuck in GC):
org.elasticsearch.transport.ReceiveTimeoutTransportException: [][10.0.3.96:9300][cluster:monitor/nodes/liveness] request_id [978567] timed out after [5001ms]
2016-11-29 09:46:12.043 [elasticsearch[Armadillo][generic][T#60]] INFO org.elasticsearch.client.transport - [Armadillo] failed to get node info for {#transport#-2}{10.0.3.96}{10.0.3.96:9300}, disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException: [][10.0.3.96:9300][cluster:monitor/nodes/liveness] request_id [978651] timed out after [5000ms]
2016-11-29 09:46:22.049 [elasticsearch[Armadillo][generic][T#60]] INFO org.elasticsearch.client.transport - [Armadillo] failed to get node info for {#transport#-2}{10.0.3.96}{10.0.3.96:9300}, disconnecting...
Do we use the Groovy script correctly?
No, you're hard-coding a changing parameter into the script, which causes the script to be recompiled every time you use it (each unique script source is compiled into its own class, which is what fills the heap with the Groovy metaclass entries in your histogram). Instead, pass in a named parameter, e.g.:
{
  "script": {
    "lang": "groovy",
    "inline": "ctx._source.available_dates = ctx._source.available_dates.minus(ts)",
    "params": {
      "ts": 12345
    }
  }
}
This way the script is only compiled once.
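In your Scala code that would look something like the following rough, untested sketch against the 2.x transport client (it assumes the 2.x Script(source, type, lang, params) constructor and ScriptService.ScriptType; passing null for the lang falls back to the default, which is groovy on 2.x):

import scala.collection.JavaConverters._
import org.elasticsearch.script.{Script, ScriptService}

// The script source is now a constant string with no interpolation, so it is
// compiled once and served from the script cache afterwards; only the "ts"
// parameter varies per request.
val source = "ctx._source.available_dates = ctx._source.available_dates.minus(ts)"
val params = Map[String, AnyRef]("ts" -> ts).asJava
val script = new Script(source, ScriptService.ScriptType.INLINE, null, params)

val resp = client
  .prepareUpdate(idx, datatype, teacherId)
  .setScript(script)
  .get()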
I've run into the same problem; I'll try params and see if it helps.
@hailin0 I ran into this too. How did you solve the problem?
@jarvisxiong By replacing the variables in the script with parameter placeholders, e.g.:
$ curl -XPOST 'localhost:9200/xiaorui.cc/blog/1/_update' -d '{
  "script" : "ctx._source.counter += count",
  "params" : {
    "count" : 4
  }
}'