When running an application with a JVMTI agent, and the agent installs field access/modification watches, methods using those fields fail to JIT-compile.
This seems to happen at the second JIT attempt, i.e. if there are methods A and B, both of which involve watched fields, and A is called first, then only B's JIT fails.
A bare-bones sample of the issue is in https://github.com/urisimchoni/jitfaildemo
An excerpt from the sample with some comments:
public class Hello {
public static void main(String[] args) {
AManyFields amf = new AManyFields();
//native call to the agent - it installs watches over the fields of the class of amf
MyAgent.analyzeObj(amf);
BManyFields bmf = new BManyFields();
//native call to the agent - it installs watches over the fields of the class of bmf
MyAgent.analyzeObj(bmf);
int sum=0;
sum+=amf.getSum();
sum+=bmf.getSum();
System.out.printf("Sum is %d\n",sum);
}
}
Running this with -Xjit:count=0,verbose={compileStart|compileEnd} emits the following:
(no-opt) Compiling AManyFields.getSum()I OrdinaryMethod j9m=00000000022CA7B0 t=2831 compThread=0 memLimit=262144 KB freePhysicalMemory=10853 MB
+ (no-opt) AManyFields.getSum()I @ 00007FCBA80FA010-00007FCBA80FA117 OrdinaryMethod - Q_SZ=0 Q_SZI=0 QW=6 j9m=00000000022CA7B0 bcsz=14 sync OSR compThread=0 CpuLoad=115%(28%avg) JvmCpu=99%
(no-opt) Compiling BManyFields.getSum()I OrdinaryMethod j9m=00000000022CACA8 t=2831 compThread=0 memLimit=262144 KB freePhysicalMemory=10853 MB
! BManyFields.getSum()I time=79us Compilation Exception <TRANSLATION FAILURE: out of scratch memory>
Removing the watch from agent code causes JIT to succeed.
This issue seems to be correlated with a JVM crash we're investigating, related to a Java EE application and a JVMTI agent. While we haven't been able to reproduce the crash with a simple app and a simple agent, we do see that in the full app and agent, we get those JIT errors, and if we remove functionality from the agent so as to avoid the crash, those errors also disappear.
Forgot to add version info so here it is: it happens with Java 8, both with IBM's JDK and own-built OpenJ9 and OpenJDK.
The IBM version info:
java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 8.0.5.10 - pxa6480sr5fp10-20180214_01(SR5 FP10))
IBM J9 VM (build 2.9, JRE 1.8.0 Linux amd64-64 Compressed References 20180208_378436 (JIT enabled, AOT enabled)
OpenJ9 - 39bb844
OMR - c04ccb2
IBM - 2321a81)
JCL - 20180209_01 based on Oracle jdk8u161-b12
The OpenJDK version:
openjdk version "1.8.0_162-internal-debug"
OpenJDK Runtime Environment (build 1.8.0_162-internal-debug-_2018_03_27_19_22-b00)
Eclipse OpenJ9 VM (build master-9c05da6, JRE 1.8.0 Linux amd64-64 Compressed References 20180327_000000 (JIT enabled, AOT enabled)
OpenJ9 - 9c05da6
OMR - 03270bf
JCL - 8599c6f based on jdk8u162-b12)
Hi, many thanks for opening the issue and providing a test-case. I was able to reproduce the problem easily by following your instructions.
The compilation of the second getSum() method fails very early:
<compile
method="BManyFields.getSum()I"
hotness="warm"
isProfilingCompile=0>
=======>BManyFields.getSum()I
<ilgen
method="BManyFields.getSum()I">
<request> {OrdinaryMethod(<NULL>)} </request>
+------------- Byte Code Index
| +-------------------- OpCode
| | +------------- First Field
| | | +------------- Branch Target
| | | | +------- Const Pool Index
| | | | | +------------- Constant
| | | | | |
V V V V V V
0, JBiconst0
1, JBistore1
2, JBiload1
3, JBaload0getfield
4, JBgetfield 4
7, JBgetfield 5
10, JBiadd
11, JBistore1
12, JBiload1
13, JBreturn1
Compilation Failed Because: offset == J9JIT_RESOLVE_FAIL_COMPILE
</compile>
There are 4 places in j9method.cpp where we can generate that failure code, and I'm not sure which one is the cause, but just picking on the first example, we do
offset = jitCTResolveInstanceFieldRefWithMethod(_fe->vmThread(), ramMethod(), cpIndex, isStore, &fieldShape);
followed by
if (offset == J9JIT_RESOLVE_FAIL_COMPILE)
{
comp->failCompilation<TR::CompilationException>("offset == J9JIT_RESOLVE_FAIL_COMPILE");
}
Got a bit further, the fail code is being set in jvmtiHookCheckForDataBreakpoint in jvmtiHook.c at
} else {
if (data->resolvedField == romField) {
data->result = J9_JIT_RESOLVE_FAIL_COMPILE;
break;
}
}
@DanHeidinga I'm not familiar with this code - who would be a good person to look at it?
@gacholio Can you comment on the behaviour of jvmtiHookCheckForDataBreakpoint and why we're deliberately failing compiles when the fields are watched?
When you put it like that, it seems quite likely that we can't support watched fields from jitted code. But then the question would be why does one method get JIT compiled and one not.
The JIT will fail the compilations because the watch is not supported by the JIT and I don't think these are directly correlated with the error - this is how I would expect the JIT to behave. We can check with @gacholio on what compensation logic runs when the watch is set for methods that have already been compiled, but I imagine that we would need to stop running JIT'd code that might reference the problem field.
Note that runnint with count=0 is an unusual configuration in that most of the Java code will not have run and most references will not be resolved. A more natural configuration is at least count=1,disableAsyncCompilation which means after a method is run one the method will be compiled on the application thread (count=0 implies disableAsyncCompilation).
Perhaps some more background on the real issue you are trying to debug might allow us to make suggestions to help your investigation?
One thing of interest; I modified the testcase to print from the callbacks and I don't get any output for either of the objects, even when running Xint.
Re: callbacks don't print - that's because field access events are not enabled, e.g.:
jvmti->SetEventNotificationMode(JVMTI_ENABLE, JVMTI_EVENT_FIELD_ACCESS, NULL);
Ah, that works better :)
Hmm, ok, and with that I can see that something is going wrong. Running with -Xint or without count=0 I get both the expected callbacks. In the case where count=0 is set (globally) the first method gets JIT compiled and we only get one callback. I haven't put any diagnostic in to differentiate between the callback invocations, but it seems likely that the method that gets JIT compiled is the one that doesn't make the callback. Looks like there might be a race condition for avoiding JIT compiling a watched method and/or compensating for an already compiled method that has a watched field.
However, as per Andrew's comment, a missing callback probably isn't related to the crash you're investigating.
The original issue I'm having is with a WebSphere setup with a native agent that installs field watches, and after some time the JVM crashes like below:
Unhandled exception
Type=Segmentation error vmState=0x00000000
J9Generic_Signal_Number=00000004 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
Handler1=00007F944219EBC0 Handler2=00007F9441AA2E50 InaccessibleAddress=0000000000000000
RDI=00000000034B4E00 RSI=0000000000000000 RAX=0000000000000010 RBX=00007F93E8DD0AD0
RCX=0000000000000000 RDX=000000000433B738 R8=0000016286A7A800 R9=0000000000000000
R10=00007F94428E6620 R11=0000000000000246 R12=000000000433B720 R13=00007F93FC572BB3
R14=000000000433B720 R15=0000000000000057
RIP=00007F944215BCE4 GS=0000 FS=0000 RSP=00007F93E8DD0680
EFlags=0000000000010202 CS=0033 RBP=00000000034B4E00 ERR=0000000000000004
TRAPNO=000000000000000E OLDMASK=0000000000000000 CR2=0000000000000000
xmm0 00007f93e8dd0be0 (f: 3906800640.000000, d: 6.930419e-310)
xmm1 0000000000000010 (f: 16.000000, d: 7.905050e-323)
xmm2 77a30000342e302e (f: 875442240.000000, d: 1.960468e+268)
xmm3 5274736163696e55 (f: 1667853952.000000, d: 1.627298e+89)
xmm4 00031e33610c91b4 (f: 1628213632.000000, d: 4.336073e-309)
xmm5 2e7265767265732e (f: 1919251200.000000, d: 5.918513e-85)
xmm6 4022e0b7ee1488dc (f: 3994323200.000000, d: 9.438903e+00)
xmm7 3bbcc86800000000 (f: 0.000000, d: 6.095003e-21)
xmm8 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm9 3ff0000000000000 (f: 0.000000, d: 1.000000e+00)
xmm10 3ed3000000000000 (f: 0.000000, d: 4.529953e-06)
xmm11 3d6b14eca14e7a4a (f: 2706274816.000000, d: 7.697092e-13)
xmm12 bcc6000000000000 (f: 0.000000, d: -6.106227e-16)
xmm13 bc66000000000000 (f: 0.000000, d: -9.540979e-18)
xmm14 3c60e63a5f01c691 (f: 1593951872.000000, d: 7.328915e-18)
xmm15 4023687a9f1af100 (f: 2669342976.000000, d: 9.704061e+00)
Module=/home/vagrant/IBM/WebSphere/AppServer/java/8.0/jre/lib/amd64/compressedrefs/libj9vm29.so
Module_base_address=00007F9442114000
Target=2_90_20180208_378436 (Linux 3.10.0-693.11.6.el7.x86_64)
CPU=amd64 (4 logical CPUs) (0x3e9ecf000 RAM)
----------- Stack Backtrace -----------
(0x00007F944215BCE4 [libj9vm29.so+0x47ce4])
(0x00007F9442148AC0 [libj9vm29.so+0x34ac0])
JVMDUMP039I Processing dump event "gpf", detail "" at 2018/04/02 13:59:17 - please wait.
JVMDUMP032I JVM requested System dump using '/home/vagrant/IBM/WebSphere/AppServer/profiles/AppSrv01/core.20180402.135917.2364.0001.dmp' in response to an event
JVMDUMP010I System dump written to /home/vagrant/IBM/WebSphere/AppServer/profiles/AppSrv01/core.20180402.135917.2364.0001.dmp
JVMDUMP032I JVM requested Java dump using '/home/vagrant/IBM/WebSphere/AppServer/profiles/AppSrv01/javacore.20180402.135917.2364.0002.txt' in response to an event
JVMDUMP010I Java dump written to /home/vagrant/IBM/WebSphere/AppServer/profiles/AppSrv01/javacore.20180402.135917.2364.0002.txt
JVMDUMP032I JVM requested Snap dump using '/home/vagrant/IBM/WebSphere/AppServer/profiles/AppSrv01/Snap.20180402.135917.2364.0003.trc' in response to an event
JVMDUMP010I Snap dump written to /home/vagrant/IBM/WebSphere/AppServer/profiles/AppSrv01/Snap.20180402.135917.2364.0003.trc
JVMDUMP007I JVM Requesting JIT dump using '/home/vagrant/IBM/WebSphere/AppServer/profiles/AppSrv01/jitdump.20180402.135917.2364.0004.dmp'
JVMDUMP010I JIT dump written to /home/vagrant/IBM/WebSphere/AppServer/profiles/AppSrv01/jitdump.20180402.135917.2364.0004.dmp
The native stack trace always show the crash is in the interpreter:
I decided to see what happens without an interpreter, hence the -Xjit:count=0, and that caused the JVM to crash more quickly. It seemed to always crash after installing 12 watches, and the crash happened IFF watches were installed. I was then hoping to create a POC for the crash using as little code as possible. I didn't succeed in crashing the JVM using the stripped-down version, but wanting to see that it actually JIT-compiles the code, I added some more tracing and came at the reported issue (or non-issue). What struck me as odd and caused me to hope this is related to the crash was:
I wonder - when a JVMTI watch is install, does the JVM discards all JITted methods because they might contain a field-access opcode (or search for methods with corresponding field access)?
Any info on what to look for would be appreciated.
Thanks,
Uri.
Hi Uri, I think probably we need someone from the interpreter side to look into one of your core files and see what's going on. Do you have a service contract for WebSphere? It might be easier to get the right eyes on the problem via a conventional PMR.
OK, feel free to close the issue if it's a non-issue, or leave it open b/c of the seemingly inconsistent behavior of JIT-compiling one method and not the other.
Thanks for having a look at this,
Uri
I think it's worth keeping this issue to explore why we have the inconsistent behaviour across the two methods and the missing callback invocation. I'm pretty sure something isn't working quite right there.
@urisimchoni If you have a core file that you can share, I'd like to have someone on my team take a look. We don't like it when the JVM crashes =)
@urisimchoni You are correct - when a field watch is added, all compiled methods are discarded because we don't keep sufficient data to know which fields are used in which compilations (which may have fully or partially inlined other methods).
The "out of scratch memory" message is misleading - it's no doubt printed as a result of the compile-time resolution of the watched field failing.
@urisimchoni Thanks for sending me the core file. We've been able to identify the root cause (missing syncObject) and are working through the right way to fix this. GAC has opened #1637 and described the root issue there.
Closing this now as we'll track the fix in the more specific #1637.
@DanHeidinga Do we have a label for stuff that we might need to come back to one day? As per my comments above, the test-case demonstrates a failure in the mechanism that avoids running jitted code when a field is being watched.
Sorry @JamesKingdon, I missed that comment. Can you open a new issue for that specific part of this investigation?
There is no failure here (other than the crash, of course). The only thing to possibly address is the message in the verbose log, which I'm sure will never be done. Being able to run code with watched fields would be an enhancement, which again I find unlikely to ever be done, given the frequency of use of this facility.
@gacholio The test-case shows inconsistent behaviour. The first of two methods gets compiled by the JIT and fails to invoke the callback, despite both methods having watches set.
@JamesKingdon I know the VM is allowed to delay having a JVMTI event hookup for some quantum of time after the event is enabled. The count=0 case is an odd one because the method could be compiled before the agent is attached and before the method ran. I don't know if the code in question ran before the agent had completed init and if the event had been fully enabled, etc. At non-zero counts I would expect completely consistent behavior.
Yes, but this appears to be worse than that. If I repeat the invocation of the methods I continue to get missing events. It looks very much as if we have permanently missed the opportunity to back out of the jit compilation of the first method.
Expected behaviour (with Xint):
$ ./runtest
Agent succesfully loaded
in Java_MyAgent_analyzeObj
1 watched fields
in Java_MyAgent_analyzeObj
2 watched fields
in onFieldAccess
in onFieldAccess
Sum is 0
in onFieldAccess
in onFieldAccess
Sum is 0
in onFieldAccess
in onFieldAccess
Sum is 0
in onFieldAccess
in onFieldAccess
Sum is 0
in onFieldAccess
in onFieldAccess
Sum is 0
in onFieldAccess
in onFieldAccess
Sum is 0
in onFieldAccess
in onFieldAccess
Sum is 0
in onFieldAccess
in onFieldAccess
Sum is 0
in onFieldAccess
in onFieldAccess
Sum is 0
in onFieldAccess
in onFieldAccess
Sum is 0
Problem behaviour with JIT, note only one "in onFieldAccess" message per repeat:
$ ./runtest
Agent succesfully loaded
in Java_MyAgent_analyzeObj
1 watched fields
in Java_MyAgent_analyzeObj
2 watched fields
in onFieldAccess
Sum is 0
in onFieldAccess
Sum is 0
in onFieldAccess
Sum is 0
in onFieldAccess
Sum is 0
in onFieldAccess
Sum is 0
in onFieldAccess
Sum is 0
in onFieldAccess
Sum is 0
in onFieldAccess
Sum is 0
in onFieldAccess
Sum is 0
in onFieldAccess
Sum is 0
This does not seem desirable to me.
However, you are right that it is somehow related to the use of count=0, the expected behaviour returns with count=1,disableAsyncCompilation
Does this put it in the realm of too obscure to merit further investigation?
I'm not sure I'd say doesn't merit further investigation, but the urgency of that investigation is much lower. I am curious about what runtest is doing so I may speak to you offline with a result posted here.
Does this put it in the realm of too obscure to merit further investigation?
I doubt anyone is going to track down the count=0 issue given the root cause of the crash has been identified and the other issues that would affect more users.
Thanks all and sorry to have "barked on the wrong tree". Hopefully the core dump will lead in the right direction.
Thanks so much for reporting the issue! You've provided great context and a core that made it easy to identify a decompilation bug. A big help anyone running with the debugger / jvmti watches.
Much appreciated!
Most helpful comment
I doubt anyone is going to track down the
count=0issue given the root cause of the crash has been identified and the other issues that would affect more users.