Openj9: JIT failure when JVMTI field watches are added

Created on 29 Mar 2018 · 30Comments · Source: eclipse/openj9

When running an application with a JVMTI agent, and the agent installs field access/modification watches, methods using those fields fail to JIT-compile.

This seems to happen at the second JIT attempt, i.e. if there are methods A and B, both of which involve watched fields, and A is called first, then only B's JIT fails.

A bare-bones sample of the issue is in https://github.com/urisimchoni/jitfaildemo

An excerpt from the sample with some comments:

public class Hello {
    public static void main(String[] args) {
        AManyFields amf = new AManyFields();
        //native call to the agent - it installs watches over the fields of the class of amf
        MyAgent.analyzeObj(amf);
        BManyFields bmf = new BManyFields();
        //native call to the agent - it installs watches over the fields of the class of bmf
        MyAgent.analyzeObj(bmf);

        int sum=0;
        sum+=amf.getSum();
        sum+=bmf.getSum();
        System.out.printf("Sum is %d\n",sum);
    }
}

Running this with -Xjit:count=0,verbose={compileStart|compileEnd} emits the following:

 (no-opt) Compiling AManyFields.getSum()I  OrdinaryMethod j9m=00000000022CA7B0 t=2831 compThread=0 memLimit=262144 KB freePhysicalMemory=10853 MB
+ (no-opt) AManyFields.getSum()I @ 00007FCBA80FA010-00007FCBA80FA117 OrdinaryMethod - Q_SZ=0 Q_SZI=0 QW=6 j9m=00000000022CA7B0 bcsz=14 sync OSR compThread=0 CpuLoad=115%(28%avg) JvmCpu=99%
 (no-opt) Compiling BManyFields.getSum()I  OrdinaryMethod j9m=00000000022CACA8 t=2831 compThread=0 memLimit=262144 KB freePhysicalMemory=10853 MB
! BManyFields.getSum()I time=79us Compilation Exception <TRANSLATION FAILURE: out of scratch memory>

Removing the watch from agent code causes JIT to succeed.

This issue seems to be correlated with a JVM crash we're investigating, related to a Java EE application and a JVMTI agent. While we haven't been able to reproduce the crash with a simple app and a simple agent, we do see that in the full app and agent, we get those JIT errors, and if we remove functionality from the agent so as to avoid the crash, those errors also disappear.

jit userRaised

Source

urisimchoni

Most helpful comment

Does this put it in the realm of too obscure to merit further investigation?

I doubt anyone is going to track down the count=0 issue given the root cause of the crash has been identified and the other issues that would affect more users.

DanHeidinga on 9 Apr 2018

👍2

All 30 comments

Forgot to add version info so here it is: it happens with Java 8, both with IBM's JDK and own-built OpenJ9 and OpenJDK.

The IBM version info:
java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 8.0.5.10 - pxa6480sr5fp10-20180214_01(SR5 FP10))
IBM J9 VM (build 2.9, JRE 1.8.0 Linux amd64-64 Compressed References 20180208_378436 (JIT enabled, AOT enabled)
OpenJ9 - 39bb844
OMR - c04ccb2
IBM - 2321a81)
JCL - 20180209_01 based on Oracle jdk8u161-b12

The OpenJDK version:
openjdk version "1.8.0_162-internal-debug"
OpenJDK Runtime Environment (build 1.8.0_162-internal-debug-_2018_03_27_19_22-b00)
Eclipse OpenJ9 VM (build master-9c05da6, JRE 1.8.0 Linux amd64-64 Compressed References 20180327_000000 (JIT enabled, AOT enabled)
OpenJ9 - 9c05da6
OMR - 03270bf
JCL - 8599c6f based on jdk8u162-b12)

urisimchoni on 29 Mar 2018

Hi, many thanks for opening the issue and providing a test-case. I was able to reproduce the problem easily by following your instructions.

The compilation of the second getSum() method fails very early:

<compile
        method="BManyFields.getSum()I"
        hotness="warm"
        isProfilingCompile=0>

=======>BManyFields.getSum()I
<ilgen
        method="BManyFields.getSum()I">
   <request> {OrdinaryMethod(<NULL>)} </request>

        +------------- Byte Code Index
        |  +-------------------- OpCode
        |  |                        +------------- First Field
        |  |                        |     +------------- Branch Target
        |  |                        |     |      +------- Const Pool Index
        |  |                        |     |      |    +------------- Constant
        |  |                        |     |      |    |
        V  V                        V     V      V    V

        0, JBiconst0            
        1, JBistore1            
        2, JBiload1             
        3, JBaload0getfield      
        4, JBgetfield                            4
        7, JBgetfield                            5
       10, JBiadd               
       11, JBistore1            
       12, JBiload1             
       13, JBreturn1            

Compilation Failed Because: offset == J9JIT_RESOLVE_FAIL_COMPILE
</compile>

JamesKingdon on 31 Mar 2018

There are 4 places in j9method.cpp where we can generate that failure code, and I'm not sure which one is the cause, but just picking on the first example, we do
offset = jitCTResolveInstanceFieldRefWithMethod(_fe->vmThread(), ramMethod(), cpIndex, isStore, &fieldShape);
followed by

if (offset == J9JIT_RESOLVE_FAIL_COMPILE)
      {
      comp->failCompilation<TR::CompilationException>("offset == J9JIT_RESOLVE_FAIL_COMPILE");
      }

JamesKingdon on 31 Mar 2018

Got a bit further, the fail code is being set in jvmtiHookCheckForDataBreakpoint in jvmtiHook.c at

        } else {
            if (data->resolvedField == romField) {
                data->result = J9_JIT_RESOLVE_FAIL_COMPILE;
                break;
            }
        }

JamesKingdon on 31 Mar 2018

@DanHeidinga I'm not familiar with this code - who would be a good person to look at it?

JamesKingdon on 31 Mar 2018

@gacholio Can you comment on the behaviour of jvmtiHookCheckForDataBreakpoint and why we're deliberately failing compiles when the fields are watched?

DanHeidinga on 2 Apr 2018

When you put it like that, it seems quite likely that we can't support watched fields from jitted code. But then the question would be why does one method get JIT compiled and one not.

JamesKingdon on 2 Apr 2018

The JIT will fail the compilations because the watch is not supported by the JIT and I don't think these are directly correlated with the error - this is how I would expect the JIT to behave. We can check with @gacholio on what compensation logic runs when the watch is set for methods that have already been compiled, but I imagine that we would need to stop running JIT'd code that might reference the problem field.

Note that runnint with count=0 is an unusual configuration in that most of the Java code will not have run and most references will not be resolved. A more natural configuration is at least count=1,disableAsyncCompilation which means after a method is run one the method will be compiled on the application thread (count=0 implies disableAsyncCompilation).

Perhaps some more background on the real issue you are trying to debug might allow us to make suggestions to help your investigation?

andrewcraik on 2 Apr 2018

One thing of interest; I modified the testcase to print from the callbacks and I don't get any output for either of the objects, even when running Xint.

JamesKingdon on 2 Apr 2018

Re: callbacks don't print - that's because field access events are not enabled, e.g.:
jvmti->SetEventNotificationMode(JVMTI_ENABLE, JVMTI_EVENT_FIELD_ACCESS, NULL);

urisimchoni on 2 Apr 2018

Ah, that works better :)

JamesKingdon on 2 Apr 2018

Hmm, ok, and with that I can see that something is going wrong. Running with -Xint or without count=0 I get both the expected callbacks. In the case where count=0 is set (globally) the first method gets JIT compiled and we only get one callback. I haven't put any diagnostic in to differentiate between the callback invocations, but it seems likely that the method that gets JIT compiled is the one that doesn't make the callback. Looks like there might be a race condition for avoiding JIT compiling a watched method and/or compensating for an already compiled method that has a watched field.

However, as per Andrew's comment, a missing callback probably isn't related to the crash you're investigating.

JamesKingdon on 2 Apr 2018

The original issue I'm having is with a WebSphere setup with a native agent that installs field watches, and after some time the JVM crashes like below:

Unhandled exception
Type=Segmentation error vmState=0x00000000
J9Generic_Signal_Number=00000004 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
Handler1=00007F944219EBC0 Handler2=00007F9441AA2E50 InaccessibleAddress=0000000000000000
RDI=00000000034B4E00 RSI=0000000000000000 RAX=0000000000000010 RBX=00007F93E8DD0AD0
RCX=0000000000000000 RDX=000000000433B738 R8=0000016286A7A800 R9=0000000000000000
R10=00007F94428E6620 R11=0000000000000246 R12=000000000433B720 R13=00007F93FC572BB3
R14=000000000433B720 R15=0000000000000057
RIP=00007F944215BCE4 GS=0000 FS=0000 RSP=00007F93E8DD0680
EFlags=0000000000010202 CS=0033 RBP=00000000034B4E00 ERR=0000000000000004
TRAPNO=000000000000000E OLDMASK=0000000000000000 CR2=0000000000000000
xmm0 00007f93e8dd0be0 (f: 3906800640.000000, d: 6.930419e-310)
xmm1 0000000000000010 (f: 16.000000, d: 7.905050e-323)
xmm2 77a30000342e302e (f: 875442240.000000, d: 1.960468e+268)
xmm3 5274736163696e55 (f: 1667853952.000000, d: 1.627298e+89)
xmm4 00031e33610c91b4 (f: 1628213632.000000, d: 4.336073e-309)
xmm5 2e7265767265732e (f: 1919251200.000000, d: 5.918513e-85)
xmm6 4022e0b7ee1488dc (f: 3994323200.000000, d: 9.438903e+00)
xmm7 3bbcc86800000000 (f: 0.000000, d: 6.095003e-21)
xmm8 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm9 3ff0000000000000 (f: 0.000000, d: 1.000000e+00)
xmm10 3ed3000000000000 (f: 0.000000, d: 4.529953e-06)
xmm11 3d6b14eca14e7a4a (f: 2706274816.000000, d: 7.697092e-13)
xmm12 bcc6000000000000 (f: 0.000000, d: -6.106227e-16)
xmm13 bc66000000000000 (f: 0.000000, d: -9.540979e-18)
xmm14 3c60e63a5f01c691 (f: 1593951872.000000, d: 7.328915e-18)
xmm15 4023687a9f1af100 (f: 2669342976.000000, d: 9.704061e+00)
Module=/home/vagrant/IBM/WebSphere/AppServer/java/8.0/jre/lib/amd64/compressedrefs/libj9vm29.so
Module_base_address=00007F9442114000
Target=2_90_20180208_378436 (Linux 3.10.0-693.11.6.el7.x86_64)
CPU=amd64 (4 logical CPUs) (0x3e9ecf000 RAM)
----------- Stack Backtrace -----------
(0x00007F944215BCE4 [libj9vm29.so+0x47ce4])
(0x00007F9442148AC0 [libj9vm29.so+0x34ac0])

(0x00007F94421D9902 [libj9vm29.so+0xc5902])

JVMDUMP039I Processing dump event "gpf", detail "" at 2018/04/02 13:59:17 - please wait.
JVMDUMP032I JVM requested System dump using '/home/vagrant/IBM/WebSphere/AppServer/profiles/AppSrv01/core.20180402.135917.2364.0001.dmp' in response to an event
JVMDUMP010I System dump written to /home/vagrant/IBM/WebSphere/AppServer/profiles/AppSrv01/core.20180402.135917.2364.0001.dmp
JVMDUMP032I JVM requested Java dump using '/home/vagrant/IBM/WebSphere/AppServer/profiles/AppSrv01/javacore.20180402.135917.2364.0002.txt' in response to an event
JVMDUMP010I Java dump written to /home/vagrant/IBM/WebSphere/AppServer/profiles/AppSrv01/javacore.20180402.135917.2364.0002.txt
JVMDUMP032I JVM requested Snap dump using '/home/vagrant/IBM/WebSphere/AppServer/profiles/AppSrv01/Snap.20180402.135917.2364.0003.trc' in response to an event
JVMDUMP010I Snap dump written to /home/vagrant/IBM/WebSphere/AppServer/profiles/AppSrv01/Snap.20180402.135917.2364.0003.trc
JVMDUMP007I JVM Requesting JIT dump using '/home/vagrant/IBM/WebSphere/AppServer/profiles/AppSrv01/jitdump.20180402.135917.2364.0004.dmp'
JVMDUMP010I JIT dump written to /home/vagrant/IBM/WebSphere/AppServer/profiles/AppSrv01/jitdump.20180402.135917.2364.0004.dmp

The native stack trace always show the crash is in the interpreter:

0 0x00007f9443fef9b1 in pthread_kill () from /lib64/libpthread.so.0

1 0x00007f9441ad3eeb in omrdump_create () from /home/vagrant/IBM/WebSphere/AppServer/java/8.0/jre/lib/amd64/compressedrefs/libj9prt29.so

2 0x00007f9440e2d4c0 in doSystemDump () from /home/vagrant/IBM/WebSphere/AppServer/java/8.0/jre/lib/amd64/compressedrefs/libj9dmp29.so

3 0x00007f9440e29805 in protectedDumpFunction () from /home/vagrant/IBM/WebSphere/AppServer/java/8.0/jre/lib/amd64/compressedrefs/libj9dmp29.so

4 0x00007f9441aa33f5 in omrsig_protect () from /home/vagrant/IBM/WebSphere/AppServer/java/8.0/jre/lib/amd64/compressedrefs/libj9prt29.so

5 0x00007f9440e297c9 in runDumpFunction () from /home/vagrant/IBM/WebSphere/AppServer/java/8.0/jre/lib/amd64/compressedrefs/libj9dmp29.so

6 0x00007f9440e2d88a in runDumpAgent () from /home/vagrant/IBM/WebSphere/AppServer/java/8.0/jre/lib/amd64/compressedrefs/libj9dmp29.so

7 0x00007f9440e420af in triggerDumpAgents () from /home/vagrant/IBM/WebSphere/AppServer/java/8.0/jre/lib/amd64/compressedrefs/libj9dmp29.so

8 0x00007f944219dfee in generateDiagnosticFiles () from /home/vagrant/IBM/WebSphere/AppServer/java/8.0/jre/lib/amd64/compressedrefs/libj9vm29.so

9 0x00007f9441aa33f5 in omrsig_protect () from /home/vagrant/IBM/WebSphere/AppServer/java/8.0/jre/lib/amd64/compressedrefs/libj9prt29.so

10 0x00007f944219dc88 in vmSignalHandler () from /home/vagrant/IBM/WebSphere/AppServer/java/8.0/jre/lib/amd64/compressedrefs/libj9vm29.so

11 0x00007f944219ebf1 in structuredSignalHandler () from /home/vagrant/IBM/WebSphere/AppServer/java/8.0/jre/lib/amd64/compressedrefs/libj9vm29.so

12 0x00007f9441aa3066 in masterSynchSignalHandler () from /home/vagrant/IBM/WebSphere/AppServer/java/8.0/jre/lib/amd64/compressedrefs/libj9prt29.so

13 14 0x00007f944215bce4 in VM_DebugBytecodeInterpreter::run(J9VMThread*) () from /home/vagrant/IBM/WebSphere/AppServer/java/8.0/jre/lib/amd64/compressedrefs/libj9vm29.so

15 0x00007f9442148ac0 in debugBytecodeLoop () from /home/vagrant/IBM/WebSphere/AppServer/java/8.0/jre/lib/amd64/compressedrefs/libj9vm29.so

16 0x00007f94421d9902 in c_cInterpreter () from /home/vagrant/IBM/WebSphere/AppServer/java/8.0/jre/lib/amd64/compressedrefs/libj9vm29.so

17 0x00007f944218b97f in runJavaThread () from /home/vagrant/IBM/WebSphere/AppServer/java/8.0/jre/lib/amd64/compressedrefs/libj9vm29.so

18 0x00007f94421d8256 in javaProtectedThreadProc () from /home/vagrant/IBM/WebSphere/AppServer/java/8.0/jre/lib/amd64/compressedrefs/libj9vm29.so

19 0x00007f9441aa33f5 in omrsig_protect () from /home/vagrant/IBM/WebSphere/AppServer/java/8.0/jre/lib/amd64/compressedrefs/libj9prt29.so

20 0x00007f94421d81b9 in javaThreadProc () from /home/vagrant/IBM/WebSphere/AppServer/java/8.0/jre/lib/amd64/compressedrefs/libj9vm29.so

21 0x00007f9441f0938e in thread_wrapper () from /home/vagrant/IBM/WebSphere/AppServer/java/8.0/jre/lib/amd64/compressedrefs/libj9thr29.so

22 0x00007f9443feae25 in start_thread () from /lib64/libpthread.so.0

23 0x00007f94438fd34d in clone () from /lib64/libc.so.6

I decided to see what happens without an interpreter, hence the -Xjit:count=0, and that caused the JVM to crash more quickly. It seemed to always crash after installing 12 watches, and the crash happened IFF watches were installed. I was then hoping to create a POC for the crash using as little code as possible. I didn't succeed in crashing the JVM using the stripped-down version, but wanting to see that it actually JIT-compiles the code, I added some more tracing and came at the reported issue (or non-issue). What struck me as odd and caused me to hope this is related to the crash was:

one method succeeds and one fails JIT
the "out of scratch memory" message, where with the stripped-down version at least there was nothing to stress memory - this is obviously a bogus message.

I wonder - when a JVMTI watch is install, does the JVM discards all JITted methods because they might contain a field-access opcode (or search for methods with corresponding field access)?

Any info on what to look for would be appreciated.
Thanks,
Uri.

urisimchoni on 2 Apr 2018

Hi Uri, I think probably we need someone from the interpreter side to look into one of your core files and see what's going on. Do you have a service contract for WebSphere? It might be easier to get the right eyes on the problem via a conventional PMR.

JamesKingdon on 3 Apr 2018

OK, feel free to close the issue if it's a non-issue, or leave it open b/c of the seemingly inconsistent behavior of JIT-compiling one method and not the other.

Thanks for having a look at this,
Uri

urisimchoni on 4 Apr 2018

I think it's worth keeping this issue to explore why we have the inconsistent behaviour across the two methods and the missing callback invocation. I'm pretty sure something isn't working quite right there.

JamesKingdon on 4 Apr 2018

@urisimchoni If you have a core file that you can share, I'd like to have someone on my team take a look. We don't like it when the JVM crashes =)

DanHeidinga on 4 Apr 2018

👍1

@urisimchoni You are correct - when a field watch is added, all compiled methods are discarded because we don't keep sufficient data to know which fields are used in which compilations (which may have fully or partially inlined other methods).

The "out of scratch memory" message is misleading - it's no doubt printed as a result of the compile-time resolution of the watched field failing.

gacholio on 4 Apr 2018

@urisimchoni Thanks for sending me the core file. We've been able to identify the root cause (missing syncObject) and are working through the right way to fix this. GAC has opened #1637 and described the root issue there.

Closing this now as we'll track the fix in the more specific #1637.

DanHeidinga on 9 Apr 2018

🎉1

@DanHeidinga Do we have a label for stuff that we might need to come back to one day? As per my comments above, the test-case demonstrates a failure in the mechanism that avoids running jitted code when a field is being watched.

JamesKingdon on 9 Apr 2018

Sorry @JamesKingdon, I missed that comment. Can you open a new issue for that specific part of this investigation?

DanHeidinga on 9 Apr 2018

There is no failure here (other than the crash, of course). The only thing to possibly address is the message in the verbose log, which I'm sure will never be done. Being able to run code with watched fields would be an enhancement, which again I find unlikely to ever be done, given the frequency of use of this facility.

gacholio on 9 Apr 2018

@gacholio The test-case shows inconsistent behaviour. The first of two methods gets compiled by the JIT and fails to invoke the callback, despite both methods having watches set.

JamesKingdon on 9 Apr 2018

@JamesKingdon I know the VM is allowed to delay having a JVMTI event hookup for some quantum of time after the event is enabled. The count=0 case is an odd one because the method could be compiled before the agent is attached and before the method ran. I don't know if the code in question ran before the agent had completed init and if the event had been fully enabled, etc. At non-zero counts I would expect completely consistent behavior.

andrewcraik on 9 Apr 2018

Yes, but this appears to be worse than that. If I repeat the invocation of the methods I continue to get missing events. It looks very much as if we have permanently missed the opportunity to back out of the jit compilation of the first method.

Expected behaviour (with Xint):

$ ./runtest 
Agent succesfully loaded
in Java_MyAgent_analyzeObj
1 watched fields
in Java_MyAgent_analyzeObj
2 watched fields
in onFieldAccess
in onFieldAccess
Sum is 0
in onFieldAccess
in onFieldAccess
Sum is 0
in onFieldAccess
in onFieldAccess
Sum is 0
in onFieldAccess
in onFieldAccess
Sum is 0
in onFieldAccess
in onFieldAccess
Sum is 0
in onFieldAccess
in onFieldAccess
Sum is 0
in onFieldAccess
in onFieldAccess
Sum is 0
in onFieldAccess
in onFieldAccess
Sum is 0
in onFieldAccess
in onFieldAccess
Sum is 0
in onFieldAccess
in onFieldAccess
Sum is 0

Problem behaviour with JIT, note only one "in onFieldAccess" message per repeat:

$ ./runtest 
Agent succesfully loaded
in Java_MyAgent_analyzeObj
1 watched fields
in Java_MyAgent_analyzeObj
2 watched fields
in onFieldAccess
Sum is 0
in onFieldAccess
Sum is 0
in onFieldAccess
Sum is 0
in onFieldAccess
Sum is 0
in onFieldAccess
Sum is 0
in onFieldAccess
Sum is 0
in onFieldAccess
Sum is 0
in onFieldAccess
Sum is 0
in onFieldAccess
Sum is 0
in onFieldAccess
Sum is 0

This does not seem desirable to me.

JamesKingdon on 9 Apr 2018

However, you are right that it is somehow related to the use of count=0, the expected behaviour returns with count=1,disableAsyncCompilation

Does this put it in the realm of too obscure to merit further investigation?

JamesKingdon on 9 Apr 2018

I'm not sure I'd say doesn't merit further investigation, but the urgency of that investigation is much lower. I am curious about what runtest is doing so I may speak to you offline with a result posted here.

andrewcraik on 9 Apr 2018

Does this put it in the realm of too obscure to merit further investigation?

I doubt anyone is going to track down the count=0 issue given the root cause of the crash has been identified and the other issues that would affect more users.

DanHeidinga on 9 Apr 2018

👍2

Thanks all and sorry to have "barked on the wrong tree". Hopefully the core dump will lead in the right direction.

urisimchoni on 10 Apr 2018

Thanks so much for reporting the issue! You've provided great context and a core that made it easy to identify a decompilation bug. A big help anyone running with the debugger / jvmti watches.

Much appreciated!

DanHeidinga on 10 Apr 2018

Was this page helpful?

0 / 5 - 0 ratings