Openj9: java/lang/ref/FinalizeOverride timeout

Created on 22 May 2020  路  59Comments  路  Source: eclipse/openj9

https://ci.eclipse.org/openj9/job/Test_openjdk11_j9_sanity.openjdk_ppc64le_linux_Nightly/61
java/lang/ref/FinalizeOverride.java

00:41:03  ACTION: main -- Error. Agent error: java.lang.Exception: Agent 35 timed out with a timeout of 960 seconds; check console log for any additional details
00:41:03  REASON: Assumed action based on file name: run main FinalizeOverride 
00:41:03  TIME:   962.632 seconds
00:41:03  messages:
00:41:03  command: main FinalizeOverride
00:41:03  reason: Assumed action based on file name: run main FinalizeOverride 
00:41:03  Mode: agentvm
00:41:03  Agent id: 35
00:41:03  Timeout refired 960 times
00:41:03  Timeout information:
00:41:03  Running jstack on process 31842
00:41:03  2020-05-22T01:39:09.394211345
00:41:03  Virtual machine: 31842 JVM information:
00:41:03  JRE 11 Linux ppc64le-64-Bit Compressed References 20200521_395 (JIT enabled, AOT enabled)
00:41:03  OpenJ9   - 561026fca
00:41:03  OMR      - 00689235c
00:41:03  JCL      - cfce36dfff5 based on jdk-11.0.8+3
00:41:03  
00:41:03  "main" prio=5 Id=1 WAITING
00:41:03    at [email protected]/java.lang.Object.wait(Native Method)
00:41:03    at [email protected]/java.lang.Object.wait(Object.java:221)
00:41:03    at [email protected]/java.lang.Thread.join(Thread.java:716)
00:41:03    - locked java.lang.Thread@4043657b
00:41:03    at app//com.sun.javatest.regtest.agent.MainActionHelper.runClass(MainActionHelper.java:184)
00:41:03    at app//com.sun.javatest.regtest.agent.AgentServer.doMain(AgentServer.java:301)
00:41:03    at app//com.sun.javatest.regtest.agent.AgentServer.run(AgentServer.java:232)
00:41:03    at app//com.sun.javatest.regtest.agent.AgentServer.main(AgentServer.java:69)
00:41:03  
00:41:03  "JIT Compilation Thread-000" prio=10 Id=3 RUNNABLE
00:41:03  
00:41:03  "JIT Compilation Thread-001 Suspended" prio=10 Id=4 RUNNABLE
00:41:03  
00:41:03  "JIT Compilation Thread-002 Suspended" prio=10 Id=5 RUNNABLE
00:41:03  
00:41:03  "JIT Compilation Thread-003 Suspended" prio=10 Id=6 RUNNABLE
00:41:03  
00:41:03  "JIT Compilation Thread-004 Suspended" prio=10 Id=7 RUNNABLE
00:41:03  
00:41:03  "JIT Compilation Thread-005 Suspended" prio=10 Id=8 RUNNABLE
00:41:03  
00:41:03  "JIT Compilation Thread-006 Suspended" prio=10 Id=9 RUNNABLE
00:41:03  
00:41:03  "JIT Diagnostic Compilation Thread-007 Suspended" prio=10 Id=10 RUNNABLE
00:41:03  
00:41:03  "JIT-SamplerThread" prio=10 Id=11 TIMED_WAITING
00:41:03  
00:41:03  "IProfiler" prio=5 Id=12 RUNNABLE
00:41:03  
00:41:03  "Common-Cleaner" prio=8 Id=2 TIMED_WAITING
00:41:03    at [email protected]/java.lang.Object.wait(Native Method)
00:41:03    at [email protected]/java.lang.Object.wait(Object.java:221)
00:41:03    at [email protected]/java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:138)
00:41:03    - locked java.lang.ref.ReferenceQueue@e8d74150
00:41:03    at [email protected]/jdk.internal.ref.CleanerImpl.run(CleanerImpl.java:148)
00:41:03    at [email protected]/java.lang.Thread.run(Thread.java:836)
00:41:03    at [email protected]/jdk.internal.misc.InnocuousThread.run(InnocuousThread.java:134)
00:41:03  
00:41:03  "Concurrent Mark Helper" prio=1 Id=13 RUNNABLE
00:41:03  
00:41:03  "Finalizer thread" prio=5 Id=14 RUNNABLE
00:41:03  
00:41:03  "GC Slave" prio=5 Id=15 RUNNABLE
00:41:03  
00:41:03  "GC Slave" prio=5 Id=16 RUNNABLE
00:41:03  
00:41:03  "GC Slave" prio=5 Id=17 RUNNABLE
00:41:03  
00:41:03  "GC Slave" prio=5 Id=18 RUNNABLE
00:41:03  
00:41:03  "GC Slave" prio=5 Id=19 RUNNABLE
00:41:03  
00:41:03  "GC Slave" prio=5 Id=20 RUNNABLE
00:41:03  
00:41:03  "GC Slave" prio=5 Id=21 RUNNABLE
00:41:03  
00:41:03  "Attach API wait loop" prio=10 Id=24 RUNNABLE
00:41:03    at [email protected]/openj9.internal.tools.attach.target.IPC.waitSemaphore(Native Method)
00:41:03    at [email protected]/openj9.internal.tools.attach.target.CommonDirectory.waitSemaphore(CommonDirectory.java:259)
00:41:03    at [email protected]/openj9.internal.tools.attach.target.WaitLoop.waitForNotification(WaitLoop.java:66)
00:41:03    at [email protected]/openj9.internal.tools.attach.target.WaitLoop.run(WaitLoop.java:154)
00:41:03  
00:41:03  "pool-1-thread-1" prio=5 Id=25 TIMED_WAITING
00:41:03    at [email protected]/jdk.internal.misc.Unsafe.park(Native Method)
00:41:03    at [email protected]/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:234)
00:41:03    at [email protected]/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2123)
00:41:03    at [email protected]/java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1182)
00:41:03    at [email protected]/java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:899)
00:41:03    at [email protected]/java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1054)
00:41:03    at [email protected]/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1114)
00:41:03    at [email protected]/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
00:41:03    at [email protected]/java.lang.Thread.run(Thread.java:836)
00:41:03  
00:41:03  "AgentVMThread" prio=5 Id=26 TIMED_WAITING
00:41:03    at [email protected]/java.lang.Thread.sleep(Native Method)
00:41:03    at [email protected]/java.lang.Thread.sleep(Thread.java:966)
00:41:03    at app//FinalizeOverride.test(FinalizeOverride.java:75)
00:41:03    at app//FinalizeOverride.main(FinalizeOverride.java:52)
00:41:03    at [email protected]/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
00:41:03    at [email protected]/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
00:41:03    at [email protected]/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
00:41:03    at [email protected]/java.lang.reflect.Method.invoke(Method.java:566)
00:41:03    at app//com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298)
00:41:03    at [email protected]/java.lang.Thread.run(Thread.java:836)
00:41:03  
00:41:03  "Attachment 44538" prio=10 Id=27 RUNNABLE
00:41:03    at [email protected]/openj9.internal.tools.attach.target.DiagnosticUtils.dumpAllThreadsImpl(Native Method)
00:41:03    at [email protected]/openj9.internal.tools.attach.target.DiagnosticUtils.getThreadInfo(DiagnosticUtils.java:233)
00:41:03    at app//openj9.internal.tools.attach.target.DiagnosticUtils$$Lambda$66/00000000F4005C80.apply(Unknown Source)
00:41:03    at [email protected]/openj9.internal.tools.attach.target.DiagnosticUtils.executeDiagnosticCommand(DiagnosticUtils.java:169)
00:41:03    at [email protected]/openj9.internal.tools.attach.target.Attachment.doCommand(Attachment.java:249)
00:41:03    at [email protected]/openj9.internal.tools.attach.target.Attachment.run(Attachment.java:160)
00:41:03  
00:41:03  "file lock watchdog" prio=10 Id=28 TIMED_WAITING
00:41:03    at [email protected]/java.lang.Object.wait(Native Method)
00:41:03    at [email protected]/java.lang.Object.wait(Object.java:221)
00:41:03    at [email protected]/java.util.TimerThread.mainLoop(Timer.java:553)
00:41:03    - locked java.util.TaskQueue@793acdc7
00:41:03    at [email protected]/java.util.TimerThread.run(Timer.java:506)
00:41:03  
00:41:03  
00:41:03  --- Timeout information end.
jit test excluded test failure

All 59 comments

https://ci.eclipse.org/openj9/job/Test_openjdk11_j9_sanity.openjdk_x86-64_linux_cm_Nightly/19
01:33:20 at app//FinalizeOverride.test(FinalizeOverride.java:75)
01:33:20 at app//FinalizeOverride.main(FinalizeOverride.java:52)

https://ci.eclipse.org/openj9/job/Test_openjdk14_j9_sanity.openjdk_ppc64le_linux_Nightly/1
01:02:28 at app//FinalizeOverride.test(FinalizeOverride.java:75)
01:02:28 at app//FinalizeOverride.main(FinalizeOverride.java:51)

https://ci.eclipse.org/openj9/job/Test_openjdk14_j9_sanity.openjdk_x86-64_linux_Nightly/9
01:31:01 at app//FinalizeOverride.test(FinalizeOverride.java:75)
01:31:01 at app//FinalizeOverride.main(FinalizeOverride.java:52)

https://ci.eclipse.org/openj9/job/Test_openjdk11_j9_sanity.openjdk_x86-64_linux_xl_Nightly/58
01:07:06 at app//FinalizeOverride.test(FinalizeOverride.java:75)
01:07:06 at app//FinalizeOverride.main(FinalizeOverride.java:52)

https://ci.eclipse.org/openj9/job/Test_openjdk14_j9_sanity.openjdk_x86-64_linux_Nightly/12
01:54:10 at app//FinalizeOverride.test(FinalizeOverride.java:75)
01:54:10 at app//FinalizeOverride.main(FinalizeOverride.java:51)

Ran a grinder to get a successful result. https://ci.eclipse.org/openj9/job/Grinder/907

Base.finalize() sum += 10
Subclass.finalize() sum += 0
SubSubclass.finalize() sum +=30
PublicFinalize.finalize() sum += 40+40*100
Base.finalize() sum += 50
Base.finalize() sum += 60

@dmitripivkine can you pls take a look.

Unfortunately, although the test contains printlns which might help diagnose the problem, the failure result doesn't show any System.out or System.err output even though System.out would contain some output.

@smlambert @llxia to proceed to diagnose this, we need to capture a core file when the test times out. Do you know how we could do that, or do we have the source to the test harness used to run the openjdk tests that we could modify to send a kill -QUIT signal to the test process? I see the test harness runs jstack on the test atm when a timeout occurs.

@sophia-guo do you happen to know anything that could help us diagnose in this case?

References for openjdk regression test harness (jtreg):
https://openjdk.java.net/jtreg/index.html
https://openjdk.java.net/jtreg/tag-spec.html

You could potentially set a vm.opts or java.opts that would create a core at end of test run (which is at the point the test times out)

System property | Environment variable | Notes
-- | -- | --
test.vm.opts | TESTVMOPTS | Additional VM options for all JVMs used in the test.
test.java.opts | TESTJAVAOPTS | Additional VM options for all invocations of the main Java launcher used in the test. (i.e.excluding JDK tools like "javac".)

There is also sometimes additional info in the .jtr files in the test_output artifact, have you looked at those?

There is also sometimes additional info in the .jtr files in the test_output artifact, have you looked at those?

Yes, we've analyzed the available information and determined the next step is to get a core file to proceed further.

You could potentially set a vm.opts or java.opts that would create a core at end of test run (which is at the point the test times out)

The test fails intermittently. We need a solution that can be enabled for all testing until we get the core file we need. The test harness already has some handling for timeouts, i.e. running jstack, but it's not enough to figure out the cause of the problem. Probably in general it would be good to capture core files on timeouts to help diagnose the problems. I'm surprised the openjdk tests aren't already trying to do something like that. Although perhaps they are but it doesn't apply to OpenJ9.

What is the corefile name format? We have set to obtain files with following name format if tests fail or error.
JTREG_BASIC_OPTIONS += -retain:fail,error,.dmp,javacore.,heapdump.,.trc

.dmp should cover it, the cores are named core.*.dmp. There is no core file generated atm, we're not failing to capture it.

The source for jtreg lives here

I'll take a look and see if I can make the required mods. I don't expect openjdk is going to take mods to capture OpenJ9 core files. thinking we'll need to mirror this to git and modify it.

We allow for passing in custom jtreg binaries (needed to do this for zOS), so if you build it, there is a way to use a specialized one, though I would caution against this as a 'long-term' plan, but it may help to resolve this particular instance.

Although I don't see it in the doc, looking at the code it seems there is support for a custom timeout handler, using -th <classname> or -timeoutHandler <classname>. I've created the handler we need, need to see if I can get it working in practice.

It's working.
-timeoutHandler:openj9.jtreg.CoreDumpTimeoutHandler -timeoutHandlerDir:/home/peter/oj9coredumpth.jar

Timeout information:
Running jcmd on process 18627
Dump written to /home/peter/test_output/scratch/0/core.20200610.195421.18627.0001.dmp
--- Timeout information end.

Although it shows scratch/0, at the end the core is actually in java/lang/ref/FinalizeOverride where you'd expect.

If I can get a few pointers for where to put the code and whether we need a build.xml or a jenkins job, we can modify the OpenJ9 sanity.openjdk to use it.

Look at internal server, Test_openjdk11_j9_sanity.openjdk_s390x_zos_Personal/32/consoleFull
as an example

If you set an environment variable JTREG_URL to a URL where your custom jtreg build is, it will get used instead of the default location where we pick it up from (at AdoptOpenJDK).
JTREG_URL=someLinkToYourJtregBuild

JTREG_URL=https://artifactoryServer/artifactory/sys-rt-generic-local/UploadFile/buildId/yourCustomJtreg.tar.gz

You can use the UploadFile job to push your custom jtreg build to artifactory for use.

I don't have a custom jtreg build. The timeout handler works with the existing jtreg.jar, and is a separate jar file. I'd like to deliver the code somewhere (Oracle copyright and GPL license, with my modifications) and build the new jar, and have it available when running openjdk tests, so we can add command line arguments to use it (permanently).

I get it now... so I guess you need to figure out where you are going to land your code first. I will stop replying in git issues now, as I am clearly too tired to read them clearly. :)

I guess you need to figure out where you are going to land your code first.

I was hoping you would have a spot for it, but if not I could put it into https://github.com/ibmruntimes/openj9-openjdk-jdk, or at least start a discussion about doing that.

oj9coredumpth.zip

@smlambert if the previous commit looks ok to you, I'll go ahead and create the PR and ask for a review. Would we then create a jenkins build to create the jar file?

Yes, that looks good, we can create a Jenkins job at ci.eclipse.org/openj9 (https://ci.eclipse.org/openj9/view/Infrastructure/job/Build_JDK_Timeout_Handler/) then can fetch the jar in a target in the openjdk-tests/openjdk/build.xml file.

https://ci.eclipse.org/openj9/job/Test_openjdk11_j9_sanity.openjdk_ppc64le_linux_xl_Nightly/60
23:55:45 at app//FinalizeOverride.test(FinalizeOverride.java:75)
23:55:45 at app//FinalizeOverride.main(FinalizeOverride.java:53)

https://ci.eclipse.org/openj9/job/Test_openjdk11_j9_sanity.openjdk_x86-64_windows_Release/3
11:34:45 at app//FinalizeOverride.test(FinalizeOverride.java:75)
11:34:45 at app//FinalizeOverride.main(FinalizeOverride.java:52)

https://ci.eclipse.org/openj9/job/Test_openjdk14_j9_sanity.openjdk_x86-64_windows_Personal/11
16:32:49 at app//FinalizeOverride.test(FinalizeOverride.java:75)
16:32:49 at app//FinalizeOverride.main(FinalizeOverride.java:51)

https://ci.eclipse.org/openj9/job/Test_openjdk14_j9_sanity.openjdk_ppc64le_linux_Nightly/16
01:13:01 at app//FinalizeOverride.test(FinalizeOverride.java:75)
01:13:01 at app//FinalizeOverride.main(FinalizeOverride.java:52)

The final piece to install the timeout handler when running openjdk tests https://github.com/AdoptOpenJDK/openjdk-tests/pull/1834

https://ci.eclipse.org/openj9/job/Test_openjdk14_j9_sanity.openjdk_ppc64le_linux_Nightly/18
00:47:18 at app//FinalizeOverride.test(FinalizeOverride.java:75)
00:47:18 at app//FinalizeOverride.main(FinalizeOverride.java:52)

Here is a finalization hang, but a different test. FYI there were a number of other failures in the same build. https://github.com/eclipse/openj9/issues/8972#issuecomment-646685660

https://ci.eclipse.org/openj9/job/Test_openjdk11_j9_sanity.openjdk_s390x_linux_Nightly/90
java/lang/System/finalization/FinThreads.java
00:02:08 at app//FinThreads.main(FinThreads.java:120)

@dmitripivkine there is a core in https://140-211-168-230-openstack.osuosl.org/artifactory/ci-eclipse-openj9/Test/Test_openjdk11_j9_sanity.openjdk_s390x_linux_Nightly/90/openjdk_test_output.tar.gz

app//FinalizeOverride.main(FinalizeOverride.java:52) is

        test(new PrivateFinalize(50), 50);

so looking to Java stack frame for FinalizeOverride.test:

<7ffa086d4a00> JIT frame: bp = 0x00007FF96000B078, pc = 0x00007FF9CC5DA083, unwindSP = 0x00007FF96000B020, cp = 0x00007FFA086D05D0, arg0EA = 0x00007FF96000B088, jitInfo = 0x00007FF9C5B64B38
<7ffa086d4a00>  Method: FinalizeOverride.test(Ljava/lang/Object;I)V !j9method 0x00007FFA086D0BF8
<7ffa086d4a00>  Bytecode index = 41, inlineDepth = 0, PC offset = 0x0000000000000143
<7ffa086d4a00>  stackMap=0x00007FF9C5B64D19, slots=I16(0x0002) parmBaseOffset=I16(0x0010), parmSlots=U16(0x0001), localBaseOffset=I16(0xFFF8)
<7ffa086d4a00>  Described JIT args starting at 0x00007FF96000B088 for U16(0x0001) slots
<7ffa086d4a00>      O-Slot: : a0[0x00007FF96000B088] = 0x00007FFA04D1AAE8
<7ffa086d4a00>  Described JIT temps starting at 0x00007FF96000B070 for IDATA(0x0000000000000001) slots
<7ffa086d4a00>      I-Slot: : t0[0x00007FF96000B070] = 0x00007FF96000B0A8
<7ffa086d4a00>  JIT-RegisterMap = UDATA(0x0000000000000000)
<7ffa086d4a00>  JIT-Frame-RegisterMap[0x00007FF96000B048] = UDATA(0x00007FFA04D958F8) (jit_rbx)
<7ffa086d4a00>  JIT-Frame-RegisterMap[0x00007FF96000B050] = UDATA(0x0000000000000000) (jit_r9)

Object we are waiting to be Finalized is

> !j9object 0x00007FFA04D1AAE8
!J9Object 0x00007FFA04D1AAE8 {
    struct J9Class* clazz = !j9class 0x7FF960012B00 // FinalizeOverride$PrivateFinalize
    Object flags = 0x00000000;
    J lockword = 0x0000000000000000 (offset = 0) (java/lang/Object) <hidden>
    I value = 0x00000032 (offset = 16) (FinalizeOverride$Base)
    J finalizeLink = 0x00007FFA04D1A6E0 (offset = 8) (FinalizeOverride$Base) <hidden>
}

but it can not be because O-Slot: : a0[0x00007FF96000B088] = 0x00007FFA04D1AAE8 keeps it alive. GC tracks this object in Unfinalized list and ready to send it to Finalization as soon as it is detected dead.

@andrewcraik fyi this transfer to jit.

@dmitripivkine so is the problem that the JIT is just keeping the object alive too long for the test? If so at low optimization levels we do not minimize the lifetime of object slots - there is a JIT option (-Xjit:enableAggressiveLiveness) to force it but we don't run but it is not the default.

If the enableAggressiveLiveness fixes it, is the test even valid?

@dmitripivkine so is the problem that the JIT is just keeping the object alive too long for the test? If so at low optimization levels we do not minimize the lifetime of object slots - there is a JIT option (-Xjit:enableAggressiveLiveness) to force it but we don't run but it is not the default.

If the enableAggressiveLiveness fixes it, is the test even valid?

Test tries to kill object by setting it to NULL before waiting for finalization in the loop https://github.com/ibmruntimes/openj9-openjdk-jdk11/blob/60b1852c4d140e2913bdc544d2e9f9ef5cef90f4/test/jdk/java/lang/ref/FinalizeOverride.java#L62
However finalization never occur because object kept alive by O-slot.

This test fails very intermittently in Adopt runs (framework uses javac) but it is hard to reproduce in grinder (depends on hotness of the method?). I am not sure how to verify effectiveness of suggested option in this conditions

Looking at the code the JIT could easily eliminate the dead store to the param - no one observes the value so there is no requirement to do that write. Further, there is no requirement for us to mark the parameter dead - we can do so, but are not required so I expect -Xjit:enableAggressiveLiveness - the test seems invalid to me.

It seems like an issue that could occur with user code, where the code intentionally nulls variables so they can be gc'ed, but the JIT keeps them alive.

The test is run as part of the jdk_lang group. We can modify the entire group to run with the -Xjit:enableAggressiveLiveness option if that seems appropriate and won't hide problems.

so there is no guarantee that doing local = null is going to cause the object to be released. At lower opt levels the JIT doesn't run the optimization to minimize object lifetimes for compile-time reasons - it takes too long...

So are we going to run with the option?

It doesn't seem right to run all the jdk_lang testing with an option which isn't enabled by default. I'll try to figure out how we can exclude this particular test from the jdk_lang bucket and run it separately with the option.

so there is no guarantee that doing local = null is going to cause the object to be released. At lower opt levels the JIT doesn't run the optimization to minimize object lifetimes for compile-time reasons - it takes too long...

Is there opportunity to recognize that local = null ends the live region for that reference without running the full optimization to limit lifetimes? Nulling out references to large reference graphs is a relatively common approach to avoid keeping linked list / graph style collections alive.

Is there opportunity to recognize that local = null ends the live region for that reference without running the full optimization to limit lifetimes? Nulling out references to large reference graphs is a relatively common approach to avoid keeping linked list / graph style collections alive.

No because the store would likely be eliminated as dead - no one ever reads it so it won't remain as that kind of marker and trying to guess which null stores are important is not really viable.

Test excluded from jdk_lang_j9 and run in its own target jdk_lang_ref_FinalizeOverride_j9 with extra JIT option via:
https://github.com/AdoptOpenJDK/openjdk-tests/pull/1845

@andrewcraik the failure still occurs with -Xjit:enableAggressiveLiveness. There is a core file in the failure artifact https://140-211-168-230-openstack.osuosl.org/artifactory/ci-eclipse-openj9/Test/Test_openjdk14_j9_sanity.openjdk_x86-64_linux_Nightly/26/openjdk_test_output.tar.gz

Confirmed the option is present in the core via info proc.
-ea -esa -Xmx512m -Xjit:enableAggressiveLiveness -XX:-JITServerTechPreviewMessage

https://ci.eclipse.org/openj9/job/Test_openjdk14_j9_sanity.openjdk_x86-64_linux_Nightly/26/
01:40:56 at app//FinalizeOverride.test(FinalizeOverride.java:75)
01:40:56 at app//FinalizeOverride.main(FinalizeOverride.java:52)

https://ci.eclipse.org/openj9/job/Test_openjdk11_j9_sanity.openjdk_s390x_linux_xl_Nightly/73/
01:00:24 at app//FinalizeOverride.test(FinalizeOverride.java:75)
01:00:24 at app//FinalizeOverride.main(FinalizeOverride.java:53)

Given the liveness doesn't seem like the problem given the above experiment this is going to take a bit of time to dig into and fix. I am going to target this to 0.22 since we are out of time on 0.21.

@andrewcraik Since this fails quite often, archiving a core file each time which consumes space we are currently short of, I'm going to modify the playlist to remove the timeout handler to avoid the core file. Pls archive the existing core if you need it, and we can add the timeout handler back in later if necessary.

https://github.com/AdoptOpenJDK/openjdk-tests/pull/1851

I'll stop reporting the failures that occur, but pls assume they still occur.

@andrewcraik btw, you should look at this as a regression, as it wasn't failing prior to May 22.

FYI https://github.com/AdoptOpenJDK/openjdk-tests/pull/1863, the test runs in special.openjdk now instead of sanity.

The reason for failure is extended life of the object (kept alive by o-slot from JIT frame). According @andrewcraik and @0xdaryl life of the object might be extended with high levels of hotness of JIT method and this is not a new behaviour. This failure is intermittent (1 of 25) and observed in Adopt builds only. The same test never failed in internal builds. I noticed the difference how test frameworks run this test - internal builds using java directly but Adopt uses javac compiling test and run as an agent. I believe the way how Adopt runs the test might have higher levels of hotness. Based on fact this is not a regression changing milestones for this item to 0.23

Object life-time in JIT optimized code can vary from the original program. Any test which aims to try and cause an object lifetime to end at a specific point is unlikely to succeed in general given the very liberal approach to object lifetime and the points where GC occurs in Java. I suspect this is likely to be a won't fix because there isn't anything which says the reference has to be freed at the point described in the test.

Object life-time in JIT optimized code can vary from the original program. Any test which aims to try and cause an object lifetime to end at a specific point is unlikely to succeed in general given the very liberal approach to object lifetime and the points where GC occurs in Java. I suspect this is likely to be a won't fix because there isn't anything which says the reference has to be freed at the point described in the test.

My only worry this is very popular test pattern. Also seems other JVMs passed this tests so set behaviour standard de facto.

Here's what I've found so far. I enabled the verbose option to check what optimization level was being reached for FinalizeOverride.test. What I noticed was that in almost every case where the test passed, that method was not actually compiled. In the cases where it failed, it was compiled uniformly at optLevel warm and then at profiled very-hot. I'm guessing that the profiled very-hot compilation likely happened while the warm JITted method was stuck running in a loop, waiting for the object to be collected.

I noticed one passing case in which FinalizeOverride.test was compiled only at warm. I'm not certain whether that JITted method was run, or if the test completed before entering the JITted method.

With that information, I've been able to reproduce this failure reliably running locally on Linux x86 with OpenJ9 0.17.0, 0.18.1, 0.20.0 and 0.21.1, with the options -Xjit:enableAggressiveLiveness,{*FinalizeOverride.test*}(count=0). I will retry without count=0, and then without enableAggressiveLiveness.

Earlier I was unable to reproduce the failure with nightly builds, but I reran with -Xjit:enableAggressiveLiveness,{*FinalizeOverride.test*}(count=5),disableAsyncCompilation and -Xjit:{*FinalizeOverride.test*}(count=5),disableAsyncCompilation. With enableAggressiveLiveness, there were 4/20 failures; without that option, there were 9/20 failures.

I'll check for any differences in the logs in the successful and failing runs, with and without enableAggressiveLiveness.

I'll check for any differences in the logs in the successful and failing runs, with and without enableAggressiveLiveness.

In every passing case, the verbose log indicated FinalizeOverride.test was compiled with OSR and in every failing case it had GCR.

Seems to only fail with GCR, failures are intermittent and the failure has existed since at least 0.17 based on the analysis from @hzongaro . We are still not certain there is an actual JIT problem here - will continue investigation, but will not have a fix for 0.23 so moving out to 0.24

https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_sanity.openjdk_s390x_linux_xl_Nightly/227
java/lang/ref/FinalizeOverride.java
Timeout

22:27:34  "AgentVMThread" Id=21 RUNNABLE
22:27:34    at java.lang.ref.Finalizer.runFinalizationImpl(Native Method)
22:27:34    at java.lang.ref.Finalizer.runFinalization(Finalizer.java:29)
22:27:34    at java.lang.Runtime.runFinalization0(Native Method)
22:27:34    at java.lang.Runtime.runFinalization(Runtime.java:713)
22:27:34    at java.lang.System.runFinalization(System.java:573)
22:27:34    at FinalizeOverride.test(FinalizeOverride.java:65)
22:27:34    at FinalizeOverride.main(FinalizeOverride.java:53)
22:27:34    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
22:27:34    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
22:27:34    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
22:27:34    at java.lang.reflect.Method.invoke(Method.java:498)
22:27:34    at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:312)
22:27:34    at java.lang.Thread.run(Thread.java:823)

I guess we never excluded the test on jdk8 (https://github.com/AdoptOpenJDK/openjdk-tests/pull/1845), but it hasn't been failing until now.

Just in case, changes since last nightly build
https://github.com/eclipse/openj9/compare/d615676...2f4ea25

Moving this forward again as there is little likelihood of making progress investigating in the 0.24 timeframe.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

pshipton picture pshipton  路  3Comments

DanHeidinga picture DanHeidinga  路  4Comments

BillOgden picture BillOgden  路  6Comments

JamesKingdon picture JamesKingdon  路  5Comments

JasonFengJ9 picture JasonFengJ9  路  5Comments