Openj9: Three jdk_net tests produce similar Segmentation Errors on OpenJ9 JDK11 on Windows

Created on 18 Mar 2019 · 21 Comments · Source: eclipse/openj9

Failure link

https://ci.adoptopenjdk.net/job/openjdk11_j9_openjdktest_x86-64_windows/155/consoleFull
https://ci.adoptopenjdk.net/job/openjdk11_j9_openjdktest_x86-64_windows/156/consoleFull
https://ci.adoptopenjdk.net/job/openjdk11_j9_openjdktest_x86-64_windows/159/consoleFull

Optional info

  • intermittent failure (yes|no): yes
  • regression or new test: regression

Failure output

  • Test Names

java/net/httpclient/StreamingBody.java
java/net/httpclient/MappingResponseSubscriber.java
java/net/httpclient/websocket/WSHandshakeExceptionTest.java

  • Example Output:

23:38:22 Type=Segmentation error vmState=0x00000000
23:38:22 Windows_ExceptionCode=c0000005 J9Generic_Signal=00000004 ExceptionAddress=00007FFE999F3EFD ContextFlags=0010005f
23:38:22 Handler1=00007FFE99A6BE90 Handler2=00007FFE999AED60 InaccessibleReadAddress=0000000099669976
23:38:22 RDI=0000002FFCA6F730 RSI=000000000039E4C8 RAX=0000000099669966 RBX=00000000003A1D00
23:38:22 RCX=0000002FFCA6F748 RDX=0000000000412778 R8=0000002FFBD6C298 R9=0000000000040022
23:38:22 R10=0000002FFBD6CCCA R11=0000002FFCA6F080 R12=0000002FFCA6F738 R13=0000002FFCA6F728
23:38:22 R14=0000000000412780 R15=0000002FFCA6F720
23:38:22 RIP=00007FFE999F3EFD RSP=0000002FFCA6F410 RBP=00000000004127C8 GS=002B
23:38:22 FS=0053 ES=002B DS=002B
23:38:22 XMM0 000000000039e4c8 (f: 3794120.000000, d: 1.874544e-317)
23:38:22 XMM1 000000000039e4e8 (f: 3794152.000000, d: 1.874560e-317)
23:38:22 XMM2 0000000000000000 (f: 0.000000, d: 0.000000e+00)
23:38:22 XMM3 0000000000000000 (f: 0.000000, d: 0.000000e+00)
23:38:22 XMM4 0000000000000000 (f: 0.000000, d: 0.000000e+00)
23:38:22 XMM5 0000000000000000 (f: 0.000000, d: 0.000000e+00)
23:38:22 XMM6 0000000000000000 (f: 0.000000, d: 0.000000e+00)
23:38:22 XMM7 0000000000000000 (f: 0.000000, d: 0.000000e+00)
23:38:22 XMM8 0000000000000000 (f: 0.000000, d: 0.000000e+00)
23:38:22 XMM9 0000000000000000 (f: 0.000000, d: 0.000000e+00)
23:38:22 XMM10 0000000000000000 (f: 0.000000, d: 0.000000e+00)
23:38:22 XMM11 0000000000000000 (f: 0.000000, d: 0.000000e+00)
23:38:22 XMM12 0000000000000000 (f: 0.000000, d: 0.000000e+00)
23:38:22 XMM13 0000000000000000 (f: 0.000000, d: 0.000000e+00)
23:38:22 XMM14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
23:38:22 XMM15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
23:38:22 Module=C:\Users\jenkins\workspace\openjdk11_j9_openjdktest_x86-64_windows\openjdkbinary\j2sdk-image\bin\compressedrefs\j9vm29.dll
23:38:22 Module_base_address=00007FFE999F0000 Offset_in_DLL=0000000000003efd
23:38:22 Target=2_90_20190317_167 (Windows Server 2012 R2 6.3 build 9600)
23:38:22 CPU=amd64 (4 logical CPUs) (0x1fff8d000 RAM)
23:38:22 ----------- Stack Backtrace -----------
23:38:22 (0x00007FFE999F3EFD [j9vm29+0x3efd])
23:38:22 (0x0000000000412CF8)
23:38:22 (0x0000000000412A00)
23:38:22 (0x00000000FFFAA500)
23:38:22 (0x0000000000412730)
23:38:22 ---------------------------------------
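
A quick arithmetic check on the dump above, sketched in shell. The helper name `hex_diff` is illustrative and not part of any OpenJ9 tooling. It confirms that Offset_in_DLL is simply ExceptionAddress minus Module_base_address, and that the inaccessible read address sits 0x10 bytes past the value in RAX, which is consistent with a read through a bad pointer held in RAX.

```shell
# Hypothetical helper for sanity-checking values copied from the crash dump.
# Arguments are hex values without a 0x prefix, exactly as the dump prints them.
hex_diff() {
    # Subtract the second value from the first and print the result in hex.
    printf '0x%x\n' "$(( 0x$1 - 0x$2 ))"
}

# ExceptionAddress - Module_base_address
hex_diff 00007FFE999F3EFD 00007FFE999F0000   # prints 0x3efd
# InaccessibleReadAddress - RAX
hex_diff 0000000099669976 0000000099669966   # prints 0x10
```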

test failure

Most helpful comment

For future reference, when running the test framework locally, here are the steps I used (edit as needed):

Step 1: Log into the Windows box (in this case test-azure-win2012r2-x64-1 on the adopt ci system).

Step 2: Download a copy of OpenJDK 11 on OpenJ9.

Step 3: Run these commands:

export BUILD_LIST=openjdk_regression
export AUTO_DETECT=null
export JDK_IMPL=openj9
export TERM=cygwin
export TARGET=sanity.openjdk
export JAVA_HOME=C:\\Users\\adoptopenjdk\\adamfarl\\11j9
export JAVA_BIN=C:\\Users\\adoptopenjdk\\adamfarl\\11j9\\bin
git clone http://github.com/adoptopenjdk/openjdk-tests
./openjdk-tests/get.sh -t /home/adoptopenjdk/openjdk-tests -p x64_windows -j 11 -i openj9
./openjdk-tests/maketest.sh ./openjdk-tests
cd ./openjdk-tests/TestConfig
make -f makeGen.mk AUTO_DETECT=null
cd ..
./maketest.sh ./openjdk_regression _jdk_net

Step 4: Re-run the final command until a test fails with a segmentation error.
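
The re-run loop in Step 4 could be automated along these lines. This is only a sketch: `run_until_match` and the `run_N.log` file names are made up for illustration and are not part of the openjdk-tests framework.

```shell
# Hypothetical helper: re-run a command until its output contains a pattern,
# or until MAX_RUNS attempts have been made.
# Usage: run_until_match PATTERN MAX_RUNS COMMAND [ARGS...]
run_until_match() {
    pattern=$1
    max_runs=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_runs" ]; do
        log="run_${attempt}.log"
        # Keep the full output of every attempt. A non-zero exit status is
        # expected when tests fail, so it is deliberately ignored here.
        "$@" > "$log" 2>&1 || true
        if grep -q -- "$pattern" "$log"; then
            echo "matched on attempt $attempt (see $log)"
            return 0
        fi
        attempt=$((attempt + 1))
    done
    echo "no match after $max_runs attempts"
    return 1
}

# Example, using the final command from Step 3:
# run_until_match "Type=Segmentation error" 20 ./maketest.sh ./openjdk_regression _jdk_net
```

Capping the number of attempts keeps an unattended overnight run from looping forever if the crash never shows up.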

Notes:

  • Do not use Linux-style directory names for the Java location.
  • Do use double backslashes for JAVA_* path separators.
  • When running tests locally, the framework does not delete core files. Instead, it moves them here:
    C:\cygwin64\home\(username)\jvmtest\openjdk_regression\work\(failing test)
  • The tests that most often see this bug were all excluded, so you may want to remove the lines in the openjdk-tests ProblemList_openjdk11-openj9.txt file which contain this issue's URL.
  • If you have already built the test material, then the ProblemList you need to edit is located here:
    C:\cygwin64\home\(username)\jvmtest\openjdk_regression\ProblemList_openjdk11-openj9.txt
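
Removing the exclusions by hand is error-prone, so here is one possible way to script it. The function name `unexclude_tests` and the `ISSUE_URL` variable are placeholders for illustration; set `ISSUE_URL` to this issue's URL before using it.

```shell
# Hypothetical helper: drop every line of a ProblemList file that mentions a
# given string (for example this issue's URL), keeping a .bak copy of the original.
# Usage: unexclude_tests PROBLEM_LIST MARKER
unexclude_tests() {
    list=$1
    marker=$2
    cp "$list" "$list.bak" || return 1
    # -F treats the marker as a fixed string, -v keeps the non-matching lines.
    # grep exits with status 1 when no line is left to print, which is not an
    # error here; any other non-zero status is reported to the caller.
    grep -v -F -- "$marker" "$list.bak" > "$list" || [ $? -eq 1 ]
}

# Example (ISSUE_URL must be set by you):
# unexclude_tests ProblemList_openjdk11-openj9.txt "$ISSUE_URL"
```

The backup copy makes it easy to restore the exclusions once the investigation is done.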

All 21 comments

@adamfarley Can you capture the cores from these failures? It will make it much easier to determine the cause without needing to wait for a reproduction

Will take a look.

No, it looks like they get deleted as a matter of course, likely to save space. A single run of the jdk_net sanity group on windows should be enough to reproduce, as this segmentation error crops up in at least one unit test during a given run 3 times out of 4.

Attempts to reproduce the bug locally have failed. These were on Windows 10, without the test framework, running the tests repeatedly against a nightly build.

Four additional failures seen. Some differences, unsure which ones matter.

https://ci.adoptopenjdk.net/job/openjdk11_j9_openjdktest_x86-64_windows/173/consoleFull
https://ci.adoptopenjdk.net/job/openjdk12_j9_openjdktest_x86-64_windows/26/consoleFull

Example:

12:43:45  Type=Segmentation error vmState=0x00000000
12:43:45  Windows_ExceptionCode=c0000005 J9Generic_Signal=00000004 ExceptionAddress=00007FFE99FC3EFD ContextFlags=0010005f
12:43:45  Handler1=00007FFE9A03BF10 Handler2=00007FFE9B84EE90 InaccessibleReadAddress=0000000099669976
12:43:45  RDI=00000076514AF4B0 RSI=0000000000334CC8 RAX=0000000099669966 RBX=0000000000339100
12:43:45  RCX=00000076514AF4C8 RDX=000000000035ED18 R8=00000076509F73D0 R9=0000000000040002
12:43:45  R10=00000076509F7DFB R11=00000076514AEE00 R12=00000076514AF4B8 R13=00000076514AF4A8
12:43:45  R14=000000000035ED20 R15=00000076514AF4A0
12:43:45  RIP=00007FFE99FC3EFD RSP=00000076514AF190 RBP=000000000035ED68 GS=002B
12:43:45  FS=0053 ES=002B DS=002B
12:43:45  XMM0 0000000000334cc8 (f: 3361992.000000, d: 1.661045e-317)
12:43:45  XMM1 0000000000334ce8 (f: 3362024.000000, d: 1.661061e-317)
12:43:45  XMM2 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:43:45  XMM3 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:43:45  XMM4 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:43:45  XMM5 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:43:45  XMM6 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:43:45  XMM7 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:43:45  XMM8 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:43:45  XMM9 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:43:45  XMM10 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:43:45  XMM11 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:43:45  XMM12 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:43:45  XMM13 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:43:45  XMM14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:43:45  XMM15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
12:43:45  Module=C:\Users\jenkins\workspace\openjdk12_j9_openjdktest_x86-64_windows\openjdkbinary\j2sdk-image\bin\compressedrefs\j9vm29.dll
12:43:45  Module_base_address=00007FFE99FC0000 Offset_in_DLL=0000000000003efd
12:43:45  Target=2_90_20190406_49 (Windows Server 2012 R2 6.3 build 9600)
12:43:45  CPU=amd64 (4 logical CPUs) (0x1fff8d000 RAM)
12:43:45  ----------- Stack Backtrace -----------
12:43:45  (0x00007FFE99FC3EFD [j9vm29+0x3efd])
12:43:45  (0x000000000035F2F8)
12:43:45  (0x000000000035F000)
12:43:45  (0x00000000FFDF7AD8)
12:43:45  (0x000000000035ECD0)
12:43:45  ---------------------------------------
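
To help sort out which differences matter, one option is to reduce each console log to the fields that do not depend on where the DLL or the stack happened to be loaded. In both dumps shown in this thread, Windows_ExceptionCode, InaccessibleReadAddress and Offset_in_DLL are identical, while the module base, registers and stack addresses differ. The sketch below uses an illustrative function name and hypothetical log file names, and relies on `grep -o`, which GNU and BSD grep both provide.

```shell
# Hypothetical helper: read a console log on stdin and print only the fields
# that are independent of timestamps and load addresses.
crash_signature() {
    grep -o -E '(Windows_ExceptionCode|InaccessibleReadAddress|Offset_in_DLL)=[0-9A-Fa-f]+'
}

# Example with two saved console logs (file names are placeholders):
# crash_signature < build_155.log > a.sig
# crash_signature < build_173.log > b.sig
# cmp a.sig b.sig && echo "same crash signature"
```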

Still unable to reproduce locally. Will request access to one of the test machines seeing the failure.

Access requested. Still waiting.

Tried running several of the failing tests on another Windows Server 2012 box.

No luck.

@adamfarley I understand you've been granted machine access. Have you been able to get a core yet? We're closing in on the release date and I'm concerned about running out of time to resolve this.

Sadly no. We've only recently managed to run the tests outside of the jenkins framework, and the tests refuse to fail when the jdk_net test group is run on its own. Ditto for the individual failing unit tests run using the jtreg jar.

Will try a full sanity run overnight.

On the upside, if it's this hard to reproduce the failures, the odds of a customer seeing them go way down.

Finally, we have core and other diagnostic files. Zipping and copying into box now.

Files are on box, and Dan has access.

They are in /j9_issue_5140_files

@DanHeidinga are there any results from looking at the cores?

@fengxue-IS has been looking at them. Jack, can you provide an update?

I reproduced the error in a Jenkins grinder, but wasn't able to reproduce it locally on a Windows 7 machine (50+ runs) to get a core with full debug info. I'm currently trying to reproduce it on a Windows 10 farm machine.

@fengxue-IS - Is this lack of information a general problem with system cores produced by the jtreg tests? Is there a fixed set of options that could make system cores more useful?

Also, we can reproduce the defect with extra options if you could supply them.

Moving this to milestone 0.15.0 as we're running into the 0.14.0 release date and don't have a fix for this. Also, Jack's testing has indicated that the failure isn't very reproducible.

@adamfarley The lack of information is due to builds from AdoptOpenJDK not including the debug symbols (ie .pdb files)

I've tagged this issue for a potential future release that would occur before the 0.15 release. There is no firm date for this, and this is not a commitment that we will create an interim release.

Ah, so adding options at test-time won't help. Do let me know if you need a hand reproducing the issue on a debug build.

Analyzed a core generated with debug info. The failure occurred during the invokeVirtual bytecode.
With help from @JasonFengJ9, confirmed this is caused by the same error as #4778.
