Openjdk-infrastructure: Test machines named with '-XJ' don't have required module installed

Created on 19 Jun 2019 · 23 Comments · Source: AdoptOpenJDK/openjdk-infrastructure

Test running on https://ci.adoptopenjdk.net/computer/test-macincloud-macos1010-3-XJ/ failed with message:

06:31:54  Can't locate Text/CSV.pm in @INC (you may need to install the Text::CSV module) (@INC contains: ./makeGenTool /Library/Perl/5.18/darwin-thread-multi-2level /Library/Perl/5.18 /Network/Library/Perl/5.18/darwin-thread-multi-2level /Network/Library/Perl/5.18 /Library/Perl/Updates/5.18.2 /System/Library/Perl/5.18/darwin-thread-multi-2level /System/Library/Perl/5.18 /System/Library/Perl/Extras/5.18/darwin-thread-multi-2level /System/Library/Perl/Extras/5.18 .) at makeGenTool/parseFiles.pl line 27.
06:31:54  BEGIN failed--compilation aborted at makeGenTool/parseFiles.pl line 27.
06:31:54  Compilation failed in require at makeGenTool/mkgen.pl line 93.
06:31:54  Using projectRootDir: /Users/jenkins/workspace/openjdk8_j9_openjdktest_x86-64_macos/openjdk-tests/TestConfig/scripts/testKitGen/../../..
06:31:54  Getting modes data from modes.xml and ottawa.csv...
06:31:54  settings.mk:54: /Users/jenkins/workspace/openjdk8_j9_openjdktest_x86-64_macos/openjdk-tests/TestConfig/../TestConfig/utils.mk: No such file or directory
06:31:54  makefile:39: count.mk: No such file or directory
06:31:54  make: *** No rule to make target `count.mk'.  Stop.

https://ci.adoptopenjdk.net/view/Test_openjdk/job/openjdk8_j9_openjdktest_x86-64_macos/217/consoleFull

Text/CSV.pm is required for running Testkitgen

bug

All 23 comments

Same issue for tests on s390x test-marist-ubuntu1604-s390x-2-XJ.

The s390x box should be ok now - let me know if there are any further issues

Have now also installed Text::CSV, XML::Parser and JSON onto the mac system
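For anyone wanting to verify a node before re-enabling it, here is a minimal sketch that checks whether the Perl modules mentioned above load. It relies on the usual `perl -M<Module> -e 1` idiom; the function name `missing_perl_modules` and the injectable `run` parameter are my own assumptions for illustration, not part of TestKitGen or the infrastructure playbooks.

```python
import subprocess


def missing_perl_modules(modules, run=subprocess.run):
    """Return the modules that `perl -M<module> -e 1` fails to load.

    `run` defaults to subprocess.run and is injectable only so the
    function can be exercised without a real perl (an assumption made
    for this sketch).
    """
    missing = []
    for module in modules:
        # perl exits non-zero when the module cannot be found in @INC
        result = run(
            ["perl", f"-M{module}", "-e", "1"],
            capture_output=True,
            text=True,
        )
        if result.returncode != 0:
            missing.append(module)
    return missing
```

On a node with perl on the PATH, `missing_perl_modules(["Text::CSV", "XML::Parser", "JSON"])` should return an empty list once all three modules are installed.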

disabled node: test-macincloud-macos1010-3-XJ

brew install ant-contrib executed and linked /usr/local/Cellar/ant-contrib/1.0b3/share/ant/ant-contrib-1.0b3.jar to /usr/local/Cellar/ant/1.10.1/lib/ant-contrib.jar

https://ci.adoptopenjdk.net/job/openjdk8_hs_openjdktest_x86-64_macos/373/ has passed the problematic section so the above appears to have worked.

@sxa555 I see that https://ci.adoptopenjdk.net/computer/test-macincloud-macos1010-3-XJ/ is still offline and jobs are waiting for available machines. Could you re-enable it? Thanks!

@sxa555 both jdk and system test jobs on https://ci.adoptopenjdk.net/computer/test-marist-ubuntu1604-s390x-2-XJ/ are taking unexpectedly long and producing extra errors.

System tests :
running on https://ci.adoptopenjdk.net/computer/test-marist-ubuntu1604-s390x-1/ pass, 2 hours
running on https://ci.adoptopenjdk.net/computer/test-marist-ubuntu1604-s390x-2-XJ/ failed, 4 hours
https://ci.adoptopenjdk.net/view/Test_system/job/openjdk11_j9_systemtest_s390x_linux/

jdk Tests:
running on https://ci.adoptopenjdk.net/computer/test-marist-ubuntu1604-s390x-1/ failed, around 1.5 hours
running on https://ci.adoptopenjdk.net/computer/test-marist-ubuntu1604-s390x-2-XJ/ failed, 9 hours and timed out
https://ci.adoptopenjdk.net/view/Test_openjdk/job/openjdk11_j9_openjdktest_s390x_linux/

Similar issues occur for other versions and implementations.

Is there any configuration difference between those two machines?

If a problem still persists on a closed issue, please reopen it; otherwise any comments will likely not be actioned.

I've taken the macos box back online.

For the s390x box are all the errors network timeouts? (I'm basing that on run 260 of the job you mentioned)

Unfortunately I don't have this repo's reopen permission :-(

Yes, most of the failures are timeouts, which make the job take much longer than on the other machine and cause the build to time out. I wondered whether there is a configuration hidden issue?

My question was whether they were all network timeouts specifically - are they?

I'm not sure what "configuration hidden issue" is suggesting - if there's an issue we need to debug and identify it as I can't tell what's wrong at the moment :-)

We need to know what operations in particular are getting stuck to be able to debug this further

The following tests are failing on test-macincloud-macos1010-3-XJ but pass on test-macincloud-macos1010-1

java/util/prefs/AddNodeChangeListener.java.AddNodeChangeListener
java/util/prefs/CheckUserPrefsStorage.sh.CheckUserPrefsStorage
java/util/prefs/RemoveReadOnlyNode.java.RemoveReadOnlyNode
java/util/prefs/RemoveUnregedListener.java.RemoveUnregedListener

The prefs tests are the same ones as were failing here: https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8079418. The underlying issue there was user permissions - but also that is now 'resolved'.

NOTE: mac machine has been renamed from test-macincloud-macos1010-3-XJ to test-macstadium-macos1010-1-XJ as the hosting provider was incorrect

@sxa555 can I close this as the machine has been deleted (https://github.com/AdoptOpenJDK/openjdk-infrastructure/issues/849)

@gdams No this should be kept open as this covers issues with more than just the macos machine (Thanks for not leaving this closed @karianna)

@sophia-guo @lumpfish As per earlier question are the failures on the s390x box all _network_ timeouts? We need to get this understood and resolved as it seems to be the cause of a lot of zLinux slowness at the moment. Can someone who understands the test suite determine what specific operations are hanging on the machine?

I'm going to abort #13 on https://ci.adoptopenjdk.net/job/Test_openjdk8_j9_sanity.openjdk_s390x_linux/13/ for now so I can quiesce test-marist-ubuntu1604-s390x-2-XJ and see if there are any processes left around.

Answer: lots. Ref: jenkins.maristXJ.log.gz

Here is a snippet of the ps listing showing the processes stuck since June 29th - 19 of them, of which 16 were from a base openjdktest run:

sxa@x220t:~$ gzip -cd jenkins.maristXJ.log.gz | grep Jun29 | cut -c-200 | grep openjdktest_s
jenkins  48465  0.0  0.2 2089944 23408 ?       SLl  Jun29   0:40 /home/jenkins/workspace/openjdk8_j9_openjdktest_s390x_linux/openjdkbinary/j2sdk-image/jre/bin/java -Djava.security.policy=/home/jenkins
jenkins  48514  0.0  0.2 2089944 23292 ?       SLl  Jun29   0:41 /home/jenkins/workspace/openjdk8_j9_openjdktest_s390x_linux/openjdkbinary/j2sdk-image/jre/bin/java -Djava.security.policy=/home/jenkins
jenkins  50493  0.0  0.2 2089944 22380 ?       SLl  Jun29   0:41 /home/jenkins/workspace/openjdk8_j9_openjdktest_s390x_linux/openjdkbinary/j2sdk-image/jre/bin/java -Djava.security.policy=/home/jenkins
jenkins  51539  0.0  0.2 2089944 23072 ?       SLl  Jun29   0:42 /home/jenkins/workspace/openjdk8_j9_openjdktest_s390x_linux/openjdkbinary/j2sdk-image/jre/bin/java -Djava.security.policy=/home/jenkins
jenkins  52199  0.0  0.2 2089944 23136 ?       SLl  Jun29   0:40 /home/jenkins/workspace/openjdk8_j9_openjdktest_s390x_linux/openjdkbinary/j2sdk-image/jre/bin/java -Djava.security.policy=/home/jenkins
jenkins  53244  0.0  0.2 2089944 23664 ?       SLl  Jun29   0:42 /home/jenkins/workspace/openjdk8_j9_openjdktest_s390x_linux/openjdkbinary/j2sdk-image/jre/bin/java -Djava.security.policy=/home/jenkins
jenkins  54032  0.0  0.2 2089944 23556 ?       SLl  Jun29   0:41 /home/jenkins/workspace/openjdk8_j9_openjdktest_s390x_linux/openjdkbinary/j2sdk-image/jre/bin/java -Djava.security.policy=/home/jenkins
jenkins  55486  0.0  0.1 2090200 14856 ?       SLl  Jun29   0:41 /home/jenkins/workspace/openjdk8_j9_openjdktest_s390x_linux/openjdkbinary/j2sdk-image/jre/bin/java -Djava.security.policy=/home/jenkins
jenkins  56161  0.0  0.2 2089944 22812 ?       SLl  Jun29   0:42 /home/jenkins/workspace/openjdk8_j9_openjdktest_s390x_linux/openjdkbinary/j2sdk-image/jre/bin/java -Djava.security.policy=/home/jenkins
jenkins  57244  0.0  0.2 2089944 22720 ?       SLl  Jun29   0:37 /home/jenkins/workspace/openjdk8_j9_openjdktest_s390x_linux/openjdkbinary/j2sdk-image/jre/bin/java -Djava.security.policy=/home/jenkins
jenkins  57923  0.0  0.2 2089944 22852 ?       SLl  Jun29   0:41 /home/jenkins/workspace/openjdk8_j9_openjdktest_s390x_linux/openjdkbinary/j2sdk-image/jre/bin/java -Djava.security.policy=/home/jenkins
jenkins  59612  0.0  0.1 2089944 14888 ?       SLl  Jun29   0:38 /home/jenkins/workspace/openjdk8_j9_openjdktest_s390x_linux/openjdkbinary/j2sdk-image/jre/bin/java -Djava.security.policy=/home/jenkins
jenkins  61204  0.0  0.1 2089944 14600 ?       SLl  Jun29   0:40 /home/jenkins/workspace/openjdk8_j9_openjdktest_s390x_linux/openjdkbinary/j2sdk-image/jre/bin/java -Djava.security.policy=/home/jenkins
jenkins  62194  0.0  0.1 2089944 14836 ?       SLl  Jun29   0:39 /home/jenkins/workspace/openjdk8_j9_openjdktest_s390x_linux/openjdkbinary/j2sdk-image/jre/bin/java -Djava.security.policy=/home/jenkins
jenkins  63804  0.0  0.1 2090200 14640 ?       SLl  Jun29   0:41 /home/jenkins/workspace/openjdk8_j9_openjdktest_s390x_linux/openjdkbinary/j2sdk-image/jre/bin/java -Djava.security.policy=/home/jenkins
jenkins  64710  0.0  0.1 2089944 14556 ?       SLl  Jun29   0:42 /home/jenkins/workspace/openjdk8_j9_openjdktest_s390x_linux/openjdkbinary/j2sdk-image/jre/bin/java -Djava.security.policy=/home/jenkins

(For the record, these process listings are also created regularly and are visible at https://ci.adoptopenjdk.net/job/SXA-processCheck/label=test-marist-ubuntu1604-s390x-2-XJ/)
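The same filtering can be done programmatically on a saved listing. Below is a rough sketch, assuming the log holds `ps aux`-style lines (USER, PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, START, TIME, COMMAND); the function name `stuck_processes` is hypothetical. Unlike the `grep Jun29` in the pipeline above, it matches only the START column, so a date string appearing elsewhere in the command line is not counted.

```python
def stuck_processes(ps_lines, start, needle):
    """Return (pid, command) pairs for processes whose START column
    equals `start` and whose command contains `needle`.

    Assumes ps aux-style columns; lines with fewer than 11 fields
    are skipped.
    """
    matches = []
    for line in ps_lines:
        # Split into at most 11 fields so the command keeps its spaces
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        pid, started, command = fields[1], fields[8], fields[10]
        if started == start and needle in command:
            matches.append((int(pid), command))
    return matches
```

For the log referenced above, something like `stuck_processes(gzip.open("jenkins.maristXJ.log.gz", "rt"), "Jun29", "openjdktest_s")` would be the rough equivalent of the shell pipeline, minus the `cut` truncation.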

I have cleared out the processes (close to 100 of them), re-enabled the executor and https://ci.adoptopenjdk.net/job/Test_openjdk13_j9_sanity.system_s390x_linux/7/ is the first job to get scheduled on it

FYI @smlambert

Jun29 was just me attempting to show a sample snapshot from a random day :-)
Thanks for those two links - I figured you might have some other issues on this somewhere so great to have them all linked now. Not all of the hung processes were from Hotspot runs but they could have been the trigger for others failing.

@sxa555 For JDK tests yes, almost all of the failing tests (around 110) are in the rmi, nio and net groups. The error message is either 'timeout' or 'Cannot assign requested address' (i.e. a failure to assign a network address). https://ci.adoptopenjdk.net/view/Test_openjdk/job/Test_openjdk8_hs_sanity.openjdk_s390x_linux/17/#showFailuresLink
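To answer the "are they all network timeouts" question with numbers rather than by eye, the failure messages could be bucketed along these lines. This is only a sketch: the category names and the functions `classify_failure` and `tally_failures` are made up for illustration, and the matching is plain substring search on the two messages quoted above.

```python
from collections import Counter


def classify_failure(message):
    """Bucket one failure message as 'address', 'timeout' or 'other'."""
    text = message.lower()
    if "cannot assign requested address" in text:
        return "address"
    if "timeout" in text or "timed out" in text:
        return "timeout"
    return "other"


def tally_failures(messages):
    """Count how many messages fall into each bucket."""
    return Counter(classify_failure(m) for m in messages)
```

A large "other" count would suggest that the slowness is not purely network related.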

Since the original Marist machines have now been decommissioned, both -XJ machines that this issue refers to are no longer in the test machine set, therefore closing.
