Part of the jdk8u242-b08_openj9-0.18.0 triage
Platform: xlinux
Machine: test-packet-ubuntu1604-x64-2
Tests:
java/net/Inet6Address/B6206527.java
trying LL addr: /fe80:0:0:0:a863:4eff:fe29:3b2e%veth3d7f09a
trying LL addr: /fe80:0:0:0:a863:4eff:fe29:3b2e
java.net.BindException: Cannot assign requested address (Bind failed)
at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
at java.net.ServerSocket.bind(ServerSocket.java:390)
at java.net.ServerSocket.bind(ServerSocket.java:344)
at B6206527.main(B6206527.java:53)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298)
at java.lang.Thread.run(Thread.java:821)
JavaTest Message: Test threw exception: java.net.BindException
JavaTest Message: shutting down test
java/net/ipv6tests/B6521014.java
java.lang.RuntimeException: Test failed: cannot create socket.
at B6521014.main(B6521014.java:123)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298)
at java.lang.Thread.run(Thread.java:821)
Caused by: java.net.BindException: Cannot assign requested address (Bind failed)
at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
at java.net.Socket.bind(Socket.java:662)
at B6521014.test2(B6521014.java:103)
at B6521014.main(B6521014.java:121)
... 6 more
JavaTest Message: Test threw exception: java.lang.RuntimeException
JavaTest Message: shutting down test
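For anyone wanting to reproduce this outside jtreg, here is a minimal sketch (not the test source itself). It assumes the failures come from binding to an IPv6 link-local address on one particular interface, which the `%veth...` scope in the output above hints at. The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.NetworkInterface;
import java.net.ServerSocket;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper: tries to bind a ServerSocket to each link-local
// IPv6 address on the machine and reports which ones fail.
public class LinkLocalBindProbe {

    /** Collects every IPv6 link-local address on interfaces that are up. */
    static List<Inet6Address> linkLocalAddresses() throws SocketException {
        List<Inet6Address> result = new ArrayList<>();
        Enumeration<NetworkInterface> nifs = NetworkInterface.getNetworkInterfaces();
        if (nifs == null) {
            return result; // no interfaces reported at all
        }
        for (NetworkInterface nif : Collections.list(nifs)) {
            if (!nif.isUp()) {
                continue;
            }
            for (InetAddress addr : Collections.list(nif.getInetAddresses())) {
                if (addr instanceof Inet6Address && addr.isLinkLocalAddress()) {
                    result.add((Inet6Address) addr);
                }
            }
        }
        return result;
    }

    /** Binds to an ephemeral port; returns null on success, else the error text. */
    static String tryBind(InetAddress addr) {
        try (ServerSocket ss = new ServerSocket()) {
            ss.bind(new InetSocketAddress(addr, 0));
            return null;
        } catch (IOException e) {
            return e.getClass().getName() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) throws SocketException {
        for (Inet6Address addr : linkLocalAddresses()) {
            String err = tryBind(addr);
            System.out.println(addr + " -> " + (err == null ? "bind OK" : "bind FAILED: " + err));
        }
    }
}
```

Running `main` on an affected machine should show whether only some interfaces reject the bind, which narrows down where to look.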
Re-build grinders
Failing on test-packet-ubuntu1604-x64-2: https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/1810/
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/1809/
Passing on other machines:
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/1813/
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/1812/
Passed at https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/1828/console after I got rid of a running docker container that was presumably blocking it.
root@test-packet-ubuntu1604-x64-2:~# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
19bee1fa3099 b1226cfc7094 "/bin/bash /kafka-te…" 2 months ago Up 2 months cocky_lewin
root@test-packet-ubuntu1604-x64-2:~# docker rm 19bee1fa3099
Error response from daemon: You cannot remove a running container 19bee1fa30990f1e53f1df997c27e83185455be827fd534cfa226ea1648a00b9. Stop the container before attempting removal or force remove
root@test-packet-ubuntu1604-x64-2:~# docker stop 19bee1fa3099
19bee1fa3099
@sxa555 could you have a poke around on test-osuosl-ubuntu1804-ppc64le-2 and see if it's also got an old running docker process on it? I'm seeing the same failures as above on this machine.
Also test-packet-ubuntu1604-x64-3 and test-softlayer-ubuntu1604-x64-1 are now showing the same failures. Any idea why this is a recurring problem?
test-marist-ubuntu1604-s390x-2 as well now
Re-iterating the full list of machines that I believe still have this problem:
test-softlayer-ubuntu1604-x64-1
test-osuosl-ubuntu1804-ppc64le-2
test-packet-ubuntu1604-x64-3
test-packet-ubuntu1604-x64-1
test-scaleway-ubuntu1604-x64-1
test-marist-ubuntu1604-s390x-2
I've excluded these tests on openj9 for jdk8 and 11. I couldn't find any instances of failures on hotspot or jdk14.
@smlambert Given that this affects a fairly wide variety of boxes, do you know if there's any extra config we could apply that would resolve these test issues? Have we seen this internally at IBM on any of your systems?
Yes, we have the same issue internally. Companies running IPv6 on Azure DevOps (where osx does not have IPv6) also have this issue.
related: https://github.com/AdoptOpenJDK/openjdk-tests/issues/1524
Now that https://github.com/AdoptOpenJDK/openjdk-infrastructure/pull/1298 has been merged, it might be worth seeing if this solution can be used to resolve the problem described above.
I've had a quick look at test-packet-ubuntu1604-x64-1 as it happens to be a machine I have access to.
All #1298 does is enable IPv6. I did the following on a U16 Vagrant VM, as it appears to be the Ubuntu equivalent:
sysctl -w net.ipv6.conf.all.disable_ipv6=0
sysctl -w net.ipv6.conf.default.disable_ipv6=0
sysctl -w net.ipv6.conf.lo.disable_ipv6=0
It enabled IPv6 on the VM, however the test machine I was looking at already has it enabled:
root@test-packet-ubuntu1604-x64-1:~# sysctl -a | grep disable_ipv6
...
net.ipv6.conf.all.disable_ipv6 = 0
...
net.ipv6.conf.default.disable_ipv6 = 0
...
net.ipv6.conf.lo.disable_ipv6 = 0
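The same check can be done per interface without grepping `sysctl -a`. A rough sketch, relying on the fact that on Linux each `net.ipv6.conf.<name>.disable_ipv6` sysctl is backed by `/proc/sys/net/ipv6/conf/<name>/disable_ipv6`. The class name is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: reports, per interface entry, whether IPv6 is disabled.
public class Ipv6DisableCheck {

    /** Maps each entry under confDir (all, default, lo, ...) to true if IPv6 is disabled. */
    static Map<String, Boolean> readDisableFlags(Path confDir) throws IOException {
        Map<String, Boolean> flags = new TreeMap<>();
        if (!Files.isDirectory(confDir)) {
            return flags; // not Linux, or IPv6 support not present
        }
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(confDir)) {
            for (Path dir : dirs) {
                Path flag = dir.resolve("disable_ipv6");
                if (Files.isReadable(flag)) {
                    String value = new String(Files.readAllBytes(flag), StandardCharsets.UTF_8).trim();
                    // "0" means enabled; anything else is treated as disabled
                    flags.put(dir.getFileName().toString(), !"0".equals(value));
                }
            }
        }
        return flags;
    }

    public static void main(String[] args) throws IOException {
        Map<String, Boolean> flags = readDisableFlags(Paths.get("/proc/sys/net/ipv6/conf"));
        for (Map.Entry<String, Boolean> e : flags.entrySet()) {
            System.out.println(e.getKey() + ": IPv6 " + (e.getValue() ? "DISABLED" : "enabled"));
        }
    }
}
```

If every entry reports enabled, as on this machine, the bind failures need another explanation.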
@adam-thorpe can we run a Grinder to make sure the problem still affects the machine?
Still seems to be having problems, different exception but same line.
java/net/Inet6Address/B6206527.java: https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3082
10:48:14 STDOUT:
10:48:14 trying LL addr: /fe80:0:0:0:f0bf:e2ff:fe62:740%veth2be7f48
10:48:14 trying LL addr: /fe80:0:0:0:f0bf:e2ff:fe62:740
10:48:14 STDERR:
10:48:14 java.net.SocketException: No such device (Bind failed)
10:48:14 at java.net.PlainSocketImpl.socketBind(Native Method)
10:48:14 at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
10:48:14 at java.net.ServerSocket.bind(ServerSocket.java:390)
10:48:14 at java.net.ServerSocket.bind(ServerSocket.java:344)
10:48:14 at B6206527.main(B6206527.java:53)
10:48:14 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
10:48:14 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
10:48:14 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
10:48:14 at java.lang.reflect.Method.invoke(Method.java:498)
10:48:14 at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298)
10:48:14 at java.lang.Thread.run(Thread.java:823)
Alright, I've looked at the machine and the same docker container that @sxa found when he was fixing the first machine is there. It appears to be hanging whilst running kafka-test.sh; presumably that's what is taking up the socket and causing other tests to fail. It may not be relevant, but the version of Kafka on the Docker container is 2.12-2.5.0-SNAPSHOT, and the process that's being run on the machine itself is
jenkins 20857 0.0 0.0 452996 5916 ? Sl 2019 17:20 docker run --rm adoptopenjdk-kafka-test:latest
According to docker ps -a, it had been running for 6 months(!).
Removing the container fixed the issue again:
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3084/console
I'll go through the list of affected machines and clear them all off. If it recurs on a machine that has already been cleaned up (I noticed that test-packet-ubuntu1604-x64-2 is still succeeding, so it hasn't recurred there), we could look into what's causing it.
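To make the sweep across machines less manual, something like the sketch below could flag long-running containers. It only reports and never stops or removes anything. It assumes `docker ps --format` with the `{{.ID}}`, `{{.Image}}` and `{{.RunningFor}}` placeholders, and the "weeks or longer counts as stale" rule is an arbitrary threshold chosen for illustration. The class name is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists running containers that look old enough to be leftovers.
public class StaleContainerReport {

    /** True if a RunningFor value such as "2 months ago" mentions weeks, months or years. */
    static boolean looksStale(String runningFor) {
        String s = runningFor.toLowerCase();
        return s.contains("week") || s.contains("month") || s.contains("year");
    }

    /** Expects tab-separated lines of id, image, runningFor; returns ids of stale containers. */
    static List<String> staleIds(List<String> psLines) {
        List<String> ids = new ArrayList<>();
        for (String line : psLines) {
            String[] fields = line.split("\t");
            if (fields.length == 3 && looksStale(fields[2])) {
                ids.add(fields[0]);
            }
        }
        return ids;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(
                "docker", "ps", "--format", "{{.ID}}\t{{.Image}}\t{{.RunningFor}}")
                .redirectErrorStream(true)
                .start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        }
        p.waitFor();
        for (String id : staleIds(lines)) {
            System.out.println("possibly stale container: " + id);
        }
    }
}
```

Anything it prints still needs a human look before running `docker stop` on it.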
Cleanup list:
FYI: test-osuosl-ubuntu1804-ppc64le-2 didn't have that container on it. However, running the Grinder job failed with:
12:30:41 unzip file: OpenJDK8U-jdk_x64_linux_openj9_2020-05-27-09-44.tar.gz ...
12:30:42 Run /home/jenkins/workspace/Grinder/openjdkbinary/j2sdk-image/bin/java -version
12:30:43 warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
12:30:43 /lib64/ld-linux-x86-64.so.2: No such file or directory
I don't think this is related to this issue. It also may not be a coincidence that this is the only non-ubuntu1604 machine there.
There's the same issue as above with the test-marist-ubuntu1604-s390x-2 machine too:
https://ci.adoptopenjdk.net/job/Grinder/3100/console
Why is it pulling an x64 JDK for a ppc64le test? Not too surprising the CPU doesn't support it ...
Ah that will be my ignorance of Grinder. Rerunning with correct variables:
https://ci.adoptopenjdk.net/job/Grinder/3101/
https://ci.adoptopenjdk.net/job/Grinder/3102/
https://ci.adoptopenjdk.net/job/Grinder/3128/console
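The x64-JDK-on-ppc64le mix-up above is easy to make when filling in Grinder parameters by hand. A small sketch of a sanity check, assuming both the JDK archive name and the machine label carry an architecture token (only `x64`, `ppc64le` and `s390x` appear in this thread, so those are the only ones listed). The class and method names are hypothetical and not part of Grinder.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: compares the arch token in a JDK archive name
// with the arch token in a test machine label.
public class ArchMismatchCheck {

    // Assumed token list, taken from the names seen in this thread.
    static final List<String> ARCHES = Arrays.asList("x64", "ppc64le", "s390x");

    /** Returns the first known arch token in the name, or null if none is found. */
    static String archOf(String name) {
        for (String token : name.split("[-_.]")) {
            if (ARCHES.contains(token)) {
                return token;
            }
        }
        return null;
    }

    /** True only when both names carry a known arch token and the tokens agree. */
    static boolean matches(String jdkArchive, String machine) {
        String jdkArch = archOf(jdkArchive);
        return jdkArch != null && jdkArch.equals(archOf(machine));
    }

    public static void main(String[] args) {
        String jdk = "OpenJDK8U-jdk_x64_linux_openj9_2020-05-27-09-44.tar.gz";
        String machine = "test-osuosl-ubuntu1804-ppc64le-2";
        System.out.println(jdk + " on " + machine + ": "
                + (matches(jdk, machine) ? "arch OK" : "arch MISMATCH"));
    }
}
```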
Last machine has been fixed! Closing issue :-)