Openjdk-infrastructure: IPv6 configurations missing on test machines

Created on 21 Jan 2020  ·  17 Comments  ·  Source: AdoptOpenJDK/openjdk-infrastructure

Part of the jdk8u242-b08_openj9-0.18.0 triage
Platform: xlinux
Machine: test-packet-ubuntu1604-x64-2

Tests:
java/net/Inet6Address/B6206527.java

trying LL addr: /fe80:0:0:0:a863:4eff:fe29:3b2e%veth3d7f09a
trying LL addr: /fe80:0:0:0:a863:4eff:fe29:3b2e

java.net.BindException: Cannot assign requested address (Bind failed)
    at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
    at java.net.ServerSocket.bind(ServerSocket.java:390)
    at java.net.ServerSocket.bind(ServerSocket.java:344)
    at B6206527.main(B6206527.java:53)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298)
    at java.lang.Thread.run(Thread.java:821)

JavaTest Message: Test threw exception: java.net.BindException
JavaTest Message: shutting down test

java/net/ipv6tests/B6521014.java

java.lang.RuntimeException: Test failed: cannot create socket.
    at B6521014.main(B6521014.java:123)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298)
    at java.lang.Thread.run(Thread.java:821)
Caused by: java.net.BindException: Cannot assign requested address (Bind failed)
    at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
    at java.net.Socket.bind(Socket.java:662)
    at B6521014.test2(B6521014.java:103)
    at B6521014.main(B6521014.java:121)
    ... 6 more

JavaTest Message: Test threw exception: java.lang.RuntimeException
JavaTest Message: shutting down test

Re-build grinders
Failing on test-packet-ubuntu1604-x64-2: https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/1810/
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/1809/
Passing on other machines:
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/1813/
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/1812/

bug

All 17 comments

Passed at https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/1828/console after I got rid of a running docker container that was presumably blocking it.

root@test-packet-ubuntu1604-x64-2:~# docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
19bee1fa3099        b1226cfc7094        "/bin/bash /kafka-te…"   2 months ago        Up 2 months                             cocky_lewin
root@test-packet-ubuntu1604-x64-2:~# docker rm 19bee1fa3099
Error response from daemon: You cannot remove a running container 19bee1fa30990f1e53f1df997c27e83185455be827fd534cfa226ea1648a00b9. Stop the container before attempting removal or force remove
root@test-packet-ubuntu1604-x64-2:~# docker stop 19bee1fa3099
19bee1fa3099
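
The failing test output above names a `veth` interface, which is the kind of virtual interface Docker typically creates for a running container. As a rough sketch (not something taken from this thread), the helper below lists link-local IPv6 addresses that sit on `veth*` interfaces. The function name `veth_link_local` is hypothetical, and it assumes the one-line output format of `ip -6 -o addr show scope link` on stdin.

```shell
# Print "<interface> <address>" for every IPv6 address on a veth* interface.
# Expects the output of `ip -6 -o addr show scope link` on stdin.
veth_link_local() {
    awk '$3 == "inet6" {
        name = $2
        sub(/@.*/, "", name)   # drop an "@ifN" peer suffix if one is present
        if (name ~ /^veth/) print name, $4
    }'
}
```

On a test machine it could be run as `ip -6 -o addr show scope link | veth_link_local`. Any output suggests a container is still holding a virtual interface that the test may pick up.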

@sxa555 could you have a poke around on test-osuosl-ubuntu1804-ppc64le-2 and see if it's also got an old running docker process on it? Am seeing the same failures as above on this machine

Also test-packet-ubuntu1604-x64-3 and test-softlayer-ubuntu1604-x64-1 are now showing the same failures. Any idea why this is a recurring problem?

test-marist-ubuntu1604-s390x-2 as well now

Re-iterating the full list of machines that I believe still have this problem:

test-softlayer-ubuntu1604-x64-1
test-osuosl-ubuntu1804-ppc64le-2
test-packet-ubuntu1604-x64-3
test-packet-ubuntu1604-x64-1
test-scaleway-ubuntu1604-x64-1
test-marist-ubuntu1604-s390x-2

I've excluded these tests on openj9 for jdk8 and 11. Couldn't find any instances of failures on hotspot or jdk14

@smlambert Given that this seems a fairly wide variety of boxes, do you know if there's any extra config we could apply that would resolve these test issues? Have we seen this internally at IBM on any of your systems?

Yes, we have the same issue internally. Companies running IPv6 on Azure DevOps (where osx does not have IPv6) also hit this issue.

related: https://github.com/AdoptOpenJDK/openjdk-tests/issues/1524

Now that https://github.com/AdoptOpenJDK/openjdk-infrastructure/pull/1298 has been merged, it might be worth seeing if this solution can be used to resolve the problem described above.

I've had a quick look at test-packet-ubuntu1604-x64-1 as it happens to be a machine I have access to.
All #1298 does is enable IPv6. I did the following on a U16 Vagrant VM, as it appears to be the Ubuntu equivalent:

sysctl -w net.ipv6.conf.all.disable_ipv6=0
sysctl -w net.ipv6.conf.default.disable_ipv6=0
sysctl -w net.ipv6.conf.lo.disable_ipv6=0

It enabled IPv6 on the VM, however the test machine I was looking at already has it enabled:

root@test-packet-ubuntu1604-x64-1:~# sysctl -a | grep disable_ipv6
...
net.ipv6.conf.all.disable_ipv6 = 0
...
net.ipv6.conf.default.disable_ipv6 = 0
...
net.ipv6.conf.lo.disable_ipv6 = 0
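
The same check can be scripted so it is easy to repeat across machines. This is only a sketch: the function name `ipv6_disabled_keys` is made up for illustration, and it assumes `key = value` lines in the format shown in the `sysctl -a` output above.

```shell
# Read `sysctl -a` style lines on stdin, print every disable_ipv6 key that is
# set to 1, and return non-zero if at least one such key was found.
ipv6_disabled_keys() {
    awk -F' *= *' '
        /disable_ipv6/ && $2 == 1 { print $1; found = 1 }
        END { exit (found + 0) }
    '
}
```

For example, `sysctl -a 2>/dev/null | ipv6_disabled_keys` prints nothing and exits 0 on a machine like the one above, where all three flags are 0.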

@adam-thorpe can we run a Grinder to make sure the problem still affects the machine?

Still seems to be having problems, different exception but same line.

java/net/Inet6Address/B6206527.java: https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3082

10:48:14  STDOUT:
10:48:14  trying LL addr: /fe80:0:0:0:f0bf:e2ff:fe62:740%veth2be7f48
10:48:14  trying LL addr: /fe80:0:0:0:f0bf:e2ff:fe62:740
10:48:14  STDERR:
10:48:14  java.net.SocketException: No such device (Bind failed)
10:48:14    at java.net.PlainSocketImpl.socketBind(Native Method)
10:48:14    at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
10:48:14    at java.net.ServerSocket.bind(ServerSocket.java:390)
10:48:14    at java.net.ServerSocket.bind(ServerSocket.java:344)
10:48:14    at B6206527.main(B6206527.java:53)
10:48:14    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
10:48:14    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
10:48:14    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
10:48:14    at java.lang.reflect.Method.invoke(Method.java:498)
10:48:14    at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298)
10:48:14    at java.lang.Thread.run(Thread.java:823)

Alright, I've looked at the machine, and the same docker container that @sxa found when he was fixing the first machine is there. It appears to be hanging whilst running kafka-test.sh; presumably that's what is taking up the socket and causing other tests to fail. It may not be relevant, but the version of Kafka on the Docker container is 2.12-2.5.0-SNAPSHOT, and the process that's being run on the machine itself is

jenkins  20857  0.0  0.0 452996  5916 ?        Sl    2019  17:20 docker run --rm adoptopenjdk-kafka-test:latest

According to docker ps -a , it had been running for 6 months(!).
Removing the container fixed the issue again:
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3084/console

I'll go through the list of machines that are affected and clear them all off. If it recurs on a machine that has already been cleared up (I noticed that test-packet-ubuntu1604-x64-2 is still succeeding, so it hasn't recurred there), we could look into what the cause is.

Cleanup list:

  • [x] test-softlayer-ubuntu1604-x64-1
  • [x] test-osuosl-ubuntu1804-ppc64le-2
  • [x] test-packet-ubuntu1604-x64-3
  • [x] test-packet-ubuntu1604-x64-1
  • [x] test-scaleway-ubuntu1604-x64-1
  • [x] test-marist-ubuntu1604-s390x-2
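
Clearing the machines by hand could be approximated with a small filter like the one below. It is a sketch under assumptions, not the procedure actually used here: the function name `stale_container_ids` is hypothetical, and it selects containers by age rather than by image name, since the `docker ps` output earlier shows the image as a bare ID instead of `adoptopenjdk-kafka-test:latest`.

```shell
# Read "ID|image|running-for" lines on stdin, as printed by:
#   docker ps --format '{{.ID}}|{{.Image}}|{{.RunningFor}}'
# Print the IDs of containers that have been up for months or years.
stale_container_ids() {
    awk -F'|' '$3 ~ /(month|year)s? ago/ { print $1 }'
}
```

A possible use is `docker ps --format '{{.ID}}|{{.Image}}|{{.RunningFor}}' | stale_container_ids | xargs -r docker stop` (`-r` is the GNU xargs flag that skips the command when there is no input). Because the container was started with `docker run --rm`, stopping it should also remove it.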

FYI: test-osuosl-ubuntu1804-ppc64le-2 didn't have that container on it. However, running the Grinder job failed with:

12:30:41  unzip file: OpenJDK8U-jdk_x64_linux_openj9_2020-05-27-09-44.tar.gz ...
12:30:42  Run /home/jenkins/workspace/Grinder/openjdkbinary/j2sdk-image/bin/java -version
12:30:43  warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
12:30:43  /lib64/ld-linux-x86-64.so.2: No such file or directory

Not related to this issue, I don't think. It also may not be a coincidence that it is the only non-ubuntu1604 machine there.

There's the same issue as above with the test-marist-ubuntu1604-s390x-2 machine too:
https://ci.adoptopenjdk.net/job/Grinder/3100/console

12:30:41  unzip file: OpenJDK8U-jdk_x64_linux_openj9_2020-05-27-09-44.tar.gz ...
12:30:42  Run /home/jenkins/workspace/Grinder/openjdkbinary/j2sdk-image/bin/java -version
12:30:43  warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
12:30:43  /lib64/ld-linux-x86-64.so.2: No such file or directory

Why is it pulling an x64 JDK for a ppc64le test? Not too surprising the CPU doesn't support it ...

Ah that will be my ignorance of Grinder. Rerunning with correct variables:
https://ci.adoptopenjdk.net/job/Grinder/3101/
https://ci.adoptopenjdk.net/job/Grinder/3102/
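
A mismatch like the x64 JDK on a ppc64le machine can be caught before the JDK is launched. The helpers below are an illustrative sketch only: `tarball_arch` and `machine_arch` are hypothetical names, the parsing assumes the architecture is the second underscore-separated field of the tarball name as in the log above, and the `uname -m` mapping is an assumption.

```shell
# Extract the architecture token from a JDK tarball name,
# e.g. OpenJDK8U-jdk_x64_linux_openj9_... gives x64.
tarball_arch() {
    basename "$1" | sed -n 's/^[^_]*_\([^_]*\)_.*$/\1/p'
}

# Map a `uname -m` value to the token used in those names (assumed mapping:
# only x86_64 differs, other values are passed through unchanged).
machine_arch() {
    case "$1" in
        x86_64) echo x64 ;;
        *)      echo "$1" ;;
    esac
}
```

A guard such as `[ "$(tarball_arch "$jdk")" = "$(machine_arch "$(uname -m)")" ] || echo "JDK architecture mismatch"` would then flag the problem up front, where `$jdk` stands for the downloaded tarball path.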

https://ci.adoptopenjdk.net/job/Grinder/3128/console
Last machine has been fixed! Closing issue :-)
