Some jdk_jdi tests are failing due to a machine config issue and are therefore excluded.
Grabbing comment from:
https://github.com/AdoptOpenJDK/openjdk-tests/issues/132
It's a machine configuration issue.
Assign a hostname to the machine by following the steps below:
Choose Apple menu -> System Preferences, then click Sharing.
Click Edit, then enter a local hostname.
Add this machine name entry in /etc/hosts file with machine ip address.
Eg: 127.0.0.1 mymachine
Reboot the machine to reflect the changes.
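Once the entry is in place, a quick way to confirm it took effect is to check that the machine's own hostname resolves to an address. A minimal Python sketch, assuming the tests only need the name to resolve; the function name `resolve_hostname` and its injectable `resolver` parameter are illustrative, not part of any test suite:

```python
import socket


def resolve_hostname(name=None, resolver=socket.gethostbyname):
    """Return the IPv4 address `name` resolves to, or None if lookup fails.

    `name` defaults to this machine's own hostname. `resolver` is injectable
    (a hypothetical convenience) so the check can be exercised without DNS.
    """
    if name is None:
        name = socket.gethostname()
    try:
        return resolver(name)
    except OSError:  # socket.gaierror and socket.herror are OSError subclasses
        return None


if __name__ == "__main__":
    host = socket.gethostname()
    address = resolve_hostname(host)
    if address is None:
        print(f"{host}: does not resolve; check /etc/hosts")
    else:
        print(f"{host} -> {address}")
```

If this prints "does not resolve", the hostname is probably still missing from /etc/hosts (or the change has not been picked up yet).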
FYI @gdams
I don't think I can make this change, as I won't have privileges to reboot the machine.
The test example provided refers to macOS. Is this issue limited to macOS, or do similar tests that run on Linux also rely on the /etc/hosts file to retrieve the hostname?
If we have tests that require the system's hostname to be listed in the /etc/hosts file (as opposed to calling the 'hostname' command), then we will also need to ensure this change is made on all active systems and that the needed changes are applied to the playbooks.
It is not limited to macOS; on every OS it relies on the configuration in the /etc/hosts file. Right now our Linux test machines are configured properly, so this issue (AdoptOpenJDK/openjdk-tests#132) is applicable/reproducible only on macOS.
FYI I've had comparable issues with /etc/hosts for the JCK as well. I haven't yet adjusted those playbooks to do the right thing (some Ubuntus are coming to us out of the box with the system's hostname against 127.0.1.1 instead of 127.0.0.1, which causes problems).
Perhaps we should remove /etc/hosts from each machine and then use Ansible to template it so that they are the same across all of our providers?
Maybe, although it could be different depending on whether IPv6 and the like have been configured so that might not be ideal. And there's probably some clever reason why 127.0.1.1 was in there as well as 127.0.0.1
We definitely need some sort of consistent strategy going forward. The test machines that were causing https://github.com/AdoptOpenJDK/openjdk-systemtest/issues/66 didn't have any entries for the system's hostname in /etc/hosts.
Most Ubuntus seem to set the system's hostname against the loopback IP 127.0.0.1, but would it make more sense to have an entry with the real IP in there against the hostname? Maybe ...
Either way we could do with some consistency. Would anyone object going forward to having a strategy of making sure the hostname on the machine is a bit more consistent with what's in jenkins? For that mauve test failure the machine calls itself test-ubuntu-16-04-1 but in jenkins it is test-osuosl-ppc64le-ubuntu-16.04-2 - it's going to make debugging a lot easier if we have them consistent when going through log files.
@smlambert With the way this is going perhaps we should change the title of this issue to "Sanitize /etc/hosts and hostnames on all our cloud machines?" although I appreciate that you possibly need a tactical short-term fix until we've thrashed it out.
Maybe something like this? Thoughts?
- name: Update /etc/hosts file - IP FQDN hostname
  lineinfile:
    dest: /etc/hosts
    regexp: "^(.*){{ ansible_hostname }}(.*)$"
    line: "{{ ansible_default_ipv4.address }} {{ ansible_fqdn }} {{ ansible_hostname }}"
    state: present
  tags: hosts_file
- name: Update /etc/hosts file - 127.0.0.1
  lineinfile:
    dest: /etc/hosts
    regexp: "^(.*)127.0.0.1(.*)$"
    line: "127.0.0.1 localhost"
    state: present
  tags: hosts_file
I've updated the /etc/hosts file on both of our macs.
@smlambert please test and let us know if this fixes https://github.com/AdoptOpenJDK/openjdk-tests/issues/132
@bblondin Wouldn't that wipe an entry such as the following (which I think we get by default on some installs):
127.0.0.1 localhost myhostname
and not replace it with the second section because the 127.0.0.1 entry would be removed entirely in the first section? This sort of thing has made me paranoid about doing it automatically - but I do think it's worth thrashing out ;-)
(Edit: Assuming state:present will add it if the regexp doesn't match then it's probably all good!)
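To make that case concrete, here is a minimal Python sketch that approximates the two rules on the combined entry. It assumes lineinfile with state: present replaces the last line matching the regexp and otherwise appends the line; `lineinfile_present` is a hypothetical helper, not Ansible code, and the address and FQDN in the first rule are made up:

```python
import re


def lineinfile_present(lines, regexp, line):
    """Approximate Ansible lineinfile with state=present on a list of lines.

    Assumed semantics: the last line matching `regexp` is replaced by `line`;
    if nothing matches, `line` is appended unless it is already in the file.
    """
    pattern = re.compile(regexp)
    last_match = None
    for index, current in enumerate(lines):
        if pattern.search(current):
            last_match = index
    result = list(lines)
    if last_match is not None:
        result[last_match] = line
    elif line not in result:
        result.append(line)
    return result


hosts = ["127.0.0.1 localhost myhostname", "::1 localhost"]

# Rule 1: the hostname rule replaces the combined loopback line.
hosts = lineinfile_present(
    hosts,
    r"^(.*)myhostname(.*)$",
    "192.0.2.10 myhostname.example.com myhostname",  # made-up address and FQDN
)

# Rule 2: no line mentions 127.0.0.1 any more, so the entry is appended.
hosts = lineinfile_present(hosts, r"^(.*)127.0.0.1(.*)$", "127.0.0.1 localhost")

print("\n".join(hosts))
```

Under those assumptions the loopback entry survives, but it moves to the end of the file rather than staying on the first line.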
We will hit an issue where we have disconnects between FQDN and the hostname on the machine - we've had to replace . characters in the FQDN with other characters on the machines used for the JCK for example (Can't find the relevant issue just now - will amend sometime later)
@sxa555 yes, it would 'replace' those entries,
but I don't think 127.0.0.1 localhost *myhostname* is a default.
I think it's simply 127.0.0.1 localhost or 127.0.0.1 localhost localhost
Yes: state:present will add it if the regexp doesn't match
I'd like to know more about this replacing of the period in the FQDN.
In some cases (virtual machines) there may not be an FQDN; however, Ansible's {{ ansible_fqdn }} will return just the hostname in those cases, giving us MyIPAddress MyHostname MyHostname
Regarding the '.' issue - we have some tests that cannot run properly if the machine's hostname contains that character - for those machines we've replaced the '.' with a '-' on the machine, but the names as stored in jenkins etc. are left as-is with the '.'. I think as well as having a discussion on the jenkins tags for machines https://github.com/AdoptOpenJDK/openjdk-infrastructure/issues/93 we should consider standardising and documenting what the hostnames should be too going forward.
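As a rough illustration of that naming rule, a minimal Python sketch that derives the dot-free machine name from a jenkins node name and checks whether the two agree. Both function names are hypothetical, and the only rule assumed is the '.' to '-' replacement described above:

```python
def machine_hostname(jenkins_name):
    """Derive a dot-free machine hostname from a jenkins node name."""
    return jenkins_name.replace(".", "-")


def names_consistent(jenkins_name, hostname):
    """True if the hostname matches the jenkins name, ignoring '.' vs '-'."""
    return machine_hostname(jenkins_name) == machine_hostname(hostname)


# The ppc64le machine mentioned earlier in this thread.
print(machine_hostname("test-osuosl-ppc64le-ubuntu-16.04-2"))
print(names_consistent("test-osuosl-ppc64le-ubuntu-16.04-2", "test-ubuntu-16-04-1"))
```

A check like this could flag machines whose own hostname has drifted from the name used in jenkins.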
Ref the default entries, here's one of the more unusual examples from build-scaleway-x64-ubuntu-16-04-2:
[sxa@sxa ~]$ ssh [email protected] cat /etc/hosts
127.0.1.1 build-scaleway-x64-ubuntu-16-04-2 build-scaleway-x64-ubuntu-16-04-2
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
Our joyent ubuntu machine build-joyent-x64-ubuntu-16.04-2 has this (the "random" hex string doesn't match the hostname, FWIW):
[sxa@sxa ~]$ ssh [email protected] cat /etc/hosts
127.0.0.1 localhost 378108a1-c01c-e82d-9b57-d80d22317d7e
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
I think your proposed rules would sanitize them all quite well though, if that's the way we want to go (I'm always a touch nervous about having non-default configs for OSs in case we mask errors a customer may see on their systems, but from our perspective it would likely make things work more consistently).
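For auditing what a given machine currently has, a minimal Python sketch that reports which addresses an /etc/hosts-style file maps a name to, using the scaleway file above as input. `addresses_for` is a hypothetical helper and only handles the simple "address name [alias...]" layout with optional # comments:

```python
def addresses_for(hosts_text, hostname):
    """Return the addresses that /etc/hosts-style text maps `hostname` to."""
    found = []
    for raw in hosts_text.splitlines():
        entry = raw.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(entry) >= 2 and hostname in entry[1:]:
            found.append(entry[0])
    return found


scaleway = """\
127.0.1.1 build-scaleway-x64-ubuntu-16-04-2 build-scaleway-x64-ubuntu-16-04-2
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
"""

print(addresses_for(scaleway, "build-scaleway-x64-ubuntu-16-04-2"))
print(addresses_for(scaleway, "localhost"))
```

An empty list for the machine's own name would point to the missing-entry case seen on the test machines.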
I think the following would be the best of both worlds:
I've added backup: yes, which saves a timestamped backup of the file before it is changed. This way, if there is an issue, the administrator can easily recover the original file.
- name: Update /etc/hosts file - IP FQDN hostname
  lineinfile:
    dest: /etc/hosts
    regexp: "^(.*){{ ansible_hostname }}(.*)$"
    line: "{{ ansible_default_ipv4.address }} {{ ansible_fqdn }} {{ ansible_hostname }}"
    state: present
    backup: yes
  tags: hosts_file
- name: Update /etc/hosts file - 127.0.0.1
  lineinfile:
    dest: /etc/hosts
    regexp: "^(.*)127.0.0.1(.*)$"
    line: "127.0.0.1 localhost"
    state: present
    backup: yes
  tags: hosts_file
I'd be tempted to add localhost.localdomain to the 127.0.0.1 since that seems quite common too
Pull request #136
(includes localhost.localdomain)
I reran the tests referenced in openjdk-tests issue 132 just now (on test-macincloud-macos1010-1), but they still fail:
ERROR: transport error 202: gethostbyname: unknown host
ERROR: JDWP Transport dt_socket failed to initialize, TRANSPORT_INIT(510)
JDWP exit error AGENT_ERROR_TRANSPORT_INIT(197): No transports initialized [debugInit.c:750]
You can find the entire set of test results here: https://ci.adoptopenjdk.net/view/work%20in%20progress/job/test_personal/129/testReport/
Looking at the /etc/hosts file on test-macincloud-macos1010-1, it does not appear to have changed from before the issue was reported.
@bblondin - I believe you updated the 2 build machines; the 2 test macs (test-macincloud-macos1010-1 and test-macincloud-macos1010-2) do not appear to have been updated.
@smlambert I updated the wrong macs... (build-macstadium-macos1010-1 and 2)
Updated test-macincloud-macos1010-1 and test-macincloud-macos1010-2
sh-3.2# ping test-macincloud-macos1010-2
ping: cannot resolve test-macincloud-macos1010-2: Unknown host
sh-3.2# vi /etc/hosts
sh-3.2# ping test-macincloud-macos1010-2
PING dxu773 (74.80.250.173): 56 data bytes
64 bytes from 74.80.250.173: icmp_seq=0 ttl=64 time=0.046 ms
64 bytes from 74.80.250.173: icmp_seq=1 ttl=64 time=0.061 ms
64 bytes from 74.80.250.173: icmp_seq=2 ttl=64 time=0.060 ms
@smlambert Have you had a chance to rerun the test?
Yes, and now 89/90 tests that used to fail are passing, thanks.
Apologies, I closed this issue because the test problem was addressed, but I remember that this issue was broadened to address all machines, so I will reopen it.
Pull request #136 addresses this