Openjdk-infrastructure: Ansible request for AWX to deploy to Linux

Created on 2 Mar 2021 · 22 Comments · Source: AdoptOpenJDK/openjdk-infrastructure

Ref: #695 #1909

Details:
Go through and rerun the playbooks on all Linux Machines, using AWX. I'll do them in batches to make sure it's obvious what caused an issue, if any occur.

ansible

All 22 comments

Running on build*rhel* : https://awx.adoptopenjdk.net/#/jobs/playbook/648
EDIT: No Issues, moving on

Funny thing I noticed about AWX. In the centos run, it appears to only be running a single centos machine from each provider. As in, build-osuosl-centos74-ppc64le-2 is in the run, but not build-osuosl-centos74-ppc64le-1 - despite build-osuosl-centos74-ppc64le-1 being in the inventory. Very odd - I'll note down the machines that aren't run in subsequent runs too.
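One way to spot hosts that a run silently skipped is to compare the inventory against the hosts that actually appear in the job output. The sketch below is only an illustration: the function name `hosts_not_run` is hypothetical, and it approximates a host pattern such as build*centos* with shell-style globbing from Python's `fnmatch` module, which may not match AWX's own pattern handling exactly.

```python
from fnmatch import fnmatchcase


def hosts_not_run(inventory, pattern, ran):
    """Return inventory hosts that match `pattern` but are absent from `ran`.

    `inventory` and `ran` are plain lists of host names; how they are
    collected (AWX API, job output, by hand) is left open here.
    """
    expected = {host for host in inventory if fnmatchcase(host, pattern)}
    return sorted(expected - set(ran))


# Example using host names from this thread.
inventory = [
    "build-osuosl-centos74-ppc64le-1",
    "build-osuosl-centos74-ppc64le-2",
    "build-packet-ubuntu1804-armv8-1",
]
ran = ["build-osuosl-centos74-ppc64le-2"]
missing = hosts_not_run(inventory, "build*centos*", ran)
print(missing)
```

Running the same comparison after each batch gives a list of hosts to rerun once the skipped-host cause is understood.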

After build*centos* run:

build-packet-centos74-ppc64le-2 succeeded

on Apt Upgrade task: fatal: [build-digitalocean-centos69-x64-2]: FAILED! => {"changed": false, "msg": "Error: Cannot find a valid baseurl for repo: base\n", "rc": 1, "results": []}
on Enable EPEL Release task: fatal: [build-osuosl-centos74-ppc64le-2]: FAILED! => {"changed": false, "module_stderr": "Shared connection to 140.211.168.117 closed.\r\n", "module_stdout": "error: rpmdb: BDB0113 Thread/process 7996/70367273601024 failed: BDB1507 Thread died in Berkeley DB library\r\nerror: db5 error(-30973) from dbenv->failchk: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery

I've seen the EPEL release task failure in https://github.com/AdoptOpenJDK/openjdk-infrastructure/issues/1868
and the Base URL issue here: https://github.com/AdoptOpenJDK/openjdk-infrastructure/issues/1745

I'll get to fixing those and rerun the playbook

EDIT: Fixes done (I also accidentally rebuilt the databases on build-osuosl-centos74-ppc64le-1 too :facepalm: ), new build*centos* run here: https://awx.adoptopenjdk.net/#/jobs/playbook/662

build*centos* succeeded.
Onto build*ubuntu* : https://awx.adoptopenjdk.net/#/jobs/playbook/664

Of the build*ubuntu*:
build-scaleway-ubuntu1604-x64-1, build-scaleway-ubuntu1604-armv7-2 and build-packet-ubuntu1804-armv8-1 were all UNREACHABLE with something to the effect of:

UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Warning: Permanently added '<IP>' (ECDSA) to the list of known hosts.\r\nno such identity: /var/lib/awx/.ssh/id_rsa: No such file or directory\r\nroot@<IP>: Permission denied (publickey).", "unreachable": true}

Looks like there's something wrong with AWX's Private key? Ping @sxa

build-scaleway-ubuntu1604-armv7-1 FAILED on the Swap_File: Create swap file - via DD task, with:

dd: failed to open '//swapfile': Text file busy"

build-alibaba-ubuntu1804-armv8-1 succeeded

No hosts were skipped like they were in build*centos*.

build-alibaba-ubuntu1804-armv8-1 succeeded

That's reassuring since I ran the playbook on it yesterday successfully :-)

I'm a little surprised we haven't seen that issue with the swapfile previously. I suspect we need to adjust the conditions under which that is executed ...

That should be an easy enough fix - just check if /swapfile exists.
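The guard suggested above can be sketched as follows. This is not the playbook's actual task, just the decision logic in Python under two assumptions: that "Text file busy" means the file is already in use as swap, and that active swap areas can be read from the first column of /proc/swaps. The function names are hypothetical.

```python
import os


def active_swap_paths(proc_swaps_text):
    """Parse the text of /proc/swaps and return the set of active swap paths.

    The first line is a header; each following line starts with the path.
    """
    lines = proc_swaps_text.splitlines()[1:]
    return {line.split()[0] for line in lines if line.strip()}


def should_create_swapfile(path, proc_swaps_text, exists=os.path.exists):
    """Create the swap file only if it is neither active nor already on disk."""
    if path in active_swap_paths(proc_swaps_text):
        return False
    return not exists(path)


# Sample /proc/swaps content with /swapfile already active.
sample = (
    "Filename\t\t\t\tType\t\tSize\tUsed\tPriority\n"
    "/swapfile                               file\t\t2097148\t0\t-2\n"
)
print(should_create_swapfile("/swapfile", sample))
```

In the playbook itself the equivalent would be a condition on the "Create swap file" task, so the dd step is skipped on machines that already have one.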

For the ssh key problem, I had a quick re-run of the failed boxes, and it's recurred - I've ssh'd to build-scaleway-ubuntu1604-armv7-2 and added the AWX ssh key into the authorized_keys file, and it works. The AWX secret file says that the AWX authorized_key is added to a machine via Bastillion - so I presume those 3 machines aren't in Bastillion?
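Adding the key by hand is safest when it is idempotent, so a repeat run does not leave duplicate entries. A minimal sketch of that step, working on the file contents as a string; the function name `add_key` and the key strings are placeholders, not the real AWX key:

```python
def add_key(authorized_keys_text, public_key):
    """Return authorized_keys content with `public_key` present exactly once."""
    key = public_key.strip()
    existing = [line.strip() for line in authorized_keys_text.splitlines()]
    if key in existing:
        # Key already authorized: leave the file untouched.
        return authorized_keys_text
    text = authorized_keys_text
    if text and not text.endswith("\n"):
        text += "\n"
    return text + key + "\n"


# Placeholder keys for illustration only.
current = "ssh-ed25519 BBBB admin@example"
updated = add_key(current, "ssh-rsa AAAA awx@example")
print(updated)
```

Writing the result back to ~/.ssh/authorized_keys (with the usual 600 permissions) is left out of the sketch.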

While I wait for the PR to be merged, I've started test*rhel* : https://awx.adoptopenjdk.net/#/jobs/playbook/670?job_search=page_size:20;order_by:-finished;not__launch_type:sync

EDIT:
test-ibmcloud-rhel6-x64-1 : FAILED : No package matching locales
test-aws-rhel76-armv8-1 : FAILED : 404 Error from link in the _Install missing Rhel7 aarch64 deps from Centos Mirror_ task
test-aws-rhel8-x64-1: FAILED: Can't get certain deps for nagios-plugins-all
test-ibmcloud-rhel7-x64-1 : SUCCESS

EDIT2: These should have all been addressed in : #1999

Given the above PR, I'm going to just add any small fixes to that, and carry on with the search.
First test*centos* run: https://awx.adoptopenjdk.net/#/jobs/playbook/672?job_search=page_size:20;order_by:-finished;not__launch_type:sync
Looks like the missing machines issue is back; test-osuosl-centos74-ppc64le-4 isn't running, despite being in the inv (FYI @sxa )

test*centos* was successful! Running test*ubuntu* : https://awx.adoptopenjdk.net/#/jobs/playbook/674?job_search=page_size:20;order_by:-finished;not__launch_type:sync

Looks like all 19 Ubuntu Hosts are running :)

EDIT: All of them passed! (slightly shocked).
I think that's it for all the linux so I'll look to get those PRs in, and once they've been merged, I can rerun all the hosts that failed, or were skipped

Okay, the PRs for the issues I found have been merged. So we have the following machines to do:
Hosts that AWX never ran on:

  • [ ] test-osuosl-centos74-ppc64le-4 (Ignored)
  • [ ] build-osuosl-centos74-ppc64le-1 (This doesn't even appear in the inventory..)

Hosts that AWX failed on:

  • [x] test-ibmcloud-rhel6-x64-1
  • [x] test-aws-rhel76-armv8-1
  • [x] test-aws-rhel8-x64-1
  • [x] build-scaleway-ubuntu1604-armv7-1

Hosts that were 'unreachable' by AWX (I'm just going to ssh to it and put AWX's key into the authorized_keys file):

  • [x] build-scaleway-ubuntu1604-x64-1
  • [x] build-scaleway-ubuntu1604-armv7-2
  • [ ] build-packet-ubuntu1804-armv8-1 (I can't ssh to it)

First run, covering the 'Hosts that AWX never ran on' and 'Hosts that AWX failed on':
test-aws-rhel8-x64-1 failed on task "Create Symlink to (Nagios) Plugins":

refusing to convert from directory to symlink for /usr/local/nagios/libexec",

test-ibmcloud-rhel6-x64-1 failed adoptopenjdk_install task with:

Failed to validate the SSL certificate for github-releases.githubusercontent.com:443. Make sure your managed systems have a valid CA certificate installed.

(Interesting - Python 2.7.18 should be on the machine and used for that task - I'll confirm)

test-osuosl-centos74-ppc64le-4 was unreachable ( maybe no AWX key on it )

Ignore the osuosl one for now.

Got it!

For the RHEL machine: only CentOS machines install Python 2.7.18 at the moment - I'll fix that up so it covers RedHat too.

The machines that were initially unreachable by AWX: https://awx.adoptopenjdk.net/#/jobs/playbook/689?job_search=page_size:20;order_by:-finished;not__launch_type:sync

EDIT: Those two worked! woo

test-ibmcloud-rhel6-x64-1 rerun (now that #2005 has been merged): https://awx.adoptopenjdk.net/#/jobs/playbook/692?job_search=page_size:20;order_by:-finished;not__launch_type:sync

EDIT: Failed - I forgot to make RHEL6 use the alternative install (like CentOS6) in #2005

Rerunning: https://awx.adoptopenjdk.net/#/jobs/playbook/694?job_search=page_size:20;order_by:-finished;not__launch_type:sync

EDIT: It worked! :tada:

With #2006 merged, rerunning test-aws-rhel8-x64-1 : https://awx.adoptopenjdk.net/#/jobs/playbook/700?job_search=page_size:20;order_by:-finished;not__launch_type:sync

EDIT: Passed!

So, all Linux hosts have had the playbooks run on them successfully, except for 3 of them:

test-osuosl-centos74-ppc64le-4
build-packet-ubuntu1804-armv8-1
  • build-packet-ubuntu1804-armv8-1 is one that doesn't need the full playbook to run on it at the moment, although I'll need to look into why the keys aren't being propagated to it
  • I'm fairly sure test-osuosl-centos74-ppc64le-4 doesn't exist and has been replaced with another system with a later Ubuntu level that isn't yet live so it can continue to be ignored until all the inventory updates are in place
  • build-osuosl-centos74-ppc64le-1 ... No idea why this isn't in the inventory but I've manually added it in AWX and the playbook has run ok on it.

Awesome! Thanks for the update. In that case, every other machine has been done, so, closing issue :+1:
