Openshift-ansible: openshift 3.11 install failure due to failed docker image check

Created on 21 Nov 2018 · 21 Comments · Source: openshift/openshift-ansible

The OpenShift 3.11 install fails, complaining that required Docker images are missing:

Hosts: 13.233.125.141
Play: OpenShift Health Checks
Task: Run health checks (install) - EL
Message: One or more checks failed
Details: check "docker_image_availability":
One or more required container images are not available:
registry.redhat.io/openshift3/ose-deployer:v3.11.43
Checked with: skopeo inspect [--tls-verify=false] [--creds=:] docker:///

I got rid of this by disabling the docker_image_availability check in the inventory YAML file.

The actual issue appears to be a too-short timeout (10 seconds) hard-coded in the skopeo check command. See the line below:

https://github.com/openshift/openshift-ansible/blob/dc4bf75fce90f79ed93016ce5806c06e7fa29222/roles/openshift_health_checker/openshift_checks/docker_image_availability.py#L23

This hard-coded value should be removed or parameterized.

FYI: latency on the skopeo command is higher from where I tested this install (Bangalore). I am sure there are other locations where this will fail because of the short timeout.
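
To illustrate what "parameterized" could look like, here is a minimal sketch in Python. It is not the code from docker_image_availability.py: the function names and the variable name `openshift_check_skopeo_timeout` are hypothetical, and the command shape (`timeout N skopeo inspect ...`) is taken from the debug output quoted later in this thread.

```python
# Hypothetical sketch: make the skopeo timeout configurable instead of fixed.
# None of these names exist in openshift-ansible; they are for illustration only.

DEFAULT_SKOPEO_TIMEOUT = 10  # seconds; the value reported as hard-coded in this issue


def skopeo_timeout_from_vars(task_vars):
    """Read the timeout from a dict of variables, falling back to the default.

    `openshift_check_skopeo_timeout` is an assumed, made-up variable name.
    """
    return int(task_vars.get("openshift_check_skopeo_timeout", DEFAULT_SKOPEO_TIMEOUT))


def build_skopeo_command(image, timeout_seconds=DEFAULT_SKOPEO_TIMEOUT, tls_verify=True):
    """Return an argv list of the form: timeout N skopeo inspect ... docker://image."""
    timeout_seconds = int(timeout_seconds)
    if timeout_seconds <= 0:
        raise ValueError("timeout_seconds must be a positive integer")
    return [
        "timeout", str(timeout_seconds),
        "skopeo", "inspect",
        "--tls-verify={}".format("true" if tls_verify else "false"),
        "docker://{}".format(image),
    ]
```

With something along these lines, a slow link could raise the limit by setting the (hypothetical) variable to 60 in the inventory, while everyone else keeps the 10-second default.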

help wanted, lifecycle/rotten

rsriniva

👍6

All 21 comments

I have the same issue, but in my case it is related to credentials.
When I run on the machine the command that the Ansible plugin produces (with creds), I get this error:
FATA[0001] unable to retrieve auth token: invalid username/password
When I remove the creds, everything works correctly, since Docker already has these credentials configured. To be more precise, I passed the same credentials as in the configuration.

kivio on 13 Dec 2018

Just ran into this issue today too. So +1 for the timeout of the skopeo command to be parameterized.

Workaround: set openshift_disable_check="docker_image_availability"

paddy667 on 19 Dec 2018

👍1

Running into the same timeout issue here in Beijing, China, so +1 for the timeout of the skopeo command to be parameterized. Thanks.

ligc on 21 Dec 2018

I can confirm this from Germany, +1 for that.

balpert89 on 7 Jan 2019

I also hit this problem. Does anyone know how to handle it?

liuyatao on 9 Jan 2019

@paddy667 In which file should I set openshift_disable_check="docker_image_availability"?

liuyatao on 9 Jan 2019

@liuyatao add it to your Ansible inventory. It should go under the [OSEv3:vars] section. Here is some example code:

[OSEv3:vars]
timeout=60
ansible_user=root
ansible_become=yes

openshift_deployment_type=openshift-enterprise
openshift_disable_check="docker_image_availability"

paddy667 on 9 Jan 2019

👍3

Installing a new OpenShift cluster checks for Docker images with commands like skopeo inspect --tls-verify=true docker://docker.io/openshift/origin-haproxy-router:v3.11. However, this can take more than 10 seconds to complete, even on a good internet connection.

$ time skopeo inspect --tls-verify=true  docker://docker.io/openshift/origin-haproxy-router:v3.11
{
    "Name": "docker.io/openshift/origin-haproxy-router",
    "Digest": "sha256:3415fcc585945cf0eee230a0031c154edc4f6b83bca1f31f85d69a9982f159b3",
    ...
}

real    0m20.555s
user    0m0.078s
sys     0m0.038s

Full output of failed check: https://gist.github.com/jozefizso/cb053e880dfa7abc6a2f1c5831122195

The only way to prevent this is to skip the docker_image_availability check.
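
For anyone who wants to measure this before deciding whether to skip the check, here is a rough Python sketch that times a command under a generous limit and compares the result against the 10-second limit discussed in this thread. The helper names are made up for illustration; it only wraps the standard `subprocess` module and does not reproduce anything from openshift-ansible.

```python
import subprocess
import time


def time_command(argv, timeout_seconds=60):
    """Run argv and return (elapsed_seconds, return_code).

    return_code is None if the command did not finish within timeout_seconds.
    Raises FileNotFoundError if the executable (e.g. skopeo) is not installed.
    """
    start = time.monotonic()
    try:
        completed = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
        )
        return_code = completed.returncode
    except subprocess.TimeoutExpired:
        return_code = None
    return time.monotonic() - start, return_code


def exceeds_check_timeout(elapsed_seconds, check_timeout=10):
    """True if a run this slow would have been cut off by the health check."""
    return elapsed_seconds > check_timeout
```

A possible use, assuming skopeo is on the PATH: call `time_command(["skopeo", "inspect", "--tls-verify=true", "docker://docker.io/openshift/origin-haproxy-router:v3.11"])` on a target host and pass the elapsed time to `exceeds_check_timeout`. The 20.555 s run shown above would be flagged.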

jozefizso on 26 Jan 2019

Same issue here when deploying a 3.11.69 cluster. I have set "timeout 70" in the inventory file and in ansible.cfg, but it is not being picked up. This timeout should be parameterized, just like for any other command.

My logs with debug output read " ...timeout 10 skopeo inspect --tls-verify=true ..."

bortek on 18 Feb 2019

As for this error from skopeo:
FATA[0001] unable to retrieve auth token: invalid username/password

In my case it was caused by the presence of a token in the /root/.docker/ directory. Either correct the token there or remove the directory if you are not using authentication to the registry.
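
A quick way to see whether such a stored token exists is sketched below in Python. It assumes the token lives in the Docker CLI's usual `config.json` file under an `auths` key; the comment above only names the directory, so the file name and the helper function are assumptions for illustration.

```python
import json
import os


def registries_with_stored_auth(config_path="/root/.docker/config.json"):
    """Return the sorted registry names that have an entry under "auths".

    Assumes the standard Docker CLI config layout; returns [] if the file
    does not exist. The default path is an assumption based on this thread.
    """
    if not os.path.isfile(config_path):
        return []
    with open(config_path) as handle:
        config = json.load(handle)
    return sorted(config.get("auths", {}))
```

If the registry you install from shows up in the returned list and the install fails with "invalid username/password", the stored entry is a reasonable first suspect.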

bortek on 18 Feb 2019

+1 for the latency issue on skopeo. I am trying from India.

rathinikunj on 15 Mar 2019

Same here. I'm running into the issue because of the timeout. The check takes 11+ seconds over an internet connection in Canada, so I had to disable the check in the inventory file.

Th3G4mbl3r on 19 Mar 2019

I've found that disabling the login test via oreg_test_login=False helps alleviate the issue for me. But you should only configure this when you're sure that authentication works.

tongpu on 2 Apr 2019

Is there a way to configure the inspect timeout? I couldn't tell from @bortek's comments. It seems like no, since I can't find it in the docs.

clnperez on 1 Aug 2019

same issue here

ahosam on 16 Aug 2019

I started hitting a similar issue in my setup today; until yesterday it was passing. Did any change go in?

TASK [openshift_node : Create credentials for registry auth] *****
FAILED - RETRYING: Create credentials for registry auth (3 retries left).
FAILED - RETRYING: Create credentials for registry auth (3 retries left).
FAILED - RETRYING: Create credentials for registry auth (3 retries left).
FAILED - RETRYING: Create credentials for registry auth (2 retries left).
FAILED - RETRYING: Create credentials for registry auth (2 retries left).
FAILED - RETRYING: Create credentials for registry auth (2 retries left).
FAILED - RETRYING: Create credentials for registry auth (1 retries left).
FAILED - RETRYING: Create credentials for registry auth (1 retries left).
FAILED - RETRYING: Create credentials for registry auth (1 retries left).
fatal: [10.172.182.97]: FAILED! => {"attempts": 3, "changed": false, "msg": "time=\"2019-08-21T02:16:14-04:00\" level=fatal msg=\"unable to retrieve auth token: invalid username/password\" \n", "state": "unknown"}
fatal: [10.172.182.119]: FAILED! => {"attempts": 3, "changed": false, "msg": "time=\"2019-08-21T02:16:14-04:00\" level=fatal msg=\"unable to retrieve auth token: invalid username/password\" \n", "state": "unknown"}
fatal: [10.172.181.69]: FAILED! => {"attempts": 3, "changed": false, "msg": "time=\"2019-08-21T02:16:14-04:00\" level=fatal msg=\"unable to retrieve auth token: invalid username/password\" \n", "state": "unknown"}
to retry, use: --limit @/root/openshift-ansible/playbooks/deploy_cluster.retry

piyushkv1 on 21 Aug 2019

Same issue here!

howardluck34 on 23 Aug 2019

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot on 2 Jun 2020

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-bot on 2 Jul 2020

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-bot on 1 Aug 2020

@openshift-bot: Closing this issue.
