Openshift-ansible: Upgrade from OpenShift Origin v1.4.1 to v1.5.0 Fails

Created on 28 Apr 2017 · 6 Comments · Source: openshift/openshift-ansible

Description

I did a clean install of v1.4.1 in our lab, then tried to upgrade by changing the version in the /etc/ansible/hosts file and running ~/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_5/upgrade.yml from the release-1.5 branch.

Version

# ansible --version
ansible 2.2.2.0
  config file = /root/openshift-ansible/ansible.cfg
  configured module search path = Default w/o overrides
# git describe
openshift-ansible-3.5.63-1

Steps To Reproduce (Attempt 1 of 2)

1. Perform a clean installation of v1.4.1 using the ansible scripts.
2. Set openshift_release=v1.5.0 in the /etc/ansible/hosts file.
3. Execute ansible-playbook ~/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_5/upgrade.yml

Expected Results

No errors and a successful upgrade

Observed Results

The following error occurs.

TASK [fail] ********************************************************************
fatal: [openshift-master-1.redacted.com]: FAILED! => {
    "changed": false,
    "failed": true
}

MSG:

openshift_release is 1.5.0 which is not a valid release for a 1.5 upgrade

This error seems to occur because the scripts check the version number and do not recognize this value. I tried a different value, described next.

Steps To Reproduce (Attempt 2 of 2)

1. Perform a clean installation of v1.4.1 using the ansible scripts.
2. Set openshift_release=v1.5 in the /etc/ansible/hosts file.
3. Execute ansible-playbook ~/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_5/upgrade.yml

Expected Results

No errors and a successful upgrade

Observed Results

The following error occurs.

TASK [openshift_version : Set precise containerized version to configure if openshift_release specified] ***
fatal: [openshift-master-1.redacted.com]: FAILED! => {
    "changed": true,
    "cmd": [
        "docker",
        "run",
        "--rm",
        "openshift/origin:v1.5",
        "version"
    ],
    "delta": "0:00:00.542070",
    "end": "2017-04-25 17:16:17.337285",
    "failed": true,
    "rc": 125,
    "start": "2017-04-25 17:16:16.795215",
    "warnings": []
}

STDERR:

Unable to find image 'openshift/origin:v1.5' locally
Trying to pull repository docker.io/openshift/origin ...
/usr/bin/docker-current: manifest unknown: manifest unknown.
See '/usr/bin/docker-current run --help'.

This error seems to occur because no release of origin is tagged v1.5. There is a release tagged v1.5.0, but the scripts don't try to pull that one.
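To illustrate the mismatch, here is a minimal Python sketch of how a short release such as v1.5 could be resolved to a concrete image tag when only patch-level tags exist. This is not what the playbooks do; the function name resolve_tag and the tag list are hypothetical, and the list contains only the two versions mentioned in this report.

```python
import re

# Only plain numeric tags such as v1.5.0 are considered; anything else is skipped.
NUMERIC_TAG = re.compile(r"v\d+(\.\d+)*")


def resolve_tag(release, available_tags):
    """Return the newest tag that equals `release` or extends it with
    further numeric components, or None if nothing matches."""

    def version_key(tag):
        # "v1.5.0" -> (1, 5, 0), so tags compare numerically, not as strings.
        return tuple(int(part) for part in tag[1:].split("."))

    candidates = [
        tag
        for tag in available_tags
        if NUMERIC_TAG.fullmatch(tag)
        and (tag == release or tag.startswith(release + "."))
    ]
    return max(candidates, key=version_key) if candidates else None


# Illustrative tag list (an assumption, not a registry query).
tags = ["v1.4.1", "v1.5.0"]
resolved = resolve_tag("v1.5", tags)
```

With this list, an exact lookup of v1.5 finds nothing, while the prefix match resolves it to v1.5.0.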

Additional Information

# cat /etc/redhat-release
CentOS Linux release 7.3.1611 (Core)

Contents of /etc/ansible/hosts file:

[OSEv3:children]
masters
nodes
etcd
lb

[OSEv3:vars]
ansible_ssh_user=root
deployment_type=origin
openshift_release=v1.5.0
openshift_master_identity_providers=[{'name': 'redacted_ldap', 'mappingMethod': 'add', 'challenge': 'true', 'login': 'true', 'kind': 'LDAPPasswordIdentityProvider', 'attributes': {'id': ['dn'], 'email': ['mail'], 'name': ['cn'], 'preferredUsername': ['sAMAccountName']}, 'bindDN': 'CN=LDAP OpenShift,OU=Service_Accounts,DC=redacted,DC=com', 'bindPassword': 'redacted', 'ca': 'redacted-ca.crt', 'insecure': 'false', 'url': 'ldaps://ldap.redacted.com:636/DC=redacted,DC=com?sAMAccountName?sub?(&(objectclass=user)(memberOf:1.2.840.113556.1.4.1941:=CN=dlg_openshift,OU=openshift,OU=Resources,DC=redacted,DC=com))'}]
openshift_master_ldap_ca_file=/root/redacted-ca.crt
openshift_master_cluster_method=native
openshift_master_cluster_hostname=openshift.redacted.com
openshift_master_cluster_public_hostname=openshift.redacted.com
openshift_master_default_subdomain=os.redacted.com
openshift_hosted_router_certificate={"certfile": "/root/router.crt", "keyfile": "/root/router.key", "cafile": "/root/redacted-ca.crt"}
openshift_master_named_certificates=[{"certfile": "/root/openshift.redacted.com.crt", "keyfile": "/root/openshift.redacted.com.key", "cafile": "/root/redacted-ca.crt"}]
openshift_master_api_port=443
openshift_master_console_port=443

[masters]
openshift-master-[1:3].redacted.com

[etcd]
openshift-master-[1:3].redacted.com

[lb]
openshift.redacted.com

[nodes]
openshift-master-[1:3].redacted.com
openshift-node-[1:2].redacted.com openshift_node_labels="{'region': 'infra', 'zone': 'default'}"
openshift-node-[3:4].redacted.com openshift_node_labels="{'region': 'primary', 'zone': 'east'}"

lifecycle/rotten

Source

ceagan

👍1

Most helpful comment

After digging into the scripts, it looks like a possible workaround is to specify the openshift_version variable explicitly in addition to openshift_release. The scripts appear to try to detect the exact version, but fail because there is no v1.5 Docker image tag.

I was able to perform an upgrade in my test environment using the following settings. Note that the v prefix on openshift_release is important, as is the absence of the v prefix on openshift_version.

openshift_release=v1.5
openshift_version=1.5.0
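As a rough sketch of the prefix rule above, the following Python helper derives both inventory values from one full version string. The function name inventory_vars is hypothetical and not part of openshift-ansible; it only encodes the pairing that worked in this test environment.

```python
import re


def inventory_vars(full_version):
    """Split a full version such as '1.5.0' or 'v1.5.0' into the two
    inventory values used in the workaround."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", full_version.strip())
    if match is None:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {full_version!r}")
    major, minor, patch = match.groups()
    return {
        # Short release, with the v prefix.
        "openshift_release": f"v{major}.{minor}",
        # Exact version, without the v prefix.
        "openshift_version": f"{major}.{minor}.{patch}",
    }


settings = inventory_vars("v1.5.0")
```

Each key/value pair in settings can then be written as a key=value line under [OSEv3:vars] in the inventory.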

In case this wasn't clear in the report, my test setup uses CentOS Atomic for all of the masters and nodes, but uses CentOS 7 for the ansible installer and the load balancer.

ceagan on 28 Apr 2017

👍4

All 6 comments

The same thing happened to me on a fresh install onto redhat-atomic hosts. Thanks for leaving this here, @ceagan. It doesn't look like the maintainers have triaged this yet.

natebc on 18 Jul 2017

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot on 17 May 2020

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-bot on 16 Jun 2020

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-bot on 16 Jul 2020

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.