openshift-ansible: Unable to create OpenShift 3.7 from byo/config.yml due to mixed versions of RPMs

Created on 19 Mar 2018 · 38 comments · Source: openshift/openshift-ansible

Description

As of today (19/03), I am unable to create a non-containerized OpenShift Origin cluster using the byo/config.yml playbook.
The playbook is run against a clean CentOS 7.4 VM.

The best high-level explanation I can come up with is that, with the arrival of the 3.7.1 RPMs in http://mirror.centos.org/centos/7/paas/x86_64/openshift-origin37/ (which seem to have landed today), the playbook now fails because it mixes different RPM versions.

Version
ansible version: ansible 2.4.3.0
git describe: openshift-ansible-3.7.39-1
Steps To Reproduce
  1. ansible-playbook -i inventory openshift-ansible/playbooks/byo/config.yml -e openshift_node=masters
Expected Results

Up until yesterday (when only 3.7.0 RPMs were present in the CentOS repo), I was able to create the cluster without issues.

Observed Results
INSTALLER STATUS ***************************************************************************************************************************
Initialization             : Complete
Health Check               : Complete
etcd Install               : Complete
Master Install             : Complete
Master Additional Install  : Complete
Node Install               : In Progress
    This phase can be restarted by running: playbooks/byo/openshift-node/config.yml



Failure summary:


  1. Hosts:    192.168.99.50
     Play:     Configure nodes
     Task:     Install sdn-ovs package
     Message:  Error: Package: origin-sdn-ovs-3.7.0-1.0.7ed6862.x86_64 (centos-openshift-origin37)
                          Requires: origin-node = 3.7.0-1.0.7ed6862
                          Installed: origin-node-3.7.1-1.el7.git.0.0a2d6a1.x86_64 (@centos-openshift-origin37)
                              origin-node = 3.7.1-1.el7.git.0.0a2d6a1
                          Available: origin-node-3.7.0-1.0.7ed6862.x86_64 (centos-openshift-origin37)
                              origin-node = 3.7.0-1.0.7ed6862

Additional Information
  • Operating System: CentOS Linux release 7.4.1708 (Core)
  • Inventory file:
[OSEv3:children]
masters
nodes
etcd

[OSEv3:vars]
ansible_user=root

public_ip_address = 192.168.99.50
host_key_checking = False

containerized = false

openshift_release=v3.7
openshift_pkg_version=-3.7.0

openshift_deployment_type=origin

openshift_hostname=192.168.99.50
openshift_master_cluster_public_hostname=192.168.99.50
openshift_master_default_subdomain=192.168.99.50.nip.io
openshift_master_unsupported_embedded_etcd=true

openshift_disable_check = docker_storage,memory_availability,disk_availability,docker_image_availability,package_version

openshift_enable_service_catalog=false

ansible_python_interpreter=/usr/bin/python

[masters]
192.168.99.50 openshift_public_hostname=192.168.99.50 openshift_ip=192.168.99.50

[etcd]
192.168.99.50 openshift_ip=192.168.99.50

[nodes]
192.168.99.50 openshift_node_labels="{'region':'infra','zone':'default'}" openshift_public_hostname=192.168.99.50 openshift_schedulable=true openshift_ip=192.168.99.50


It should be noted that I also tried configuring the playbook to use 3.7.1 RPMs by setting:

openshift_release=v3.7.1
openshift_pkg_version=-3.7.1
openshift_image_tag=v3.7.1

Unfortunately, in that case I hit a different problem even earlier in the installation process.
The specific error was:

TASK [openshift_master_facts : Set Default scheduler predicates and priorities] ************************************************************
fatal: [192.168.99.50]: FAILED! => {"msg": "An unhandled exception occurred while running the lookup plugin 'openshift_master_facts_default_predicates'. Error was a <class 'ansible.errors.AnsibleError'>, original message: Unknown short_version 3.10"}

Please let me know if I can provide any further information.

Thank you


I also hit exactly the same problem today.
Key things were:

  1. need to add this property to get things moving:
    openshift_pkg_version=-3.7.1

  2. Once set hit the Unknown short_version 3.10 error.

Seems something got badly broken.

This was using release-3.7 branch of openshift-ansible.

Two remarks:

  • Documentation should explain, for both containerized and non-containerized environments, how RPM packages are resolved and, if not already covered, where they are downloaded from (mirror servers for CentOS, Fedora, RHEL)
  • Version, package_version, release and image_tag should be better documented, including the format in which each value should be passed. Some examples are needed!

Unknown short_version 3.10 error. Here are the details:

TASK [openshift_master_facts : Set Default scheduler predicates and priorities] **********************************************************************************************************************************************************************
task path: /Users/dabou/Code/rhoar/cloud-native/infra/ansible/openshift-ansible/roles/openshift_master_facts/tasks/main.yml:110
fatal: [192.168.99.50]: FAILED! => {
    "msg": "An unhandled exception occurred while running the lookup plugin 'openshift_master_facts_default_predicates'. Error was a <class 'ansible.errors.AnsibleError'>, original message: Unknown short_version 3.10"
}

PLAY RECAP *******************************************************************************************************************************************************************************************************************************************
192.168.99.50              : ok=108  changed=5    unreachable=0    failed=1   
localhost                  : ok=11   changed=0    unreachable=0    failed=0   


INSTALLER STATUS *************************************************************************************************************************************************************************************************************************************
Initialization             : Complete
Health Check               : Complete
etcd Install               : Complete
Master Install             : In Progress
        This phase can be restarted by running: playbooks/byo/openshift-master/config.yml



Failure summary:


  1. Hosts:    192.168.99.50
     Play:     Create OpenShift certificates for master hosts
     Task:     Set Default scheduler predicates and priorities
     Message:  An unhandled exception occurred while running the lookup plugin 'openshift_master_facts_default_predicates'. Error was a <class 'ansible.errors.AnsibleError'>, original message: Unknown short_version 3.10

Full debugging output would be helpful. The role openshift_version is responsible for setting openshift_version among other version-related variables. openshift.common.short_version is set by openshift_facts.

Also, can you post the value of openshift.common.short_version from the file /etc/ansible/facts.d/openshift.fact here?
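For anyone wanting to pull that single value out rather than paste the whole file, a minimal sketch follows. It assumes the fact file is plain JSON with a top-level "common" object (as in the dump posted in this thread) and that it lives in Ansible's default local-facts directory; the helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional

# Ansible's default directory for local (custom) facts; file name taken from this thread.
FACT_FILE = Path("/etc/ansible/facts.d/openshift.fact")


def short_version_from_fact_text(text: str) -> Optional[str]:
    """Return common.short_version from openshift.fact JSON text, or None if it is unset."""
    facts = json.loads(text)
    return facts.get("common", {}).get("short_version")


def read_short_version(path: Path = FACT_FILE) -> Optional[str]:
    """Read the fact file from disk and extract common.short_version."""
    return short_version_from_fact_text(path.read_text())
```

On a host, read_short_version() would return a string such as "3.10", or None if the installer has not written that key yet.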

I see this in the facts for the master node:
"short_version": "3.10"

@michaelgugino You can see the full output here

@michaelgugino The 3.10 version only shows up when using 3.7.1. When using 3.7.0, the error is:

Message:  Error: Package: origin-sdn-ovs-3.7.0-1.0.7ed6862.x86_64 (centos-openshift-origin37)
                          Requires: origin-node = 3.7.0-1.0.7ed6862
                          Installed: origin-node-3.7.1-1.el7.git.0.0a2d6a1.x86_64 (@centos-openshift-origin37)
                              origin-node = 3.7.1-1.el7.git.0.0a2d6a1
                          Available: origin-node-3.7.0-1.0.7ed6862.x86_64 (centos-openshift-origin37)
                              origin-node = 3.7.0-1.0.7ed6862

The output from /etc/ansible/facts.d/openshift.fact is

{
  "node": {
    "schedulable": "true",
    "labels": {
      "region": "infra",
      "zone": "default"
    },
    "dns_ip": "10.0.3.15",
    "proxy_mode": "iptables",
    "kubelet_args": {
      "pods-per-core": [
        "20"
      ]
    }
  },
  "builddefaults": {
    "config": {
      "BuildDefaults": {
        "configuration": {
          "apiVersion": "v1",
          "kind": "BuildDefaultsConfig",
          "env": [
            {
              "name": "HTTP_PROXY",
              "value": ""
            },
            {
              "name": "HTTPS_PROXY",
              "value": ""
            },
            {
              "name": "NO_PROXY",
              "value": ""
            },
            {
              "name": "http_proxy",
              "value": ""
            },
            {
              "name": "https_proxy",
              "value": ""
            },
            {
              "name": "no_proxy",
              "value": ""
            }
          ],
          "resources": {
            "requests": {},
            "limits": {}
          }
        }
      }
    }
  },
  "logging": {
    "elasticsearch": {
      "pvc": {},
      "ops": {
        "pvc": {}
      }
    }
  },
  "cloudprovider": {},
  "master": {
    "admission_plugin_config": {
      "openshift.io/ImagePolicy": {
        "configuration": {
          "kind": "ImagePolicyConfig",
          "executionRules": [
            {
              "skipOnResolutionFailure": true,
              "matchImageAnnotations": [
                {
                  "key": "images.openshift.io/deny-execution",
                  "value": "true"
                }
              ],
              "reject": true,
              "name": "execution-denied",
              "onResources": [
                {
                  "resource": "pods"
                },
                {
                  "resource": "builds"
                }
              ]
            }
          ],
          "apiVersion": "v1"
        }
      }
    },
    "named_certificates": [],
    "cluster_public_hostname": "192.168.99.50",
    "identity_providers": [
      {
        "name": "htpasswd_auth",
        "login": "true",
        "challenge": "true",
        "kind": "HTPasswdPasswordIdentityProvider",
        "filename": "/etc/origin/master/htpasswd"
      }
    ],
    "etcd_hosts": [
      "192.168.99.50"
    ],
    "manage_htpasswd": true,
    "session_secrets_file": "/etc/origin/master/session-secrets.yaml",
    "master_count": "1",
    "cluster_method": "native",
    "etcd_port": "2379",
    "session_encryption_secrets": [
      "EDOyxF7Yn3THeN4Dl1agIv4iVb2blCs+"
    ],
    "ha": false,
    "htpasswd_users": {
      "admin": "$apr1$DloeoaY3$nqbN9fQBkyXgbj58buqEM."
    },
    "session_auth_secrets": [
      "EDOyxF7Yn3THeN4Dl1agIv4iVb2blCs+"
    ]
  },
  "common": {
    "etcd_runtime": "host",
    "is_etcd_system_container": false,
    "ip": "192.168.99.50",
    "hostname": "192.168.99.50",
    "deployment_subtype": "basic",
    "is_master_system_container": false,
    "is_containerized": false,
    "is_node_system_container": false,
    "system_images_registry": "docker.io",
    "generate_no_proxy_hosts": true,
    "is_openvswitch_system_container": false,
    "no_proxy_etcd_host_ips": "192.168.99.50",
    "public_hostname": "192.168.99.50",
    "deployment_type": "origin"
  },
  "etcd": {},
  "docker": {
    "hosted_registry_network": "172.30.0.0/16",
    "use_crio": false,
    "hosted_registry_insecure": false,
    "use_system_container": false
  },
  "buildoverrides": {
    "config": {
      "BuildOverrides": {
        "configuration": {
          "kind": "BuildOverridesConfig",
          "apiVersion": "v1"
        }
      }
    }
  }
}

From what I can see, the packages for v3.7.1-1 actually contain binaries for v3.10.0-alpha.0.

https://bugs.centos.org/view.php?id=14594

@jfchevrette Thank you for the update.

Actually this issue isn't really about 3.7.1, but rather about 3.7.0.
The problems with 3.7.1 were mentioned just to give the complete picture:

I had a problem with 3.7.0 (which is what all the debugging output is from) -> I tried 3.7.1 to see if I can get around it -> No luck, blocked on both 3.7.0 and 3.7.1

Also unable to install. Can confirm the RPM has the wrong versioned binaries in it:

# rpm -q origin
origin-3.7.1-1.el7.git.0.0a2d6a1.x86_64

# origin version
origin v3.10.0-alpha.0+0a2d6a1-65
kubernetes v1.9.1+a0ce1bc657
etcd 3.2.16
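To spot this kind of mismatch without eyeballing, one could compare the RPM's VERSION (for example from rpm -q --qf '%{VERSION}' origin) with the first line printed by origin version. The sketch below assumes that line has the form "origin vX.Y.Z..." (or "openshift v..." for the openshift binary, as quoted in a later comment) and treats the first two dot-separated components as the short version. Both assumptions are modeled on the output above, and the helper names are hypothetical.

```python
import re
from typing import Optional

# Matches e.g. "origin v3.10.0-alpha.0+0a2d6a1-65"; the build suffix after '+' is dropped.
_BINARY_LINE = re.compile(r"^(?:origin|openshift) v(\d+\.\d+\.\d+[^\s+]*)", re.MULTILINE)


def binary_version(version_output: str) -> Optional[str]:
    """Extract the version reported by the origin/openshift binary, or None."""
    m = _BINARY_LINE.search(version_output)
    return m.group(1) if m else None


def short_version(version: str) -> str:
    """First two dot-separated components: '3.10.0-alpha.0' -> '3.10'."""
    return ".".join(version.split(".")[:2])


def rpm_matches_binary(rpm_version: str, version_output: str) -> bool:
    """True if the RPM's VERSION and the binary's reported version share a short version."""
    found = binary_version(version_output)
    return found is not None and short_version(found) == short_version(rpm_version)
```

With the output above, rpm_matches_binary("3.7.1", ...) returns False, because the binary reports 3.10.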

There does not appear to be any simple workaround, as the yum repo does not contain any alternate build of the 3.7.1 package:

# yum --showduplicates list origin
Installed Packages
origin.x86_64                         3.7.1-1.el7.git.0.0a2d6a1                          @centos-openshift-origin37
Available Packages
origin.x86_64                         3.7.0-1.0.7ed6862                                  centos-openshift-origin37 
origin.x86_64                         3.7.1-1.el7.git.0.0a2d6a1                          centos-openshift-origin37 

I tried debugging this issue further and I found the following information that looks interesting to me (forgive me if it's a totally wrong conclusion, since I'm by no means a yum/rpm expert):

When I run:

repoquery --requires --resolve origin-node-3.7.0

I get the following output:

ethtool-2:4.8-1.el7.x86_64
origin-0:3.7.0-1.0.7ed6862.x86_64
bash-0:4.2.46-29.el7_4.x86_64
util-linux-0:2.23.2-43.el7_4.2.x86_64
docker-2:1.12.6-48.git0fdc778.el7.centos.x86_64
util-linux-0:2.23.2-43.el7_4.2.i686
conntrack-tools-0:1.4.4-3.el7_3.x86_64
tuned-profiles-origin-node-0:3.7.0-1.0.7ed6862.x86_64
systemd-0:219-42.el7_4.10.x86_64
nfs-utils-1:1.3.0-0.48.el7.x86_64
origin-node-0:3.7.1-1.el7.git.0.0a2d6a1.x86_64
device-mapper-persistent-data-0:0.7.0-0.1.rc6.el7.x86_64
socat-0:1.7.3.2-2.el7.x86_64

I am very surprised to see origin-node-0:3.7.1-1.el7.git.0.0a2d6a1.x86_64 in the output above and I am guessing that it's causing all the problems.

Can someone with more knowledge please take a look?

Thanks

I'm definitely having the same issue. I was able to build a new cluster from scratch on 14 March (5 days ago); the only extra step was adding openshift_disable_check=package_version because of a Docker release. Today I tried to build another cluster and kept receiving an error saying that my version didn't match the latest detected version, 3.10.
Example:

You requested openshift_release 3.7, which is not matched by
the latest OpenShift RPM we detected as origin-3.10.0
on host master-1-openshift-test.isc.local.
We will only install the latest RPMs, so please ensure you are getting the release
you expect. You may need to adjust your Ansible inventory, modify the repositories
available on the host, or run the appropriate OpenShift upgrade playbook

This check was failing the deployment because openshift version reported:

openshift v3.10.0-alpha.0+0a2d6a1-65
kubernetes v1.9.1+a0ce1bc657
etcd 3.2.16
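The message above boils down to checking whether the detected RPM version belongs to the requested release. The following is a minimal sketch of such a check, not the installer's actual code. It assumes only the leading numeric X.Y[.Z] components matter, and the helper names are hypothetical. Comparing integer components rather than string prefixes matters here, since "3.10.0" starts with the string "3.1".

```python
import re
from typing import Tuple

# Optional leading 'v', then dot-separated integers; anything after ('-alpha.0', '-2.el7') is ignored.
_LEADING = re.compile(r"v?(\d+(?:\.\d+)*)")


def release_tuple(value: str) -> Tuple[int, ...]:
    """'v3.7' -> (3, 7); '3.10.0-alpha.0' -> (3, 10, 0)."""
    m = _LEADING.match(value.strip())
    if not m:
        raise ValueError(f"no numeric version in {value!r}")
    return tuple(int(part) for part in m.group(1).split("."))


def release_matches(requested: str, detected: str) -> bool:
    """True if detected begins with the same numeric components as requested."""
    want = release_tuple(requested)
    return release_tuple(detected)[: len(want)] == want
```

For example, release_matches("v3.7", "3.7.1") is True, while release_matches("3.7", "3.10.0") is False, which is the situation reported in the message.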

I had already manually verified the origin version to be origin.x86_64 0:3.7.1-1.el7.git.0.0a2d6a1. I also verified that the openshift binary being run was provided by origin.x86_64 0:3.7.1-1.el7.git.0.0a2d6a1 as well, and that everything else that rpm provided was also part of the 3.10-alpha version (or at least tagged as such).

To beat a dead horse, it is definitely related to the RPM. Version 3.7.1 was released from origin on 16 January 2018 (proof), but the only available RPM for that specific build of 3.7 is from 05 March 2018 and can be seen here. The catch is that its binaries report themselves as 3.10-alpha. A rebuild is probably the best path forward, but I don't know how often RPMs get built, or whether they are part of a separate project.

There was a problem with the released CentOS 3.7.1 packages.
They have been rebuilt, and passed our initial tests. They should be in the released centos repositories in a few days.
Thank you for your patience with the dead horse.

Just wanted to add, I'm also impacted by this. I've not found a temporary workaround, but perhaps I'm not setting the correct parameters in my inventory file.

Does anyone know a (temporary) workaround?

I think at this point your only option is to build a custom repo with the 3.7.1-2 RPMs (which reportedly have fixed the issue) from either https://buildlogs.centos.org/centos/7/paas/x86_64/openshift-origin/ or http://cbs.centos.org/kojifiles/packages/origin/3.7.1/2.el7/x86_64/

You can just dump the RPMs into a local directory, use createrepo on that directory, and then add it to /etc/yum.repos.d.
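For reference, here is a minimal sketch of the .repo definition for such a directory, generated with Python's configparser. The repo id and the /opt/origin-3.7.1-2 path are hypothetical. The directory is assumed to have already been indexed with createrepo, and gpgcheck is disabled only to keep the sketch short.

```python
import configparser
import io


def render_local_repo(repo_id: str, rpm_dir: str, name: str = "") -> str:
    """Render a yum .repo stanza pointing at a local directory indexed with createrepo."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg[repo_id] = {
        "name": name or repo_id,
        "baseurl": f"file://{rpm_dir}",
        "enabled": "1",
        # Disabled for brevity; enable it if the signing key is available on the host.
        "gpgcheck": "0",
    }
    buf = io.StringIO()
    cfg.write(buf, space_around_delimiters=False)
    return buf.getvalue()
```

The string returned by render_local_repo("origin-3.7.1-2-local", "/opt/origin-3.7.1-2") would then be saved as a file under /etc/yum.repos.d/ (the file name is up to you).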

I tried to add a node to an OSO 3.7.0 cluster with openshift_pkg_version=-3.7.0 set, and also encountered problems.
I figured out my problem with the http://mirror.centos.org/centos/7/paas/x86_64/openshift-origin/ repo:

yum deplist origin-node-3.7.0

Loaded plugins: langpacks, product-id
package: origin-node.x86_64 3.7.0-1.0.7ed6862
[...]
  dependency: tuned-profiles-origin-node = 3.7.0-1.0.7ed6862
   provider: tuned-profiles-origin-node.x86_64 3.7.0-1.0.7ed6862
   provider: origin-node.x86_64 3.7.1-1.el7.git.0.0a2d6a1
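The second provider line is explained by an RPM rule: a Provides without a version satisfies any versioned Requires on the same name, while a versioned Provides must match the requested version. The sketch below models just that rule with equality-only requirements. Whether the broken origin-node-3.7.1-1 build really carries an unversioned tuned-profiles-origin-node provide is an assumption here. The yum error quoted later in this thread, which lists a bare tuned-profiles-origin-node under that build, is consistent with it, and rpm -qp --provides on the package would confirm it. All names in the code are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Dep:
    name: str
    evr: Optional[str] = None  # None means unversioned


@dataclass
class Package:
    nevr: str
    provides: List[Dep]


def satisfies(provide: Dep, require: Dep) -> bool:
    """Equality-only model: if either side is unversioned, any version of the name matches."""
    if provide.name != require.name:
        return False
    if provide.evr is None or require.evr is None:
        return True
    return provide.evr == require.evr


def providers(require: Dep, packages: List[Package]) -> List[str]:
    """Packages with at least one Provides that satisfies the requirement."""
    return [p.nevr for p in packages if any(satisfies(d, require) for d in p.provides)]


PACKAGES = [
    Package("tuned-profiles-origin-node-3.7.0-1.0.7ed6862",
            [Dep("tuned-profiles-origin-node", "3.7.0-1.0.7ed6862")]),
    # ASSUMPTION: the broken build provides tuned-profiles-origin-node without a version.
    Package("origin-node-3.7.1-1.el7.git.0.0a2d6a1",
            [Dep("origin-node", "3.7.1-1.el7.git.0.0a2d6a1"), Dep("tuned-profiles-origin-node")]),
]
```

Real RPM compares versions with its own ordering and also supports <, >= and similar operators. Even so, the model reproduces both symptoms. Requiring tuned-profiles-origin-node = 3.7.0-1.0.7ed6862 lists both packages, exactly as in the deplist output. Requiring origin-node = 3.7.0-1.0.7ed6862, as origin-sdn-ovs does in the original failure, is not satisfied by the installed 3.7.1-1 build.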

You can successfully install the cluster by running the following commands before starting the installation:

wget http://cbs.centos.org/kojifiles/packages/origin/3.7.1/2.el7/x86_64/origin-3.7.1-2.el7.x86_64.rpm
wget http://cbs.centos.org/kojifiles/packages/origin/3.7.1/2.el7/x86_64/origin-clients-3.7.1-2.el7.x86_64.rpm
wget http://cbs.centos.org/kojifiles/packages/origin/3.7.1/2.el7/x86_64/origin-master-3.7.1-2.el7.x86_64.rpm
wget http://cbs.centos.org/kojifiles/packages/origin/3.7.1/2.el7/x86_64/origin-node-3.7.1-2.el7.x86_64.rpm
wget http://cbs.centos.org/kojifiles/packages/origin/3.7.1/2.el7/x86_64/tuned-profiles-origin-node-3.7.1-2.el7.x86_64.rpm
wget http://cbs.centos.org/kojifiles/packages/origin/3.7.1/2.el7/x86_64/origin-sdn-ovs-3.7.1-2.el7.x86_64.rpm
wget http://cbs.centos.org/kojifiles/packages/origin/3.7.1/2.el7/x86_64/origin-service-catalog-3.7.1-2.el7.x86_64.rpm
wget http://cbs.centos.org/kojifiles/packages/origin/3.7.1/2.el7/x86_64/origin-template-service-broker-3.7.1-2.el7.x86_64.rpm
wget http://cbs.centos.org/kojifiles/packages/origin/3.7.1/2.el7/x86_64/origin-dockerregistry-3.7.1-2.el7.x86_64.rpm
yum install *.rpm
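Afterwards, one way to confirm that nothing is mixed is to list every installed origin* package with its version-release, for example with rpm -qa --qf '%{NAME} %{VERSION}-%{RELEASE}\n' 'origin*', and check that only one version-release appears. The pattern does not cover tuned-profiles-origin-node, which can be queried the same way. A minimal sketch of that check over the command's output follows; the function names and the sample input are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_version_release(rpm_qa_output: str) -> Dict[str, List[str]]:
    """Map each VERSION-RELEASE to the package names installed at it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in rpm_qa_output.splitlines():
        line = line.strip()
        if not line:
            continue
        name, version_release = line.split(None, 1)
        groups[version_release].append(name)
    return dict(groups)


def is_mixed(rpm_qa_output: str) -> bool:
    """True if the listed packages span more than one version-release."""
    return len(group_by_version_release(rpm_qa_output)) > 1
```

If is_mixed() is true, the grouping shows which packages need to be brought in line.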

Grr. I've been hit by something related to this issue.

You requested openshift_release 3.7.1, which is not matched by
the latest OpenShift RPM we detected as origin-3.10.0
on host xxxxxx.
We will only install the latest RPMs, so please ensure you are getting the release
you expect. You may need to adjust your Ansible inventory, modify the repositories
available on the host, or run the appropriate OpenShift upgrade playbook.

So, does @adawolfs's temporary workaround work, or is it best to wait for the CentOS 3.7.1 packages to hit the released CentOS repositories? Other than repeatedly trying, is there a way to know when the RPMs are in the released CentOS repositories?

@nemonik you can just watch this site

Even if the CentOS, Fedora or RHEL repos fix this dependency mismatch (RPMs downloaded for 3.7, 3.10, ...), these action items are still required:

  • Improve the docs to explain, for both containerized and non-containerized environments, how RPM packages are resolved and, if not already covered, where they are downloaded from (mirror servers for CentOS, Fedora, RHEL)
  • Better document the different variables, such as openshift_version, openshift_pkg_version, openshift_release and openshift_image_tag, and include real examples!
  • Review the version calculation logic of the playbook, as I don't think the current code handles minor versions correctly. Example:

    • origin-3.7.0-1.0.7ed6862.x86_64.rpm

    • origin-3.7.1-1.el7.git.0.0a2d6a1.x86_64.rpm

    • origin-3.7.1-2.el7.x86_64.rpm
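Part of what makes these three builds tricky is that RPM does not compare versions as plain strings. Below is a simplified Python re-implementation of RPM's segment-by-segment ordering (rpmvercmp). It ignores epochs and the '~'/'^' operators, and is meant to illustrate how the builds listed above sort, not to reproduce the playbook's logic.

```python
import re
from functools import cmp_to_key

# RPM splits version strings into runs of digits or letters; other characters are separators.
_SEGMENT = re.compile(r"[0-9]+|[A-Za-z]+")


def rpmvercmp(a: str, b: str) -> int:
    """Simplified RPM version-string comparison returning -1, 0 or 1 (no '~'/'^' handling)."""
    if a == b:
        return 0
    sa, sb = _SEGMENT.findall(a), _SEGMENT.findall(b)
    for x, y in zip(sa, sb):
        x_num, y_num = x.isdigit(), y.isdigit()
        if x_num != y_num:
            # A numeric segment always sorts newer than an alphabetic one.
            return 1 if x_num else -1
        if x_num:
            xi, yi = int(x), int(y)
            if xi != yi:
                return 1 if xi > yi else -1
        elif x != y:
            return 1 if x > y else -1
    # All common segments are equal: the string with segments left over is newer.
    return (len(sa) > len(sb)) - (len(sa) < len(sb))


def compare_vr(a: str, b: str) -> int:
    """Compare 'VERSION-RELEASE' strings: version first, then release (epochs ignored)."""
    va, _, ra = a.partition("-")
    vb, _, rb = b.partition("-")
    return rpmvercmp(va, vb) or rpmvercmp(ra, rb)


BUILDS = ["3.7.1-2.el7", "3.7.0-1.0.7ed6862", "3.7.1-1.el7.git.0.0a2d6a1"]
```

Sorting BUILDS with sorted(BUILDS, key=cmp_to_key(compare_vr)) puts 3.7.1-2.el7 last (newest) and 3.7.0-1.0.7ed6862 first. Numeric segments also explain why 3.10.0 sorts above 3.7.1, even though the string "3.10" is smaller than "3.7".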


The updated origin-3.7.1-2.el7 is now available in all the regular repositories. This problem should be fixed now.

Hi @tdawson,
Thanks for the update.
So I should just run Ansible from the master now?

You should be able to, yes.

Hi tdawson,
Thanks for the prompt response, much appreciated.
I am running ansible 3_7 and getting the following error: "changed": false, "msg": "OCP rpm version 3.6.1 is different from OCP image version 3.6.0"
I think it's the same issue.
I'm trying to upgrade from 3.6 to 3.7.

The dependency for origin-node-3.7.0 is still broken.

yum deplist origin-node-3.7.0

  dependency: tuned-profiles-origin-node = 3.7.0-1.0.7ed6862
   provider: tuned-profiles-origin-node.x86_64 3.7.0-1.0.7ed6862
   provider: origin-node.x86_64 3.7.1-1.el7.git.0.0a2d6a1

yum deplist origin-node-3.7.1 🆗

  dependency: tuned-profiles-origin-node = 3.7.1-2.el7
   provider: tuned-profiles-origin-node.x86_64 3.7.1-2.el7
   provider: origin-node.x86_64 3.7.1-1.el7.git.0.0a2d6a1

Seeing this error when upgrading from 3.6 to 3.7, with 3.7 set in the inventory:

Message: Error: Package: origin-node-3.7.1-2.el7.x86_64 (centos-openshift-origin37)
             Requires: tuned-profiles-origin-node = 3.7.1-2.el7
             Available: origin-node-3.7.1-1.el7.git.0.0a2d6a1.x86_64 (centos-openshift-origin37)
                 tuned-profiles-origin-node
             Available: origin-node-3.7.0-1.0.7ed6862.x86_64 (centos-openshift-origin37)
                 Not found
             Installing: origin-node-3.7.1-2.el7.x86_64 (centos-openshift-origin37)
                 Not found

  1. Hosts:
    Play: push ca serial file to all masters
    Task: push ca.serial.txt
    Message: Could not find or access '/tmp/ca.serial.txt'

I've not been able to get the 3.6 to 3.7 upgrade to run with this config. I've set everything as requested, but the repos still don't show all the right versions. Is there a documented process to get this fixed? I've done a git checkout against a tag and a release branch that I know worked up until recently.

I still get one of several errors: the 3.10 error, a Docker package version error, or tuned-profiles-origin-node (3.7.2) not being available.

I've tried the workaround (installed all 3.7.2 packages from cbs.centos) but still have had issues.

Please, I need a stable version that is known to work. Does anyone have one for CentOS 7?

Yes, this problem was fixed a couple of weeks ago and installs on Centos7 now work fine.
What we do is just specify this option in the ansible inventory file:
openshift_release=v3.7
No need to specify the openshift_image_tag or openshift_pkg_version properties.
And be on the release-3.7 branch of the openshift-ansible github repo.

I found that specifying:

openshift_pkg_version=-3.7.1-2.el7

Yields a working deployment on CentOS 7
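Judging by its leading dash, openshift_pkg_version appears to be appended directly to package names, so -3.7.1-2.el7 yields a spec such as origin-3.7.1-2.el7. The rough sketch below assumes yum matches a spec against name, name-version or name-version-release; real yum also accepts arch, epoch and glob forms. Under that model, -3.7.1 matches both 3.7.1 builds and leaves yum to pick the newest one, whereas -3.7.1-2.el7 pins exactly one build. The helper names are hypothetical.

```python
from typing import List, NamedTuple


class Build(NamedTuple):
    name: str
    version: str
    release: str


# The three origin builds mentioned in this thread.
AVAILABLE = [
    Build("origin", "3.7.0", "1.0.7ed6862"),
    Build("origin", "3.7.1", "1.el7.git.0.0a2d6a1"),
    Build("origin", "3.7.1", "2.el7"),
]


def package_spec(name: str, pkg_version: str) -> str:
    """Hypothetical: append an openshift_pkg_version value such as '-3.7.1' to a package name."""
    return name + pkg_version


def matching_builds(spec: str, builds: List[Build]) -> List[Build]:
    """Builds whose name, name-version or name-version-release equals spec (no arch/epoch/globs)."""
    return [
        b for b in builds
        if spec in (b.name, f"{b.name}-{b.version}", f"{b.name}-{b.version}-{b.release}")
    ]
```

Under this model, leaving openshift_pkg_version unset (an empty suffix) matches all three builds, which is why the repo contents alone decide what gets installed.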

I had an issue running the scale-up playbook when adding 3 nodes: it installed origin-node-3.7.1-1.el7.git.0.0a2d6a1.x86_64.rpm and then failed when trying to install origin-sdn-ovs, because it was trying 3.7.0 and reported that the above package was already installed, even though the 3.7 repo was enabled and the 3.7.1-2 packages were there too. To work around this, I ran the scale-up playbook until it failed, ran yum downgrade on the scale-up nodes to downgrade origin-node to 3.7.0, and then reran the scale-up playbook.

Any chance the 'bad' packages with 'git' in them (like 3.7.1-1.el7.git.0.0a2d6a1.x86_64.rpm) will be removed from the CentOS repos?

All, can you please try to re-run the deployment? New origin RPMs (i.e. v3.7.2) were promoted to the official CentOS repos yesterday, and we should no longer have any issues with mixed versions.

We apologize for the inconvenience; we had a few issues with our automation, which should now be fixed.

I am marking this issue closed because it looks like we've resolved this a few weeks ago.
If you continue to have problems, please re-open, or create a new issue.

Working on this today.
I found I had to set this:
openshift_release=v3.7
openshift_pkg_version=v3.7
openshift_image_tag=v3.7

To get past that particular problem...
-Andy

Actually, I found out I was still on the master branch... whoopsie :)

Yes, this has been resolved.

I can confirm that if you make sure you are on the correct branch and only set openshift_release=v3.7, it works perfectly.
