Openshift-ansible: Error creating default registry service

Created on 8 Jun 2018 · 10 Comments · Source: openshift/openshift-ansible

Description

Running the deploy_cluster.yml playbook fails with an error while creating the default registry service. This didn't happen before I added openshift_master_cluster_public_hostname to the inventory/hosts.localhost file (right now it's a single-master cluster).

Version

ansible 2.5.2
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.15 (default, May 16 2018, 17:50:09) [GCC 8.1.1 20180502 (Red Hat 8.1.1-1)]

[~/openshift-ansible]$ git describe                                                                                      *[release-3.9]
openshift-ansible-3.9.30-1-14-gb17f21b5a

Steps To Reproduce

Run the deploy_cluster.yml playbook.

Expected Results

Expected cluster to be deployed

Observed Results

TASK [openshift_hosted : create the default registry service] ***************************************************************************
fatal: [localhost]: FAILED! => {"changed": false, "module_stderr": "", "module_stdout": "keys are not equal in dict\n{'ports', 'type', 'selector', 'sessionAffinity'}\n{'ports', 'sessionAffinity', 'sessionAffinityConfig', 'type', 'selector'}\n\n{\"changed\": true, \"results\": {\"returncode\": 0, \"cmd\": \"/usr/bin/oc get service docker-registry -o json -n default\", \"results\": [{\"apiVersion\": \"v1\", \"kind\": \"Service\", \"metadata\": {\"creationTimestamp\": \"2018-06-08T19:04:21Z\", \"name\": \"docker-registry\", \"namespace\": \"default\", \"resourceVersion\": \"473253\", \"selfLink\": \"/api/v1/namespaces/default/services/docker-registry\", \"uid\": \"c03c33eb-6b4e-11e8-83a7-ac1f6b45c3f0\"}, \"spec\": {\"clusterIP\": \"172.30.192.95\", \"ports\": [{\"name\": \"5000-tcp\", \"port\": 5000, \"protocol\": \"TCP\", \"targetPort\": 5000}], \"selector\": {\"docker-registry\": \"default\"}, \"sessionAffinity\": \"ClientIP\", \"sessionAffinityConfig\": {\"clientIP\": {\"timeoutSeconds\": 10800}}, \"type\": \"ClusterIP\"}, \"status\": {\"loadBalancer\": {}}}], \"clusterip\": \"172.30.192.95\"}, \"state\": \"present\", \"invocation\": {\"module_args\": {\"namespace\": \"default\", \"name\": \"docker-registry\", \"ports\": [{\"name\": \"5000-tcp\", \"port\": 5000, \"protocol\": \"TCP\", \"targetPort\": 5000}], \"selector\": {\"docker-registry\": \"default\"}, \"session_affinity\": \"ClientIP\", \"service_type\": \"ClusterIP\", \"clusterip\": \"\", \"kubeconfig\": \"/etc/origin/master/admin.kubeconfig\", \"state\": \"present\", \"debug\": false, \"annotations\": null, \"labels\": null, \"portalip\": null, \"external_ips\": null}}}\n", "msg": "MODULE FAILURE", "rc": 0}

Additional Information

[~/openshift-ansible]$ cat /etc/redhat-release                                                                           *[release-3.9]
Fedora release 28 (Twenty Eight)

[~/openshift-ansible]$ cat inventory/hosts.localhost
#bare minimum hostfile

[OSEv3:children]
masters
nodes
etcd

[OSEv3:vars]
# if your target hosts are Fedora uncomment this
ansible_python_interpreter=/usr/bin/python3
openshift_deployment_type=origin
openshift_version=3.9.0
openshift_release="3.9.0"
openshift_pkg_version=-3.9.0
osm_cluster_network_cidr=10.128.0.0/14
openshift_portal_net=172.30.0.0/16
osm_host_subnet_length=9
openshift_enable_excluders=false
# localhost likely doesn't meet the minimum requirements
openshift_disable_check=disk_availability,memory_availability
# use firewalld, it's bugged on Atomic Host but not normal spin
os_firewall_use_firewalld=true

# htpasswd auth
# Defining htpasswd users
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider', 'filename': '/etc/origin/master/htpasswd' }]
openshift_node_kubelet_args={'cgroup-driver':['cgroupfs']}
openshift_master_named_certificates=[{"certfile": "/etc/secrets/cevn.pem", "keyfile": "/etc/secrets/cevn.key", "names": ["openshift.cevn.io"]}]
openshift_master_cluster_method=native
#openshift_master_cluster_hostname=openshift.cevn.io
openshift_master_cluster_public_hostname=openshift.cevn.io
openshift_hosted_router_certificate={"certfile": "/etc/secrets/cevn.pem", "keyfile": "/etc/secrets/cevn.key", "cafile": "/etc/secrets/cevn.ca"}
openshift_master_default_subdomain=cevn.io

[masters]
localhost ansible_connection=local

[etcd]
localhost ansible_connection=local

[nodes]
localhost  ansible_connection=local openshift_schedulable=true openshift_node_labels="{'region': 'infra', 'zone': 'default'}"

lifecycle/rotten

Source

cevn

👍1

All 10 comments

Fedora 28
Origin 3.9

I ran into the same issue. I'm guessing this is because the "create the default registry service" play is not idempotent.

Here's the play:

- name: create the default registry service
  oc_service:
    namespace: "{{ openshift_hosted_registry_namespace }}"
    name: "{{ openshift_hosted_registry_name }}"
    ports:
    - name: 5000-tcp
      port: 5000
      protocol: TCP
      targetPort: 5000
    selector:
      docker-registry: default
    session_affinity: ClientIP
    service_type: ClusterIP
    clusterip: '{{ openshift_hosted_registry_clusterip | default(omit) }}'

Here's the relevant check in oc_service.py:

                # before passing ensure keys match
                api_values = set(value.keys()) - set(skip)
                user_values = set(user_def[key].keys()) - set(skip)
                if api_values != user_values:
                    if debug:
                        print("keys are not equal in dict")
                        print(user_values)
                        print(api_values)
                    return False

api_values contains sessionAffinityConfig, but that key is in neither user_values nor the play, so the comparison fails.
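The mismatch can be reproduced outside Ansible using the two key sets printed in the error output. This is only a minimal sketch with plain Python sets; the names user_spec_keys and api_spec_keys are illustrative and do not appear in oc_service.py.

```python
# Keys of the service "spec" as defined by the play (first set in the error output).
user_spec_keys = {"ports", "type", "selector", "sessionAffinity"}

# Keys of the "spec" returned by `oc get service docker-registry -o json`
# (second set in the error output).
api_spec_keys = {"ports", "sessionAffinity", "sessionAffinityConfig", "type", "selector"}

# The quoted check treats any difference between the key sets as "not equal".
extra_keys = api_spec_keys - user_spec_keys
keys_match = api_spec_keys == user_spec_keys

print("keys only in the API result:", sorted(extra_keys))
print("key sets match:", keys_match)
```

The only key the API adds is sessionAffinityConfig, which is enough for the key sets to differ on a second run.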

I was able to get around this by deleting the service prior to rerunning deploy_cluster.yml:

oc delete service docker-registry

nagonzalez on 26 Jul 2018

This impacts openshift-ansible-3.10-* as well.

jdoss on 27 Aug 2018

If this is still broken for 3.10, is there at least a workaround that can be used in the meantime to get past it and complete an installation?

ironfroggy on 31 Aug 2018

I was able to get it to work by changing the skip list at

https://github.com/openshift/openshift-ansible/blob/b5ef1991be32b0599be8d219fdd52ae7d9372278/roles/lib_openshift/library/oc_service.py#L1395

to include sessionAffinityConfig:

skip = ['metadata', 'status', 'sessionAffinityConfig']

I'm more than happy to submit a PR if it works for you as well.
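A rough sketch of why extending the skip list helps. The helper keys_equal is hypothetical and mirrors only the key comparison quoted earlier in the thread, not the full function in oc_service.py. It also assumes the unpatched list is the same as the patched one minus the added entry.

```python
def keys_equal(user_def, api_def, skip):
    """Compare the keys of two dicts, ignoring any key named in `skip`."""
    user_keys = set(user_def.keys()) - set(skip)
    api_keys = set(api_def.keys()) - set(skip)
    return user_keys == api_keys


# Service spec as requested by the play (values taken from the error output).
user_spec = {
    "ports": [{"name": "5000-tcp", "port": 5000, "protocol": "TCP", "targetPort": 5000}],
    "selector": {"docker-registry": "default"},
    "sessionAffinity": "ClientIP",
    "type": "ClusterIP",
}

# Spec as returned by the API: same keys plus sessionAffinityConfig.
api_spec = dict(user_spec, sessionAffinityConfig={"clientIP": {"timeoutSeconds": 10800}})

original_skip = ["metadata", "status"]  # assumed unpatched value
patched_skip = ["metadata", "status", "sessionAffinityConfig"]

before = keys_equal(user_spec, api_spec, original_skip)
after = keys_equal(user_spec, api_spec, patched_skip)

print("match with original skip list:", before)
print("match with patched skip list:", after)
```

One trade-off, judging only from the quoted check: with sessionAffinityConfig in the skip list, that key no longer takes part in the key comparison at all.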

nagonzalez on 1 Sep 2018

@nagonzalez I tried your change and it worked. I am able to run the playbook as many times as I want and it doesn't produce this error.

jdoss on 14 Sep 2018

👍3

This is still broken in openshift-ansible-3.11-*.

jdoss on 2 Nov 2018

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot on 28 May 2020

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-bot on 27 Jun 2020

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-bot on 27 Jul 2020

@openshift-bot: Closing this issue.