Openshift-ansible: [GlusterFS] ansible-service-broker can't start when using Atomic

Created on 28 Feb 2018 · 11 Comments · Source: openshift/openshift-ansible

Description

I'm installing OpenShift Origin 3.9 onto a CentOS Atomic host, and it looks like some functionality is missing, which results in an inability to properly start the OpenShift Ansible Broker (OAB). My understanding is that OAB requires persistent storage and that the only method for storage with OAB is to back it with NFS.

For example, the hosted registry storage setup has a kind setting via openshift_hosted_registry_storage_kind=glusterfs that allows for the backing store to be done via a PVC with GlusterFS.

For the OAB spin-up, as far as I can tell, the only supported value for openshift_hosted_etcd_storage_kind is nfs. When you're using Atomic as the host OS, the instantiation of an NFS host fails during the playbooks/openshift-nfs execution, since there is no package manager on Atomic (the Ansible package module fails).

It looks like a new endpoint and service template need to be created in roles/openshift_storage_glusterfs/ and roles/openshift_hosted/, just as already exists for the registry?

Am I on the right track here?

Version
openshift-ansible-3.9.0-0.53.0-27-g3df819925

ansible 2.4.2.0
  config file = /home/lmadsen/src/github/leifmadsen/openshift-ansible/ansible.cfg
  configured module search path = [u'/home/lmadsen/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.14 (default, Jan 17 2018, 14:28:32) [GCC 7.2.1 20170915 (Red Hat 7.2.1-2)]

Steps To Reproduce
  1. Install OpenShift Origin 3.9 (containerized) on a CentOS Atomic host OS
  2. oc get all -n openshift-ansible-service-broker
  3. See that po/asb-etcd-1-deploy has failed

Expected Results

Passing deployment of ASB :)

Observed Results

The asb-etcd pod fails to start

$ oc get all -n openshift-ansible-service-broker
NAME                         REVISION   DESIRED   CURRENT   TRIGGERED BY
deploymentconfigs/asb        1          1         0         config
deploymentconfigs/asb-etcd   1          1         0         config

NAME              HOST/PORT                                                          PATH      SERVICES   PORT      TERMINATION   WILDCARD
routes/asb-1338   asb-1338-openshift-ansible-service-broker.apps.home.61will.space             asb        1338      reencrypt     None

NAME                   READY     STATUS    RESTARTS   AGE
po/asb-1-deploy        0/1       Error     0          13h
po/asb-etcd-1-deploy   0/1       Error     0          13h

NAME            DESIRED   CURRENT   READY     AGE
rc/asb-1        0         0         0         13h
rc/asb-etcd-1   0         0         0         13h

NAME           TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
svc/asb        ClusterIP   172.30.98.17     <none>        1338/TCP   13h
svc/asb-etcd   ClusterIP   172.30.244.243   <none>        2379/TCP   13h

Logs:

$ oc logs -n openshift-ansible-service-broker po/asb-etcd-1-deploy
--> Scaling asb-etcd-1 to 1
error: update acceptor rejected asb-etcd-1: pods for rc 'openshift-ansible-service-broker/asb-etcd-1' took longer than 600 seconds to become available

Description of volumes etc for etcd:

$ oc describe -n openshift-ansible-service-broker rc/asb-etcd-1
Name:         asb-etcd-1
Namespace:    openshift-ansible-service-broker
Selector:     app=etcd,deployment=asb-etcd-1,deploymentconfig=asb-etcd
Labels:       app=etcd
              openshift.io/deployment-config.name=asb-etcd
              service=asb-etcd
Annotations:  kubectl.kubernetes.io/desired-replicas=1
              openshift.io/deployer-pod.completed-at=2018-02-28 02:04:40 +0000 UTC
              openshift.io/deployer-pod.created-at=2018-02-28 01:54:34 +0000 UTC
              openshift.io/deployer-pod.name=asb-etcd-1-deploy
              openshift.io/deployment-config.latest-version=1
              openshift.io/deployment-config.name=asb-etcd
              openshift.io/deployment.phase=Failed
              openshift.io/deployment.replicas=0
              openshift.io/deployment.status-reason=config change
              openshift.io/encoded-deployment-config={"kind":"DeploymentConfig","apiVersion":"v1","metadata":{"name":"asb-etcd","namespace":"openshift-ansible-service-broker","selfLink":"/apis/apps.openshift.io/v1/...
Replicas:     0 current / 0 desired
Pods Status:  0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           app=etcd
                    deployment=asb-etcd-1
                    deploymentconfig=asb-etcd
                    service=asb-etcd
  Annotations:      openshift.io/deployment-config.latest-version=1
                    openshift.io/deployment-config.name=asb-etcd
                    openshift.io/deployment.name=asb-etcd-1
  Service Account:  asb
  Containers:
   etcd:
    Image:  quay.io/coreos/etcd:latest
    Port:   2379/TCP
    Args:
      /usr/local/bin/etcd
      --data-dir=/data
      --listen-client-urls=https://0.0.0.0:2379
      --advertise-client-urls=https://asb-etcd.openshift-ansible-service-broker.svc:2379
      --client-cert-auth
      --trusted-ca-file=/var/run/etcd-auth-secret/ca.crt
      --cert-file=/etc/tls/private/tls.crt
      --key-file=/etc/tls/private/tls.key
    Environment:
      ETCDCTL_API:  3
    Mounts:
      /data from etcd (rw)
      /etc/tls/private from etcd-tls (rw)
      /var/run/etcd-auth-secret from etcd-auth (rw)
  Volumes:
   etcd:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  etcd
    ReadOnly:   false
   etcd-tls:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  etcd-tls
    Optional:    false
   etcd-auth:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  etcd-auth-secret
    Optional:    false
Events:          <none>
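
A deployer timeout with no events on the replication controller, combined with a PersistentVolumeClaim volume in the pod template, is consistent with the claim never being bound, which is what the thread goes on to confirm. A minimal sketch for spotting unbound claims follows. The helper name `pending_pvcs` is hypothetical, and it assumes the `oc get pvc` column layout for a single namespace, where NAME and STATUS are the first two columns.

```shell
# Hypothetical helper: print the names of claims whose STATUS is Pending.
# Reads `oc get pvc` output on stdin; assumes NAME and STATUS are the
# first two columns (true for a single-namespace listing).
pending_pvcs() {
  awk 'NR > 1 && $2 == "Pending" { print $1 }'
}

# Usage against a live cluster (requires a logged-in oc client):
#   oc get pvc -n openshift-ansible-service-broker | pending_pvcs
```

For any claim it prints, `oc describe pvc <name> -n openshift-ansible-service-broker` shows the events that explain why the claim is still pending.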

Additional Information

My current hosts file for Ansible being passed into openshift-ansible:

openshift-master ansible_host=openshift-master.home.61will.space
openshift-node-1 ansible_host=openshift-node-1.home.61will.space
openshift-node-2 ansible_host=openshift-node-2.home.61will.space
openshift-node-3 ansible_host=openshift-node-3.home.61will.space

[OSEv3:children]
masters
nodes
etcd
glusterfs

[OSEv3:vars]
ansible_become=yes
debug_level=2

# storage
openshift_storage_glusterfs_namespace=glusterfs
openshift_storage_glusterfs_name=storage

# main setup
openshift_master_unsupported_embedded_etcd=true
openshift_disable_check=disk_availability,memory_availability,docker_image_availability
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider', 'filename': '/etc/origin/master/htpasswd'}]
openshift_deployment_type=origin
containerized=true
openshift_release=3.9
openshift_image_tag=latest
enable_excluders=false

# hostname setup
openshift_hostname_check=true
openshift_master_default_subdomain=apps.home.61will.space

# registry storage
openshift_hosted_registry_storage_kind=glusterfs

ansible_service_broker_local_registry_whitelist=['.*-apb$']

[masters]
openshift-master

[etcd]
openshift-master

[nodes]
openshift-master openshift_node_labels="{'region': 'infra', 'zone': 'default'}" openshift_schedulable=true
openshift-node-[1:3] openshift_node_labels="{'region': 'primary', 'zone': 'default'}"
openshift-node-[4:6] openshift_node_labels="{'region': 'infra'}"

[glusterfs]
openshift-node-[1:3] glusterfs_devices='[ "/dev/vdb" ]'

[glusterfs_registry]
openshift-node-[4:6] glusterfs_devices='[ "/dev/vdb" ]'

[all:vars]
ansible_user=centos
ansible_ssh_private_key_file=/home/lmadsen/.ssh/id_openshiftlab

lifecycle/rotten

All 11 comments

/cc @cgwalters and @jtligon for issue subscription

@leifmadsen If you have a dynamic volume provisioner then you can use that rather than NFS. I believe there have been recent fixes to atomic host that may allow NFS volumes to work there but we haven't gotten around to testing that recently. Also, ASB will soon switch to Custom Resource Definitions (CRD) via the API and drop the requirement for a local etcd so this is a pretty low priority for us.

@sdodson hrmmm... is there a way now to configure the etcd that is used for ASB to leverage that storage instead of NFS so that a working ASB could be started now, or is the CRD method effectively a blocker for this?

I'm working on some sample configurations that will allow OpenShift 3.9 to start on Atomic (virtual environment) with just GlusterFS and no NFS ideally.

Just don't set openshift_hosted_etcd_storage_kind and it should work assuming there are volumes to fulfill the PVC defined here https://github.com/openshift/openshift-ansible/blob/master/roles/ansible_service_broker/tasks/install.yml#L207-L213
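
The suggestion above can be checked before running the installer. This is a minimal sketch, assuming a flat INI inventory like the one in this issue. The function name `inventory_requests_nfs_etcd` is hypothetical; the only variable it looks for is openshift_hosted_etcd_storage_kind, as named in this thread.

```shell
# Hypothetical pre-flight check: succeed (exit 0) if the inventory file
# explicitly sets openshift_hosted_etcd_storage_kind to nfs.
inventory_requests_nfs_etcd() {
  grep -Eq '^[[:space:]]*openshift_hosted_etcd_storage_kind[[:space:]]*=[[:space:]]*nfs[[:space:]]*$' "$1"
}

# Usage (the inventory path is only an example):
#   if inventory_requests_nfs_etcd ./atomic.inventory; then
#     echo "broker etcd storage is set to NFS; consider removing it on Atomic" >&2
#   fi
```

The check only flags the explicit NFS setting; it does not verify that volumes exist to fulfill the PVC.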

@sdodson I had to step away for another project, but now I'm back trying to figure this out. Are you saying that I should deploy OpenShift first, go in and manually create a bunch of PVs to satisfy other aspects of the system, then run some set of playbooks after those PVs are created?

OK, I was able to fix this by adding openshift_storage_glusterfs_storageclass_default=true to my atomic.inventory file. By having a default storageclass setup, when the PVC is created for etcd, it is Bound to the asb-etcd-1 pod.

However, now it looks like I'm running into issue https://github.com/openshift/ansible-service-broker/issues/585 as far as I can tell...
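
To confirm the effect of that setting on a running cluster, one option is to check that a default StorageClass is actually present. This is a minimal sketch: the helper name `has_default_storageclass` is hypothetical, and it relies on `oc get storageclass` marking the default class with `(default)` after its name.

```shell
# Hypothetical helper: succeed (exit 0) if the listing contains a class
# marked "(default)". Reads `oc get storageclass` output on stdin, so it
# can be tried without a cluster.
has_default_storageclass() {
  grep -q '(default)'
}

# Usage against a live cluster (requires a logged-in oc client):
#   oc get storageclass | has_default_storageclass \
#     && echo "default StorageClass present"
#   oc get pvc etcd -n openshift-ansible-service-broker   # STATUS should read Bound
```

A claim that names no storage class can only be dynamically provisioned when a default class exists, which fits the behavior described in the comment above.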

My issue above was a configuration issue as noted on the linked issue. At this point, I'm able to make everything work as long as I set the glusterfs storage class to default. I'm assuming there is a configuration option in here somewhere that would let me specify the storageclass, but I haven't had a chance to look yet.

The only remaining issue is primarily documentation, I think: the default documentation points at using NFS, which doesn't deploy onto Atomic. Indeed you can set this as low priority, or feel free to close.

@leifmadsen thanks for your testing and follow up!!

I encountered a similar issue on Atomic with GlusterFS, and I can confirm that adding the config below to my inventory also solved this issue for me.

[OSEv3:vars]
openshift_storage_glusterfs_storageclass_default=true

thanks @leifmadsen for the info!

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale
