openshift-ansible 3.2 fails with flannel

Created on 19 Aug 2016 · 12 Comments · Source: openshift/openshift-ansible

With the current stable RHEL RPMs, openshift-ansible fails when using flannel:

TASK: [etcd_ca | copy ] ******************************************************* 
<flannel-openshift-master-0.example.com> ESTABLISH CONNECTION FOR USER: cloud-user
<flannel-openshift-master-0.example.com> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/root/.ansible/cp/ansible-ssh-%h-%p-%r" -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=cloud-user -o ConnectTimeout=10 flannel-openshift-master-0.example.com /bin/sh -c 'sudo -k && sudo -H -S -p "[sudo via ansible, key=shijlholyhkddblxhxigybljvpdnepji] password: " -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-shijlholyhkddblxhxigybljvpdnepji; rc=flag; [ -r /etc/etcd/ca/serial ] || rc=2; [ -f /etc/etcd/ca/serial ] || rc=1; [ -d /etc/etcd/ca/serial ] && rc=3; python -V 2>/dev/null || rc=4; [ x"$rc" != "xflag" ] && echo "${rc} "/etc/etcd/ca/serial && exit 0; (python -c '"'"'"'"'"'"'"'"'import hashlib; BLOCKSIZE = 65536; hasher = hashlib.sha1(); afile = open("'"'"'"'"'"'"'"'"'/etc/etcd/ca/serial'"'"'"'"'"'"'"'"'", "rb") buf = afile.read(BLOCKSIZE) while len(buf) > 0: hasher.update(buf) buf = afile.read(BLOCKSIZE) afile.close() print(hasher.hexdigest())'"'"'"'"'"'"'"'"' 2>/dev/null) || (python -c '"'"'"'"'"'"'"'"'import sha; BLOCKSIZE = 65536; hasher = sha.sha(); afile = open("'"'"'"'"'"'"'"'"'/etc/etcd/ca/serial'"'"'"'"'"'"'"'"'", "rb") buf = afile.read(BLOCKSIZE) while len(buf) > 0: hasher.update(buf) buf = afile.read(BLOCKSIZE) afile.close() print(hasher.hexdigest())'"'"'"'"'"'"'"'"' 2>/dev/null) || (echo '"'"'"'"'"'"'"'"'0 '"'"'"'"'"'"'"'"'/etc/etcd/ca/serial)'"'"''
ok: [flannel-openshift-master-0.example.com] => {"changed": false}

TASK: [etcd_ca | command openssl req -config {{ etcd_openssl_conf }} -newkey rsa:4096 -keyout {{ etcd_ca_key }} -new -out {{ etcd_ca_cert }} -x509 -extensions {{ etcd_ca_exts_self }} -batch -nodes -days {{ etcd_ca_default_days }} -subj /CN=etcd-signer@{{ ansible_date_time.epoch }}
] *** 
<flannel-openshift-master-0.example.com> ESTABLISH CONNECTION FOR USER: cloud-user
<flannel-openshift-master-0.example.com> REMOTE_MODULE command chdir=/etc/etcd/ca creates=/etc/etcd/ca/ca.crt openssl req -config /etc/etcd/ca/openssl.cnf -newkey rsa:4096 -keyout /etc/etcd/ca/ca.key -new -out /etc/etcd/ca/ca.crt -x509 -extensions etcd_v3_ca_self -batch -nodes -days 365 -subj /CN=
<flannel-openshift-master-0.example.com> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/root/.ansible/cp/ansible-ssh-%h-%p-%r" -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=cloud-user -o ConnectTimeout=10 flannel-openshift-master-0.example.com /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1471512694.42-257802562797056 && echo $HOME/.ansible/tmp/ansible-tmp-1471512694.42-257802562797056'
<flannel-openshift-master-0.example.com> PUT /tmp/tmptnwBXo TO /home/cloud-user/.ansible/tmp/ansible-tmp-1471512694.42-257802562797056/command
<flannel-openshift-master-0.example.com> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/root/.ansible/cp/ansible-ssh-%h-%p-%r" -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=cloud-user -o ConnectTimeout=10 flannel-openshift-master-0.example.com /bin/sh -c 'sudo -k && sudo -H -S -p "[sudo via ansible, key=pyigeajimbnrwnicwbjngkccbyzdvzah] password: " -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-pyigeajimbnrwnicwbjngkccbyzdvzah; LANG=C LC_CTYPE=C SAN=etcd-signer /usr/bin/python /home/cloud-user/.ansible/tmp/ansible-tmp-1471512694.42-257802562797056/command; rm -rf /home/cloud-user/.ansible/tmp/ansible-tmp-1471512694.42-257802562797056/ >/dev/null 2>&1'"'"''
ok: [flannel-openshift-master-0.example.com] => {"changed": false, "cmd": "openssl req -config /etc/etcd/ca/openssl.cnf -newkey rsa:4096 -keyout /etc/etcd/ca/ca.key -new -out /etc/etcd/ca/ca.crt -x509 -extensions etcd_v3_ca_self -batch -nodes -days 365 -subj /CN=etcd-signer@1471512679", "rc": 0, "stderr": false, "stdout": "skipped, since /etc/etcd/ca/ca.crt exists"}

TASK: [etcd_certificates | Ensure generated_certs directory present] ********** 
fatal: [flannel-openshift-master-0.example.com] => Failed to template {{ etcd_needing_client_certs | default([]) }}: Failed to template {{ g_master_hosts | union(g_node_hosts) | union(g_etcd_hosts) | union(g_lb_hosts) | union(g_nfs_hosts) | union(g_new_node_hosts)| union(g_new_master_hosts) | default([]) }}: an unexpected type error occurred. Error was unsupported operand type(s) for +: 'set' and 'unicode'

FATAL: all hosts have already failed -- aborting

PLAY RECAP ******************************************************************** 
           to retry, use: --limit @/root/main.retry

flannel-openshift-master-0.example.com : ok=554  changed=117  unreachable=1    failed=0   
flannel-openshift-node-tzz812qo.example.com : ok=163  changed=34   unreachable=0    failed=0   
localhost                  : ok=47   changed=6    unreachable=0    failed=0   

Installed RPMs:
[root@flannel-infra openshift-ansible]# rpm -qa|grep openshift-ansible
openshift-ansible-filter-plugins-3.2.13-1.git.0.0afa976.el7.noarch
openshift-ansible-docs-3.2.13-1.git.0.0afa976.el7.noarch
openshift-ansible-lookup-plugins-3.2.13-1.git.0.0afa976.el7.noarch
openshift-ansible-playbooks-3.2.13-1.git.0.0afa976.el7.noarch
openshift-ansible-3.2.13-1.git.0.0afa976.el7.noarch
openshift-ansible-roles-3.2.13-1.git.0.0afa976.el7.noarch

It fails only if openshift_use_flannel=true.

This is a different error from #2322.

kind/bug priority/P2

All 12 comments

Looks like one of the group variables isn't being defaulted properly, and judging from the task, I suspect it only affects flannel because we don't need to deploy etcd client certs to nodes unless flannel is in use.

Relevant error:

TASK: [etcd_certificates | Ensure generated_certs directory present] ********** 
fatal: [flannel-openshift-master-0.example.com] => Failed to template {{ etcd_needing_client_certs | default([]) }}: Failed to template {{ g_master_hosts | union(g_node_hosts) | union(g_etcd_hosts) | union(g_lb_hosts) | union(g_nfs_hosts) | union(g_new_node_hosts)| union(g_new_master_hosts) | default([]) }}: an unexpected type error occurred. Error was unsupported operand type(s) for +: 'set' and 'unicode'
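The type error above can be reproduced outside Ansible. The sketch below is a simplified model, assuming the `union` filter of that era behaved roughly like Ansible 1.9's `mathstuff.union`: if both inputs are hashable it unions them as sets, otherwise it falls back to concatenating them with `+`. If the `g_*_hosts` variables are rendered as strings (for example, the string form of a list) rather than real lists, the first `union` of two strings yields a `set` of characters, and the next `union` tries `set + str`. The host names and string values are hypothetical; under Python 3 the message names `str` where Python 2 said `unicode`.

```python
from collections.abc import Hashable


def unique(seq):
    """Deduplicate; hashable input becomes a set, otherwise keep order."""
    if isinstance(seq, Hashable):
        return set(seq)
    result = []
    for item in seq:
        if item not in result:
            result.append(item)
    return result


def union(a, b):
    """Simplified model of a 1.x-era Ansible `union` filter (an assumption)."""
    if isinstance(a, Hashable) and isinstance(b, Hashable):
        # Two strings are hashable, so they are split into sets of characters.
        return set(a) | set(b)
    # Lists and sets are unhashable, so mixed inputs fall back to `+`.
    return unique(a + b)


# Real lists work as intended.
print(union(["master-0"], ["master-0", "node-0"]))  # ['master-0', 'node-0']

# Hypothetical: group variables rendered as strings instead of lists.
g_master_hosts = "['master-0.example.com']"
g_node_hosts = "['master-0.example.com', 'node-0.example.com']"
g_etcd_hosts = "['master-0.example.com']"

partial = union(g_master_hosts, g_node_hosts)  # str | str -> set of characters
try:
    union(partial, g_etcd_hosts)  # set + str -> TypeError
except TypeError as exc:
    print(exc)  # unsupported operand type(s) for +: 'set' and 'str'
```

This matches the symptom of a group variable that is not defaulted to a proper list: the failure appears only when a mixed `set`/string pair reaches the `+` fallback.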

But on master, it fails with a different error:

TASK [openshift_repos : Configure yum repositories Fedora] *********************
task path: /root/openshift-ansible/roles/openshift_repos/tasks/main.yaml:83
skipping: [jupiter-vm1144.pok.stglabs.ibm.com] => (item=/root/openshift-ansible/roles/openshift_repos/files/fedora-origin/repos/maxamillion-fedora-openshift-fedora.repo)  => {"changed": false, "item": "/root/openshift-ansible/roles/openshift_repos/files/fedora-origin/repos/maxamillion-fedora-openshift-fedora.repo", "skip_reason": "Conditional check failed", "skipped": true}
skipping: [jupiter-vm931.pok.stglabs.ibm.com] => (item=/root/openshift-ansible/roles/openshift_repos/files/fedora-origin/repos/maxamillion-fedora-openshift-fedora.repo)  => {"changed": false, "item": "/root/openshift-ansible/roles/openshift_repos/files/fedora-origin/repos/maxamillion-fedora-openshift-fedora.repo", "skip_reason": "Conditional check failed", "skipped": true}

TASK [etcd_client_certificates : Ensure CA certificate exists on etcd_ca_host] *
task path: /root/openshift-ansible/roles/etcd_client_certificates/tasks/main.yml:2
fatal: [jupiter-vm1144.pok.stglabs.ibm.com]: FAILED! => {"failed": true, "msg": "{{ groups.oo_etcd_to_config.0 }}: 'dict object' has no attribute 'oo_etcd_to_config'"}

This one fails while running PLAY [Additional node config].

The code fails at this line: https://github.com/openshift/openshift-ansible/blob/master/playbooks/common/openshift-node/config.yml#L135, where we try to access the first etcd host. That host does not exist, because this example does not define any [etcd] hosts.

So I'm wondering whether we should fall back to the first master when no etcd hosts are present?
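The failing expression, `{{ groups.oo_etcd_to_config.0 }}`, assumes the group exists. A minimal sketch of the fallback proposed above, written in plain Python rather than Jinja: `groups` is modelled as a dict mapping group names to host lists, `pick_etcd_ca_host` is a hypothetical helper (not part of openshift-ansible), and using a group named `oo_first_master` as the fallback source is an assumption for illustration.

```python
def pick_etcd_ca_host(groups):
    """Hypothetical helper: choose the host that should hold the etcd CA.

    Behaves like `groups.oo_etcd_to_config.0` when that group has hosts,
    but falls back to the first master instead of failing when it does not.
    """
    etcd_hosts = groups.get("oo_etcd_to_config") or []
    if etcd_hosts:
        return etcd_hosts[0]
    masters = groups.get("oo_first_master") or []  # assumed group name
    if masters:
        return masters[0]
    raise LookupError("no etcd hosts and no master to fall back to")


# Inventory without an [etcd] section: only masters and nodes exist.
groups = {
    "oo_first_master": ["master-0.example.com"],
    "oo_nodes_to_config": ["master-0.example.com", "node-0.example.com"],
}
print(pick_etcd_ca_host(groups))  # master-0.example.com
```

Choosing a host is only part of the problem: with embedded etcd, the way etcd certificates are generated and distributed would also need to change.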

@jprovaznik @mkumatag: I'm wondering if we should just require that an etcd host be defined for using flannel. We _could_ leverage the embedded etcd, however it would require some changes to the way that we are currently handling etcd certificates.

I'd be surprised if this worked in the past without configuring an external etcd.

@detiber The etcd group is actually set in my case; here is the inventory file I'm using:

# Create an OSEv3 group that contains the masters and nodes groups
[OSv3:children]
infra
masters
nodes
etcd
[infra]
localhost

[masters]
flannel-openshift-master-0.example.com

[etcd]
flannel-openshift-master-0.example.com

[nodes]
flannel-openshift-master-0.example.com openshift_node_labels="{'region': 'infra', 'zone': 'default'}"
flannel-openshift-node-tzz812qo.example.com openshift_node_labels="{'region': 'primary', 'zone': 'default'}"

[dns]
localhost

[extradnsitems]
loadbalancer

The Ansible log file is attached: ansible.19314.txt

@jprovaznik Is this still an issue w/ the fixes that have gone in for flannel?

@abutcher IIRC my last attempt on 3.2 worked fine w/o hitting this.

@jprovaznik cool, I'm going to close this out then.

Hey guys, I'm still seeing this consistently on 1.3 (and 1.4-rc), using a recent cut of openshift-ansible, when openshift_use_flannel=true and etcd hosts are not explicitly defined:

TASK [etcd_client_certificates : Ensure CA certificate exists on etcd_ca_host] *
fatal: [master.turk.durk]: FAILED! => {"failed": true, "msg": "{{ groups.oo_etcd_to_config.0 }}: 'dict object' has no attribute 'oo_etcd_to_config'"}

With the following configuration:

[OSEv3:children]
masters
nodes

[OSEv3:vars]
deployment_type=origin
os_sdn_network_plugin_name=cni
openshift_use_openshift_sdn=false
openshift_use_flannel=true

[masters]
master

[nodes]
master

I'm using flannel's playbook as a guide to create a new Calico playbook, but of course that means my new playbook is hitting this exact issue as well...

@djosborne: Can you check by specifying an explicit etcd entry? AFAIK, the flannel setup is not supported with implicit etcd.

@mkumatag Yes, I do get past this issue using separately defined etcd hosts.

I guess if it's not supported, it's not supported :)

Yes, it's really advisable to use an explicitly defined etcd host even if that's the same host as your master.
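That advice can be expressed as an inventory pre-flight check. The sketch below is hypothetical and not part of openshift-ansible: it does a deliberately minimal parse of an INI-style inventory (section headers plus raw lines, `key=value` pairs in `[*:vars]` sections) and flags `openshift_use_flannel=true` when no `[etcd]` group with hosts is present. Real inventories can use features (host ranges, nested children, YAML) that this parser ignores.

```python
def parse_inventory(text):
    """Minimal INI-inventory reader: {section name: [raw non-comment lines]}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return sections


def flannel_needs_etcd(text):
    """Return an error message if flannel is enabled without an [etcd] group."""
    sections = parse_inventory(text)
    flannel = False
    for name, entries in sections.items():
        if not name.endswith(":vars"):
            continue
        for entry in entries:
            key, _, value = entry.partition("=")
            if key.strip() == "openshift_use_flannel" and value.strip().lower() == "true":
                flannel = True
    if flannel and not sections.get("etcd"):
        return "openshift_use_flannel=true requires an explicit [etcd] group"
    return None


# The failing inventory from this thread.
INVENTORY = """\
[OSEv3:children]
masters
nodes

[OSEv3:vars]
deployment_type=origin
os_sdn_network_plugin_name=cni
openshift_use_openshift_sdn=false
openshift_use_flannel=true

[masters]
master

[nodes]
master
"""

print(flannel_needs_etcd(INVENTORY))
print(flannel_needs_etcd(INVENTORY + "\n[etcd]\nmaster\n"))
```

The first call reports the problem; the second, which appends an `[etcd]` section naming the master itself, passes, matching the advice that the etcd host may be the same machine as the master.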
