The etcd installation is skipped on a simple single node setup (in vagrant).
ansible --version:
ansible 2.7.0
config file = /etc/ansible/ansible.cfg
configured module search path = [u'/home/lennart/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python2.7/dist-packages/ansible
executable location = /usr/bin/ansible
python version = 2.7.15rc1 (default, Apr 15 2018, 21:51:34) [GCC 7.3.0]
git describe:
openshift-ansible-3.10.53-1-2-gb839d825c
Steps to reproduce:
ansible-playbook -i single-node.ini openshift-ansible/playbooks/prerequisites.yml
ansible-playbook -i single-node.ini openshift-ansible/playbooks/deploy_cluster.yml
Expected result: the cluster installation is successful.
The deploy_cluster.yml playbook fails with the following message:
TASK [openshift_control_plane : Report control plane errors] *******************************************************************************
fatal: [master]: FAILED! => {"changed": false, "msg": "Control plane pods didn't come up"}
NO MORE HOSTS LEFT *************************************************************************************************************************
to retry, use: --limit @/home/lennart/workspace/elastisys/vagrant/openshift/openshift-ansible/playbooks/deploy_cluster.retry
PLAY RECAP *********************************************************************************************************************************
localhost : ok=12 changed=0 unreachable=0 failed=0
master : ok=248 changed=102 unreachable=0 failed=1
INSTALLER STATUS ***************************************************************************************************************************
Initialization : Complete (0:00:11)
Health Check : Complete (0:00:38)
Node Bootstrap Preparation : Complete (0:02:03)
etcd Install : Complete (0:00:04)
Master Install : In Progress (0:17:28)
This phase can be restarted by running: playbooks/openshift-master/config.yml
Failure summary:
1. Hosts: master
Play: Configure masters
Task: Report control plane errors
Message: Control plane pods didn't come up
Full logs here: deploy.txt
Verbose logs here: deploy-verbose.txt
There were three warnings:
[WARNING]: Could not match supplied host pattern, ignoring: oo_lb_to_config
[WARNING]: Could not match supplied host pattern, ignoring: oo_nfs_to_config
[WARNING]: flush_handlers task does not support when conditional
Debugging further, I found this in the logs of the API pod:
I1010 08:38:44.914483 1 plugins.go:84] Registered admission plugin "NamespaceLifecycle"
I1010 08:38:44.914599 1 plugins.go:84] Registered admission plugin "Initializers"
I1010 08:38:44.914608 1 plugins.go:84] Registered admission plugin "ValidatingAdmissionWebhook"
I1010 08:38:44.914615 1 plugins.go:84] Registered admission plugin "MutatingAdmissionWebhook"
I1010 08:38:44.914621 1 plugins.go:84] Registered admission plugin "AlwaysAdmit"
I1010 08:38:44.914626 1 plugins.go:84] Registered admission plugin "AlwaysPullImages"
I1010 08:38:44.914634 1 plugins.go:84] Registered admission plugin "LimitPodHardAntiAffinityTopology"
I1010 08:38:44.914643 1 plugins.go:84] Registered admission plugin "DefaultTolerationSeconds"
I1010 08:38:44.914648 1 plugins.go:84] Registered admission plugin "AlwaysDeny"
I1010 08:38:44.914655 1 plugins.go:84] Registered admission plugin "EventRateLimit"
I1010 08:38:44.914660 1 plugins.go:84] Registered admission plugin "DenyEscalatingExec"
I1010 08:38:44.914663 1 plugins.go:84] Registered admission plugin "DenyExecOnPrivileged"
I1010 08:38:44.914668 1 plugins.go:84] Registered admission plugin "ExtendedResourceToleration"
I1010 08:38:44.914675 1 plugins.go:84] Registered admission plugin "OwnerReferencesPermissionEnforcement"
I1010 08:38:44.914683 1 plugins.go:84] Registered admission plugin "ImagePolicyWebhook"
I1010 08:38:44.914688 1 plugins.go:84] Registered admission plugin "InitialResources"
I1010 08:38:44.914693 1 plugins.go:84] Registered admission plugin "LimitRanger"
I1010 08:38:44.914698 1 plugins.go:84] Registered admission plugin "NamespaceAutoProvision"
I1010 08:38:44.914703 1 plugins.go:84] Registered admission plugin "NamespaceExists"
I1010 08:38:44.914707 1 plugins.go:84] Registered admission plugin "NodeRestriction"
I1010 08:38:44.914712 1 plugins.go:84] Registered admission plugin "PersistentVolumeLabel"
I1010 08:38:44.914717 1 plugins.go:84] Registered admission plugin "PodNodeSelector"
I1010 08:38:44.914722 1 plugins.go:84] Registered admission plugin "PodPreset"
I1010 08:38:44.914726 1 plugins.go:84] Registered admission plugin "PodTolerationRestriction"
I1010 08:38:44.914731 1 plugins.go:84] Registered admission plugin "ResourceQuota"
I1010 08:38:44.914736 1 plugins.go:84] Registered admission plugin "PodSecurityPolicy"
I1010 08:38:44.914741 1 plugins.go:84] Registered admission plugin "Priority"
I1010 08:38:44.914747 1 plugins.go:84] Registered admission plugin "SecurityContextDeny"
I1010 08:38:44.914752 1 plugins.go:84] Registered admission plugin "ServiceAccount"
I1010 08:38:44.914757 1 plugins.go:84] Registered admission plugin "DefaultStorageClass"
I1010 08:38:44.914762 1 plugins.go:84] Registered admission plugin "PersistentVolumeClaimResize"
I1010 08:38:44.914766 1 plugins.go:84] Registered admission plugin "StorageObjectInUseProtection"
Invalid MasterConfig /etc/origin/master/master-config.yaml
etcdClientInfo.ca: Invalid value: "/etc/origin/master/master.etcd-ca.crt": could not read file: stat /etc/origin/master/master.etcd-ca.crt: no such file or directory
Then I went back to the ansible logs and realized that it skipped almost all tasks when installing etcd. (Relevant part of logs here: etcd-installation-log.txt)
Why is this happening? Did I miss something in the inventory file?
There are quite a few issues about control plane pods not starting but I don't think this is a duplicate of any of them.
Here are some of the issues that I looked at before reporting:
/etc/cni, however, these issues do not mention anything about etcd not being present.
CentOS Linux release 7.5.1804 (Core)
# single-node.ini mostly copied from inventory/hosts.localhost
master ansible_host=192.168.121.159 ansible_port=22 ansible_user='vagrant' ansible_ssh_private_key_file='/home/lennart/.vagrant.d/insecure_private_key'
[OSEv3:children]
masters
nodes
etcd
[OSEv3:vars]
ansible_become=yes
openshift_deployment_type=origin
openshift_portal_net=172.30.0.0/16
openshift_disable_check=disk_availability,memory_availability,docker_storage
openshift_node_groups=[{'name': 'node-config-all-in-one', 'labels': ['node-role.kubernetes.io/master=true', 'node-role.kubernetes.io/infra=true', 'node-role.kubernetes.io/compute=true']}]
[masters]
master
[etcd]
master
[nodes]
master openshift_node_group_name="node-config-all-in-one"
Ansible 2.7 is not recommended yet. The issue should go away once you downgrade to 2.6.
This is a perfectly described issue though, so let's use it as the 'ansible 2.7 support' tracking bug.
Thanks for the quick reply!
I switched to the containerized installer to get the correct version, and the control plane came up as expected :)
@lentzi90 thanks for pointing this out. I ran into the exact same issue jumping to 2.7.0. I reverted back to ansible 2.6.2 and the installation proceeded without error.
Just curious: why are you guys jumping to the latest Ansible version when the requirements make the supported Ansible version clear?
_Not trying to start a debate, I just see more and more people ignoring this and jumping to the latest version._
The laptop I'm using is running Ubuntu, which is still stuck with ansible 2.5 in the official repos, so I went with a PPA and got the latest.
To be honest I didn't think Ansible >= 2.6.2 implied that 2.7 was unsupported.
@DanyC97
I've got a CI job that I use to run daily tests of the localhost origin installer on the release-3.10 branch. As part of that job, I first install the requirements per the README.md. Reading the requirements, I got the impression it was OK to use _latest_, given I was installing Origin and not OCP.
Requirements in README.md (release-3.10)
Requirements:
Ansible >= 2.4.3.0, 2.5.x is not currently supported for OCP installations
Jinja >= 2.7
pyOpenSSL
python-lxml
Code I used to prep installer:
- name: install required pip packages for installer
  pip:
    # each loop item is a plain package name
    name: "{{ item }}"
    state: latest
  loop:
    - ansible
    - pyOpenSSL
  tags:
    - skip_ansible_lint
Would it make sense to add a check in the prerequisites.yml playbook that fails if required packages are higher than the supported versions?
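A minimal sketch of what such a check could look like, written as a standalone play rather than as a patch to prerequisites.yml. The version bounds (>= 2.6.2 and < 2.7) are assumptions taken from this thread, not from the playbooks themselves, and the `version` test name assumes Ansible 2.5 or newer (older releases call it `version_compare`).

```yaml
# Hypothetical pre-flight play: abort before any host is touched if the
# controller's Ansible version is outside the range assumed to be supported.
- name: Verify the Ansible version on the control machine
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Fail if Ansible is older or newer than the assumed supported range
      assert:
        that:
          # ansible_version is a built-in magic variable; .full is e.g. "2.6.5"
          - ansible_version.full is version('2.6.2', '>=')
          - ansible_version.full is version('2.7', '<')
        msg: "Ansible {{ ansible_version.full }} is outside the assumed supported range (>= 2.6.2, < 2.7)"
```

Run ahead of deploy_cluster.yml, a play like this would have stopped the 2.7.0 run with an explicit message instead of a control plane failure 17 minutes in.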
state: latest
That would pull in the latest Ansible, which is 2.7. You should use requirements.txt instead.
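As a rough illustration, the CI task could be pointed at the repository's requirements.txt through the pip module's `requirements` parameter. The variable `openshift_ansible_dir` is hypothetical and stands for wherever the openshift-ansible checkout lives; the sketch also assumes requirements.txt sits at the root of that checkout.

```yaml
# Sketch: install the versions pinned by the project instead of "latest".
# openshift_ansible_dir is a hypothetical variable, not defined by openshift-ansible.
- name: install pip packages pinned by openshift-ansible
  pip:
    requirements: "{{ openshift_ansible_dir }}/requirements.txt"
  tags:
    - skip_ansible_lint
```

With no `state: latest`, pip is left to honor whatever version constraints the file declares.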
I think it would be a good idea to also mention requirements.txt in the readme. Something like:
You can ensure that you get supported versions of the packages by using requirements.txt when installing pip packages: pip install -r requirements.txt.
Faced the same issue today. Continuing the failed installation after downgrading to ansible 2.6.5 didn't work but a clean installation with ansible 2.6.5 did work.
I believe the problem that caused this should be fixed in Ansible 2.7.1.
In Ansible 2.7.0 we made a change to variable exposure from import_role as described at:
https://docs.ansible.com/ansible/latest/porting_guides/porting_guide_2.7.html#include-role-and-import-role-variable-exposure
However, that exposed a bug in regards to mutable defaults which was resolved for the Ansible 2.7.1 release as part of https://github.com/ansible/ansible/pull/46833
Based on what I understand of these playbooks, I believe that etcd_ca_setup: False carried over from one play to another, causing the additional etcd role calls guarded by a when statement to be skipped.
I've had 2 co-workers confirm that Ansible 2.7.1 is behaving properly with the etcd roles.
/close
Fixed in Ansible 2.7.1
@sivel (and co-workers) thanks for confirming!
@sdodson: Closing this issue.
/reopen
@mtnbikenc indicates other problems remain https://github.com/openshift/openshift-ansible/pull/10523#issuecomment-433472776
@sdodson: Reopening this issue.
I had this issue as well trying to do a single node install.
A bad (edit: was "the" but surely this is not "the" solution) solution was to ditch the node-config-all-in-one node group and use just node-config-master while also setting openshift_schedulable=True. Then just manually re-label the node and re-run deploy_cluster.
When using node-config-all-in-one the Ansible playbook for some reason decides to try and configure node services before master services.
@DanyC97
I've got a CI job that I use to run daily tests
I was about to ask what OpenShift/OKD's CI covers in this regard, because the regressions I've seen in various parts of the install procedure over the last 3 releases suggest there's a big void in functional test coverage.
This breaks okd deployment on CentOS + EPEL, since they upgraded Ansible to 2.7 (not for the first time, either). @extra is still at 2.4, so EPEL is necessary. Maybe the CentOS SCL responsible for OpenShift should maintain their own Ansible packages which are compatible with openshift-ansible, like RHEL does?
Happy to contribute, if someone points me in the right direction.
Now that 3.11 is out, you can see that a new Ansible RPM is being released in the CentOS extras repo specifically to address the above issue.
If people are happy with it, I think we can close this issue.
Ansible 2.7.4 is in CentOS repos - see https://cbs.centos.org/repos/configmanagement7-ansible-27-testing/x86_64/os/ - this seems to work for me, could someone verify that it works?