OKD 3.11 installation fails at:
TASK [openshift_service_catalog : Wait for API Server rollout success]
ansible --version
ansible 2.6.5
config file = /etc/ansible/ansible.cfg
configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python2.7/site-packages/ansible
executable location = /usr/bin/ansible
python version = 2.7.5 (default, Oct 30 2018, 23:45:53) [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)]
If you're operating from a git clone:
git describe
openshift-ansible-3.11.51-1
[OSEv3:children]
masters
etcd
lb
nodes
[masters]
master[1:2].example.com
[etcd]
etcd[1:2].example.com
[lb]
lb1.example.com
[nodes]
master[1:2].example.com openshift_node_group_name='node-config-master'
node[1:2].example.com openshift_node_group_name='node-config-compute'
infra-node[1:2].example.com openshift_node_group_name='node-config-infra'
[OSEv3:vars]
ansible_ssh_user=root
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true','challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider'}]
openshift_deployment_type=origin
openshift_release=v3.11
openshift_master_cluster_method=native
openshift_master_cluster_hostname=console.example.com
openshift_master_default_subdomain=apps.example.com
openshift_master_api_port=8443
openshift_master_console_port=8443
openshift_disable_check=disk_availability,docker_storage,memory_availability,docker_image_availability
os_sdn_network_plugin_name='redhat/openshift-ovs-multitenant'
## Needed for OKD 3.11
openshift_additional_repos=[{'id': 'centos-paas', 'name': 'centos-paas','baseurl' :'https://buildlogs.centos.org/centos/7/paas/x86_64/openshift-origin311','gpgcheck' :'0', 'enabled' :'1'}]
/etc/hosts:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.4.1.100 ns1.example.com ns1
10.4.1.101 lb1.example.com lb1 console.example.com console
10.4.1.102 master1.example.com master1
10.4.1.103 master2.example.com master2
10.4.1.104 etcd1.example.com etcd1
10.4.1.105 etcd2.example.com etcd2
10.4.1.106 node1.example.com node1
10.4.1.107 node2.example.com node2
10.4.1.108 infra-node1.example.com infra-node1
10.4.1.109 infra-node2.example.com infra-node2
/etc/dnsmasq.conf:
conf-dir=/etc/dnsmasq.d,.rpmnew,.rpmsave,.rpmorig
strict-order
domain-needed
local=/example.com/
bind-dynamic
log-queries
address=/.example.com/10.4.1.101 # load-balancer
/etc/resolv.conf on each Node (DNS nameserver [ns1] additionally has an upstream DNS):
# Generated by NetworkManager
search example.com
nameserver 10.4.1.100
dig output to test DNS resolution -- run from Load Balancer:
[root@lb1 ~]# dig ns1.example.com @10.4.1.100 +short
10.4.1.100
[root@lb1 ~]# dig lb1.example.com @10.4.1.100 +short
10.4.1.101
[root@lb1 ~]# dig master1.example.com @10.4.1.100 +short
10.4.1.102
[root@lb1 ~]# dig master2.example.com @10.4.1.100 +short
10.4.1.103
[root@lb1 ~]# dig etcd1.example.com @10.4.1.100 +short
10.4.1.104
[root@lb1 ~]# dig etcd2.example.com @10.4.1.100 +short
10.4.1.105
[root@lb1 ~]# dig node1.example.com @10.4.1.100 +short
10.4.1.106
[root@lb1 ~]# dig node2.example.com @10.4.1.100 +short
10.4.1.107
[root@lb1 ~]# dig infra-node1.example.com @10.4.1.100 +short
10.4.1.108
[root@lb1 ~]# dig infra-node2.example.com @10.4.1.100 +short
10.4.1.109
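The lookups above only confirm names that are expected to resolve. A complementary check is to query names that should not exist and see whether the name server answers anyway. This is a minimal sketch, assuming `dig` is installed on the host; `resolves` and `report_unexpected` are hypothetical helper names, and the names in the usage comment are only examples.

```shell
# Hypothetical helper: succeeds when NAME gets any answer from SERVER.
resolves() {
    [ -n "$(dig "$1" "@$2" +short)" ]
}

# Hypothetical helper: print every name that resolves although it should not.
# Returns 0 when none of the given names resolve, 1 otherwise.
report_unexpected() {
    server=$1
    shift
    status=0
    for name in "$@"; do
        if resolves "$name" "$server"; then
            echo "unexpected answer for $name"
            status=1
        fi
    done
    return $status
}

# Example, run from a host that can reach the name server:
# report_unexpected 10.4.1.100 no-such-host.example.com
```

If a made-up name comes back with an address, a wildcard record is probably answering for more of the domain than intended.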
for host in \
lb1.example.com \
master1.example.com \
master2.example.com \
etcd1.example.com \
etcd2.example.com \
node1.example.com \
node2.example.com \
infra-node1.example.com \
infra-node2.example.com; \
do
ssh-copy-id ${host}; \
done
# ansible-playbook -i inventory.ini openshift-ansible/playbooks/prerequisites.yml
# ansible-playbook -i inventory.ini openshift-ansible/playbooks/deploy_cluster.yml
Expected result: OKD 3.11 installs and the Service Catalog rolls out successfully.
OKD 3.11 fails to install the Service Catalog:
TASK [openshift_service_catalog : Wait for API Server rollout success]
FAILED - RETRYING: Wait for API Server rollout success (1 retries left).
fatal: [master1.example.com]: FAILED! => {
"attempts": 5,
"changed": false,
"cmd": [
"oc",
"rollout",
"status",
"--config=/etc/origin/master/admin.kubeconfig",
"-n",
"kube-service-catalog",
"ds/apiserver"
],
"delta": "0:00:00.219700",
"end": "2018-12-04 10:53:07.205662",
"invocation": {
"module_args": {
"_raw_params": "oc rollout status --config=/etc/origin/master/admin.kubeconfig -n kube-service-catalog ds/apiserver",
"_uses_shell": false,
"argv": null,
"chdir": null,
"creates": null,
"executable": null,
"removes": null,
"stdin": null,
"warn": true
}
},
"msg": "non-zero return code",
"rc": 1,
"start": "2018-12-04 10:53:06.985962",
"stderr": "error: watch closed before Until timeout",
"stderr_lines": [
"error: watch closed before Until timeout"
],
"stdout": "Waiting for daemon set \"apiserver\" rollout to finish: 0 of 2 updated pods are available...\nWaiting for daemon set \"apiserver\" rollout to finish: 0 of 2 updated pods are available...",
"stdout_lines": [
"Waiting for daemon set \"apiserver\" rollout to finish: 0 of 2 updated pods are available...",
"Waiting for daemon set \"apiserver\" rollout to finish: 0 of 2 updated pods are available..."
]
}
...ignoring
TASK [openshift_service_catalog : Wait for Controller Manager rollout success]
FAILED - RETRYING: Wait for Controller Manager rollout success (1 retries left).
fatal: [master1.example.com]: FAILED! => {
"attempts": 5,
"changed": false,
"cmd": [
"oc",
"rollout",
"status",
"--config=/etc/origin/master/admin.kubeconfig",
"-n",
"kube-service-catalog",
"ds/controller-manager"
],
"delta": "0:00:00.221175",
"end": "2018-12-04 10:54:00.366703",
"invocation": {
"module_args": {
"_raw_params": "oc rollout status --config=/etc/origin/master/admin.kubeconfig -n kube-service-catalog ds/controller-manager",
"_uses_shell": false,
"argv": null,
"chdir": null,
"creates": null,
"executable": null,
"removes": null,
"stdin": null,
"warn": true
}
},
"msg": "non-zero return code",
"rc": 1,
"start": "2018-12-04 10:54:00.145528",
"stderr": "error: watch closed before Until timeout",
"stderr_lines": [
"error: watch closed before Until timeout"
],
"stdout": "Waiting for daemon set \"controller-manager\" rollout to finish: 0 of 2 updated pods are available...\nWaiting for daemon set \"controller-manager\" rollout to finish: 0 of 2 updated pods are available...",
"stdout_lines": [
"Waiting for daemon set \"controller-manager\" rollout to finish: 0 of 2 updated pods are available...",
"Waiting for daemon set \"controller-manager\" rollout to finish: 0 of 2 updated pods are available..."
]
}
...ignoring
TASK [openshift_service_catalog : Verify that the Catalog API Server is running]
FAILED - RETRYING: Verify that the Catalog API Server is running (1 retries left).
fatal: [master1.example.com]: FAILED! => {
"attempts": 60,
"changed": false,
"cmd": [
"curl",
"-k",
"https://apiserver.kube-service-catalog.svc/healthz"
],
"delta": "0:00:01.014590",
"end": "2018-12-04 11:17:36.736802",
"invocation": {
"module_args": {
"_raw_params": "curl -k https://apiserver.kube-service-catalog.svc/healthz",
"_uses_shell": false,
"argv": null,
"chdir": null,
"creates": null,
"executable": null,
"removes": null,
"stdin": null,
"warn": false
}
},
"msg": "non-zero return code",
"rc": 7,
"start": "2018-12-04 11:17:35.722212",
"stderr": " % Total % Received % Xferd Average Speed Time Time Time Current\n Dload Upload Total Spent Left Speed\n\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r 0 0 0 0 0 0 0 0 --:--:-- 0:00:01 --:--:-- 0curl: (7) Failed connect to apiserver.kube-service-catalog.svc:443; Connection refused",
"stderr_lines": [
" % Total % Received % Xferd Average Speed Time Time Time Current",
" Dload Upload Total Spent Left Speed",
"",
" 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0",
" 0 0 0 0 0 0 0 0 --:--:-- 0:00:01 --:--:-- 0curl: (7) Failed connect to apiserver.kube-service-catalog.svc:443; Connection refused"
],
"stdout": "",
"stdout_lines": []
}
...ignoring
TASK [openshift_service_catalog : Report errors] ***********************************************************************************************************
fatal: [master1.example.com]: FAILED! => {"changed": false, "msg": "Catalog install failed."}
to retry, use: --limit @/root/okd-installer/openshift-ansible/playbooks/openshift-service-catalog/config.retry
PLAY RECAP *************************************************************************************************************************************************
etcd1.example.com : ok=18 changed=1 unreachable=0 failed=0
etcd2.example.com : ok=16 changed=1 unreachable=0 failed=0
infra-node1.example.com : ok=0 changed=0 unreachable=0 failed=0
infra-node2.example.com : ok=0 changed=0 unreachable=0 failed=0
lb1.example.com : ok=1 changed=0 unreachable=0 failed=0
localhost : ok=12 changed=0 unreachable=0 failed=0
master1.example.com : ok=91 changed=25 unreachable=0 failed=1
master2.example.com : ok=28 changed=1 unreachable=0 failed=0
node1.example.com : ok=0 changed=0 unreachable=0 failed=0
node2.example.com : ok=0 changed=0 unreachable=0 failed=0
INSTALLER STATUS *******************************************************************************************************************************************
Initialization : Complete (0:00:34)
Service Catalog Install : In Progress (0:24:32)
This phase can be restarted by running: playbooks/openshift-service-catalog/config.yml
CentOS Linux release 7.6.1810 (Core)
I've retried by running just playbooks/openshift-service-catalog/config.yml, with the same output.
Anyone see anything wrong with my inventory/setup?
Thanks!
@JayKayy mentioned that, in his case, the template used to create the daemonsets was using the wrong value for etcd_servers. It was trying to use the master[0] host, even though etcd isn't co-located with the masters.
Since my inventory is also set up with external etcd hosts, maybe this is a similar issue?
I've resolved this particular issue, installation now finishes and the service catalog rolls out successfully.
My particular issue was in the DNSMasq configuration.
I had:
address=/.example.com/10.4.1.101 # load-balancer
and needed, instead (note the added apps):
address=/.apps.example.com/10.4.1.101 # load-balancer
My guess at the root cause: even though every host name resolved correctly, the install still failed because the wildcard was set up to send every otherwise unknown name under example.com to the Load Balancer, not only the expected names such as foobar.apps.example.com. That made it a bit tricky to figure out.
Sorry for the noise, hopefully this saves others some time if they wind up in a similar DNS situation.
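To illustrate the difference between the two dnsmasq lines, here is a minimal sketch of the suffix matching involved. It assumes that `address=/domain/ip` answers for the domain and every name beneath it, and that a short name such as apiserver.kube-service-catalog.svc may be retried with the `search example.com` suffix appended. `wildcard_matches` is a hypothetical helper that only models the matching; it does not query dnsmasq.

```shell
# Hypothetical helper: succeeds when NAME equals SUFFIX or ends with ".SUFFIX".
wildcard_matches() {
    name=$1
    suffix=${2#.}   # a leading dot is dropped before comparing
    case "$name" in
        "$suffix"|*".$suffix") return 0 ;;
        *) return 1 ;;
    esac
}

# Compare the original pattern with the corrected one.
for pattern in .example.com .apps.example.com; do
    for name in \
        foobar.apps.example.com \
        apiserver.kube-service-catalog.svc.example.com
    do
        if wildcard_matches "$name" "$pattern"; then
            echo "$pattern captures $name"
        else
            echo "$pattern ignores $name"
        fi
    done
done
```

Under these assumptions the broad pattern captures both names, while the narrower pattern captures only the application route and leaves the service name alone.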
Hey all,
I need the benefit of your experience. I searched the inventory for ns1.example.com (ns1) but couldn't find it.
I'd like to know what this host is used for. Can you please help me?
Best regards,
Khaled Moez