I have an OpenShift Origin cluster on version 3.9. I want to upgrade to 3.10, but at the "Approve the node" stage I always get this message: "Cound not find csr for nodes: XXXX". The upgrade hangs at this step.
fatal: [uosm1.XXXX -> uosm1.XXXX]: FAILED! => {
"attempts": 30,
"changed": false,
"invocation": {
"module_args": {
"node_list": [
"uosm1.XXXX"
],
"oc_bin": "oc",
"oc_conf": "/etc/origin/master/admin.kubeconfig"
}
},
"msg": "Cound not find csr for nodes: uosm1.XXXX",
"state": "unknown"
}
Failure summary:
oc --config=admin.kubeconfig.udll get nodes
NAME STATUS ROLES AGE VERSION
uosi1.XXXX Ready infra 133d v1.9.1+a0ce1bc657
uosi2.XXXX Ready infra 133d v1.9.1+a0ce1bc657
uosi3.XXXX Ready infra 133d v1.9.1+a0ce1bc657
uosm1.XXXX NotReady master 133d v1.10.0+b81c8f8
uosm2.XXXX Ready master 133d v1.9.1+a0ce1bc657
uosm3.XXXX Ready master 133d v1.9.1+a0ce1bc657
uosn1.XXXX Ready compute 133d v1.9.1+a0ce1bc657
uosn2.XXXX Ready compute 133d v1.9.1+a0ce1bc657
uosn3.XXXX Ready compute 133d v1.9.1+a0ce1bc657
oc --config=admin.kubeconfig.udll get csr
NAME AGE REQUESTOR CONDITION
csr-2kgng 15h system:node:uosm1.XXXX Pending
csr-2z4f7 9h system:node:uosm1.XXXX Pending
csr-4hsbd 13h system:node:uosm1.XXXX Pending
[...]
3 masters, 3 infra nodes, and 3 compute nodes, running RHEL 7.5 on VMware
Inventory file:
[OSEv3:children]
masters
nodes
etcd
[OSEv3:vars]
openshift_template_service_broker_namespaces=['openshift','default']
openshift_master_default_subdomain=uapps.XXXX
ansible_ssh_user=root
debug_level=2
openshift_master_cluster_hostname=uopenshift.XXXX
openshift_master_cluster_public_hostname=uopenshift.XXXX
openshift_deployment_type=origin
openshift_release="3.10"
openshift_clock_enabled=true
openshift_use_openshift_sdn=true
openshift_master_named_certificates=[{"certfile": "/home/XXXX/git/openshift/configuration/before_openshift/https/XXXX.crt", "keyfile": "/home/XXXX/git/openshift/configuration/before_openshift/https/XXXX.key", "cafile": "/home/XXXX/git/openshift/configuration/before_openshift/https/XXXX.ca"}]
openshift_master_overwrite_named_certificates=true
openshift_hosted_router_certificate={"certfile": "/home/XXXX/git/openshift/configuration/before_openshift/https/XXXX.crt", "keyfile": "/home/XXXX/git/openshift/configuration/before_openshift/https/XXXX.key", "cafile": "/home/XXXX/git/openshift/configuration/before_openshift/https/XXXX.ca"}
openshift_hosted_registry_routehost=registry.uapps.XXXX
openshift_hosted_registry_routecertificates={"certfile": "/home/XXXX/git/openshift/configuration/before_openshift/https/uapps.XXXX.crt", "keyfile": "/home/XXXX/git/openshift/configuration/before_openshift/https/uapps.XXXX.key", "cafile": "/home/XXXX/git/openshift/configuration/before_openshift/https/uapps.XXXX.ca"}
osm_cluster_network_cidr=173.18.0.0/16
openshift_portal_net=172.19.0.0/16
openshift_docker_options='--insecure-registry 172.18.0.0/15'
openshift_router_selector='node-role.kubernetes.io/infra=true'
openshift_registry_selector='node-role.kubernetes.io/infra=true'
osm_default_node_selector='node-role.kubernetes.io/compute=true'
openshift_master_api_port=443
openshift_master_console_port=443
openshift_disable_check=docker_image_availability,memory_availability,disk_availability,package_availability,docker_storage
openshift_http_proxy=http://XXXX:3128
openshift_https_proxy=https://XXX:3128
openshift_no_proxy='172.18.0.0/15,registry.uapps.XXXX,ceph-s3.XXXX'
openshift_hosted_registry_storage_kind=object
openshift_hosted_registry_storage_provider=s3
openshift_hosted_registry_storage_s3_accesskey=XXXX
openshift_hosted_registry_storage_s3_secretkey=XXXX
openshift_hosted_registry_storage_s3_regionendpoint=http://ceph-s3.XXXX:8080
openshift_hosted_registry_storage_s3_bucket=XXXX
openshift_hosted_registry_storage_s3_region=default
openshift_hosted_registry_storage_s3_chunksize=26214400
openshift_hosted_registry_storage_s3_rootdirectory=/registry
openshift_hosted_registry_pullthrough=true
openshift_hosted_registry_acceptschema2=true
openshift_hosted_registry_enforcequota=false
[masters]
uosm1.XXXX
uosm2.XXXX
uosm3.XXXX
[etcd]
uosm1.XXXX
uosm2.XXXX
uosm3.XXXX
[nodes]
uosm1.XXXX openshift_node_group_name='node-config-master'
uosm2.XXXX openshift_node_group_name='node-config-master'
uosm3.XXXX openshift_node_group_name='node-config-master'
uosi1.XXXX openshift_node_group_name='node-config-infra'
uosi2.XXXX openshift_node_group_name='node-config-infra'
uosi3.XXXX openshift_node_group_name='node-config-infra'
uosn1.XXXX openshift_node_group_name='node-config-compute'
uosn2.XXXX openshift_node_group_name='node-config-compute'
uosn3.XXXX openshift_node_group_name='node-config-compute'
In version 3.10, all machine names must be resolvable through native DNS.
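A quick way to check name resolution from a given machine is sketched below. This is only an illustration, not part of openshift-ansible: the function names `resolve`, `check_names`, and `unresolved` are made up for this example, and the host list is a placeholder. Note that `socket.gethostbyname` uses the system resolver, which normally consults /etc/hosts as well, so a successful lookup here does not prove that the DNS server itself knows the name.

```python
import socket
from typing import Callable, Dict, Iterable, List, Optional


def resolve(name: str) -> Optional[str]:
    """Return an IPv4 address for name via the system resolver, or None."""
    try:
        return socket.gethostbyname(name)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def check_names(
    names: Iterable[str],
    resolver: Callable[[str], Optional[str]] = resolve,
) -> Dict[str, Optional[str]]:
    """Map each name to its resolved address (None when lookup fails)."""
    return {name: resolver(name) for name in names}


def unresolved(results: Dict[str, Optional[str]]) -> List[str]:
    """Return the sorted names that did not resolve."""
    return sorted(name for name, addr in results.items() if addr is None)


if __name__ == "__main__":
    # Placeholder list: replace with the FQDNs from the [nodes] inventory group.
    hosts = ["uosm1.XXXX", "uosm2.XXXX", "uosm3.XXXX"]
    for missing in unresolved(check_names(hosts)):
        print("cannot resolve:", missing)
```

Running it on every master and node, with the same host list, shows whether each machine can resolve all the others.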
Thanks for your answer.
All nodes can resolve each other. From the first master, I can resolve the other nodes, as well as hosts outside this cluster.
The problem occurs when the CSR is created. When I run oc describe on it, I don't see the node's real name: the certificate is created for the node's short hostname instead of its FQDN.
If you were previously using /etc/hosts on each node, how do you fix this? Entering each node into DNS doesn't seem to resolve it by itself.
I have the same issue on a new installation of the 90-day trial from Red Hat for OpenShift Enterprise v3.10.41-1.
FAILED - RETRYING: Approve node certificates when bootstrapping (2 retries left).
FAILED - RETRYING: Approve node certificates when bootstrapping (1 retries left).
fatal: [AgoodDNSHostname]: FAILED! => {"attempts": 30, "changed": false, "msg": "Cound not find csr for nodes: "AgoodDNSHostname", "state": "unknown"}
Edit: I have a ticket open with Red Hat, by the way; no workaround or diagnosis since the 6th of September.
I have the same problem upgrading to 3.10 using openshift-ansible.noarch 3.10.41-1.git.0.fd15dd7.el7 @rhel-7-server-ose-3.10-rpms
output of 'hostname': node1
output of 'hostname -f': node1.ourdomain.com
Nodes in the inventory file are FQDNs.
Can you use openshift-ansible-3.10.21-1.git.0.6446011? It works for me with this release. If not, when your playbook reaches this stage, run "oc get csr". You will see all CSRs, and you can approve the most recently generated one with the command "oc adm certificate approve CSR_ID".
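For anyone approving by hand, here is a rough sketch of picking the pending CSRs for one node out of the "oc get csr" output and printing the matching approve commands. It assumes the four-column layout shown earlier in this thread (NAME, AGE, REQUESTOR, CONDITION); the function names `pending_csrs` and `approve_commands` are invented for this example and are not part of openshift-ansible. It lists every pending CSR for the node rather than only the latest one, and it only prints the commands, it does not run them.

```python
import shlex
from typing import List


def pending_csrs(csr_table: str, node: str) -> List[str]:
    """Return names of Pending CSRs requested by system:node:<node>."""
    names = []
    for line in csr_table.splitlines():
        fields = line.split()
        # Skip the header, blank lines, and anything too short to parse.
        if len(fields) < 4 or fields[0] == "NAME":
            continue
        name, requestor, condition = fields[0], fields[2], fields[-1]
        if requestor == "system:node:" + node and condition == "Pending":
            names.append(name)
    return names


def approve_commands(csr_names: List[str]) -> List[str]:
    """Build one 'oc adm certificate approve' command line per CSR name."""
    return ["oc adm certificate approve " + shlex.quote(n) for n in csr_names]


# Sample copied from the 'oc get csr' output in the original report.
SAMPLE = """\
NAME AGE REQUESTOR CONDITION
csr-2kgng 15h system:node:uosm1.XXXX Pending
csr-2z4f7 9h system:node:uosm1.XXXX Pending
csr-4hsbd 13h system:node:uosm1.XXXX Pending
"""

if __name__ == "__main__":
    for cmd in approve_commands(pending_csrs(SAMPLE, "uosm1.XXXX")):
        print(cmd)
```

In practice you would feed it the real output of "oc get csr" instead of the sample text, and review the printed commands before running any of them.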
That didn't help for me. With a bit of hacking I got past the CSR problem, but then I got CNI errors: "Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized"
I then decided to change the hostnames to FQDNs.
After cleaning up the mess left by previous upgrade attempts, such as a failed etcd node and the sync daemon set, I got through the upgrade (at least until the installation of the service catalog, but that is another story).
There have been a series of fixes around the CSR process in the last couple of weeks. I don't think all the fixes have shipped yet, but the latest commit on the release-3.10 git branch should contain them.
It's important that, when upgrading from 3.9, your hostnames match the node names in 'oc get nodes'; otherwise, we won't be able to find the CSRs for your nodes.
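To illustrate that check, the sketch below compares a list of hostnames against the NAME column of 'oc get nodes'. It is a simplified illustration and not the logic used by the playbooks: the helper names `node_names` and `unmatched_hosts` are hypothetical, and the sample table is invented, loosely based on the node1 / node1.ourdomain.com case reported above.

```python
from typing import Iterable, List


def node_names(nodes_table: str) -> List[str]:
    """Extract the NAME column from 'oc get nodes' text output."""
    names = []
    for line in nodes_table.splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if not fields or fields[0] == "NAME":
            continue
        names.append(fields[0])
    return names


def unmatched_hosts(hostnames: Iterable[str], registered: Iterable[str]) -> List[str]:
    """Return hostnames that do not exactly match any registered node name."""
    known = set(registered)
    return [h for h in hostnames if h not in known]


# Hypothetical output: the node is registered under its FQDN.
NODES = """\
NAME STATUS ROLES AGE VERSION
node1.ourdomain.com Ready master 133d v1.9.1+a0ce1bc657
"""

if __name__ == "__main__":
    # A short hostname such as 'node1' does not match the registered FQDN.
    for host in unmatched_hosts(["node1"], node_names(NODES)):
        print("no node named:", host)
```

The comparison is an exact string match on purpose: a short hostname and its FQDN count as different names here.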
We were able to solve the problem by replacing the playbooks installed from the RPM with the updated 3.10 playbooks from the release-3.10 branch of the GitHub repo, and that worked.
However, for some reason, if I download the entire release-3.10 branch it still fails with the same CSR issue.
Anyway, my OpenShift Enterprise 3.10 is now online.
Can you please explain exactly what you meant by "using the updated playbooks"? It seems to fail for me...
Downloaded these and replaced:
https://github.com/openshift/openshift-ansible/pull/10055
roles/lib_openshift/library/oc_csr_approve.py
roles/lib_openshift/test/test_oc_csr_approve.py
Works for me with openshift-ansible-3.10.51-1
@infrasystemelille, I'm not able to find the openshift-ansible-3.10.51-1 RPM in the RHEL yum repo.
It is not available as an RPM there. You have to clone the openshift-ansible git repository and do a git checkout of this release, or download this RPM: http://mirror.centos.org/centos/7/paas/x86_64/openshift-origin310/openshift-ansible-3.10.51-1.git.0.44a646c.el7.noarch.rpm
release-3.10 is the branch right?
yes
@infrasystemelille I am getting the same error using the RPM as well as the git source.
If you were previously using /etc/hosts instead of DNS, and then switch to DNS, is there any way to fix the existing nodes without rebuilding the cluster?