I have tried installing openshift-origin v3.7, v3.8, v3.9, and v3.10, and every version fails with the error below.
Are there prerequisites for the service catalog that I am missing?
fatal: [dev.cefcfco.com]: FAILED! => {
"attempts": 120,
"changed": false,
"cmd": [
"curl",
"-k",
"https://apiserver.kube-service-catalog.svc/healthz"
],
"delta": "0:00:01.188682",
"end": "2018-03-22 02:32:27.933614",
"invocation": {
"module_args": {
"_raw_params": "curl -k https://apiserver.kube-service-catalog.svc/healthz",
"_uses_shell": false,
"chdir": null,
"creates": null,
"executable": null,
"removes": null,
"stdin": null,
"warn": false
}
},
"rc": 0,
"start": "2018-03-22 02:32:26.744932",
"stderr": " % Total % Received % Xferd Average Speed Time Time Time Current\n Dload Upload Total Spent Left Speed\n\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r 0 0 0 0 0 0 0 0 --:--:-- 0:00:01 --:--:-- 0\r100 180 100 180 0 0 153 0 0:00:01 0:00:01 --:--:-- 153",
"stderr_lines": [
" % Total % Received % Xferd Average Speed Time Time Time Current",
" Dload Upload Total Spent Left Speed",
"",
" 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0",
" 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0",
" 0 0 0 0 0 0 0 0 --:--:-- 0:00:01 --:--:-- 0",
"100 180 100 180 0 0 153 0 0:00:01 0:00:01 --:--:-- 153"
],
"stdout": "[+]ping ok\n[+]poststarthook/generic-apiserver-start-informers ok\n[+]poststarthook/start-service-catalog-apiserver-informers ok\n[-]etcd failed: reason withheld\nhealthz check failed",
"stdout_lines": [
"[+]ping ok",
"[+]poststarthook/generic-apiserver-start-informers ok",
"[+]poststarthook/start-service-catalog-apiserver-informers ok",
"[-]etcd failed: reason withheld",
"healthz check failed"
]
}
to retry, use: --limit @/root/openshift-ansible/playbooks/byo/config.retry
INSTALLER STATUS ***********************************************************************************************************************************
Initialization : Complete
Health Check : Complete
etcd Install : Complete
Master Install : Complete
Master Additional Install : Complete
Node Install : Complete
Hosted Install : Complete
Service Catalog Install : In Progress
This phase can be restarted by running: playbooks/byo/openshift-cluster/service-catalog.yml
Failure summary:
1. Hosts: dev.cefcfco.com
Play: Service Catalog
Task: wait for api server to be ready
Message: Failed without returning a message.
[root@feng ~]# curl -k https://apiserver.kube-service-catalog.svc/healthz
[+]ping ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/start-service-catalog-apiserver-informers ok
[-]etcd failed: reason withheld
healthz check failed
[root@dev ~]# oc get pods -n kube-service-catalog
NAME READY STATUS RESTARTS AGE
apiserver-qbjj7 1/1 Running 0 9m
controller-manager-ptz7v 1/1 Running 1 9m
[OSEv3:children]
masters
nodes
etcd
[OSEv3:vars]
ansible_ssh_user=root
enable_excluders=False
enable_docker_excluder=False
ansible_service_broker_install=False
containerized=True
os_sdn_network_plugin_name='redhat/openshift-ovs-multitenant'
openshift_disable_check=disk_availability,docker_storage,memory_availability,docker_image_availability,package_version
deployment_type=origin
openshift_deployment_type=origin
openshift_release=v3.7.2
openshift_pkg_version=v3.7.2
openshift_image_tag=v3.7.2
openshift_service_catalog_image_version=v3.7.2
template_service_broker_image_version=v3.7.2
openshift_metrics_image_version=v3.7.2
osm_use_cockpit=true
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider', 'filename': '/etc/origin/master/htpasswd'}]
openshift_public_hostname=dev.cefcfco.com
openshift_master_default_subdomain=apps.dev.cefcfco.com
[masters]
dev.cefcfco.com openshift_schedulable=true
[etcd]
dev.cefcfco.com
[nodes]
dev.cefcfco.com openshift_schedulable=true openshift_node_labels="{'region': 'infra', 'zone': 'default'}"
I have the same problem with OpenShift 3.7.
I have read all the related content I could find, but none of it gives an answer.
If anyone knows the answer, please let me know.
@flipkill1985
I finally got it installed successfully this morning.
[root@localhost ~]# oc get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default docker-registry-1-zmgt4 1/1 Running 0 10m
default registry-console-1-6dnjv 1/1 Running 0 10m
default router-1-n479h 1/1 Running 0 12m
kube-service-catalog apiserver-8sd62 1/1 Running 0 9m
kube-service-catalog controller-manager-5bbvb 1/1 Running 0 9m
openshift-ansible-service-broker asb-1-deploy 1/1 Running 0 8m
openshift-ansible-service-broker asb-1-scb6l 0/1 ImagePullBackOff 0 8m
openshift-ansible-service-broker asb-etcd-1-jq6s5 1/1 Running 0 8m
openshift-template-service-broker apiserver-dd6mh 1/1 Running 0 7m
I had always followed this video: https://blog.openshift.com/installing-openshift-3-7-1-30-minutes/
but every attempt failed.
After following these steps one by one, it succeeded: https://docs.openshift.org/latest/install_config/install/host_preparation.html
Maybe I was missing some prerequisite packages before.
https://docs.openshift.org/latest/install_config/install/host_preparation.html
Isn't this for OpenShift 3.9, not 3.7.x?
@flipkill1985
I installed v3.7 successfully.
Can you post your steps and the playbook you use? Please :)
This time I used a basic config for testing, so I did not set up a hostname, DNS, Docker storage, and so on. I think those are easy to add once the installation succeeds.
1
yum install wget git net-tools bind-utils iptables-services bridge-utils bash-completion kexec-tools sos psacct
yum update
2
yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
sed -i -e "s/^enabled=1/enabled=0/" /etc/yum.repos.d/epel.repo
yum -y --enablerepo=epel install ansible pyOpenSSL
3
git clone https://github.com/openshift/openshift-ansible
cd openshift-ansible
git checkout release-3.7
cd ~/
4
yum install docker-1.13.1
systemctl start docker
systemctl enable docker
5
ssh-keygen -t rsa
-- change to your host's IP
ssh-copy-id -i ~/.ssh/id_rsa.pub 10.1.7.39
6
vi /etc/ansible/hosts
-- my hosts file; replace the IP with your host's IP
-- dev.cefcfco.com is my domain; replace it with yours
[OSEv3:children]
masters
nodes
etcd
nfs
[OSEv3:vars]
ansible_ssh_user=root
os_sdn_network_plugin_name='redhat/openshift-ovs-multitenant'
openshift_disable_check=disk_availability,docker_storage,memory_availability,docker_image_availability,package_version
openshift_docker_options='--selinux-enabled --insecure-registry 172.30.0.0/16'
deployment_type=origin
openshift_deployment_type=origin
openshift_release=v3.7
openshift_hosted_etcd_storage_kind=nfs
openshift_hosted_etcd_storage_nfs_options="*(rw,root_squash,sync,no_wdelay)"
openshift_hosted_etcd_storage_nfs_directory=/opt/osev3-etcd
openshift_hosted_etcd_storage_volume_name=etcd-vol2
openshift_hosted_etcd_storage_access_modes=["ReadWriteOnce"]
openshift_hosted_etcd_storage_volume_size=1G
openshift_hosted_etcd_storage_labels={'storage': 'etcd'}
ansible_service_broker_image_prefix=openshift/
ansible_service_broker_registry_url="registry.access.redhat.com"
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider', 'filename': '/etc/origin/master/htpasswd'}]
openshift_public_hostname=dev.cefcfco.com
openshift_master_default_subdomain=apps.dev.cefcfco.com
[masters]
10.1.7.39 openshift_schedulable=true
[etcd]
10.1.7.39
[nfs]
10.1.7.39
[nodes]
10.1.7.39 openshift_schedulable=true openshift_node_labels="{'region': 'infra', 'zone': 'default'}"
7
ansible-playbook -i /etc/ansible/hosts openshift-ansible/playbooks/byo/config.yml -vvv
It doesn't work :( Which distribution do you use? I use CentOS 7.4.
This is the error:
fatal: [sp-peter02.os.peter.es]: FAILED! => {
"attempts": 120,
"changed": false,
"cmd": [
"curl",
"-k",
"https://apiserver.kube-service-catalog.svc/healthz"
],
"delta": "0:00:00.144529",
"end": "2018-03-23 09:34:21.024849",
"invocation": {
"module_args": {
"_raw_params": "curl -k https://apiserver.kube-service-catalog.svc/healthz",
"_uses_shell": false,
"chdir": null,
"creates": null,
"executable": null,
"removes": null,
"stdin": null,
"warn": false
}
},
"rc": 0,
"start": "2018-03-23 09:34:20.880320",
"stderr": " % Total % Received % Xferd Average Speed Time Time Time Current\n Dload Upload Total Spent Left Speed\n\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r100 180 100 180 0 0 1311 0 --:--:-- --:--:-- --:--:-- 1313",
"stderr_lines": [
" % Total % Received % Xferd Average Speed Time Time Time Current",
" Dload Upload Total Spent Left Speed",
"",
" 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0",
"100 180 100 180 0 0 1311 0 --:--:-- --:--:-- --:--:-- 1313"
],
"stdout": "[+]ping ok\n[+]poststarthook/generic-apiserver-start-informers ok\n[+]poststarthook/start-service-catalog-apiserver-informers ok\n[-]etcd failed: reason withheld\nhealthz check failed",
"stdout_lines": [
"[+]ping ok",
"[+]poststarthook/generic-apiserver-start-informers ok",
"[+]poststarthook/start-service-catalog-apiserver-informers ok",
"[-]etcd failed: reason withheld",
"healthz check failed"
]
}
# curl -k https://apiserver.kube-service-catalog.svc/healthz
[+]ping ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/start-service-catalog-apiserver-informers ok
[-]etcd failed: reason withheld
healthz check failed
Can someone please help me?
This is the problem:
# curl -k https://apiserver.kube-service-catalog.svc/healthz
[+]ping ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/start-service-catalog-apiserver-informers ok
[-]etcd failed: reason withheld
healthz check failed
@flipkill1985
In your Ansible hosts file, do you use IPs or hostnames?
I found that the install fails when I use hostnames but succeeds when I use IPs.
I guess this is a DNS problem,
so I will try setting up a DNS server next.
I use hostnames.
@flipkill1985 I think this _may_ be related to https://github.com/openshift/origin/issues/17316
Do you have a wildcard entry for *.dev.cefcfco.com configured in your DNS?
I've recently experienced a similar issue where the apiserver pod failed to resolve the etcd hosts correctly because the DNS lookup was matching a wildcard DNS entry, due to the search and ndots configuration in /etc/resolv.conf inside the apiserver pod.
See my comment here, as I found similar behavior: https://github.com/openshift/openshift-ansible/issues/8076
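One way to test for a wildcard record is to look up a label that should not exist under the domain: if it still resolves, a wildcard is probably answering. The following is only a minimal sketch. The function name has_wildcard and the RESOLVE_CMD override are hypothetical, and it assumes getent is available on the host.

```shell
# Hypothetical helper: succeeds if a made-up label under DOMAIN resolves,
# which suggests that a wildcard record (*.DOMAIN) exists.
# RESOLVE_CMD is an assumed override hook; by default `getent hosts` is used.
has_wildcard() {
    domain=$1
    # The trailing dot makes the name absolute, so no search list is applied.
    probe="wildcard-probe-$$.${domain}."
    ${RESOLVE_CMD:-getent hosts} "$probe" >/dev/null 2>&1
}
```

For example, `has_wildcard dev.cefcfco.com && echo "wildcard likely present"` would print the message only if the made-up name resolves.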
I'm running into the same issue.
[OSEv3:children]
masters
nodes
etcd
lb
[OSEv3:vars]
ansible_python_interpreter=/usr/bin/python3
ansible_ssh_user=fedora
ansible_become=true
openshift_deployment_type=origin
openshift_release=v3.9
openshift_master_cluster_method=native
openshift_master_cluster_hostname=k8s.unigs.de
openshift_master_cluster_public_hostname=cloud.unigs.de
[masters]
node1.k8s.unigs.de
node3.k8s.unigs.de
node5.k8s.unigs.de
[etcd]
node1.k8s.unigs.de
node3.k8s.unigs.de
node5.k8s.unigs.de
[lb]
lb.k8s.unigs.de ansible_python_interpreter=/usr/bin/python ansible_ssh_user=root
[nodes]
node1.k8s.unigs.de openshift_node_labels="{'region': 'infra','zone': 'default'}"
node3.k8s.unigs.de openshift_node_labels="{'region': 'infra','zone': 'default'}"
node5.k8s.unigs.de openshift_node_labels="{'region': 'infra','zone': 'default'}"
node2.k8s.unigs.de openshift_node_labels="{'region': 'infra','primary': 'default'}"
node4.k8s.unigs.de openshift_node_labels="{'region': 'infra','primary': 'default'}"
node6.k8s.unigs.de openshift_node_labels="{'region': 'infra','primary': 'default'}"
Nodes 1 to 6 run Fedora Atomic; lb runs CentOS 7. All are on the latest version.
I have run all the preparation commands and set up fully working DNS (including a wildcard record that points to the lb).
I noticed that about 1 in 3 runs of curl -k https://apiserver.kube-service-catalog.svc/healthz returns ok.
Is there anything I can provide to give you a clue about what could be wrong?
On a retest from scratch with an external load balancer, I got stuck on exactly the same error.
The healthz URL seems to work on only a single node. It fails in ~66% of the curls.
for i in {1..1000}; do
  curl -s -k https://apiserver.kube-service-catalog.svc/healthz \
    | grep -oE '^ok|etcd.*'
done | sort | uniq -c
662 etcd failed: reason withheld
338 ok
I think I found the reason why it's not working:
some of the API servers do not work:
apiserver-8n5g5 @node1 curl -k https://10.128.0.4:6443 healthz check failed
apiserver-cdbfh @node3 curl -k https://10.129.0.4:6443 healthz check failed
apiserver-n4qm7 @node2 curl -k https://10.130.0.6:6443 ok
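The per-pod check above could be scripted roughly as follows. This is a sketch under assumptions: the function name check_pods is hypothetical, the pod IPs are fed in on stdin, and it queries /healthz on port 6443 (the secure port shown in the pod arguments).

```shell
# Hypothetical helper: read pod IPs (one per line) on stdin and print the
# last line of each pod's /healthz response ("ok" when the pod is healthy).
check_pods() {
    while read -r ip; do
        # Skip blank lines.
        [ -n "$ip" ] || continue
        # Redirect stdin so curl cannot consume the remaining IP list.
        result=$(curl -s -k --max-time 5 "https://${ip}:6443/healthz" </dev/null | tail -n 1)
        printf '%s %s\n' "$ip" "${result:-no response}"
    done
}
```

It can be called as `printf '10.128.0.4\n10.129.0.4\n10.130.0.6\n' | check_pods`, using the pod IPs listed above.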
A quick look with describe showed me that they try to resolve the etcd servers:
Command:
/usr/bin/service-catalog
Args:
apiserver
--storage-type
etcd
--secure-port
6443
--etcd-servers
https://node1.k8s.unigs.de:2379,https://node2.k8s.unigs.de:2379,https://node3.k8s.unigs.de:2379
--etcd-cafile
/etc/origin/master/master.etcd-ca.crt
--etcd-certfile
/etc/origin/master/master.etcd-client.crt
--etcd-keyfile
/etc/origin/master/master.etcd-client.key
-v
3
--cors-allowed-origins
localhost
--admission-control
KubernetesNamespaceLifecycle,DefaultServicePlan,ServiceBindingsLifecycle,ServicePlanChangeValidator,BrokerAuthSarCheck
--feature-gates
OriginatingIdentity=true
I exec'd into the container and ran the following commands:
sh-4.2# ping node1.k8s.unigs.de
PING node1.k8s.unigs.de.k8s.unigs.de (10.18.255.99) 56(84) bytes of data.
64 bytes from lb.k8s.unigs.de (10.18.255.99): icmp_seq=1 ttl=63 time=0.213 ms
That is clearly wrong. Notice the trailing dot at the end of the hostname in the next command.
sh-4.2# ping node1.k8s.unigs.de.
PING node1.k8s.unigs.de (10.18.255.1) 56(84) bytes of data.
64 bytes from node1.k8s.unigs.de (10.18.255.1): icmp_seq=1 ttl=63 time=0.730 ms
oh interesting!
sh-4.2# cat /etc/resolv.conf
nameserver 10.18.255.2
search kube-service-catalog.svc.cluster.local svc.cluster.local cluster.local k8s.unigs.de
options ndots:5
As far as I understand it, the ndots:5 option makes the resolver try the search domains first for any hostname with fewer than 5 dots. Mine has only 3, so node1.k8s.unigs.de gets resolved to node1.k8s.unigs.de.k8s.unigs.de.
Does this ndots option make sense? And how can I force it to use the domain name I provided?
I tried adding openshift_ip= to all of my hosts, but that did not change the result.
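To make the lookup order concrete, here is a simplified model of how a glibc-style resolver picks candidate names from ndots and the search list. It is a sketch, not the resolver's actual code: the function name lookup_candidates is hypothetical, and other resolver implementations may order things differently.

```shell
# Print the names a glibc-style resolver would try, in order (simplified model).
# usage: lookup_candidates NAME NDOTS SEARCH_DOMAIN...
lookup_candidates() {
    name=$1
    ndots=$2
    shift 2

    # A trailing dot marks the name as absolute: the search list is skipped.
    case $name in
        *.) printf '%s\n' "$name"; return 0 ;;
    esac

    # Count the dots in the name.
    only_dots=$(printf '%s' "$name" | tr -cd '.')
    dots=${#only_dots}

    # With at least NDOTS dots, the name is tried as-is first.
    if [ "$dots" -ge "$ndots" ]; then
        printf '%s.\n' "$name"
    fi
    # Each search domain is appended in the order given in resolv.conf.
    for domain in "$@"; do
        printf '%s.%s.\n' "$name" "$domain"
    done
    # With fewer than NDOTS dots, the as-is lookup comes last.
    if [ "$dots" -lt "$ndots" ]; then
        printf '%s.\n' "$name"
    fi
}
```

Running `lookup_candidates node1.k8s.unigs.de 5 kube-service-catalog.svc.cluster.local svc.cluster.local cluster.local k8s.unigs.de` lists node1.k8s.unigs.de.k8s.unigs.de. ahead of the plain node1.k8s.unigs.de., which matches the ping output above: any earlier candidate that resolves wins.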
I finally got it to work. The cause of the issue was that I had a wildcard A record on the domain I used.
Without a wildcard entry, node1.k8s.unigs.de.k8s.unigs.de does not resolve, so the resolver falls through to the correct name.
I redeployed the same setup on another domain without a wildcard record, and it worked!
This may also work for these issues: