When I used openshift-ansible to install, origin-master.service failed to start. After checking the journal, it looks like a problem with etcd, which could not be found or invoked. Details are shown below.
ansible-playbook ~/openshift-ansible/playbooks/byo/config.yml
TASK [openshift_master : Start and enable master] **********
fatal: [10.134.29.158]: FAILED! => {"changed": false, "failed": true, "msg": "Job for origin-master.service failed because a timeout was exceeded. See \"systemctl status origin-master.service\" and \"journalctl -xe\" for details.\n"}
(then run) systemctl status openshift-master.service
● openshift-master.service
Loaded: not-found (Reason: No such file or directory)
Active: inactive (dead)
journalctl -xe
Jul 01 02:35:20 os-3-1-server-enb6fe2omdfu.novalocal origin-master[15396]: Content-Length: 0
Jul 01 02:35:22 os-3-1-server-enb6fe2omdfu.novalocal origin-master[15396]: E0701 02:35:22.017747 15396 etcd.go:146] etcd failure response: HTTP/0.0 0 status code 0
Jul 01 02:35:22 os-3-1-server-enb6fe2omdfu.novalocal origin-master[15396]: Content-Length: 0
Jul 01 02:35:23 os-3-1-server-enb6fe2omdfu.novalocal origin-master[15396]: E0701 02:35:23.224066 15396 etcd.go:146] etcd failure response: HTTP/0.0 0 status code 0
(several minutes later) systemctl status origin-master.service
● origin-master.service - Origin Master Service
Loaded: loaded (/usr/lib/systemd/system/origin-master.service; enabled; vendor preset: disabled)
Active: activating (start) since Fri 2016-07-01 03:01:31 UTC; 25s ago
Docs: https://github.com/openshift/origin
Main PID: 15700 (openshift)
CGroup: /system.slice/origin-master.service
└─15700 /usr/bin/openshift start master --config=/etc/origin/master/master-config.yaml --loglevel=2
Jul 01 03:01:51 os-3-1-server-enb6fe2omdfu.novalocal origin-master[15700]: E0701 03:01:51.960769 15700 etcd.go:146] etcd failure response: HTTP/0.0 0 status code 0
Jul 01 03:01:51 os-3-1-server-enb6fe2omdfu.novalocal origin-master[15700]: Content-Length: 0
Jul 01 03:01:53 os-3-1-server-enb6fe2omdfu.novalocal origin-master[15700]: E0701 03:01:53.171694 15700 etcd.go:146] etcd failure response: HTTP/0.0 0 status code 0
Jul 01 03:01:53 os-3-1-server-enb6fe2omdfu.novalocal origin-master[15700]: Content-Length: 0
Versions
openshift-ansible: latest master
centos7-1.6
openshift v1.2.0
kubernetes v1.2.0-36-g4a3f9c5
etcd 2.2.5
Additional Information
Architecture of my OpenShift cluster:
Host Name            Infrastructure Component to Install
master.example.com   Master and node
node1.example.com    Node
node2.example.com    Node
Whenever I've seen this HTTP/0.0 response, it has been because a proxy is configured but the proxy cannot reach the etcd host. So I'd wonder whether you have a proxy configured in /etc/sysconfig/origin-master. If not, all I can think of is to verify that etcd is started and look at its logs.
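A minimal sketch of the proxy check suggested above. The helper name proxy_settings is hypothetical, and the variable names it looks for (HTTP_PROXY, HTTPS_PROXY, NO_PROXY, in either case) are an assumption about how a proxy would typically be set in that file; only the path /etc/sysconfig/origin-master comes from the comment.

```shell
# Hypothetical helper: print proxy-related variables from a sysconfig-style file.
# Matches upper- and lower-case names, with or without a leading "export".
proxy_settings() {
  grep -iE '^[[:space:]]*(export[[:space:]]+)?(http_proxy|https_proxy|no_proxy)=' "$1"
}

# Path taken from the comment above; skipped if the file is not readable here.
conf=/etc/sysconfig/origin-master
if [ -r "$conf" ]; then
  proxy_settings "$conf" || echo "no proxy variables set in $conf"
fi
```

If this prints a proxy URL, the master's etcd requests may be going through that proxy. If it prints nothing, the next step from the comment applies: check that etcd is running and read its logs.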
@sdodson, I suspect etcd was not installed successfully on the master node. Can I install it manually? The etcd playbook was downloaded directly from the GitHub master branch, but I have no idea how to edit it.
Getting same error
TASK [openshift_master : Start and enable master api on first master] **********
skipping: [ec2-23-20-115-254.compute-1.amazonaws.com]
fatal: [ec2-54-147-48-107.compute-1.amazonaws.com]: FAILED! => {"changed": false, "failed": true, "msg": "Job for origin-master-api.service failed because a timeout was exceeded. See \"systemctl status origin-master-api.service\" and \"journalctl -xe\" for details.\n"}
OpenShift cluster:
2 Master nodes
1 Worker node
3 total nodes
1 lb
embedded etcd
@dvohra Is there anything in the journal indicating what the start failure was?
journalctl -u origin-master-api -l
The etcd failure response seems to be the issue, but no [etcd] group is configured, so an embedded etcd is expected to be used.
Some command outputs for Master node on which api server failed:
systemctl status origin-master-api.service
origin-master-api.service - Atomic OpenShift Master API
Loaded: loaded (/usr/lib/systemd/system/origin-master-api.service; enabled; vendor preset: disabled)
Active: failed (Result: timeout) since Mon 2016-08-08 21:13:10 UTC; 9min ago
Aug 08 21:13:07 ip-10-168-150-205 atomic-openshift-master-api[22228]: E0808 2...
Aug 08 21:13:07 ip-10-168-150-205 atomic-openshift-master-api[22228]: Content...
Aug 08 21:13:08 ip-10-168-150-205 atomic-openshift-master-api[22228]: E0808 2...
Aug 08 21:13:08 ip-10-168-150-205 atomic-openshift-master-api[22228]: Content...
Aug 08 21:13:09 ip-10-168-150-205 atomic-openshift-master-api[22228]: E0808 2...
Aug 08 21:13:09 ip-10-168-150-205 atomic-openshift-master-api[22228]: Content...
Aug 08 21:13:10 ip-10-168-150-205 systemd[1]: origin-master-api.service star....
Aug 08 21:13:10 ip-10-168-150-205 systemd[1]: Failed to start Atomic OpenShi....
Aug 08 21:13:10 ip-10-168-150-205 systemd[1]: Unit origin-master-api.service....
Aug 08 21:13:10 ip-10-168-150-205 systemd[1]: origin-master-api.service failed.
journalctl -u origin-master-api -l
-- Logs begin at Mon 2016-08-08 20:05:20 UTC, end at Mon 2016-08-08 21:21:55 UTC. --
Aug 08 21:11:39 ip-10-168-150-205 systemd[1]: Starting Atomic OpenShift Master API...
Aug 08 21:11:40 ip-10-168-150-205 atomic-openshift-master-api[22228]: I0808 21:11:40.026197 22228 start_api.go:102] Using a listen address override "0.0.0.0:8443"
Aug 08 21:11:40 ip-10-168-150-205 atomic-openshift-master-api[22228]: W0808 21:11:40.028055 22228 start_master.go:270] assetConfig.loggingPublicURL: Invalid value:
Aug 08 21:11:40 ip-10-168-150-205 atomic-openshift-master-api[22228]: W0808 21:11:40.028095 22228 start_master.go:270] assetConfig.metricsPublicURL: Invalid value:
Aug 08 21:11:40 ip-10-168-150-205 atomic-openshift-master-api[22228]: I0808 21:11:40.041914 22228 plugins.go:71] No cloud provider specified.
Aug 08 21:11:40 ip-10-168-150-205 atomic-openshift-master-api[22228]: I0808 21:11:40.043163 22228 genericapiserver.go:81] Adding storage destination for group
Aug 08 21:11:40 ip-10-168-150-205 atomic-openshift-master-api[22228]: I0808 21:11:40.043201 22228 genericapiserver.go:81] Adding storage destination for group exte
Aug 08 21:11:40 ip-10-168-150-205 atomic-openshift-master-api[22228]: I0808 21:11:40.043231 22228 start_master.go:383] Starting master on 0.0.0.0:8443 (v1.2.1)
Aug 08 21:11:40 ip-10-168-150-205 atomic-openshift-master-api[22228]: I0808 21:11:40.043243 22228 start_master.go:384] Public master address is https://ec2-54-81-1
Aug 08 21:11:40 ip-10-168-150-205 atomic-openshift-master-api[22228]: I0808 21:11:40.043262 22228 start_master.go:388] Using images from "openshift/origin-<compone
Aug 08 21:11:40 ip-10-168-150-205 atomic-openshift-master-api[22228]: E0808 21:11:40.045935 22228 etcd.go:146] etcd failure response: HTTP/0.0 0 status code 0
Aug 08 21:11:40 ip-10-168-150-205 atomic-openshift-master-api[22228]: Content-Length: 0
Aug 08 21:11:40 ip-10-168-150-205 atomic-openshift-master-api[22228]: E0808 21:11:40.071760 22228 etcd.go:146] etcd failure response: HTTP/0.0 0 status code 0
Aug 08 21:11:40 ip-10-168-150-205 atomic-openshift-master-api[22228]: Content-Length: 0
Aug 08 21:11:40 ip-10-168-150-205 atomic-openshift-master-api[22228]: E0808 21:11:40.122593 22228 etcd.go:146] etcd failure response: HTTP/0.0 0 status code 0
Aug 08 21:11:40 ip-10-168-150-205 atomic-openshift-master-api[22228]: Content-Length: 0
Aug 08 21:11:40 ip-10-168-150-205 atomic-openshift-master-api[22228]: E0808 21:11:40.223429 22228 etcd.go:146] etcd failure response: HTTP/0.0 0 status code 0
Aug 08 21:11:40 ip-10-168-150-205 atomic-openshift-master-api[22228]: Content-Length: 0
Aug 08 21:11:40 ip-10-168-150-205 atomic-openshift-master-api[22228]: E0808 21:11:40.424236 22228 etcd.go:146] etcd failure response: HTTP/0.0 0 status code 0
Aug 08 21:11:40 ip-10-168-150-205 atomic-openshift-master-api[22228]: Content-Length: 0
Aug 08 21:11:40 ip-10-168-150-205 atomic-openshift-master-api[22228]: E0808 21:11:40.825071 22228 etcd.go:146] etcd failure response: HTTP/0.0 0 status code 0
Aug 08 21:11:40 ip-10-168-150-205 atomic-openshift-master-api[22228]: Content-Length: 0
Aug 08 21:11:41 ip-10-168-150-205 atomic-openshift-master-api[22228]: E0808 21:11:41.626177 22228 etcd.go:146] etcd failure response: HTTP/0.0 0 status code 0
Aug 08 21:11:41 ip-10-168-150-205 atomic-openshift-master-api[22228]: Content-Length: 0
Aug 08 21:11:42 ip-10-168-150-205 atomic-openshift-master-api[22228]: E0808 21:11:42.627226 22228 etcd.go:146] etcd failure response: HTTP/0.0 0 status code 0
Aug 08 21:11:42 ip-10-168-150-205 atomic-openshift-master-api[22228]: Content-Length: 0
Aug 08 21:11:43 ip-10-168-150-205 atomic-openshift-master-api[22228]: E0808 21:11:43.628290 22228 etcd.go:146] etcd failure response: HTTP/0.0 0 status code 0
Aug 08 21:11:43 ip-10-168-150-205 atomic-openshift-master-api[22228]: Content-Length: 0
Aug 08 21:11:44 ip-10-168-150-205 atomic-openshift-master-api[22228]: E0808 21:11:44.629247 22228 etcd.go:146] etcd failure response: HTTP/0.0 0 status code 0
Aug 08 21:11:44 ip-10-168-150-205 atomic-openshift-master-api[22228]: Content-Length: 0
Aug 08 21:11:45 ip-10-168-150-205 atomic-openshift-master-api[22228]: E0808 21:11:45.630314 22228 etcd.go:146] etcd failure response: HTTP/0.0 0 status code 0
Aug 08 21:11:45 ip-10-168-150-205 atomic-openshift-master-api[22228]: Content-Length: 0
@dvohra Embedded etcd will only be used with the single master service origin-master.
If [etcd] is used, a different issue occurs:
https://github.com/openshift/origin/issues/10259
@dvohra Yep, I just found that one and submitted a PR for what I think is the problem.
Thanks for submitting a PR for the other issue.
So HA masters can't be used until the other issue is fixed?
@dvohra Right, without the etcd certificates we can't configure/start the etcd service.
@dvohra Looks like we just merged it. Please give it a try when you can.
Is a message such as the following normal, or could restarting the API server during installation be an issue?
FAILED - RETRYING: HANDLER: openshift_master: Verify API Server (80 tries left)
Although the earlier issue about [etcd] is fixed, the installation still does not complete. I will open another issue.
Closing.
Hey,
Installing on CentOS 7 and still having this problem. I am running everything on one master. Any pointers?
@jmwenda Hey, I am still having this problem on CentOS 7. Have you found a solution?
@jmwenda @wsszh I also hit the same problem in my CentOS 7 environment. My version is openshift-ansible-openshift-ansible-3.2.14-1.tar.gz.
I ran ansible-playbook ~/openshift-ansible/playbooks/adhoc/uninstall.yml to uninstall it, rebooted, and then ran ansible-playbook ~/openshift-ansible/playbooks/byo/config.yml to install again. It works!
I'm running into this as well on CentOS 7.2.
[stack@undercloud openshift-ansible]$ ssh [email protected] 'cat /etc/redhat-release'
CentOS Linux release 7.2.1511 (Core)
I'm on the current HEAD as of now, and here's how I'm running it:
bin/cluster create -o image_name=centos-custom -o external_net=ext-net -o floating_ip_pool=ext-net -o net_cidr=40.0.0.0/24 openstack test_cluster
Which results in a:
TASK [openshift_master : Start and enable master] ******************************
fatal: [test_cluster-master-0]: FAILED! => {"changed": false, "failed": true, "msg": "Job for origin-master.service failed because a timeout was exceeded. See \"systemctl status origin-master.service\" and \"journalctl -xe\" for details.\n"}
The journal gives me (repeatedly)
Nov 05 13:46:14 test-cluster-master-0.localdomain origin-master[19858]: E1105 13:46:14.029121 19858 reflector.go:203] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/limitranger/admission.go:154: Failed to list
Nov 05 13:46:14 test-cluster-master-0.localdomain origin-master[19858]: E1105 13:46:14.029212 19858 reflector.go:214] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/resourcequota/resource_access.go:83: Failed
Nov 05 13:46:14 test-cluster-master-0.localdomain origin-master[19858]: E1105 13:46:14.029311 19858 reflector.go:214] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/serviceaccount/admission.go:119: Failed to l
Nov 05 13:46:14 test-cluster-master-0.localdomain origin-master[19858]: E1105 13:46:14.029413 19858 reflector.go:203] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/namespace/lifecycle/admission.go:141: Failed
Nov 05 13:46:14 test-cluster-master-0.localdomain origin-master[19858]: E1105 13:46:14.029506 19858 reflector.go:203] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/limitranger/admission.go:154: Failed to list
Nov 05 13:46:14 test-cluster-master-0.localdomain origin-master[19858]: E1105 13:46:14.029712 19858 reflector.go:214] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/serviceaccount/admission.go:103: Failed to l
Nov 05 13:46:14 test-cluster-master-0.localdomain origin-master[19858]: E1105 13:46:14.375788 19858 cacher.go:220] unexpected ListAndWatch error: pkg/storage/cacher.go:163: Failed to list *api.User: client: etcd cluster is unavailable o
Nov 05 13:46:14 test-cluster-master-0.localdomain origin-master[19858]: E1105 13:46:14.481789 19858 cacher.go:220] unexpected ListAndWatch error: pkg/storage/cacher.go:163: Failed to list *api.Group: client: etcd cluster is unavailable
Nov 05 13:46:14 test-cluster-master-0.localdomain origin-master[19858]: E1105 13:46:14.524174 19858 cacher.go:220] unexpected ListAndWatch error: pkg/storage/cacher.go:163: Failed to list *api.ClusterPolicy: client: etcd cluster is unav
Nov 05 13:46:14 test-cluster-master-0.localdomain origin-master[19858]: E1105 13:46:14.551239 19858 cacher.go:220] unexpected ListAndWatch error: pkg/storage/cacher.go:163: Failed to list *api.ClusterPolicyBinding: client: etcd cluster
Nov 05 13:46:14 test-cluster-master-0.localdomain origin-master[19858]: E1105 13:46:14.566663 19858 cacher.go:220] unexpected ListAndWatch error: pkg/storage/cacher.go:163: Failed to list *api.PolicyBinding: client: etcd cluster is unav
Nov 05 13:46:14 test-cluster-master-0.localdomain origin-master[19858]: E1105 13:46:14.608404 19858 cacher.go:220] unexpected ListAndWatch error: pkg/storage/cacher.go:163: Failed to list *api.Policy: client: etcd cluster is unavailable
Nov 05 13:46:14 test-cluster-master-0.localdomain origin-master[19858]: E1105 13:46:14.769059 19858 cacher.go:220] unexpected ListAndWatch error: pkg/storage/cacher.go:163: Failed to list *api.OAuthAccessToken: client: etcd cluster is u
Nov 05 13:46:15 test-cluster-master-0.localdomain origin-master[19858]: E1105 13:46:15.014487 19858 etcd.go:146] etcd failure response: HTTP/0.0 0 status code 0
Nov 05 13:46:15 test-cluster-master-0.localdomain origin-master[19858]: Content-Length: 0
Some additional info after getting some help from @abutcher (thanks!), I went and did a yum install -y etcd to get etcdctl and I wound up with this...
[root@test-cluster-master-0 openshift]# journalctl -u origin-master -n 1000 | grep -i 4001 | tail -n 1
Nov 07 17:32:22 test-cluster-master-0.localdomain openshift[26897]: published {Name:openshift.local ClientURLs:[https://192.168.23.4:4001]} to cluster 31e7965d9f02f32d
[root@test-cluster-master-0 openshift]#
[root@test-cluster-master-0 openshift]#
[root@test-cluster-master-0 openshift]# etcdctl --endpoints="https://192.168.23.4:4001" --cert-file /etc/origin/master/master.etcd-client.crt --key-file /etc/origin/master/master.etcd-client.key --ca-file /etc/origin/master/ca.crt cluster-health
cluster may be unhealthy: failed to list members
Error: client: etcd cluster is unavailable or misconfigured
error #0: x509: certificate has expired or is not yet valid
What are the validity periods on the certificates and is the system clock right on this host?
openssl x509 -in /etc/origin/master/master.etcd-client.crt -noout -text | grep Not
openssl x509 -in /etc/origin/master/etcd.server.crt -noout -text | grep Not
...
Good eye -- The clock is accurate for my timezone at least. I think I spun up the cluster with the openshift-ansible playbooks around 16:00ish, I want to say.
But when I show the dates for those certs, they appear not to become valid for another 15 minutes or so.
[root@test-cluster-master-0 openshift]# date
Mon Nov 7 17:58:13 EST 2016
[root@test-cluster-master-0 openshift]# openssl x509 -in /etc/origin/master/master.etcd-client.crt -noout -text | grep Not
Not Before: Nov 7 23:14:32 2016 GMT
Not After : Nov 7 23:14:33 2018 GMT
[root@test-cluster-master-0 openshift]# openssl x509 -in /etc/origin/master/etcd.server.crt -noout -text | grep Not
Not Before: Nov 7 23:14:33 2016 GMT
Not After : Nov 7 23:14:34 2018 GMT
Also, for what it's worth, the machine where I ran the playbooks from has the system clock in UTC, unsure if that plays a role.
[stack@undercloud openshift-ansible]$ date
Mon Nov 7 23:01:10 UTC 2016
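The comparison being done by eye above (certificate Not Before versus the host clock) can be sketched as a small helper. This is only an illustration: the function name seconds_until_valid is hypothetical, and the date parsing assumes GNU date, as shipped on CentOS 7.

```shell
# seconds_until_valid NOT_BEFORE [NOW_EPOCH]
#   NOT_BEFORE: date string as printed by openssl, e.g. "Nov  7 23:14:32 2016 GMT"
#   NOW_EPOCH:  current time in seconds since the epoch (defaults to the system clock)
# Prints 0 if the certificate is already valid, otherwise the seconds remaining.
# Assumes GNU date for the -d parsing.
seconds_until_valid() {
  not_before_epoch=$(date -u -d "$1" +%s) || return 1
  now_epoch=${2:-$(date -u +%s)}
  remaining=$((not_before_epoch - now_epoch))
  if [ "$remaining" -lt 0 ]; then remaining=0; fi
  echo "$remaining"
}

# Usage against a real certificate (path taken from the thread); skipped if absent.
cert=/etc/origin/master/master.etcd-client.crt
if [ -r "$cert" ]; then
  # "openssl x509 -startdate" prints "notBefore=<date>"; strip the prefix.
  not_before=$(openssl x509 -in "$cert" -noout -startdate | cut -d= -f2)
  echo "seconds until $cert becomes valid: $(seconds_until_valid "$not_before")"
fi
```

With the values from the transcript (host clock 17:58:13 EST, which is 22:58:13 UTC, and Not Before 23:14:32 GMT), the gap is 979 seconds, a little over 16 minutes. Any non-zero result would explain the "certificate has expired or is not yet valid" error from etcdctl.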
I wound up changing my nodes so that they are on UTC, to see if it helped. They have the right time, but I'm still getting certs whose validity starts in the future, and I still wind up with the same results when trying to run the Start and enable master play. I'm going to take a look at how the certs are created in the playbooks and I'll come back with some findings.
[root@test-cluster-master-0 openshift]# date
Tue Nov 8 15:48:40 UTC 2016
[root@test-cluster-master-0 openshift]# journalctl -u origin-master -n 1000 | grep -i 4001 | tail -n 1
Nov 08 15:46:36 test-cluster-master-0.localdomain openshift[7018]: published {Name:openshift.local ClientURLs:[https://192.168.108.4:4001]} to cluster 8a936018f0e1ec33
[root@test-cluster-master-0 openshift]# etcdctl --endpoints="https://192.168.108.4:4001" --cert-file /etc/origin/master/master.etcd-client.crt --key-file /etc/origin/master/master.etcd-client.key --ca-file /etc/origin/master/ca.crt cluster-health
cluster may be unhealthy: failed to list members
Error: client: etcd cluster is unavailable or misconfigured
error #0: x509: certificate has expired or is not yet valid
[root@test-cluster-master-0 openshift]# openssl x509 -in /etc/origin/master/master.etcd-client.crt -noout -text | grep Not
Not Before: Nov 8 17:39:58 2016 GMT
Not After : Nov 8 17:39:59 2018 GMT
[root@test-cluster-master-0 openshift]# openssl x509 -in /etc/origin/master/etcd.server.crt -noout -text | grep Not
Not Before: Nov 8 17:39:58 2016 GMT
Not After : Nov 8 17:39:59 2018 GMT
Additionally, I had tried changing the timezone on the client machine from which I run the playbooks with export TZ="/usr/share/zoneinfo/America/New_York" so that it matched the hosts the playbooks are run against, in case there was some discrepancy with the certs being created locally (I haven't looked much yet). Alas, that didn't work. And I realize that setting is still in play on this run, so I'm going to re-run with both client and hosts on UTC, too, for what it's worth.
So, I've narrowed it down to it being the play named Create the master certificates if they do not already exist in this line in ./roles/openshift_ca/tasks/main.yml.
It's using the {{ openshift.common.client_binary }} adm create-master-certs as I believe is documented here.
However, I'm unsure how to alter this one to properly create certs that aren't dated out 2 hours in the future. Any input?
Edit: even weirder, I guess, is that when I run it myself I apparently get a different result...
[root@test-cluster-master-0 tmp]# mkdir foo
[root@test-cluster-master-0 tmp]# cd foo/
[root@test-cluster-master-0 foo]# oc adm create-master-certs --hostnames=foo.bar.com --master=https://192.168.111.5:8443 --public-master=https://example.com:8443 --cert-dir=/tmp/foo --overwrite=false
Command "create-master-certs" is deprecated, Use 'oc adm ca' instead.
Generated new key pair as /tmp/foo/serviceaccounts.public.key and /tmp/foo/serviceaccounts.private.key
[root@test-cluster-master-0 foo]# date
Tue Nov 8 20:00:19 UTC 2016
[root@test-cluster-master-0 foo]# openssl x509 -in ca.crt -noout -text | grep Not
Not Before: Nov 8 20:00:11 2016 GMT
Not After : Nov 7 20:00:12 2021 GMT
I chucked a few debug plays into the role's ./roles/openshift_ca/tasks/main.yml, to see what the time is... a la:
- name: Debug the time
  command: >
    date
  register: the_date
- debug: msg="the date is {{ the_date.stdout }}"
Which results in...
TASK [openshift_ca : Debug the time] *******************************************
changed: [test_cluster-master-0]
TASK [openshift_ca : debug] ****************************************************
ok: [test_cluster-master-0] => {
"msg": "the date is Tue Nov 8 23:19:55 UTC 2016"
}
TASK [openshift_ca : Create the master certificates if they do not already exist] ***
changed: [test_cluster-master-0 -> None]
However, the time on the machine is actually 2 hours behind that, and correct by my calculation...
The command below was run about 10 minutes later. As a sanity check: I'm in GMT-5 and it's 4:30 PM as I type, so 4 + 5 = 9 and 9 + 12 = 21, i.e. 21:30 UTC. So I believe the Ansible run somehow sees a time an additional two hours in the future.
[root@test-cluster-master-0 openshift]# date
Tue Nov 8 21:29:29 UTC 2016
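The arithmetic above can be double-checked mechanically. A small sketch, assuming GNU date; the helper names to_utc and minutes_between are hypothetical, and the timestamps are the ones quoted in this comment.

```shell
# to_utc "DATE TIME OFFSET": print a wall-clock time with a numeric UTC offset as UTC.
# Assumes GNU date for the -d parsing.
to_utc() {
  date -u -d "$1" '+%Y-%m-%d %H:%M'
}

# minutes_between LATER EARLIER: whole minutes from EARLIER to LATER.
minutes_between() {
  later=$(date -u -d "$1" +%s) || return 1
  earlier=$(date -u -d "$2" +%s) || return 1
  echo $(( (later - earlier) / 60 ))
}

# 4:30 PM at GMT-5, as typed in the comment.
to_utc '2016-11-08 16:30 -0500'

# Time reported by the Ansible debug task vs. the time seen on the host afterwards.
minutes_between '2016-11-08 23:19:55 UTC' '2016-11-08 21:29:29 UTC'
```

The first call confirms 21:30 UTC. The second gives 110 minutes; since the host check was run roughly 10 minutes after the play, that is consistent with the clock having been about two hours ahead when the certificates were created.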
@dougbtv Hmmm, so the date is incorrect when the CA is created but correct 10 minutes later? If we add the date check to the openshift_facts role's main tasks then we should see the output several times during install and maybe we can see if the VMs are created with an incorrect clock that eventually becomes correct.
@abutcher thanks for circling back. I'm meaning to update the ticket, because.... I just discovered in the last few hours -- this is looking a lot like user error on my part, and I somehow confused myself. And again -- just an extra thank you for helping point me in the right direction here to start looking in the right places.
TL;DR: the system time on the cloud instances I was running the openshift-ansible playbooks against was, guess what, 2 hours off in the future.
But they were apparently only 2 hours in the future if I looked at the instances before I ran the openshift-ansible playbooks against them. If I ran the playbooks, let them fail at the point where they started the origin-master service, and then ran date at the CLI, the date/time was apparently correct. That caused me a lot of looking in all the wrong places. My own fault, really. I'm wondering if there's something in the playbooks that syncs or otherwise sets the time; I didn't look.
For some more information, and for posterity: in my case I'm using the openstack method of the cluster creator in openshift-ansible. But I also went through and manually spun up machines and used the BYO inventory method, and ran into exactly the same thing. It was then that I spun up a fresh instance, opened it up, and saw that it showed a date in the future. Additionally, I'm a Triple-O user, and my undercloud had the correct time but my overcloud instances had the future time. So I redeployed.
I'm hoping that the next time my machines' clocks go into the future, they can pick up some stock prices for me, so I can do a little trading two hours early.
We do have a clock role that enables ntpd/chrony.
We should run openshift_clock in the prerequisites playbooks when we get to them, and add a forced clock sync.
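Until something like that exists in the playbooks, a manual pre-flight check along these lines could catch the problem before certificates are generated. This is a sketch only: the helper names and the 60-second default threshold are hypothetical, and the result includes ssh latency, so it is only good for spotting gross offsets such as the two hours seen in this thread.

```shell
# clock_skew A B: absolute difference in seconds between two epoch timestamps.
clock_skew() {
  diff=$(( $1 - $2 ))
  if [ "$diff" -lt 0 ]; then diff=$(( -diff )); fi
  echo "$diff"
}

# skew_within SKEW MAX: succeeds when SKEW is at most MAX seconds.
skew_within() {
  [ "$1" -le "$2" ]
}

# check_host HOST [MAX]: compare HOST's clock (read over ssh) with the local clock.
# MAX defaults to 60 seconds, an arbitrary illustrative threshold.
check_host() {
  remote=$(ssh "$1" date -u +%s) || return 2
  skew=$(clock_skew "$(date -u +%s)" "$remote")
  if skew_within "$skew" "${2:-60}"; then
    echo "$1: clock ok (skew ${skew}s)"
  else
    echo "$1: clock off by ${skew}s, sync it before installing" >&2
    return 1
  fi
}

# Example call (hostname from the first post's layout):
#   check_host master.example.com
```

On hosts that use chrony, chronyc makestep is one way to step the clock immediately once an offset is found; whether that is appropriate depends on which time service the hosts actually run.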
@xiangpengzhao , @dougbtv can you share with me how you solved the issue?
I am getting the same error with CentOS 7: Unable to start service origin-master...
Thanks!
@nennete In my case, the system time was wrong, which caused the SSL certificates used by etcd (and potentially elsewhere) not to be generated properly.
Here are my logs for:
[root@test-cluster-master-0 log]# journalctl -u origin-master.service | more
And I was able to see it was an invalid cert with:
[root@test-cluster-master-0 openshift]# etcdctl --endpoints="https://192.168.23.4:4001" --cert-file /etc/origin/master/master.etcd-client.crt --key-file /etc/origin/master/master.etcd-client.key --ca-file /etc/origin/master/ca.crt cluster-health
cluster may be unhealthy: failed to list members
Error: client: etcd cluster is unavailable or misconfigured
error #0: x509: certificate has expired or is not yet valid
In my case I was confused because the hosts I run openshift-ansible against are OpenStack instances, and the hosts that OpenStack runs on had the wrong system time.
@nennete As mentioned in my comment, I just ran the uninstall playbook, rebooted, and ran the install playbook again. Then everything was OK. To be honest, I don't know what was wrong.