Openshift-ansible: Seems like a problem of etcd. Job for origin-master.service failed because a timeout was exceeded.

Created on 1 Jul 2016 · 31 Comments · Source: openshift/openshift-ansible

When I used openshift-ansible to install, origin-master.service failed to start. After checking the journal, it seems to be a problem with etcd, which could not be found and invoked. Details are shown below.

ansible-playbook ~/openshift-ansible/playbooks/byo/config.yml

TASK [openshift_master : Start and enable master] **********
fatal: [10.134.29.158]: FAILED! => {"changed": false, "failed": true, "msg": "Job for origin-master.service failed because a timeout was exceeded. See \"systemctl status origin-master.service\" and \"journalctl -xe\" for details.\n"}

(then run) systemctl status openshift-master.service

● openshift-master.service
Loaded: not-found (Reason: No such file or directory)
Active: inactive (dead)

journalctl -xe
Jul 01 02:35:20 os-3-1-server-enb6fe2omdfu.novalocal origin-master[15396]: Content-Length: 0
Jul 01 02:35:22 os-3-1-server-enb6fe2omdfu.novalocal origin-master[15396]: E0701 02:35:22.017747 15396 etcd.go:146] etcd failure response: HTTP/0.0 0 status code 0
Jul 01 02:35:22 os-3-1-server-enb6fe2omdfu.novalocal origin-master[15396]: Content-Length: 0
Jul 01 02:35:23 os-3-1-server-enb6fe2omdfu.novalocal origin-master[15396]: E0701 02:35:23.224066 15396 etcd.go:146] etcd failure response: HTTP/0.0 0 status code 0

(several minutes later) systemctl status origin-master.service

● origin-master.service - Origin Master Service
Loaded: loaded (/usr/lib/systemd/system/origin-master.service; enabled; vendor preset: disabled)
Active: activating (start) since Fri 2016-07-01 03:01:31 UTC; 25s ago
Docs: https://github.com/openshift/origin
Main PID: 15700 (openshift)
CGroup: /system.slice/origin-master.service
└─15700 /usr/bin/openshift start master --config=/etc/origin/master/master-config.yaml --loglevel=2

Jul 01 03:01:51 os-3-1-server-enb6fe2omdfu.novalocal origin-master[15700]: E0701 03:01:51.960769 15700 etcd.go:146] etcd failure response: HTTP/0.0 0 status code 0
Jul 01 03:01:51 os-3-1-server-enb6fe2omdfu.novalocal origin-master[15700]: Content-Length: 0
Jul 01 03:01:53 os-3-1-server-enb6fe2omdfu.novalocal origin-master[15700]: E0701 03:01:53.171694 15700 etcd.go:146] etcd failure response: HTTP/0.0 0 status code 0
Jul 01 03:01:53 os-3-1-server-enb6fe2omdfu.novalocal origin-master[15700]: Content-Length: 0

Versions
openshift-ansible: latest master
centos7-1.6
openshift v1.2.0
kubernetes v1.2.0-36-g4a3f9c5
etcd 2.2.5

Additional Information
Architecture of my OpenShift cluster:

Host Name            Infrastructure Component to Install
master.example.com   Master and node
node1.example.com    Node
node2.example.com    Node

kind/bug priority/P1

jeffchanjunwei

All 31 comments

Every time I've seen this HTTP/0.0 response, a proxy was configured but could not reach the etcd host. So I wonder whether you have a proxy configured in /etc/sysconfig/origin-master? If not, all I can think of is to verify that etcd is started and look at its logs.
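A minimal sketch of that proxy check, assuming the file uses plain `NAME=value` lines. The function name `proxy_settings` is made up for illustration and is not part of openshift-ansible.

```shell
# Print any proxy-related settings found in a sysconfig-style file.
# Matches HTTP_PROXY, HTTPS_PROXY and NO_PROXY in either case, with an
# optional leading "export". Returns 0 if something matched, 1 otherwise.
proxy_settings() {
  local file="$1"
  grep -Ei '^[[:space:]]*(export[[:space:]]+)?(https?_proxy|no_proxy)=' "$file"
}

# Example (path taken from the comment above):
# proxy_settings /etc/sysconfig/origin-master || echo "no proxy configured"
```

If this prints a proxy line, the etcd host probably needs to be reachable through that proxy or listed in NO_PROXY.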

sdodson on 1 Jul 2016

@sdodson, I suspect etcd was not successfully installed on the master node. Can I install it manually? The etcd playbook was downloaded directly from the GitHub master branch, but I have no idea how to edit it.

jeffchanjunwei on 4 Jul 2016

Getting same error

TASK [openshift_master : Start and enable master api on first master] **********
skipping: [ec2-23-20-115-254.compute-1.amazonaws.com]
fatal: [ec2-54-147-48-107.compute-1.amazonaws.com]: FAILED! => {"changed": false, "failed": true, "msg": "Job for origin-master-api.service failed because a timeout was exceeded. See \"systemctl status origin-master-api.service\" and \"journalctl -xe\" for details.\n"}

OpenShift cluster:
2 Master nodes
1 Worker node
3 total nodes
1 lb
embedded etcd

Deepak-Vohra on 8 Aug 2016

@dvohra Is there anything in the journal indicating what the start failure was?

journalctl -u origin-master-api -l

abutcher on 8 Aug 2016

The etcd failure response seems to be the issue, but no [etcd] group is configured, so an embedded etcd is expected to be used.

Some command outputs for Master node on which api server failed:

systemctl status origin-master-api.service
origin-master-api.service - Atomic OpenShift Master API
   Loaded: loaded (/usr/lib/systemd/system/origin-master-api.service; enabled; vendor preset: disabled)
   Active: failed (Result: timeout) since Mon 2016-08-08 21:13:10 UTC; 9min ago
Aug 08 21:13:07 ip-10-168-150-205 atomic-openshift-master-api[22228]: E0808 2...
Aug 08 21:13:07 ip-10-168-150-205 atomic-openshift-master-api[22228]: Content...
Aug 08 21:13:08 ip-10-168-150-205 atomic-openshift-master-api[22228]: E0808 2...
Aug 08 21:13:08 ip-10-168-150-205 atomic-openshift-master-api[22228]: Content...
Aug 08 21:13:09 ip-10-168-150-205 atomic-openshift-master-api[22228]: E0808 2...
Aug 08 21:13:09 ip-10-168-150-205 atomic-openshift-master-api[22228]: Content...
Aug 08 21:13:10 ip-10-168-150-205 systemd[1]: origin-master-api.service star....
Aug 08 21:13:10 ip-10-168-150-205 systemd[1]: Failed to start Atomic OpenShi....
Aug 08 21:13:10 ip-10-168-150-205 systemd[1]: Unit origin-master-api.service....
Aug 08 21:13:10 ip-10-168-150-205 systemd[1]: origin-master-api.service failed.

journalctl -u origin-master-api -l
-- Logs begin at Mon 2016-08-08 20:05:20 UTC, end at Mon 2016-08-08 21:21:55 UTC. --
Aug 08 21:11:39 ip-10-168-150-205 systemd[1]: Starting Atomic OpenShift Master API...
Aug 08 21:11:40 ip-10-168-150-205 atomic-openshift-master-api[22228]: I0808 21:11:40.026197   22228 start_api.go:102] Using a listen address override "0.0.0.0:8443"
Aug 08 21:11:40 ip-10-168-150-205 atomic-openshift-master-api[22228]: W0808 21:11:40.028055   22228 start_master.go:270] assetConfig.loggingPublicURL: Invalid value:
Aug 08 21:11:40 ip-10-168-150-205 atomic-openshift-master-api[22228]: W0808 21:11:40.028095   22228 start_master.go:270] assetConfig.metricsPublicURL: Invalid value:
Aug 08 21:11:40 ip-10-168-150-205 atomic-openshift-master-api[22228]: I0808 21:11:40.041914   22228 plugins.go:71] No cloud provider specified.
Aug 08 21:11:40 ip-10-168-150-205 atomic-openshift-master-api[22228]: I0808 21:11:40.043163   22228 genericapiserver.go:81] Adding storage destination for group
Aug 08 21:11:40 ip-10-168-150-205 atomic-openshift-master-api[22228]: I0808 21:11:40.043201   22228 genericapiserver.go:81] Adding storage destination for group exte
Aug 08 21:11:40 ip-10-168-150-205 atomic-openshift-master-api[22228]: I0808 21:11:40.043231   22228 start_master.go:383] Starting master on 0.0.0.0:8443 (v1.2.1)
Aug 08 21:11:40 ip-10-168-150-205 atomic-openshift-master-api[22228]: I0808 21:11:40.043243   22228 start_master.go:384] Public master address is https://ec2-54-81-1
Aug 08 21:11:40 ip-10-168-150-205 atomic-openshift-master-api[22228]: I0808 21:11:40.043262   22228 start_master.go:388] Using images from "openshift/origin-<compone
Aug 08 21:11:40 ip-10-168-150-205 atomic-openshift-master-api[22228]: E0808 21:11:40.045935   22228 etcd.go:146] etcd failure response: HTTP/0.0 0 status code 0
Aug 08 21:11:40 ip-10-168-150-205 atomic-openshift-master-api[22228]: Content-Length: 0
Aug 08 21:11:40 ip-10-168-150-205 atomic-openshift-master-api[22228]: E0808 21:11:40.071760   22228 etcd.go:146] etcd failure response: HTTP/0.0 0 status code 0
Aug 08 21:11:40 ip-10-168-150-205 atomic-openshift-master-api[22228]: Content-Length: 0
Aug 08 21:11:40 ip-10-168-150-205 atomic-openshift-master-api[22228]: E0808 21:11:40.122593   22228 etcd.go:146] etcd failure response: HTTP/0.0 0 status code 0
Aug 08 21:11:40 ip-10-168-150-205 atomic-openshift-master-api[22228]: Content-Length: 0
Aug 08 21:11:40 ip-10-168-150-205 atomic-openshift-master-api[22228]: E0808 21:11:40.223429   22228 etcd.go:146] etcd failure response: HTTP/0.0 0 status code 0
Aug 08 21:11:40 ip-10-168-150-205 atomic-openshift-master-api[22228]: Content-Length: 0
Aug 08 21:11:40 ip-10-168-150-205 atomic-openshift-master-api[22228]: E0808 21:11:40.424236   22228 etcd.go:146] etcd failure response: HTTP/0.0 0 status code 0
Aug 08 21:11:40 ip-10-168-150-205 atomic-openshift-master-api[22228]: Content-Length: 0
Aug 08 21:11:40 ip-10-168-150-205 atomic-openshift-master-api[22228]: E0808 21:11:40.825071   22228 etcd.go:146] etcd failure response: HTTP/0.0 0 status code 0
Aug 08 21:11:40 ip-10-168-150-205 atomic-openshift-master-api[22228]: Content-Length: 0
Aug 08 21:11:41 ip-10-168-150-205 atomic-openshift-master-api[22228]: E0808 21:11:41.626177   22228 etcd.go:146] etcd failure response: HTTP/0.0 0 status code 0
Aug 08 21:11:41 ip-10-168-150-205 atomic-openshift-master-api[22228]: Content-Length: 0
Aug 08 21:11:42 ip-10-168-150-205 atomic-openshift-master-api[22228]: E0808 21:11:42.627226   22228 etcd.go:146] etcd failure response: HTTP/0.0 0 status code 0
Aug 08 21:11:42 ip-10-168-150-205 atomic-openshift-master-api[22228]: Content-Length: 0
Aug 08 21:11:43 ip-10-168-150-205 atomic-openshift-master-api[22228]: E0808 21:11:43.628290   22228 etcd.go:146] etcd failure response: HTTP/0.0 0 status code 0
Aug 08 21:11:43 ip-10-168-150-205 atomic-openshift-master-api[22228]: Content-Length: 0
Aug 08 21:11:44 ip-10-168-150-205 atomic-openshift-master-api[22228]: E0808 21:11:44.629247   22228 etcd.go:146] etcd failure response: HTTP/0.0 0 status code 0
Aug 08 21:11:44 ip-10-168-150-205 atomic-openshift-master-api[22228]: Content-Length: 0
Aug 08 21:11:45 ip-10-168-150-205 atomic-openshift-master-api[22228]: E0808 21:11:45.630314   22228 etcd.go:146] etcd failure response: HTTP/0.0 0 status code 0
Aug 08 21:11:45 ip-10-168-150-205 atomic-openshift-master-api[22228]: Content-Length: 0

Deepak-Vohra on 8 Aug 2016

@dvohra Embedded etcd will only be used with the single master service origin-master.

abutcher on 8 Aug 2016

If [etcd] is used another issue is generated
https://github.com/openshift/origin/issues/10259

Deepak-Vohra on 8 Aug 2016

@dvohra Yep, I just found that one and submitted a PR for what I think is the problem.

abutcher on 8 Aug 2016

Thanks for submitting a PR for the other issue.
HA Master can't be used till the other issue is fixed?

Deepak-Vohra on 8 Aug 2016

@dvohra Right, without the etcd certificates we can't configure/start the etcd service.

abutcher on 8 Aug 2016

@dvohra Looks like we just merged it. Please give it a try when you can.

abutcher on 8 Aug 2016

Is a message such as the following normal, or could restarting the API server during installation be an issue?
FAILED - RETRYING: HANDLER: openshift_master: Verify API Server (80 tries left)

Deepak-Vohra on 9 Aug 2016

While the earlier issue about [etcd] is fixed, the installation still does not complete. I will open another issue.

Deepak-Vohra on 9 Aug 2016

Closing.

abutcher on 9 Aug 2016

Hey,
Installing on CentOS 7 and still having this problem. I am running everything on one master. Any pointers?

jmwenda on 1 Oct 2016

@jmwenda Hey, I am still having this problem on CentOS 7. Have you found a solution?

wsszh on 31 Oct 2016

@jmwenda @wsszh I also hit the same problem in my CentOS 7 environment. My version is openshift-ansible-openshift-ansible-3.2.14-1.tar.gz.
I ran ansible-playbook ~/openshift-ansible/playbooks/adhoc/uninstall.yml to uninstall it, rebooted, and then ran ansible-playbook ~/openshift-ansible/playbooks/byo/config.yml to install again. It works!

xiangpengzhao on 5 Nov 2016

I'm running into this as well on CentOS 7.2.

[stack@undercloud openshift-ansible]$ ssh [email protected] 'cat /etc/redhat-release'
CentOS Linux release 7.2.1511 (Core)

And current head as of now. And here's how I'm running it:

bin/cluster create   -o image_name=centos-custom   -o external_net=ext-net   -o floating_ip_pool=ext-net   -o net_cidr=40.0.0.0/24   openstack test_cluster

Which results in a:

TASK [openshift_master : Start and enable master] ******************************
fatal: [test_cluster-master-0]: FAILED! => {"changed": false, "failed": true, "msg": "Job for origin-master.service failed because a timeout was exceeded. See \"systemctl status origin-master.service\" and \"journalctl -xe\" for details.\n"}

The journal gives me (repeatedly)

Nov 05 13:46:14 test-cluster-master-0.localdomain origin-master[19858]: E1105 13:46:14.029121   19858 reflector.go:203] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/limitranger/admission.go:154: Failed to list
Nov 05 13:46:14 test-cluster-master-0.localdomain origin-master[19858]: E1105 13:46:14.029212   19858 reflector.go:214] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/resourcequota/resource_access.go:83: Failed 
Nov 05 13:46:14 test-cluster-master-0.localdomain origin-master[19858]: E1105 13:46:14.029311   19858 reflector.go:214] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/serviceaccount/admission.go:119: Failed to l
Nov 05 13:46:14 test-cluster-master-0.localdomain origin-master[19858]: E1105 13:46:14.029413   19858 reflector.go:203] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/namespace/lifecycle/admission.go:141: Failed
Nov 05 13:46:14 test-cluster-master-0.localdomain origin-master[19858]: E1105 13:46:14.029506   19858 reflector.go:203] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/limitranger/admission.go:154: Failed to list
Nov 05 13:46:14 test-cluster-master-0.localdomain origin-master[19858]: E1105 13:46:14.029712   19858 reflector.go:214] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/serviceaccount/admission.go:103: Failed to l
Nov 05 13:46:14 test-cluster-master-0.localdomain origin-master[19858]: E1105 13:46:14.375788   19858 cacher.go:220] unexpected ListAndWatch error: pkg/storage/cacher.go:163: Failed to list *api.User: client: etcd cluster is unavailable o
Nov 05 13:46:14 test-cluster-master-0.localdomain origin-master[19858]: E1105 13:46:14.481789   19858 cacher.go:220] unexpected ListAndWatch error: pkg/storage/cacher.go:163: Failed to list *api.Group: client: etcd cluster is unavailable 
Nov 05 13:46:14 test-cluster-master-0.localdomain origin-master[19858]: E1105 13:46:14.524174   19858 cacher.go:220] unexpected ListAndWatch error: pkg/storage/cacher.go:163: Failed to list *api.ClusterPolicy: client: etcd cluster is unav
Nov 05 13:46:14 test-cluster-master-0.localdomain origin-master[19858]: E1105 13:46:14.551239   19858 cacher.go:220] unexpected ListAndWatch error: pkg/storage/cacher.go:163: Failed to list *api.ClusterPolicyBinding: client: etcd cluster 
Nov 05 13:46:14 test-cluster-master-0.localdomain origin-master[19858]: E1105 13:46:14.566663   19858 cacher.go:220] unexpected ListAndWatch error: pkg/storage/cacher.go:163: Failed to list *api.PolicyBinding: client: etcd cluster is unav
Nov 05 13:46:14 test-cluster-master-0.localdomain origin-master[19858]: E1105 13:46:14.608404   19858 cacher.go:220] unexpected ListAndWatch error: pkg/storage/cacher.go:163: Failed to list *api.Policy: client: etcd cluster is unavailable
Nov 05 13:46:14 test-cluster-master-0.localdomain origin-master[19858]: E1105 13:46:14.769059   19858 cacher.go:220] unexpected ListAndWatch error: pkg/storage/cacher.go:163: Failed to list *api.OAuthAccessToken: client: etcd cluster is u
Nov 05 13:46:15 test-cluster-master-0.localdomain origin-master[19858]: E1105 13:46:15.014487   19858 etcd.go:146] etcd failure response: HTTP/0.0 0 status code 0
Nov 05 13:46:15 test-cluster-master-0.localdomain origin-master[19858]: Content-Length: 0

dougbtv on 5 Nov 2016

Some additional info after getting some help from @abutcher (thanks!), I went and did a yum install -y etcd to get etcdctl and I wound up with this...

[root@test-cluster-master-0 openshift]# journalctl -u origin-master -n 1000 | grep -i 4001 | tail -n 1
Nov 07 17:32:22 test-cluster-master-0.localdomain openshift[26897]: published {Name:openshift.local ClientURLs:[https://192.168.23.4:4001]} to cluster 31e7965d9f02f32d
[root@test-cluster-master-0 openshift]# 
[root@test-cluster-master-0 openshift]# 
[root@test-cluster-master-0 openshift]# etcdctl --endpoints="https://192.168.23.4:4001" --cert-file /etc/origin/master/master.etcd-client.crt --key-file /etc/origin/master/master.etcd-client.key --ca-file /etc/origin/master/ca.crt cluster-health
cluster may be unhealthy: failed to list members
Error:  client: etcd cluster is unavailable or misconfigured
error #0: x509: certificate has expired or is not yet valid

dougbtv on 7 Nov 2016

What are the validity periods on the certificates and is the system clock right on this host?

openssl x509 -in /etc/origin/master/master.etcd-client.crt -noout -text | grep Not
openssl x509 -in /etc/origin/master/etcd.server.crt -noout -text | grep Not
...
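A rough way to automate that comparison, assuming GNU date (as on CentOS 7) for parsing the date string that openssl prints. The helper name `not_before_in_future` is hypothetical.

```shell
# Return 0 (true) if a certificate's notBefore date is later than "now".
#   $1: notBefore string as printed by openssl, e.g. "Nov  7 23:14:32 2016 GMT"
#   $2: optional reference time in epoch seconds (defaults to the current time)
# Assumes GNU date, whose -d option can parse the openssl date format.
not_before_in_future() {
  local not_before="$1"
  local now_epoch="${2:-$(date -u +%s)}"
  local start_epoch
  start_epoch=$(date -u -d "$not_before" +%s) || return 2
  [ "$start_epoch" -gt "$now_epoch" ]
}

# Usage against one of the certificates named above:
# nb=$(openssl x509 -in /etc/origin/master/master.etcd-client.crt -noout -startdate | cut -d= -f2)
# not_before_in_future "$nb" && echo "certificate is not yet valid"
```

A "not yet valid" result with a correct clock on this host suggests the certificate was generated while some clock was running ahead.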

abutcher on 7 Nov 2016

Good eye -- The clock is accurate for my timezone at least. I think I spun up the cluster with the openshift-ansible playbooks around 16:00ish, I want to say.

But, when I show the dates for those certs.... They appear not to become valid for another 15 minutes.

[root@test-cluster-master-0 openshift]# date
Mon Nov  7 17:58:13 EST 2016
[root@test-cluster-master-0 openshift]# openssl x509 -in /etc/origin/master/master.etcd-client.crt -noout -text | grep Not
            Not Before: Nov  7 23:14:32 2016 GMT
            Not After : Nov  7 23:14:33 2018 GMT
[root@test-cluster-master-0 openshift]# openssl x509 -in /etc/origin/master/etcd.server.crt -noout -text | grep Not
            Not Before: Nov  7 23:14:33 2016 GMT
            Not After : Nov  7 23:14:34 2018 GMT

Also, for what it's worth, the machine where I ran the playbooks from has the system clock in UTC, unsure if that plays a role.

[stack@undercloud openshift-ansible]$ date
Mon Nov  7 23:01:10 UTC 2016

dougbtv on 8 Nov 2016

I wound up changing my nodes so that they are UTC, to see if it helped, and they have the right time, but, still getting a cert created out into the future. And, still winding up with the same results when trying to run the Start and enable master play. I'm going to take a look at how the certs are created in the playbooks and I'll come back with some findings.

[root@test-cluster-master-0 openshift]# date
Tue Nov  8 15:48:40 UTC 2016

[root@test-cluster-master-0 openshift]# journalctl -u origin-master -n 1000 | grep -i 4001 | tail -n 1
Nov 08 15:46:36 test-cluster-master-0.localdomain openshift[7018]: published {Name:openshift.local ClientURLs:[https://192.168.108.4:4001]} to cluster 8a936018f0e1ec33

[root@test-cluster-master-0 openshift]# etcdctl --endpoints="https://192.168.108.4:4001" --cert-file /etc/origin/master/master.etcd-client.crt --key-file /etc/origin/master/master.etcd-client.key --ca-file /etc/origin/master/ca.crt cluster-health
cluster may be unhealthy: failed to list members
Error:  client: etcd cluster is unavailable or misconfigured
error #0: x509: certificate has expired or is not yet valid

[root@test-cluster-master-0 openshift]# openssl x509 -in /etc/origin/master/master.etcd-client.crt -noout -text | grep Not
            Not Before: Nov  8 17:39:58 2016 GMT
            Not After : Nov  8 17:39:59 2018 GMT
[root@test-cluster-master-0 openshift]# openssl x509 -in /etc/origin/master/etcd.server.crt -noout -text | grep Not
            Not Before: Nov  8 17:39:58 2016 GMT
            Not After : Nov  8 17:39:59 2018 GMT

Additionally, I had tried changing the timezone on the client machine from which I run the playbooks with export TZ="/usr/share/zoneinfo/America/New_York" so it matched the hosts where the playbooks are run against, in case there was some discrepancy with the certs being created locally (I haven't looked much yet). Alas, that didn't work. And I realize that's still in play on this run, so, I'm going to re-run with both client/host on UTC, too, for what it's worth.

dougbtv on 8 Nov 2016

So, I've narrowed it down to the play named "Create the master certificates if they do not already exist" in ./roles/openshift_ca/tasks/main.yml.

It's using the {{ openshift.common.client_binary }} adm create-master-certs as I believe is documented here.

However, I'm unsure how to alter this one to properly create certs that aren't dated out 2 hours in the future. Any input?

Edit: even weirder, I guess, is that when I run it myself I get a different result, apparently...

[root@test-cluster-master-0 tmp]# mkdir foo
[root@test-cluster-master-0 tmp]# cd foo/
[root@test-cluster-master-0 foo]# oc adm create-master-certs --hostnames=foo.bar.com --master=https://192.168.111.5:8443 --public-master=https://example.com:8443 --cert-dir=/tmp/foo --overwrite=false
Command "create-master-certs" is deprecated, Use 'oc adm ca' instead.
Generated new key pair as /tmp/foo/serviceaccounts.public.key and /tmp/foo/serviceaccounts.private.key
[root@test-cluster-master-0 foo]# date
Tue Nov  8 20:00:19 UTC 2016
[root@test-cluster-master-0 foo]# openssl x509 -in ca.crt -noout -text | grep Not
            Not Before: Nov  8 20:00:11 2016 GMT
            Not After : Nov  7 20:00:12 2021 GMT

dougbtv on 8 Nov 2016

I chucked a few debug plays into the role's ./roles/openshift_ca/tasks/main.yml, to see what the time is... a la:

- name: Debug the time
  command: > 
    date
  register: the_date

- debug: msg="the date is {{ the_date.stdout }}"

Which results in...

TASK [openshift_ca : Debug the time] *******************************************
changed: [test_cluster-master-0]

TASK [openshift_ca : debug] ****************************************************
ok: [test_cluster-master-0] => {
    "msg": "the date is Tue Nov  8 23:19:55 UTC 2016"
}

TASK [openshift_ca : Create the master certificates if they do not already exist] ***
changed: [test_cluster-master-0 -> None]

However, the time on the machine is actually 2 hours behind that, and correct by my calculation...

The command below was run 10 minutes later. As a sanity check: I'm in GMT-5 and it's 4:30 PM as I type, so 4 + 5 = 9 and 9 + 12 = 21, i.e. 21:30 UTC. So, I believe the Ansible run somehow has the time an additional two hours in the future.

[root@test-cluster-master-0 openshift]# date
Tue Nov  8 21:29:29 UTC 2016
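To put a number on that gap, here is a small sketch that subtracts the two timestamps quoted above. It assumes GNU date, and the helper name `seconds_ahead` is invented for this example.

```shell
# Print how many seconds the first timestamp is ahead of the second.
# Both arguments use the default `date` output format; GNU date is assumed.
seconds_ahead() {
  local a b
  a=$(date -u -d "$1" +%s) || return 2
  b=$(date -u -d "$2" +%s) || return 2
  echo $(( a - b ))
}

# Time reported inside the Ansible run vs. time on the host about 10 minutes later.
seconds_ahead "Tue Nov  8 23:19:55 UTC 2016" "Tue Nov  8 21:29:29 UTC 2016"
```

This prints 6626, i.e. 1 h 50 min 26 s. Adding back the roughly 10 minutes that passed between the two readings gives an offset of about two hours.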

dougbtv on 8 Nov 2016

@dougbtv Hmmm, so the date is incorrect when the CA is created but correct 10 minutes later? If we add the date check to the openshift_facts role's main tasks then we should see the output several times during install and maybe we can see if the VMs are created with an incorrect clock that eventually becomes correct.

abutcher on 9 Nov 2016

@abutcher thanks for circling back. I'm meaning to update the ticket, because.... I just discovered in the last few hours -- this is looking a lot like user error on my part, and I somehow confused myself. And again -- just an extra thank you for helping point me in the right direction here to start looking in the right places.

TL;DR is -- the system time on the cloud instances I was running the openshift-ansible playbooks against was, guess what -- 2 hours off in the future.

But, they were only apparently 2 hours off into the future if I looked at the instances before I ran the openshift-ansible playbooks against them. If I ran the playbooks and let them fail at the point where it started origin-master service, and then ran date at the CLI and looked at it, the date/time was apparently correct. That caused me... a lot of looking in all the wrong places. My own fault, really. I'm wondering if there's something in the playbooks that sync or otherwise set the time, I didn't look.

For some more information and in memorial.... In my case I'm using the openstack method of the cluster creator in openshift-ansible. But, I went through and manually spun up machines and used the BYO inventory method, and... ran into exactly the same thing. It was at that time that I just spun up an instance, if I opened it up and checked it out... It would show the date in the future. Additionally, I'm a Triple-O user, and my undercloud had the correct time, but, my overcloud instances had the future time. So I redeployed.

I'm hoping next time my machines' time goes into the future, they can like pick up some stock prices for me, so I can do a little trading two hours early.

dougbtv on 9 Nov 2016

We do have a clock role that enables ntpd/chrony.

abutcher on 9 Nov 2016

We should do openshift_clock in pre-requisites playbooks when we get to them and add a forced clock sync.

sdodson on 9 Nov 2016

@xiangpengzhao , @dougbtv can you share with me how you solved the issue?

I am getting same error with Centos7: Unable to start service origin-master...

Thanks!

antonioberben on 22 Nov 2016

@nennete in my case, the system time was wrong, which caused the SSL certificates used by etcd (and potentially elsewhere) not to be generated properly.

Here are my logs for:

[root@test-cluster-master-0 log]# journalctl -u origin-master.service | more

And I was able to see it was an invalid cert with:

[root@test-cluster-master-0 openshift]# etcdctl --endpoints="https://192.168.23.4:4001" --cert-file /etc/origin/master/master.etcd-client.crt --key-file /etc/origin/master/master.etcd-client.key --ca-file /etc/origin/master/ca.crt cluster-health
cluster may be unhealthy: failed to list members
Error:  client: etcd cluster is unavailable or misconfigured
error #0: x509: certificate has expired or is not yet valid

In my case I was confused because the hosts I run openshift-ansible against are OpenStack instances, and the hosts that OpenStack runs on had the wrong system time.

dougbtv on 23 Nov 2016

@nennete As mentioned in my earlier comment, I just ran the uninstall script, rebooted, and ran the install script again. Then everything was OK. To be honest, I don't know what was wrong...

xiangpengzhao on 23 Nov 2016
