I tried to install OpenShift Origin using the openshift-ansible playbooks from the release-3.6 branch. The installation failed at the step that enables and starts origin-master, with the following error:
Failure summary:
1. Hosts: master.example.com
Play: Configure masters
Task: restart master
Message: Unable to restart service origin-master: Job for origin-master.service failed because a timeout was exceeded. See "systemctl status origin-master.service" and "journalctl -xe" for details.
I checked the service log of origin-master and found many TLS handshake timeout errors.
I then checked the etcd logs using journalctl -u etcd -lf and got:
-- Logs begin at Sun 2017-10-15 05:53:50 CST. --
Nov 14 14:52:10 master.example.com etcd[31068]: 785ad3e4f1b9ce8b became leader at term 2
Nov 14 14:52:10 master.example.com etcd[31068]: raft.node: 785ad3e4f1b9ce8b elected leader 785ad3e4f1b9ce8b at term 2
Nov 14 14:52:10 master.example.com etcd[31068]: setting up the initial cluster version to 3.2
Nov 14 14:52:10 master.example.com etcd[31068]: published {Name:master.example.com ClientURLs:[https://192.168.123.155:2379]} to cluster 74619f9d53805edf
Nov 14 14:52:10 master.example.com etcd[31068]: ready to serve client requests
Nov 14 14:52:10 master.example.com systemd[1]: Started Etcd Server.
Nov 14 14:52:10 master.example.com etcd[31068]: set the initial cluster version to 3.2
Nov 14 14:52:10 master.example.com etcd[31068]: enabled capabilities for version 3.2
Nov 14 14:52:10 master.example.com etcd[31068]: serving client requests on 192.168.123.155:2379
Nov 14 14:52:10 master.example.com etcd[31068]: Failed to dial 192.168.123.155:2379: connection error: desc = "transport: remote error: tls: bad certificate"; please retry.
$ ansible --version
ansible 2.4.0.0
config file = /etc/ansible/ansible.cfg
configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python2.7/site-packages/ansible
executable location = /usr/bin/ansible
python version = 2.7.5 (default, Aug 4 2017, 00:39:18) [GCC 4.8.5 20150623 (Red Hat 4.8.5-16)]
$ git describe
openshift-ansible-3.6.173.0.75-1
$ ansible-playbook /usr/share/openshift-ansible/playbooks/byo/config.yml
#
# ansible hosts file
#
# Create an OSEv3 group that contains the masters and nodes groups
[OSEv3:children]
masters
nodes
etcd
lb
# Set variables common for all OSEv3 hosts
[OSEv3:vars]
# SSH user, this user should allow ssh based auth without requiring a password
ansible_ssh_user=root
# If ansible_ssh_user is not root, ansible_become must be set to true
#ansible_become=true
openshift_deployment_type=origin
# Specify the generic release of OpenShift to install. This is used mainly just during installation, after which we
# rely on the version running on the first master. Works best for containerized installs where we can usually
# use this to lookup the latest exact version of the container images, which is the tag actually used to configure
# the cluster. For RPM installations we just verify the version detected in your configured repos matches this
# release.
openshift_release=v3.6.1
# Specify an exact container image tag to install or configure.
# WARNING: This value will be used for all hosts in containerized environments, even those that have another version installed.
# This could potentially trigger an upgrade and downtime, so be careful with modifying this value after the cluster is set up.
openshift_image_tag=v3.6.1
# Specify an exact rpm version to install or configure.
# WARNING: This value will be used for all hosts in RPM based environments, even those that have another version installed.
# This could potentially trigger an upgrade and downtime, so be careful with modifying this value after the cluster is set up.
openshift_pkg_version=-3.6.1
# uncomment the following to enable htpasswd authentication; defaults to DenyAllPasswordIdentityProvider
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider', 'filename': '/etc/origin/master/htpasswd'}]
#openshift_repos_enable_testing=true
openshift_disable_check=disk_availability,docker_storage
#docker_selinux_enabled=false
#openshift_docker_options=" --log-driver=journald --storage-driver=overlay "
# Alternate image format string, useful if you've got your own registry mirror
# Configure this setting just on node or master
#oreg_url_master=example.com/openshift3/ose-${component}:${version}
#oreg_url_node=example.com/openshift3/ose-${component}:${version}
# For setting the configuration globally
oreg_url=registry.example.com:30000/openshift/origin-${component}:${version}
# If oreg_url points to a registry other than registry.access.redhat.com we can
# modify image streams to point at that registry by setting the following to true
openshift_examples_modify_imagestreams=true
openshift_docker_additional_registries=registry.example.com:30000
openshift_docker_insecure_registries=registry.example.com:30000
openshift_hosted_manage_registry=false
# OpenShift Router Options
# Router selector (optional)
# Router will only be created if nodes matching this label are present.
# Default value: 'region=infra'
openshift_hosted_router_selector='region=infra,router=true'
# default subdomain to use for exposed routes
openshift_master_default_subdomain=app.example.com
# host group for masters
[masters]
master.example.com
# host group for etcd
[etcd]
master.example.com
# Load balancers
[lb]
lb.example.com
# host group for nodes, includes region info
[nodes]
master.example.com openshift_schedulable=true openshift_node_labels="{'region': 'infra', 'router': 'true'}"
node01.example.com openshift_schedulable=true openshift_node_labels="{'region': 'infra', 'router': 'true'}"
node02.example.com openshift_schedulable=true openshift_node_labels="{'region': 'infra', 'router': 'true'}"
More logs from journalctl as below:
[root@master ~]# journalctl -xe -u etcd -l
Nov 14 16:35:13 master.example.com etcd[92962]: peerTLS: cert = /etc/etcd/peer.crt, key = /etc/etcd/peer.key, ca = , trusted-ca = /etc/etcd/ca.crt, cl
Nov 14 16:35:13 master.example.com etcd[92962]: listening for peers on https://192.168.123.155:2380
Nov 14 16:35:13 master.example.com etcd[92962]: listening for client requests on 192.168.123.155:2379
Nov 14 16:35:13 master.example.com etcd[92962]: name = master.example.com
Nov 14 16:35:13 master.example.com etcd[92962]: data dir = /var/lib/etcd/
Nov 14 16:35:13 master.example.com etcd[92962]: member dir = /var/lib/etcd/member
Nov 14 16:35:13 master.example.com etcd[92962]: heartbeat = 500ms
Nov 14 16:35:13 master.example.com etcd[92962]: election = 2500ms
Nov 14 16:35:13 master.example.com etcd[92962]: snapshot count = 100000
Nov 14 16:35:13 master.example.com etcd[92962]: advertise client URLs = https://192.168.123.155:2379
Nov 14 16:35:13 master.example.com etcd[92962]: initial advertise peer URLs = https://192.168.123.155:2380
Nov 14 16:35:13 master.example.com etcd[92962]: initial cluster = master.example.com=https://192.168.123.155:2380
Nov 14 16:35:13 master.example.com etcd[92962]: starting member 785ad3e4f1b9ce8b in cluster 74619f9d53805edf
Nov 14 16:35:13 master.example.com etcd[92962]: 785ad3e4f1b9ce8b became follower at term 0
Nov 14 16:35:13 master.example.com etcd[92962]: newRaft 785ad3e4f1b9ce8b [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
Nov 14 16:35:13 master.example.com etcd[92962]: 785ad3e4f1b9ce8b became follower at term 1
Nov 14 16:35:13 master.example.com etcd[92962]: simple token is not cryptographically signed
Nov 14 16:35:13 master.example.com etcd[92962]: starting server... [version: 3.2.7, cluster version: to_be_decided]
Nov 14 16:35:13 master.example.com etcd[92962]: ClientTLS: cert = /etc/etcd/server.crt, key = /etc/etcd/server.key, ca = , trusted-ca = /etc/etcd/ca.c
Nov 14 16:35:13 master.example.com etcd[92962]: added member 785ad3e4f1b9ce8b [https://192.168.123.155:2380] to cluster 74619f9d53805edf
Nov 14 16:35:15 master.example.com etcd[92962]: 785ad3e4f1b9ce8b is starting a new election at term 1
Nov 14 16:35:15 master.example.com etcd[92962]: 785ad3e4f1b9ce8b became candidate at term 2
Nov 14 16:35:15 master.example.com etcd[92962]: 785ad3e4f1b9ce8b received MsgVoteResp from 785ad3e4f1b9ce8b at term 2
Nov 14 16:35:15 master.example.com etcd[92962]: 785ad3e4f1b9ce8b became leader at term 2
Nov 14 16:35:15 master.example.com etcd[92962]: raft.node: 785ad3e4f1b9ce8b elected leader 785ad3e4f1b9ce8b at term 2
Nov 14 16:35:15 master.example.com etcd[92962]: setting up the initial cluster version to 3.2
Nov 14 16:35:15 master.example.com etcd[92962]: set the initial cluster version to 3.2
Nov 14 16:35:15 master.example.com etcd[92962]: enabled capabilities for version 3.2
Nov 14 16:35:15 master.example.com etcd[92962]: published {Name:master.example.com ClientURLs:[https://192.168.123.155:2379]} to cluster 74619f
Nov 14 16:35:15 master.example.com etcd[92962]: ready to serve client requests
Nov 14 16:35:15 master.example.com etcd[92962]: serving client requests on 192.168.123.155:2379
Nov 14 16:35:15 master.example.com systemd[1]: Started Etcd Server.
-- Subject: Unit etcd.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit etcd.service has finished starting up.
--
-- The start-up result is done.
Nov 14 16:35:15 master.example.com etcd[92962]: Failed to dial 192.168.123.155:2379: connection error: desc = "transport: remote error: tls: bad certi
I am having the same issue. etcd is unable to start and the OpenShift installation is halted with the message:
Unable to restart service origin-master-api: Job for origin-master-api.service failed because the control process exited with error code. See "systemctl status origin-master-api.service" and "journalctl -xe" for details.
Log outputs (hostnames and IPs have been redacted):
$ systemctl status -l origin-master-api.service
...
Nov 27 11:37:04 master.example.com openshift[16145]: grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp [AAAA::BBBB::CCCC:DDDD:EEEE:FFFF]:2379: getsockopt: connection refused"; Reconnecting to {master.example.com:2379 <nil>}
...
$ journalctl -u etcd -lf
...
Nov 27 08:25:18 master.example.com systemd[1]: Started Etcd Server.
Nov 27 08:25:18 master.example.com etcd[3090]: ready to serve client requests
Nov 27 08:25:18 master.example.com etcd[3090]: serving client requests on 192.168.123.123:2379
Nov 27 08:25:19 master.example.com etcd[3090]: Failed to dial 192.168.123.123:2379: connection error: desc = "transport: remote error: tls: bad certificate"; please retry.
...
I think that the common denominator among the people who are encountering this issue is that they all have a single etcd node instead of a cluster.
I have tried with the latest master branch and also with the release-3.7 branch and got the same error. I have also tried redeploying the certificates.
The certificates seem valid to me and include all the relevant IP addresses and hosts. I will post them just in case it helps.
I will try to check whether the certificate logic differs between a single-host and a multi-host setup, but I am not very experienced with Ansible.
This is my inventory file for reference:
[OSEv3:children]
masters
nodes
[OSEv3:vars]
ansible_ssh_user=ansible
ansible_become=true
dynamic_volumes_check=False
openshift_metrics_cassandra_storage_type=dynamic
openshift_logging_storage_kind=dynamic
openshift_deployment_type=origin
openshift_disable_check=memory_availability,disk_availability
openshift_master_identity_providers=[{'name': 'google', 'challenge': 'false', 'login': 'true', 'mappingMethod': 'claim', 'kind': 'GoogleIdentityProvider', 'clientID': 'xxxxxxxxx', 'clientSecret': 'xxxxxxx', 'hostedDomain': 'example.com'}]
openshift_master_named_certificates=[{"certfile": "some.cert", "keyfile": "some.key", "names": ["external.example.com", "*.external.example.com"]}]
openshift_master_overwrite_named_certificates=true
openshift_hosted_registry_storage_kind=object
openshift_hosted_registry_storage_provider=s3
openshift_hosted_registry_storage_s3_accesskey=xxxxxxxxxx
openshift_hosted_registry_storage_s3_secretkey= xxxxxxxxxx
openshift_hosted_registry_storage_s3_bucket= xxxxxxxxxx
openshift_hosted_registry_storage_s3_region=us-west-1
[masters]
master.example.com openshift_public_hostname=external.example.com openshift_ip=192.168.111.444
[etcd]
master.example.com openshift_public_hostname=external.example.com openshift_ip=192.168.111.444
[nodes]
master.example.com openshift_public_hostname=external.example.com openshift_node_labels="{'region': 'infra', 'zone': 'west'}" openshift_ip=192.168.111.444
node1.example.com openshift_node_labels="{'region': 'primary', 'zone': 'west'}" openshift_ip=192.168.111.333
node2.example.com openshift_node_labels="{'region': 'primary', 'zone': 'west'}" openshift_ip=192.168.111.222
$ openssl x509 -in /etc/origin/master/etcd.server.crt -text -noout
Certificate:
Data:
Version: 3 (0x2)
Serial Number: 6 (0x6)
Signature Algorithm: sha256WithRSAEncryption
Issuer: CN=openshift-signer@1511772597
Validity
Not Before: Nov 27 08:49:59 2017 GMT
Not After : Nov 27 08:50:00 2019 GMT
Subject: CN=172.30.0.1
Subject Public Key Info:
Public Key Algorithm: rsaEncryption
Public-Key: (2048 bit)
Modulus:
00:e0:61:3b:c0:41:ed:0e:58:00:f4:99:ef:20:35:
5d:c3:f8:d7:81:70:9d:c0:46:0f:6e:f0:9e:f9:35:
92:67:b6:b5:93:7f:8e:8b:fd:19:43:fd:74:a8:85:
fb:96:4e:6f:5c:ba:b1:47:0e:88:39:17:f4:77:2f:
2b:98:57:c0:fa:cd:94:52:33:ec:d5:da:c1:6a:e7:
ed:54:6f:65:84:46:33:8e:67:b9:29:e4:63:b9:c2:
b1:7d:37:ce:4b:fb:ee:df:77:b1:f7:61:ba:4f:cb:
29:07:95:fb:73:e0:fe:28:28:85:a3:c1:c8:ef:17:
4d:52:f9:5c:a0:21:c8:ad:c3:fa:52:8f:91:db:15:
a6:66:b0:10:94:37:f3:ae:44:5b:b1:95:19:73:67:
d0:60:1a:d7:75:e7:db:de:9c:57:5d:52:b1:ad:f1:
18:1b:e0:4d:a4:ee:22:6f:b5:69:8c:91:a2:e8:9a:
f6:5a:d6:da:fe:a1:69:d3:29:fc:be:ce:98:ce:2f:
9c:46:99:65:c3:83:b5:72:be:3c:1a:83:ce:18:c6:
dd:63:09:aa:d2:8e:68:4e:30:7c:84:87:70:e9:8d:
f8:49:a3:80:69:ea:92:24:40:31:f4:42:8c:ef:11:
82:ac:86:47:f3:1b:13:07:67:2e:22:65:67:7e:a6:
f3:5d
Exponent: 65537 (0x10001)
X509v3 extensions:
X509v3 Key Usage: critical
Digital Signature, Key Encipherment
X509v3 Extended Key Usage:
TLS Web Server Authentication
X509v3 Basic Constraints: critical
CA:FALSE
X509v3 Subject Alternative Name:
DNS:external.example.com, DNS:master.example.com, DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster.local, DNS:openshift, DNS:openshift.default, DNS:openshift.default.svc, DNS:openshift.default.svc.cluster.local, DNS:172.30.0.1, DNS:192.168.xxx.xx, DNS:74.207.xxx.x, IP Address:172.30.0.1, IP Address:192.168.xxx.xx, IP Address:74.207.xxx.x
Signature Algorithm: sha256WithRSAEncryption
4c:18:c7:15:27:3a:d8:d2:8f:b2:6f:f2:d1:27:91:1c:25:bc:
a8:21:41:df:72:46:3c:00:c8:96:36:8b:70:77:db:f5:c9:27:
98:57:d3:73:a9:af:23:23:26:29:b3:64:25:67:3a:f5:44:0c:
8a:34:f6:79:ee:e4:c1:51:77:27:ed:c0:86:c6:e8:06:2a:08:
a3:3a:9f:5a:22:1d:a1:55:81:c6:cd:76:98:e9:ed:cf:35:a5:
7a:69:38:f6:ce:4e:e3:79:dc:8f:22:ee:62:25:e2:34:7d:26:
33:2e:23:f0:1d:9c:e4:c2:95:84:39:85:54:0e:dd:ff:1c:62:
51:d6:98:2a:0c:fe:8b:c5:01:b3:f6:2c:1f:51:6b:06:f9:23:
86:fc:fe:85:e3:51:8c:99:5b:71:c9:a8:ee:15:f0:90:61:a4:
a4:89:f2:cd:7f:49:db:e6:d0:8c:e6:d7:96:cb:d5:80:56:8a:
43:7c:4b:57:8d:62:39:9f:d2:fa:fa:64:94:a3:14:fa:41:5a:
23:55:4f:85:25:e2:ed:97:49:2b:e7:ae:f2:e7:84:91:f8:d0:
6e:bb:6a:7d:1e:c1:6f:9b:df:6d:3e:9f:75:9e:d7:c6:2c:ee:
bb:ac:f7:5b:74:85:e0:94:e6:f2:a5:fa:e1:51:7b:ef:a0:c6:
71:89:bb:4f
$ openssl x509 -in /etc/origin/master/master.etcd-client.crt -text -noout
Certificate:
Data:
Version: 3 (0x2)
Serial Number: 3 (0x3)
Signature Algorithm: sha256WithRSAEncryption
Issuer: CN=etcd-signer@1511770342
Validity
Not Before: Nov 27 08:23:53 2017 GMT
Not After : Nov 26 08:23:53 2022 GMT
Subject: CN=master.example.com
Subject Public Key Info:
Public Key Algorithm: rsaEncryption
Public-Key: (2048 bit)
Modulus:
00:ae:18:45:dc:62:6f:53:d2:9a:3b:80:bc:a9:97:
13:5f:d6:c6:18:44:f1:7e:29:16:f3:72:1f:83:4a:
0c:d7:6c:0d:b8:9c:cb:a8:03:cc:ab:0c:93:19:87:
c7:d3:25:9d:46:60:34:04:60:fc:d5:de:de:a3:43:
ff:db:67:d8:2e:6d:c4:89:7a:c6:84:f1:26:27:eb:
8c:6c:4f:42:52:99:d0:9e:98:f2:b9:c0:4d:e0:2d:
98:0a:8e:70:6c:a2:f7:40:92:4d:ee:c7:af:fe:3a:
65:d8:97:e5:b8:b5:92:b8:87:aa:9c:0f:0b:b8:9c:
25:47:d4:e7:8c:ff:3c:36:f2:0f:fd:1a:7f:17:75:
f1:eb:e0:03:3e:9f:4b:c1:8b:93:1f:85:b5:d7:6b:
de:df:7a:87:d4:fc:15:53:72:52:51:53:c2:98:ee:
85:05:91:6d:59:1f:0c:bb:4d:e9:1a:c9:a3:c0:a4:
04:12:1c:d5:c9:99:ce:8b:bf:41:88:ca:a2:d8:bb:
4f:26:11:98:a7:b8:e1:d3:45:07:2c:65:35:0f:94:
6d:af:dd:e9:b0:49:22:34:26:5d:0a:29:1a:33:00:
5a:2c:f7:74:d2:20:f9:fa:e5:d4:a2:ce:c7:94:4e:
da:1d:d6:36:d6:65:6e:ee:71:6c:f0:fd:ad:8d:14:
95:1f
Exponent: 65537 (0x10001)
X509v3 extensions:
X509v3 Authority Key Identifier:
keyid:D5:EF:AA:49:64:1C:65:27:43:15:EE:97:21:D0:2E:83:05:FA:A3:D9
DirName:/CN=etcd-signer@1511770342
serial:B9:BA:70:10:23:D5:EA:0D
X509v3 Basic Constraints: critical
CA:FALSE
X509v3 Extended Key Usage:
TLS Web Client Authentication
X509v3 Key Usage:
Digital Signature, Key Encipherment
X509v3 Subject Key Identifier:
B7:BB:15:D0:68:AB:73:DD:77:D1:7B:3D:39:02:BD:98:B0:FE:FC:D3
X509v3 Subject Alternative Name:
IP Address:192.168.xxx.xx, DNS:master.example.com
Signature Algorithm: sha256WithRSAEncryption
2b:08:53:a1:ff:06:0b:d5:17:9f:89:75:1d:95:eb:2f:16:8e:
1b:52:f6:a8:0d:1e:f6:2f:82:01:59:85:9b:61:da:4b:85:78:
66:49:5a:98:a1:b8:4e:fc:dd:d4:18:a0:52:bc:44:34:30:8e:
64:b9:22:a7:d0:57:69:2f:ba:1b:d1:00:b4:a8:9b:0f:0e:dd:
ac:31:b4:de:c4:a3:3c:0c:86:98:07:e8:2f:6f:21:4a:96:e5:
c3:c1:a1:4a:27:e2:4d:07:89:60:6d:c0:ae:b9:85:a1:63:0a:
fb:5d:47:5e:0c:39:d1:97:4d:76:a3:1c:cf:95:38:8c:cb:05:
17:a0:d5:6f:9b:e9:93:74:56:89:f1:d6:b4:82:40:a8:d0:1e:
55:3c:dc:7c:dd:87:03:61:28:f0:0e:95:9d:55:a2:53:d5:af:
2c:a7:2d:f4:f2:4f:0f:97:78:3d:98:4b:9b:d5:8b:44:fd:59:
eb:ad:a9:8c:c3:62:c1:44:48:0b:98:1f:28:fe:f4:b6:03:7a:
08:9e:ec:bf:f3:3c:fc:9f:2a:8c:ef:c8:ac:b6:4a:94:8c:c8:
9d:9e:68:51:0e:82:60:a4:92:3c:5c:52:b5:e0:e7:fa:9f:cd:
9f:97:0d:5b:ba:08:d1:38:23:e6:8f:16:1c:50:55:67:bc:b3:
8a:64:7d:a9:4f:7b:e5:55:5c:7f:6b:50:55:35:86:f3:7c:5f:
78:b2:f0:94:5e:21:73:32:97:8a:68:0d:1c:2c:54:79:c8:fa:
0f:34:e7:72:7d:0b:8f:d9:5f:70:02:2f:fa:11:43:d9:3e:44:
f7:0a:99:73:0f:1e:9e:44:9a:67:1f:97:51:16:be:38:21:61:
2a:8a:86:e1:e1:fc:f4:29:9e:35:9c:af:7e:1c:0b:fb:9d:1f:
bb:d2:c1:0d:46:32:48:15:fe:f8:38:27:5f:e2:4c:d7:34:ae:
66:22:6f:d4:bd:e3:3d:da:5f:22:67:80:f5:2d:d9:d7:d4:64:
b3:00:c9:29:09:41:60:d8:bc:ef:22:72:8d:a5:5b:38:55:f0:
19:e2:bb:a8:5a:ae:c0:0d:c2:3e:03:c8:2e:9e:df:2c:28:0d:
37:b8:28:e2:9a:30:b8:66:14:2c:c1:ee:fd:de:bc:5e:2c:d7:
3d:e6:fe:02:07:8c:1f:b7:a8:53:b6:48:d0:ea:06:ea:30:3e:
1e:13:c8:1d:3b:7a:73:e4:d0:15:40:5e:be:d8:94:44:c1:4d:
5b:a2:f2:de:9b:b4:96:3d:95:e9:8a:14:6f:3e:e5:73:52:be:
3d:0a:0e:fb:73:00:24:9b:26:69:d5:12:29:e9:71:09:05:49:
73:39:3a:c8:0d:be:5f:1f
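To double-check which names and addresses a certificate covers, the Subject Alternative Name line from the openssl output above can be compared mechanically against the host you connect to. The following is only a minimal sketch: `parse_san` and `covers` are hypothetical helper names (not part of openshift-ansible or openssl), and it assumes the comma-separated `DNS:` / `IP Address:` layout shown in the dumps. It does exact string matching only and does not handle wildcard names.

```python
def parse_san(san_line):
    """Split an openssl SAN line into a dict such as
    {'DNS': [...], 'IP Address': [...]}."""
    entries = {}
    for item in san_line.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the first colon only, so IPv6 values keep their colons.
        kind, _, value = item.partition(":")
        entries.setdefault(kind.strip(), []).append(value.strip())
    return entries


def covers(san, host_or_ip):
    """Return True if host_or_ip appears verbatim in any SAN entry."""
    return any(host_or_ip in values for values in san.values())
```

For example, feeding it the SAN line of master.etcd-client.crt above (`IP Address:192.168.xxx.xx, DNS:master.example.com`) and asking `covers(san, "localhost")` would report that localhost is not listed, which may be worth keeping in mind when testing connections by name.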
Let me know if I can provide any more information to help! I will try and do some more investigation.
This also seems to be similar, if not identical, to this issue: https://github.com/openshift/openshift-ansible/issues/6087
I have done some more investigation and found that:
If I start the etcd server manually with:
$ sudo /usr/bin/etcd --name=master.example.com --data-dir=/var/lib/etcd/ --cert-file /etc/etcd/peer.crt --key-file /etc/etcd/peer.key --ca-file /etc/etcd/ca.crt --listen-client-urls=https://192.168.xxx.xx:2379 --advertise-client-urls=https://192.168.xxx.xx:2379 --debug
I don't get a cert error when the server starts.
If I check the cluster health with etcdctl:
$ etcdctl -C https://${ETCD_CA_HOST}:2379 \
--ca-file=/etc/etcd/ca.crt \
--cert-file=/etc/etcd/peer.crt \
--key-file=/etc/etcd/peer.key \
cluster-health
If ETCD_CA_HOST is localhost or the hostname of my master, the connection fails.
If ETCD_CA_HOST is the openshift_ip of my master, it works, whether there is a certificate error or not.
Therefore, I think the etcd daemon is being set up to listen on the private openshift_ip, while the origin-master daemon tries to connect to etcd using the master's public hostname. That hostname resolves to a different IP than openshift_ip, so the etcd daemon refuses the connection.
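One way to test this theory is to check what the master's hostname resolves to and compare it with the address etcd listens on. The sketch below is an illustration under assumptions: `resolve_ips` and `resolves_to` are hypothetical helper names, and port 2379 is taken from the etcd logs in this thread. It uses only Python's standard `socket.getaddrinfo`, so it reports both IPv4 and IPv6 results.

```python
import socket


def resolve_ips(hostname, port=2379):
    """Return the distinct addresses (IPv4 and IPv6) that hostname resolves to."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        # sockaddr[0] is the address string for both AF_INET and AF_INET6.
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


def resolves_to(hostname, expected_ip, port=2379):
    """Return True if hostname resolves to expected_ip (e.g. the openshift_ip)."""
    return expected_ip in resolve_ips(hostname, port)
```

Run on the master, `resolves_to("master.example.com", "<your openshift_ip>")` returning False would be consistent with the mismatch described above; `resolve_ips` also shows whether the name resolves to an IPv6 address.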
@eliu Does your master hostname resolve to an IPv6 address by any chance? Another theory I have is that certain services are listening on the master's IPv4 address while the hostname resolves to an IPv6 address, so the connection is refused. I should be able to test this theory soon.
I managed to solve my problem by doing the following:
Using openshift_hostname to specify internal hostnames in the inventory file:
master.example.com openshift_ip=192.168.xxx.xx openshift_hostname=master.internal.example.com
This ensured that when origin-master-api.service tried to connect to the etcd daemon, the hostname resolved to the private IP that the etcd daemon was listening on, not the public IP of the master.
It seems that the openshift_ip parameter is ignored by the master-api when resolving services.
This has not fixed the "transport: remote error: tls: bad certificate" message, but the etcd daemon seems to work regardless and the installation completed successfully.
@NoxHarmonium Nope, I use IPv4 only. I recently found that the main cause is not this etcd bad certificate issue but a port conflict between the master and the lb: I had set up the master and the lb on the same server node.
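A port conflict like this can be spotted before running the playbook by probing whether the port is already bound on the host. The following is a rough sketch, not something openshift-ansible provides: `port_is_free` is a hypothetical helper, and which ports your master and lb actually use depends on your inventory.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can be bound to host:port right now.

    Note: binding ports below 1024 needs root; without it this returns
    False even when the port is unused.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use": another service holds the port.
            return False
    return True
```

For instance, calling `port_is_free(8443)` on the shared node while the lb is running would show whether that port is already taken; 8443 is an assumption here (a commonly used master API port), so substitute the ports from your own configuration.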
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle rotten
/remove-lifecycle stale
Rotten issues close after 30d of inactivity.
Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.
/close
@openshift-bot: Closing this issue.