Etcd: 3.2/3.3 etcd server with TLS would start with error "tls: bad certificate"

Created on 7 Mar 2018 · 21 Comments · Source: etcd-io/etcd

While debugging other issues (which might be related), I found that a single-member etcd server logs the following error on bootstrap:

2018-03-07 22:36:51.136699 I | embed: rejected connection from "127.0.0.1:35160" (error "tls: failed to verify client's certificate: x509: certificate specifies an incompatible key usage", ServerName "")
WARNING: 2018/03/07 22:36:51 Failed to dial 0.0.0.0:2379: connection error: desc = "transport: authentication handshake failed: remote error: tls: bad certificate"; please retry.

Steps to reproduce:
https://gist.github.com/hongchaodeng/7d62f3b5d30b58c783c382d9b629b819

Note that 3.1.11 didn't have this error log.


I can reproduce with 3.2 and 3.3. Will take a look.

This comment is relevant here: https://github.com/coreos/etcd-operator/pull/1727#issuecomment-370968658

I hit the same issue with etcd version 3.3.1.
The logs are as follows:
Mar 09 13:37:33 master1 etcd[4197]: rejected connection from "192.168.9.186:31833" (error "remote error: tls: bad certificate", ServerName "")

I have the same issue in 3.3.1

Mar 13 12:38:47 172-20-24-117 etcd[9508]: rejected connection from "127.0.0.1:55480" (error "tls: failed to verify client's certificate: x509: certificate specifies an incompatible key usage", ServerName "")
Mar 13 12:38:47 172-20-24-117 bash[9508]: WARNING: 2018/03/13 12:38:47 Failed to dial 0.0.0.0:4001: connection error: desc = "transport: authentication handshake failed: remote error: tls: bad certificate"; please retry.
Mar 13 12:38:47 172-20-24-117 bash[9508]: WARNING: 2018/03/13 12:38:47 Failed to dial 0.0.0.0:2379: connection error: desc = "transport: authentication handshake failed: remote error: tls: bad certificate"; please retry.
Mar 13 12:38:47 172-20-24-117 etcd[9508]: rejected connection from "127.0.0.1:46640" (error "tls: failed to verify client's certificate: x509: certificate specifies an incompatible key usage", ServerName "")

Hello,
I see the same with 3.2.16:

-- The start-up result is done.
Mar 14 18:48:09 kubem01 etcd[3089]: WARNING: 2018/03/14 18:48:09 Failed to dial 10.101.0.81:2379: connection error: desc = "transport: authentication handshake failed: remote error
Mar 14 18:48:09 kubem01 etcd[3089]: WARNING: 2018/03/14 18:48:09 Failed to dial 10.101.0.81:2379: connection error: desc = "transport: authentication handshake failed: remote error
[root@kubem01 ~]#

I am having the same issue after updating v3.2.11 to v3.3.3.

Seeing the same with v3.3.5 after debugging coreos/etcd-operator#1962.

Resolved by adding client auth as an extended key usage in my cfssl config as recommended here (and evidently missing based on error output):
https://coreos.com/os/docs/latest/generate-self-signed-certificates.html

@tkellen What exactly did you add to resolve this? I am using etcd version 3.2.17 and getting the error below:
Failed to dial 0.0.0.0:2379: connection error: desc = "transport: authentication handshake failed: remote error: tls: bad certificate"; please retry.

I have the same issue in 3.3.9

I have the same issue in 3.3.8 on OpenShift.

How to fix the issue: https://github.com/etcd-io/etcd/issues/9785#issuecomment-432438748 (add "client auth" to "server" profile in CA config and regenerate server cert).
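
Before regenerating the server cert, it can help to see which profiles in the CA config lack the usage. A minimal sketch, assuming a cfssl-style config with "signing" → "profiles" → "usages" like the ones posted in this thread; the function name profiles_missing_usage and the sample config are made up for illustration and are not part of cfssl or etcd:

```python
import json


def profiles_missing_usage(config_text, usage="client auth"):
    """Return the names of signing profiles whose "usages" list lacks `usage`."""
    config = json.loads(config_text)
    profiles = config.get("signing", {}).get("profiles", {})
    return sorted(
        name
        for name, profile in profiles.items()
        if usage not in profile.get("usages", [])
    )


# Illustrative config only (hypothetical, modelled on the profiles in this thread).
SAMPLE_CONFIG = """
{
  "signing": {
    "default": {"expiry": "8760h"},
    "profiles": {
      "server": {
        "expiry": "8760h",
        "usages": ["signing", "key encipherment", "server auth"]
      },
      "client-server": {
        "expiry": "8760h",
        "usages": ["signing", "key encipherment", "server auth", "client auth"]
      }
    }
  }
}
"""

if __name__ == "__main__":
    # Lists the profiles that would produce a cert without "client auth".
    print(profiles_missing_usage(SAMPLE_CONFIG))
```

Any profile this reports would need "client auth" added to its usages before the cert is regenerated with that profile.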


Not working for me.
CA

cat csr_ROOT_CA.json 
{
 "CN": "dev",
 "key": {
    "algo": "rsa",
    "size": 4096
 },
 "names": [
 {
    "C": "RU",
    "L": "Sauronsk",
    "O": "kubernetes",
    "ST": "Mordor"
 }
 ],
 "ca": {
    "expiry": "8760h"
 }
}

Generate

cfssl gencert -initca csr_ROOT_CA.json | cfssljson -bare root_ca

Intermediate for etcd

cat csr_INTERMEDIATE_CA.json
{
 "CN": "etcd.dev",
 "key": {
    "algo": "rsa",
    "size": 4096
 },
 "names": [
 {
    "C": "RU",
    "L": "Sauronsk",
    "O": "kubernetes",
    "ST": "Mordor"
 }
 ],
 "ca": {
    "expiry": "8760h"
 }
}

Generating intermediate

cfssl gencert -initca csr_INTERMEDIATE_CA.json | cfssljson -bare intermediate_ca

Intermediate sign config

cat root_to_intermediate_ca.json
{
  "signing": {
    "default": {
      "usages": ["digital signature", "cert sign", "crl sign", "signing"],
      "expiry": "8760h",
      "ca_constraint": {"is_ca": true, "max_path_len": 0, "max_path_len_zero": true}
    }
  }
}

Sign intermediate

cfssl sign -ca ../root_ca.pem -ca-key ../root_ca-key.pem -config root_to_intermediate_ca.json intermediate_ca.csr | cfssljson -bare intermediate_ca

Etcd certificate config

cat csr_END_CA.json
{
 "CN": "etcd.kub1.cloud",
 "hosts": [
   "10.10.10.101", "etcd.kub1.cloud"
 ],
 "key": {
    "algo": "rsa",
    "size": 4096
 },
 "names": [
 {
    "C": "RU",
    "L": "Sauronsk",
    "O": "kubernetes",
    "ST": "Mordor"
 }
 ]
}

Intermediate to end config

cat intermediate_to_end.json                                                   
{
  "signing": {
    "default": {
      "expiry": "8760h"
    },
    "profiles": {
      "server": {
        "expiry": "8760h",
        "usages": [
          "signing",
          "key encipherment",
          "server auth"
        ]
      },
      "client": {
        "expiry": "8760h",
        "usages": [
          "signing",
          "key encipherment",
          "client auth"
        ]
      },
      "client-server": {
        "expiry": "8760h",
        "usages": [
          "signing",
          "key encipherment",
          "server auth",
          "client auth"
        ]
      }
    }
  }
}

Generating etcd certificate

cfssl gencert -ca ../intermediate_ca.pem -ca-key ../intermediate_ca-key.pem -config intermediate_to_end.json -profile=client-server csr_END_CA.json | cfssljson -bare etcd

And starting etcd like

cat /etc/systemd/system/etcd.service 
[Unit]
Description=etcd
Documentation=https://github.com/coreos/etcd

[Service]
User=etcd
Type=notify
Restart=always
RestartSec=5s
LimitNOFILE=40000
TimeoutStartSec=0

ExecStart=/usr/local/bin/etcd --name kub1_etcd \
    --data-dir /var/lib/etcd \
    --client-cert-auth \
    --trusted-ca-file /etc/ssl/certs/etcd/ca.pem \
    --cert-file /etc/ssl/certs/etcd/etcd.pem \
    --key-file /etc/ssl/certs/etcd/etcd-key.pem \
    --peer-client-cert-auth \
    --peer-trusted-ca-file /etc/ssl/certs/etcd/ca.pem \
    --peer-cert-file /etc/ssl/certs/etcd/etcd.pem \
    --peer-key-file /etc/ssl/certs/etcd/etcd-key.pem \
    --listen-client-urls https://10.10.10.101:2379 \
    --advertise-client-urls https://10.10.10.101:2379 \
    --listen-peer-urls https://10.10.10.101:2380 \
    --initial-advertise-peer-urls https://10.10.10.101:2380 \
    --initial-cluster kub1_etcd=https://10.10.10.101:2380,kub2_etcd=https://10.10.10.104:2380 \
    --initial-cluster-token my-etcd-token \
    --initial-cluster-state new

[Install]
WantedBy=multi-user.target

Same config for second node (10.10.10.104)

The error log is still the same:

rejected connection from "10.10.10.104:43816" (error "remote error: tls: bad certificate", ServerName "")

/assign

@nejtr0n, I don't see the "client auth" usage in the "server" profile of your config:

      "server": {
        "expiry": "8760h",
        "usages": [
          "signing",
          "key encipherment",
          "server auth"
        ]
      },

How to fix the issue: #9785 (comment) (add "client auth" to "server" profile in CA config and regenerate server cert).

I reran the Ansible scripts one more time, and etcd is up on version 3.3. I don't know how it was solved; I didn't change anything.

I didn't have this problem until I upgraded to 3.4. I think the golang upgrade is the cause but if y'all were having problems with 3.3 then I don't think my issues are the same as everyone's here.

This is still a problem with no clarity in the documentation for 3.4.

I have the same issue in 3.4.3

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

I had the same issue in v3.4.9.
I resolved it by adding clientAuth (TLS Web Client Authentication) to the etcd server certificate (the one used in ETCD_CERT_FILE).
I'm not sure whether, or why, a server should have the clientAuth flag in its certificate...
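
To confirm whether an existing certificate already carries that flag, one option is to dump it with `openssl x509 -in etcd.pem -noout -text` and look at the "X509v3 Extended Key Usage" section. A rough sketch of that check follows, assuming the usual openssl text layout; the helper names and both sample dumps are hypothetical and only illustrate the parsing, they are not taken from a real certificate:

```python
import re


def extended_key_usages(openssl_text):
    """Extract the Extended Key Usage names from `openssl x509 -noout -text` output."""
    match = re.search(
        r"X509v3 Extended Key Usage:[^\n]*\n\s*([^\n]+)", openssl_text
    )
    if not match:
        return []
    return [item.strip() for item in match.group(1).split(",")]


def has_client_auth(openssl_text):
    """True if the dump lists TLS Web Client Authentication (clientAuth)."""
    return "TLS Web Client Authentication" in extended_key_usages(openssl_text)


# Hypothetical excerpts in the layout openssl normally prints.
SERVER_ONLY = """
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Extended Key Usage:
                TLS Web Server Authentication
"""

SERVER_AND_CLIENT = """
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Extended Key Usage:
                TLS Web Server Authentication, TLS Web Client Authentication
"""

if __name__ == "__main__":
    print(has_client_auth(SERVER_ONLY))
    print(has_client_auth(SERVER_AND_CLIENT))
```

A certificate whose dump looks like the first excerpt is the kind this comment describes fixing by adding clientAuth.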
