RKE: `Unable to connect to the server: x509: certificate is valid for ingress.local, not {{ADDRESS}}` despite the address being listed under `authentication.sans`

Created on 4 Oct 2020 · 5 Comments · Source: rancher/rke

**RKE version:**
v1.1.7

**Docker version: (`docker version`, `docker info` preferred)**

````
Client:
 Debug Mode: false

Server:
 Containers: 48
  Running: 21
  Paused: 0
  Stopped: 27
 Images: 20
 Server Version: 19.03.13
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: systemd
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 8fba4e9a7d01810a393d5d25a3621dc101981175
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.19.0-11-amd64
 Operating System: Debian GNU/Linux 10 (buster)
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 7.992GiB
 Name: k01
 ID: ZXMU:SN3A:4SXR:ZA2Y:JFUG:MZCK:5DMW:SRE5:SA3P:WLLS:H6QO:CZQC
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support
````

**Operating system and kernel: (`cat /etc/os-release`, `uname -r` preferred)**
`4.19.0-11-amd64`

**Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO)**
Bare-metal/generic VPS

**cluster.yml file:**

```
nodes:
  - address: myip
    user: root
    role:
      - controlplane
      - etcd
      - worker

cluster_name: mycluster
kubernetes_version: v1.18.8-rancher1-1

authentication:
  strategy: x509
  sans:
    - "mycluster.mydomain.mytld"

authorization:
  mode: rbac

network:
  plugin: calico

dns:
  provider: coredns

ingress:
  provider: nginx
  options:
    use-forwarded-headers: 'true'

services:
  kube-api:
    secrets_encryption_config:
      enabled: true
```

**Haproxy.cfg loadbalancer (just in case it might be relevant, I hope not)**

```
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

    ca-base         /etc/ssl/certs
    crt-base        /etc/ssl/private

    ssl-default-bind-ciphers     ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS
    ssl-default-bind-options     no-sslv3

defaults
    log global
    mode http
    option httplog
    option dontlognull
    timeout connect 5000
    timeout client 50000
    timeout server 50000

    errorfile       400         /etc/haproxy/errors/400.http
    errorfile       403         /etc/haproxy/errors/403.http
    errorfile       408         /etc/haproxy/errors/408.http
    errorfile       500         /etc/haproxy/errors/500.http
    errorfile       502         /etc/haproxy/errors/502.http
    errorfile       503         /etc/haproxy/errors/503.http
    errorfile       504         /etc/haproxy/errors/504.http

frontend main
    bind *:6443
    bind *:443
    bind *:80
    mode tcp
    option tcplog

    acl is_kubeapi hdr(host) -i mycluster.mydomain.mytld

    use_backend kubeapi if is_kubeapi
    default_backend kubewrk

backend kubeapi
    mode tcp
    option tcp-check
    balance roundrobin
    default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100

    server           kubeapi-mycluster-mynode myip:6443 check

backend kubewrk
    mode tcp
    option tcp-check
    balance roundrobin
    default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100

    server           kubewrk-mycluster-mynode myip:443 check
```

**Steps to Reproduce:**

1. Have two Debian 10 nodes with docker-ce, one with haproxy, the other as the RKE node.
2. Run `rke up`.
3. Edit the generated kubeconfig's server URL to point to the haproxy load balancer through the SAN instead of the control plane IP.
4. Try to run `kubectl get nodes`.
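The kubeconfig edit in the steps above can be sketched in a few lines of Python. This is only an illustration: the `sample` text is a made-up, shortened kubeconfig, the helper name `point_kubeconfig_at` is hypothetical, and the port `6443` in the new URL is an assumption based on the load balancer's `bind *:6443` line.

```python
import re


def point_kubeconfig_at(kubeconfig_text: str, new_server: str) -> str:
    """Replace the value of every `server:` key, keeping its indentation."""
    pattern = re.compile(r'^([ \t]*server:[ \t]*).*$', flags=re.MULTILINE)
    # A function is used as the replacement so the URL is inserted literally.
    return pattern.sub(lambda m: m.group(1) + f'"{new_server}"', kubeconfig_text)


# Illustrative, shortened kubeconfig (not the real file generated by RKE).
sample = """\
apiVersion: v1
kind: Config
clusters:
- cluster:
    server: "https://myip:6443"
  name: "mycluster"
"""

updated = point_kubeconfig_at(sample, "https://mycluster.mydomain.mytld:6443")
print(updated)
```

Only the `server:` line changes; the certificate data in the kubeconfig is left as it is, which is why the SAN has to be present in the API server certificate for the new hostname to validate.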

**Results:**
`Unable to connect to the server: x509: certificate is valid for ingress.local, not mycluster.mydomain.mytld`
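To make the error message concrete, here is a simplified sketch of the check a TLS client performs: the hostname it dialed must match one of the names in the certificate it was handed. The function names are hypothetical and the matching rules are deliberately reduced (exact match or one leading wildcard label); real x509 verification does considerably more.

```python
from typing import List


def hostname_matches_san(hostname: str, san: str) -> bool:
    """Simplified DNS-name match: exact, or a single leading `*.` wildcard label."""
    hostname, san = hostname.lower().rstrip("."), san.lower().rstrip(".")
    if san.startswith("*."):
        # The wildcard stands for exactly one left-most label.
        _, _, parent = hostname.partition(".")
        return parent != "" and parent == san[2:]
    return hostname == san


def check_certificate(hostname: str, sans: List[str]) -> str:
    """Return "ok", or a message shaped like the error reported above."""
    if any(hostname_matches_san(hostname, san) for san in sans):
        return "ok"
    return f"x509: certificate is valid for {', '.join(sans)}, not {hostname}"


# The certificate that came back only names ingress.local.
print(check_certificate("mycluster.mydomain.mytld", ["ingress.local"]))
# → x509: certificate is valid for ingress.local, not mycluster.mydomain.mytld
```

The point of the sketch is that the message describes whichever certificate was actually presented. A certificate that names only `ingress.local` suggests the connection ended up somewhere other than the kube-apiserver.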

**Causes that have occurred to me:**

- I typo'd the SAN in the cluster.yml: Unlikely. I have tried this tens of times with the same outcome, each time making sure I got it right.
- The haproxy load balancer is misconfigured: Possible. The error message points at the certificate rather than the connection to the kube-apiserver, but perhaps the two are related. I was hoping to test (and debug) the load balancing by pointing the kubeconfig generated by `rke up` at the load balancer, which is how I got to this point. To rule this out I will validate whether the connection works, but I am posting this issue already because I suspect this is not the cause.
- I have neglected some detail that is generic to k8s, and not RKE specifically: Likely, since I am not an expert on either k8s or RKE.

I hope this issue doesn't get buried, and would be grateful for any attention it gets.


SebJansen

Most helpful comment

Hi @ohader @karthicksndr,

I have diverged significantly, since I have elected to use DNS-based spreading of traffic. My motivation for distributing traffic was to avoid a single point of failure, rather than purely performance.

Notice the ugliness in terms of repetition. Without any guarantee, this was my last version of the haproxy.cfg (without the global section) and the RKE config. I can't remember if it actually worked, but I have a whole Ansible playbook that sets everything up, which I can share after a bit of cleaning up. However, it would at least initially make certain assumptions regarding OS and DNS provider. So let me know if the code below doesn't work, or if you'd like a ready-to-go Ansible role for setting it all up.

haproxy.cfg

frontend main
    bind            *:6443
    bind            *:80
    bind            *:443

    mode            tcp
    option          tcplog

    acl             is_worker80    dst_port 80
    acl             is_worker443   dst_port 443

    use_backend     worker80       if is_worker80
    use_backend     worker443      if is_worker443

    default_backend master

backend master
    mode            tcp
    option          tcp-check
    balance         roundrobin
    default-server  inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100
    server          master.node.cluster.region.mydomain.tld myip:6443 check


backend worker443
    mode            tcp
    option          tcp-check
    balance         roundrobin
    default-server  inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100

    server          worker.node.cluster.region.mydomain.tld myip:443 check


backend worker80
    mode            tcp
    option          tcp-check
    balance         roundrobin
    default-server  inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100

    server          worker.node.cluster.region.mydomain.tld myip:80 check
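The routing in the frontend above depends only on the destination port, which is information HAProxy has in TCP mode. As a rough mental model (a toy lookup, not HAProxy itself; the function name `select_backend` is made up for illustration):

```python
# Mirrors the two dst_port ACLs and the default_backend of the frontend above.
PORT_TO_BACKEND = {
    80: "worker80",
    443: "worker443",
}


def select_backend(dst_port: int) -> str:
    """Pick a backend the way the frontend's use_backend rules would."""
    return PORT_TO_BACKEND.get(dst_port, "master")


for port in (80, 443, 6443):
    print(port, "->", select_backend(port))
```

With this layout, kubectl traffic arriving on 6443 falls through to `master` and reaches the kube-apiserver, while 80 and 443 go to the ingress controller on the workers.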

rke config.yaml

nodes:
- address: my_ip
  user: root
  role:
    - controlplane
    - worker
    - etcd

cluster_name: my_cluster

# rke config --list-version --all
kubernetes_version: 1.19

# https://rancher.com/docs/rke/latest/en/config-options/authentication/
authentication:
  strategy: x509
  sans:
    - "master.node.cluster.region.mydomain.tld"
    - "myip"

authorization:
  mode: rbac

# https://rancher.com/docs/rke/latest/en/config-options/add-ons/network-plugins/
network:
  plugin: canal

# https://rancher.com/docs/rke/latest/en/config-options/add-ons/dns/
dns:
  provider: coredns

# Currently only nginx ingress provider is supported.
# To disable ingress controller, set `provider: none`
# `node_selector` controls ingress placement and is optional
ingress:
  provider: nginx
  options:
    use-forwarded-headers: 'true'

services:
  kube-api:
    secrets_encryption_config:
      enabled: true
  # For Rook
  kubelet:
    extra_args:
      volume-plugin-dir: /usr/libexec/kubernetes/kubelet-plugins/volume/exec
    extra_binds:
      - /usr/libexec/kubernetes/kubelet-plugins/volume/exec:/usr/libexec/kubernetes/kubelet-plugins/volume/exec

SebJansen on 17 Dec 2020


All 5 comments

The `ingress.local` certificate means the request is hitting the ingress controller on the node, which returns its default certificate if no host is matched. So the ACL is not being matched. This should be easy to rule out: simplify the config, default to the `kubeapi` backend, check whether that works, and go from there. A frontend in TCP mode that checks host headers doesn't add up to me; you normally need HTTP mode to check host headers. By simplifying your config you should be able to debug this better.
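The reasoning above can be traced with a toy model of the original frontend. It is a simplification under stated assumptions, not HAProxy's real evaluation logic: the model treats the `Host` header as invisible unless the frontend runs in HTTP mode, and the names `select_backend_original` and `SERVED_BY` are invented for this sketch.

```python
from typing import Optional

# What answers behind each backend in the original haproxy.cfg.
SERVED_BY = {
    "kubeapi": "myip:6443 (kube-apiserver)",
    "kubewrk": "myip:443 (ingress controller, default certificate)",
}


def select_backend_original(mode: str, host_header: Optional[str]) -> str:
    """Toy model of `acl is_kubeapi hdr(host)` followed by `default_backend kubewrk`."""
    # Assumption of this model: outside HTTP mode there is no parsed Host header.
    visible_host = host_header if mode == "http" else None
    is_kubeapi = (
        visible_host is not None
        and visible_host.lower() == "mycluster.mydomain.mytld"
    )
    return "kubeapi" if is_kubeapi else "kubewrk"


backend = select_backend_original("tcp", "mycluster.mydomain.mytld")
print(backend, "->", SERVED_BY[backend])
```

In this model the ACL can never be true in TCP mode, so every connection lands on `kubewrk`, and the ingress controller answers with its default certificate.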

superseb on 5 Oct 2020

Thank you, you were absolutely right @superseb!

SebJansen on 21 Oct 2020

@SebJansen Could you please share the adjustments you had to make to fix the issue? Thanks in advance!

ohader on 29 Oct 2020


@SebJansen Could you please share the adjustments you had to make to fix the issue? Thanks in advance!

karthicksndr on 16 Dec 2020

