Kubeadm: kubectl logs returns TLS error

Created on 11 Dec 2017 · 6 comments · Source: kubernetes/kubeadm

With a freshly installed 1.8.5 cluster, I consistently see the following error when I try to run "kubectl logs" for a pod.

Dec 11 14:22:30 kube-3 kubelet: I1211 14:22:30.440352 1355 logs.go:41] http: TLS handshake error from 192.168.80.234:37000: no certificate available
Dec 11 14:22:30 kube-3 journal: E1211 19:22:30.440408 1 status.go:62] apiserver received an error that is not an metav1.Status: Get https://192.168.80.234:10250/containerLogs/kube-system/kube-proxy-l2f9s/kube-proxy: remote error: tls: internal error

Here is the kubeadm command line I used to bring the cluster up:

kubeadm init --pod-network-cidr 10.57.128.0/19 --service-cidr 10.57.160.0/19 --service-dns-domain sbezverk.cisco.com --node-name kube-3.sbezverk.cisco.com

[root@kube-3 ~]# kubectl logs -n kube-system kube-proxy-l2f9s
Error from server: Get https://192.168.80.234:10250/containerLogs/kube-system/kube-proxy-l2f9s/kube-proxy: remote error: tls: internal error
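
The "no certificate available" message in the kubelet log suggests the kubelet has no serving certificate to present on port 10250, so the apiserver's proxied containerLogs request fails the handshake. Two quick checks (a sketch: the IP comes from the error above, and /var/lib/kubelet/pki is only the kubelet's default certificate directory, so adjust if --cert-dir is overridden):

# probe the kubelet serving port directly; without a serving certificate the
# handshake is expected to fail the same way the apiserver's request does
openssl s_client -connect 192.168.80.234:10250 </dev/null

# if serving-certificate bootstrap has completed, a kubelet-server-current.pem
# symlink pointing at the issued certificate should exist here
ls -l /var/lib/kubelet/pki/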

/kind bug
/sig cluster-lifecycle

kind/bug lifecycle/stale sig/cluster-lifecycle

All 6 comments

[root@kube-3 kolla-kubernetes]# kubectl get csr --all-namespaces
NAME        AGE   REQUESTOR                               CONDITION
csr-5fwb9   6h    system:node:kube-3.sbezverk.cisco.com   Pending
csr-6ckwh   3h    system:node:kube-3.sbezverk.cisco.com   Pending
csr-ff49s   4h    system:node:kube-3.sbezverk.cisco.com   Pending
csr-hmtdl   6h    system:node:kube-3.sbezverk.cisco.com   Pending
csr-j429n   39m   system:node:kube-3.sbezverk.cisco.com   Pending
csr-tsms9   1h    system:node:kube-3.sbezverk.cisco.com   Pending
csr-v9wx4   5h    system:node:kube-3.sbezverk.cisco.com   Pending
csr-w4t8x   2h    system:node:kube-3.sbezverk.cisco.com   Pending
[root@kube-3 kolla-kubernetes]#
[root@kube-3 kolla-kubernetes]# kubectl describe csr system:node:kube-3.sbezverk.cisco.com
Error from server (NotFound): certificatesigningrequests.certificates.k8s.io "system:node:kube-3.sbezverk.cisco.com" not found
[root@kube-3 kolla-kubernetes]# kubectl describe csr csr-5fwb9
Name:                   csr-5fwb9
Labels:
Annotations:
CreationTimestamp:      Mon, 11 Dec 2017 14:12:43 -0500
Requesting User:        system:node:kube-3.sbezverk.cisco.com
Status:                 Pending
Subject:
        Common Name:    system:node:kube-3.sbezverk.cisco.com
        Serial Number:
        Organization:   system:nodes
Subject Alternative Names:
        DNS Names:      kube-3.sbezverk.cisco.com
                        kube-3.sbezverk.cisco.com
        IP Addresses:   192.168.80.234
                        fe80::5054:ff:fe43:d374
                        172.17.0.1
                        fe80::50dc:6aff:fe00:d2b1
                        10.57.128.44
                        fc00::10ca:1
                        fe80::548c:2fff:feb2:470
                        fe80::a0a0:37ff:fe21:4aa
Events:
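
As far as I know, kubeadm only sets up automatic approval for the kubelet client (bootstrap) CSRs; serving-certificate requests like the ones above stay Pending until they are approved manually or by an external approver. A minimal sketch of approving them by hand, using the names from the listing above:

# approve the kubelet's pending serving-certificate requests (only after
# confirming the requests really came from your node)
kubectl certificate approve csr-5fwb9 csr-6ckwh csr-ff49s csr-hmtdl csr-j429n csr-tsms9 csr-v9wx4 csr-w4t8x

Once one of them shows Approved,Issued, the kubelet should pick up the certificate and kubectl logs should stop returning the TLS error.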

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@sbezverk did you ever figure out why this was happening?

@kuberkaul for some reason the API server was not approving the certificate requests; after rebuilding the cluster, the issue was gone.

I can reproduce this at will with kubeadm v1.15.0 by including serverTLSBootstrap: true in the config file passed to kubeadm init; does that make it a kubeadm bug, or should it be reported against the kubelet?

As best I can tell, there is some kind of race condition that is causing kubelet(?) to make two CSRs within what appears to be milliseconds of each other, but only one of them is actually Issued:

# kubectl get -o wide csr
NAME        AGE   REQUESTOR           CONDITION
csr-ngjws   91s   system:node:k8s-1   Pending
csr-svntg   91s   system:node:k8s-1   Approved,Issued
# kubectl get -o yaml csr | grep creationTimestamp
    creationTimestamp: "2019-07-16T03:05:54Z"
    creationTimestamp: "2019-07-16T03:05:54Z"

This is a single Node scenario, which should make debugging it a little easier since all the CSR traffic will be for this one Node only; it is provisioned in the normal way:

# kubeadm init --config /tmp/kubeadm.yml
# snip out the two other docs, `InitConfiguration` and `ClusterConfiguration`
# although they're in the Gist linked above
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: "systemd"
serverTLSBootstrap: true
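
One way to dig into the race (a sketch, using the CSR names from the output above) is to decode the raw request stored in each object and compare the subjects and SANs; if they match, both are serving-certificate requests from the same kubelet, and the kubelet may be waiting on the Pending one rather than the one that was actually issued:

# dump and decode the PEM-encoded CSR embedded in each object
kubectl get csr csr-ngjws -o jsonpath='{.spec.request}' | base64 -d | openssl req -noout -text
kubectl get csr csr-svntg -o jsonpath='{.spec.request}' | base64 -d | openssl req -noout -text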

I have the same issue (serverTLSBootstrap: true). After upgrading from v1.14.x to v1.15.0 everything kept working until I deleted the serving certificate of the master node's kubelet (to test something else), at which point I hit this error.
I then destroyed the existing cluster and bootstrapped a fresh v1.15.0, and I have exactly the same situation as @mdaniel: two system:node: CSRs with very close creationTimestamps.
