'experimental-control-plane', 'control-plane node', 'ha master'
BUG REPORT
kubeadm version (use kubeadm version):
kubeadm version: &version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.1", GitCommit:"eec55b9ba98609a46fee712359c7b5b365bdd920", GitTreeState:"clean", BuildDate:"2018-12-13T10:36:44Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"}
Environment:
Kubernetes version (use kubectl version):
Cloud provider or hardware configuration:
Local VMs running Ubuntu 18.04 LTS
OS (e.g. from /etc/os-release):
NAME="Ubuntu"
VERSION="18.04.1 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.1 LTS"
VERSION_ID="18.04"
Kernel (e.g. uname -a):
Linux hypervisor1 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Install tools:
kubeadm
Others:
Docker version 18.09.0, build 4d60db4
Attempted to follow the HA cluster setup guide at https://kubernetes.io/docs/setup/independent/high-availability/#first-steps-for-both-methods (the 'external etcd nodes' path), but I am unable to bring up the secondary/tertiary control-plane nodes with kubeadm join <first_master_ip>:6443 --token <token> --discovery-token-ca-cert-hash <hash> --experimental-control-plane. The resulting error (with -v10 added to the flags) is:
I1225 23:10:38.732523 7862 join.go:299] [join] found NodeName empty; using OS hostname as NodeName
I1225 23:10:38.732570 7862 join.go:303] [join] found advertiseAddress empty; using default interface's IP address as advertiseAddress
I1225 23:10:38.732797 7862 interface.go:384] Looking for default routes with IPv4 addresses
I1225 23:10:38.732807 7862 interface.go:389] Default route transits interface "enp0s25"
I1225 23:10:38.733329 7862 interface.go:196] Interface enp0s25 is up
I1225 23:10:38.733433 7862 interface.go:244] Interface "enp0s25" has 2 addresses :[10.50.0.52/24 fe80::96c6:91ff:fe16:9061/64].
I1225 23:10:38.733462 7862 interface.go:211] Checking addr 10.50.0.52/24.
I1225 23:10:38.733482 7862 interface.go:218] IP found 10.50.0.52
I1225 23:10:38.733497 7862 interface.go:250] Found valid IPv4 address 10.50.0.52 for interface "enp0s25".
I1225 23:10:38.733518 7862 interface.go:395] Found active IP 10.50.0.52
[preflight] Running pre-flight checks
I1225 23:10:38.733591 7862 join.go:328] [preflight] Running general checks
I1225 23:10:38.733658 7862 checks.go:245] validating the existence and emptiness of directory /etc/kubernetes/manifests
I1225 23:10:38.733723 7862 checks.go:283] validating the existence of file /etc/kubernetes/kubelet.conf
I1225 23:10:38.733745 7862 checks.go:283] validating the existence of file /etc/kubernetes/bootstrap-kubelet.conf
I1225 23:10:38.733764 7862 checks.go:104] validating the container runtime
I1225 23:10:38.805043 7862 checks.go:130] validating if the service is enabled and active
I1225 23:10:38.820095 7862 checks.go:332] validating the contents of file /proc/sys/net/bridge/bridge-nf-call-iptables
I1225 23:10:38.820184 7862 checks.go:332] validating the contents of file /proc/sys/net/ipv4/ip_forward
I1225 23:10:38.820242 7862 checks.go:644] validating whether swap is enabled or not
I1225 23:10:38.820290 7862 checks.go:373] validating the presence of executable ip
I1225 23:10:38.820340 7862 checks.go:373] validating the presence of executable iptables
I1225 23:10:38.820373 7862 checks.go:373] validating the presence of executable mount
I1225 23:10:38.820410 7862 checks.go:373] validating the presence of executable nsenter
I1225 23:10:38.820440 7862 checks.go:373] validating the presence of executable ebtables
I1225 23:10:38.820473 7862 checks.go:373] validating the presence of executable ethtool
I1225 23:10:38.820505 7862 checks.go:373] validating the presence of executable socat
I1225 23:10:38.820534 7862 checks.go:373] validating the presence of executable tc
I1225 23:10:38.820566 7862 checks.go:373] validating the presence of executable touch
I1225 23:10:38.820597 7862 checks.go:515] running all checks
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.09.0. Latest validated version: 18.06
I1225 23:10:38.850254 7862 checks.go:403] checking whether the given node name is reachable using net.LookupHost
I1225 23:10:38.851192 7862 checks.go:613] validating kubelet version
I1225 23:10:38.900691 7862 checks.go:130] validating if the service is enabled and active
I1225 23:10:38.911066 7862 checks.go:208] validating availability of port 10250
I1225 23:10:38.911176 7862 checks.go:430] validating if the connectivity type is via proxy or direct
I1225 23:10:38.911208 7862 join.go:334] [preflight] Fetching init configuration
I1225 23:10:38.911219 7862 join.go:601] [join] Discovering cluster-info
[discovery] Trying to connect to API Server "10.50.0.11:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://10.50.0.11:6443"
I1225 23:10:38.911828 7862 round_trippers.go:419] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.13.1 (linux/amd64) kubernetes/eec55b9" 'https://10.50.0.11:6443/api/v1/namespaces/kube-public/configmaps/cluster-info'
I1225 23:10:38.929570 7862 round_trippers.go:438] GET https://10.50.0.11:6443/api/v1/namespaces/kube-public/configmaps/cluster-info 200 OK in 17 milliseconds
I1225 23:10:38.929622 7862 round_trippers.go:444] Response Headers:
I1225 23:10:38.929633 7862 round_trippers.go:447] Date: Wed, 26 Dec 2018 07:10:38 GMT
I1225 23:10:38.929642 7862 round_trippers.go:447] Content-Type: application/json
I1225 23:10:38.929664 7862 round_trippers.go:447] Content-Length: 2104
I1225 23:10:38.929784 7862 request.go:942] Response Body: {"kind":"ConfigMap","apiVersion":"v1","metadata":{"name":"cluster-info","namespace":"kube-public","selfLink":"/api/v1/namespaces/kube-public/configmaps/cluster-info","uid":"497784ba-fc2e-11e8-a921-94c6911692ac","resourceVersion":"1941746","creationTimestamp":"2018-12-10T03:47:16Z"},"data":{"jws-kubeconfig-203t0f":"<snip>","kubeconfig":"apiVersion: v1\nclusters:\n- cluster:\n certificate-authority-data: <snip>"\n server: https://10.50.0.50:6443\n name: \"\"\ncontexts: []\ncurrent-context: \"\"\nkind: Config\npreferences: {}\nusers: []\n"}}
[discovery] Requesting info from "https://10.50.0.11:6443" again to validate TLS against the pinned public key
I1225 23:10:38.934556 7862 round_trippers.go:419] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.13.1 (linux/amd64) kubernetes/eec55b9" 'https://10.50.0.11:6443/api/v1/namespaces/kube-public/configmaps/cluster-info'
I1225 23:10:38.949233 7862 round_trippers.go:438] GET https://10.50.0.11:6443/api/v1/namespaces/kube-public/configmaps/cluster-info 200 OK in 14 milliseconds
I1225 23:10:38.949263 7862 round_trippers.go:444] Response Headers:
I1225 23:10:38.949280 7862 round_trippers.go:447] Content-Type: application/json
I1225 23:10:38.949707 7862 round_trippers.go:447] Content-Length: 2104
I1225 23:10:38.949761 7862 round_trippers.go:447] Date: Wed, 26 Dec 2018 07:10:38 GMT
I1225 23:10:38.949927 7862 request.go:942] Response Body: {"kind":"ConfigMap","apiVersion":"v1","metadata":{"name":"cluster-info","namespace":"kube-public","selfLink":"/api/v1/namespaces/kube-public/configmaps/cluster-info","uid":"497784ba-fc2e-11e8-a921-94c6911692ac","resourceVersion":"1941746","creationTimestamp":"2018-12-10T03:47:16Z"},"data":{"jws-kubeconfig-203t0f":"<snip>"","jws-kubeconfig-bs09n6":"<snip>","kubeconfig":"apiVersion: v1\nclusters:\n- cluster:\n certificate-authority-data: <snip>=\n server: https://10.50.0.50:6443\n name: \"\"\ncontexts: []\ncurrent-context: \"\"\nkind: Config\npreferences: {}\nusers: []\n"}}
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "10.50.0.11:6443"
[discovery] Successfully established connection with API Server "10.50.0.11:6443"
I1225 23:10:38.951585 7862 join.go:608] [join] Retrieving KubeConfig objects
[join] Reading configuration from the cluster...
[join] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
I1225 23:10:38.952957 7862 round_trippers.go:419] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.13.1 (linux/amd64) kubernetes/eec55b9" -H "Authorization: Bearer <snip>" 'https://10.50.0.50:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config'
I1225 23:10:38.966415 7862 round_trippers.go:438] GET https://10.50.0.50:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config 200 OK in 13 milliseconds
I1225 23:10:38.966453 7862 round_trippers.go:444] Response Headers:
I1225 23:10:38.966482 7862 round_trippers.go:447] Content-Type: application/json
I1225 23:10:38.966509 7862 round_trippers.go:447] Content-Length: 1265
I1225 23:10:38.966528 7862 round_trippers.go:447] Date: Wed, 26 Dec 2018 07:10:38 GMT
I1225 23:10:38.966613 7862 request.go:942] Response Body: {"kind":"ConfigMap","apiVersion":"v1","metadata":{"name":"kubeadm-config","namespace":"kube-system","selfLink":"/api/v1/namespaces/kube-system/configmaps/kubeadm-config","uid":"48cb071d-fc2e-11e8-a921-94c6911692ac","resourceVersion":"1913271","creationTimestamp":"2018-12-10T03:47:15Z"},"data":{"ClusterConfiguration":"apiServer:\n certSANs:\n - 10.50.0.11\n extraArgs:\n authorization-mode: Node,RBAC\n timeoutForControlPlane: 4m0s\napiVersion: kubeadm.k8s.io/v1beta1\ncertificatesDir: /etc/kubernetes/pki\nclusterName: kubernetes\ncontrolPlaneEndpoint: \"\"\ncontrollerManager: {}\ndns:\n type: CoreDNS\netcd:\n external:\n caFile: /etc/kubernetes/pki/etcd/ca.crt\n certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt\n endpoints:\n - https://10.50.5.50:2379\n - https://10.50.5.51:2379\n - https://10.50.5.52:2379\n keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key\nimageRepository: k8s.gcr.io\nkind: ClusterConfiguration\nkubernetesVersion: v1.13.1\nnetworking:\n dnsDomain: cluster.local\n podSubnet: \"\"\n serviceSubnet: 10.96.0.0/12\nscheduler: {}\n","ClusterStatus":"apiEndpoints:\n hypervisor1:\n advertiseAddress: 10.50.0.50\n bindPort: 6443\napiVersion: kubeadm.k8s.io/v1beta1\nkind: ClusterStatus\n"}}
I1225 23:10:38.968352 7862 round_trippers.go:419] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.13.1 (linux/amd64) kubernetes/eec55b9" -H "Authorization: Bearer <snip>" 'https://10.50.0.50:6443/api/v1/namespaces/kube-system/configmaps/kube-proxy'
I1225 23:10:38.973288 7862 round_trippers.go:438] GET https://10.50.0.50:6443/api/v1/namespaces/kube-system/configmaps/kube-proxy 200 OK in 4 milliseconds
I1225 23:10:38.973321 7862 round_trippers.go:444] Response Headers:
I1225 23:10:38.973361 7862 round_trippers.go:447] Content-Type: application/json
I1225 23:10:38.973383 7862 round_trippers.go:447] Content-Length: 1643
I1225 23:10:38.973400 7862 round_trippers.go:447] Date: Wed, 26 Dec 2018 07:10:38 GMT
I1225 23:10:38.973464 7862 request.go:942] Response Body: {"kind":"ConfigMap","apiVersion":"v1","metadata":{"name":"kube-proxy","namespace":"kube-system","selfLink":"/api/v1/namespaces/kube-system/configmaps/kube-proxy","uid":"498a3257-fc2e-11e8-a921-94c6911692ac","resourceVersion":"240","creationTimestamp":"2018-12-10T03:47:16Z","labels":{"app":"kube-proxy"}},"data":{"config.conf":"apiVersion: kubeproxy.config.k8s.io/v1alpha1\nbindAddress: 0.0.0.0\nclientConnection:\n acceptContentTypes: \"\"\n burst: 10\n contentType: application/vnd.kubernetes.protobuf\n kubeconfig: /var/lib/kube-proxy/kubeconfig.conf\n qps: 5\nclusterCIDR: \"\"\nconfigSyncPeriod: 15m0s\nconntrack:\n max: null\n maxPerCore: 32768\n min: 131072\n tcpCloseWaitTimeout: 1h0m0s\n tcpEstablishedTimeout: 24h0m0s\nenableProfiling: false\nhealthzBindAddress: 0.0.0.0:10256\nhostnameOverride: \"\"\niptables:\n masqueradeAll: false\n masqueradeBit: 14\n minSyncPeriod: 0s\n syncPeriod: 30s\nipvs:\n excludeCIDRs: null\n minSyncPeriod: 0s\n scheduler: \"\"\n syncPeriod: 30s\nkind: KubeProxyConfiguration\nmetricsBindAddress: 127.0.0.1:10249\nmode: \"\"\nnodePortAddresses: null\noomScoreAdj: -999\nportRange: \"\"\nresourceContainer: /kube-proxy\nudpIdleTimeout: 250ms","kubeconfig.conf":"apiVersion: v1\nkind: Config\nclusters:\n- cluster:\n certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt\n server: https://10.50.0.50:6443\n name: default\ncontexts:\n- context:\n cluster: default\n namespace: default\n user: default\n name: default\ncurrent-context: default\nusers:\n- name: default\n user:\n tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token"}}
I1225 23:10:38.975437 7862 round_trippers.go:419] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.13.1 (linux/amd64) kubernetes/eec55b9" -H "Authorization: Bearer 203t0f.naqlqn6j8a4j86w3" 'https://10.50.0.50:6443/api/v1/namespaces/kube-system/configmaps/kubelet-config-1.13'
I1225 23:10:38.981111 7862 round_trippers.go:438] GET https://10.50.0.50:6443/api/v1/namespaces/kube-system/configmaps/kubelet-config-1.13 200 OK in 5 milliseconds
I1225 23:10:38.981145 7862 round_trippers.go:444] Response Headers:
I1225 23:10:38.981186 7862 round_trippers.go:447] Content-Type: application/json
I1225 23:10:38.981202 7862 round_trippers.go:447] Content-Length: 2133
I1225 23:10:38.981213 7862 round_trippers.go:447] Date: Wed, 26 Dec 2018 07:10:38 GMT
I1225 23:10:38.981276 7862 request.go:942] Response Body: {"kind":"ConfigMap","apiVersion":"v1","metadata":{"name":"kubelet-config-1.13","namespace":"kube-system","selfLink":"/api/v1/namespaces/kube-system/configmaps/kubelet-config-1.13","uid":"48ce2337-fc2e-11e8-a921-94c6911692ac","resourceVersion":"183","creationTimestamp":"2018-12-10T03:47:15Z"},"data":{"kubelet":"address: 0.0.0.0\napiVersion: kubelet.config.k8s.io/v1beta1\nauthentication:\n anonymous:\n enabled: false\n webhook:\n cacheTTL: 2m0s\n enabled: true\n x509:\n clientCAFile: /etc/kubernetes/pki/ca.crt\nauthorization:\n mode: Webhook\n webhook:\n cacheAuthorizedTTL: 5m0s\n cacheUnauthorizedTTL: 30s\ncgroupDriver: cgroupfs\ncgroupsPerQOS: true\nclusterDNS:\n- 10.96.0.10\nclusterDomain: cluster.local\nconfigMapAndSecretChangeDetectionStrategy: Watch\ncontainerLogMaxFiles: 5\ncontainerLogMaxSize: 10Mi\ncontentType: application/vnd.kubernetes.protobuf\ncpuCFSQuota: true\ncpuCFSQuotaPeriod: 100ms\ncpuManagerPolicy: none\ncpuManagerReconcilePeriod: 10s\nenableControllerAttachDetach: true\nenableDebuggingHandlers: true\nenforceNodeAllocatable:\n- pods\neventBurst: 10\neventRecordQPS: 5\nevictionHard:\n imagefs.available: 15%\n memory.available: 100Mi\n nodefs.available: 10%\n nodefs.inodesFree: 5%\nevictionPressureTransitionPeriod: 5m0s\nfailSwapOn: true\nfileCheckFrequency: 20s\nhairpinMode: promiscuous-bridge\nhealthzBindAddress: 127.0.0.1\nhealthzPort: 10248\nhttpCheckFrequency: 20s\nimageGCHighThresholdPercent: 85\nimageGCLowThresholdPercent: 80\nimageMinimumGCAge: 2m0s\niptablesDropBit: 15\niptablesMasqueradeBit: 14\nkind: KubeletConfiguration\nkubeAPIBurst: 10\nkubeAPIQPS: 5\nmakeIPTablesUtilChains: true\nmaxOpenFiles: 1000000\nmaxPods: 110\nnodeLeaseDurationSeconds: 40\nnodeStatusReportFrequency: 1m0s\nnodeStatusUpdateFrequency: 10s\noomScoreAdj: -999\npodPidsLimit: -1\nport: 10250\nregistryBurst: 10\nregistryPullQPS: 5\nresolvConf: /etc/resolv.conf\nrotateCertificates: true\nruntimeRequestTimeout: 2m0s\nserializeImagePulls: true\nstaticPodPath: /etc/kubernetes/manifests\nstreamingConnectionIdleTimeout: 4h0m0s\nsyncFrequency: 1m0s\nvolumeStatsAggPeriod: 1m0s\n"}}
I1225 23:10:38.984349 7862 interface.go:384] Looking for default routes with IPv4 addresses
I1225 23:10:38.984380 7862 interface.go:389] Default route transits interface "enp0s25"
I1225 23:10:38.984752 7862 interface.go:196] Interface enp0s25 is up
I1225 23:10:38.984873 7862 interface.go:244] Interface "enp0s25" has 2 addresses :[10.50.0.52/24 fe80::96c6:91ff:fe16:9061/64].
I1225 23:10:38.984904 7862 interface.go:211] Checking addr 10.50.0.52/24.
I1225 23:10:38.984928 7862 interface.go:218] IP found 10.50.0.52
I1225 23:10:38.984948 7862 interface.go:250] Found valid IPv4 address 10.50.0.52 for interface "enp0s25".
I1225 23:10:38.984963 7862 interface.go:395] Found active IP 10.50.0.52
I1225 23:10:38.985100 7862 join.go:341] [preflight] Running configuration dependant checks
One or more conditions for hosting a new control plane instance is not satisfied.
unable to add a new control plane instance a cluster that doesn't have a stable controlPlaneEndpoint address
Please ensure that:
* The cluster has a stable controlPlaneEndpoint address.
* The certificates that must be shared among control plane instances are provided.
I expected the host to join the cluster in a multi-master/HA fashion
Follow the instructions at https://kubernetes.io/docs/setup/independent/high-availability/#first-steps-for-both-methods on Ubuntu 18.04 LTS hosts
I already took great care to make sure I was copying over the certificates correctly. The instructions could be a bit clearer on the specifics (perhaps another bug/enhancement request?); they currently say "Copy certificates between the first control plane node and the other control plane nodes" without being particularly specific about _which_ certificates. I started out copying just the /etc/kubernetes/pki/{apiserver-etcd-client.crt,apiserver-etcd-client.key,etcd/ca.crt} files, and later copied the full /etc/kubernetes/pki/* directory across from the original master node. Both gave similar output with the kubeadm join -v10 command.
Removing the --experimental-control-plane flag allows the node to join as a regular worker-node without complaint.
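For reference, the two invocations being compared, keeping the same elided placeholders as above (a sketch, not the literal commands run):

# joins as a plain worker node without complaint
kubeadm join <first_master_ip>:6443 --token <token> --discovery-token-ca-cert-hash <hash>

# intended control-plane join; fails with the preflight error shown above
kubeadm join <first_master_ip>:6443 --token <token> --discovery-token-ca-cert-hash <hash> --experimental-control-plane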
My kubeadm-config.yaml is:
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: stable
apiServer:
  certSANs:
  - "10.50.0.11"
  controlPlaneEndpoint: "10.50.0.11:6443"
etcd:
  external:
    endpoints:
    - https://10.50.5.50:2379
    - https://10.50.5.51:2379
    - https://10.50.5.52:2379
    caFile: /etc/kubernetes/pki/etcd/ca.crt
    certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
    keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
@law
It seems that your config file has a small error.
controlPlaneEndpoint is a field of ClusterConfiguration, not of apiServer; the correct YAML should be:
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: stable
apiServer:
  certSANs:
  - "10.50.0.11"
controlPlaneEndpoint: "10.50.0.11:6443"
etcd:
  external:
    endpoints:
    - https://10.50.5.50:2379
    - https://10.50.5.51:2379
    - https://10.50.5.52:2379
    caFile: /etc/kubernetes/pki/etcd/ca.crt
    certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
    keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
PS: there is a reason for copying only some certificates, so that kubeadm will take care of generating the others (otherwise you might run into other issues if certificates are not properly set up for the joining node). Suggestions (or PRs) for improving the docs are always welcome!
~Much profanity. That worked like a champ @fabianofranz, thank you so much. I have no idea how that indentation got in there, but the docs certainly have the correct setup. This took me many days to figure out, thanks for getting me over the hump~
edit: my apologies, I spoke too soon. I forgot to include the '--experimental-control-plane' flag with my kubeadm join command. With the updated kubeadm-config.yaml AND the '--experimental-control-plane' flag, I am still unable to get the node to join the cluster. Same error message, regrettably.
I would be happy to submit a couple of clarifying PRs to the docs, would you happen to have the URL for their repo handy?
@law sorry to hear you still have problems.
If you could share the following info, maybe I can help:
kubectl -n kube-system get cm kubeadm-config -o yaml
PRs for the docs should go in the kubernetes/website repo.
No problem. On the 'to be joined' control-plane node:
# kubectl -n kube-system get cm kubeadm-config -o yaml
apiVersion: v1
data:
  ClusterConfiguration: |
    apiServer:
      certSANs:
      - 10.50.0.11
      extraArgs:
        authorization-mode: Node,RBAC
      timeoutForControlPlane: 4m0s
    apiVersion: kubeadm.k8s.io/v1beta1
    certificatesDir: /etc/kubernetes/pki
    clusterName: kubernetes
    controlPlaneEndpoint: ""
    controllerManager: {}
    dns:
      type: CoreDNS
    etcd:
      external:
        caFile: /etc/kubernetes/pki/etcd/ca.crt
        certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
        endpoints:
        - https://10.50.5.50:2379
        - https://10.50.5.51:2379
        - https://10.50.5.52:2379
        keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
    imageRepository: k8s.gcr.io
    kind: ClusterConfiguration
    kubernetesVersion: v1.13.1
    networking:
      dnsDomain: cluster.local
      podSubnet: ""
      serviceSubnet: 10.96.0.0/12
    scheduler: {}
  ClusterStatus: |
    apiEndpoints:
      hypervisor1:
        advertiseAddress: 10.50.0.50
        bindPort: 6443
    apiVersion: kubeadm.k8s.io/v1beta1
    kind: ClusterStatus
kind: ConfigMap
metadata:
  creationTimestamp: "2018-12-10T03:47:15Z"
  name: kubeadm-config
  namespace: kube-system
  resourceVersion: "1913271"
  selfLink: /api/v1/namespaces/kube-system/configmaps/kubeadm-config
  uid: 48cb071d-fc2e-11e8-a921-94c6911692ac
Contents of /etc/kubernetes/ on the same host, before joining:
# tree /etc/kubernetes/
/etc/kubernetes/
├── admin.conf
├── manifests
└── pki
├── apiserver-etcd-client.crt
├── apiserver-etcd-client.key
└── etcd
└── ca.crt
3 directories, 4 files
controlPlaneEndpoint is still empty. Did you re-init your cluster with the new config?
Plus: some mandatory certs are missing (see https://kubernetes.io/docs/setup/independent/high-availability/#steps-for-the-rest-of-the-control-plane-nodes). You need all three ca.* files and the sa.* files, plus the apiserver-etcd-client.* files, because you are in external etcd mode.
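A rough sketch of what that copy step can look like, assuming the standard kubeadm paths, root SSH access from the first control-plane node, and <other_master> as a placeholder hostname (the authoritative file list is in the linked doc):

# run on the first control-plane node; repeat for each additional control-plane node
HOST=<other_master>
ssh root@${HOST} "mkdir -p /etc/kubernetes/pki/etcd"
scp /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/ca.key root@${HOST}:/etc/kubernetes/pki/
scp /etc/kubernetes/pki/sa.key /etc/kubernetes/pki/sa.pub root@${HOST}:/etc/kubernetes/pki/
scp /etc/kubernetes/pki/front-proxy-ca.crt /etc/kubernetes/pki/front-proxy-ca.key root@${HOST}:/etc/kubernetes/pki/
scp /etc/kubernetes/pki/apiserver-etcd-client.crt /etc/kubernetes/pki/apiserver-etcd-client.key root@${HOST}:/etc/kubernetes/pki/
# external etcd mode also needs the etcd CA cert (but not its key)
scp /etc/kubernetes/pki/etcd/ca.crt root@${HOST}:/etc/kubernetes/pki/etcd/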
MOST interesting. No, I did not re-init the cluster, I just edited the kubeadm-config.yaml file and re-ran the "join" command on the to-be-joined node, thinking it would take the config from the YAML file. It sounds like there's a persistent ConfigMap on the 'cluster' (really, just the initial node brought up as the master at this point) that the to-be-joined kubelet also takes the config from? If that's the case, why have redundant data in the kubeadm-config.yaml file?
Anyway, I'll burn down and rebuild this cluster right now, with the 'right' kubeadm-config.yaml, and see what happens. I'll also take care to get ALL the certs, per that step you listed. FWIW, that's one of the 'could be clarified better' steps I was referencing above. I'm following the "external etcd nodes" path, and that specific "steps for the rest of the control plane" section listing all the certs that need to be ferried around to all control-plane nodes is under the 'stacked etcd' path. I'll get this bit sorted and see if there isn't a better way to document all this though, and submit a PR.
Thanks again for all your help and patience! Will have an update shortly!
If that's the case, why have redundant data in the kubeadm-config.yaml file?
Long story: kubeadm init and kubeadm join use different configuration objects, as documented here, but users like to have a single file, so each command simply ignores the objects meant for the other.
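To illustrate that idea (a sketch with placeholder values, assuming the ignore-the-other-objects behavior described above): a single file can hold multiple documents separated by ---, and each command reads only the kinds it cares about.

cat <<'EOF' > kubeadm-config.yaml
# read by 'kubeadm init --config', ignored by 'kubeadm join --config'
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: stable
controlPlaneEndpoint: "10.50.0.11:6443"
---
# read by 'kubeadm join --config', ignored by 'kubeadm init --config'
apiVersion: kubeadm.k8s.io/v1beta1
kind: JoinConfiguration
discovery:
  bootstrapToken:
    apiServerEndpoint: "10.50.0.11:6443"
    token: <token>
    caCertHashes:
    - <hash>
EOF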
Alright, I think I'm all set with this issue. The big time-sink was figuring out A. the etcd cluster is unaffected by 'kubeadm reset' and thus needs to be wiped manually, and B. how to actually wipe the etcd cluster data (ideally without having to rebuild the whole etcd cluster from scratch). I'll record the steps I went through here for posterity, and hopefully save some time/agony for the next poor soul who slams head-first into this scenario:
The current docs (https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-reset/) say etcdctl del "" --prefix is the right command, but that throws all sorts of auth errors because the myriad certs, etc., aren't referenced. Filling those in, etcdctl --cert-file /etc/kubernetes/pki/etcd/peer.crt --key-file /etc/kubernetes/pki/etcd/peer.key --ca-file /etc/kubernetes/pki/etcd/ca.key --endpoints https://10.50.5.50:2379 del "" --prefix will throw a No help topic for 'del' error. The real trick is to recognize that all the other k8s docs (that I found, at least) seem to be referencing the v2 etcdctl API, and you need to force etcdctl to use the v3 API. Quite a bit of mucking around later, this is the magic incantation to wipe the etcd cluster (run on the first etcd host directly):
ETCDCTL_API=3 etcdctl --cert="/etc/kubernetes/pki/etcd/peer.crt" --key="/etc/kubernetes/pki/etcd/peer.key" --insecure-transport=true --insecure-skip-tls-verify=true --endpoints=https://10.50.5.50:2379 del "" --prefix
--insecure-transport and --insecure-skip-tls-verify are needed because the --cacert option is looking for a CACert _bundle_, and no amount of cat ca.crt ca.key > ca.bundle or cat ca.key ca.crt > ca.bundle would give it a file it wanted to play nice with. The etcd cluster docs only detail how to print out the individual cert/key-files, with no mention on how to make a 'CA Bundle' that etcdctl (at API version 3) will play nice with.
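For what it's worth, a variant that keeps TLS verification should also work if --cacert is pointed at the plain etcd CA certificate (rather than a crt+key bundle, and not ca.key); an untested sketch assuming the standard kubeadm etcd cert paths:

ETCDCTL_API=3 etcdctl \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/peer.crt \
  --key=/etc/kubernetes/pki/etcd/peer.key \
  --endpoints=https://10.50.5.50:2379 \
  del "" --prefix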
Very long story short, wiping etcd, running kubeadm reset on all the hosts, feeding kubeadm init a properly-formatted kubeadm-config.yaml file on the initial master node per your suggestion, and running the kubeadm join <stuff> --experimental-control-plane on the remaining master control-plane nodes results in:
# kubectl get nodes
NAME          STATUS   ROLES    AGE   VERSION
hypervisor1   Ready    master   39m   v1.13.1
hypervisor2   Ready    master   17m   v1.13.1
hypervisor3   Ready    master   16m   v1.13.1
which is exactly what I was hoping for :-) I'll be submitting some PRs for the docs shortly. Thanks so much for all your help!
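Condensed into commands, the sequence described above is roughly (placeholders as before; the etcd wipe is the etcdctl incantation quoted earlier):

# 1. wipe the etcd keyspace (run the ETCDCTL_API=3 ... del "" --prefix command above on the first etcd host)
# 2. on every Kubernetes node
kubeadm reset
# 3. on the first control-plane node, with the corrected kubeadm-config.yaml
kubeadm init --config kubeadm-config.yaml
# 4. copy the shared certificates to the other control-plane nodes (see the list above)
# 5. on each remaining control-plane node
kubeadm join <first_master_ip>:6443 --token <token> --discovery-token-ca-cert-hash <hash> --experimental-control-plane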
/close
@fabriziopandini: Closing this issue.
In response to this:
/close
I have the same problem and did exactly what you did @law @fabianofranz, but finally I ended up with
failure loading key for service account: couldn't load the private key file /etc/kubernetes/pki/sa.key: open /etc/kubernetes/pki/sa.key: no such file or directory
So kubeadm looks for TLS keys in that directory even though I've provided the CA key and cert. What did I do wrong?
I am wondering whether, after updating kubeadm-config with controlPlaneEndpoint, kubeadm starts expecting all certs and keys to be in the /etc/kubernetes/pki folder.
I can't help but be curious - did you intend the path to be /etc/kubernetes/pki/sa.key, or perhaps /etc/kubernetes/pki/ca.key ?
@law Sorry, that was my fault. I didn't copy CA certs and keys properly.
Just one more question: the official docs on HA kubeadm clusters say that a load balancer is needed. I see that you avoid that by setting the first master node's IP as controlPlaneEndpoint. Is this legitimate, or a hack that should be avoided?
@stgleb all the nodes will be configured to communicate with the controlPlaneEndpoint, so if your first master goes away, your cluster will be stuck
@fabianofranz can I set this param somehow when doing kubeadm init?
@stgleb yes, using the kubeadm config file. see https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta1
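A minimal sketch of that, assuming a load balancer (or VIP) in front of the control plane at LOAD_BALANCER_DNS:LOAD_BALANCER_PORT:

cat <<'EOF' > kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: stable
controlPlaneEndpoint: "LOAD_BALANCER_DNS:LOAD_BALANCER_PORT"
EOF
kubeadm init --config kubeadm-config.yaml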