kubeadm 1.13.1: kubeadm join fails with --experimental-control-plane

Created on 26 Dec 2018  ·  18 comments  ·  Source: kubernetes/kubeadm

What keywords did you search in kubeadm issues before filing this one?

'experimental-control-plane', 'control-plane node', 'ha master'

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

kubeadm version (use kubeadm version):
kubeadm version: &version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.1", GitCommit:"eec55b9ba98609a46fee712359c7b5b365bdd920", GitTreeState:"clean", BuildDate:"2018-12-13T10:36:44Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.1", GitCommit:"eec55b9ba98609a46fee712359c7b5b365bdd920", GitTreeState:"clean", BuildDate:"2018-12-13T10:39:04Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration:
    Local VMs running Ubuntu 18.04 LTS

  • OS (e.g. from /etc/os-release):
    NAME="Ubuntu"
    VERSION="18.04.1 LTS (Bionic Beaver)"
    ID=ubuntu
    ID_LIKE=debian
    PRETTY_NAME="Ubuntu 18.04.1 LTS"
    VERSION_ID="18.04"

  • Kernel (e.g. uname -a):
    Linux hypervisor1 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

  • Install tools:
    kubeadm

  • Others:
    Docker version 18.09.0, build 4d60db4

What happened?

Attempted to follow the HA cluster setup guide at https://kubernetes.io/docs/setup/independent/high-availability/#first-steps-for-both-methods (the 'external etcd nodes' path), but was unable to bring up the secondary/tertiary control-plane nodes with kubeadm join <first_master_ip>:6443 --token <token> --discovery-token-ca-cert-hash <hash> --experimental-control-plane. The resulting error (with -v10 added to the flags) is:

I1225 23:10:38.732523    7862 join.go:299] [join] found NodeName empty; using OS hostname as NodeName
I1225 23:10:38.732570    7862 join.go:303] [join] found advertiseAddress empty; using default interface's IP address as advertiseAddress
I1225 23:10:38.732797    7862 interface.go:384] Looking for default routes with IPv4 addresses
I1225 23:10:38.732807    7862 interface.go:389] Default route transits interface "enp0s25"
I1225 23:10:38.733329    7862 interface.go:196] Interface enp0s25 is up
I1225 23:10:38.733433    7862 interface.go:244] Interface "enp0s25" has 2 addresses :[10.50.0.52/24 fe80::96c6:91ff:fe16:9061/64].
I1225 23:10:38.733462    7862 interface.go:211] Checking addr  10.50.0.52/24.
I1225 23:10:38.733482    7862 interface.go:218] IP found 10.50.0.52
I1225 23:10:38.733497    7862 interface.go:250] Found valid IPv4 address 10.50.0.52 for interface "enp0s25".
I1225 23:10:38.733518    7862 interface.go:395] Found active IP 10.50.0.52 
[preflight] Running pre-flight checks
I1225 23:10:38.733591    7862 join.go:328] [preflight] Running general checks
I1225 23:10:38.733658    7862 checks.go:245] validating the existence and emptiness of directory /etc/kubernetes/manifests
I1225 23:10:38.733723    7862 checks.go:283] validating the existence of file /etc/kubernetes/kubelet.conf
I1225 23:10:38.733745    7862 checks.go:283] validating the existence of file /etc/kubernetes/bootstrap-kubelet.conf
I1225 23:10:38.733764    7862 checks.go:104] validating the container runtime
I1225 23:10:38.805043    7862 checks.go:130] validating if the service is enabled and active
I1225 23:10:38.820095    7862 checks.go:332] validating the contents of file /proc/sys/net/bridge/bridge-nf-call-iptables
I1225 23:10:38.820184    7862 checks.go:332] validating the contents of file /proc/sys/net/ipv4/ip_forward
I1225 23:10:38.820242    7862 checks.go:644] validating whether swap is enabled or not
I1225 23:10:38.820290    7862 checks.go:373] validating the presence of executable ip
I1225 23:10:38.820340    7862 checks.go:373] validating the presence of executable iptables
I1225 23:10:38.820373    7862 checks.go:373] validating the presence of executable mount
I1225 23:10:38.820410    7862 checks.go:373] validating the presence of executable nsenter
I1225 23:10:38.820440    7862 checks.go:373] validating the presence of executable ebtables
I1225 23:10:38.820473    7862 checks.go:373] validating the presence of executable ethtool
I1225 23:10:38.820505    7862 checks.go:373] validating the presence of executable socat
I1225 23:10:38.820534    7862 checks.go:373] validating the presence of executable tc
I1225 23:10:38.820566    7862 checks.go:373] validating the presence of executable touch
I1225 23:10:38.820597    7862 checks.go:515] running all checks
    [WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.09.0. Latest validated version: 18.06
I1225 23:10:38.850254    7862 checks.go:403] checking whether the given node name is reachable using net.LookupHost
I1225 23:10:38.851192    7862 checks.go:613] validating kubelet version
I1225 23:10:38.900691    7862 checks.go:130] validating if the service is enabled and active
I1225 23:10:38.911066    7862 checks.go:208] validating availability of port 10250
I1225 23:10:38.911176    7862 checks.go:430] validating if the connectivity type is via proxy or direct
I1225 23:10:38.911208    7862 join.go:334] [preflight] Fetching init configuration
I1225 23:10:38.911219    7862 join.go:601] [join] Discovering cluster-info
[discovery] Trying to connect to API Server "10.50.0.11:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://10.50.0.11:6443"
I1225 23:10:38.911828    7862 round_trippers.go:419] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.13.1 (linux/amd64) kubernetes/eec55b9" 'https://10.50.0.11:6443/api/v1/namespaces/kube-public/configmaps/cluster-info'
I1225 23:10:38.929570    7862 round_trippers.go:438] GET https://10.50.0.11:6443/api/v1/namespaces/kube-public/configmaps/cluster-info 200 OK in 17 milliseconds
I1225 23:10:38.929622    7862 round_trippers.go:444] Response Headers:
I1225 23:10:38.929633    7862 round_trippers.go:447]     Date: Wed, 26 Dec 2018 07:10:38 GMT
I1225 23:10:38.929642    7862 round_trippers.go:447]     Content-Type: application/json
I1225 23:10:38.929664    7862 round_trippers.go:447]     Content-Length: 2104
I1225 23:10:38.929784    7862 request.go:942] Response Body: {"kind":"ConfigMap","apiVersion":"v1","metadata":{"name":"cluster-info","namespace":"kube-public","selfLink":"/api/v1/namespaces/kube-public/configmaps/cluster-info","uid":"497784ba-fc2e-11e8-a921-94c6911692ac","resourceVersion":"1941746","creationTimestamp":"2018-12-10T03:47:16Z"},"data":{"jws-kubeconfig-203t0f":"<snip>","kubeconfig":"apiVersion: v1\nclusters:\n- cluster:\n    certificate-authority-data: <snip>"\n    server: https://10.50.0.50:6443\n  name: \"\"\ncontexts: []\ncurrent-context: \"\"\nkind: Config\npreferences: {}\nusers: []\n"}}
[discovery] Requesting info from "https://10.50.0.11:6443" again to validate TLS against the pinned public key
I1225 23:10:38.934556    7862 round_trippers.go:419] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.13.1 (linux/amd64) kubernetes/eec55b9" 'https://10.50.0.11:6443/api/v1/namespaces/kube-public/configmaps/cluster-info'
I1225 23:10:38.949233    7862 round_trippers.go:438] GET https://10.50.0.11:6443/api/v1/namespaces/kube-public/configmaps/cluster-info 200 OK in 14 milliseconds
I1225 23:10:38.949263    7862 round_trippers.go:444] Response Headers:
I1225 23:10:38.949280    7862 round_trippers.go:447]     Content-Type: application/json
I1225 23:10:38.949707    7862 round_trippers.go:447]     Content-Length: 2104
I1225 23:10:38.949761    7862 round_trippers.go:447]     Date: Wed, 26 Dec 2018 07:10:38 GMT
I1225 23:10:38.949927    7862 request.go:942] Response Body: {"kind":"ConfigMap","apiVersion":"v1","metadata":{"name":"cluster-info","namespace":"kube-public","selfLink":"/api/v1/namespaces/kube-public/configmaps/cluster-info","uid":"497784ba-fc2e-11e8-a921-94c6911692ac","resourceVersion":"1941746","creationTimestamp":"2018-12-10T03:47:16Z"},"data":{"jws-kubeconfig-203t0f":"<snip>"","jws-kubeconfig-bs09n6":"<snip>","kubeconfig":"apiVersion: v1\nclusters:\n- cluster:\n    certificate-authority-data: <snip>=\n    server: https://10.50.0.50:6443\n  name: \"\"\ncontexts: []\ncurrent-context: \"\"\nkind: Config\npreferences: {}\nusers: []\n"}}
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "10.50.0.11:6443"
[discovery] Successfully established connection with API Server "10.50.0.11:6443"
I1225 23:10:38.951585    7862 join.go:608] [join] Retrieving KubeConfig objects
[join] Reading configuration from the cluster...
[join] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
I1225 23:10:38.952957    7862 round_trippers.go:419] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.13.1 (linux/amd64) kubernetes/eec55b9" -H "Authorization: Bearer <snip>" 'https://10.50.0.50:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config'
I1225 23:10:38.966415    7862 round_trippers.go:438] GET https://10.50.0.50:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config 200 OK in 13 milliseconds
I1225 23:10:38.966453    7862 round_trippers.go:444] Response Headers:
I1225 23:10:38.966482    7862 round_trippers.go:447]     Content-Type: application/json
I1225 23:10:38.966509    7862 round_trippers.go:447]     Content-Length: 1265
I1225 23:10:38.966528    7862 round_trippers.go:447]     Date: Wed, 26 Dec 2018 07:10:38 GMT
I1225 23:10:38.966613    7862 request.go:942] Response Body: {"kind":"ConfigMap","apiVersion":"v1","metadata":{"name":"kubeadm-config","namespace":"kube-system","selfLink":"/api/v1/namespaces/kube-system/configmaps/kubeadm-config","uid":"48cb071d-fc2e-11e8-a921-94c6911692ac","resourceVersion":"1913271","creationTimestamp":"2018-12-10T03:47:15Z"},"data":{"ClusterConfiguration":"apiServer:\n  certSANs:\n  - 10.50.0.11\n  extraArgs:\n    authorization-mode: Node,RBAC\n  timeoutForControlPlane: 4m0s\napiVersion: kubeadm.k8s.io/v1beta1\ncertificatesDir: /etc/kubernetes/pki\nclusterName: kubernetes\ncontrolPlaneEndpoint: \"\"\ncontrollerManager: {}\ndns:\n  type: CoreDNS\netcd:\n  external:\n    caFile: /etc/kubernetes/pki/etcd/ca.crt\n    certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt\n    endpoints:\n    - https://10.50.5.50:2379\n    - https://10.50.5.51:2379\n    - https://10.50.5.52:2379\n    keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key\nimageRepository: k8s.gcr.io\nkind: ClusterConfiguration\nkubernetesVersion: v1.13.1\nnetworking:\n  dnsDomain: cluster.local\n  podSubnet: \"\"\n  serviceSubnet: 10.96.0.0/12\nscheduler: {}\n","ClusterStatus":"apiEndpoints:\n  hypervisor1:\n    advertiseAddress: 10.50.0.50\n    bindPort: 6443\napiVersion: kubeadm.k8s.io/v1beta1\nkind: ClusterStatus\n"}}
I1225 23:10:38.968352    7862 round_trippers.go:419] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.13.1 (linux/amd64) kubernetes/eec55b9" -H "Authorization: Bearer <snip>" 'https://10.50.0.50:6443/api/v1/namespaces/kube-system/configmaps/kube-proxy'
I1225 23:10:38.973288    7862 round_trippers.go:438] GET https://10.50.0.50:6443/api/v1/namespaces/kube-system/configmaps/kube-proxy 200 OK in 4 milliseconds
I1225 23:10:38.973321    7862 round_trippers.go:444] Response Headers:
I1225 23:10:38.973361    7862 round_trippers.go:447]     Content-Type: application/json
I1225 23:10:38.973383    7862 round_trippers.go:447]     Content-Length: 1643
I1225 23:10:38.973400    7862 round_trippers.go:447]     Date: Wed, 26 Dec 2018 07:10:38 GMT
I1225 23:10:38.973464    7862 request.go:942] Response Body: {"kind":"ConfigMap","apiVersion":"v1","metadata":{"name":"kube-proxy","namespace":"kube-system","selfLink":"/api/v1/namespaces/kube-system/configmaps/kube-proxy","uid":"498a3257-fc2e-11e8-a921-94c6911692ac","resourceVersion":"240","creationTimestamp":"2018-12-10T03:47:16Z","labels":{"app":"kube-proxy"}},"data":{"config.conf":"apiVersion: kubeproxy.config.k8s.io/v1alpha1\nbindAddress: 0.0.0.0\nclientConnection:\n  acceptContentTypes: \"\"\n  burst: 10\n  contentType: application/vnd.kubernetes.protobuf\n  kubeconfig: /var/lib/kube-proxy/kubeconfig.conf\n  qps: 5\nclusterCIDR: \"\"\nconfigSyncPeriod: 15m0s\nconntrack:\n  max: null\n  maxPerCore: 32768\n  min: 131072\n  tcpCloseWaitTimeout: 1h0m0s\n  tcpEstablishedTimeout: 24h0m0s\nenableProfiling: false\nhealthzBindAddress: 0.0.0.0:10256\nhostnameOverride: \"\"\niptables:\n  masqueradeAll: false\n  masqueradeBit: 14\n  minSyncPeriod: 0s\n  syncPeriod: 30s\nipvs:\n  excludeCIDRs: null\n  minSyncPeriod: 0s\n  scheduler: \"\"\n  syncPeriod: 30s\nkind: KubeProxyConfiguration\nmetricsBindAddress: 127.0.0.1:10249\nmode: \"\"\nnodePortAddresses: null\noomScoreAdj: -999\nportRange: \"\"\nresourceContainer: /kube-proxy\nudpIdleTimeout: 250ms","kubeconfig.conf":"apiVersion: v1\nkind: Config\nclusters:\n- cluster:\n    certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt\n    server: https://10.50.0.50:6443\n  name: default\ncontexts:\n- context:\n    cluster: default\n    namespace: default\n    user: default\n  name: default\ncurrent-context: default\nusers:\n- name: default\n  user:\n    tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token"}}
I1225 23:10:38.975437    7862 round_trippers.go:419] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.13.1 (linux/amd64) kubernetes/eec55b9" -H "Authorization: Bearer 203t0f.naqlqn6j8a4j86w3" 'https://10.50.0.50:6443/api/v1/namespaces/kube-system/configmaps/kubelet-config-1.13'
I1225 23:10:38.981111    7862 round_trippers.go:438] GET https://10.50.0.50:6443/api/v1/namespaces/kube-system/configmaps/kubelet-config-1.13 200 OK in 5 milliseconds
I1225 23:10:38.981145    7862 round_trippers.go:444] Response Headers:
I1225 23:10:38.981186    7862 round_trippers.go:447]     Content-Type: application/json
I1225 23:10:38.981202    7862 round_trippers.go:447]     Content-Length: 2133
I1225 23:10:38.981213    7862 round_trippers.go:447]     Date: Wed, 26 Dec 2018 07:10:38 GMT
I1225 23:10:38.981276    7862 request.go:942] Response Body: {"kind":"ConfigMap","apiVersion":"v1","metadata":{"name":"kubelet-config-1.13","namespace":"kube-system","selfLink":"/api/v1/namespaces/kube-system/configmaps/kubelet-config-1.13","uid":"48ce2337-fc2e-11e8-a921-94c6911692ac","resourceVersion":"183","creationTimestamp":"2018-12-10T03:47:15Z"},"data":{"kubelet":"address: 0.0.0.0\napiVersion: kubelet.config.k8s.io/v1beta1\nauthentication:\n  anonymous:\n    enabled: false\n  webhook:\n    cacheTTL: 2m0s\n    enabled: true\n  x509:\n    clientCAFile: /etc/kubernetes/pki/ca.crt\nauthorization:\n  mode: Webhook\n  webhook:\n    cacheAuthorizedTTL: 5m0s\n    cacheUnauthorizedTTL: 30s\ncgroupDriver: cgroupfs\ncgroupsPerQOS: true\nclusterDNS:\n- 10.96.0.10\nclusterDomain: cluster.local\nconfigMapAndSecretChangeDetectionStrategy: Watch\ncontainerLogMaxFiles: 5\ncontainerLogMaxSize: 10Mi\ncontentType: application/vnd.kubernetes.protobuf\ncpuCFSQuota: true\ncpuCFSQuotaPeriod: 100ms\ncpuManagerPolicy: none\ncpuManagerReconcilePeriod: 10s\nenableControllerAttachDetach: true\nenableDebuggingHandlers: true\nenforceNodeAllocatable:\n- pods\neventBurst: 10\neventRecordQPS: 5\nevictionHard:\n  imagefs.available: 15%\n  memory.available: 100Mi\n  nodefs.available: 10%\n  nodefs.inodesFree: 5%\nevictionPressureTransitionPeriod: 5m0s\nfailSwapOn: true\nfileCheckFrequency: 20s\nhairpinMode: promiscuous-bridge\nhealthzBindAddress: 127.0.0.1\nhealthzPort: 10248\nhttpCheckFrequency: 20s\nimageGCHighThresholdPercent: 85\nimageGCLowThresholdPercent: 80\nimageMinimumGCAge: 2m0s\niptablesDropBit: 15\niptablesMasqueradeBit: 14\nkind: KubeletConfiguration\nkubeAPIBurst: 10\nkubeAPIQPS: 5\nmakeIPTablesUtilChains: true\nmaxOpenFiles: 1000000\nmaxPods: 110\nnodeLeaseDurationSeconds: 40\nnodeStatusReportFrequency: 1m0s\nnodeStatusUpdateFrequency: 10s\noomScoreAdj: -999\npodPidsLimit: -1\nport: 10250\nregistryBurst: 10\nregistryPullQPS: 5\nresolvConf: /etc/resolv.conf\nrotateCertificates: true\nruntimeRequestTimeout: 2m0s\nserializeImagePulls: true\nstaticPodPath: /etc/kubernetes/manifests\nstreamingConnectionIdleTimeout: 4h0m0s\nsyncFrequency: 1m0s\nvolumeStatsAggPeriod: 1m0s\n"}}
I1225 23:10:38.984349    7862 interface.go:384] Looking for default routes with IPv4 addresses
I1225 23:10:38.984380    7862 interface.go:389] Default route transits interface "enp0s25"
I1225 23:10:38.984752    7862 interface.go:196] Interface enp0s25 is up
I1225 23:10:38.984873    7862 interface.go:244] Interface "enp0s25" has 2 addresses :[10.50.0.52/24 fe80::96c6:91ff:fe16:9061/64].
I1225 23:10:38.984904    7862 interface.go:211] Checking addr  10.50.0.52/24.
I1225 23:10:38.984928    7862 interface.go:218] IP found 10.50.0.52
I1225 23:10:38.984948    7862 interface.go:250] Found valid IPv4 address 10.50.0.52 for interface "enp0s25".
I1225 23:10:38.984963    7862 interface.go:395] Found active IP 10.50.0.52 
I1225 23:10:38.985100    7862 join.go:341] [preflight] Running configuration dependant checks

One or more conditions for hosting a new control plane instance is not satisfied.

unable to add a new control plane instance a cluster that doesn't have a stable controlPlaneEndpoint address

Please ensure that:
* The cluster has a stable controlPlaneEndpoint address.
* The certificates that must be shared among control plane instances are provided.
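As the [join] Reading configuration from the cluster... line in the log suggests, the value being checked comes from the kubeadm-config ConfigMap in the cluster rather than from a local file. A quick way to inspect the controlPlaneEndpoint a joining node will actually see (a sketch using the same command kubeadm prints above) is:

kubectl -n kube-system get cm kubeadm-config -o yaml | grep controlPlaneEndpoint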

What you expected to happen?

I expected the host to join the cluster in a multi-master/HA fashion

How to reproduce it (as minimally and precisely as possible)?

Follow the instructions at https://kubernetes.io/docs/setup/independent/high-availability/#first-steps-for-both-methods on Ubuntu 18.04 LTS hosts

Anything else we need to know?

I already took great care to make sure I was copying over the certificates correctly. The instructions could be clearer on the specifics (perhaps another bug/enhancement request?): they currently say "Copy certificates between the first control plane node and the other control plane nodes" without being particularly specific about _which_ certificates. I started out copying just the /etc/kubernetes/pki/{apiserver-etcd-client.crt,apiserver-etcd-client.key,etcd/ca.crt} files, and later copied the full /etc/kubernetes/pki/* directory across from the original master node. Both gave similar output from the kubeadm join -v10 command.

Removing the --experimental-control-plane flag allows the node to join as a regular worker node without complaint.
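For comparison, the two invocations differ only in that flag (reusing the elided token and hash placeholders from above):

# joins as a regular worker node (works)
kubeadm join <first_master_ip>:6443 --token <token> --discovery-token-ca-cert-hash <hash>

# attempts to join as an additional control-plane node (fails as described)
kubeadm join <first_master_ip>:6443 --token <token> --discovery-token-ca-cert-hash <hash> --experimental-control-plane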

My kubeadm-config.yaml is:

apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: stable
apiServer:
  certSANs:
  - "10.50.0.11"
  controlPlaneEndpoint: "10.50.0.11:6443"
etcd:
    external:
        endpoints:
        - https://10.50.5.50:2379
        - https://10.50.5.51:2379
        - https://10.50.5.52:2379
        caFile: /etc/kubernetes/pki/etcd/ca.crt
        certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
        keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
Labels: kind/documentation, kind/support


All 18 comments

@law
It seems that your config file has a small error.
controlPlaneEndpoint is a field of ClusterConfiguration, not of apiServer; the right yaml should be:

apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: stable
apiServer:
  certSANs:
  - "10.50.0.11"
controlPlaneEndpoint: "10.50.0.11:6443"
etcd:
    external:
        endpoints:
        - https://10.50.5.50:2379
        - https://10.50.5.51:2379
        - https://10.50.5.52:2379
        caFile: /etc/kubernetes/pki/etcd/ca.crt
        certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
        keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
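With the corrected file, the endpoint is picked up when the first control-plane node is initialized; a minimal usage sketch (the file name is simply whatever you pass to --config):

kubeadm init --config=kubeadm-config.yaml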

PS. There is a reason for copying only some certificates: kubeadm will take care of generating the others (otherwise you might run into other issues if certificates are not properly set up for the joining node). Suggestions (or PRs) for improving the docs are always welcome!

~Much profanity. That worked like a champ @fabianofranz, thank you so much. I have no idea how that indentation got in there, but the docs certainly have the correct setup. This took me many days to figure out, thanks for getting me over the hump~
edit: my apologies, I spoke too soon. I forgot to include the '--experimental-control-plane' flag on my kubeadm join command. With the updated kubeadm-config.yaml AND the '--experimental-control-plane' flag, I am still unable to get the node to join the cluster. Same error message, regrettably.

I would be happy to submit a couple of clarifying PRs to the docs, would you happen to have the URL for their repo handy?

@law sorry to hear you still have problems.
If you could share the following info, maybe I can help:

  • the output of kubectl -n kube-system get cm kubeadm-config -o yaml
  • the content of /etc/kubernetes dir before joining

PRs for the docs should go in the kubernetes/website repo.

No problems. On the 'to be joined' control-plane node:

# kubectl -n kube-system get cm kubeadm-config -o yaml
apiVersion: v1
data:
  ClusterConfiguration: |
    apiServer:
      certSANs:
      - 10.50.0.11
      extraArgs:
        authorization-mode: Node,RBAC
      timeoutForControlPlane: 4m0s
    apiVersion: kubeadm.k8s.io/v1beta1
    certificatesDir: /etc/kubernetes/pki
    clusterName: kubernetes
    controlPlaneEndpoint: ""
    controllerManager: {}
    dns:
      type: CoreDNS
    etcd:
      external:
        caFile: /etc/kubernetes/pki/etcd/ca.crt
        certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
        endpoints:
        - https://10.50.5.50:2379
        - https://10.50.5.51:2379
        - https://10.50.5.52:2379
        keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
    imageRepository: k8s.gcr.io
    kind: ClusterConfiguration
    kubernetesVersion: v1.13.1
    networking:
      dnsDomain: cluster.local
      podSubnet: ""
      serviceSubnet: 10.96.0.0/12
    scheduler: {}
  ClusterStatus: |
    apiEndpoints:
      hypervisor1:
        advertiseAddress: 10.50.0.50
        bindPort: 6443
    apiVersion: kubeadm.k8s.io/v1beta1
    kind: ClusterStatus
kind: ConfigMap
metadata:
  creationTimestamp: "2018-12-10T03:47:15Z"
  name: kubeadm-config
  namespace: kube-system
  resourceVersion: "1913271"
  selfLink: /api/v1/namespaces/kube-system/configmaps/kubeadm-config
  uid: 48cb071d-fc2e-11e8-a921-94c6911692ac

contents of /etc/kubernetes/ on same host, before joining:

# tree /etc/kubernetes/
/etc/kubernetes/
├── admin.conf
├── manifests
└── pki
    ├── apiserver-etcd-client.crt
    ├── apiserver-etcd-client.key
    └── etcd
        └── ca.crt

3 directories, 4 files

controlPlaneEndpoint is still empty. Did you re-init your cluster with the new config?
+
Some mandatory certs are missing (see https://kubernetes.io/docs/setup/independent/high-availability/#steps-for-the-rest-of-the-control-plane-nodes). You need all three CAs (ca.*, front-proxy-ca.*, etcd/ca.crt), the sa.* key pair, and the apiserver-etcd-client.* pair, because you are in external etcd mode.
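As a sketch of what that copy step can look like in the external etcd case (the host addresses are assumptions taken from this thread; adjust the user and hosts to your environment):

# run on the first control-plane node (10.50.0.50); 10.50.0.52 is the node about to join
NODE=10.50.0.52
ssh root@${NODE} "mkdir -p /etc/kubernetes/pki/etcd"
scp /etc/kubernetes/pki/ca.{crt,key} root@${NODE}:/etc/kubernetes/pki/
scp /etc/kubernetes/pki/sa.{key,pub} root@${NODE}:/etc/kubernetes/pki/
scp /etc/kubernetes/pki/front-proxy-ca.{crt,key} root@${NODE}:/etc/kubernetes/pki/
scp /etc/kubernetes/pki/etcd/ca.crt root@${NODE}:/etc/kubernetes/pki/etcd/
scp /etc/kubernetes/pki/apiserver-etcd-client.{crt,key} root@${NODE}:/etc/kubernetes/pki/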

MOST interesting. No, I did not re-init the cluster; I just edited the kubeadm-config.yaml file and re-ran the "join" command on the to-be-joined node, thinking it would take the config from the YAML file. It sounds like there's a persistent ConfigMap on the 'cluster' (really, just the initial node brought up as the master at this point) that the to-be-joined node also takes its config from? If that's the case, why have redundant data in the kubeadm-config.yaml file?

Anyway, I'll burn down and rebuild this cluster right now, with the 'right' kubeadm-config.yaml, and see what happens. I'll also take care to get ALL the certs, per that step you listed. FWIW, that's one of the 'could be clarified better' steps I was referencing above. I'm following the "external etcd nodes" path, and that specific "steps for the rest of the control plane" section listing all the certs that need to be ferried around to all control-plane nodes is under the 'stacked etcd' path. I'll get this bit sorted and see if there isn't a better way to document all this though, and submit a PR.

Thanks again for all your help and patience! Will have an update shortly!

If that's the case, why have redundant data in the kubeadm-config.yaml file?

Long story. kubeadm init and kubeadm join use different configuration objects, as documented here, but users like to have a single file, so each command simply ignores the objects meant for the other.
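For illustration, a single file can carry more than one kubeadm object separated by ---; kubeadm init reads InitConfiguration and ClusterConfiguration, while kubeadm join ignores them and reads the kubeadm-config ConfigMap from the cluster instead. A minimal sketch against the v1beta1 API, using the addresses from this thread (the node name is an assumption):

cat > kubeadm-config.yaml <<'EOF'
apiVersion: kubeadm.k8s.io/v1beta1
kind: InitConfiguration
nodeRegistration:
  name: hypervisor1                       # assumption: hostname of the first master
---
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: stable
controlPlaneEndpoint: "10.50.0.11:6443"
EOF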

Alright, I think I'm all set with this issue. The big time-sink was figuring out A. the etcd cluster is unaffected by 'kubeadm reset' and thus needs to be wiped manually, and B. how to actually wipe the etcd cluster data (ideally without having to rebuild the whole etcd cluster from scratch). I'll record the steps I went through here for posterity, and hopefully save some time/agony for the next poor soul who slams head-first into this scenario:

The current docs (https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-reset/) say etcdctl del "" --prefix is the right command, but that throws all sorts of auth errors because the myriad certs, etc., aren't referenced. Filling those in, etcdctl --cert-file /etc/kubernetes/pki/etcd/peer.crt --key-file /etc/kubernetes/pki/etcd/peer.key --ca-file /etc/kubernetes/pki/etcd/ca.key --endpoints https://10.50.5.50:2379 del "" --prefix throws a No help topic for 'del' error. The real trick is to recognize that all the other k8s docs (that I found, at least) seem to be referencing the v2 etcdctl API, and you need to force etcdctl to use the v3 API. Quite a bit of mucking around later, this is the magic incantation to wipe the etcd cluster (run directly on the first etcd host):

ETCDCTL_API=3 etcdctl --cert="/etc/kubernetes/pki/etcd/peer.crt" --key="/etc/kubernetes/pki/etcd/peer.key" --insecure-transport=true --insecure-skip-tls-verify=true --endpoints=https://10.50.5.50:2379 del "" --prefix

--insecure-transport and --insecure-skip-tls-verify are needed because the --cacert option is looking for a CACert _bundle_, and no amount of cat ca.crt ca.key > ca.bundle or cat ca.key ca.crt > ca.bundle would give it a file it wanted to play nice with. The etcd cluster docs only detail how to print out the individual cert/key-files, with no mention on how to make a 'CA Bundle' that etcdctl (at API version 3) will play nice with.
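For what it's worth, with the v3 API etcdctl's --cacert flag normally takes just the CA certificate (not a cert/key bundle), so a variant like the following may work without the insecure flags; this is an untested sketch assuming the etcd serving certificates were signed by /etc/kubernetes/pki/etcd/ca.crt:

ETCDCTL_API=3 etcdctl \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/peer.crt \
  --key=/etc/kubernetes/pki/etcd/peer.key \
  --endpoints=https://10.50.5.50:2379 \
  del "" --prefix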

Very long story short, wiping etcd, running kubeadm reset on all the hosts, feeding kubeadm init a properly-formatted kubeadm-config.yaml file on the initial master node per your suggestion, and running the kubeadm join <stuff> --experimental-control-plane on the remaining master control-plane nodes results in:

# kubectl get nodes
NAME          STATUS   ROLES    AGE   VERSION
hypervisor1   Ready    master   39m   v1.13.1
hypervisor2   Ready    master   17m   v1.13.1
hypervisor3   Ready    master   16m   v1.13.1

which is exactly what I was hoping for :-) I'll be submitting some PRs for the docs shortly. Thanks so much for all your help!

/close

@fabriziopandini: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

I have the same problem and did exactly what you did @law @fabianofranz, but I've ended up with:

failure loading key for service account: couldn't load the private key file /etc/kubernetes/pki/sa.key: open /etc/kubernetes/pki/sa.key: no such file or directory

So kubeadm looks for TLS keys in that directory even though I've provided the CA key and cert. What did I do wrong?

I am wondering whether, after updating kubeadm-config with controlPlaneEndpoint, kubeadm starts to expect all certs and keys to be present in the /etc/kubernetes/pki folder.

I can't help but be curious - did you intend the path to be /etc/kubernetes/pki/sa.key, or perhaps /etc/kubernetes/pki/ca.key ?

@law Sorry, that was my fault. I didn't copy CA certs and keys properly.

Just one more question: the official docs on HA kubeadm clusters say you need a load balancer. I see that you avoid that by setting the first master node's IP as controlPlaneEndpoint. Is this legitimate, or a hack that should be avoided?

@stgleb all the nodes will be configured to communicate with the controlPlaneEndpoint, so if your first master goes away, your cluster will be stuck

@fabianofranz can I somehow set this param when doing kubeadm init?

@stgleb yes, using the kubeadm config file. See https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta1
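A minimal sketch of setting it at init time via the config file (the load-balancer name here is hypothetical; any stable DNS name or VIP in front of the API servers serves the same purpose):

cat > kubeadm-config.yaml <<'EOF'
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: stable
controlPlaneEndpoint: "k8s-api.example.com:6443"   # hypothetical LB / VIP address
EOF
kubeadm init --config=kubeadm-config.yaml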
