Kubespray: Kubeadm fails - kubelet fails to find /etc/kubernetes/bootstrap-kubelet.conf

Created on 27 Nov 2018  ·  18 Comments  ·  Source: kubernetes-sigs/kubespray

Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT

Environment:

  • Cloud provider or hardware configuration: None, 4 vagrant vms.
  • OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):
Linux 3.10.0-862.14.4.el7.x86_64 x86_64
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
  • Version of Ansible (ansible --version):
ansible 2.7.2
  config file = None
  configured module search path = [u'/Users/user/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python2.7/site-packages/ansible
  executable location = /usr/local/bin/ansible
  python version = 2.7.15 (default, Aug 17 2018, 22:39:05) [GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.2)]

Kubespray version (commit) (git rev-parse --short HEAD):

02169e8f85fda688e08f7b20e01e86d9d34d45dc

Network plugin used:

Calico

Copy of your inventory file:

[all]
node1 ansible_host=10.10.10.10 ansible_user=vagrant ansible_become=true ansible_become_method=sudo ip=10.10.10.10 etcd_member_name=etcd1
node2 ansible_host=10.10.10.2  ansible_user=vagrant ansible_become=true ansible_become_method=sudo ip=10.10.10.2  etcd_member_name=etcd2
node3 ansible_host=10.10.10.3  ansible_user=vagrant ansible_become=true ansible_become_method=sudo ip=10.10.10.3  etcd_member_name=etcd3
node4 ansible_host=10.10.10.4  ansible_user=vagrant ansible_become=true ansible_become_method=sudo ip=10.10.10.4

[kube-master]
node1
node2
node3

[etcd]
node1
node2
node3

[kube-node]
node1
node2
node3
node4

[k8s-cluster:children]
kube-master
kube-node

[localhost]
127.0.0.1 ansible_connection=local kubeadm_enabled=true skip_non_kubeadm_warning=false ansible_become=false

Kubespray config:

---
bootstrap_os: centos

kernel_upgrade: false

nginx_config_dir: /data/nginx

etcd_data_dir: /data/etcd

cluster_name: "vagrant"
dns_domain: "{{ cluster_name }}.local"

dns_mode: coredns

deploy_netchecker: false

kube_config_dir: /data/kubernetes

kube_api_pwd: "{{ secret_kube_api_pwd }}"

kube_users:
  kube:
    pass: "{{ kube_api_pwd }}"
    role: admin
    groups:
      - system:masters

kube_network_plugin: calico
kubeadm_enabled: true
kube_proxy_mode: ipvs

docker_daemon_graph: "/data/docker"

dashboard_enabled: false

vault_base_dir: /data/vault

kubelet_load_modules: true
kubernetes_audit: true

docker_version: "18.06"
kube_version: v1.12.3
kubeadm_version: "{{ kube_version }}"
etcd_version: v3.2.24
coredns_version: "1.2.6"

kubeconfig_localhost: true

docker_dns_servers_strict: false
docker_storage_options: -s overlay2

kubelet_authentication_token_webhook: true
kubelet_authorization_mode_webhook: true

calico_felix_prometheusmetricsenabled: true
etcd_metrics: extensive
kube_read_only_port: 10255
kube_apiserver_insecure_port: 0
kube_api_anonymous_auth: true

Command used to invoke ansible:

ansible-playbook -i inventories/vagrant playbooks/kubespray_cluster.yml -vv --flush-cache -k --become --become-user=root -K --user=user

Output of ansible run:

TASK [kubernetes/master : kubeadm | Initialize first master] *********************************************************************************************************************************************
task path: /Users/user/workspace/ops/ansible/vendor/kubespray/roles/kubernetes/master/tasks/kubeadm-setup.yml:117
Tuesday 27 November 2018  00:58:30 -0800 (0:00:02.496)       0:19:14.100 ******
skipping: [node2] => changed=false
  skip_reason: Conditional result was False
skipping: [node3] => changed=false
  skip_reason: Conditional result was False
fatal: [node1]: FAILED! => changed=true
  cmd:
  - timeout
  - -k
  - 600s
  - 600s
  - /usr/local/bin/kubeadm
  - init
  - --config=/data/kubernetes/kubeadm-config.v1alpha3.yaml
  - --ignore-preflight-errors=all
  delta: '0:03:06.063868'
  end: '2018-11-27 09:01:37.417495'
  failed_when_result: true
  msg: non-zero return code
  rc: 1
  start: '2018-11-27 08:58:31.353627'
  stderr: |2-
            [WARNING KubeletVersion]: couldn't get kubelet version: executable file not found in $PATH
    couldn't initialize a Kubernetes cluster
  stderr_lines:
  - "\t[WARNING KubeletVersion]: couldn't get kubelet version: executable file not found in $PATH"
  - couldn't initialize a Kubernetes cluster
  stdout: |-
    [init] using Kubernetes version: v1.12.3
    [preflight] running pre-flight checks
    [preflight/images] Pulling images required for setting up a Kubernetes cluster
    [preflight/images] This might take a minute or two, depending on the speed of your internet connection
    [preflight/images] You can also perform this action in beforehand using 'kubeadm config images pull'
    [kubelet] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
    [kubelet] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
    [preflight] Activating the kubelet service
    [certificates] Generated front-proxy-ca certificate and key.
    [certificates] Generated front-proxy-client certificate and key.
    [certificates] Generated ca certificate and key.
    [certificates] Generated apiserver certificate and key.
    [certificates] apiserver serving cert is signed for DNS names [node1 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.dt-vagrant.local kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.dt-vagrant.local localhost node1 node2 node3] and IPs [10.233.0.1 10.10.10.10 10.10.10.10 10.233.0.1 127.0.0.1 10.10.10.10 10.10.10.2 10.10.10.3]
    [certificates] Generated apiserver-kubelet-client certificate and key.
    [certificates] valid certificates and keys now exist in "/data/kubernetes/ssl"
    [certificates] Generated sa key and public key.
    [kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/admin.conf"
    [kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/kubelet.conf"
    [kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/controller-manager.conf"
    [kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/scheduler.conf"
    [controlplane] Adding extra host path mount "audit-policy" to "kube-apiserver"
    [controlplane] Adding extra host path mount "audit-logs" to "kube-apiserver"
    [controlplane] wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
    [controlplane] wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
    [controlplane] wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
    [init] waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests"
    [init] this might take a minute or longer if the control plane images have to be pulled
    [kubelet-check] It seems like the kubelet isn't running or healthy.
    [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.
    [kubelet-check] It seems like the kubelet isn't running or healthy.
    [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.
    [kubelet-check] It seems like the kubelet isn't running or healthy.
    [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.
    [kubelet-check] It seems like the kubelet isn't running or healthy.
    [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.
    [kubelet-check] It seems like the kubelet isn't running or healthy.
    [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.

    Unfortunately, an error has occurred:
            timed out waiting for the condition

    This error is likely caused by:
            - The kubelet is not running
            - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

    If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
            - 'systemctl status kubelet'
            - 'journalctl -xeu kubelet'

    Additionally, a control plane component may have crashed or exited when started by the container runtime.
    To troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker.
    Here is one example how you may list all Kubernetes containers running in docker:
            - 'docker ps -a | grep kube | grep -v pause'
            Once you have found the failing container, you can inspect its logs with:
            - 'docker logs CONTAINERID'
  stdout_lines: <omitted>

Anything else we need to know:

[vagrant@node1 ~]$ sudo systemctl status kubelet -l
● kubelet.service - Kubernetes Kubelet Server
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
   Active: activating (auto-restart) (Result: exit-code) since Tue 2018-11-27 09:00:16 UTC; 1s ago
     Docs: https://github.com/GoogleCloudPlatform/kubernetes
  Process: 27284 ExecStart=/usr/local/bin/kubelet $KUBE_LOGTOSTDERR $KUBE_LOG_LEVEL $KUBELET_API_SERVER $KUBELET_ADDRESS $KUBELET_PORT $KUBELET_HOSTNAME $KUBE_ALLOW_PRIV $KUBELET_ARGS $DOCKER_SOCKET $KUBELET_NETWORK_PLUGIN $KUBELET_VOLUME_PLUGIN $KUBELET_CLOUDPROVIDER (code=exited, status=255)
  Process: 27283 ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volume-plugins (code=exited, status=0/SUCCESS)
 Main PID: 27284 (code=exited, status=255)

Nov 27 09:00:16 node1 kubelet[27284]: I1127 09:00:16.479833   27284 feature_gate.go:206] feature gates: &{map[]}
Nov 27 09:00:16 node1 kubelet[27284]: I1127 09:00:16.479967   27284 feature_gate.go:206] feature gates: &{map[]}
Nov 27 09:00:16 node1 kubelet[27284]: I1127 09:00:16.565929   27284 mount_linux.go:179] Detected OS with systemd
Nov 27 09:00:16 node1 kubelet[27284]: I1127 09:00:16.566080   27284 server.go:408] Version: v1.12.3
Nov 27 09:00:16 node1 kubelet[27284]: I1127 09:00:16.566178   27284 feature_gate.go:206] feature gates: &{map[]}
Nov 27 09:00:16 node1 kubelet[27284]: I1127 09:00:16.566350   27284 feature_gate.go:206] feature gates: &{map[]}
Nov 27 09:00:16 node1 kubelet[27284]: I1127 09:00:16.566452   27284 plugins.go:99] No cloud provider specified.
Nov 27 09:00:16 node1 kubelet[27284]: I1127 09:00:16.566463   27284 server.go:524] No cloud provider specified: "" from the config file: ""
Nov 27 09:00:16 node1 kubelet[27284]: I1127 09:00:16.566483   27284 bootstrap.go:61] Using bootstrap kubeconfig to generate TLS client cert, key and kubeconfig file
Nov 27 09:00:16 node1 kubelet[27284]: F1127 09:00:16.566510   27284 server.go:262] failed to run Kubelet: unable to load bootstrap kubeconfig: stat /etc/kubernetes/bootstrap-kubelet.conf: no such file or directory

Kubespray is successful if I disable kubeadm... any thoughts?
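The failure can be narrowed down directly on the node: kubelet only falls back to the bootstrap kubeconfig when kubelet.conf is absent, so a small check of which of the two files exists tells you which path kubelet will take. `check_kubeconfigs` is a hypothetical helper written for this sketch, with the directory parameterized because this thread overrides `kube_config_dir` to /data/kubernetes:

```shell
#!/bin/sh
# Report which kubelet kubeconfigs exist in a given directory.
# kubelet reads bootstrap-kubelet.conf only when kubelet.conf is
# missing, so "missing" for both is the failure mode in this issue.
check_kubeconfigs() {
  dir="$1"
  for f in kubelet.conf bootstrap-kubelet.conf; do
    if [ -f "$dir/$f" ]; then
      echo "present: $dir/$f"
    else
      echo "missing: $dir/$f"
    fi
  done
}

# Default to kubeadm's hardcoded directory; override with KUBE_DIR.
check_kubeconfigs "${KUBE_DIR:-/etc/kubernetes}"
```

Running it with `KUBE_DIR=/data/kubernetes` as well as the default would show whether kubeadm and kubelet are looking at the same directory at all.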

Most helpful comment

As the documentation says in "The kubelet drop-in file for systemd":

The KubeConfig file to use for the TLS Bootstrap is /etc/kubernetes/bootstrap-kubelet.conf,
but it is only used if /etc/kubernetes/kubelet.conf does not exist.

In other words, /etc/kubernetes/bootstrap-kubelet.conf is only read when /etc/kubernetes/kubelet.conf is missing, so you can fix it in one of these ways:

  • copy bootstrap-kubelet.conf from another node; it just has to exist;
  • renew the bootstrap token and replace the old one in the bootstrap file:

    new_token=$(kubeadm token create)
    sed -i "s/token: .*/token: $new_token/" /etc/kubernetes/bootstrap-kubelet.conf
    

    restart kubelet and it will generate a new kubelet.conf file.

    OR

  • generate kubelet.conf as described in renew-kubernetes-pki-after-expired/56334732#56334732
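The token-renewal step can be wrapped in a helper and rehearsed on a scratch copy of the file first; `replace_bootstrap_token` is a name made up here, and the token used in the test is a placeholder rather than real `kubeadm token create` output:

```shell
#!/bin/sh
# Replace the bootstrap token inside a bootstrap kubeconfig in place.
# Same sed expression as the comment above, wrapped in a function so
# it can be tried against a copy before touching the real file.
replace_bootstrap_token() {
  new_token="$1"
  conf="$2"
  sed -i "s/token: .*/token: $new_token/" "$conf"
}
```

On a real master this would be `replace_bootstrap_token "$(kubeadm token create)" /etc/kubernetes/bootstrap-kubelet.conf`, followed by a kubelet restart.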

All 18 comments

Same problem, at the "Upgrade first master" step:
the API server is unhealthy, dial tcp ip:6443: connect: connection refused

TASK [kubernetes/master : sets kubeadm api version to v1alpha3] ******************************************************************************************************************************************************************************
Wednesday 28 November 2018  03:51:35 +0000 (0:00:00.118)       0:07:09.368 ****
ok: [node1] => {"ansible_facts": {"kubeadmConfig_api_version": "v1alpha3"}, "changed": false}
ok: [node2] => {"ansible_facts": {"kubeadmConfig_api_version": "v1alpha3"}, "changed": false}
ok: [node3] => {"ansible_facts": {"kubeadmConfig_api_version": "v1alpha3"}, "changed": false}

TASK [kubernetes/master : set kubeadm_config_api_fqdn define] ********************************************************************************************************************************************************************************
Wednesday 28 November 2018  03:51:36 +0000 (0:00:00.604)       0:07:09.972 ****

TASK [kubernetes/master : kubeadm | Create kubeadm config] ***********************************************************************************************************************************************************************************
Wednesday 28 November 2018  03:51:36 +0000 (0:00:00.117)       0:07:10.090 ****
changed: [node1] => {"changed": true, "checksum": "beba3df2670ac508cb590fc1ab7a98773271ddd8", "dest": "/etc/kubernetes/kubeadm-config.v1alpha3.yaml", "gid": 0, "group": "root", "md5sum": "3d3ec8be0b1276978b07a097b4eb2773", "mode": "0644", "owner": "root", "size": 2488, "src": "/home/ubuntu/.ansible/tmp/ansible-tmp-1543377097.1-22799602472481/source", "state": "file", "uid": 0}
changed: [node2] => {"changed": true, "checksum": "9e70ec6c4c7ce90f12ef6093afe36fdf0f3a5e1b", "dest": "/etc/kubernetes/kubeadm-config.v1alpha3.yaml", "gid": 0, "group": "root", "md5sum": "2c018642cba08c6550073d8878f0348c", "mode": "0644", "owner": "root", "size": 2486, "src": "/home/ubuntu/.ansible/tmp/ansible-tmp-1543377097.15-125800389592253/source", "state": "file", "uid": 0}
changed: [node3] => {"changed": true, "checksum": "ec973b4b905d41f3eb181eaec6068ac2a85dfb7b", "dest": "/etc/kubernetes/kubeadm-config.v1alpha3.yaml", "gid": 0, "group": "root", "md5sum": "97dee72ceefe26c7242e0c1206572806", "mode": "0644", "owner": "root", "size": 2488, "src": "/home/ubuntu/.ansible/tmp/ansible-tmp-1543377097.22-142786641515500/source", "state": "file", "uid": 0}

TASK [kubernetes/master : kubeadm | Initialize first master] *********************************************************************************************************************************************************************************
Wednesday 28 November 2018  03:51:38 +0000 (0:00:01.867)       0:07:11.958 ****

TASK [kubernetes/master : kubeadm | Upgrade first master] ************************************************************************************************************************************************************************************
Wednesday 28 November 2018  03:51:38 +0000 (0:00:00.113)       0:07:12.071 ****
fatal: [node1]: FAILED! => {"changed": true, "cmd": ["timeout", "-k", "600s", "600s", "/usr/local/bin/kubeadm", "upgrade", "apply", "-y", "v1.12.3", "--config=/etc/kubernetes/kubeadm-config.v1alpha3.yaml", "--ignore-preflight-errors=all", "--allow-experimental-upgrades", "--allow-release-candidate-upgrades", "--etcd-upgrade=false", "--force"], "delta": "0:00:00.035118", "end": "2018-11-28 03:51:38.926230", "failed_when_result": true, "msg": "non-zero return code", "rc": 1, "start": "2018-11-28 03:51:38.891112", "stderr": "\t[WARNING APIServerHealth]: the API Server is unhealthy; /healthz didn't return \"ok\"\n\t[WARNING MasterNodesReady]: couldn't list masters in cluster: Get https://172.31.9.146:6443/api/v1/nodes?labelSelector=node-role.kubernetes.io%2Fmaster%3D: dial tcp 172.31.9.146:6443: connect: connection refused\n[upgrade/version] FATAL: The --version argument is invalid due to these fatal errors:\n\n\t- Unable to fetch cluster version: Couldn't fetch cluster version from the API Server: Get https://172.31.9.146:6443/version?timeout=32s: dial tcp 172.31.9.146:6443: connect: connection refused\n\nPlease fix the misalignments highlighted above and try upgrading again", "stderr_lines": ["\t[WARNING APIServerHealth]: the API Server is unhealthy; /healthz didn't return \"ok\"", "\t[WARNING MasterNodesReady]: couldn't list masters in cluster: Get https://172.31.9.146:6443/api/v1/nodes?labelSelector=node-role.kubernetes.io%2Fmaster%3D: dial tcp 172.31.9.146:6443: connect: connection refused", "[upgrade/version] FATAL: The --version argument is invalid due to these fatal errors:", "", "\t- Unable to fetch cluster version: Couldn't fetch cluster version from the API Server: Get https://172.31.9.146:6443/version?timeout=32s: dial tcp 172.31.9.146:6443: connect: connection refused", "", "Please fix the misalignments highlighted above and try upgrading again"], "stdout": "[preflight] Running pre-flight checks.\n[upgrade] Making sure the cluster is 
healthy:\n[upgrade/config] Making sure the configuration is correct:\n[upgrade/config] Reading configuration options from a file: /etc/kubernetes/kubeadm-config.v1alpha3.yaml\n[upgrade/apply] Respecting the --cri-socket flag that is set with higher priority than the config file.\n[upgrade/version] You have chosen to change the cluster version to \"v1.12.3\"", "stdout_lines": ["[preflight] Running pre-flight checks.", "[upgrade] Making sure the cluster is healthy:", "[upgrade/config] Making sure the configuration is correct:", "[upgrade/config] Reading configuration options from a file: /etc/kubernetes/kubeadm-config.v1alpha3.yaml", "[upgrade/apply] Respecting the --cri-socket flag that is set with higher priority than the config file.", "[upgrade/version] You have chosen to change the cluster version to \"v1.12.3\""]}

NO MORE HOSTS LEFT ***************************************************************************************************************************************************************************************************************************
    to retry, use: --limit @/home/ubuntu/kkkkube/kubespray-settings/cluster.retry

PLAY RECAP ***********************************************************************************************************************************************************************************************************************************
localhost                  : ok=1    changed=0    unreachable=0    failed=0
node1                      : ok=266  changed=19   unreachable=0    failed=1
node2                      : ok=244  changed=18   unreachable=0    failed=0
node3                      : ok=244  changed=18   unreachable=0    failed=0
node4                      : ok=200  changed=11   unreachable=0    failed=0
node5                      : ok=200  changed=11   unreachable=0    failed=0
node6                      : ok=200  changed=11   unreachable=0    failed=0

@gongzili456 can you share the output of sudo systemctl status kubelet -l from the master as well?

I suspect kubelet is failing to start up because it can't find bootstrap-kubelet.conf, right?

Yes, the kubelet is failing to start up. How do I resolve it?
@servo1x

kubeadm join:

[i1987@k8s-node01 ~]$ sudo kubeadm join 172.16.18.53:6443 --token 3cxl4o.npf352g4ryvdl89i --discovery-token-ca-cert-hash sha256:88ddf380ab354067b0bb830ad6e76484f79073b3edbe3702ac1537d850f35cd4 --ignore-preflight-errors=all
[preflight] Running pre-flight checks
    [WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.09.0. Latest validated version: 18.06
    [WARNING Hostname]: hostname "k8s-node01" could not be reached
    [WARNING Hostname]: hostname "k8s-node01": lookup k8s-node01 on 100.100.2.136:53: no such host
[discovery] Trying to connect to API Server "172.16.18.53:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://172.16.18.53:6443"
[discovery] Requesting info from "https://172.16.18.53:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "172.16.18.53:6443"
[discovery] Successfully established connection with API Server "172.16.18.53:6443"
[join] Reading configuration from the cluster...
[join] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
unable to fetch the kubeadm-config ConfigMap: failed to get config map: Unauthorized
[i1987@k8s-node01 ~]$ 
[i1987@k8s-node01 ~]$ kubeadm join 172.16.18.53:6443 --token t4dhp1.6c132knx4hh8oroz --discovery-token-ca-cert-hash sha256:88ddf380ab354067b0bb830ad6e76484f79073b3edbe3702ac1537d850f35cd4
[preflight] Running pre-flight checks
[preflight] Some fatal errors occurred:
    [ERROR IsPrivilegedUser]: user is not running as root
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
[i1987@k8s-node01 ~]$ kubeadm join 172.16.18.53:6443 --token t4dhp1.6c132knx4hh8oroz --discovery-token-ca-cert-hash sha256:88ddf380ab354067b0bb830ad6e76484f79073b3edbe3702ac1537d850f35cd4 --ignore-preflight-errors=all
[preflight] Running pre-flight checks
    [WARNING IsPrivilegedUser]: user is not running as root
    [WARNING CRI]: container runtime is not running: output: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.39/info: dial unix /var/run/docker.sock: connect: permission denied
, error: exit status 1
[preflight] The system verification failed. Printing the output from the verification:
KERNEL_VERSION: 3.10.0-693.2.2.el7.x86_64
CONFIG_NAMESPACES: enabled
CONFIG_NET_NS: enabled
CONFIG_PID_NS: enabled
CONFIG_IPC_NS: enabled
CONFIG_UTS_NS: enabled
CONFIG_CGROUPS: enabled
CONFIG_CGROUP_CPUACCT: enabled
CONFIG_CGROUP_DEVICE: enabled
CONFIG_CGROUP_FREEZER: enabled
CONFIG_CGROUP_SCHED: enabled
CONFIG_CPUSETS: enabled
CONFIG_MEMCG: enabled
CONFIG_INET: enabled
CONFIG_EXT4_FS: enabled (as module)
CONFIG_PROC_FS: enabled
CONFIG_NETFILTER_XT_TARGET_REDIRECT: enabled (as module)
CONFIG_NETFILTER_XT_MATCH_COMMENT: enabled (as module)
CONFIG_OVERLAY_FS: enabled (as module)
CONFIG_AUFS_FS: not set - Required for aufs.
CONFIG_BLK_DEV_DM: enabled (as module)
OS: Linux
CGROUPS_CPU: enabled
CGROUPS_CPUACCT: enabled
CGROUPS_CPUSET: enabled
CGROUPS_DEVICES: enabled
CGROUPS_FREEZER: enabled
CGROUPS_MEMORY: enabled
    [WARNING SystemVerification]: failed to get docker info: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/info: dial unix /var/run/docker.sock: connect: permission denied
    [WARNING Hostname]: hostname "k8s-node01" could not be reached
    [WARNING Hostname]: hostname "k8s-node01": lookup k8s-node01 on 100.100.2.136:53: no such host
[discovery] Trying to connect to API Server "172.16.18.53:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://172.16.18.53:6443"
[discovery] Requesting info from "https://172.16.18.53:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "172.16.18.53:6443"
[discovery] Successfully established connection with API Server "172.16.18.53:6443"
[join] Reading configuration from the cluster...
[join] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
couldn't save bootstrap-kubelet.conf to disk: open /etc/kubernetes/bootstrap-kubelet.conf: permission denied

kubelet status:

[i1987@k8s-node01 ~]$ sudo systemctl status kubelet -l
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: inactive (dead) (Result: exit-code) since Wed 2018-12-05 18:45:28 CST; 42min ago
     Docs: https://kubernetes.io/docs/
  Process: 24839 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=255)
 Main PID: 24839 (code=exited, status=255)

Dec 05 18:45:24 k8s-node01 systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
Dec 05 18:45:24 k8s-node01 systemd[1]: Unit kubelet.service entered failed state.
Dec 05 18:45:24 k8s-node01 systemd[1]: kubelet.service failed.
Dec 05 18:45:28 k8s-node01 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.

I resolved it.
It is because Kubernetes released v1.13.0 yesterday.
My k8s master version is v1.12.3, and today I added a node (which downloaded the newest version, v1.13.0) to my cluster.
Updating the master's version resolved the problem (or keep the master and nodes on the same version).

How to update k8s version: https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade-1-13/
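The "same version" rule can be checked mechanically. This is a minimal sketch that compares only major.minor (the official skew policy is somewhat looser and lets kubelets trail the API server by a minor version, but the mismatch hit here, v1.12 master vs v1.13 kubeadm join, is exactly what this catches):

```shell
#!/bin/sh
# Keep only major.minor from a version string, e.g. v1.12.3 -> 1.12.
minor_of() {
  printf '%s\n' "${1#v}" | cut -d. -f1-2
}

# Succeeds only when two version strings share the same major.minor
# release, i.e. the "keep master and nodes the same version" rule.
same_minor() {
  [ "$(minor_of "$1")" = "$(minor_of "$2")" ]
}
```

On a node, something like `same_minor "$(kubelet --version | awk '{print $2}')" v1.12.3 || echo "version skew"` would flag the mismatch, assuming `kubelet --version` prints `Kubernetes vX.Y.Z`.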

@lianghuiyuan Still having this issue on v1.13.0...

I'm setting up an entirely new cluster, or trying to migrate a kubespray non-kubeadm deployment to a kubeadm one.

I got the same problem when upgrading a cluster from v1.8.10 to v1.12.3 (a fresh v1.12.3 setup works).

Anyone else having any luck with this? I see non-kubeadm deploys have been completely removed from new releases as well... 😞

Any thoughts @riverzhang ?

Moving all the content from /data/kubernetes to /etc/kubernetes allows the switch to kubeadm.
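That fix can be scripted; `migrate_kube_dir` is a hypothetical helper, with source and destination parameterized so the copy can be rehearsed on scratch directories before touching a real node:

```shell
#!/bin/sh
# Copy everything kubespray wrote under its custom kube_config_dir
# (/data/kubernetes in this issue) into kubeadm's hardcoded
# /etc/kubernetes, preserving modes and timestamps.
migrate_kube_dir() {
  src="$1"
  dst="$2"
  mkdir -p "$dst"
  cp -a "$src/." "$dst/"
}
```

On a real master this would be run as root with kubelet stopped, e.g. `migrate_kube_dir /data/kubernetes /etc/kubernetes`; whether `kube_config_dir` should then also be reset to the default in the inventory is left as a follow-up.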

I ran into this issue too. I tried:

kubeadm reset -f
kubeadm init --config /etc/kubernetes/kubeadm-config.yaml

result

[root@k8s-m1 ~]# kubeadm init --config /etc/kubernetes/kubeadm-config.yaml
[init] Using Kubernetes version: v1.13.2
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/etc/kubernetes/ssl"
[certs] Using existing front-proxy-ca certificate authority
[certs] Using existing front-proxy-client certificate and key on disk
[certs] External etcd mode: Skipping etcd/ca certificate authority generation
[certs] External etcd mode: Skipping apiserver-etcd-client certificate authority generation
[certs] External etcd mode: Skipping etcd/server certificate authority generation
[certs] External etcd mode: Skipping etcd/peer certificate authority generation
[certs] External etcd mode: Skipping etcd/healthcheck-client certificate authority generation
[certs] Using existing ca certificate authority
[certs] Using existing apiserver-kubelet-client certificate and key on disk
[certs] Using existing apiserver certificate and key on disk
[certs] Using the existing "sa" key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 5m0s
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.

Unfortunately, an error has occurred:
    timed out waiting for the condition

This error is likely caused by:
    - The kubelet is not running
    - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
    - 'systemctl status kubelet'
    - 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker.
Here is one example how you may list all Kubernetes containers running in docker:
    - 'docker ps -a | grep kube | grep -v pause'
    Once you have found the failing container, you can inspect its logs with:
    - 'docker logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster

I checked the kubelet startup log and found that kubelet still looks for certificates under /etc/kubernetes/pki:

[root@k8s-m1 ~]# systemctl status kubelet -l
● kubelet.service - Kubernetes Kubelet Server
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: activating (auto-restart) (Result: exit-code) since 三 2019-01-16 08:46:55 CST; 7s ago
     Docs: https://github.com/GoogleCloudPlatform/kubernetes
  Process: 40602 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=255)
  Process: 40599 ExecStartPre=/bin/mkdir -p /var/lib/kubelet/volume-plugins (code=exited, status=0/SUCCESS)
 Main PID: 40602 (code=exited, status=255)

1月 16 08:46:55 k8s-m1 systemd[1]: Unit kubelet.service entered failed state.
1月 16 08:46:55 k8s-m1 systemd[1]: kubelet.service failed.
[root@k8s-m1 ~]# journalctl -xeu kubelet
1月 16 08:52:02 k8s-m1 kubelet[945]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by
1月 16 08:52:02 k8s-m1 kubelet[945]: F0116 08:52:02.623031     945 server.go:244] unable to load client CA file /etc/kubernetes/pki/ca.crt: o
1月 16 08:52:02 k8s-m1 systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
1月 16 08:52:02 k8s-m1 systemd[1]: Unit kubelet.service entered failed state.
1月 16 08:52:02 k8s-m1 systemd[1]: kubelet.service failed.
1月 16 08:52:12 k8s-m1 systemd[1]: kubelet.service holdoff time over, scheduling restart.
1月 16 08:52:12 k8s-m1 systemd[1]: Stopped Kubernetes Kubelet Server.
-- Subject: Unit kubelet.service has finished shutting down
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit kubelet.service has finished shutting down.
1月 16 08:52:12 k8s-m1 systemd[1]: Starting Kubernetes Kubelet Server...
-- Subject: Unit kubelet.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit kubelet.service has begun starting up.
1月 16 08:52:12 k8s-m1 systemd[1]: Started Kubernetes Kubelet Server.
-- Subject: Unit kubelet.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit kubelet.service has finished starting up.
--
-- The start-up result is done.
1月 16 08:52:12 k8s-m1 kubelet[979]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by
1月 16 08:52:12 k8s-m1 kubelet[979]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by
1月 16 08:52:12 k8s-m1 kubelet[979]: F0116 08:52:12.873068     979 server.go:244]unable to load client CA file /etc/kubernetes/pki/ca.crt: o
1月 16 08:52:12 k8s-m1 systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
1月 16 08:52:12 k8s-m1 systemd[1]: Unit kubelet.service entered failed state.
1月 16 08:52:12 k8s-m1 systemd[1]: kubelet.service failed.

1月 16 08:52:12 k8s-m1 kubelet[979]: F0116 08:52:12.873068 979 server.go:244]unable to load client CA file /etc/kubernetes/pki/ca.crt: o

I tried changing the directory in the `[certs] Using certificateDir folder` step from "/etc/kubernetes/ssl" to /etc/kubernetes/pki, and the init succeeded:
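For reference, the directory kubeadm uses is set by the `certificatesDir` field of the ClusterConfiguration in the kubeadm config file. A minimal fragment (for the v1beta1 API used by kubeadm 1.13; version and path here are just examples) looks like:

```yaml
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: v1.13.2
certificatesDir: /etc/kubernetes/pki
```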

[root@k8s-m1 ~]# kubeadm init --config /etc/kubernetes/kubeadm-config.yaml
[init] Using Kubernetes version: v1.13.2
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8s-m50 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local localhost k8s-m1 k8s-m2 k8s-m3] and IPs [10.233.0.1 10.2.1.50 10.2.1.50 10.233.0.1 127.0.0.1 10.2.1.50 10.2.2.51 10.2.3.52]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] External etcd mode: Skipping etcd/ca certificate authority generation
[certs] External etcd mode: Skipping etcd/peer certificate authority generation
[certs] External etcd mode: Skipping etcd/server certificate authority generation
[certs] External etcd mode: Skipping etcd/healthcheck-client certificate authority generation
[certs] External etcd mode: Skipping apiserver-etcd-client certificate authority generation
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 5m0s
[apiclient] All control plane components are healthy after 21.003030 seconds
[uploadconfig] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.13" in namespace kube-system with the configuration for the kubelets in the cluster
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "k8s-m1" as an annotation
[mark-control-plane] Marking the node k8s-m1 as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node k8s-m1 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: suyxa7.x25v9cltnmjvjewb
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstraptoken] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstraptoken] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstraptoken] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstraptoken] creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes master has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of machines by running the following on each node
as root:

  kubeadm join 10.2.1.50:6443 --token suyxa7.x25v9cltnmjvjewb --discovery-token-ca-cert-hash sha256:37156c269fbef1ea58772b69c2297c8981c494f6db397bc9c2403ac62bfa42f4

However, the same run works on my local VMs; this is a problem I hit doing an offline deployment on a server.

Forgive me for not being able to describe this in English.

@zhangmz0223 Same problem here, but I can't figure out how to change `[certs] Using certificateDir folder "/etc/kubernetes/ssl"` to /etc/kubernetes/pki.

@Mroch-Cn You can only change it in the concrete yaml config; the path is hardcoded there.

If you print the step output, you'll see a yaml config file in which /etc/kubernetes/ssl is hardcoded when kubeadm init runs; my fix was to change it to /etc/kubernetes/pki.

You can also do a full-text search for it.
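For example, a quick search over a kubespray checkout (the path pattern is just a guess) might look like:

```shell
# Search the checkout for the hardcoded ssl cert dir and the variable that sets it
grep -rn 'kube_cert_dir\|/etc/kubernetes/ssl' roles/ 2>/dev/null | head -n 20
```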

I just looked it up: it's in roles/kubespray-default/defaults/main.yaml, around line 93.

# This is where all the cert scripts and certs will be located
kube_cert_dir: "{{ kube_config_dir }}/ssl"

After changing it, it worked for me.

Haha, thanks for the reply. I checked the journalctl output and found that even though I had run `swapoff -a`, swap was not actually off when the node joined, and active swap keeps kubelet from starting. I wiped the swap partitions with fdisk and that solved my problem. Thanks again!
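A quick way to confirm swap really is off on a node is a sketch like the following (the check itself is read-only; the disabling commands are shown as comments because they require root):

```shell
# /proc/swaps has a header line; any further lines mean swap is still active,
# which by default stops kubelet from starting.
if [ "$(tail -n +2 /proc/swaps | wc -l)" -gt 0 ]; then
  echo "swap active"
else
  echo "swap off"
fi
# Disable for the current boot (root):
#   swapoff -a
# Make it permanent by commenting out swap lines in /etc/fstab:
#   sed -i '/\sswap\s/ s/^/#/' /etc/fstab
```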


I just ran into this issue and saw that it had been closed, but I can't clearly understand what the fix is or whether any branch carries it. Can someone please point me to the fix? I believe I started seeing this after updating my kernel from 3.10 to 4.20, which I needed in order to take advantage of some features provided by rook-ceph on my bare-metal cluster.

So, I will try not to be long-winded on this one; my hope is that this saves someone some pain. I had everything working fine on my on-premises 7-node cluster, then realized some rook features, such as filesystem, could not be used because of my kernel version. My hosts run CentOS 7, which ships with kernel 3.10, so I tore everything down and updated the kernel to 4.20 on all hosts. Then I ran into issue 3986: apparently there was a bug in everything 2.8.2 and below causing failures on systems with kernel >= 4.19. That bug supposedly came from upstream kubernetes; it was fixed in version 1.13.0 and also accommodated here in kubespray, but only on the master branch.

I traced the fix down to the master branch and read more about it in 3986. Long story short, even after upgrading to the master branch (I know I'm living on the edge), I still ran into 4008. It turns out there are some significant changes on the master branch that also require a complete refresh of the inventory folder. The easiest way to do that is to copy the sample folder and then make the necessary changes to your host.ini, plus any additional changes specific to your environment.
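If it helps, rebuilding the inventory from the sample looks roughly like this (directory names follow the kubespray sample convention; adjust to your checkout, and the guard just skips the copy outside a checkout):

```shell
# Start a fresh inventory from the shipped sample, then edit it for your hosts
if [ -d inventory/sample ]; then
  cp -rfp inventory/sample inventory/mycluster
fi
# next: edit inventory/mycluster/hosts.ini and group_vars/ for your environment
```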

Thanks @zhangmz0223!! Changing "{{ kube_config_dir }}/ssl" to "{{ kube_config_dir }}/pki" worked for me.

Watching the logs with journalctl -xe showed me the cause of the problem:

Part of the existing bootstrap client certificate is expired:

I found the solution in this SO answer:
https://stackoverflow.com/a/56334732/2110663

As the documentation on the kubelet drop-in file for systemd says:

The KubeConfig file to use for the TLS Bootstrap is /etc/kubernetes/bootstrap-kubelet.conf, 
but it is only used if /etc/kubernetes/kubelet.conf does not exist.

Since /etc/kubernetes/bootstrap-kubelet.conf is only used when /etc/kubernetes/kubelet.conf does not exist, you can fix it in one of these ways:

  • copy a bootstrap-kubelet.conf from another node; just make sure it exists;
  • renew a bootstrap token and substitute it into the old bootstrap file:

    new_token=$(kubeadm token create)
    sed -i "s/token: .*/token: $new_token/" /etc/kubernetes/bootstrap-kubelet.conf

    then restart kubelet and it will generate a new kubelet.conf file;

    OR

  • generate kubelet.conf from here: renew-kubernetes-pki-after-expired/56334732#56334732
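To confirm this is your failure mode, you can check the expiry of the client certificate embedded in a kubeconfig. A small sketch (the example path is an assumption, and it assumes the cert is stored inline as `client-certificate-data` rather than referenced by file):

```shell
# cert_expiry: print the notAfter date of the client cert embedded in a kubeconfig
cert_expiry() {
  grep 'client-certificate-data' "$1" \
    | awk '{print $2}' \
    | base64 -d \
    | openssl x509 -noout -enddate
}
# Example: cert_expiry /etc/kubernetes/bootstrap-kubelet.conf
```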
