kubeadm 1.9.2 doesn't work over proxy

Created on 31 Jan 2018 · 18 comments · Source: kubernetes/kubeadm

Versions

kubeadm version (use kubeadm version): &version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.2", GitCommit:"5fa2db2bd46ac79e5e00a4e6ed24191080aa463b", GitTreeState:"clean", BuildDate:"2018-01-18T09:42:01Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration: VMware / Proxmox
  • OS (e.g. from /etc/os-release): Debian GNU/Linux 9 (stretch)
  • Kernel (e.g. uname -a): 4.9.65-3+deb9u2
  • Others:

What happened?

I tried to execute `kubeadm init --pod-network-cidr=192.168.0.0/16` and it gets stuck at:

[init] This might take a minute or longer if the control plane images have to be pulled.

What you expected to happen?

kubeadm runs fine and I get a working cluster node.

Anything else we need to know?

The problem is that the first time I created a cluster, I did it on my VMware Player with NAT and full access to the internet. In the second try, I created VMs (two for the master on Proxmox VE (KVM) and two nodes on VMware vSphere). The network is restricted, with no direct internet connection. So I added the following to /etc/profile:

export http_proxy="http://192.168.42.214:3128"
export https_proxy="http://192.168.42.214:3128"
export no_proxy="localhost,127.0.0.1,localaddress,.localdomain.com,.example.local,192.168.0.0/16,10.96.0.0/12,172.25.50.21,172.25.50.22,172.25.50.23,172.25.50.24"
export HTTP_PROXY="http://192.168.42.214:3128"
export HTTPS_PROXY="http://192.168.42.214:3128"
export NO_PROXY="localhost,127.0.0.1,localaddress,.localdomain.com,.example.local,192.168.0.0/16,10.96.0.0/12,172.25.50.21,172.25.50.22,172.25.50.23,172.25.50.24"

In the firewall log I can see that there is still traffic to 173.194.76.82 (gcr.io) via HTTPS. That is bad. Also, kubeadm hangs forever. So I added the host to the whitelist on the firewall (NAT), and then I got:

...
[init] This might take a minute or longer if the control plane images have to be pulled.
[apiclient] All control plane components are healthy after 75.501467 seconds
[uploadconfig] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[markmaster] Will mark node ina-test-kubm-01 as master by adding a label and a taint
[markmaster] Master ina-test-kubm-01 tainted and labelled with key/value: node-role.kubernetes.io/master=""
...

Now I can go forward with the network part.
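For completeness, "the network part" means installing a pod network add-on after kubeadm init. A minimal sketch of that step, assuming a CNI plugin such as Calico (a plausible choice given the 192.168.0.0/16 pod CIDR; the manifest path below is a placeholder, not taken from this thread):

```
# As root, point kubectl at the admin kubeconfig written by kubeadm
export KUBECONFIG=/etc/kubernetes/admin.conf

# Deploy a pod network add-on; replace the placeholder with the manifest
# of the CNI plugin you chose (e.g. Calico for the 192.168.0.0/16 CIDR)
kubectl apply -f <path-or-url-to-cni-manifest.yaml>
```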

Most helpful comment

SOLVED - I had a cgroup driver mismatch between docker and kubelet. Rectified it and init completed successfully.

All 18 comments

Are you running kubeadm as sudo kubeadm .... ? If so, verify your sudoers settings for options about resetting environment variables. In many distros, proxy settings are not kept across sudo.
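If sudo is in play, one common way to keep the proxy variables is an env_keep entry in sudoers (a sketch, not from the original thread); alternatively, sudo -E preserves the caller's environment for a single run:

```
# Added via `visudo`: preserve proxy variables when commands run through sudo
Defaults env_keep += "http_proxy https_proxy no_proxy HTTP_PROXY HTTPS_PROXY NO_PROXY"
```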

hi,

no, executed directly with root permissions.

Then please verify that your session really has the environment variables set, e.g. env | grep -i _proxy.
We are using kubeadm on a daily basis in an enterprise network behind proxies. I don't see why kubeadm would not use the proxy unless the environment is not set properly.

hi,

```

env | grep -i _proxy

HTTP_PROXY=http://192.168.42.214:3128
https_proxy=http://192.168.42.214:3128
http_proxy=http://192.168.42.214:3128
no_proxy=localhost,127.0.0.1,localaddress,.localdomain.com,.localdomain.local,192.168.0.0/16,10.96.0.0/12,172.25.50.21,172.25.50.22,172.25.50.23,172.25.50.24
NO_PROXY=localhost,127.0.0.1,localaddress,.localdomain.com,.localdomain.local,192.168.0.0/16,10.96.0.0/12,172.25.50.21,172.25.50.22,172.25.50.23,172.25.50.24
HTTPS_PROXY=http://192.168.42.214:3128
```
I had the same problem on the worker nodes too. So I assume that one or more processes drop the env vars, or do not use them.

What I can imagine is that the process (dash/sh) doesn't read /etc/profile ...
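Worth noting here: variables exported in /etc/profile only reach login shells, while systemd-managed services such as docker and the kubelet take their environment from unit files and drop-ins. A quick way to check what a service actually sees (generic systemctl usage, not output from this setup):

```
# Show the environment systemd passes to the docker and kubelet services;
# proxy variables from /etc/profile will not appear here unless they are
# configured in a unit drop-in
systemctl show docker --property=Environment
systemctl show kubelet --property=Environment
```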

The network calls in kubeadm do not spawn any subprocesses and thus should be using those proxies. The control plane components also get the proxy settings propagated.
The only component that I can imagine during setup (and actually the only one which should connect to the gcr.io IP) is the docker daemon. It does not use /etc/profile and requires proxy configuration in a systemd drop-in file.

Can you check what docker info shows in your setup, please?
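For reference, a minimal sketch of such a systemd drop-in, reusing the proxy address and the CIDR-free exceptions from this thread as example values:

```
# /etc/systemd/system/docker.service.d/http-proxy.conf
[Service]
Environment="HTTP_PROXY=http://192.168.42.214:3128"
Environment="HTTPS_PROXY=http://192.168.42.214:3128"
Environment="NO_PROXY=localhost,127.0.0.1,.example.local,172.25.50.21,172.25.50.22,172.25.50.23,172.25.50.24"
```

After creating the file, run systemctl daemon-reload && systemctl restart docker and confirm with docker info that the Http Proxy / No Proxy lines show the expected values.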

I got the same issue.
I've updated /etc/sysconfig/docker to add the proxy,
and docker info now shows:

Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 1
Server Version: 1.13.1
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: journald
Cgroup Driver: systemd
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Authorization: rhel-push-plugin
Swarm: inactive
Runtimes: runc oci
Default Runtime: oci
Init Binary: /usr/libexec/docker/docker-init-current
containerd version:  (expected: aa8187dbd3b7ad67d8e5e3a15115d3eef43a7ed1)
runc version: N/A (expected: 9df8b306d01f59d3a8029be411de015b7304dd8f)
init version: N/A (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Security Options:
 seccomp
  WARNING: You're not using the default seccomp profile
  Profile: /etc/docker/seccomp.json
 selinux
Kernel Version: 4.14.16-200.fc26.x86_64
Operating System: Fedora 26 (Twenty Six)
OSType: linux
Architecture: x86_64
Number of Docker Hooks: 3
CPUs: 1
Total Memory: 3.877 GiB
Name: hostnamexx
ID: 2F4B:TXXA:YXXW:NFBD:LXXR:WZNE:QFAC:Y4Y5:NO37:U5DS:I6XT:XXXX
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Http Proxy: http://user:[email protected]:8887
No Proxy: localnet.net,localhost,127.0.0.1,192.168.0.0/16
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
Registries: docker.io (secure), registry.fedoraproject.org (secure), registry.access.redhat.com (secure), docker.io (secure)
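For readers on RPM-based distributions like the Fedora box above, the proxy entries in /etc/sysconfig/docker are typically plain environment assignments along these lines (a sketch with placeholder values; the file is read by the distribution's docker.service as an environment file):

```
# /etc/sysconfig/docker -- restart docker after editing
HTTP_PROXY=http://<proxy-host>:<port>
HTTPS_PROXY=http://<proxy-host>:<port>
NO_PROXY=localhost,127.0.0.1,<node-ips-and-local-domains>
```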

@jamalsia so, which error are you getting now?

[init] Using Kubernetes version: v1.9.3
[init] Using Authorization modes: [Node RBAC]
[preflight] Running pre-flight checks.
        [WARNING Hostname]: hostname "hostnameXX" could not be reached
        [WARNING Hostname]: hostname "hostnameXX" lookup hostnameXX on 10.0.10.150:53: server misbehaving
        [WARNING FileExisting-tc]: tc not found in system path
        [WARNING FileExisting-crictl]: crictl not found in system path
        [WARNING HTTPProxy]: Connection to "https://10.44.102.144:6443" uses proxy "http://user:[email protected]:1234". If that is not intended, adjust your proxy settings
        [WARNING HTTPProxyCIDR]: connection to "10.96.0.0/12" uses proxy "http://user:[email protected]:1234". This may lead to malfunctional cluster setup. Make sure that Pod and Services IP ranges specified correctly as exceptions in proxy configuration
        [WARNING HTTPProxyCIDR]: connection to "192.168.0.0/16" uses proxy "http://user:[email protected]:1234". This may lead to malfunctional cluster setup. Make sure that Pod and Services IP ranges specified correctly as exceptions in proxy configuration
[preflight] Some fatal errors occurred:
        [ERROR Swap]: running with swap on is not supported. Please disable swap
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`

But ping is working, which makes me believe that it is not related to the host's no_proxy configuration. I am running a Hyper-V virtual machine behind a proxy:

ping hostnameXXX
PING hostnameXXX(hostnameXXX (fe80::215:5dff:fe68:3300%eth0)) 56 data bytes
64 bytes from hostnameXXX (fe80::215:5dff:fe68:3300%eth0): icmp_seq=1 ttl=64 time=0.104 ms
64 bytes from hostnameXXX (fe80::215:5dff:fe68:3300%eth0): icmp_seq=2 ttl=64 time=0.076 ms
64 bytes from hostnameXXX (fe80::215:5dff:fe68:3300%eth0): icmp_seq=3 ttl=64 time=0.030 ms
64 bytes from hostnameXXX (fe80::215:5dff:fe68:3300%eth0): icmp_seq=4 ttl=64 time=0.074 ms
64 bytes from hostnameXXX (fe80::215:5dff:fe68:3300%eth0): icmp_seq=5 ttl=64 time=0.030 ms

There are 5 NICs. eth0 is the main one; the others are the virtual machines' network cards.

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:15:5d:68:33:00 brd ff:ff:ff:ff:ff:ff
    inet 10.44.102.144/24 brd 10.44.102.255 scope global dynamic eth0
       valid_lft 13928sec preferred_lft 13928sec
    inet6 fe80::215:5dff:fe68:3300/64 scope link
       valid_lft forever preferred_lft forever
3: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 52:54:00:16:cb:b1 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
       valid_lft forever preferred_lft forever
4: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc fq_codel master virbr0 state DOWN group default qlen 1000
    link/ether 52:54:00:16:cb:b1 brd ff:ff:ff:ff:ff:ff
5: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:36:1c:6f:f2 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever

@kad

docker info
Containers: 44
 Running: 23
 Paused: 0
 Stopped: 21
Images: 14
Server Version: 17.12.0-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 89623f28b87a6004d4b785663257362d1658a729
runc version: b2567b37d7b75eb4cf325b77297b140ea686ce8f
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.9.0-5-amd64
Operating System: Debian GNU/Linux 9 (stretch)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 996.3MiB
Name: ina-test-kubm-01
ID: E5US:GR7K:2SZA:OS6E:JSUN:MU7X:X4KS:VVXJ:ZLBH:HFZA:SFRC:VG23
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

As a hint: I removed the proxy vars and added the hosts to a whitelist to move forward. But if I remember right, there were some lines about the proxy under /etc/kubernetes/ ....

cu denny

Greetings,

You may refer to https://docs.docker.com/config/daemon/systemd/#httphttps-proxy to set the proxy for the Docker daemon.

kubeadm init still fails, even after setting the docker proxy config (link in the above post).
It also doesn't help that using wildcards in the no_proxy env variable doesn't work like it's supposed to on Linux.

@cneginha kubeadm and all other Kubernetes code support CIDR notation in NO_PROXY.
Set NO_PROXY to `127.0.0.1,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,example.com` (replace example.com with your domain).

@jamalsia you have multiple things that you need to solve in your setup (see the sketch after this list):

  1. swap
  2. DNS and hostnames
  3. set NO_PROXY correctly
  4. check the situation with docker; the proxy settings might need to be adjusted there as well.
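A rough sketch of how those points are commonly addressed (generic commands, with the hostname and node IP from this thread reused as placeholders):

```
# 1. swap: kubeadm/kubelet refuse to run with swap enabled
swapoff -a                           # disable swap for the current boot
sed -i '/ swap / s/^/#/' /etc/fstab  # comment out swap entries so it stays off

# 2. DNS and hostnames: make sure the node's hostname resolves, e.g. via /etc/hosts
echo "10.44.102.144 hostnameXX" >> /etc/hosts

# 3. NO_PROXY: include the node IPs plus the service and pod CIDRs
export NO_PROXY=127.0.0.1,10.44.102.144,10.96.0.0/12,192.168.0.0/16
export no_proxy=$NO_PROXY

# 4. docker: set the proxy in its systemd drop-in or /etc/sysconfig/docker,
#    then reload and restart
systemctl daemon-reload && systemctl restart docker
```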

@kad after setting the NO_PROXY env variable explicitly with the IP addresses of the nodes involved, I no longer get the proxy warning. However, kubeadm init is still failing with:

====
kubeadm init
[init] Using Kubernetes version: v1.9.3
[init] Using Authorization modes: [Node RBAC]
[preflight] Running pre-flight checks.
[WARNING FileExisting-crictl]: crictl not found in system path
[preflight] Starting the kubelet service
[certificates] Generated ca certificate and key.
[certificates] Generated apiserver certificate and key.
[certificates] apiserver serving cert is signed for DNS names [kubeflow-1 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.10.10.4]
[certificates] Generated apiserver-kubelet-client certificate and key.
[certificates] Generated sa key and public key.
[certificates] Generated front-proxy-ca certificate and key.
[certificates] Generated front-proxy-client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "scheduler.conf"
[controlplane] Wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests".
[init] This might take a minute or longer if the control plane images have to be pulled.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] It seems like the kubelet isn't running or healthy.
Unfortunately, an error has occurred:
timed out waiting for the condition

This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
- There is no internet connection, so the kubelet cannot pull the following control plane images:
- gcr.io/google_containers/kube-apiserver-amd64:v1.9.3
- gcr.io/google_containers/kube-controller-manager-amd64:v1.9.3
- gcr.io/google_containers/kube-scheduler-amd64:v1.9.3

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
couldn't initialize a Kubernetes cluster

========

docker pull of the gcr.io images works okay, though.

SOLVED - I had a cgroup driver mismatch between docker and kubelet. Rectified it and init completed successfully.

@cneginha
Can you explain it to me?

closing.

@4qv907rtet5r the cgroup drivers of docker and the kubelet were different.

You can find out which drivers are in use with:

docker info | grep -i cgroup
cat /etc/systemd/system/kubelet.service.d/10-kubeadm.conf

If they are different, edit the 10-kubeadm.conf :)

Info taken from the k8s docs: https://kubernetes.io/docs/tasks/tools/install-kubeadm/
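As an illustration of that edit (a sketch; the exact environment variable layout in 10-kubeadm.conf can differ between package versions):

```
# Make the kubelet's cgroup driver match the one reported by `docker info`,
# e.g. in /etc/systemd/system/kubelet.service.d/10-kubeadm.conf:
#   Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=systemd"
# (use --cgroup-driver=cgroupfs instead if that is what docker reports)

# Apply the change
systemctl daemon-reload
systemctl restart kubelet
```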
