Kubespray: wrong parameters found when deploying (in an offline environment)

Created on 25 Feb 2019 · 15 Comments · Source: kubernetes-sigs/kubespray

Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT

Environment:

  • Cloud provider or hardware configuration:

4 vagrant vms

  • OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):
Linux 4.20.12-1.el7.elrepo.x86_64 x86_64
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
  • Version of Ansible (ansible --version):
ansible 2.7.5
  config file = /Users/user/tmp/kubespray-2.8.3/ansible.cfg
  configured module search path = [u'/Users/user/tmp/kubespray-2.8.3/library']
  ansible python module location = /usr/local/lib/python2.7/site-packages/ansible
  executable location = /usr/local/bin/ansible
  python version = 2.7.15 (default, Aug 17 2018, 22:39:05) [GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.2)]

Kubespray version (commit) (git rev-parse --short HEAD):

9805fb7a

Network plugin used:

Calico

Copy of your inventory file:

# Configure the ip variable to bind kubernetes services to different ips
[all]
k8s-m1   ansible_host=vgk8s111 ip=192.168.88.111
k8s-m2   ansible_host=vgk8s112 ip=192.168.88.112
k8s-m3   ansible_host=vgk8s113 ip=192.168.88.113
k8s-n1   ansible_host=vgk8s114 ip=192.168.88.114

[kube-master]
k8s-m1
k8s-m2
k8s-m3

[etcd]
k8s-m1
k8s-m2
k8s-m3

[kube-node]
k8s-m1
k8s-m2
k8s-m3
k8s-n1

[k8s-cluster:children]
kube-master
kube-node

[calico-rr]

# Vault is a tool for securely retrieving secrets; it can store passwords, API keys, certificates, and other sensitive information
[vault]
k8s-m1
k8s-m2
k8s-m3

Command used to invoke ansible:
ansible-playbook -i inventory/mycluster/hosts.ini -vv --flush-cache --become --become-user=root cluster.yml

Output of ansible run:

  1. /Users/zmz/tmp/kubespray-2.8.3/roles/kubernetes/client/tasks/main.yml, line 52:

--cert-dir {{ kube_config_dir }}/ssl -> --cert-dir {{ kube_cert_dir }}

  2. /Users/zmz/tmp/kubespray-2.8.3/roles/kubernetes/master/tasks/kubeadm-setup.yml, line 74:

command: "cp -TR {{ etcd_cert_dir }} {{ kube_config_dir }}/ssl/etcd" -> command: "cp -TR {{ etcd_cert_dir }} {{ kube_cert_dir }}/etcd"

  3. CoreDNS does not work.

Check /var/lib/kubelet/config.yaml:

[root@k8s-m1 ~]# cat /var/lib/kubelet/config.yaml
address: 0.0.0.0
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 2m0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 5m0s
    cacheUnauthorizedTTL: 30s
cgroupDriver: cgroupfs
cgroupsPerQOS: true
clusterDNS:
- 10.233.0.10
clusterDomain: cluster.local
configMapAndSecretChangeDetectionStrategy: Watch
containerLogMaxFiles: 5
containerLogMaxSize: 10Mi
contentType: application/vnd.kubernetes.protobuf
cpuCFSQuota: true
cpuCFSQuotaPeriod: 100ms
cpuManagerPolicy: none
cpuManagerReconcilePeriod: 10s
enableControllerAttachDetach: true
enableDebuggingHandlers: true
enforceNodeAllocatable:
- pods
eventBurst: 10
eventRecordQPS: 5
evictionHard:
  imagefs.available: 15%
  memory.available: 100Mi
  nodefs.available: 10%
  nodefs.inodesFree: 5%
evictionPressureTransitionPeriod: 5m0s
failSwapOn: true
fileCheckFrequency: 20s
hairpinMode: promiscuous-bridge
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 20s
imageGCHighThresholdPercent: 85
imageGCLowThresholdPercent: 80
imageMinimumGCAge: 2m0s
iptablesDropBit: 15
iptablesMasqueradeBit: 14
kind: KubeletConfiguration
kubeAPIBurst: 10
kubeAPIQPS: 5
makeIPTablesUtilChains: true
maxOpenFiles: 1000000
maxPods: 110
nodeLeaseDurationSeconds: 40
nodeStatusReportFrequency: 1m0s
nodeStatusUpdateFrequency: 10s
oomScoreAdj: -999
podPidsLimit: -1
port: 10250
registryBurst: 10
registryPullQPS: 5
resolvConf: /etc/resolv.conf
rotateCertificates: true
runtimeRequestTimeout: 2m0s
serializeImagePulls: true
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 4h0m0s
syncFrequency: 1m0s
volumeStatsAggPeriod: 1m0s

clusterDNS is 10.233.0.10,

but the k8s coredns svc clusterIP is 10.233.0.3:

[root@k8s-m1 ~]# kubectl get svc -n kube-system
NAME                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE
coredns                ClusterIP   10.233.0.3      <none>        53/UDP,53/TCP,9153/TCP   6h58m
kubernetes-dashboard   ClusterIP   10.233.38.188   <none>        443/TCP                  6h58m
metrics-server         ClusterIP   10.233.53.239   <none>        443/TCP                  6h58m
tiller-deploy          ClusterIP   10.233.41.95    <none>        44134/TCP                6h58m
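To state the mismatch precisely, here is a small sketch with the two values from this report (the snippet only restates the symptom; it does not query a live cluster, and the regex parsing is purely illustrative):

```python
import re

# The two values reported above.
kubelet_config = """\
clusterDNS:
- 10.233.0.10
clusterDomain: cluster.local
"""
coredns_svc_ip = "10.233.0.3"  # CLUSTER-IP of the coredns service

cluster_dns = re.search(r"clusterDNS:\n- (\S+)", kubelet_config).group(1)
print(cluster_dns)                    # 10.233.0.10
print(cluster_dns == coredns_svc_ip)  # False: the kubelet config file and the
                                      # actual DNS service disagree
```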

I created a pod from the radial/busyboxplus:curl image and edited its /etc/resolv.conf to point at 10.233.0.3 instead.

It worked:

kubectl run -it curl --image=radial/busyboxplus:curl sh
[ root@curl-66959f6557-sslr8:/ ]$ cat /etc/resolv.conf
nameserver 10.233.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

[ root@curl-66959f6557-sslr8:/ ]$ nslookup nginx
Server:    10.233.0.10
^C

[ root@curl-66959f6557-sslr8:/ ]$ nslookup nginx
Server:    10.233.0.3
Address 1: 10.233.0.3 coredns.kube-system.svc.cluster.local

Name:      nginx
Address 1: 10.233.56.17 nginx.default.svc.cluster.local

I don't know what went wrong. Re-running the deployment gives the same result.

Anything else we need to know:

All 15 comments

I'm having the exact same problem.

Same here with Kubespray v2.9.0 (a4e65c7c)

+1 for the DNS on master.
I ended up adding a service manually until this is fixed.
(I just copied the coredns service and assigned a clusterIP -- and changed name of course)

apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/port: "9153"
    prometheus.io/scrape: "true"
  creationTimestamp: null
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: coredns
  name: coredns-manual
  namespace: kube-system
spec:
  ports:
  - name: dns
    port: 53
    protocol: UDP
    targetPort: 53
  - name: dns-tcp
    port: 53
    protocol: TCP
    targetPort: 53
  - name: metrics
    port: 9153
    protocol: TCP
    targetPort: 9153
  selector:
    k8s-app: kube-dns
  sessionAffinity: None
  type: ClusterIP
  clusterIP: 10.233.0.10

Why is the coredns clusterIP different from the clusterDNS in /var/lib/kubelet/config.yaml? They are supposed to be the same for pod-to-pod and pod-to-service communication in a cluster. I changed clusterDNS to the coredns service clusterIP and restarted the kubelet service on all nodes to restore pod communication. Can anyone explain how the coredns clusterIP ends up as 10.233.0.3 instead of 10.233.0.10?

The cluster DNS address for the kubelet is set in /etc/kubernetes/kubelet.env.

Can you paste the content of that file?

The issue is that the kube-dns svc is deployed even though we skip that phase. Maybe we should delete it?

PLAY RECAP *********************************************************************
k8s-1                      : ok=405  changed=110  unreachable=0    failed=0
k8s-2                      : ok=335  changed=95   unreachable=0    failed=0
k8s-3                      : ok=299  changed=83   unreachable=0    failed=0

$ vagrant ssh k8s-1
Last login: Tue May  7 08:43:04 UTC 2019 from 10.0.2.2 on pts/0
Container Linux by CoreOS stable (2079.3.0)

core@k8s-1 ~ $ sudo su
k8s-1 core # /opt/bin/kubectl get svc --all-namespaces
NAMESPACE     NAME                   TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                  AGE
default       kubernetes             ClusterIP   10.233.0.1     <none>        443/TCP                  5m45s
kube-system   coredns                ClusterIP   10.233.0.3     <none>        53/UDP,53/TCP,9153/TCP   3m19s
kube-system   kube-dns               ClusterIP   10.233.0.10    <none>        53/UDP,53/TCP,9153/TCP   5m1s
kube-system   kubernetes-dashboard   ClusterIP   10.233.60.57   <none>        443/TCP                  3m11s

Checking the env config for the kubelet:

```
k8s-1 core # cat /etc/kubernetes/kubelet.env
### Upstream source https://github.com/kubernetes/release/blob/master/debian/xenial/kubeadm/channel/stable/etc/systemd/system/kubelet.service.d/
### All upstream values should be present in this file

# logging to stderr means we get it in the systemd journal
KUBE_LOGTOSTDERR="--logtostderr=true"
KUBE_LOG_LEVEL="--v=2"
# The address for the info server to serve on (set to 0.0.0.0 or "" for all interfaces)
KUBELET_ADDRESS="--address=172.17.8.101 --node-ip=172.17.8.101"
# The port for the info server to serve on
# KUBELET_PORT="--port=10250"
# You may leave this blank to use the actual hostname
KUBELET_HOSTNAME="--hostname-override=k8s-1"

KUBELET_ARGS="--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf \
--kubeconfig=/etc/kubernetes/kubelet.conf \
--authentication-token-webhook \
--enforce-node-allocatable="" \
--client-ca-file=/etc/kubernetes/ssl/ca.crt \
--rotate-certificates \
--pod-manifest-path=/etc/kubernetes/manifests \
--pod-infra-container-image=gcr.io/google_containers/pause-amd64:3.1 \
--node-status-update-frequency=10s \
--cgroup-driver=cgroupfs \
--max-pods=110 \
--anonymous-auth=false \
--read-only-port=0 \
--fail-swap-on=True \
--runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice \
--cluster-dns=169.254.25.10 --cluster-domain=cluster.local --resolv-conf=/etc/resolv.conf --kube-reserved cpu=200m,memory=512M --node-labels= "
KUBELET_NETWORK_PLUGIN="--network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin"
KUBELET_VOLUME_PLUGIN="--volume-plugin-dir=/var/lib/kubelet/volume-plugins"
# Should this cluster be allowed to run privileged docker containers
KUBE_ALLOW_PRIV="--allow-privileged=true"
KUBELET_CLOUDPROVIDER=""

PATH=/opt/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
```

You can see that --cluster-dns is set to the nodelocaldns address (169.254.25.10), as it should be.

Checking the resolv.conf file in a pod:

```
k8s-1 core # /opt/bin/kubectl run -it curl --image=radial/busyboxplus:curl sh
kubectl run --generator=deployment/apps.v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.
If you don't see a command prompt, try pressing enter.
[ root@curl-78bdd5c756-sm77w:/ ]$ cat /etc/resolv.conf
nameserver 169.254.25.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
[ root@curl-78bdd5c756-sm77w:/ ]$
```

As you can see, it uses the correct address for DNS.

I don't think the config file at /var/lib/kubelet/config.yaml is actually in use.

Disabling enable_nodelocaldns still works for me too. The --cluster-dns in the kubelet.env file is honored.

k8s-1 core # cat /etc/kubernetes/kubelet.env
### Upstream source https://github.com/kubernetes/release/blob/master/debian/xenial/kubeadm/channel/stable/etc/systemd/system/kubelet.service.d/
### All upstream values should be present in this file

# logging to stderr means we get it in the systemd journal
KUBE_LOGTOSTDERR="--logtostderr=true"
KUBE_LOG_LEVEL="--v=2"
# The address for the info server to serve on (set to 0.0.0.0 or "" for all interfaces)
KUBELET_ADDRESS="--address=172.17.8.101 --node-ip=172.17.8.101"
# The port for the info server to serve on
# KUBELET_PORT="--port=10250"
# You may leave this blank to use the actual hostname
KUBELET_HOSTNAME="--hostname-override=k8s-1"

KUBELET_ARGS="--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf \
--kubeconfig=/etc/kubernetes/kubelet.conf \
--authentication-token-webhook \
--enforce-node-allocatable="" \
--client-ca-file=/etc/kubernetes/ssl/ca.crt \
--rotate-certificates \
--pod-manifest-path=/etc/kubernetes/manifests \
--pod-infra-container-image=gcr.io/google_containers/pause-amd64:3.1 \
--node-status-update-frequency=10s \
--cgroup-driver=cgroupfs \
--max-pods=110 \
--anonymous-auth=false \
--read-only-port=0 \
--fail-swap-on=True \
--runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice \
 --cluster-dns=10.233.0.3 --cluster-domain=cluster.local --resolv-conf=/etc/resolv.conf --kube-reserved cpu=200m,memory=512M --node-labels=  "
KUBELET_NETWORK_PLUGIN="--network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin"
KUBELET_VOLUME_PLUGIN="--volume-plugin-dir=/var/lib/kubelet/volume-plugins"
# Should this cluster be allowed to run privileged docker containers
KUBE_ALLOW_PRIV="--allow-privileged=true"
KUBELET_CLOUDPROVIDER=""

PATH=/opt/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
k8s-1 core # /opt/bin/kubectl run -it curl --image=radial/busyboxplus:curl sh
kubectl run --generator=deployment/apps.v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.
If you don't see a command prompt, try pressing enter.
[ root@curl-78bdd5c756-q77th:/ ]$ cat /etc/resolv.conf
nameserver 10.233.0.3
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

#4719 should clean up the kube-dns service upon deployment as a temporary fix while we wait for https://github.com/kubernetes/kubeadm/issues/1557

Thanks @woopstar for the explanation, but I have one doubt. https://github.com/kubernetes-sigs/kubespray/blob/master/roles/kubernetes/node/templates/kubelet.kubeadm.env.j2 is the file used to generate kubelet.env, and it sets the --cluster-dns flag from the skydns_server value when enable_nodelocaldns is false. That skydns_server value comes from https://github.com/kubernetes-sigs/kubespray/blob/master/inventory/sample/group_vars/k8s-cluster/k8s-cluster.yml, which in turn uses the value of kube_service_addresses (10.233.0.0/18), and the same value is used in the coredns service YAML. So ideally both values should be the same.
But the coredns cluster IP is 10.233.0.3, while clusterDNS is 10.233.0.10 in /var/lib/kubelet/config.yaml.
I assume changing the value of skydns_server to the following would solve this problem:
https://github.com/kubernetes-sigs/kubespray/blob/03bded2b6b4714762ddc1efd4252c05a65f983ac/inventory/sample/group_vars/k8s-cluster/k8s-cluster.yml#L143

skydns_server: "{{ kube_service_addresses|ipaddr('net')|ipaddr(10)|ipaddr('address') }}"

Please let me know if I am wrong.
Thanks.


You are a bit wrong. In the k8s-cluster.yml file, skydns_server is set by:

skydns_server: "{{ kube_service_addresses|ipaddr('net')|ipaddr(3)|ipaddr('address') }}"

That returns the third IP address in the range, as it should; that address is used in the coredns template and also in the kubelet's env file for the --cluster-dns flag.

Can you please provide me with the content of the kubelet's env file?
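For reference, that filter chain can be reproduced with Python's stdlib ipaddress module (a sketch; nth_service_ip is an illustrative helper, not part of Kubespray):

```python
import ipaddress

# Mirrors the Ansible/Jinja filter chain:
# "{{ kube_service_addresses | ipaddr('net') | ipaddr(N) | ipaddr('address') }}"
def nth_service_ip(cidr: str, n: int) -> str:
    """Return the n-th address inside the service CIDR."""
    return str(ipaddress.ip_network(cidr)[n])

kube_service_addresses = "10.233.0.0/18"
print(nth_service_ip(kube_service_addresses, 3))   # 10.233.0.3  (skydns_server / coredns svc)
print(nth_service_ip(kube_service_addresses, 10))  # 10.233.0.10 (what ipaddr(10) would give)
```

This shows why ipaddr(3) yields 10.233.0.3 for the default service range, while an index of 10 would yield the 10.233.0.10 seen in /var/lib/kubelet/config.yaml.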


@woopstar here is my kubelet.env file; it has the correct value for the --cluster-dns flag (10.233.0.3), but in /var/lib/kubelet/config.yaml the clusterDNS value (10.233.0.10) is different.

root@master-1:~# cat /etc/kubernetes/kubelet.env
### Upstream source https://github.com/kubernetes/release/blob/master/debian/xenial/kubeadm/channel/stable/etc/systemd/system/kubelet.service.d/
### All upstream values should be present in this file

# logging to stderr means we get it in the systemd journal
KUBE_LOGTOSTDERR="--logtostderr=true"
KUBE_LOG_LEVEL="--v=2"
# The address for the info server to serve on (set to 0.0.0.0 or "" for all interfaces)
KUBELET_ADDRESS="--address=161.X.X.X --node-ip=161.X.X.X"
# The port for the info server to serve on
# KUBELET_PORT="--port=10250"
# You may leave this blank to use the actual hostname
KUBELET_HOSTNAME="--hostname-override=master-1"

KUBELET_ARGS="--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf \
--kubeconfig=/etc/kubernetes/kubelet.conf \
--authentication-token-webhook \
--enforce-node-allocatable="" \
--client-ca-file=/etc/kubernetes/ssl/ca.crt \
--rotate-certificates \
--pod-manifest-path=/etc/kubernetes/manifests \
--pod-infra-container-image=gcr.io/google_containers/pause-amd64:3.1 \
--node-status-update-frequency=10s \
--cgroup-driver=cgroupfs \
--max-pods=110 \
--anonymous-auth=false \
--read-only-port=10255 \
--fail-swap-on=True \
--runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice \
--cluster-dns=10.233.0.3 --cluster-domain=cluster.local --resolv-conf=/etc/resolv.conf --kube-reserved cpu=200m,memory=512M --node-labels=node.kubernetes.io/master='' "
KUBELET_NETWORK_PLUGIN="--network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin"
KUBELET_VOLUME_PLUGIN="--volume-plugin-dir=/var/lib/kubelet/volume-plugins"
# Should this cluster be allowed to run privileged docker containers
KUBE_ALLOW_PRIV="--allow-privileged=true"
KUBELET_CLOUDPROVIDER=""

PATH=/opt/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

I still have the same doubt: how did the clusterDNS value get changed in /var/lib/kubelet/config.yaml?

/var/lib/kubelet/config.yaml is not used. It is created by kubeadm init, but we do not use it; we override the values on the kubelet with flags from the env file.
I'm working on a PR that uses the config file, though.
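In other words, the kubelet's effective configuration lets command-line flags win over the config file. A minimal sketch of that precedence (illustrative only; this is not kubelet source code, and effective_config is a hypothetical helper):

```python
# Sketch of flag-over-config precedence: values from the config file are the
# base, and any value also supplied as a CLI flag overrides it.
def effective_config(config_file: dict, cli_flags: dict) -> dict:
    merged = dict(config_file)
    merged.update(cli_flags)  # command-line flags override file values
    return merged

cfg = effective_config(
    {"clusterDNS": ["10.233.0.10"]},  # /var/lib/kubelet/config.yaml (kubeadm default)
    {"clusterDNS": ["10.233.0.3"]},   # --cluster-dns from kubelet.env
)
print(cfg["clusterDNS"])  # ['10.233.0.3'] -- the env file value wins
```

That is why the stale value in config.yaml is harmless here: the running kubelet picks up 10.233.0.3 from the flags.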


Great, @woopstar 👍 That will fix this issue.

On RHEL7, I still get the wrong clusterDNS=10.233.0.10

Still seeing this as well... Should we create a new issue?
