Kubespray: scale.yml does not populate existing node's /etc/hosts file

Created on 19 Feb 2018 · 11 comments · Source: kubernetes-sigs/kubespray

Is this a BUG REPORT or FEATURE REQUEST? (choose one):
BUG REPORT

Environment:

  • Cloud provider or hardware configuration:
    Aliyun, but relevant to all cloud providers
  • OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):
    Linux 4.4.0-105-generic x86_64
    NAME="Ubuntu"
    VERSION="16.04.3 LTS (Xenial Xerus)"
    ID=ubuntu
    ID_LIKE=debian
    PRETTY_NAME="Ubuntu 16.04.3 LTS"
    VERSION_ID="16.04"
    HOME_URL="http://www.ubuntu.com/"
    SUPPORT_URL="http://help.ubuntu.com/"
    BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
    VERSION_CODENAME=xenial
    UBUNTU_CODENAME=xenial

  • Version of Ansible (ansible --version):
    ansible 2.4.3.0

Kubespray version (commit) (git rev-parse --short HEAD):
f13e76d0

Network plugin used:
flannel

Copy of your inventory file:
[all:vars]

[all]
K8SMST1 ansible_host=10.209.184.26 ip=10.209.184.26
K8SMST2 ansible_host=10.209.184.28 ip=10.209.184.28
K8SETC1 ansible_host=10.209.184.29 ip=10.209.184.29
K8SETC2 ansible_host=10.209.184.30 ip=10.209.184.30
K8SETC3 ansible_host=10.209.184.31 ip=10.209.184.31
K8SWRK1 ansible_host=10.209.184.32 ip=10.209.184.32
K8SWRK2 ansible_host=10.209.184.33 ip=10.209.184.33
K8SWRK3 ansible_host=10.209.184.35 ip=10.209.184.35
K8SWRK4 ansible_host=10.209.184.34 ip=10.209.184.34
K8SWRK5 ansible_host=10.209.184.37 ip=10.209.184.37

[kube-master]
K8SMST1
K8SMST2

[kube-node]
K8SWRK1
K8SWRK2
K8SWRK3
K8SWRK4
K8SWRK5

[etcd]
K8SETC1
K8SETC2
K8SETC3

[k8s-cluster:children]
kube-node
kube-master

[calico-rr]

[vault]

Command used to invoke ansible:
ansible-playbook -i clusters//inventory.cfg scale.yml -b -v --key-file=.pem -u root

Output of ansible run:

2018-02-17 07:36:33,445 p=21191 u=root | TASK [network_plugin/cloud : Cloud | Set cni directory permissions] ****************************
2018-02-17 07:36:33,446 p=21191 u=root | Saturday 17 February 2018 07:36:33 +0800 (0:00:00.228) 0:14:20.586

2018-02-17 07:36:33,663 p=21191 u=root | PLAY RECAP ***********************************************
2018-02-17 07:36:33,663 p=21191 u=root | K8SWRK1 : ok=222 changed=6 unreachable=0 failed=0
2018-02-17 07:36:33,663 p=21191 u=root | K8SWRK2 : ok=196 changed=2 unreachable=0 failed=0
2018-02-17 07:36:33,664 p=21191 u=root | K8SWRK3 : ok=196 changed=2 unreachable=0 failed=0
2018-02-17 07:36:33,664 p=21191 u=root | K8SWRK4 : ok=196 changed=2 unreachable=0 failed=0
2018-02-17 07:36:33,664 p=21191 u=root | K8SWRK5 : ok=220 changed=52 unreachable=0 failed=0
2018-02-17 07:36:33,664 p=21191 u=root | localhost : ok=2 changed=0 unreachable=0 failed=0
2018-02-17 07:36:33,665 p=21191 u=root | Saturday 17 February 2018 07:36:33 +0800 (0:00:00.219) 0:14:20.805

Anything else do we need to know:

Currently the scale.yml playbook does not populate the /etc/hosts files on existing nodes in the cluster. This causes issues when trying to run kubectl commands against pods on the newly added nodes. For example, I added K8SWRK5 to a cluster using scale.yml and the playbook ran to completion, but I could not get logs from or exec into any pod on that node. I would get the following error:

 kubectl exec -it test-box-6d5b44f485-hsxlw bash
Error from server: error dialing backend: dial tcp: lookup k8swrk5 on 100.100.2.136:53: no such host

scale.yml should populate the /etc/hosts file on all existing nodes, and then restart the relevant containers on each node to ensure the updated /etc/hosts file is actually used.
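For reference, here is a minimal sketch of the kind of play I mean (illustrative only, not Kubespray's actual role; the group names and the ip hostvar follow the inventory above):

- hosts: k8s-cluster:etcd
  become: true
  tasks:
    - name: Populate all inventory hosts into /etc/hosts (example)
      blockinfile:
        path: /etc/hosts
        marker: "# {mark} EXAMPLE MANAGED BLOCK"
        block: |
          {% for host in groups['k8s-cluster'] | union(groups['etcd']) %}
          {{ hostvars[host]['ip'] }} {{ host | lower }}
          {% endfor %}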


Most helpful comment

At the moment scale.yml does populate /etc/hosts on the existing nodes, but only on the workers. It does not do this on the masters, so the addresses of newly added nodes never appear in the masters' hosts files.

All 11 comments

You could look at adding:

apiserver_custom_flags:
  - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
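
In a Kubespray inventory that would typically live in your cluster group vars, for example (the exact path depends on your layout and is only illustrative):

# e.g. inventory/group_vars/k8s-cluster.yml
apiserver_custom_flags:
  - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname

Note that the flag only takes effect once the apiserver manifest is regenerated, e.g. by re-running the relevant play.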

That would be a nice workaround, but scale.yml should really populate the existing nodes' /etc/hosts files, just to ensure that the cluster's nodes stay consistent.

At the moment scale.yml does populate /etc/hosts on the existing nodes, but only on the workers. It does not do this on the masters, so the addresses of newly added nodes never appear in the masters' hosts files.

How do you solve this issue? Restart kubelet and the API server container on every master?

That is very inconvenient.

We have to manually update /etc/hosts and restart the API server to make up for scale.yml's shortcoming. :(

@Arslanbekov Unfortunately I think that is the only way to solve this. For multi-master clusters it is not a big deal: you restart one master's kubelet and API server, wait for both services to become ready, and then continue on to the other master. For single-master setups I am not sure how that would work.
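
For anyone else hitting this, a rough sketch of that manual fix (the hostname/IP pair comes from the inventory above and is purely illustrative; on multi-master clusters do one master at a time, and how you restart the apiserver depends on how it is run):

# On each master, add the new node to /etc/hosts (example entry for K8SWRK5)
echo "10.209.184.37 k8swrk5" | sudo tee -a /etc/hosts

# Restart kubelet so it picks up the change
sudo systemctl restart kubelet

# Restart the kube-apiserver container as well (find its ID first;
# <container-id> is a placeholder)
docker ps | grep kube-apiserver
docker restart <container-id>

# Wait for the apiserver to answer again before moving on to the next master
kubectl get nodes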

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

We solved it by running Ansible again at the end with --tags=etchosts.
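
i.e. something along these lines (which playbook you re-run depends on your setup; cluster.yml also covers the masters, and the inventory/key paths mirror the original invocation, left elided as there):

ansible-playbook -i clusters//inventory.cfg cluster.yml -b --key-file=.pem -u root --tags=etchosts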

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
