Hi!
I have a cluster: one master and two slaves:
[root@node1 ~]# kubectl get nodes
NAME      STATUS    ROLES         AGE       VERSION
node1     Ready     master,node   7h        v1.11.2
node2     Ready     node          7h        v1.11.2
node3     Ready     node          7h        v1.11.2
When I try to add two additional nodes (a master and a slave), I see the following error:
ansible-playbook-2.7 -i inventory/mycluster/hosts.ini scale.yml -b -v --limit node4,node5
................
................
TASK [etcd : include_tasks] ***********************************************
Tuesday 04 September 2018 22:49:08 +0300 (0:00:00.414) 0:00:21.395 *
included: /root/projects/kuberspray-new/roles/etcd/tasks/gen_certs_script.yml for node4, node5
TASK [etcd : Gen_certs | create etcd cert dir] *****************************************
Tuesday 04 September 2018 22:49:09 +0300 (0:00:00.288) 0:00:21.684
fatal: [node4]: FAILED! => {"changed": false, "gid": 0, "group": "root", "mode": "0755", "msg": "chown failed: failed to look up user kube", "owner": "root", "path": "/etc/ssl/etcd", "size": 4096, "state": "directory", "uid": 0}
fatal: [node5]: FAILED! => {"changed": false, "gid": 0, "group": "root", "mode": "0755", "msg": "chown failed: failed to look up user kube", "owner": "root", "path": "/etc/ssl/etcd", "size": 4096, "state": "directory", "uid": 0}
NO MORE HOSTS LEFT **************************************************
to retry, use: --limit @/root/projects/kuberspray-new/scale.retry
PLAY RECAP ******************************************************
node4 : ok=17 changed=3 unreachable=0 failed=1
node5 : ok=15 changed=3 unreachable=0 failed=1
Please help me!
When I tried:
ansible-playbook-2.7 -i inventory/mycluster/hosts.ini cluster.yml -b -v --limit node4,node5
I got this error:
TASK [etcd : Configure | Ensure etcd-events is running] **************************************
Tuesday 04 September 2018 23:13:53 +0300 (0:00:00.744) 0:08:57.022
TASK [etcd : Configure | Check if etcd cluster is healthy] *************************************
Tuesday 04 September 2018 23:13:53 +0300 (0:00:00.186) 0:08:57.209
FAILED - RETRYING: Configure | Check if etcd cluster is healthy (4 retries left).
ok: [node4] => {"attempts": 1, "changed": false, "cmd": "/usr/local/bin/etcdctl --endpoints=https://159.69.156.5:2379,https://159.69.156.4:2379,https://159.69.8.218:2379,https://159.69.157.250:2379,https://159.69.146.137:2379 cluster-health | grep -q 'cluster is healthy'", "delta": "0:00:00.129497", "end": "2018-09-04 22:13:55.157243", "rc": 0, "start": "2018-09-04 22:13:55.027746", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
FAILED - RETRYING: Configure | Check if etcd cluster is healthy (3 retries left).
FAILED - RETRYING: Configure | Check if etcd cluster is healthy (2 retries left).
FAILED - RETRYING: Configure | Check if etcd cluster is healthy (1 retries left).
fatal: [node5]: FAILED! => {"attempts": 4, "changed": false, "cmd": "/usr/local/bin/etcdctl --endpoints=https://159.69.156.5:2379,https://159.69.156.4:2379,https://159.69.8.218:2379,https://159.69.157.250:2379,https://159.69.146.137:2379 cluster-health | grep -q 'cluster is healthy'", "delta": "0:00:00.025962", "end": "2018-09-04 22:14:28.112431", "msg": "non-zero return code", "rc": 1, "start": "2018-09-04 22:14:28.086469", "stderr": "Error: open /etc/ssl/etcd/ssl/admin-node5.pem: no such file or directory", "stderr_lines": ["Error: open /etc/ssl/etcd/ssl/admin-node5.pem: no such file or directory"], "stdout": "", "stdout_lines": []}
NO MORE HOSTS LEFT **************************************************
to retry, use: --limit @/root/projects/kuberspray-new/cluster.retry
PLAY RECAP ******************************************************
node4 : ok=248 changed=57 unreachable=0 failed=0
node5 : ok=240 changed=57 unreachable=0 failed=1
node5 is a master with etcd.
The cert on node5 does not exist:
ls /etc/ssl/etcd/ssl/admin-node5.pem
ls: cannot access /etc/ssl/etcd/ssl/admin-node5.pem: No such file or directory
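For comparison, the same health check can be run by hand from a node whose certs do exist (node1 here); the key and CA file names below are assumptions based on kubespray's usual /etc/ssl/etcd/ssl layout, so adjust as needed:
# Sketch: run the etcd v2 health check from an existing member (key/CA file names are assumptions)
/usr/local/bin/etcdctl \
  --endpoints=https://159.69.156.5:2379 \
  --cert-file=/etc/ssl/etcd/ssl/admin-node1.pem \
  --key-file=/etc/ssl/etcd/ssl/admin-node1-key.pem \
  --ca-file=/etc/ssl/etcd/ssl/ca.pem \
  cluster-health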
Did you try the scale.yml playbook? AFAIK, the cluster.yml playbook should be used for bootstrapping only.
I'm running into the same issue :)
I am also running into the same issue using the scale.yml playbook.
When users are being created from the etcd role, the etcd user gets created, but not the kube user. The nodes originally provisioned with the cluster.yml playbook all have the kube user, so the scale.yml playbook is skipping over this user.
OS: Ubuntu 16.04
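A quick way to confirm this is to check for the user on an existing node versus the freshly added ones; this is just a sketch using an ad-hoc Ansible command against the same inventory:
# Sketch: verify the kube user exists on node1 but is missing on node4/node5
ansible -i inventory/mycluster/hosts.ini node1,node4,node5 -b -m shell -a "id kube"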
I was able to have a successful scale.yml run by modifying the etcd role to add the kube user. I can submit a PR if no one sees a problem with this solution.
kubespray/roles/etcd/meta/main.yml
---
dependencies:
  - role: adduser
    user: "{{ addusers.etcd }}"
    when: not (ansible_os_family in ['CoreOS', 'Container Linux by CoreOS'] or is_atomic)
  - role: adduser
    user: "{{ addusers.kube }}"
    when: not (ansible_os_family in ['CoreOS', 'Container Linux by CoreOS'] or is_atomic)
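Until something like that lands, a possible manual workaround is to pre-create the user on the new nodes before rerunning scale.yml; the exact flags kubespray's adduser role uses may differ, so treat this as an assumption-laden sketch:
# Sketch only: pre-create the kube group/user on the new nodes (useradd flags are assumptions)
ansible -i inventory/mycluster/hosts.ini node4,node5 -b -m shell \
  -a "getent group kube || groupadd -r kube; id kube || useradd -r -g kube -s /sbin/nologin kube"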
I was able to scale the cluster just by rerunning cluster.yml and changing the inventory.
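For reference, the inventory change is just adding the new hosts to the groups they should join in hosts.ini; the layout below is a sketch following the kubespray 2.x hosts.ini conventions (IP and variable lines omitted):
# inventory/mycluster/hosts.ini (sketch; node5 joins master/etcd, node4 joins the workers)
[kube-master]
node1
node5

[etcd]
node1
node5

[kube-node]
node2
node3
node4
node5

[k8s-cluster:children]
kube-master
kube-node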
Same issue when using scale.yml.
The kube user is not created on the new nodes added to the inventory.
I'm running into the same problem. Anything new on it? Any solution?
Just ran into the same issue on release-2.9 branch. Isn't scale.yml meant to be used for exactly the case when a freshly installed machine should be provisioned as a new node added to an existing cluster?
I have created a PR (https://github.com/kubernetes-sigs/kubespray/pull/4479) based on https://github.com/kubernetes-sigs/kubespray/issues/3240#issuecomment-439530783
Same thing here. I'm forced to run cluster.yml to workaround this.