Kubespray: "Copy kubectl binary to ansible host" sometimes fails with memory error due to known issue with 'fetch' module.

Created on 9 Aug 2019 · 8Comments · Source: kubernetes-sigs/kubespray

Environment:

Cloud provider or hardware configuration:

Master Nodes:
3 x VMware VMs (3.9 GB RAM)

Worker Nodes:
2 x NVIDIA DGX-1 servers (bare-metal) (503 GB RAM)
OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):

Master Nodes:
Ubuntu 18.04:

Linux 4.15.0-54-generic x86_64
NAME="Ubuntu"
VERSION="18.04.2 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.2 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

Worker Nodes:
DGX-OS, based on Ubuntu 18.04:

Linux 4.15.0-55-generic x86_64
NAME="Ubuntu"
VERSION="18.04.2 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.2 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

Version of Ansible (ansible --version):

ansible 2.7.11
config file = None
configured module search path = [u'/home/cpoc/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
ansible python module location = /usr/local/lib/python2.7/dist-packages/ansible
executable location = /usr/local/bin/ansible
python version = 2.7.15+ (default, Nov 27 2018, 23:36:35) [GCC 7.3.0]

Kubespray version (commit) (git rev-parse --short HEAD):

7d8da83

Network plugin used:

calico

Copy of your inventory file:

Note: Kubespray invoked via NVIDIA DeepOps (https://github.com/NVIDIA/deepops)

[kube-master]
10.61.218.131
10.61.218.132
10.61.218.133

[etcd]
10.61.218.131
10.61.218.132
10.61.218.133

[kube-node]
10.61.218.152
10.61.218.154

[k8s-cluster:children]
kube-master
kube-node

Command used to invoke ansible:

Note: Kubespray invoked via NVIDIA DeepOps (https://github.com/NVIDIA/deepops)

ansible-playbook -l k8s-cluster playbooks/k8s-cluster.yml -K

Output of ansible run:

TASK [kubernetes/client : Copy kubectl binary to ansible host] ****************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: MemoryError
fatal: [10.61.218.131]: FAILED! => {"msg": "Unexpected failure during module execution.", "stdout": ""}

NO MORE HOSTS LEFT ********************************
to retry, use: --limit @/home/cpoc/test/deepops/playbooks/k8s-cluster.retry

PLAY RECAP **********************************
10.61.218.131 : ok=312 changed=13 unreachable=0 failed=1
10.61.218.132 : ok=282 changed=12 unreachable=0 failed=0
10.61.218.133 : ok=282 changed=12 unreachable=0 failed=0
10.61.218.152 : ok=238 changed=15 unreachable=0 failed=0
10.61.218.154 : ok=238 changed=15 unreachable=0 failed=0

Anything else do we need to know:

There is a known issue with the 'fetch' module that will sometimes lead to it failing with a memory error. See https://github.com/ansible/ansible/issues/11702. I encountered this issue with the "Copy kubectl binary to ansible host" task in kubespray/roles/kubernetes/client/tasks/main.yml, and it caused my entire deployment to error out (see "Output of ansible run" above).

I would like to suggest the following change to the "Copy kubectl binary to ansible host" task in kubespray/roles/kubernetes/client/tasks/main.yml as this resolved the issue for me:

- name: Copy kubectl binary to ansible host
# Replace fetch with synchronize due to memory error. Original fetch code is commented out.
#fetch:
# src: "{{ bin_dir }}/kubectl"
# dest: "{{ artifacts_dir }}/kubectl"
# flat: yes
# validate_checksum: no
synchronize:
src: "{{ bin_dir }}/kubectl"
dest: "{{ artifacts_dir }}/kubectl"
become: no
run_once: yes
when: kubectl_localhost|default(false)

I would be happy to submit a pull request for this if you would like me to.

kinbug

Source

mboglesby

Most helpful comment

I get this error if using the latest version of kubespray:

TASK [kubernetes/client : Copy kubectl binary to ansible host] ********************************
Wednesday 25 September 2019  15:48:07 +0200 (0:00:00.474)       0:24:45.188 *** 
fatal: [ops-k1m01.embl.de]: FAILED! => {"changed": false, "cmd": "/usr/bin/rsync --delay-updates -F --compress --archive --rsh=/usr/bin/ssh -S none -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null --out-format=<<CHANGED>>%i %n%L /usr/local/bin/kubectl 10.11.4.189:/root/kubespray/inventory/ops_k1/artifacts/kubectl", "msg": "Warning: Permanently added '10.11.4.189' (ECDSA) to the list of known hosts.\r\nrsync: change_dir#3 \"/root/kubespray/inventory/ops_k1/artifacts\" failed: No such file or directory (2)\nrsync error: errors selecting input/output files, dirs (code 3) at main.c(695) [Receiver=3.1.2]\n", "rc": 3}

It is trying to copy kubectl to the master node inside a kubespray directory (which doesn't exist)

titansmc on 25 Sep 2019

👍4

All 8 comments

/assign @mboglesby Thanks！

riverzhang on 10 Aug 2019

synchromize need ssh key authority, and broken run with --ask-pass option

may be you add more 1gb RAM to ansible host ?

LuckySB on 20 Aug 2019

As the pull request was reverted, can this be reopened? My VM with 4G of ram from https://github.com/NVIDIA/deepops/blob/master/docs/kubernetes-cluster.md fails on this.

cnf on 10 Sep 2019

As the pull request was reverted, can this be reopened? My VM with 4G of ram from https://github.com/NVIDIA/deepops/blob/master/docs/kubernetes-cluster.md fails on this.

Try to use patch from PR like this:

wget https://github.com/kubernetes-sigs/kubespray/commit/07ecef86e3f81e17221d89f8ea64ce54328ebfea.patch
patch kubespray/roles/kubernetes/client/tasks/main.yml 07ecef86e3f81e17221d89f8ea64ce54328ebfea.patch

Also try to add "mode: pull" in synchronize.

dalfos on 13 Sep 2019

I commented out the step, and copied manually to continue my install. But I can't believe I am the only person to run into this problem. Is this not considered to be an issue?

cnf on 13 Sep 2019

@cnf You are not. I hit it as well (also using DeepOps...hmm...)

ScottESanDiego on 14 Sep 2019

👀1

I get this error if using the latest version of kubespray:

TASK [kubernetes/client : Copy kubectl binary to ansible host] ********************************
Wednesday 25 September 2019  15:48:07 +0200 (0:00:00.474)       0:24:45.188 *** 
fatal: [ops-k1m01.embl.de]: FAILED! => {"changed": false, "cmd": "/usr/bin/rsync --delay-updates -F --compress --archive --rsh=/usr/bin/ssh -S none -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null --out-format=<<CHANGED>>%i %n%L /usr/local/bin/kubectl 10.11.4.189:/root/kubespray/inventory/ops_k1/artifacts/kubectl", "msg": "Warning: Permanently added '10.11.4.189' (ECDSA) to the list of known hosts.\r\nrsync: change_dir#3 \"/root/kubespray/inventory/ops_k1/artifacts\" failed: No such file or directory (2)\nrsync error: errors selecting input/output files, dirs (code 3) at main.c(695) [Receiver=3.1.2]\n", "rc": 3}

It is trying to copy kubectl to the master node inside a kubespray directory (which doesn't exist)

titansmc on 25 Sep 2019

👍4

@cnf I hit it as well.

@titansmc same error here as well.