Environment:
Output of printf "$(uname -srm)\n$(cat /etc/os-release)\n":
Linux 4.15.0-54-generic x86_64
NAME="Ubuntu"
VERSION="18.04.2 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.2 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
Worker Nodes:
DGX-OS, based on Ubuntu 18.04:
Linux 4.15.0-55-generic x86_64
NAME="Ubuntu"
VERSION="18.04.2 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.2 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
Version of Ansible (ansible --version):
Kubespray version (commit) (git rev-parse --short HEAD):
Inventory:
[kube-master]
10.61.218.131
10.61.218.132
10.61.218.133
[etcd]
10.61.218.131
10.61.218.132
10.61.218.133
[kube-node]
10.61.218.152
10.61.218.154
[k8s-cluster:children]
kube-master
kube-node
Command used to invoke ansible:
Note: Kubespray invoked via NVIDIA DeepOps (https://github.com/NVIDIA/deepops)
ansible-playbook -l k8s-cluster playbooks/k8s-cluster.yml -K
Output of ansible run:
TASK [kubernetes/client : Copy kubectl binary to ansible host] ****************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: MemoryError
fatal: [10.61.218.131]: FAILED! => {"msg": "Unexpected failure during module execution.", "stdout": ""}
NO MORE HOSTS LEFT ********************************
to retry, use: --limit @/home/cpoc/test/deepops/playbooks/k8s-cluster.retry
PLAY RECAP **********************************
10.61.218.131 : ok=312 changed=13 unreachable=0 failed=1
10.61.218.132 : ok=282 changed=12 unreachable=0 failed=0
10.61.218.133 : ok=282 changed=12 unreachable=0 failed=0
10.61.218.152 : ok=238 changed=15 unreachable=0 failed=0
10.61.218.154 : ok=238 changed=15 unreachable=0 failed=0
Anything else we need to know:
There is a known issue with the 'fetch' module that will sometimes lead to it failing with a memory error. See https://github.com/ansible/ansible/issues/11702. I encountered this issue with the "Copy kubectl binary to ansible host" task in kubespray/roles/kubernetes/client/tasks/main.yml, and it caused my entire deployment to error out (see "Output of ansible run" above).
I would like to suggest the following change to the "Copy kubectl binary to ansible host" task in kubespray/roles/kubernetes/client/tasks/main.yml as this resolved the issue for me:
- name: Copy kubectl binary to ansible host
  # Replace fetch with synchronize due to memory error. Original fetch code is commented out.
  #fetch:
  #  src: "{{ bin_dir }}/kubectl"
  #  dest: "{{ artifacts_dir }}/kubectl"
  #  flat: yes
  #  validate_checksum: no
  synchronize:
    src: "{{ bin_dir }}/kubectl"
    dest: "{{ artifacts_dir }}/kubectl"
  become: no
  run_once: yes
  when: kubectl_localhost|default(false)
I would be happy to submit a pull request for this if you would like me to.
/assign @mboglesby Thanks!
synchronize requires SSH key authentication, and it breaks when run with the --ask-pass option.
Maybe add 1 GB more RAM to the ansible host?
As the pull request was reverted, can this be reopened? My VM with 4G of ram from https://github.com/NVIDIA/deepops/blob/master/docs/kubernetes-cluster.md fails on this.
Try applying the patch from the PR, like this:
wget https://github.com/kubernetes-sigs/kubespray/commit/07ecef86e3f81e17221d89f8ea64ce54328ebfea.patch
patch kubespray/roles/kubernetes/client/tasks/main.yml 07ecef86e3f81e17221d89f8ea64ce54328ebfea.patch
Also try adding "mode: pull" to the synchronize task.
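For reference, a minimal sketch of the suggested synchronize task with mode: pull added (variable names mirror the task above; this is an illustration, not necessarily the exact upstream patch):

```yaml
- name: Copy kubectl binary to ansible host
  synchronize:
    src: "{{ bin_dir }}/kubectl"
    dest: "{{ artifacts_dir }}/kubectl"
    # pull: run rsync on the ansible host and fetch the file from the
    # managed node, instead of pushing from the node to the host
    mode: pull
  become: no
  run_once: yes
  when: kubectl_localhost|default(false)
```

In pull mode the rsync connection is initiated from the ansible host, which avoids the node needing SSH access back to the controller.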
I commented out the step and copied the binary manually to continue my install. But I can't believe I'm the only person to run into this problem. Is this not considered an issue?
@cnf You are not. I hit it as well (also using DeepOps...hmm...)
I get this error if using the latest version of kubespray:
TASK [kubernetes/client : Copy kubectl binary to ansible host] ********************************
Wednesday 25 September 2019 15:48:07 +0200 (0:00:00.474) 0:24:45.188 ***
fatal: [ops-k1m01.embl.de]: FAILED! => {"changed": false, "cmd": "/usr/bin/rsync --delay-updates -F --compress --archive --rsh=/usr/bin/ssh -S none -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null --out-format=<<CHANGED>>%i %n%L /usr/local/bin/kubectl 10.11.4.189:/root/kubespray/inventory/ops_k1/artifacts/kubectl", "msg": "Warning: Permanently added '10.11.4.189' (ECDSA) to the list of known hosts.\r\nrsync: change_dir#3 \"/root/kubespray/inventory/ops_k1/artifacts\" failed: No such file or directory (2)\nrsync error: errors selecting input/output files, dirs (code 3) at main.c(695) [Receiver=3.1.2]\n", "rc": 3}
It is trying to copy kubectl into a kubespray directory on the master node, which doesn't exist.
@cnf I hit it as well.
@titansmc same error here as well.
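One possible workaround for the missing-directory error above (a sketch, assuming the same artifacts_dir variable as the task in this thread) is to create the destination directory on the ansible host before synchronize runs, using the file module:

```yaml
- name: Ensure the artifacts directory exists on the ansible host
  file:
    path: "{{ artifacts_dir }}"
    state: directory
  become: no
  run_once: yes
  # run this on the controller, since that is where rsync writes the file
  delegate_to: localhost
```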