Kubespray: "Copy kubectl binary to ansible host" sometimes fails with memory error due to known issue with 'fetch' module.

Created on 9 Aug 2019 · 8 Comments · Source: kubernetes-sigs/kubespray


 
Environment:

  • Cloud provider or hardware configuration:
     
    Master Nodes:
    3 x VMware VMs (3.9 GB RAM)
     
    Worker Nodes:
    2 x NVIDIA DGX-1 servers (bare-metal) (503 GB RAM)
     
  • OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):
     
    Master Nodes:
    Ubuntu 18.04:

Linux 4.15.0-54-generic x86_64
NAME="Ubuntu"
VERSION="18.04.2 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.2 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
 
Worker Nodes:
DGX-OS, based on Ubuntu 18.04:

Linux 4.15.0-55-generic x86_64
NAME="Ubuntu"
VERSION="18.04.2 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.2 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
 

  • Version of Ansible (ansible --version):
     
    ansible 2.7.11
      config file = None
      configured module search path = [u'/home/cpoc/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
      ansible python module location = /usr/local/lib/python2.7/dist-packages/ansible
      executable location = /usr/local/bin/ansible
      python version = 2.7.15+ (default, Nov 27 2018, 23:36:35) [GCC 7.3.0]
     
    Kubespray version (commit) (git rev-parse --short HEAD):
     
    7d8da83
     
    Network plugin used:
     
    calico
     
    Copy of your inventory file:
     
    Note: Kubespray invoked via NVIDIA DeepOps (https://github.com/NVIDIA/deepops)

[kube-master]
10.61.218.131
10.61.218.132
10.61.218.133
 
[etcd]
10.61.218.131
10.61.218.132
10.61.218.133
 
[kube-node]
10.61.218.152
10.61.218.154
 
[k8s-cluster:children]
kube-master
kube-node
 
Command used to invoke ansible:
 
Note: Kubespray invoked via NVIDIA DeepOps (https://github.com/NVIDIA/deepops)
 
ansible-playbook -l k8s-cluster playbooks/k8s-cluster.yml -K
 
Output of ansible run:

TASK [kubernetes/client : Copy kubectl binary to ansible host] ****************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: MemoryError
fatal: [10.61.218.131]: FAILED! => {"msg": "Unexpected failure during module execution.", "stdout": ""}

NO MORE HOSTS LEFT ********************************
to retry, use: --limit @/home/cpoc/test/deepops/playbooks/k8s-cluster.retry

PLAY RECAP **********************************
10.61.218.131 : ok=312 changed=13 unreachable=0 failed=1
10.61.218.132 : ok=282 changed=12 unreachable=0 failed=0
10.61.218.133 : ok=282 changed=12 unreachable=0 failed=0
10.61.218.152 : ok=238 changed=15 unreachable=0 failed=0
10.61.218.154 : ok=238 changed=15 unreachable=0 failed=0
 
Anything else we need to know:

 
There is a known issue with the 'fetch' module that will sometimes lead to it failing with a memory error. See https://github.com/ansible/ansible/issues/11702. I encountered this issue with the "Copy kubectl binary to ansible host" task in kubespray/roles/kubernetes/client/tasks/main.yml, and it caused my entire deployment to error out (see "Output of ansible run" above).
 
I would like to suggest the following change to the "Copy kubectl binary to ansible host" task in kubespray/roles/kubernetes/client/tasks/main.yml as this resolved the issue for me:

- name: Copy kubectl binary to ansible host
  # Replace fetch with synchronize due to memory error. Original fetch code is commented out.
  #fetch:
  #  src: "{{ bin_dir }}/kubectl"
  #  dest: "{{ artifacts_dir }}/kubectl"
  #  flat: yes
  #  validate_checksum: no
  synchronize:
    src: "{{ bin_dir }}/kubectl"
    dest: "{{ artifacts_dir }}/kubectl"
  become: no
  run_once: yes
  when: kubectl_localhost|default(false)

I would be happy to submit a pull request for this if you would like me to.

kind/bug

All 8 comments

/assign @mboglesby Thanks!

synchronize needs SSH key-based authentication, and it breaks runs that use the --ask-pass option.

Maybe add 1 GB more RAM to the Ansible host?

As the pull request was reverted, can this be reopened? My VM with 4G of ram from https://github.com/NVIDIA/deepops/blob/master/docs/kubernetes-cluster.md fails on this.

Try applying the patch from the PR like this:

wget https://github.com/kubernetes-sigs/kubespray/commit/07ecef86e3f81e17221d89f8ea64ce54328ebfea.patch
patch kubespray/roles/kubernetes/client/tasks/main.yml 07ecef86e3f81e17221d89f8ea64ce54328ebfea.patch

Also try adding "mode: pull" to the synchronize task.
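A sketch of the task proposed in the original report with "mode: pull" added. It assumes the documented semantics of the synchronize module: in the default push mode the Ansible controller is the source and the remote host is the destination, while in pull mode the remote host (here the first master) is the source and the controller is the destination, which is the direction a "copy to ansible host" task needs. The "Ensure artifacts directory exists on ansible host" task is a hypothetical addition, not part of kubespray, included on the assumption that artifacts_dir may not yet exist on the controller. This is not necessarily what the referenced commit contains.

```yaml
# Hypothetical helper task (not in kubespray): create the destination
# directory on the Ansible controller in case artifacts_dir does not exist yet.
- name: Ensure artifacts directory exists on ansible host
  file:
    path: "{{ artifacts_dir }}"
    state: directory
  delegate_to: localhost
  become: no
  run_once: yes
  when: kubectl_localhost|default(false)

- name: Copy kubectl binary to ansible host
  # Pull mode: src is read on the remote (master) host,
  # dest is written on the Ansible controller.
  synchronize:
    src: "{{ bin_dir }}/kubectl"
    dest: "{{ artifacts_dir }}/kubectl"
    mode: pull
  become: no
  run_once: yes
  when: kubectl_localhost|default(false)
```

Because synchronize runs rsync over SSH, this variant is still subject to the SSH key requirement mentioned earlier in the thread.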

I commented out the step and copied kubectl manually to continue my install. But I can't believe I am the only person to run into this problem. Is this not considered to be an issue?
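Instead of commenting out the task, it may be possible to skip it through configuration, since the task is guarded by "when: kubectl_localhost|default(false)". A minimal sketch, assuming the variable is set in the inventory's group_vars for the k8s-cluster group (the exact file, for example group_vars/k8s-cluster/k8s-cluster.yml, is an assumption and may differ in a DeepOps checkout):

```yaml
# Assumed location: inventory group_vars for the k8s-cluster group.
# With this set to false, "Copy kubectl binary to ansible host" is skipped,
# because the task only runs when kubectl_localhost|default(false) is true.
kubectl_localhost: false
```

If a wrapper such as DeepOps sets this variable elsewhere, passing -e kubectl_localhost=false to ansible-playbook should override it, because extra vars have the highest precedence in Ansible. kubectl would then still need to be copied from a master node by hand.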

@cnf You are not. I hit it as well (also using DeepOps...hmm...)

I get this error if using the latest version of kubespray:

TASK [kubernetes/client : Copy kubectl binary to ansible host] ********************************
Wednesday 25 September 2019  15:48:07 +0200 (0:00:00.474)       0:24:45.188 *** 
fatal: [ops-k1m01.embl.de]: FAILED! => {"changed": false, "cmd": "/usr/bin/rsync --delay-updates -F --compress --archive --rsh=/usr/bin/ssh -S none -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null --out-format=<<CHANGED>>%i %n%L /usr/local/bin/kubectl 10.11.4.189:/root/kubespray/inventory/ops_k1/artifacts/kubectl", "msg": "Warning: Permanently added '10.11.4.189' (ECDSA) to the list of known hosts.\r\nrsync: change_dir#3 \"/root/kubespray/inventory/ops_k1/artifacts\" failed: No such file or directory (2)\nrsync error: errors selecting input/output files, dirs (code 3) at main.c(695) [Receiver=3.1.2]\n", "rc": 3}

It is trying to copy kubectl to the master node, into a kubespray directory that doesn't exist there.

@cnf I hit it as well.

@titansmc same error here as well.
