Kind: docker readiness timeout may be too low on overloaded machines

Created on 26 Mar 2019  ·  18 Comments  ·  Source: kubernetes-sigs/kind

Sometimes there is a panic when using kind to create an HA cluster.

[zhang@localhost kind]$ kind create cluster  --config kind-config-ha.yaml           
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.13.4) 🖼
 ✗ Preparing nodes 📦📦📦📦📦
ERRO[17:36:55] timed out waiting for docker to be ready on node kind-control-plane                       
panic: send on closed channel

goroutine 11 [running]:
sigs.k8s.io/kind/pkg/cluster/internal/create.createNodeContainers.func1(0xc00008a300, 0xc000397300, 0x1d,
0xc0002c0000, 0xc000117bc0)
        /home/zhang/go/src/sigs.k8s.io/kind/pkg/cluster/internal/create/nodes.go:114 +0x9c             
created by sigs.k8s.io/kind/pkg/cluster/internal/create.createNodeContainers                             
        /home/zhang/go/src/sigs.k8s.io/kind/pkg/cluster/internal/create/nodes.go:105 +0x2e0

Maybe the channel has already been closed by the time the goroutine sends on it.

https://github.com/kubernetes-sigs/kind/blob/master/pkg/cluster/internal/create/nodes.go#L96-L135
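For context, Go panics with "send on closed channel" whenever any goroutine sends after the channel has been closed. A minimal sketch of the safe ordering, closing only after every sender has finished (illustrative only, not kind's actual code):

package main

import (
	"fmt"
	"sync"
)

func main() {
	errCh := make(chan error)
	var wg sync.WaitGroup

	// Each "node" goroutine reports an error on errCh.
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			errCh <- fmt.Errorf("node %d timed out", id)
		}(i)
	}

	// Close errCh strictly after all senders are done; closing it earlier
	// (e.g. as soon as the first error arrives) while another worker still
	// wants to report is exactly what produces this panic.
	go func() {
		wg.Wait()
		close(errCh)
	}()

	for err := range errCh {
		fmt.Println(err)
	}
}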

/cc @BenTheElder

kind/bug priority/important-soon

Most helpful comment

perhaps let's bump it to 60s for now, add a TODO, and come back to it then 🤔

All 18 comments

@tao12345666333
can you please share your config file and the system specification?

/kind bug
/priority important-soon

I hit the same issue several times, mainly when using a large number of nodes.

ERRO[17:36:55] timed out waiting for docker to be ready on node kind-control-plane

The problem is that the current timeout is set to 30 seconds:
https://github.com/kubernetes-sigs/kind/blob/f5fe35507a94031d8bf5221da61c179da98a32e0/pkg/cluster/internal/create/nodes.go#L161
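The shape of that logic is essentially a readiness probe raced against a fixed deadline. A minimal self-contained sketch of the pattern (the probe command and helper names here are my own assumptions for illustration, not kind's actual code):

package main

import (
	"errors"
	"fmt"
	"os/exec"
	"time"
)

// waitUntilReady polls probe once per second until it succeeds or the
// deadline passes.
func waitUntilReady(probe func() bool, timeout time.Duration) error {
	deadline := time.After(timeout)
	tick := time.NewTicker(time.Second)
	defer tick.Stop()
	for {
		select {
		case <-deadline:
			return errors.New("timed out waiting for docker to be ready")
		case <-tick.C:
			if probe() {
				return nil
			}
		}
	}
}

func main() {
	// Hypothetical probe: `docker info` run inside the node container
	// succeeds once the inner docker daemon is up.
	probe := func() bool {
		return exec.Command("docker", "exec", "kind-control-plane", "docker", "info").Run() == nil
	}
	if err := waitUntilReady(probe, 30*time.Second); err != nil {
		fmt.Println(err)
	}
}

Whatever the constant is, it only moves the deadline; on an overloaded machine the probe can still lose the race.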

Bumping the value solved the problem for me, but I assumed it was just slowness in my local environment.

Should we bump the timeout value?

Should we bump the timeout value?

That might be it; it's fairly slow on my machine as well.
Let's wait for @BenTheElder to comment.

can you please share your config file and the system specification?

The config file:


kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
nodes:
- role: control-plane
- role: control-plane
- role: control-plane
- role: worker
- role: worker

The system info:

[zhang@localhost kind]$ uname -a
Linux localhost 3.10.0-957.5.1.el7.x86_64 #1 SMP Fri Feb 1 14:54:57 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
[zhang@localhost kind]$ cat /etc/redhat-release 
CentOS Linux release 7.5.1804 (Core)
[zhang@localhost kind]$ free -h
              total        used        free      shared  buff/cache   available
Mem:           7.5G        621M        1.9G        121M        5.0G        6.3G
Swap:          7.7G        153M        7.6G
[zhang@localhost kind]$ uptime 
 18:28:02 up 34 days,  4:18,  1 user,  load average: 0.00, 0.05, 0.15
[zhang@localhost kind]$ docker version
Client:
 Version:           18.09.2
 API version:       1.39
 Go version:        go1.10.6
 Git commit:        6247962
 Built:             Sun Feb 10 04:13:27 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.2
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.6
  Git commit:       6247962
  Built:            Sun Feb 10 03:47:25 2019
  OS/Arch:          linux/amd64
  Experimental:     false

Do you experience this with single-node clusters? I think the issue is partly that 5 nodes probably need something like 10GB of RAM, so running that many nodes on this machine is swapping like crazy.

We might need an arbitrarily large timeout for this, but we do want some sort of timeout to prevent CI from hanging 🤔

What did you all need to bump it to? I suspect the required value will grow as your containers become increasingly swap-backed (i.e. if you add more nodes to an already overloaded machine).

Why don't we implement a simple backoff algorithm, with the maximum number of retries equal to the number of nodes divided by a constant (2, for example)?

Why don't we implement a simple backoff algorithm, with the maximum number of retries equal to the number of nodes divided by a constant (2, for example)?

Possibly we should not do that, because it's not actually related to the number of nodes directly; it's related to how overloaded your machine is. There's probably a better option 🤔

we also don't want actual failures to take excessively long.
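To make the backoff suggestion concrete, here is a sketch of a capped exponential backoff (illustrative only, not kind's implementation); capping both the per-attempt delay and the attempt count keeps genuine failures from taking excessively long:

package main

import (
	"errors"
	"fmt"
	"time"
)

// retryWithBackoff retries op with exponentially growing waits, capped at
// maxDelay, and gives up after maxAttempts.
func retryWithBackoff(op func() error, maxAttempts int, baseDelay, maxDelay time.Duration) error {
	delay := baseDelay
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = op(); err == nil {
			return nil
		}
		if attempt == maxAttempts {
			break
		}
		time.Sleep(delay)
		delay *= 2
		if delay > maxDelay {
			delay = maxDelay
		}
	}
	return fmt.Errorf("still failing after %d attempts: %v", maxAttempts, err)
}

func main() {
	attempts := 0
	err := retryWithBackoff(func() error {
		attempts++
		if attempts < 3 {
			return errors.New("docker not ready yet")
		}
		return nil
	}, 5, time.Second, 10*time.Second)
	fmt.Println("attempts:", attempts, "err:", err)
}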

We have more similar panics; I split those out to #406 and will use this issue to track the docker timeout.

we may actually just have to do the backoff algorithm, but I suspect we have a more general problem with the behavior when users create more nodes than will reliably fit on their machine (which also may be helped by trying to lighten the load)

Should not panic anymore at least. Timeout still needs thought / changes.

What values are working for your usage?

do you experience this with single node clusters?

[zhang@localhost ~]$ time kind create cluster --name moelove
Creating cluster "moelove" ...
 ✓ Ensuring node image (kindest/node:v1.13.4) 🖼
 ✓ Preparing nodes 📦
 ✓ Creating kubeadm config 📜
 ✓ Starting control-plane 🕹️
Cluster creation complete. You can now use the cluster with:

export KUBECONFIG="$(kind get kubeconfig-path --name="moelove")"
kubectl cluster-info

real    0m52.172s
user    0m0.729s
sys     0m0.564s

In fact, my intention in raising the issue was the panic, not the timeout, although that is also a problem. :smile_cat:

Should not panic anymore at least.

+1

What values are working for your usage?

I changed it to 50s and it works fine for me.
(But with the 30s timeout, whether it actually times out is random.)

I've been working with a 60-second timeout without problems since last week.

perhaps let's bump it to 60s for now, add a TODO, and come back to it then 🤔

or maybe we could add a flag/config to change it?

-1 to more flags! :P

Setting this in either a flag or config is going to be brittle, since the value is not portable. The only reason we have a bound at all is to avoid an indefinite hang; at some point this value becomes quite unreasonable :sweat_smile: (e.g. 1 hour would be pretty ridiculous).

Hah, I agree with you.
Today I may test the timing on differently configured machines; I hope to provide some suggestions.
