Kind: Creating cluster with 20 worker nodes fails (due to file descriptor exhaustion?)

Created on 2 Oct 2019 · 13 comments · Source: kubernetes-sigs/kind

When trying to create a cluster with 20 worker nodes + 2 control plane nodes (config yml), the creation fails.
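For reference, a config of that shape looks roughly like the following. This is an illustrative sketch only (not the actual config.yml linked above), assuming the v1alpha3 config API used by kind 0.5.x and the cluster name test20 seen in the logs below:

# Generate a kind config with 2 control-plane nodes and 20 workers
# (sketch only; the real config.yml from this report is not reproduced here).
{
  echo "kind: Cluster"
  echo "apiVersion: kind.x-k8s.io/v1alpha3"
  echo "nodes:"
  echo "- role: control-plane"
  echo "- role: control-plane"
  for _ in $(seq 1 20); do echo "- role: worker"; done
} > kind-20-workers.yaml

kind create cluster --name test20 --config kind-20-workers.yaml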

The closest clue to the problem seems to be that the kube-proxy pods in the worker nodes enter a crash loop with the following error:

$ kubectl logs -n kube-system kube-proxy-j2lrt
F1002 09:37:45.214359       1 server.go:444] failed complete: too many open files

There are also many occurrences of the following error in the kube-apiserver logs, but I have no clue what causes them:

$ kubectl logs -n kube-system kube-apiserver-test20-control-plane2
E1002 09:34:55.069811       1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"context canceled"}
...

The problem is, I can't figure out which file limit it might be hitting. Here's what I've done/checked so far:

  • Increased nofile ulimit on all accounts on the host (including root) from 1024 to 50000. This made no difference, suggesting the default limits set in /etc/security/limits.conf are not what's being used.
  • Checked overall limits on the host: cat /proc/sys/fs/file-nr reports 164536 fds in use out of a limit of 1622007.
  • Opened a shell on one of the failing nodes and checked the ulimits of individual processes (cat /proc/{pid}/limits; the commands are sketched below). I can't catch the failing kube-proxy process, but every other process in the container (kubelet, init, etc.) reports an open file limit of 1000000 or more.
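For reference, those checks look roughly like this (a sketch; the node name test20-worker is an assumption based on the cluster name, and pgrep is assumed to be available inside the node image):

# Host-wide file descriptor usage: allocated, free, and the system-wide maximum.
cat /proc/sys/fs/file-nr

# Open a shell in one of the kind nodes (containers are named <cluster>-worker, <cluster>-control-plane, ...).
docker exec -it test20-worker bash

# Inside the node: per-process limits, e.g. for the kubelet.
cat /proc/"$(pgrep -o kubelet)"/limits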

The maximum number of nodes I seem to be able to start a cluster with is 12 (2 control plane + 10 workers). Anything above that runs into the issues above. (At 14 nodes kind create cluster actually reports success, but the kube-proxy pods are already crashing with the error above.)

Environment:

  • kind version: 0.5.1
  • Kubernetes version: 1.16.0 (kubectl), 1.15.3 (kind cluster)
  • Docker version: 19.03.2
  • OS: Ubuntu 18.04.3 LTS
Labels: kind/bug, kind/support

All 13 comments

this might (?) be inotify limits instead, which tend to be far too low to run dozens of kubelets etc.

20 nodes is a lot more than we typically see people start to try šŸ˜…

It was indeed the inotify limits! I already had fs.inotify.max_user_watches=524288 on the host, but it turns out that is not enough; it's also necessary to increase fs.inotify.max_user_instances.

With the following settings the 20 worker cluster starts up fine, if slowly 😁 (I/O is the killer, would it be possible somehow to share images between nodes instead of copying them N times?)

fs.inotify.max_user_watches = 524288
fs.inotify.max_user_instances = 512

It would probably be worth mentioning this in the known issues page.
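For anyone hitting the same limits, applying those settings looks roughly like this (a sketch; the file name under /etc/sysctl.d/ is arbitrary and the values are the ones quoted above):

# Raise the inotify limits for the running system...
sudo sysctl -w fs.inotify.max_user_watches=524288
sudo sysctl -w fs.inotify.max_user_instances=512

# ...and persist them across reboots.
sudo tee /etc/sysctl.d/99-kind-inotify.conf <<'EOF'
fs.inotify.max_user_watches = 524288
fs.inotify.max_user_instances = 512
EOF
sudo sysctl --system   # reload all sysctl configuration files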

do you mind sending a PR with that @paol? :)

/help wanted

> With the following settings the 20 worker cluster starts up fine, if slowly 😁 (I/O is the killer, would it be possible somehow to share images between nodes instead of copying them N times?)

I'm not sure what you mean, we aren't copying images?

If it's not the Docker images that are being copied, then something is! Just starting the above-mentioned 22-node cluster consumes over 20 GB of disk, before even loading my own images. The space used grows linearly with the number of nodes.

Just ran a little test, it's definitely the images: doing kind load docker-image consumes disk space exactly proportional to the number of nodes.

With the image I tested, in fact, it was about 2 x image_size x nr_of_nodes (image_size as reported by docker image inspect).

> Just ran a little test, it's definitely the images: doing kind load docker-image consumes disk space exactly proportional to the number of nodes.

This is expected; you are copying the images into each node ... you can pass a list of nodes if you just want to keep the image only on certain ones.
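For example, something like this limits the copy to a subset of nodes (a sketch using the --nodes flag of kind load docker-image; my-app:latest and the test12-* node names are placeholders):

# Load an image into only two of the worker nodes instead of all of them.
kind load docker-image my-app:latest --name test12 --nodes test12-worker,test12-worker2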

> With the image I tested, in fact, it was about 2 x image_size x nr_of_nodes (image_size as reported by docker image inspect).

can you do a docker ps -s and sudo du -sh /var/lib/docker/containers/*

> Just ran a little test, it's definitely the images: doing kind load docker-image consumes disk space exactly proportional to the number of nodes.

ah, you didn't mention loading images before :-)

@BenTheElder But it also applies to the image used to run the node itself, right? I assume that's the reason that just starting the cluster causes so much disk I/O and space usage.

@aojea Yeah, that's what I assumed was happening, hence my comment that it would be nice if images could be shared.

Here you go:

$ docker ps -s
CONTAINER ID        IMAGE                          COMMAND                  CREATED             STATUS              PORTS                                  NAMES                           SIZE                                      
a09474f2844f        kindest/node:v1.15.3           "/usr/local/bin/entr…"   13 minutes ago      Up 12 minutes                                              test12-worker9                  59.7kB (virtual 1.45GB)                   
982744949fa5        kindest/node:v1.15.3           "/usr/local/bin/entr…"   13 minutes ago      Up 12 minutes                                              test12-worker5                  59.7kB (virtual 1.45GB)                   
1913b6043250        kindest/node:v1.15.3           "/usr/local/bin/entr…"   13 minutes ago      Up 12 minutes                                              test12-worker3                  59.7kB (virtual 1.45GB)                   
67a7a3a931f8        kindest/node:v1.15.3           "/usr/local/bin/entr…"   13 minutes ago      Up 12 minutes                                              test12-worker11                 59.7kB (virtual 1.45GB)                   
fd45a4625911        kindest/node:v1.15.3           "/usr/local/bin/entr…"   13 minutes ago      Up 12 minutes                                              test12-worker2                  59.7kB (virtual 1.45GB)                   
050bd9b9ab0d        kindest/node:v1.15.3           "/usr/local/bin/entr…"   13 minutes ago      Up 12 minutes                                              test12-worker7                  59.7kB (virtual 1.45GB)                   
65e17b75b74b        kindest/node:v1.15.3           "/usr/local/bin/entr…"   13 minutes ago      Up 12 minutes                                              test12-worker8                  59.7kB (virtual 1.45GB)                   
2ed5cfcab857        kindest/node:v1.15.3           "/usr/local/bin/entr…"   13 minutes ago      Up 12 minutes                                              test12-worker12                 59.7kB (virtual 1.45GB)                   
e2ee7153df45        kindest/haproxy:2.0.0-alpine   "/docker-entrypoint.…"   13 minutes ago      Up 13 minutes       34255/tcp, 127.0.0.1:34255->6443/tcp   test12-external-load-balancer   551B (virtual 22.3MB)                     
ec38dc4fcbc3        kindest/node:v1.15.3           "/usr/local/bin/entr…"   13 minutes ago      Up 12 minutes                                              test12-worker10                 59.7kB (virtual 1.45GB)                   
b2f7af75c6cf        kindest/node:v1.15.3           "/usr/local/bin/entr…"   13 minutes ago      Up 12 minutes                                              test12-worker6                  59.7kB (virtual 1.45GB)                   
bbcb31e39244        kindest/node:v1.15.3           "/usr/local/bin/entr…"   13 minutes ago      Up 12 minutes                                              test12-worker                   59.7kB (virtual 1.45GB)                   
efb4876196e5        kindest/node:v1.15.3           "/usr/local/bin/entr…"   13 minutes ago      Up 12 minutes       33289/tcp, 127.0.0.1:33289->6443/tcp   test12-control-plane2           114kB (virtual 1.45GB)                    
b0649db2cc54        kindest/node:v1.15.3           "/usr/local/bin/entr…"   13 minutes ago      Up 12 minutes                                              test12-worker4                  59.7kB (virtual 1.45GB)                   
8c986c0e8e76        kindest/node:v1.15.3           "/usr/local/bin/entr…"   13 minutes ago      Up 12 minutes       41061/tcp, 127.0.0.1:41061->6443/tcp   test12-control-plane            3.5MB (virtual 1.45GB)                    
0a537827fc10        portainer/portainer            "/portainer -H unix:…"   5 weeks ago         Up 30 hours         0.0.0.0:9000->9000/tcp                 work_portainer_1                0B (virtual 77.7MB)                       
# du -sh /var/lib/docker/containers/*
48K     /var/lib/docker/containers/050bd9b9ab0d2403baddcb51dc28b0d065d9b4c7719f05387c964f2e9c8ce96b
48K     /var/lib/docker/containers/0a537827fc10c6be58a1d909d34df5a91113dfc5fa421eb5a0a3086be33bddf4
40K     /var/lib/docker/containers/17c7a1e849fcd6bf7d5db2f131a09a0f8789a62199413d3bbed9b362c95c94bf
48K     /var/lib/docker/containers/1913b6043250cb776a766db225440b4899a33880cdf3d0e1736b6d0ddb362261
200K    /var/lib/docker/containers/1e217af41543afdbc2b658dc4994cc501387a92c51b23ae30160058de94f1e68
48K     /var/lib/docker/containers/2ed5cfcab8576380f30468793b985d07d1466ebd55d77b44b096f2adb4a350fd
84K     /var/lib/docker/containers/417ef9a86ae04857106772f7ac6c0592a6c22221772e2e4344e2fb70749818e6
48K     /var/lib/docker/containers/65e17b75b74bc62eff46e7431c440ba89a5e1f9ccdfdbc33d493f65f4d063912
48K     /var/lib/docker/containers/67a7a3a931f81dfcb5dd09b8b9ef43fdbbbee254b41946005792d805c6bacea6
44K     /var/lib/docker/containers/7b71a6a99932c34d95046906021ccfe65937874de0ceecebfb26a76595ab1a59
44K     /var/lib/docker/containers/805117b5ad85a1a6ac8067d73de265d461b5ff899d78ecff63d0f763a7ac6a08
48K     /var/lib/docker/containers/8c986c0e8e769b02068fc6b647d31221ddc247e94afccb7a146242adc461bbf0
48K     /var/lib/docker/containers/982744949fa5cb7dfc8339d81a64f9575644477305bd468bfec985a2346d479f
48K     /var/lib/docker/containers/a09474f2844f40da3567d26698026838eec8d33f16d1771c6bb4bf44ca55a02c
1.1M    /var/lib/docker/containers/aecf46890a90b477abfbc45c6989d9c7dc6b86f406b06a429a6b4370cebea270
48K     /var/lib/docker/containers/b0649db2cc546a61e89b22092ed04b69b1230e7702cfa8606107956586b3e86b
48K     /var/lib/docker/containers/b2f7af75c6cf5edb209fb618aaea7c70c4d339520eef4b3657b2f93a31122eeb
48K     /var/lib/docker/containers/bbcb31e39244e9ad0221264df7efd0d50ce121ef5185569796eb35f9a7d3c05a
40K     /var/lib/docker/containers/c933db48518f4d31129a56ff64c0d719cd8bd59ac71ad48b62651ec7e0199434
44K     /var/lib/docker/containers/e2ee7153df45a33abec663efcfa902c32a299c105290730a050ecff64346d315
224K    /var/lib/docker/containers/e42b1e1afd64886eb9adcd68ce41b3cfbdc8370424807c0d7112416d88a89ddd
48K     /var/lib/docker/containers/e5e2dd564ffe7764cea7df571a3f36fa921b14c24a4fa8674aa1ccab4096333e
48K     /var/lib/docker/containers/ec38dc4fcbc307bfd03cfa8650a53e5af71301ce31a8cae2500a0fb6c1a34ba2
48K     /var/lib/docker/containers/efb4876196e58953cd81ef3c9518068062a2aab80ebc17c60803c74ce23e853b
208K    /var/lib/docker/containers/f374dea6d418e2565702ba581c50b3366f551995decdfdc621dcf673354c75b9
48K     /var/lib/docker/containers/fd45a4625911aa3f6ed33879a2c7e7ffe43f4bf70e876440570e933d275ae500

> @BenTheElder But it also applies to the image used to run the node itself, right? I assume that's the reason that just starting the cluster causes so much disk I/O and space usage.

Sort of. We preload them to the extent that we can, so each node just needs to unpack them, which is less I/O than a full load.

Images cannot be shared on disk currently, but a local registry is probably the best and reasonably viable option for the future.

We're working on things that will make that route easier.
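For reference, the registry route looks roughly like the pattern in the kind local registry docs: run a plain registry container next to the cluster, push each image once, and have every node pull from it. A heavily hedged sketch follows (the registry name and port are arbitrary, a recent kind version with a dedicated "kind" Docker network is assumed, and containerd on the nodes still has to be pointed at the registry, e.g. via containerdConfigPatches in the cluster config):

# Run a local registry next to the cluster and attach it to the kind network.
docker run -d --restart=always -p 127.0.0.1:5000:5000 --name kind-registry registry:2
docker network connect kind kind-registry

# Push each image once; nodes configured to trust the registry can then pull it,
# instead of the image being copied into every node.
docker tag my-app:latest localhost:5000/my-app:latest
docker push localhost:5000/my-app:latest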

I think this can be closed, thanks for the help guys.
