What happened:
We use kind in a GitLab CI runner to create a k8s cluster.
kind fails to start both the control-plane and the worker node, and ends with:
docker run error: exit status 125 in each case (see below).
What you expected to happen:
kind should be able to create and start a cluster in the context of the GitLab job.
How to reproduce it (as minimally and precisely as possible):
Try to do:
kind create cluster --config=kind-config.yaml --loglevel debug --name YourCluster
Currently we get:
Kind cluster name is kind-tests_with_k8s_in_CI-23900eb40c5416e47c0a9c0e3850451d52a7f43c
time="06:12:55" level=debug msg="Running: /usr/bin/docker [docker ps -q -a --no-trunc --filter label=io.k8s.sigs.kind.cluster --format {{.Names}}\\t{{.Label \"io.k8s.sigs.kind.cluster\"}}]"
Creating cluster "kind-tests_with_k8s_in_CI-23900eb40c5416e47c0a9c0e3850451d52a7f43c" ...
• Ensuring node image (kindest/node:v1.14.2) 🖼 ...
time="06:12:55" level=debug msg="Running: /usr/bin/docker [docker inspect --type=image kindest/node:v1.14.2]"
time="06:12:55" level=info msg="Image: kindest/node:v1.14.2 present locally"
✓ Ensuring node image (kindest/node:v1.14.2) 🖼
• Preparing nodes 📦📦 ...
time="06:12:55" level=debug msg="Running: /usr/bin/docker [docker info --format '{{json .SecurityOptions}}']"
time="06:12:55" level=debug msg="Running: /usr/bin/docker [docker info --format '{{json .SecurityOptions}}']"
time="06:12:56" level=debug msg="Running: /usr/bin/docker [docker run -d -t --privileged --security-opt seccomp=unconfined --tmpfs /tmp --tmpfs /run -v /lib/modules:/lib/modules:ro --hostname kind-tests_with_k8s_in_CI-23900eb40c5416e47c0a9c0e3850451d52a7f43c-worker --name kind-tests_with_k8s_in_CI-23900eb40c5416e47c0a9c0e3850451d52a7f43c-worker --label io.k8s.sigs.kind.cluster=kind-tests_with_k8s_in_CI-23900eb40c5416e47c0a9c0e3850451d52a7f43c --label io.k8s.sigs.kind.role=worker kindest/node:v1.14.2@sha256:33539d830a6cf20e3e0a75d0c46a4e94730d78c7375435e6b49833d81448c319]"
time="06:12:56" level=debug msg="Running: /usr/bin/docker [docker run -d -t --privileged --security-opt seccomp=unconfined --tmpfs /tmp --tmpfs /run -v /lib/modules:/lib/modules:ro --hostname kind-tests_with_k8s_in_CI-23900eb40c5416e47c0a9c0e3850451d52a7f43c-control-plane --name kind-tests_with_k8s_in_CI-23900eb40c5416e47c0a9c0e3850451d52a7f43c-control-plane --label io.k8s.sigs.kind.cluster=kind-tests_with_k8s_in_CI-23900eb40c5416e47c0a9c0e3850451d52a7f43c --label io.k8s.sigs.kind.role=control-plane --expose 46775 -p 127.0.0.1:46775:6443 kindest/node:v1.14.2@sha256:33539d830a6cf20e3e0a75d0c46a4e94730d78c7375435e6b49833d81448c319]"
time="06:12:59" level=error msg=751f9f17bf6cb230fc9f3d3cff9b1e4f8db7beb0ab1d9d424869dfbf14447d05
time="06:12:59" level=error msg="docker: Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:359: container init caused \\\"invalid argument\\\"\"."
✗ Preparing nodes 📦📦
time="06:12:59" level=error msg="docker run error: exit status 125"
time="06:12:59" level=debug msg="Running: /usr/bin/docker [docker ps -q -a --no-trunc --filter label=io.k8s.sigs.kind.cluster --format {{.Names}}\\t{{.Label \"io.k8s.sigs.kind.cluster\"}} --filter label=io.k8s.sigs.kind.cluster=kind-tests_with_k8s_in_CI-23900eb40c5416e47c0a9c0e3850451d52a7f43c]"
time="06:12:59" level=debug msg="Running: /usr/bin/docker [docker rm -f -v kind-tests_with_k8s_in_CI-23900eb40c5416e47c0a9c0e3850451d52a7f43c-control-plane kind-tests_with_k8s_in_CI-23900eb40c5416e47c0a9c0e3850451d52a7f43c-worker]"
time="06:12:59" level=error msg=e552c91dbea1bb3772ede0fc7920630af05d422b1c3e026085173de1e78c163c
time="06:12:59" level=error msg="docker: Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:359: container init caused \\\"invalid argument\\\"\"."
Error: failed to create cluster: docker run error: exit status 125
Anything else we need to know?:
The very same command works outside gitlab CI.
Is there any requirement on the docker and/or gitlab-runner version or gitlab configuration needed to work with kind in gitlab CI?
The content of our kind-config.yaml file is:
# this config file contains all config fields with comments
kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
# patch the generated kubeadm config with some extra settings
kubeadmConfigPatches:
- |
  apiVersion: kubeadm.k8s.io/v1beta1
  kind: ClusterConfiguration
  metadata:
    name: config
  networking:
    serviceSubnet: 192.0.0.0/16
# 1 control plane node and 3 workers
nodes:
# the control plane node config
- role: control-plane
# one worker
- role: worker
Environment:
This user reported getting kind working with GitLab: https://github.com/kubernetes-sigs/kind/issues/52#issuecomment-426550163. Please check whether that works for you and report back.
Sure, I'll check that and come back here. Thank you for the reference.
I'm pretty sure the cluster name is far too long.
Can confirm: kind-tests_with_k8s_in_CI-23900eb40c5416e47c0a9c0e3850451d52a7f43c is the problem, that name is too long to set the hostname.
The nodes will be named roughly ${CLUSTER_NAME}-${NODE_ROLE}${NODE_ROLE_COUNT}. Either cluster names need to be shorter, or we have to stop making the hostname match the container name.
Realistically, long container names do not work out well either: docker has unspecified limits on container name length because the name is used in HTTP calls in various places. We should probably warn on the cluster name length.
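For anyone who wants to catch this before kind runs, here is a minimal sketch of a pre-flight check. It assumes the limit being hit is the Linux hostname limit of 64 bytes (HOST_NAME_MAX), which is consistent with the "sethostname: invalid argument" error in this thread but is not something kind itself reports. The function name node_name_fits is hypothetical and not part of kind.

```shell
#!/bin/sh
# Hypothetical pre-flight check; not part of kind.
# Assumption: Linux rejects hostnames longer than 64 bytes (HOST_NAME_MAX).
HOSTNAME_MAX=64

# Succeeds (exit 0) if "<cluster>-<suffix>" fits in a hostname, fails otherwise.
node_name_fits() {
  node="$1-$2"
  [ "${#node}" -le "$HOSTNAME_MAX" ]
}

# Check the cluster name from this issue against the longest node suffix
# that appears in the logs ("control-plane").
cluster="kind-tests_with_k8s_in_CI-23900eb40c5416e47c0a9c0e3850451d52a7f43c"
if node_name_fits "$cluster" "control-plane"; then
  echo "ok"
else
  echo "cluster name too long for a node hostname"
fi
```

With the name from this issue the check fails: the cluster name alone is already 66 characters, and the node suffix makes it longer still.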
$ kind create cluster --name=kind-tests_with_k8s_in_CI-23900eb40c5416e47c0a9c0e3850451d52a7f43c
Creating cluster "kind-tests_with_k8s_in_CI-23900eb40c5416e47c0a9c0e3850451d52a7f43c" ...
✓ Ensuring node image (kindest/node:v1.14.2) 🖼
ERRO[13:04:40] 5e5df4522accf90cae87c1daa7bef78e692a09dfb9025cb157b382fa74ed2d26
ERRO[13:04:40] docker: Error response from daemon: OCI runtime create failed: container_linux.go:344: starting container process caused "process_linux.go:424: container init caused \"sethostname: invalid argument\"": unknown.
✗ Preparing nodes 📦
ERRO[13:04:40] docker run error: exit status 125
Error: failed to create cluster: docker run error: exit status 125
After https://github.com/kubernetes-sigs/kind/pull/624 we'll be warning on names that are too long.
I can confirm that the overly long name was the cause of my issue.
As you guessed, I was inadvertently lying when I said that it was working outside CI. The culprit was our name-mangling scheme for the cluster in CI. We wanted a unique cluster name in CI, but now that I think about it, I'm not even sure that's necessary: we run kind in a specific runner, so I'm not sure there can be a name clash with other CI jobs running kind?
Anyway, thank you very much for the fast feedback and analysis, and for #624 as well.
I may have another issue, but I consider this one closed and will come back to you with another ticket if the next issue is confirmed.
Thanks again.
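If a unique per-job cluster name is still wanted, one option is to keep only a short piece of the commit SHA rather than the full project name plus SHA. The following is a rough sketch, not the scheme used in this issue: the helper short_cluster_name and the "kind" prefix are made up for illustration, and the 8-character cut is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: builds a short cluster name from a prefix and a commit SHA.
# Keeps only the first 8 characters of the SHA (arbitrary choice for this sketch).
short_cluster_name() {
  short_sha=$(printf '%s' "$2" | cut -c1-8)
  printf '%s-%s\n' "$1" "$short_sha"
}

# Using the commit SHA that appears in the cluster name from this issue.
name=$(short_cluster_name "kind" "23900eb40c5416e47c0a9c0e3850451d52a7f43c")
echo "$name"   # prints: kind-23900eb4

# The result can then be passed to kind as in the original report, e.g.:
#   kind create cluster --config=kind-config.yaml --name "$name"
```

Even with the "-control-plane" suffix appended, a name built this way stays well under 64 characters.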