Kind: KinD fails to start on GKE with DinD

Created on 1 Jul 2019 · 6 comments · Source: kubernetes-sigs/kind

What happened:
I have two GKE nodes. I am running into two separate problems with them.

On node 1, KinD fails to start with Error: failed to create cluster: failed to apply overlay network: exit status 1. Full logs attached.
On node 2, the control plane seems to be in an unhealthy state:

NAMESPACE     NAME                                         READY   STATUS    RESTARTS   AGE
kube-system   etcd-test-control-plane                      1/1     Running   20         10m
kube-system   kube-apiserver-test-control-plane            1/1     Running   20         10m
kube-system   kube-controller-manager-test-control-plane   1/1     Running   20         10m
kube-system   kube-scheduler-test-control-plane            1/1     Running   20         10m

Note the 20 restarts in 10 minutes; the pods also never seem to enter CrashLoopBackOff, though that may be intended.

Logs show

E0701 14:41:23.412993       1 controller.go:148] Unable to remove old endpoints from kubernetes service: no master IPs were listed in storage, refusing to erase all endpoints for the kubernetes service
E0701 14:42:55.880979       1 autoregister_controller.go:193] v1alpha1.certmanager.k8s.io failed with : apiservices.apiregistration.k8s.io "v1alpha1.certmanager.k8s.io" already exists
E0701 14:42:55.881134       1 autoregister_controller.go:193] v1alpha3.networking.istio.io failed with : apiservices.apiregistration.k8s.io "v1alpha3.networking.istio.io" already exists
E0701 14:42:55.881221       1 autoregister_controller.go:193] v1alpha1.authentication.istio.io failed with : apiservices.apiregistration.k8s.io "v1alpha1.authentication.istio.io" already exists
E0701 14:42:55.881253       1 autoregister_controller.go:193] v1alpha2.config.istio.io failed with : apiservices.apiregistration.k8s.io "v1alpha2.config.istio.io" already exists

I consistently get the same errors on the same nodes. I only have 2 nodes in my cluster.

What you expected to happen:

KinD successfully creates a cluster.

How to reproduce it (as minimally and precisely as possible):

Not exactly sure. It seems related to the image we are using, prow, GKE, or some combination of those. See http://prow.istio.io/?job=istio-kind-simpleTest-master for a host of failures, including the "failed to apply overlay network" one (https://k8s-gubernator.appspot.com/build/istio-prow/logs/istio-kind-simpleTest-master/358). Note that link is not from my runs; that is a separate cluster attempting to do the same thing I am doing (run tests on prow with KinD).

Anything else we need to know?:

Environment:

  • kind version: (use kind version): v0.3.0.
  • Kubernetes version: (use kubectl version): 1.15
  • Docker version: (use docker info): 18.06.1-ce
  • OS (e.g. from /etc/os-release): Ubuntu 16.04

It's running in image gcr.io/istio-testing/istio-builder:v20190628-31457b43 on a GKE cluster.

I still have the nodes up if you need any more debugging info.

  • node1.log shows the failure to start
  • node2.log shows a successful start on node 2
  • kube-node2.log shows some attempts at getting logs from the crashing API server on node 2. Not sure if there is a better way, since the logs come from the API server, which is crashing.
kind/bug

All 6 comments

Got the kind logs on node 2
kind-logs.zip

I tried with a simpler Docker image and kind 0.4.0 and still see the same issue. Dockerfile:

# For DinD
FROM docker:latest as docker

# For golang
FROM golang:1.12.5 as golang
ENV GO111MODULE=on
RUN go get -u sigs.k8s.io/kind@v0.4.0

FROM debian:9-slim


# Copy from prior stages
COPY --from=docker /usr/local/bin/docker /usr/local/bin/docker

COPY --from=golang /go/bin/kind /usr/local/bin/kind

# Set CI variable which can be checked by test scripts to verify
# if running in the continuous integration environment.
ENV CI prow

# Add entrypoint to start docker
ADD prow-runner.sh /usr/local/bin/entrypoint
RUN chmod +rx /usr/local/bin/entrypoint

RUN apt-get update && apt-get -qqy --no-install-recommends install \
    build-essential \
    ca-certificates \
    curl \
    git

# Add kubectl
RUN curl -Lo /tmp/kubectl https://storage.googleapis.com/kubernetes-release/release/v1.15.0/bin/linux/amd64/kubectl && chmod +x /tmp/kubectl && mv /tmp/kubectl /usr/local/bin/

ENTRYPOINT ["entrypoint"]

I think it's related to https://github.com/kubernetes-sigs/kind/issues/303; trying out the steps there.

Quick comment: in addition to #303, ensure docker's storage directory (typically /var/lib/docker) is backed by a volume (e.g. an emptyDir).
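Putting both suggestions together (a privileged container per #303, plus an emptyDir volume backing /var/lib/docker), the pod spec for the runner might look roughly like this; the pod name, container name, and volume name here are placeholders, not the actual prow configuration:

```yaml
# Hypothetical pod spec for running kind inside a pod on GKE (DinD).
# Names are illustrative; the image is the one mentioned in the issue.
apiVersion: v1
kind: Pod
metadata:
  name: kind-dind
spec:
  containers:
  - name: runner
    image: gcr.io/istio-testing/istio-builder:v20190628-31457b43
    securityContext:
      privileged: true        # needed for the nested dockerd (see #303)
    volumeMounts:
    - name: docker-graph
      mountPath: /var/lib/docker   # keep docker's storage off the container's overlay fs
  volumes:
  - name: docker-graph
    emptyDir: {}
```

The emptyDir mount matters because running dockerd's storage on top of the pod's own overlay filesystem is a known source of failures for nested containers.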

Got it working with those, thanks, seems to be running smoothly now!

Big +1 on #303 though

I hear you on #303, just juggling priorities, glad you got it working! :sweat_smile:

will add a note about dind to 303 as well
