What happened:
I have two GKE nodes. I am running into two separate problems with them.
On node 1, KinD fails to create a cluster with "Error: failed to create cluster: failed to apply overlay network: exit status 1". Full logs attached.
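For reference, the failure on node 1 happens at cluster creation time. A rough sketch of the commands involved, using kind v0.3.0's flags (simplified, not the exact invocation from our scripts):
kind create cluster --name test --loglevel debug   # fails at the "apply overlay network" step
kind export logs ./kind-logs                       # presumably how an archive like the attached one is collected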
On node 2, the control plane seems to be in an unhealthy state:
NAMESPACE     NAME                                          READY   STATUS    RESTARTS   AGE
kube-system   etcd-test-control-plane                       1/1     Running   20         10m
kube-system   kube-apiserver-test-control-plane             1/1     Running   20         10m
kube-system   kube-controller-manager-test-control-plane    1/1     Running   20         10m
kube-system   kube-scheduler-test-control-plane             1/1     Running   20         10m
Note the 20 restarts in 10m; the pods also never seem to go into CrashLoopBackOff, though maybe that is intended.
The logs show:
E0701 14:41:23.412993 1 controller.go:148] Unable to remove old endpoints from kubernetes service: no master IPs were listed in storage, refusing to erase all endpoints for the kubernetes service
E0701 14:42:55.880979 1 autoregister_controller.go:193] v1alpha1.certmanager.k8s.io failed with : apiservices.apiregistration.k8s.io "v1alpha1.certmanager.k8s.io" already exists
E0701 14:42:55.881134 1 autoregister_controller.go:193] v1alpha3.networking.istio.io failed with : apiservices.apiregistration.k8s.io "v1alpha3.networking.istio.io" already exists
E0701 14:42:55.881221 1 autoregister_controller.go:193] v1alpha1.authentication.istio.io failed with : apiservices.apiregistration.k8s.io "v1alpha1.authentication.istio.io" already exists
E0701 14:42:55.881253 1 autoregister_controller.go:193] v1alpha2.config.istio.io failed with : apiservices.apiregistration.k8s.io "v1alpha2.config.istio.io" already exists
I consistently get the same errors on the same nodes. I only have 2 nodes in my cluster.
What you expected to happen:
KinD successfully creates a cluster.
How to reproduce it (as minimally and precisely as possible):
Not exactly sure. It seems related to the image we are using, prow, or GKE. See http://prow.istio.io/?job=istio-kind-simpleTest-master for a host of failures, including the "failed to apply overlay network" one (https://k8s-gubernator.appspot.com/build/istio-prow/logs/istio-kind-simpleTest-master/358). Note that link is not from my runs; that is a separate cluster attempting to do the same thing I am doing (running tests on prow with KinD).
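To sketch the setup: the builder image runs as a pod on GKE, its entrypoint starts dockerd (DinD), and the test script then drives kind against that daemon. Simplified, with placeholder names rather than the real prow job:
# inside the builder container, after the entrypoint has started dockerd
docker info                                                  # sanity check that DinD is up
kind create cluster --name test                              # fails on node 1 with the overlay network error
export KUBECONFIG="$(kind get kubeconfig-path --name=test)"
kubectl get pods --all-namespaces                            # on node 2, shows the restarting control plane above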
Anything else we need to know?:
Environment:
kind version (kind version): v0.3.0
Kubernetes version (kubectl version): 1.15
Docker version (docker info): 18.06.1-ce
OS (/etc/os-release): Ubuntu 16.04
It's running in image gcr.io/istio-testing/istio-builder:v20190628-31457b43 on a GKE cluster.
I still have the nodes up if you need any more debugging info.
Got the kind logs on node 2
kind-logs.zip
I tried with a simpler docker image and kind 0.4.0 and still see the same issue. Dockerfile:
# For DinD
FROM docker:latest as docker
# For golang
FROM golang:1.12.5 as golang
ENV GO111MODULE=on
RUN go get -u sigs.k8s.io/kind@v0.4.0
FROM debian:9-slim
# Copy from prior stages
COPY --from=docker /usr/local/bin/docker /usr/local/bin/docker
COPY --from=golang /go/bin/kind /usr/local/bin/kind
# Set CI variable which can be checked by test scripts to verify
# if running in the continuous integration environment.
ENV CI prow
# Add entrypoint to start docker
ADD prow-runner.sh /usr/local/bin/entrypoint
RUN chmod +rx /usr/local/bin/entrypoint
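# Base tooling: compiler toolchain, CA certificates, curl, and git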
RUN apt-get update && apt-get -qqy --no-install-recommends install \
build-essential \
ca-certificates \
curl \
git
# Add kubectl
RUN curl -Lo /tmp/kubectl https://storage.googleapis.com/kubernetes-release/release/v1.15.0/bin/linux/amd64/kubectl && chmod +x /tmp/kubectl && mv /tmp/kubectl /usr/local/bin/
ENTRYPOINT ["entrypoint"]
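For completeness, roughly how the image can be built and poked at locally; the tag and the manual privileged run are just for debugging, not the actual prow config:
docker build -t istio-builder:dev .
# DinD needs a privileged container; the entrypoint (prow-runner.sh) starts dockerd
docker run --privileged --rm -it istio-builder:dev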
I think it's related to https://github.com/kubernetes-sigs/kind/issues/303; trying out the steps there.
Quick comment: in addition to #303, ensure Docker's storage directory (typically /var/lib/docker) is backed by a volume (e.g. an emptyDir).
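To illustrate that: a minimal sketch of a pod spec for the builder container with Docker's storage backed by an emptyDir; the names are placeholders taken from this thread, not the actual prow config:
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: istio-builder
spec:
  containers:
  - name: builder
    image: gcr.io/istio-testing/istio-builder:v20190628-31457b43
    securityContext:
      privileged: true            # required for DinD
    volumeMounts:
    - name: docker-graph          # keeps Docker's image/container storage off the container's own overlay fs
      mountPath: /var/lib/docker
  volumes:
  - name: docker-graph
    emptyDir: {}
EOF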
Got it working with those changes, thanks; it seems to be running smoothly now!
Big +1 on #303 though
I hear you on #303, just juggling priorities, glad you got it working! :sweat_smile:
Will add a note about DinD to #303 as well.