Host OS: RHEL 7.4
Host Docker version: 18.09.0
Host go version: go1.11.2
Node Image: kindest/node:v1.12.2
[root@localhost bin]# kind create cluster
Creating cluster 'kind-1' ...
 ✓ Ensuring node image (kindest/node:v1.12.2)
 ✓ [kind-1-control-plane] Creating node container
 ✓ [kind-1-control-plane] Fixing mounts
 ✓ [kind-1-control-plane] Starting systemd
 ✓ [kind-1-control-plane] Waiting for docker to be ready
 ✗ [kind-1-control-plane] Starting Kubernetes (this may take a minute)
FATA[07:20:43] Failed to create cluster: failed to apply overlay network: exit status 1
The code below, from pkg/cluster/context.go, extracts the Kubernetes version with the kubectl version command in order to download the version-specific Weave Net manifest. This is the step that is failing:
```
// TODO(bentheelder): support other overlay networks
if err = node.Command(
	"/bin/sh", "-c",
	`kubectl apply --kubeconfig=/etc/kubernetes/admin.conf -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version --kubeconfig=/etc/kubernetes/admin.conf | base64 | tr -d '\n')"`,
).Run(); err != nil {
	return kubeadmConfig, errors.Wrap(err, "failed to apply overlay network")
}
```
Why is the output of the kubectl version command base64-encoded?
Yep, as @alejandrox1 noted, the base64 encoding is from their guide. The reason for this is to pass it as an HTTP query parameter to weave so that their site can serve the appropriate weave version based on your Kubernetes version.
In the future we might use fixed weave versions, but this is the correct and normal way to install it per their upstream documentation.
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
It would be this verbatim, but we need to specify the admin kubeconfig location.
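For illustration only, here is a minimal Go sketch of how that manifest URL ends up being assembled; kind itself just shells out as shown above, and the `weaveURL` helper and the kubeconfig path are assumptions made for this example:

```
package main

import (
	"encoding/base64"
	"fmt"
	"os/exec"
)

// weaveURL builds the manifest URL the same way the shell pipeline does:
// it captures the output of `kubectl version` and base64-encodes it so the
// multi-line text can travel as a single HTTP query parameter.
func weaveURL(kubeconfig string) (string, error) {
	out, err := exec.Command("kubectl", "version", "--kubeconfig="+kubeconfig).Output()
	if err != nil {
		return "", err
	}
	// Go's StdEncoding emits no newlines, so the `tr -d '\n'` step from the
	// shell version is not needed here.
	encoded := base64.StdEncoding.EncodeToString(out)
	return "https://cloud.weave.works/k8s/net?k8s-version=" + encoded, nil
}

func main() {
	url, err := weaveURL("/etc/kubernetes/admin.conf")
	if err != nil {
		fmt.Println("kubectl version failed:", err)
		return
	}
	fmt.Println(url)
}
```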
Regarding the failure, is this happening reliably, or did it just happen once?
It is happening repeatedly. I'll get the debug log gist and add it here.
https://gist.github.com/senthilrch/70eb56cfeee38e311c13f6898791121a
The host in which I am creating the kind cluster is behind a proxy. Perhaps that's the reason it fails. Will kind honor http_proxy and https_proxy env variables set on the host?
Ah, that's almost definitely it!
kind does nothing special regarding proxies, the rest of the bringup only works because everything else (besides the overlay network config and its images) is pre-packed into the node image and doesn't need to go out to the internet.
We can either try to get these packed into the image ahead of time (which is probably quite doable, and possibly desirable, but maybe a little tricky), or we can try to make this step respect proxy information on the host machine.
It looks like http_proxy and HTTPS_PROXY are mostly a convention that curl and a few others happen to follow to varying degrees, we'd probably need to also set the docker daemon on the "nodes" to respect this as well.
Both approaches are probably worth doing. I'll update this issue to track.
/kind bug
/priority important-soon
Within the last 1-2 weeks Kind broke for me with the same error (I believe).
 ✓ [control-plane] Creating the kubeadm config file
DEBU[16:26:27] Running: /usr/bin/docker [docker exec --privileged kind-1-control-plane kubeadm init --ignore-preflight-errors=all --config=/kind/kubeadm.conf]
DEBU[16:26:52] Running: /usr/bin/docker [docker exec --privileged -t kind-1-control-plane cat /etc/kubernetes/admin.conf]
DEBU[16:26:53] Running: /usr/bin/docker [docker exec --privileged kind-1-control-plane /bin/sh -c kubectl apply --kubeconfig=/etc/kubernetes/admin.conf -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version --kubeconfig=/etc/kubernetes/admin.conf | base64 | tr -d '\n')"]
ERRO[16:28:25] failed to apply overlay network: exit status 1
 ✗ [control-plane] Starting Kubernetes (this may take a minute)
ERRO[16:28:25] failed to apply overlay network: exit status 1
DEBU[16:28:25] Running: /usr/bin/docker [docker ps -q -a --no-trunc --filter label=io.k8s.sigs.kind.cluster --format {{.Names}}\t{{.Label "io.k8s.sigs.kind.cluster"}} --filter label=io.k8s.sigs.kind.cluster=1]
DEBU[16:28:25] Running: /usr/bin/docker [docker rm -f -v kind-1-control-plane]
 ✗ [control-plane] Pre-loading images
Error: failed to create cluster: failed to apply overlay network: exit status 1
I didn't change anything on my system and simply do a git pull origin master && go install every now and then. I'm running on ArchLinux if that's helpful information.
@metalmatze is it possible that you're behind a proxy as well? we've not fixed that yet.
I don't think so.
At home I have the same issues, also running Arch but an entirely different system.
hmm. I don't think we've made any functional changes to this step in that time frame. FWIW making this step not depend on the internet is very high on my todo :confused:
other known issues I've seen that can cause similar problems:
Pulling the latest master now fixed KinD for me again. I'm not entirely sure what happened; I can't see any changes related to my problem. I'm on the same machine and the same WiFi as when I first reported this. Additionally, my machine was suspended most of the weekend and I didn't run any updates during that time (like updating Docker, for example).
https://github.com/kubernetes-sigs/kind/compare/302bb7d...4a348e0
Huh. I can't spot anything relevant in there... the plot thickens.
I think this week I'll take a stab at pre-loading the CNI images and using a fixed manifest, which should help avoid this sort of issue entirely.
I'm facing the same issue. In my case apply overlay network fails because cloud.weave.works is not resolvable from kind-1-control-plane container. Any help would be very appreciated.
Upgraded docker from 18.09 to 18.09.1 and the problem went away.
Huh, I wonder if there was a regression in docker somehow. What docker distribution are you using?
Interesting. It has been working for me since about 10 days ago, and I just checked that I'm on Docker 18.09.1 as well. I should have checked the version when it didn't work.
FWIW I looked at the Arch packages for Docker and the timeline of their releases pretty much adds up with that suspicion!
18.09.1 was pushed on Jan 10th:
https://git.archlinux.org/svntogit/community.git/commit/?h=packages/docker&id=0b11ffde10bf10ab1b08a459c12927ff02abf6d3
@BenTheElder, I was running the dind container docker:18.09-dind in kubernetes. After I changed the image to docker:18.09.1-dind the issue was resolved.
Thanks for confirming, I'm going to file another issue to create a "known issues" section in our docs and highlight this as one of the first ones!
It looks like http_proxy and HTTPS_PROXY are mostly a convention that curl and a few others happen to follow to varying degrees, we'd probably need to also set the docker daemon on the "nodes" to respect this as well.
+1
yes, i think given the containerization, passing the http(s)_proxy env. vars to the kind nodes might be necessary.
adding the option to pre-bake the overlay network and also provide air-gapped support will help users that don't want their kind cluster to talk to the internet. for the rest we might have to still expose the proxy env vars.
I was running the dind container docker:18.09-dind in kubernetes. After I changed the image to docker:18.09.1-dind the issue was resolved.
i wonder what was fixed.
so docker itself supports HTTP_PROXY / HTTPS_PROXY: https://docs.docker.com/network/proxy/
we could just blindly pass through these values from the host at node creation time...
it makes sense, especially if it's a fix.
@BenTheElder I really need this as my company has a corporate proxy... will you be working on it, or should I jump in?
:+1:
so docker itself supports HTTP_PROXY / HTTPS_PROXY https://docs.docker.com/network/proxy/
we could just blindly pass through these values from the host at node creation time...
I opened issue #270 for implementing this.
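As a rough sketch of that idea (not the actual #275 implementation; `proxyEnvArgs`, the container name, and the image tag are made up for this example), forwarding the host's proxy settings into a node container could look roughly like this:

```
package main

import (
	"fmt"
	"os"
	"strings"
)

// proxyEnvArgs is a hypothetical helper: it collects whichever proxy
// variables are set on the host (checking upper- and lower-case spellings)
// and turns them into `-e KEY=VALUE` flags for a `docker run` invocation.
func proxyEnvArgs() []string {
	var args []string
	for _, key := range []string{"HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY"} {
		val := os.Getenv(key)
		if val == "" {
			val = os.Getenv(strings.ToLower(key))
		}
		if val != "" {
			args = append(args, "-e", key+"="+val)
		}
	}
	return args
}

func main() {
	// Illustrative only: print the docker run command a node-creation step
	// could use so the node container inherits the host's proxy settings.
	args := []string{"run", "-d", "--privileged", "--name", "kind-1-control-plane"}
	args = append(args, proxyEnvArgs()...)
	args = append(args, "kindest/node:v1.13.3")
	fmt.Println("docker " + strings.Join(args, " "))
}
```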
I've encountered the same error while trying to run kind inside a docker container (docker in docker). I have checked kind container and I was actually able to apply the overlay network yaml manually from inside the container, but it looks like apiserver just keeps restarting for no particular reason in my case. Here are logs from the apiserver and kubelet (not sure if it can help as I did not find anything useful, other than multiple handshake errors).
I also have the same issue when connected to a VPN client. I haven't dug into this, but the same VPN client messes with my docker networking (noting it for future time travelers experiencing the same failed to create cluster: failed to apply overlay network: exit status 1).
@floreks hmm took a quick peek, nothing leapt out.
we do run kind extensively inside a docker in docker (not the standard image though) setup for k8s CI.
We have seen kubelet continually evicting the API server in a few cases due to low disk / memory but I didn't see that in the logs.
@endzyme I suspect some variant on #270 may help. I am also further exploring #200.
@BenTheElder Thanks for the hint! After investigating more, I have seen that kubelet was constantly being killed with SIGKILL (9). I checked dstat --top-oom and it showed that the whole control plane is constantly being killed by the system.
EDIT: Unfortunately, after increasing available resources nothing changed. The control plane keeps getting restarted for no apparent reason. What might be important is that when I test kind locally inside a docker container (docker run --privileged -it --rm ... sh) and run kind create cluster, it works, but when I try the same kind create cluster inside a kubernetes cluster after exec'ing into the pod, it fails with the above error.
@floreks is your pod privileged, in addition to setting resource requests?
@BenTheElder Security context is set to allow privileged execution. I am using the official docker:dind as a base image and docker itself is running in the container. I did not have to mount anything when running it locally and it was working correctly. Only when running in a k8s environment is there an issue.
Here is my test yaml:
```
apiVersion: v1
kind: Namespace
metadata:
  name: test-floreks
---
apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  namespace: test-floreks
  name: dind
spec:
  selector:
    matchLabels:
      app: dind
  replicas: 1 # tells deployment to run 1 pod matching the template
  template:
    metadata:
      labels:
        app: dind
    spec:
      containers:
      - name: dind
        image: floreks/dind-with-kind:v1.0.0
        securityContext:
          privileged: true
```
so our actual podspec is ~ the contents of the pod_spec field in this prowjob (a few things get added for git checkout, environment variables...):
```
apiVersion: prow.k8s.io/v1
kind: ProwJob
metadata:
  annotations:
    prow.k8s.io/job: ci-kubernetes-kind-conformance
  creationTimestamp: null
  labels:
    created-by-prow: "true"
    preset-bazel-remote-cache-enabled: "true"
    preset-bazel-scratch-dir: "true"
    preset-dind-enabled: "true"
    preset-service-account: "true"
    prow.k8s.io/id: bc7c7a72-2b06-11e9-8fd7-0a580a6c037c
    prow.k8s.io/job: ci-kubernetes-kind-conformance
    prow.k8s.io/type: periodic
  name: f8f7ed86-2b0d-11e9-bfc2-0a580a6c0297
spec:
  agent: kubernetes
  cluster: default
  job: ci-kubernetes-kind-conformance
  namespace: test-pods
  pod_spec:
    containers:
    - args:
      - --job=$(JOB_NAME)
      - --root=/go/src
      - --repo=k8s.io/kubernetes=master
      - --repo=sigs.k8s.io/kind=master
      - --service-account=/etc/service-account/service-account.json
      - --upload=gs://kubernetes-jenkins/logs
      - --scenario=execute
      - --
      - ./../../sigs.k8s.io/kind/hack/ci/e2e.sh
      env:
      - name: GOOGLE_APPLICATION_CREDENTIALS
        value: /etc/service-account/service-account.json
      - name: E2E_GOOGLE_APPLICATION_CREDENTIALS
        value: /etc/service-account/service-account.json
      - name: TEST_TMPDIR
        value: /bazel-scratch/.cache/bazel
      - name: BAZEL_REMOTE_CACHE_ENABLED
        value: "true"
      - name: DOCKER_IN_DOCKER_ENABLED
        value: "true"
      image: gcr.io/k8s-testimages/kubekins-e2e:v20190205-d83780367-master
      name: ""
      resources:
        requests:
          cpu: "2"
          memory: 9000Mi
      securityContext:
        privileged: true
      volumeMounts:
      - mountPath: /lib/modules
        name: modules
        readOnly: true
      - mountPath: /sys/fs/cgroup
        name: cgroup
      - mountPath: /etc/service-account
        name: service
        readOnly: true
      - mountPath: /bazel-scratch/.cache
        name: bazel-scratch
      - mountPath: /docker-graph
        name: docker-graph
    dnsConfig:
      options:
      - name: ndots
        value: "1"
    volumes:
    - hostPath:
        path: /lib/modules
        type: Directory
      name: modules
    - hostPath:
        path: /sys/fs/cgroup
        type: Directory
      name: cgroup
    - name: service
      secret:
        secretName: service-account
    - emptyDir: {}
      name: bazel-scratch
    - emptyDir: {}
      name: docker-graph
  type: periodic
status:
  startTime: "2019-02-07T19:24:22Z"
  state: triggered
```
https://github.com/kubernetes-sigs/kind/pull/275 just merged to pass through HTTPS_PROXY and HTTP_PROXY from the host to the nodes, thanks @pablochacin!
We should be getting the 0.2 release soon with this change, but right now you can obtain it by building from the current master branch sources.
Hopefully this resolves the issue. I am finalizing the design for handling CNIs as well and plan to bring it up at the next meeting.
We've additionally uncovered #284 which may affect some configurations.
Thanks @BenTheElder. I now have a new issue:
Error: failed to create cluster: failed to init node with kubeadm: exit status 1
You can find the full debug log here.
@matthyx from the log I see that the proxy has been set to http://127.0.0.1:3129/. This is localhost on the host machine, but inside the kind node container this address is the container's loopback (not the host's loopback). Therefore, you should set your proxy to an address which is reachable from the kind node container.
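As an illustration (not something kind does for you; the `bridgeGateway` helper is hypothetical), one way to find an address that containers can usually reach is to ask docker for the default bridge network's gateway and point the proxy variables at that instead of 127.0.0.1:

```
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// bridgeGateway asks docker for the gateway of the default bridge network.
// A proxy listening on this address (e.g. http://172.17.0.1:3129/) is
// typically reachable from inside containers, unlike 127.0.0.1, which
// inside the node container points at the container's own loopback.
func bridgeGateway() (string, error) {
	out, err := exec.Command(
		"docker", "network", "inspect", "bridge",
		"--format", "{{(index .IPAM.Config 0).Gateway}}",
	).Output()
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(out)), nil
}

func main() {
	gw, err := bridgeGateway()
	if err != nil {
		fmt.Println("could not inspect bridge network:", err)
		return
	}
	fmt.Printf("point HTTP_PROXY/HTTPS_PROXY at %s rather than 127.0.0.1\n", gw)
}
```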
Ok, I feel so stupid indeed... so after setting the proxy to something reachable from containers, I get the following (full log here):
ERRO[11:30:05] failed to apply overlay network: exit status 1
@matthyx you set the proxy to 172.17.0.1, which is the address of the docker bridge on your host machine, so I guess you are running your proxy locally on the host machine. What I've found is that by default on my machine the firewall is set so that no traffic is allowed from inside docker containers to the host. I had to turn off the firewall to make it work. I'm pretty sure this is the default behavior of docker.
My suggestion is that you either test it disabling your firewall (at your own risk ;-) ) or try with a proxy running on a public address.
@pablochacin thanks for the suggestion, I have just checked and my proxy works from inside docker, as confirmed by a small Dockerfile like:
```
FROM ubuntu
ENV http_proxy=http://172.17.0.1:3129/
RUN apt update
```
I'm not following here @matthyx. This is a Dockerfile, right? It is applied at build time, while the issue you have is at run time. I'm not sure these two situations are comparable. What I suggest is to start a container with ubuntu and, from inside the container, try an update:
$> docker run -ti --rm ubuntu bash
root@b85870b5faa1:/# export http_proxy=http://172.17.0.1:3129/
root@b85870b5faa1:/# apt-get update
Yes, this works:
$ docker run -ti --rm ubuntu bash
root@29cd8a005505:/# export http_proxy=http://172.17.0.1:3129/
root@29cd8a005505:/# apt-get update
Get:1 http://archive.ubuntu.com/ubuntu bionic InRelease [242 kB]
Get:2 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB]
Get:3 http://archive.ubuntu.com/ubuntu bionic-updates InRelease [88.7 kB]
Get:4 http://archive.ubuntu.com/ubuntu bionic-backports InRelease [74.6 kB]
Get:5 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages [339 kB]
Get:6 http://archive.ubuntu.com/ubuntu bionic/multiverse amd64 Packages [186 kB]
Get:7 http://security.ubuntu.com/ubuntu bionic-security/universe amd64 Packages [152 kB]
Get:8 http://archive.ubuntu.com/ubuntu bionic/universe amd64 Packages [11.3 MB]
Get:9 http://security.ubuntu.com/ubuntu bionic-security/multiverse amd64 Packages [3451 B]
Get:10 http://archive.ubuntu.com/ubuntu bionic/main amd64 Packages [1344 kB]
Get:11 http://archive.ubuntu.com/ubuntu bionic/restricted amd64 Packages [13.5 kB]
Get:12 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages [679 kB]
Get:13 http://archive.ubuntu.com/ubuntu bionic-updates/multiverse amd64 Packages [6955 B]
Get:14 http://archive.ubuntu.com/ubuntu bionic-updates/restricted amd64 Packages [10.7 kB]
Get:15 http://archive.ubuntu.com/ubuntu bionic-updates/universe amd64 Packages [932 kB]
Get:16 http://archive.ubuntu.com/ubuntu bionic-backports/universe amd64 Packages [3650 B]
Fetched 15.5 MB in 1s (10.8 MB/s)
Reading package lists... Done
However, I have reached out to IT, and it seems our corporate proxy (which requires a local cntlm for AD authentication) uses an old protocol for its man-in-the-middle interception... and for this reason we cannot upgrade our Docker past 18.06.1-ce.
Do you think we could be hitting the same issue here?
Some updates on this. I have the privilege of working with extremely bright people here, and the problem seems to lie in TLS negotiation (although not 1.3): our proxy policy hasn't been updated in a while, and none of the algorithms proposed by the Go TLS client is supported at the moment...
We're working with network and security to update this policy, and I will keep you posted if that solves our problem!
Just to confirm, the problem on my side persists even after a Docker upgrade (I don't have any HTTP proxy):
I get this error with Docker 18.06.1 from the official Ubuntu 18.04 LTS repository:
kind create cluster --image=kindest/node:v1.13.3@sha256:d1af504f20f3450ccb7aed63b67ec61c156f9ed3e8b0d973b3dee3c95991753c --retain
Creating cluster 'kind-1' ...
 ✓ Ensuring node image (kindest/node:v1.13.3)
 ✓ [control-plane] Creating node container
 ✓ [control-plane] Fixing mounts
 ✓ [control-plane] Starting systemd
 ✓ [control-plane] Waiting for docker to be ready
 ✓ [control-plane] Pre-loading images
 ✓ [control-plane] Creating the kubeadm config file
ERRO[11:41:36] failed to apply overlay network: exit status 1
 ✗ [control-plane] Starting Kubernetes (this may take a minute)
ERRO[11:41:36] failed to apply overlay network: exit status 1
Error: failed to create cluster: failed to apply overlay network: exit status 1
The problem persists for me after upgrading to docker-ce 18.09.2~3-0~ubuntu-bionic (I followed the Docker CE instructions):
```
kind create cluster --image=kindest/node:v1.13.3@sha256:d1af504f20f3450ccb7aed63b67ec61c156f9ed3e8b0d973b3dee3c95991753c --retain
Creating cluster 'kind-1' ...
 ✓ Ensuring node image (kindest/node:v1.13.3)
 ✓ [control-plane] Creating node container
 ✓ [control-plane] Fixing mounts
 ✓ [control-plane] Starting systemd
 ✓ [control-plane] Waiting for docker to be ready
 ✓ [control-plane] Pre-loading images
 ✓ [control-plane] Creating the kubeadm config file
ERRO[11:55:19] failed to add default storage class: exit status 1
 ✗ [control-plane] Starting Kubernetes (this may take a minute)
ERRO[11:55:19] failed to add default storage class: exit status 1
Error: failed to create cluster: failed to add default storage class: exit status 1
```

`curl` for the overlay network install works for me (but `kubectl version` fails as the API server is already down):

```
root@kind-1-control-plane:/# curl --location https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version --kubeconfig=/etc/kubernetes/admin.conf | base64 | tr -d '\n')
The connection to the server 172.17.0.2:6443 was refused - did you specify the right host or port?
apiVersion: v1
kind: List
items:
... (cut for brevity)
```
@hjacobs can you update with go get -u sigs.k8s.io/kind? The cluster name suggests that you're on an old version (or one of the previous releases). I suspect your API server is getting evicted, which we've patched around in #293.
the next release will contain this fix, but in the meantime it can be installed from the current source
Should be _actually_ fixed now, additionally new node images do not require pulling the overlay image at all.
I will test on Monday since I don't have our corporate proxy at home... thanks for the update!
@BenTheElder doesn't seem to work better... I did go get -u sigs.k8s.io/kind to update to latest, and then kind create cluster --loglevel debug which resulted in the same failure.
You can read the debug logs here.
hey @matthyx, can you run with kind create cluster --retain --loglevel debug and then run kind export logs after?
I suspect this is something else with your environment, at the latest source zero internet connectivity should be required after pulling the "node" image. (which I and one other user have been able to verify).
Should I open another issue once I have the logs?
that would be good, thanks!
I think this is good now... looking at the logs before sending them, I have noticed that:
I0225 07:49:12.803064 726 checks.go:430] validating if the connectivity type is via proxy or direct
[WARNING HTTPProxy]: Connection to "https://172.17.0.3" uses proxy "http://127.0.0.1:3129/". If that is not intended, adjust your proxy settings
I0225 07:49:12.803104 726 checks.go:466] validating http connectivity to first IP address in the CIDR
[WARNING HTTPProxyCIDR]: connection to "10.96.0.0/12" uses proxy "http://127.0.0.1:3129/". This may lead to malfunctional cluster setup. Make sure that Pod and Services IP ranges specified correctly as exceptions in proxy configuration
And so I decided to give it a try by unsetting all my *_proxy env variables, and suddenly it worked! I can finally enjoy kind on my pro workstation.
Thanks a lot @pablochacin and @BenTheElder !
Hi @bentheelder,
I'm also interested in helping with this issue as it relates to support for air gapped testing. I'm currently in-flight back to Austin, and I thought I'd get some Kind-based dev-work done. However, without a good internet connection things are just not working. I finally got past the above error, but now, due to a flaky network, the node is never ready due to the inability to initialize CNI.
hey @akutz -- on the latest code in master airgapped clusters should work, the CNI does not need to be pulled, is it possible you're using an older version?