RKE version: v0.2.0
Docker version: (`docker version`, `docker info` preferred)
```
Client:
 Version: 18.09.3
 API version: 1.39
 Go version: go1.10.8
 Git commit: 774a1f4
 Built: Thu Feb 28 06:53:11 2019
 OS/Arch: linux/amd64
 Experimental: false

Server: Docker Engine - Community
 Engine:
  Version: 18.09.3
  API version: 1.39 (minimum version 1.12)
  Go version: go1.10.8
  Git commit: 774a1f4
  Built: Thu Feb 28 05:59:55 2019
  OS/Arch: linux/amd64
  Experimental: false

Containers: 20
 Running: 7
 Paused: 0
 Stopped: 13
Images: 4
Server Version: 18.09.3
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: e6b3f5632f50dbc4e9cb6288d911bf4f5e95b18e
runc version: 6635b4f0c6af3810594d2770f662f34ddc15b40d
init version: fec3683
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.15.0-46-generic
Operating System: Ubuntu 18.04.2 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 985.5MiB
Name: kanuahs
ID: 5EFK:2KX7:R64P:YT56:WCYV:653P:AFWT:TAS4:PMGA:YCOR:3FPX:4D2N
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine

WARNING: No swap limit support
```
```
NAME="Ubuntu"
VERSION="18.04.2 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.2 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

4.15.0-46-generic
```
**Steps to Reproduce:**
1. Create a fresh Ubuntu 18.04 Server VirtualBox VM using [this ISO](http://cdimage.ubuntu.com/ubuntu/releases/18.04/release/ubuntu-18.04.2-server-amd64.iso), install Docker and etcd, and generate SSH keys.
2. Run `rke up`.
**Results:**
```
$ kubectl describe jobs -n kube-system
Name:           rke-network-plugin-deploy-job
Namespace:      kube-system
Selector:       controller-uid=9a9d01ed-5069-11e9-b158-080027e84c2b
Labels:         controller-uid=9a9d01ed-5069-11e9-b158-080027e84c2b
                job-name=rke-network-plugin-deploy-job
Annotations:    <none>
Parallelism:    1
Completions:    1
Start Time:     Wed, 27 Mar 2019 13:53:30 +0530
Pods Statuses:  1 Running / 0 Succeeded / 4 Failed
Pod Template:
  Labels:           controller-uid=9a9d01ed-5069-11e9-b158-080027e84c2b
                    job-name=rke-network-plugin-deploy-job
  Service Account:  rke-job-deployer
  Containers:
   rke-network-plugin-pod:
    Image:      rancher/hyperkube:v1.13.4-rancher1
    Port:       <none>
    Host Port:  <none>
    Command:
      kubectl
      apply
      -f
      /etc/config/rke-network-plugin.yaml
    Environment:  <none>
    Mounts:
      /etc/config from config-volume (rw)
  Volumes:
   config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      rke-network-plugin
    Optional:  false
Events:
  Type    Reason            Age   From            Message
  ----    ------            ----  ----            -------
  Normal  SuccessfulCreate  2m2s  job-controller  Created pod: rke-network-plugin-deploy-job-pk298
  Normal  SuccessfulCreate  102s  job-controller  Created pod: rke-network-plugin-deploy-job-n8wx2
  Normal  SuccessfulCreate  71s   job-controller  Created pod: rke-network-plugin-deploy-job-f6d9h
  Normal  SuccessfulCreate  50s   job-controller  Created pod: rke-network-plugin-deploy-job-6bcqv
  Normal  SuccessfulCreate  10s   job-controller  Created pod: rke-network-plugin-deploy-job-7kzkg

$ kubectl get pods -n kube-system
NAME                                  READY   STATUS   RESTARTS   AGE
rke-network-plugin-deploy-job-6bcqv   0/1     Error    0          84s
rke-network-plugin-deploy-job-7kzkg   0/1     Error    0          44s
rke-network-plugin-deploy-job-f6d9h   0/1     Error    0          105s
rke-network-plugin-deploy-job-n8wx2   0/1     Error    0          2m16s
rke-network-plugin-deploy-job-pk298   0/1     Error    0          2m37s

$ kubectl logs -n kube-system rke-network-plugin-deploy-job-6bcqv
...
unable to recognize "/etc/config/rke-network-plugin.yaml": Get https://10.43.0.1:443/api?timeout=32s: dial tcp 10.43.0.1:443: connect: connection refused
unable to recognize "/etc/config/rke-network-plugin.yaml": Get https://10.43.0.1:443/api?timeout=32s: dial tcp 10.43.0.1:443: connect: connection refused
unable to recognize "/etc/config/rke-network-plugin.yaml": Get https://10.43.0.1:443/api?timeout=32s: dial tcp 10.43.0.1:443: connect: connection refused
```
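The errors above (and the later "no route to host" variants in this thread) come from a plain TCP connect to 10.43.0.1:443, the first address of RKE's default service network 10.43.0.0/16, failing. A minimal Python sketch for reproducing that check by hand, assuming a Python 3 interpreter is available on the node; `probe` is a hypothetical helper, not part of RKE or kubectl.

```python
import errno
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Try one TCP connect and classify the outcome (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer at all within `timeout` seconds.
        return "timeout"
    except ConnectionRefusedError:
        # Typically a TCP RST or ICMP port-unreachable: "connect: connection refused".
        return "refused"
    except OSError as exc:
        if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
            # No usable route, or an ICMP unreachable reply:
            # "no route to host" / "network is unreachable".
            return "unreachable"
        return f"error: {exc}"
```

For example, `probe("10.43.0.1", 443)` run on the node distinguishes "refused" (something answers but rejects the connection) from "unreachable" (the kernel cannot route to the service IP at all).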
Additional Info:
The rke binary runs inside the VM: I'm trying to create a single-node cluster from inside the VM, with the VM itself as the node.
`rke up --local` (with no config file) causes the same problem.
I'm having the exact same problem and am not sure what to do next. My output is as follows:
```
➜  rancher git:(master) ✗ k --kubeconfig kube_config_cluster.yml get pod --all-namespaces
NAMESPACE     NAME                                  READY   STATUS    RESTARTS   AGE
kube-system   rke-network-plugin-deploy-job-48ms4   0/1     Pending   0          19s
```
We hit the same issue with v0.2.1. Since this is how we verify RKE/Kubernetes releases in our CI/CD, it is preventing us from upgrading.
As a workaround, I used the hostname of the VM instead of 127.0.0.1, and it worked on Ubuntu 16.04 and 18.04. It seems to be related to pods not being able to reach the API server.
With Ubuntu or RancherOS it works fine, but there is a problem with CentOS 7.x.
We have the same problem here.
I changed the address in cluster.yml to the server's IP (192.168.1.10 in my case) instead of 127.0.0.1, and it started working for me.
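Two comments report that using the VM's hostname or LAN IP instead of 127.0.0.1 in cluster.yml fixes the deploy job. A minimal Python sketch of finding that address and emitting a single-node `nodes` entry; `primary_ipv4` and `node_entry` are hypothetical helpers, and the `user` and `ssh_key_path` defaults are placeholders to adjust for your VM.

```python
import socket


def primary_ipv4(probe_host: str = "192.0.2.1", probe_port: int = 9) -> str:
    """Return the local source address the kernel picks to reach probe_host.

    connect() on a UDP socket sends no packets; it only does a route lookup.
    192.0.2.1 (TEST-NET-1) is normally reached only via the default route,
    so this raises OSError on a host that has none.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.connect((probe_host, probe_port))
        return sock.getsockname()[0]


def node_entry(address: str, user: str = "ubuntu",
               ssh_key_path: str = "~/.ssh/id_rsa") -> str:
    """Render a minimal single-node cluster.yml where the node has all three roles."""
    return (
        "nodes:\n"
        f"  - address: {address}\n"
        f"    user: {user}\n"
        "    role: [controlplane, worker, etcd]\n"
        f"    ssh_key_path: {ssh_key_path}\n"
    )
```

For example, `print(node_entry(primary_ipv4()))` on the VM prints a `nodes:` block whose `address` is the VM's LAN IP rather than 127.0.0.1.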
I'm getting the same kind of issue, also on VirtualBox. My setup:
- 3 nodes running RancherOS
- All nodes are worker and etcd; two of them are also controllers
- rke is installed on a 4th node, and cluster.yaml refers to the three k8s nodes by their IP addresses
- rke v0.2.4, installing Kubernetes 1.13.5

As mentioned by @trankchung, CentOS/RHEL doesn't work properly: I have to run the pipeline twice to get a successful job.
```
...
[info] [sync] Syncing nodes Labels and Taints
[info] [sync] Successfully synced nodes Labels and Taints
[info] [network] Setting up network plugin: calico
[info] [addons] Saving ConfigMap for addon rke-network-plugin to Kubernetes
[info] [addons] Successfully saved ConfigMap for addon rke-network-plugin to Kubernetes
[info] [addons] Executing deploy job rke-network-plugin
Error:
Failed to get job complete status for job rke-network-plugin-deploy-job in namespace kube-system
```
I had a similar issue: the rke-network-plugin was never deployed. My nodes had only about 25% free disk space left. Once I deleted old data and had about 75% free disk space on all three nodes, the rke-network-plugin was deployed successfully.
In my case, the issue seems to have been RKE's default `addon_job_timeout` being too low (the default is 30 seconds). After I increased the value, the rke-network-plugin deploy job started succeeding (https://github.com/rancher/rke/issues/1652).
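Why a short `addon_job_timeout` can fail even when a later retry would succeed: the Kubernetes Job controller recreates failed pods with an exponential back-off delay (10s, 20s, 40s, ...) capped at six minutes, up to `backoffLimit` retries (default 6), whereas RKE only waits `addon_job_timeout` seconds for the job. A minimal Python sketch of that schedule, assuming every pod fails instantly (real pods first run `kubectl apply`, so real retries start later); `retry_offsets` and `retries_within` are hypothetical helpers.

```python
from typing import List


def retry_offsets(retries: int, base: int = 10, cap: int = 360) -> List[int]:
    """Seconds after the first pod failure at which each retry pod is created.

    Assumes the back-off doubles from `base`, is capped at `cap`, and that
    each retry pod fails immediately (a simplification).
    """
    offsets, elapsed, delay = [], 0, base
    for _ in range(retries):
        elapsed += delay
        offsets.append(elapsed)
        delay = min(delay * 2, cap)
    return offsets


def retries_within(timeout: int, retries: int = 6) -> int:
    """Count retry pods created no later than `timeout` seconds (default backoffLimit 6)."""
    return sum(1 for t in retry_offsets(retries) if t <= timeout)
```

Under this model, `retries_within(30)` is 2 and `retries_within(120)` is 3, so raising the top-level `addon_job_timeout` key in cluster.yml gives the job more chances to hit a moment when the API server is reachable.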
I had the same issue, and these two steps solved my problem:
1. `addon_job_timeout`
2. In my case, one of the nodes was in the DiskPressure state
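Two comments tie the failure to low disk space or DiskPressure. On Linux, the kubelet's default hard-eviction thresholds include `nodefs.available<10%` and `imagefs.available<15%`; when one is crossed the node reports DiskPressure and evicts pods. A minimal Python sketch that checks free space against those two defaults; the paths (`/var/lib/kubelet` for nodefs, `/var/lib/docker` for imagefs on a Docker-based node) are assumptions, `check_disk` is hypothetical, and it would not flag the 25%-free case above, which is above both defaults.

```python
import shutil
from typing import Dict

# Assumed mapping of kubelet default hard-eviction thresholds (percent free) to paths.
DEFAULT_THRESHOLDS = {
    "/var/lib/kubelet": 10.0,  # nodefs.available<10%
    "/var/lib/docker": 15.0,   # imagefs.available<15%
}


def free_percent(path: str) -> float:
    """Percentage of the filesystem holding `path` available to unprivileged users."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def check_disk(thresholds: Dict[str, float] = DEFAULT_THRESHOLDS) -> Dict[str, bool]:
    """Map each existing path to True if its free space is below its threshold."""
    result = {}
    for path, min_free in thresholds.items():
        try:
            result[path] = free_percent(path) < min_free
        except FileNotFoundError:
            continue  # path does not exist on this machine; skip it
    return result
```

Any `True` value suggests that node is close to or in DiskPressure, which `kubectl describe node` should confirm.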
In my case, this answer ~is the workaround~ made some progress.
Setting `addon_job_timeout` to a long value didn't help; the job just keeps failing the whole time. `kubectl describe node` shows that DiskPressure is False as well. I had to run `docker network create --driver=bridge --subnet=10.43.0.0/16 br0_rke` before `rke up`.
Note that a comment mentioned that needing this would mean RKE is broken.
Quick update on my situation: the logs of the coredns and calico-kube-controller pods suggest that there is still no route to 10.43.0.1 for internal traffic across pods:
```
E0528 02:15:57.160530 1 reflector.go:134] github.com/coredns/coredns/plugin/kubernetes/controller.go:317: Failed to list *v1.Endpoints: Get https://10.43.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.43.0.1:443: connect: no route to host
E0528 02:15:57.160530 1 reflector.go:134] github.com/coredns/coredns/plugin/kubernetes/controller.go:317: Failed to list *v1.Endpoints: Get https://10.43.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.43.0.1:443: connect: no route to host
log: exiting because of error: log: cannot create log: open /tmp/coredns.coredns-799dffd9c4-lndpq.unknownuser.log.ERROR.20200528-021557.1: no such file or directory
2020-05-28 02:21:10.062 [ERROR][1] client.go 238: Error getting cluster information config ClusterInformation="default" error=Get https://10.43.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.43.0.1:443: connect: no route to host
2020-05-28 02:21:10.062 [FATAL][1] main.go 117: Failed to initialize Calico datastore error=Get https://10.43.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.43.0.1:443: connect: no route to host
```
You should not need the `br0_rke` network; all it does is apply a band-aid over your issue.
A more "suitable" workaround is to create a default route in the host network namespace. Even if it doesn't route anywhere because you are air-gapped, the iptables rules that RKE uses for service IP resolution will then start to work, and you'll be able to route to 10.43.0.0/16.
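The default-route explanation above can be checked from the kernel's IPv4 routing table: if no route (default or otherwise) covers 10.43.0.1, a connection the host itself originates fails its route lookup before any iptables rule can rewrite the destination. A minimal Python sketch that parses `/proc/net/route` text; it assumes a little-endian (x86-64) host, where that file shows addresses with their octets reversed, and `covering_routes` is a hypothetical helper.

```python
import ipaddress
import struct
from typing import List, Tuple


def _hex_to_ipv4(field: str) -> ipaddress.IPv4Address:
    # /proc/net/route prints each address as a 32-bit hex number in host byte
    # order; on little-endian hosts that reverses the octets, so unpack with "<L".
    return ipaddress.IPv4Address(struct.pack("<L", int(field, 16)))


def covering_routes(target: str, table: str) -> List[Tuple[str, str]]:
    """Return (interface, network) for every route in `table` that covers `target`."""
    ip = ipaddress.IPv4Address(target)
    matches = []
    for line in table.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 8:
            continue
        iface, dest, mask = fields[0], fields[1], fields[7]
        net = ipaddress.IPv4Network(
            f"{_hex_to_ipv4(dest)}/{_hex_to_ipv4(mask)}", strict=False)
        if ip in net:
            matches.append((iface, str(net)))
    return matches
```

On a node, passing the contents of `/proc/net/route` as `table` and `"10.43.0.1"` as `target` returning an empty list matches the situation described above; after adding a default route, the result should include an entry for `0.0.0.0/0`.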