RKE version:
rke version v0.2.5
Docker version: (docker version,docker info preferred)
Client:
Version: 1.13.1
API version: 1.26
Package version: docker-1.13.1-88.git07f3374.el7.centos.x86_64
Go version: go1.9.4
Git commit: 07f3374/1.13.1
Built: Fri Dec 7 16:13:51 2018
OS/Arch: linux/amd64
Server:
Version: 1.13.1
API version: 1.26 (minimum version 1.12)
Package version: docker-1.13.1-88.git07f3374.el7.centos.x86_64
Go version: go1.9.4
Git commit: 07f3374/1.13.1
Built: Fri Dec 7 16:13:51 2018
OS/Arch: linux/amd64
Experimental: false
Operating system and kernel: (cat /etc/os-release, uname -r preferred)
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
3.10.0-327.18.2.el7.x86_64
Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO)
Instances created on OpenStack
cluster.yml file:
nodes:
private_registries:
Steps to Reproduce:
rke up --config ./rancher-cluster.yml
Results:
[centos@tpe-liberty-alex-2 HAnode1]$ rke up --config ./rancher-cluster.yml
INFO[0000] Initiating Kubernetes cluster
INFO[0000] [certificates] Generating admin certificates and kubeconfig
INFO[0000] Successfully Deployed state file at [./rancher-cluster.rkestate]
INFO[0000] Building Kubernetes cluster
INFO[0000] [dialer] Setup tunnel for host [10.57.241.144]
INFO[0000] [dialer] Setup tunnel for host [10.57.241.142]
INFO[0000] [dialer] Setup tunnel for host [10.57.241.143]
INFO[0010] [network] Deploying port listener containers
INFO[0011] [network] Successfully started [rke-etcd-port-listener] container on host [10.57.241.143]
INFO[0016] [network] Successfully started [rke-etcd-port-listener] container on host [10.57.241.144]
INFO[0017] [network] Successfully started [rke-cp-port-listener] container on host [10.57.241.143]
INFO[0018] [network] Successfully started [rke-cp-port-listener] container on host [10.57.241.142]
INFO[0018] [network] Successfully started [rke-cp-port-listener] container on host [10.57.241.144]
INFO[0019] [network] Successfully started [rke-worker-port-listener] container on host [10.57.241.143]
INFO[0021] [network] Successfully started [rke-worker-port-listener] container on host [10.57.241.144]
INFO[0021] [network] Successfully started [rke-worker-port-listener] container on host [10.57.241.142]
INFO[0021] [network] Port listener containers deployed successfully
INFO[0021] [network] Running etcd <-> etcd port checks
INFO[0022] [network] Successfully started [rke-port-checker] container on host [10.57.241.143]
INFO[0023] [network] Successfully started [rke-port-checker] container on host [10.57.241.142]
INFO[0024] [network] Successfully started [rke-port-checker] container on host [10.57.241.144]
INFO[0025] [network] Running control plane -> etcd port checks
INFO[0026] [network] Successfully started [rke-port-checker] container on host [10.57.241.143]
INFO[0029] [network] Successfully started [rke-port-checker] container on host [10.57.241.142]
INFO[0029] [network] Successfully started [rke-port-checker] container on host [10.57.241.144]
INFO[0031] [network] Running control plane -> worker port checks
INFO[0031] [network] Successfully started [rke-port-checker] container on host [10.57.241.143]
INFO[0034] [network] Successfully started [rke-port-checker] container on host [10.57.241.144]
INFO[0034] [network] Successfully started [rke-port-checker] container on host [10.57.241.142]
INFO[0036] [network] Running workers -> control plane port checks
INFO[0038] [network] Successfully started [rke-port-checker] container on host [10.57.241.143]
INFO[0039] [network] Successfully started [rke-port-checker] container on host [10.57.241.144]
INFO[0039] [network] Successfully started [rke-port-checker] container on host [10.57.241.142]
INFO[0059] [network] Checking KubeAPI port Control Plane hosts
INFO[0059] [network] Removing port listener containers
INFO[0059] [remove/rke-etcd-port-listener] Successfully removed container on host [10.57.241.143]
INFO[0060] [remove/rke-etcd-port-listener] Successfully removed container on host [10.57.241.142]
INFO[0061] [remove/rke-etcd-port-listener] Successfully removed container on host [10.57.241.144]
INFO[0062] [remove/rke-cp-port-listener] Successfully removed container on host [10.57.241.143]
INFO[0064] [remove/rke-cp-port-listener] Successfully removed container on host [10.57.241.142]
INFO[0065] [remove/rke-cp-port-listener] Successfully removed container on host [10.57.241.144]
INFO[0065] [remove/rke-worker-port-listener] Successfully removed container on host [10.57.241.143]
INFO[0067] [remove/rke-worker-port-listener] Successfully removed container on host [10.57.241.142]
INFO[0067] [remove/rke-worker-port-listener] Successfully removed container on host [10.57.241.144]
INFO[0067] [network] Port listener containers removed successfully
INFO[0067] [certificates] Deploying kubernetes certificates to Cluster nodes
INFO[0078] [reconcile] Rebuilding and updating local kube config
INFO[0078] Successfully Deployed local admin kubeconfig at [./kube_config_rancher-cluster.yml]
INFO[0078] Successfully Deployed local admin kubeconfig at [./kube_config_rancher-cluster.yml]
INFO[0078] Successfully Deployed local admin kubeconfig at [./kube_config_rancher-cluster.yml]
INFO[0078] [certificates] Successfully deployed kubernetes certificates to Cluster nodes
INFO[0078] [reconcile] Reconciling cluster state
INFO[0078] [reconcile] This is newly generated cluster
INFO[0078] Pre-pulling kubernetes images
INFO[0078] [pre-deploy] Pulling image [10.57.241.204:5000/rancher/hyperkube:v1.14.3-rancher1] on host [10.57.241.143]
INFO[0078] [pre-deploy] Pulling image [10.57.241.204:5000/rancher/hyperkube:v1.14.3-rancher1] on host [10.57.241.144]
INFO[0078] [pre-deploy] Pulling image [10.57.241.204:5000/rancher/hyperkube:v1.14.3-rancher1] on host [10.57.241.142]
INFO[0078] [pre-deploy] Successfully pulled image [10.57.241.204:5000/rancher/hyperkube:v1.14.3-rancher1] on host [10.57.241.144]
INFO[0078] [pre-deploy] Successfully pulled image [10.57.241.204:5000/rancher/hyperkube:v1.14.3-rancher1] on host [10.57.241.143]
INFO[0078] [pre-deploy] Successfully pulled image [10.57.241.204:5000/rancher/hyperkube:v1.14.3-rancher1] on host [10.57.241.142]
INFO[0078] Kubernetes images pulled successfully
INFO[0078] [etcd] Building up etcd plane..
INFO[0078] [etcd] Saving snapshot [etcd-rolling-snapshots] on host [10.57.241.142]
INFO[0078] [etcd] Pulling image [rancher/rke-tools:v0.1.34] on host [10.57.241.142]
INFO[0093] [etcd] Successfully pulled image [rancher/rke-tools:v0.1.34] on host [10.57.241.142]
FATA[0093] [etcd] Failed to bring up Etcd Plane: Failed to create [etcd-rolling-snapshots] container on host [10.57.241.142]: Error: No such image: rancher/rke-tools:v0.1.34
With version v0.2.5 I have just observed the same issue when running against a v1.13.5-rancher1 cluster (a step I perform before trying the update to v1.14.3-rancher1-1). The difference is that I customize the image list:
system_images:
  etcd: registry.example.com:5000/rancher/coreos-etcd:v3.2.24
  kubernetes: registry.example.com:5000/rancher/hyperkube:v1.13.5-rancher1
  alpine: registry.example.com:5000/rancher/rke-tools:v0.1.28
  nginx_proxy: registry.example.com:5000/rancher/rke-tools:v0.1.28
  cert_downloader: registry.example.com:5000/rancher/rke-tools:v0.1.28
  kubernetes_services_sidecar: registry.example.com:5000/rancher/rke-tools:v0.1.28
  kubedns: registry.example.com:5000/rancher/k8s-dns-kube-dns-amd64:1.15.0
  dnsmasq: registry.example.com:5000/rancher/k8s-dns-dnsmasq-nanny-amd64:1.15.0
  kubedns_sidecar: registry.example.com:5000/rancher/k8s-dns-sidecar-amd64:1.15.0
  kubedns_autoscaler: registry.example.com:5000/rancher/cluster-proportional-autoscaler-amd64:1.0.0
  flannel: registry.example.com:5000/rancher/coreos-flannel:v0.10.0
  flannel_cni: registry.example.com:5000/rancher/coreos-flannel-cni:v0.3.0
  calico_node: registry.example.com:5000/rancher/calico-node:v3.4.0
  calico_cni: registry.example.com:5000/rancher/calico-cni:v3.4.0
  calico_ctl: registry.example.com:5000/rancher/calico-ctl:v2.0.0
  canal_node: registry.example.com:5000/rancher/calico-node:v3.4.0
  canal_cni: registry.example.com:5000/rancher/calico-cni:v3.4.0
  canal_flannel: registry.example.com:5000/rancher/coreos-flannel:v0.10.0
  weave_node: registry.example.com:5000/rancher/weave-kube:2.5.0
  weave_cni: registry.example.com:5000/rancher/weave-npc:2.5.0
  pod_infra_container: registry.example.com:5000/rancher/pause-amd64:3.1
  ingress: registry.example.com:5000/rancher/nginx-ingress-controller:0.21.0-rancher3
  ingress_backend: registry.example.com:5000/rancher/nginx-ingress-controller-defaultbackend:1.4
  metrics_server: registry.example.com:5000/rancher/metrics-server-amd64:v0.3.1
  coredns: registry.example.com:5000/rancher/coredns:1.2.6
  coredns_autoscaler: registry.example.com:5000/rancher/cluster-proportional-autoscaler-amd64:1.0.0
And the result is pretty much the same:
INFO[0008] [etcd] Pulling image [rancher/rke-tools:v0.1.34] on host [cfdd9f3c.example.com]
INFO[0009] [etcd] Successfully pulled image [rancher/rke-tools:v0.1.34] on host [cfdd9f3c.example.com]
FATA[0009] [etcd] Failed to bring up Etcd Plane: Failed to create [etcd-rolling-snapshots] container on host [cfdd9f3c.example.com]: Error: No such image: rancher/rke-tools:v0.1.34
Previous versions ran perfectly.
EDIT: I ran it again using the correct configuration and images for version v1.14.3-rancher1-1 (an upgrade) and I still see the problem.
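To rule out a configuration mistake before blaming RKE, it can help to confirm that every entry under system_images really points at the private registry. The following is only a rough sketch: list_unprefixed is a hypothetical helper (not part of rke), and it assumes it is fed just the "key: image" lines of the system_images block on stdin.

```shell
#!/bin/sh
# Hypothetical helper, not part of rke.
# Usage: list_unprefixed REGISTRY < system_images_block
# Prints every "key: image" line whose image does NOT start with "REGISTRY/".
list_unprefixed() {
  awk -v reg="$1/" '
    # Only consider simple "key: value" lines; skip headers such as "system_images:".
    $1 ~ /:$/ && NF == 2 && index($2, reg) != 1 { print $1, $2 }
  '
}

# Example (assumes the system_images block was saved to a file of your choosing):
#   list_unprefixed registry.example.com:5000 < system_images.txt
```

If this prints nothing, every listed image is already prefixed, which suggests that the unprefixed rancher/rke-tools:v0.1.34 in the error does not come from the image list in the config.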
This looks to be caused by https://github.com/rancher/rke/commit/7531e02563054a72410d878fa6e29a10e957704b#diff-a1c0977dfcb80994021b9d53dfc88892
cc @superseb @kinarashah
The image used for the RKE snapshot container was moved out of system_images and changed to a static image and tag, so it fails in any situation where that static image and tag is not available.
Until this is fixed, the workaround is to either use v0.2.4, or make the image available on the nodes manually and tag it as the expected image and tag:
docker pull your_registry/rancher/rke-tools:v0.1.34
docker tag your_registry/rancher/rke-tools:v0.1.34 rancher/rke-tools:v0.1.34
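Since the pull and tag have to be repeated on every node, a small wrapper may save some typing. This is a minimal sketch, not an official procedure: workaround_cmds is a hypothetical helper that only prints the two docker commands for a given registry, so they can be reviewed before being run on each node. The registry name and the way you reach the nodes (for example ssh) are assumptions.

```shell
#!/bin/sh
# Hypothetical helper, not part of rke: print the pull/tag commands that make
# the static snapshot image available under the name RKE v0.2.5 expects.
# Usage: workaround_cmds REGISTRY [IMAGE]
workaround_cmds() {
  registry="$1"
  image="${2:-rancher/rke-tools:v0.1.34}"   # image name taken from the error message
  printf 'docker pull %s/%s\n' "$registry" "$image"
  printf 'docker tag %s/%s %s\n' "$registry" "$image" "$image"
}

# Example: review the commands, then run them on a node (host name is a placeholder):
#   workaround_cmds your_registry
#   workaround_cmds your_registry | ssh user@node sh
```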
The log line that reports a successful pull even though the pull failed is tracked in https://github.com/rancher/rke/issues/1010
Can be tested with rke v0.2.6-rc1
rke v0.2.6-rc1
Airgap
Private registry w/ auth
rke pulls the correct rke-tools image and the cluster is able to provision successfully.
cluster.yml
nodes:
  - address: 172.31.21.72
    user: ubuntu
    role: [controlplane,etcd,worker]
private_registries:
  - url: registry:443
    user: username
    password: password
    is_default: true
./rke_linux-amd64 up --config cluster.yml --ssh-agent-auth
ubuntu@ip-172-31-26-130:~$ ./rke_linux-amd64 up --config cluster.yml --ssh-agent-auth
WARN[0000] This is not an officially supported version (v0.2.6-rc1) of RKE. Please download the latest official release at https://github.com/rancher/rke/releases/latest
INFO[0000] Initiating Kubernetes cluster
INFO[0000] [dialer] Setup tunnel for host [172.31.21.72]
INFO[0000] [state] Pulling image [registry:443/rancher/rke-tools:v0.1.34] on host [172.31.21.72]
INFO[0003] [state] Successfully pulled image [registry:443/rancher/rke-tools:v0.1.34] on host [172.31.21.72]
...
INFO[0038] Kubernetes images pulled successfully
INFO[0038] [etcd] Building up etcd plane..
INFO[0038] [etcd] Pulling image [registry:443/rancher/coreos-etcd:v3.3.10-rancher1] on host [172.31.21.72]
INFO[0040] [etcd] Successfully pulled image [registry:443/rancher/coreos-etcd:v3.3.10-rancher1] on host [172.31.21.72]
INFO[0040] [etcd] Successfully started [etcd] container on host [172.31.21.72]
INFO[0040] [etcd] Saving snapshot [etcd-rolling-snapshots] on host [172.31.21.72]
INFO[0040] [etcd] Successfully started [etcd-rolling-snapshots] container on host [172.31.21.72]
...
INFO[0100] Finished building Kubernetes cluster successfully
I was able to reproduce the failure with rke v0.2.5 using the same cluster.yml:
INFO[0011] [etcd] Pulling image [rancher/rke-tools:v0.1.34] on host [172.31.21.72]
FATA[0026] [etcd] Failed to bring up Etcd Plane: Can't pull Docker image [rancher/rke-tools:v0.1.34] for host [172.31.21.72]: Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
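The difference between the two runs shows up in the "Pulling image [...]" lines: v0.2.6-rc1 pulls registry:443/rancher/rke-tools:v0.1.34, while v0.2.5 tries the unprefixed rancher/rke-tools:v0.1.34. When verifying a fix in an air-gapped setup, a quick scan of the rke output for unprefixed pulls can make this easier to spot. The snippet below is a rough sketch only; unprefixed_pulls is a hypothetical helper and it assumes the log format shown in this thread.

```shell
#!/bin/sh
# Hypothetical helper, not part of rke.
# Usage: unprefixed_pulls REGISTRY < rke_output
# Lists each image from a "Pulling image [...]" log line that does NOT start
# with "REGISTRY/", i.e. images RKE would try to fetch from somewhere else.
unprefixed_pulls() {
  sed -n 's/.*Pulling image \[\([^]]*\)].*/\1/p' |
    sort -u |
    awk -v reg="$1/" 'index($0, reg) != 1'
}

# Example (rke.log is a placeholder for wherever you captured the output):
#   ./rke_linux-amd64 up --config cluster.yml --ssh-agent-auth 2>&1 | tee rke.log
#   unprefixed_pulls registry:443 < rke.log
```

An empty result means every pull in the captured output went through the given registry.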