RKE version:
rke version 6ea9ff0 (latest master)
Docker version: (docker version,docker info preferred)
Server:
Version: 17.09.1-ce
API version: 1.32 (minimum version 1.12)
Go version: go1.8.5
Git commit: 19e2cf6
Built: Thu Dec 7 22:19:00 2017
OS/Arch: linux/amd64
Experimental: false
Operating system and kernel: (cat /etc/os-release, uname -r preferred)
4.14.16-coreos
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1632.2.1
VERSION_ID=1632.2.1
BUILD_ID=2018-02-01-2053
PRETTY_NAME="Container Linux by CoreOS 1632.2.1 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"
Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO)
Azure Private Cloud
cluster.yml file:
---
nodes:
  - address: 10.18.160.30
    hostname_override: sandboxworker-0
    internal_address: 10.18.160.30
    role:
      - worker
    user: sandboxadmin
  - address: 10.18.160.31
    hostname_override: sandboxworker-1
    internal_address: 10.18.160.31
    role:
      - worker
    user: sandboxadmin
  - address: 10.18.160.32
    hostname_override: sandboxmaster-0
    internal_address: 10.18.160.32
    role:
      - controlplane
      - etcd
    user: sandboxadmin
  - address: 10.18.160.34
    hostname_override: sandboxmaster-1
    internal_address: 10.18.160.34
    role:
      - controlplane
      - etcd
    user: sandboxadmin
  - address: 10.18.160.33
    hostname_override: sandboxmaster-2
    internal_address: 10.18.160.33
    role:
      - controlplane
      - etcd
    user: sandboxadmin
kubernetes_version: v1.9.2-rancher1-2
network:
  plugin: flannel
auth:
  strategy: x509
authorization:
  mode: rbac
services:
  etcd:
  kube-api:
    service_cluster_ip_range: 10.233.0.0/18
    extra_args:
      cloud-config: "/etc/kubernetes/azure.conf"
      v: 4
  kube-controller:
    cluster_cidr: 10.233.64.0/18
    service_cluster_ip_range: 10.233.0.0/18
    extra_args:
      cloud-config: "/etc/kubernetes/azure.conf"
  scheduler:
  kubelet:
    cluster_domain: kubelab.vpc.starbucks.net
    cluster_dns_server: 10.233.0.3
    infra_container_image: gcr.io/google_containers/pause-amd64:3.0
    extra_args:
      cloud-config: "/etc/kubernetes/azure.conf"
  kubeproxy:
ssh_key_path: "~/.ssh/kubernetes"
ignore_docker_version: true
ingress:
  provider: nginx
system_images:
  etcd: quay.io/coreos/etcd:latest
  kubernetes: rancher/k8s:v1.9.2-rancher1-2
  nginx_proxy: rancher/rke-nginx-proxy:v0.1.1
  cert_downloader: rancher/rke-cert-deployer:v0.1.1
  kubernetes_services_sidecar: rancher/rke-service-sidekick:v0.1.0
  kubedns: gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.8
  dnsmasq: gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.8
  kubedns_sidecar: gcr.io/google_containers/k8s-dns-sidecar-amd64:1.14.8
  kubedns_autoscaler: gcr.io/google_containers/cluster-proportional-autoscaler-amd64:1.1.2-r2
Steps to Reproduce:
rke up
Results:
No ingress controller deployed...
This could be due to the job deployment error:
...
INFO[0046] [worker] Successfully started Worker Plane..
INFO[0046] [sync] Syncing nodes Labels and Taints
INFO[0049] [sync] Successfully synced nodes Labels and Taints
INFO[0049] [network] Setting up network plugin: flannel
INFO[0049] [addons] Saving addon ConfigMap to Kubernetes
INFO[0050] [addons] Successfully Saved addon to Kubernetes ConfigMap: rke-network-plugin
INFO[0050] [addons] Executing deploy job..
INFO[0050] [addons] Setting up KubeDNS
INFO[0050] [addons] Saving addon ConfigMap to Kubernetes
INFO[0050] [addons] Successfully Saved addon to Kubernetes ConfigMap: rke-kubedns-addon
INFO[0050] [addons] Executing deploy job..
FATA[0076] Failed to deploy addon execute job: Failed to get job complete status: <nil>
Failed to get job complete status: <nil>
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
sandboxmaster-0 Ready etcd,master 13m v1.9.2-rancher1
sandboxmaster-1 Ready etcd,master 13m v1.9.2-rancher1
sandboxmaster-2 Ready etcd,master 13m v1.9.2-rancher1
sandboxworker-0 Ready worker 13m v1.9.2-rancher1
sandboxworker-1 Ready worker 13m v1.9.2-rancher1
kubectl get all --all-namespaces
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-system ds/kube-flannel 5 5 5 5 5 <none> 12m
NAMESPACE NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
kube-system deploy/kube-dns 1 1 1 1 2m
kube-system deploy/kube-dns-autoscaler 1 1 1 1 2m
NAMESPACE NAME DESIRED CURRENT READY AGE
kube-system rs/kube-dns-6bc5c78657 1 1 1 2m
kube-system rs/kube-dns-autoscaler-7b795dc5cf 1 1 1 2m
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-system ds/kube-flannel 5 5 5 5 5 <none> 12m
NAMESPACE NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
kube-system deploy/kube-dns 1 1 1 1 2m
kube-system deploy/kube-dns-autoscaler 1 1 1 1 2m
NAMESPACE NAME DESIRED CURRENT READY AGE
kube-system rs/kube-dns-6bc5c78657 1 1 1 2m
kube-system rs/kube-dns-autoscaler-7b795dc5cf 1 1 1 2m
NAMESPACE NAME DESIRED SUCCESSFUL AGE
kube-system jobs/rke-kubedns-addon-deploy-job 1 1 3m
kube-system jobs/rke-network-plugin-deploy-job 1 1 13m
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system po/kube-dns-6bc5c78657-fx8k4 3/3 Running 0 2m
kube-system po/kube-dns-autoscaler-7b795dc5cf-8g7ng 1/1 Running 0 2m
kube-system po/kube-flannel-8dqqs 2/2 Running 1 12m
kube-system po/kube-flannel-nzzf8 2/2 Running 0 11m
kube-system po/kube-flannel-qj6b6 2/2 Running 0 12m
kube-system po/kube-flannel-x466v 2/2 Running 0 11m
kube-system po/kube-flannel-z5rzs 2/2 Running 1 12m
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default svc/kubernetes ClusterIP 10.233.0.1 <none> 443/TCP 13m
kube-system svc/kube-dns ClusterIP 10.233.0.3 <none> 53/UDP,53/TCP 2m
The current workaround is to run rke up every 5 minutes or so, about 3 times in total, until all of the default addons (KubeDNS and the ingress controller) have been deployed.
This is related to issues: #303, #286, #329, #318
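For anyone who wants to script the workaround above, here is a minimal sketch of a retry wrapper. The `retry` function name, the attempt count, and the delay are illustrative assumptions, not part of RKE. It only re-runs the command until it exits successfully and does nothing about the underlying job-status error.

```shell
#!/bin/sh
# Hypothetical helper (not part of RKE): re-run a command until it succeeds
# or the attempt limit is reached.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        echo "attempt $attempt of $max_attempts: $*"
        if "$@"; then
            return 0
        fi
        attempt=$((attempt + 1))
        # Wait between attempts, but not after the last failed one.
        if [ "$attempt" -le "$max_attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Assumed usage, mirroring the manual workaround (values are illustrative):
# retry 5 300 rke up
```

The delay of 300 seconds in the commented usage line simply mirrors the "every 5 minutes or so" cadence described above; a shorter delay may work just as well.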
@HighwayofLife can you check the deploy job for ingress on the server manually to see if it succeeded after the first run:
docker ps -a | grep ingress-controller-deploy-job | grep -v pause
1f265523d1d9 rancher/k8s@sha256:589234f56767f841c0240ef3d5b0ef74c9487819006d35dceb568fce92d2ad45 "kubectl apply -f /et" 29 hours ago Exited (0) 29 hours ago k8s_rke-ingress-controller-pod_rke-ingress-controller-deploy-job-bjtz9_kube-system_891e6e5c-0deb-11e8-aa7e-42010a800006_0
root@hgalal-rke:~# docker logs 1f26
namespace "ingress-nginx" created
configmap "nginx-configuration" created
configmap "tcp-services" created
configmap "udp-services" created
serviceaccount "nginx-ingress-serviceaccount" created
clusterrole "nginx-ingress-clusterrole" created
role "nginx-ingress-role" created
rolebinding "nginx-ingress-role-nisa-binding" created
clusterrolebinding "nginx-ingress-clusterrole-nisa-binding" created
daemonset "nginx-ingress-controller" created
deployment "default-http-backend" created
service "default-http-backend" created
@galal-hussein I ran docker ps -a on both worker nodes that are slated to install ingress, and the ingress container doesn't appear on either node.
Also does not appear in any of the 3 masters.
Is there any other debug info that I could gather that would help localize this issue? It's plaguing me with every single cluster I stand up. I have to run: rke up; rke up; rke up every time to get the cluster online, obviously taking 3x longer than it should.
This has since been solved/fixed.
Has anyone got this working? I know this issue is fixed and closed, but I am still facing it. The deploy job does not get executed even after running rke up three times (or even 10 times). Is there anything else I can do to make this work? Update: it took somewhere around 15 runs, but it did work.
@iamShantanu101 I am also facing this issue on one of my clusters, which I created 2 days ago with Kubernetes version 1.12.0. Another cluster with the same Kubernetes version, created approximately 1-2 months ago, is working.
I've got the same problem even though the log shows:
INFO[0069] [addons] Saving addon ConfigMap to Kubernetes
INFO[0069] [addons] Successfully Saved addon to Kubernetes ConfigMap: rke-ingress-controller
INFO[0069] [addons] Executing deploy job..
INFO[0069] [ingress] ingress controller nginx is successfully deployed
I've just noticed that the batch job rke-ingress-controller-deploy-job was not responding at all.
$ ku logs job.batch/rke-ingress-controller-deploy-job -n kube-system
error: timed out waiting for the condition
And then I realized that all of the resources with an age of 91 days had stopped responding.
NAMESPACE NAME DESIRED SUCCESSFUL AGE
kube-system job.batch/rke-ingress-controller-deploy-job 1 1 91d
kube-system job.batch/rke-kubedns-addon-deploy-job 1 1 91d
kube-system job.batch/rke-metrics-addon-deploy-job 1 1 91d
kube-system job.batch/rke-network-plugin-deploy-job 1 1 21h
kube-system job.batch/rke-user-addon-deploy-job 1 1 42m
So I just deleted all of them, and it started to work.
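The exact commands used for the deletion were not posted, so the following is only a sketch of one way to do it with kubectl. The function names are hypothetical, and the `rke-*-deploy-job` name pattern is an assumption based on the job listing above. It removes only the Job objects in kube-system, not the resources those jobs created.

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl get jobs -o name` output on stdin and
# print only the names that look like RKE addon deploy jobs.
# Input lines look like "job.batch/rke-ingress-controller-deploy-job";
# the prefix before the slash varies between kubectl versions.
filter_rke_deploy_jobs() {
    sed -n 's|^[^/]*/\(rke-.*-deploy-job\)$|\1|p'
}

# Hypothetical helper: delete every matching job in kube-system.
# Assumption: the next `rke up` recreates and re-runs the jobs.
delete_rke_deploy_jobs() {
    kubectl get jobs -n kube-system -o name | filter_rke_deploy_jobs |
    while read -r job; do
        kubectl delete job "$job" -n kube-system
    done
}

# Assumed usage:
# delete_rke_deploy_jobs
```

It is probably worth running `kubectl get jobs -n kube-system` first to confirm which jobs match before deleting anything.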