RKE: Ingress Controller Not Deployed

Created on 10 Feb 2018  ·  10 Comments  ·  Source: rancher/rke

RKE version:
rke version 6ea9ff0 (latest master)

Docker version: (docker version,docker info preferred)

Server:
 Version:      17.09.1-ce
 API version:  1.32 (minimum version 1.12)
 Go version:   go1.8.5
 Git commit:   19e2cf6
 Built:        Thu Dec  7 22:19:00 2017
 OS/Arch:      linux/amd64
 Experimental: false

Operating system and kernel: (cat /etc/os-release, uname -r preferred)

4.14.16-coreos
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1632.2.1
VERSION_ID=1632.2.1
BUILD_ID=2018-02-01-2053
PRETTY_NAME="Container Linux by CoreOS 1632.2.1 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"

Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO)
Azure Private Cloud

cluster.yml file:

---
nodes:
- address: 10.18.160.30
  hostname_override: sandboxworker-0
  internal_address: 10.18.160.30
  role:
  - worker
  user: sandboxadmin
- address: 10.18.160.31
  hostname_override: sandboxworker-1
  internal_address: 10.18.160.31
  role:
  - worker
  user: sandboxadmin
- address: 10.18.160.32
  hostname_override: sandboxmaster-0
  internal_address: 10.18.160.32
  role:
  - controlplane
  - etcd
  user: sandboxadmin
- address: 10.18.160.34
  hostname_override: sandboxmaster-1
  internal_address: 10.18.160.34
  role:
  - controlplane
  - etcd
  user: sandboxadmin
- address: 10.18.160.33
  hostname_override: sandboxmaster-2
  internal_address: 10.18.160.33
  role:
  - controlplane
  - etcd
  user: sandboxadmin
kubernetes_version: v1.9.2-rancher1-2
network:
  plugin: flannel
auth:
  strategy: x509
authorization:
  mode: rbac
services:
  etcd:
  kube-api:
    service_cluster_ip_range: 10.233.0.0/18
    extra_args:
      cloud-config: "/etc/kubernetes/azure.conf"
      v: 4
  kube-controller:
    cluster_cidr: 10.233.64.0/18
    service_cluster_ip_range: 10.233.0.0/18
    extra_args:
      cloud-config: "/etc/kubernetes/azure.conf"
  scheduler:
  kubelet:
    cluster_domain: kubelab.vpc.starbucks.net
    cluster_dns_server: 10.233.0.3
    infra_container_image: gcr.io/google_containers/pause-amd64:3.0
    extra_args:
      cloud-config: "/etc/kubernetes/azure.conf"
  kubeproxy:
ssh_key_path: "~/.ssh/kubernetes"
ignore_docker_version: true
ingress:
  provider: nginx

system_images:
  etcd: quay.io/coreos/etcd:latest
  kubernetes: rancher/k8s:v1.9.2-rancher1-2
  nginx_proxy: rancher/rke-nginx-proxy:v0.1.1
  cert_downloader: rancher/rke-cert-deployer:v0.1.1
  kubernetes_services_sidecar: rancher/rke-service-sidekick:v0.1.0
  kubedns: gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.8
  dnsmasq: gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.8
  kubedns_sidecar: gcr.io/google_containers/k8s-dns-sidecar-amd64:1.14.8
  kubedns_autoscaler: gcr.io/google_containers/cluster-proportional-autoscaler-amd64:1.1.2-r2

Steps to Reproduce:
rke up

Results:
No ingress controller deployed...
This could be due to the job deployment error:

...
INFO[0046] [worker] Successfully started Worker Plane.. 
INFO[0046] [sync] Syncing nodes Labels and Taints       
INFO[0049] [sync] Successfully synced nodes Labels and Taints 
INFO[0049] [network] Setting up network plugin: flannel 
INFO[0049] [addons] Saving addon ConfigMap to Kubernetes 
INFO[0050] [addons] Successfully Saved addon to Kubernetes ConfigMap: rke-network-plugin 
INFO[0050] [addons] Executing deploy job..              
INFO[0050] [addons] Setting up KubeDNS                  
INFO[0050] [addons] Saving addon ConfigMap to Kubernetes 
INFO[0050] [addons] Successfully Saved addon to Kubernetes ConfigMap: rke-kubedns-addon 
INFO[0050] [addons] Executing deploy job..              
FATA[0076] Failed to deploy addon execute job: Failed to get job complete status: <nil> 
Failed to get job complete status: <nil>
$ kubectl get nodes
NAME              STATUS    ROLES         AGE       VERSION
sandboxmaster-0   Ready     etcd,master   13m       v1.9.2-rancher1
sandboxmaster-1   Ready     etcd,master   13m       v1.9.2-rancher1
sandboxmaster-2   Ready     etcd,master   13m       v1.9.2-rancher1
sandboxworker-0   Ready     worker        13m       v1.9.2-rancher1
sandboxworker-1   Ready     worker        13m       v1.9.2-rancher1
$ kubectl get all --all-namespaces
NAMESPACE     NAME              DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
kube-system   ds/kube-flannel   5         5         5         5            5           <none>          12m

NAMESPACE     NAME                         DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
kube-system   deploy/kube-dns              1         1         1            1           2m
kube-system   deploy/kube-dns-autoscaler   1         1         1            1           2m

NAMESPACE     NAME                                DESIRED   CURRENT   READY     AGE
kube-system   rs/kube-dns-6bc5c78657              1         1         1         2m
kube-system   rs/kube-dns-autoscaler-7b795dc5cf   1         1         1         2m

NAMESPACE     NAME              DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
kube-system   ds/kube-flannel   5         5         5         5            5           <none>          12m

NAMESPACE     NAME                         DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
kube-system   deploy/kube-dns              1         1         1            1           2m
kube-system   deploy/kube-dns-autoscaler   1         1         1            1           2m

NAMESPACE     NAME                                DESIRED   CURRENT   READY     AGE
kube-system   rs/kube-dns-6bc5c78657              1         1         1         2m
kube-system   rs/kube-dns-autoscaler-7b795dc5cf   1         1         1         2m

NAMESPACE     NAME                                 DESIRED   SUCCESSFUL   AGE
kube-system   jobs/rke-kubedns-addon-deploy-job    1         1            3m
kube-system   jobs/rke-network-plugin-deploy-job   1         1            13m

NAMESPACE     NAME                                      READY     STATUS    RESTARTS   AGE
kube-system   po/kube-dns-6bc5c78657-fx8k4              3/3       Running   0          2m
kube-system   po/kube-dns-autoscaler-7b795dc5cf-8g7ng   1/1       Running   0          2m
kube-system   po/kube-flannel-8dqqs                     2/2       Running   1          12m
kube-system   po/kube-flannel-nzzf8                     2/2       Running   0          11m
kube-system   po/kube-flannel-qj6b6                     2/2       Running   0          12m
kube-system   po/kube-flannel-x466v                     2/2       Running   0          11m
kube-system   po/kube-flannel-z5rzs                     2/2       Running   1          12m

NAMESPACE     NAME             TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)         AGE
default       svc/kubernetes   ClusterIP   10.233.0.1   <none>        443/TCP         13m
kube-system   svc/kube-dns     ClusterIP   10.233.0.3   <none>        53/UDP,53/TCP   2m
kinbug

All 10 comments

The current workaround is to run rke up every 5 minutes or so, about 3 times in total, until all of the default addons (kube DNS and ingress controller) have finished deploying.
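
The repeated runs can be scripted. The following is a minimal sketch, not part of RKE: it assumes rke up exits with a non-zero status when it logs a FATA line (the thread does not show the exit code), and the helper name run_until_success, its parameters, and the injectable runner/sleep hooks are all hypothetical.

```python
import subprocess
import time
from typing import Callable, Sequence


def _run(cmd: Sequence[str]) -> int:
    """Run cmd and return its exit status."""
    return subprocess.run(list(cmd)).returncode


def run_until_success(
    cmd: Sequence[str],
    max_attempts: int = 3,
    delay_seconds: float = 300.0,
    runner: Callable[[Sequence[str]], int] = _run,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run cmd until it exits 0; return the number of attempts used.

    Waits delay_seconds between attempts and raises RuntimeError if
    every one of the max_attempts attempts fails.
    """
    for attempt in range(1, max_attempts + 1):
        if runner(cmd) == 0:
            return attempt
        if attempt < max_attempts:
            # Assumption: the pause gives the earlier addon jobs time to settle.
            sleep(delay_seconds)
    raise RuntimeError(f"{' '.join(cmd)} failed {max_attempts} times")
```

Calling run_until_success(["rke", "up"]) would mirror the "about 3 times, 5 minutes apart" workaround. The runner and sleep arguments exist only so the loop can be exercised without a real cluster.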

This is related to issues: #303, #286, #329, #318

@HighwayofLife can you check the deploy job for ingress on the server manually to see if it succeeded after the first run:

docker ps -a | grep ingress-controller-deploy-job | grep -v pause
1f265523d1d9        rancher/k8s@sha256:589234f56767f841c0240ef3d5b0ef74c9487819006d35dceb568fce92d2ad45                                                      "kubectl apply -f /et"   29 hours ago        Exited (0) 29 hours ago                       k8s_rke-ingress-controller-pod_rke-ingress-controller-deploy-job-bjtz9_kube-system_891e6e5c-0deb-11e8-aa7e-42010a800006_0

root@hgalal-rke:~# docker logs 1f26
namespace "ingress-nginx" created
configmap "nginx-configuration" created
configmap "tcp-services" created
configmap "udp-services" created
serviceaccount "nginx-ingress-serviceaccount" created
clusterrole "nginx-ingress-clusterrole" created
role "nginx-ingress-role" created
rolebinding "nginx-ingress-role-nisa-binding" created
clusterrolebinding "nginx-ingress-clusterrole-nisa-binding" created
daemonset "nginx-ingress-controller" created
deployment "default-http-backend" created
service "default-http-backend" created

@galal-hussein I ran docker ps -a on both worker nodes that are slated to run ingress, and the ingress deploy-job container doesn't appear on either node.
It also does not appear on any of the 3 masters.
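
This matches the kubectl get all output in the original report, which lists only the network-plugin and kubedns deploy jobs. A minimal sketch for spotting a missing deploy job from such a listing follows; the function name and the EXPECTED_JOBS tuple are hypothetical, and the job names are the ones that appear elsewhere in this thread.

```python
from typing import Iterable, List

# Job names as they appear in this thread; other RKE versions may differ.
EXPECTED_JOBS = (
    "rke-network-plugin-deploy-job",
    "rke-kubedns-addon-deploy-job",
    "rke-ingress-controller-deploy-job",
)


def missing_deploy_jobs(listing: str, expected: Iterable[str] = EXPECTED_JOBS) -> List[str]:
    """Return the expected job names that do not occur in a kubectl listing."""
    present = set()
    for line in listing.splitlines():
        for field in line.split():
            # Strip a resource prefix such as "jobs/" or "job.batch/".
            present.add(field.split("/", 1)[-1])
    return [job for job in expected if job not in present]
```

Fed the jobs table from the report, this returns only the ingress controller job, meaning the job was never created rather than created and failed.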

Is there any other debug info that I could gather that would help localize this issue? It's plaguing me with every single cluster I stand up. I have to run: rke up; rke up; rke up every time to get the cluster online, obviously taking 3x longer than it should.

This has since been solved/fixed.

Has anyone got this working? I know this issue is fixed and closed, but I am still facing it. My job does not get executed after running rke up three times (or even 10 times). Is there anything else I can do to make this work? It took somewhere around 15 runs, but it did eventually work.

@iamShantanu101 I am also facing this issue on one of my clusters, created 2 days ago with Kubernetes version 1.12.0. Another cluster with the same Kubernetes version, which I created approximately 1-2 months ago, is working.

I've got the same problem even though the log shows:

INFO[0069] [addons] Saving addon ConfigMap to Kubernetes 
INFO[0069] [addons] Successfully Saved addon to Kubernetes ConfigMap: rke-ingress-controller 
INFO[0069] [addons] Executing deploy job..              
INFO[0069] [ingress] ingress controller nginx is successfully deployed 

I've just noticed that the batch job rke-ingress-controller-deploy-job was not responding at all.

$ ku logs job.batch/rke-ingress-controller-deploy-job -n kube-system
error: timed out waiting for the condition

Then I realized that all of the resources with an age of 91 days were not responding.

NAMESPACE     NAME                                          DESIRED   SUCCESSFUL   AGE
kube-system   job.batch/rke-ingress-controller-deploy-job   1         1            91d
kube-system   job.batch/rke-kubedns-addon-deploy-job        1         1            91d
kube-system   job.batch/rke-metrics-addon-deploy-job        1         1            91d
kube-system   job.batch/rke-network-plugin-deploy-job       1         1            21h
kube-system   job.batch/rke-user-addon-deploy-job           1         1            42m

So I just deleted all of them and it started to work.
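
The comment does not show the exact commands used. As a hedged sketch, the deletions could be generated from the job listing above; the function name is hypothetical, and the name pattern (rke-...-deploy-job) is an assumption based on the job names in this thread. It only builds kubectl delete job commands and does not run them.

```python
from typing import List


def deploy_job_delete_commands(listing: str) -> List[List[str]]:
    """Build one 'kubectl delete job' command per RKE deploy job in the listing.

    Expects lines shaped like the table above: NAMESPACE, then NAME
    (optionally prefixed with a resource kind such as 'job.batch/').
    """
    commands = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0] == "NAMESPACE":
            continue  # skip blank lines and the header row
        namespace, resource = fields[0], fields[1]
        name = resource.split("/", 1)[-1]
        # Assumption: RKE's addon jobs are named rke-<addon>-deploy-job.
        if name.startswith("rke-") and name.endswith("-deploy-job"):
            commands.append(["kubectl", "delete", "job", name, "-n", namespace])
    return commands
```

Each returned list can be reviewed and then passed to subprocess.run. According to the comment, things started working once the old jobs were gone.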
