Rke: Failed to get job complete status for job rke-network-plugin-deploy-job in namespace kube-system

Created on 9 Jul 2019 · 15 Comments · Source: rancher/rke

I tried the workaround from rancher/rke#1295, but it didn't work for me; it produces the same error.

I'm not sure #19189 is related, since this is a new cluster and not an upgrade of an existing one.

This is my first time using rke and setting up a k8s cluster so please let me know if I'm missing something obvious or if you need more information from me!

RKE version:
rke version v0.2.4

Docker version: (docker version,docker info preferred)

Same for both node 1 and node 2.

Client:
 Version:           18.09.7
 API version:       1.39
 Go version:        go1.10.8
 Git commit:        2d0083d
 Built:             Thu Jun 27 17:56:23 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.7
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.8
  Git commit:       2d0083d
  Built:            Thu Jun 27 17:23:02 2019
  OS/Arch:          linux/amd64
  Experimental:     false
Containers: 10
 Running: 7
 Paused: 0
 Stopped: 3
Images: 4
Server Version: 18.09.7
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
runc version: 425e105d5a03fabd737a126ad93d62a9eeede87f
init version: fec3683
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.15.0-54-generic
Operating System: Ubuntu 18.04.2 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.789GiB
Name: k8s-node1
ID: ZC43:K3I7:HP2S:LFUA:JXMF:EXV2:V7UJ:H7QN:27IJ:S3DC:6XYW:CS2P
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine

WARNING: No swap limit support

Operating system and kernel: (cat /etc/os-release, uname -r preferred)

Same for both node 1 and node 2.

NAME="Ubuntu"
VERSION="18.04.2 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.2 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
4.15.0-54-generic

Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO)

VM in Hyper-V

cluster.yml file:

# If you intened to deploy Kubernetes in an air-gapped environment,
# please consult the documentation on how to configure custom RKE images.
nodes:
- address: 10.0.1.74
  port: "22"
  internal_address: ""
  role:
  - controlplane
  - etcd
  - worker
  hostname_override: k8s-node1
  user: k8s
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
- address: 10.0.1.75
  port: "22"
  internal_address: ""
  role:
  - controlplane
  - etcd
  - worker
  hostname_override: k8s-node2
  user: k8s
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
services:
  etcd:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    external_urls: []
    ca_cert: ""
    cert: ""
    key: ""
    path: ""
    snapshot: null
    retention: ""
    creation: ""
    backup_config: null
  kube-api:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    service_cluster_ip_range: 172.24.0.0/16
    service_node_port_range: ""
    pod_security_policy: false
    always_pull_images: false
  kube-controller:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    cluster_cidr: 172.25.0.0/24
    service_cluster_ip_range: 172.24.0.0/16
  scheduler:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
  kubelet:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    cluster_domain: k8s.adminarsenal.net
    infra_container_image: ""
    cluster_dns_server: 172.24.0.10
    fail_swap_on: false
  kubeproxy:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
network:
  plugin: flannel
  options:
    flannel_backend_type: vxlan
authentication:
  strategy: x509
  sans: []
  webhook: null
addons: ""
addons_include: []
system_images:
  etcd: rancher/coreos-etcd:v3.3.10-rancher1
  alpine: rancher/rke-tools:v0.1.34
  nginx_proxy: rancher/rke-tools:v0.1.34
  cert_downloader: rancher/rke-tools:v0.1.34
  kubernetes_services_sidecar: rancher/rke-tools:v0.1.34
  kubedns: rancher/k8s-dns-kube-dns:1.15.0
  dnsmasq: rancher/k8s-dns-dnsmasq-nanny:1.15.0
  kubedns_sidecar: rancher/k8s-dns-sidecar:1.15.0
  kubedns_autoscaler: rancher/cluster-proportional-autoscaler:1.3.0
  coredns: rancher/coredns-coredns:1.3.1
  coredns_autoscaler: rancher/cluster-proportional-autoscaler:1.3.0
  kubernetes: rancher/hyperkube:v1.14.3-rancher1
  flannel: rancher/coreos-flannel:v0.10.0-rancher1
  flannel_cni: rancher/flannel-cni:v0.3.0-rancher1
  calico_node: rancher/calico-node:v3.4.0
  calico_cni: rancher/calico-cni:v3.4.0
  calico_controllers: ""
  calico_ctl: rancher/calico-ctl:v2.0.0
  canal_node: rancher/calico-node:v3.4.0
  canal_cni: rancher/calico-cni:v3.4.0
  canal_flannel: rancher/coreos-flannel:v0.10.0
  weave_node: weaveworks/weave-kube:2.5.0
  weave_cni: weaveworks/weave-npc:2.5.0
  pod_infra_container: rancher/pause:3.1
  ingress: rancher/nginx-ingress-controller:0.21.0-rancher3
  ingress_backend: rancher/nginx-ingress-controller-defaultbackend:1.5-rancher1
  metrics_server: rancher/metrics-server:v0.3.1
ssh_key_path: ~/.ssh/id_rsa
ssh_cert_path: ""
ssh_agent_auth: false
authorization:
  mode: rbac
  options: {}
ignore_docker_version: false
kubernetes_version: ""
private_registries: []
ingress:
  provider: ""
  options: {}
  node_selector: {}
  extra_args: {}
cluster_name: ""
cloud_provider:
  name: ""
prefix_path: ""
addon_job_timeout: 30
bastion_host:
  address: ""
  port: ""
  user: ""
  ssh_key: ""
  ssh_key_path: ""
  ssh_cert: ""
  ssh_cert_path: ""
monitoring:
  provider: ""
  options: {}
restore:
  restore: false
  snapshot_name: ""
dns: null

Steps to Reproduce:

Save config and then run ./rke up

Results:

INFO[0075] [remove/rke-log-cleaner] Successfully removed container on host [10.0.1.74]
INFO[0075] [remove/rke-log-cleaner] Successfully removed container on host [10.0.1.75]
INFO[0075] [sync] Syncing nodes Labels and Taints
INFO[0075] [sync] Successfully synced nodes Labels and Taints
INFO[0075] [network] Setting up network plugin: flannel
INFO[0075] [addons] Saving ConfigMap for addon rke-network-plugin to Kubernetes
INFO[0075] [addons] Successfully saved ConfigMap for addon rke-network-plugin to Kubernetes
INFO[0075] [addons] Executing deploy job rke-network-plugin
FATA[0105] Failed to get job complete status for job rke-network-plugin-deploy-job in namespace kube-system

All 15 comments

I'm having the same problem with rke + CentOS 7.6 VMs, running native docker 1.13.1 (selinux enabled).

rke v0.2.8 + kubernetes 1.13.10: works OK.
rke v0.2.8 + kubernetes 1.14.6: "rke up" fails with FATAL error "Failed to get job complete status for job rke-network-plugin-deploy-job in namespace kube-system". If I re-run "rke up", then rke finishes successfully and kubernetes cluster works OK.

I got the same error.
I'm using Ubuntu 19.10 on 5 hosts with rke v0.3.1.
I first ran rke up with a plain yml file, and everything went OK.
Then I changed the image to rancher/hyperkube:v1.16.2-rancher1 and ran rke up again.
Everything went OK :)

# If you intened to deploy Kubernetes in an air-gapped environment,
# please consult the documentation on how to configure custom RKE images.
nodes:
- address: 192.168.1.120
  port: "22"
  internal_address: ""
  role:
  - controlplane
  - worker
  - etcd
  hostname_override: ""
  user: bjorn
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
- address: 192.168.1.122
  port: "22"
  internal_address: ""
  role:
  - worker
  hostname_override: ""
  user: bjorn
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
- address: 192.168.1.123
  port: "22"
  internal_address: ""
  role:
  - worker
  hostname_override: ""
  user: bjorn
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
- address: 192.168.1.124
  port: "22"
  internal_address: ""
  role:
  - worker
  hostname_override: ""
  user: bjorn
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
- address: 192.168.1.125
  port: "22"
  internal_address: ""
  role:
  - worker
  hostname_override: ""
  user: bjorn
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
services:
  etcd:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    external_urls: []
    ca_cert: ""
    cert: ""
    key: ""
    path: ""
    uid: 0
    gid: 0
    snapshot: null
    retention: ""
    creation: ""
    backup_config: null
  kube-api:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    service_cluster_ip_range: 10.43.0.0/16
    service_node_port_range: ""
    pod_security_policy: false
    always_pull_images: false
  kube-controller:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    cluster_cidr: 10.42.0.0/16
    service_cluster_ip_range: 10.43.0.0/16
  scheduler:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
  kubelet:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    cluster_domain: cluster.local
    infra_container_image: ""
    cluster_dns_server: 10.43.0.10
    fail_swap_on: false
  kubeproxy:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
network:
  plugin: weave
  options: {}
  node_selector: {}
authentication:
  strategy: x509
  sans: []
  webhook: null
addons: ""
addons_include: []
system_images:
  etcd: ""
  alpine: ""
  nginx_proxy: ""
  cert_downloader: ""
  kubernetes_services_sidecar: ""
  kubedns: ""
  dnsmasq: ""
  kubedns_sidecar: ""
  kubedns_autoscaler: ""
  coredns: ""
  coredns_autoscaler: ""
  kubernetes: "rancher/hyperkube:v1.16.2-rancher1"
  flannel: ""
  flannel_cni: ""
  calico_node: ""
  calico_cni: ""
  calico_controllers: ""
  calico_ctl: ""
  calico_flexvol: ""
  canal_node: ""
  canal_cni: ""
  canal_flannel: ""
  canal_flexvol: ""
  weave_node: ""
  weave_cni: ""
  pod_infra_container: ""
  ingress: ""
  ingress_backend: ""
  metrics_server: ""
  windows_pod_infra_container: ""
ssh_key_path: ~/.ssh/id_rsa
ssh_cert_path: ""
ssh_agent_auth: false
authorization:
  mode: rbac
  options: {}
ignore_docker_version: false
kubernetes_version: ""
private_registries: []
ingress:
  provider: ""
  options: {}
  node_selector: {}
  extra_args: {}
  dns_policy: ""
cluster_name: ""
cloud_provider:
  name: ""
prefix_path: ""
addon_job_timeout: 0
bastion_host:
  address: ""
  port: ""
  user: ""
  ssh_key: ""
  ssh_key_path: ""
  ssh_cert: ""
  ssh_cert_path: ""
monitoring:
  provider: ""
  options: {}
  node_selector: {}
restore:
  restore: false
  snapshot_name: ""
dns: null

Another me too: I'm getting this when attempting to run rke against Docker locally as a test. Re-running doesn't solve the issue, though; it never resolves or installs completely:

cluster_name: local
dns:
  provider: coredns
nodes:
  - address: 127.0.0.1
    user: tessa
    role:
      - controlplane
      - etcd
      - worker
ssh_agent_auth: true

I have this problem too

After I run the rke up command, I get the error below:
FATA[0058] Failed to get job complete status for job rke-network-plugin-deploy-job in namespace kube-system

When I check the pods, rke-network-plugin-deploy-job is still in ContainerCreating status:

$ kubectl get pods --all-namespaces
NAMESPACE     NAME                                  READY   STATUS              RESTARTS   AGE
kube-system   rke-network-plugin-deploy-job-4482c   0/1     ContainerCreating   0          56m

In other cases it takes about 5 minutes. Can anyone help me?

I am having the same problem.
[root@rancher01 ~]# docker ps -a | grep ebcb7b662f69
ebcb7b662f69 00405a225ef9 "kubectl apply -f ..." 28 minutes ago Exited (1) 28 minutes ago k8s_rke-network-plugin-pod_rke-network-plugin-deploy-job-8ndtx_kube-system_78c45434-bad1-4e2a-bdec-a769d8cf93fa_0

[root@rancher01 ~]# docker logs ebcb7b662f69 -f
error: the path "/etc/config/rke-network-plugin.yaml" cannot be accessed: stat /etc/config/rke-network-plugin.yaml: permission denied

It is related to https://github.com/rancher/rancher/issues/23662

Same here on Vagrant CentOS 1905.1 (hello, @lucky-sideburn).
Disabling SELinux is the key.

I ran into this issue and it was due to a node taking too long to become ready. I just had to wait until kubectl get nodes reported all as ready, and then run rke up --update-only to finish the cluster deployment.
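The wait-then-resume approach described above can be scripted. What follows is a minimal Python sketch under stated assumptions, not something RKE ships: the helper names `not_ready_nodes` and `wait_for_nodes` are hypothetical, the polling interval and time limit are arbitrary, and `kubectl` is assumed to be on the PATH and already pointed at the new cluster (for example through the kube_config_cluster.yml file that rke writes).

```python
import subprocess
import time
from typing import List


def not_ready_nodes(kubectl_output: str) -> List[str]:
    """Parse `kubectl get nodes --no-headers` output; return nodes that are not Ready."""
    pending = []
    for line in kubectl_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, status = fields[0], fields[1]
        # STATUS may carry suffixes such as "Ready,SchedulingDisabled"
        if status.split(",")[0] != "Ready":
            pending.append(name)
    return pending


def wait_for_nodes(poll_seconds: float = 10.0, max_wait: float = 600.0) -> bool:
    """Poll kubectl until every node is Ready or `max_wait` seconds have passed."""
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        result = subprocess.run(
            ["kubectl", "get", "nodes", "--no-headers"],
            capture_output=True,
            text=True,
        )
        # Require a successful call that listed at least one node.
        if (
            result.returncode == 0
            and result.stdout.strip()
            and not not_ready_nodes(result.stdout)
        ):
            return True
        time.sleep(poll_seconds)
    return False
```

If `wait_for_nodes()` returns True, the remaining step is the one from the comment: run rke up --update-only to finish the deployment.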

For me, the issue seems to have been that the default value of rke's "addon_job_timeout" (30 seconds) was too low. I increased the value, and the rke network plugin deploy job started succeeding (https://github.com/rancher/rke/issues/1652).

I had the same issue, and these two steps solved my problem

  1. Increase addon_job_timeout
  2. Check node free space (at least 15%)

In my case, one of the nodes was in the DiskPressure state.
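The free-space check in step 2 can be scripted and run on each node. This is a minimal Python sketch, not part of RKE: the helper names `free_percent` and `has_enough_space` are hypothetical, and the 15% threshold is simply the figure from this comment rather than a value read from the kubelet configuration.

```python
import shutil
import sys


def free_percent(path: str) -> float:
    """Return the percentage of free space on the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def has_enough_space(path: str, min_free_pct: float = 15.0) -> bool:
    """True if at least `min_free_pct` percent of the filesystem is free."""
    return free_percent(path) >= min_free_pct


if __name__ == "__main__":
    # Paths to check come from the command line, for example:
    #   python3 check_space.py / /var/lib/docker
    for path in sys.argv[1:]:
        pct = free_percent(path)
        verdict = "ok" if has_enough_space(path) else "LOW (node may report DiskPressure)"
        print(f"{path}: {pct:.1f}% free, {verdict}")
```

Checking both the root filesystem and the Docker root directory (/var/lib/docker in the original report) covers the two places where images and container data usually accumulate.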

I'm hitting this too. The rke-network-plugin-deploy-job job never completes and doesn't give any logs. The nodes are all NotReady. No pods are up. I set addon_job_timeout to 180 and my nodes have 97% free space (around 190GB free).

RKE v1.1.3
kubectl v1.18.3
cluster.yml is using:

  • kubernetes_version: v1.18.3-rancher2-2
  • calico plugin

Watch out for SELinux or firewalls between the kubelet (10250, if I'm not mistaken) and the apiserver (6443).

I've disabled firewalls and AppArmor on Ubuntu 18 and still can't get the CNI job to complete. Nodes are NotReady and the CNI job won't complete. Also, why no logs???

kubectl logs -l rke-network-plugin-deploy-job -n kube-system

Looking in docker logs for now.

Same issue, logs say:
$ kubectl -nkube-system logs pod/rke-network-plugin-deploy-job-6bn62

Error from server: no preferred addresses found; known addresses: []

As @aijanai said, there can be firewall problems, so I allowed these two ports and it fixed the issue:

sudo ufw allow 6443
sudo ufw allow 10250
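Before or after changing firewall rules, it can help to confirm from another machine that the two ports are actually reachable. The following is a minimal Python sketch; the helper name `port_open` and the 3-second timeout are assumptions for illustration, and the port numbers are the ones mentioned in the comments above (6443 for the apiserver, 10250 for the kubelet).

```python
import socket
import sys


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Node addresses come from the command line, for example:
    #   python3 check_ports.py 10.0.1.74 10.0.1.75
    for host in sys.argv[1:]:
        for port in (6443, 10250):
            state = "reachable" if port_open(host, port) else "blocked or closed"
            print(f"{host}:{port} {state}")
```

A "blocked or closed" result does not distinguish a firewall drop from a service that is not listening, so it is only a first hint about where to look.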

This issue/PR has been automatically marked as stale because it has not had activity (commit/comment/label) for 60 days. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
