Kops: Master status NotReady with --networking=flannel or weave on kops 1.6 alpha and kubernetes 1.6

Created on 4 Apr 2017 · 12 comments · Source: kubernetes/kops

The flannel pod is not getting scheduled on the master node, so the master stays in NotReady state. It seems to be working fine for the other nodes, though.


Just tested with weave and it doesn't seem to be working either; the weave pod is not getting scheduled on the master node...

Name:           ip-172-30-55-4.sa-east-1.compute.internal
Role:           master
Labels:         beta.kubernetes.io/arch=amd64
            beta.kubernetes.io/instance-type=m4.large
            beta.kubernetes.io/os=linux
            failure-domain.beta.kubernetes.io/region=sa-east-1
            failure-domain.beta.kubernetes.io/zone=sa-east-1a
            kubernetes.io/hostname=ip-172-30-55-4.sa-east-1.compute.internal
            kubernetes.io/role=master
            node-role.kubernetes.io/master=
Taints:         <none>
CreationTimestamp:  Tue, 04 Apr 2017 16:34:15 -0300
Phase:
Conditions:
  Type            Status  LastHeartbeatTime                LastTransitionTime               Reason                      Message
  ----            ------  -----------------                ------------------               ------                      -------
  OutOfDisk       False   Tue, 04 Apr 2017 16:42:16 -0300  Tue, 04 Apr 2017 16:34:15 -0300  KubeletHasSufficientDisk    kubelet has sufficient disk space available
  MemoryPressure  False   Tue, 04 Apr 2017 16:42:16 -0300  Tue, 04 Apr 2017 16:34:15 -0300  KubeletHasSufficientMemory  kubelet has sufficient memory available
  DiskPressure    False   Tue, 04 Apr 2017 16:42:16 -0300  Tue, 04 Apr 2017 16:34:15 -0300  KubeletHasNoDiskPressure    kubelet has no disk pressure
  Ready           False   Tue, 04 Apr 2017 16:42:16 -0300  Tue, 04 Apr 2017 16:34:15 -0300  KubeletNotReady             runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Addresses:      172.30.55.4,172.30.55.4,ip-172-30-55-4.sa-east-1.compute.internal,ip-172-30-55-4.sa-east-1.compute.internal
Capacity:
 cpu:       2
 memory:    8178108Ki
 pods:      110
Allocatable:
 cpu:       2
 memory:    8075708Ki
 pods:      110
System Info:
 Machine ID:                 1c0c38bc4e93485897710853b2940bbe
 System UUID:                EC298B97-8470-B99B-346E-1BB4F3B10B68
 Boot ID:                    6b68cb85-b871-4604-bc67-660590653da9
 Kernel Version:             4.4.41-k8s
 OS Image:                   Debian GNU/Linux 8 (jessie)
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  docker://1.12.3
 Kubelet Version:            v1.6.0
 Kube-Proxy Version:         v1.6.0
PodCIDR:                     100.96.0.0/24
ExternalID:                  i-04a886fb5966b3c29
Non-terminated Pods:  (6 in total)
  Namespace    Name                                                                CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------    ----                                                                ------------  ----------  ---------------  -------------
  kube-system  etcd-server-events-ip-172-30-55-4.sa-east-1.compute.internal       100m (5%)     0 (0%)      0 (0%)           0 (0%)
  kube-system  etcd-server-ip-172-30-55-4.sa-east-1.compute.internal              200m (10%)    0 (0%)      0 (0%)           0 (0%)
  kube-system  kube-apiserver-ip-172-30-55-4.sa-east-1.compute.internal           150m (7%)     0 (0%)      0 (0%)           0 (0%)
  kube-system  kube-controller-manager-ip-172-30-55-4.sa-east-1.compute.internal  100m (5%)     0 (0%)      0 (0%)           0 (0%)
  kube-system  kube-proxy-ip-172-30-55-4.sa-east-1.compute.internal               100m (5%)     0 (0%)      0 (0%)           0 (0%)
  kube-system  kube-scheduler-ip-172-30-55-4.sa-east-1.compute.internal           100m (5%)     0 (0%)      0 (0%)           0 (0%)

Just saw that the daemonset contains:

        scheduler.alpha.kubernetes.io/tolerations: |
          [
            {
              "key": "dedicated",
              "operator": "Equal",
              "value": "master",
              "effect": "NoSchedule"
            }
          ]

So this seems to be expected, yet the cluster cannot work properly without weave-net deployed on the master.

I know a fix went out in 1.6.1 for something similar. Would you mind testing with 1.6.1?

@chrislovecnm just upgraded a 1.6.0 cluster with kubenet to 1.6.1 with weave, and it shows exactly the same symptoms.

@felipejfc @chrislovecnm I'm not sure if it is the only cause, but in any case, tolerations in annotations have been removed as of 1.6.
Would you mind trying with the newly added tolerations field?

FYI in kube-aws I started rewriting all the manifests to include the field instead and the cluster seems to work so far:
https://github.com/kubernetes-incubator/kube-aws/pull/492/files#diff-ef25536c536667a40b993d4d24ab7567L625
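
To make the difference concrete, here is a minimal sketch (the toleration values mirror the ones already quoted in this thread; it is an illustration, not the exact kops manifest). Before 1.6 the toleration lived in a pod-template annotation; from 1.6 on it has to be the first-class tolerations field in the pod spec:

# pre-1.6 style: toleration expressed as an annotation,
# no longer honored by the 1.6 scheduler per the comment above
metadata:
  annotations:
    scheduler.alpha.kubernetes.io/tolerations: |
      [{"key": "dedicated", "operator": "Equal", "value": "master", "effect": "NoSchedule"}]

# 1.6+ style: first-class field in the pod spec
spec:
  tolerations:
  - key: node-role.kubernetes.io/master
    effect: NoSchedule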

Same issue here, trying to roll out Flannel v0.7.0 on a Kubernetes v1.6.1 cluster but the status of the master remains NotReady and no Flannel pod gets deployed. @mumoshu, which annotations should I add where to test if that works around this issue?

@mumoshu we are already using the tolerations field, at least for weave
https://github.com/kubernetes/kops/blob/cbc36f614abdfce3788672fdff4b8362a93b0bfe/upup/models/cloudup/resources/addons/networking.weave/v1.9.4.yaml#L94-L96

Edit:
Hmm, I may be wrong. The 1.9.4 YAML uses the tolerations field, but the one that's actually being created is 1.9.3, which uses annotations.

Hi @autostatic, I assume that you're trying to populate /etc/kubernetes/cni/net.d using a daemonset managing flannel pods, right?
AFAIK, you can't do that anymore since v1.6.0: a k8s node requires /etc/kubernetes/cni/net.d/ to be populated before it is marked "Ready". On the other hand, a flannel pod requires the node to already be "Ready" before it is scheduled. Deadlock. I believe it is the same issue kube-aws is currently facing.

Perhaps running install-cni via rkt run, docker run, or even k8s static pods (typically located at /etc/kubernetes/manifests/*.yaml on a k8s node) would help?
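
For illustration, a static pod along those lines might look like the sketch below. This is a hedged example, not a kops or upstream flannel manifest: the file name, the image tag, and the assumption that a cni-conf.json already exists under /etc/kube-flannel on the host are all mine. Because the kubelet starts static pods directly from the manifests directory, they do not depend on the node being Ready:

# /etc/kubernetes/manifests/flannel-install-cni.yaml (illustrative only)
apiVersion: v1
kind: Pod
metadata:
  name: flannel-install-cni
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: install-cni
    image: quay.io/coreos/flannel:v0.7.0  # assumed tag, matching the flannel version in this thread
    # copy the CNI config into place, then idle so the kubelet doesn't restart the pod in a loop
    command:
    - sh
    - -c
    - cp -f /etc/kube-flannel/cni-conf.json /etc/cni/net.d/10-flannel.conf && while true; do sleep 3600; done
    volumeMounts:
    - name: cni
      mountPath: /etc/cni/net.d
    - name: flannel-cfg
      mountPath: /etc/kube-flannel
  volumes:
  - name: cni
    hostPath:
      path: /etc/cni/net.d
  - name: flannel-cfg
    hostPath:
      path: /etc/kube-flannel  # assumes cni-conf.json has been placed here on the host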

@mumoshu I don't believe this is the issue here, since the nodes themselves get to Ready state; the master is the only one that does not.

For weave's case we only need to tag a new kops version including this PR's changes
https://github.com/kubernetes/kops/pull/2251

@mumoshu Thanks for the pointer but Flannel still doesn't get deployed on the master even if I manually add /etc/cni/net.d/10-flannel.conf. I have to admit I'm not using kops so I'll take my issue elsewhere as I don't want to hijack this report. Thanks for the help though!

@autostatic @mumoshu was right: using the tolerations field makes it get deployed, as in:

...
spec:
  template:
    metadata:
      labels:
        tier: node
        app: flannel
        role.kubernetes.io/networking: "1"
    spec:
      hostNetwork: true
      nodeSelector:
        beta.kubernetes.io/arch: amd64
      serviceAccountName: flannel
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
...

Flannel then gets deployed to the master, and the master reaches Ready state.

@felipejfc @mumoshu Awesome! That did the trick. Many thanks!
