Tried to update our cluster (3 masters, 3 nodes, no RBAC, running Weave) from 1.5.3 to 1.6 using the drain & validate flag. We hadn't created the required ConfigMap, so the update failed after the first master.
Added the required ConfigMap and continued with the update. The next master seemed to complete fine, but the last master wouldn't validate. Weave hadn't started on the node, and looking at the log we saw a lot of: Unable to register node "ip-172-20-37-16.eu-west-1.compute.internal" with API server: Post https://127.0.0.1/api/v1/nodes: dial tcp 127.0.0.1:443: getsockopt: connection refused
Deleting that master resulted in the same errors when it came back up. Deleting the second master made it fail with the same error as well.
We need the logs from the scheduler and controller that are currently active.
The kube-scheduler log is empty. Kube-controller log:
Is there anything we can try, or do we need to re-create our cluster?
It seems the weave-net DaemonSet didn't get updated. After adding the tolerations by editing the DaemonSet, all masters became ready. This seems to be related to #2366.
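For anyone hitting the same thing, this is roughly the edit that was needed (a sketch only: the toleration below is the one already recorded in the DaemonSet's last-applied-configuration annotation, added to spec.template.spec via kubectl -n kube-system edit ds weave-net; adjust to whatever your 1.6 manifest specifies):

    # added under spec.template.spec of the weave-net DaemonSet
    tolerations:
    - key: node-role.kubernetes.io/master
      effect: NoSchedule

If the DaemonSet uses the OnDelete update strategy, the existing weave pods on the masters also have to be deleted before they are recreated from the updated template.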
Thanks for the update. Will try to reproduce.
We are having a similar issue. I wonder: could this be a race condition in how the DaemonSet is updated?
If the DaemonSet replacement request goes to a master still running 1.5, where the tolerations attribute did not exist yet, wouldn't the controller simply drop that attribute?
I believe an example of this can be seen in our "untouched" upgraded DaemonSet below. The critical thing to note is that the tolerations are present in the last-applied-configuration annotation, but they are missing from the actual DaemonSet spec.
Would it be possible, or prudent, to make the tolerations redundant by keeping the old annotation-based tolerations alongside the explicit tolerations field?
apiVersion: v1
items:
- apiVersion: extensions/v1beta1
  kind: DaemonSet
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"extensions/v1beta1","kind":"DaemonSet","metadata":{"annotations":{},"labels":{"name":"weave-net","role.kubernetes.io/networking":"1"},"name":"weave-net","namespace":"kube-system"},"spec":{"template":{"metadata":{"labels":{"name":"weave-net","role.kubernetes.io/networking":"1"}},"spec":{"containers":[{"command":["/home/weave/launch.sh"],"image":"weaveworks/weave-kube:1.9.4","livenessProbe":{"httpGet":{"host":"127.0.0.1","path":"/status","port":6784},"initialDelaySeconds":30},"name":"weave","resources":{"limits":{"cpu":"100m","memory":"200Mi"},"requests":{"cpu":"100m","memory":"200Mi"}},"securityContext":{"privileged":true},"volumeMounts":[{"mountPath":"/weavedb","name":"weavedb"},{"mountPath":"/host/opt","name":"cni-bin"},{"mountPath":"/host/home","name":"cni-bin2"},{"mountPath":"/host/etc","name":"cni-conf"},{"mountPath":"/host/var/lib/dbus","name":"dbus"},{"mountPath":"/lib/modules","name":"lib-modules"}]},{"image":"weaveworks/weave-npc:1.9.4","name":"weave-npc","resources":{"limits":{"cpu":"100m","memory":"200Mi"},"requests":{"cpu":"100m","memory":"200Mi"}},"securityContext":{"privileged":true}}],"hostNetwork":true,"hostPID":true,"restartPolicy":"Always","securityContext":{"seLinuxOptions":{"type":"spc_t"}},"serviceAccountName":"weave-net","tolerations":[{"effect":"NoSchedule","key":"node-role.kubernetes.io/master"}],"volumes":[{"emptyDir":{},"name":"weavedb"},{"hostPath":{"path":"/opt"},"name":"cni-bin"},{"hostPath":{"path":"/home"},"name":"cni-bin2"},{"hostPath":{"path":"/etc"},"name":"cni-conf"},{"hostPath":{"path":"/var/lib/dbus"},"name":"dbus"},{"hostPath":{"path":"/lib/modules"},"name":"lib-modules"}]}}}}
    creationTimestamp: 2017-05-30T14:51:23Z
    generation: 3
    labels:
      name: weave-net
      role.kubernetes.io/networking: "1"
    name: weave-net
    namespace: kube-system
    resourceVersion: "356995"
    selfLink: /apis/extensions/v1beta1/namespaces/kube-system/daemonsets/weave-net
    uid: 735e25fb-4547-11e7-b4c9-123be1737864
  spec:
    selector:
      matchLabels:
        name: weave-net
        role.kubernetes.io/networking: "1"
    template:
      metadata:
        creationTimestamp: null
        labels:
          name: weave-net
          role.kubernetes.io/networking: "1"
      spec:
        containers:
        - command:
          - /home/weave/launch.sh
          image: weaveworks/weave-kube:1.9.4
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 3
            httpGet:
              host: 127.0.0.1
              path: /status
              port: 6784
              scheme: HTTP
            initialDelaySeconds: 30
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          name: weave
          resources:
            limits:
              cpu: 100m
              memory: 200Mi
            requests:
              cpu: 100m
              memory: 200Mi
          securityContext:
            privileged: true
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /weavedb
            name: weavedb
          - mountPath: /host/opt
            name: cni-bin
          - mountPath: /host/home
            name: cni-bin2
          - mountPath: /host/etc
            name: cni-conf
          - mountPath: /host/var/lib/dbus
            name: dbus
          - mountPath: /lib/modules
            name: lib-modules
        - image: weaveworks/weave-npc:1.9.4
          imagePullPolicy: IfNotPresent
          name: weave-npc
          resources:
            limits:
              cpu: 100m
              memory: 200Mi
            requests:
              cpu: 100m
              memory: 200Mi
          securityContext:
            privileged: true
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
        dnsPolicy: ClusterFirst
        hostNetwork: true
        hostPID: true
        restartPolicy: Always
        schedulerName: default-scheduler
        securityContext:
          seLinuxOptions:
            type: spc_t
        serviceAccount: weave-net
        serviceAccountName: weave-net
        terminationGracePeriodSeconds: 30
        volumes:
        - emptyDir: {}
          name: weavedb
        - hostPath:
            path: /opt
          name: cni-bin
        - hostPath:
            path: /home
          name: cni-bin2
        - hostPath:
            path: /etc
          name: cni-conf
        - hostPath:
            path: /var/lib/dbus
          name: dbus
        - hostPath:
            path: /lib/modules
          name: lib-modules
    updateStrategy:
      type: OnDelete
  status:
    currentNumberScheduled: 7
    desiredNumberScheduled: 7
    numberAvailable: 7
    numberMisscheduled: 0
    numberReady: 7
    observedGeneration: 3
    updatedNumberScheduled: 2
kind: List
metadata: {}
resourceVersion: ""
selfLink: ""
Digging deeper, this seems somewhat similar to https://github.com/kubernetes/kubernetes/issues/46073.
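In case it helps with reproducing, here is a quick way to see the mismatch described above (a sketch, assuming the weave-net DaemonSet in kube-system as shown in the dump):

    # tolerations recorded by the last kubectl apply (annotation)
    kubectl -n kube-system get ds weave-net -o yaml | grep -o '"tolerations":\[[^]]*]'

    # tolerations actually present on the live object (prints nothing here, confirming the mismatch)
    kubectl -n kube-system get ds weave-net -o jsonpath='{.spec.template.spec.tolerations}'

Re-applying the 1.6 manifest against a fully upgraded API server (or editing the DaemonSet as described earlier) should make the field stick; with updateStrategy: OnDelete the running pods still only pick it up after being deleted.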
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Prevent issues from auto-closing with an /lifecycle frozen comment.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle rotten
/remove-lifecycle stale
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close