Tried to update our cluster (3 masters, 3 nodes, no RBAC, running Weave) from 1.5.3 to 1.6 using the drain & validate flag. We hadn't created the required ConfigMap, so the update failed after the first master.
Added the required ConfigMap and continued with the update. The next master seemed to complete fine, but the last master wouldn't validate. Weave hadn't started on the node, and looking at the log we saw a lot of: Unable to register node "ip-172-20-37-16.eu-west-1.compute.internal" with API server: Post https://127.0.0.1/api/v1/nodes: dial tcp 127.0.0.1:443: getsockopt: connection refused
Deleting that master resulted in the same errors when it came back up. Deleting the second master made it fail with the same error as well.
We need the logs from the scheduler and controller that are currently active.
The kube-scheduler log is empty. Kube-controller log:
Is there anything we can try, or do we need to re-create our cluster?
It seems the weave-net DaemonSet didn't get updated. After adding the tolerations by editing the DaemonSet, all masters became ready. This seems to be related to #2366.
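For anyone hitting the same thing, this is roughly the edit that was needed (a sketch only: the toleration below is the one already recorded in the DaemonSet's last-applied-configuration annotation, added to spec.template.spec via kubectl -n kube-system edit ds weave-net; adjust to whatever your 1.6 manifest specifies):

    # added under spec.template.spec of the weave-net DaemonSet
    tolerations:
    - key: node-role.kubernetes.io/master
      effect: NoSchedule

If the DaemonSet uses the OnDelete update strategy, the existing weave pods on the masters also have to be deleted before they are recreated from the updated template.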
Thanks for the update. Will try to reproduce.
We are having a similar issue. I wonder: could this be a race condition in how the DaemonSet is updated?
If the DaemonSet replacement request goes to a master still running 1.5, where the tolerations attribute did not exist yet, wouldn't the controller simply drop that attribute?
I believe an example of this can be seen in our "untouched" upgraded DaemonSet below. The critical thing to note is that the tolerations are present in the last-applied-configuration annotation, but they are missing from the actual DaemonSet spec.
Would it be possible, or prudent, to make the tolerations redundant by keeping the old annotation-based tolerations alongside the explicit tolerations field?
apiVersion: v1
items:
- apiVersion: extensions/v1beta1
  kind: DaemonSet
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"extensions/v1beta1","kind":"DaemonSet","metadata":{"annotations":{},"labels":{"name":"weave-net","role.kubernetes.io/networking":"1"},"name":"weave-net","namespace":"kube-system"},"spec":{"template":{"metadata":{"labels":{"name":"weave-net","role.kubernetes.io/networking":"1"}},"spec":{"containers":[{"command":["/home/weave/launch.sh"],"image":"weaveworks/weave-kube:1.9.4","livenessProbe":{"httpGet":{"host":"127.0.0.1","path":"/status","port":6784},"initialDelaySeconds":30},"name":"weave","resources":{"limits":{"cpu":"100m","memory":"200Mi"},"requests":{"cpu":"100m","memory":"200Mi"}},"securityContext":{"privileged":true},"volumeMounts":[{"mountPath":"/weavedb","name":"weavedb"},{"mountPath":"/host/opt","name":"cni-bin"},{"mountPath":"/host/home","name":"cni-bin2"},{"mountPath":"/host/etc","name":"cni-conf"},{"mountPath":"/host/var/lib/dbus","name":"dbus"},{"mountPath":"/lib/modules","name":"lib-modules"}]},{"image":"weaveworks/weave-npc:1.9.4","name":"weave-npc","resources":{"limits":{"cpu":"100m","memory":"200Mi"},"requests":{"cpu":"100m","memory":"200Mi"}},"securityContext":{"privileged":true}}],"hostNetwork":true,"hostPID":true,"restartPolicy":"Always","securityContext":{"seLinuxOptions":{"type":"spc_t"}},"serviceAccountName":"weave-net","tolerations":[{"effect":"NoSchedule","key":"node-role.kubernetes.io/master"}],"volumes":[{"emptyDir":{},"name":"weavedb"},{"hostPath":{"path":"/opt"},"name":"cni-bin"},{"hostPath":{"path":"/home"},"name":"cni-bin2"},{"hostPath":{"path":"/etc"},"name":"cni-conf"},{"hostPath":{"path":"/var/lib/dbus"},"name":"dbus"},{"hostPath":{"path":"/lib/modules"},"name":"lib-modules"}]}}}}
    creationTimestamp: 2017-05-30T14:51:23Z
    generation: 3
    labels:
      name: weave-net
      role.kubernetes.io/networking: "1"
    name: weave-net
    namespace: kube-system
    resourceVersion: "356995"
    selfLink: /apis/extensions/v1beta1/namespaces/kube-system/daemonsets/weave-net
    uid: 735e25fb-4547-11e7-b4c9-123be1737864
  spec:
    selector:
      matchLabels:
        name: weave-net
        role.kubernetes.io/networking: "1"
    template:
      metadata:
        creationTimestamp: null
        labels:
          name: weave-net
          role.kubernetes.io/networking: "1"
      spec:
        containers:
        - command:
          - /home/weave/launch.sh
          image: weaveworks/weave-kube:1.9.4
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 3
            httpGet:
              host: 127.0.0.1
              path: /status
              port: 6784
              scheme: HTTP
            initialDelaySeconds: 30
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          name: weave
          resources:
            limits:
              cpu: 100m
              memory: 200Mi
            requests:
              cpu: 100m
              memory: 200Mi
          securityContext:
            privileged: true
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /weavedb
            name: weavedb
          - mountPath: /host/opt
            name: cni-bin
          - mountPath: /host/home
            name: cni-bin2
          - mountPath: /host/etc
            name: cni-conf
          - mountPath: /host/var/lib/dbus
            name: dbus
          - mountPath: /lib/modules
            name: lib-modules
        - image: weaveworks/weave-npc:1.9.4
          imagePullPolicy: IfNotPresent
          name: weave-npc
          resources:
            limits:
              cpu: 100m
              memory: 200Mi
            requests:
              cpu: 100m
              memory: 200Mi
          securityContext:
            privileged: true
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
        dnsPolicy: ClusterFirst
        hostNetwork: true
        hostPID: true
        restartPolicy: Always
        schedulerName: default-scheduler
        securityContext:
          seLinuxOptions:
            type: spc_t
        serviceAccount: weave-net
        serviceAccountName: weave-net
        terminationGracePeriodSeconds: 30
        volumes:
        - emptyDir: {}
          name: weavedb
        - hostPath:
            path: /opt
          name: cni-bin
        - hostPath:
            path: /home
          name: cni-bin2
        - hostPath:
            path: /etc
          name: cni-conf
        - hostPath:
            path: /var/lib/dbus
          name: dbus
        - hostPath:
            path: /lib/modules
          name: lib-modules
    updateStrategy:
      type: OnDelete
  status:
    currentNumberScheduled: 7
    desiredNumberScheduled: 7
    numberAvailable: 7
    numberMisscheduled: 0
    numberReady: 7
    observedGeneration: 3
    updatedNumberScheduled: 2
kind: List
metadata: {}
resourceVersion: ""
selfLink: ""
Digging deeper, this seems somewhat similar to https://github.com/kubernetes/kubernetes/issues/46073.
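In case it helps with reproducing, here is a quick way to see the mismatch described above (a sketch, assuming the weave-net DaemonSet in kube-system as shown in the dump):

    # tolerations recorded by the last kubectl apply (annotation)
    kubectl -n kube-system get ds weave-net -o yaml | grep -o '"tolerations":\[[^]]*]'

    # tolerations actually present on the live object (prints nothing here, confirming the mismatch)
    kubectl -n kube-system get ds weave-net -o jsonpath='{.spec.template.spec.tolerations}'

Re-applying the 1.6 manifest against a fully upgraded API server (or editing the DaemonSet as described earlier) should make the field stick; with updateStrategy: OnDelete the running pods still only pick it up after being deleted.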
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Prevent issues from auto-closing with an /lifecycle frozen comment.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle rotten
/remove-lifecycle stale
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close