What kops version are you running?
Version 1.8.0 (git-5099bc5)
What version of Kubernetes are you running?
Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.5", GitCommit:"cce11c6a185279d037023e02ac5249e14daa22bf", GitTreeState:"clean", BuildDate:"2017-12-07T16:16:03Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.5", GitCommit:"cce11c6a185279d037023e02ac5249e14daa22bf", GitTreeState:"clean", BuildDate:"2017-12-07T16:05:18Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
What cloud provider are you using?
AWS
What commands did you run? What happened after the commands executed?
Applying any configuration that results in the creation of a new container ends up with a pod stuck in ContainerCreating status and then in sandbox failures (see logs). The pod stays in ContainerCreating for a long time; afterwards there are several sandbox errors.
What did you expect to happen?
A normal deployment, without the hanging.
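For reference, this is roughly how I inspect a stuck pod (the pod name is from one example run):

kubectl get pods -o wide
kubectl describe pod nginx-deployment-569477d6d8-jcbjz   # the events section shows the sandbox failures
kubectl get events --sort-by=.metadata.creationTimestamp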
Please provide your cluster manifest (kops get --name my.example.com -oyaml to display your cluster manifest):
apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: 2017-09-07T12:12:21Z
  name: kubernetes.xxx.xxx
spec:
  api:
    loadBalancer:
      type: Public
  authorization:
    alwaysAllow: {}
  channel: stable
  cloudProvider: aws
  configBase: s3://xxx-kops/kubernetes.xxx.xxx
  dnsZone: Z16C10VSQ4D9E
  docker:
    logDriver: ""
    storage: overlay2
  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-eu-central-1a
      name: a
    - instanceGroup: master-eu-central-1b
      name: b
    - instanceGroup: master-eu-central-1c
      name: c
    name: main
  - etcdMembers:
    - instanceGroup: master-eu-central-1a
      name: a
    - instanceGroup: master-eu-central-1b
      name: b
    - instanceGroup: master-eu-central-1c
      name: c
    name: events
  iam:
    legacy: true
  kubeAPIServer:
    runtimeConfig:
      batch/v2alpha1: "true"
  kubernetesApiAccess:
  - 0.0.0.0/0
  kubernetesVersion: 1.8.5
  masterInternalName: api.internal.kubernetes.xxx.xxx
  masterPublicName: api.kubernetes.xxx.xxx
  networkCIDR: 172.20.0.0/16
  networking:
    weave:
      mtu: 8912
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 0.0.0.0/0
  subnets:
  - cidr: 172.20.32.0/19
    name: eu-central-1a
    type: Private
    zone: eu-central-1a
  - cidr: 172.20.64.0/19
    name: eu-central-1b
    type: Private
    zone: eu-central-1b
  - cidr: 172.20.96.0/19
    name: eu-central-1c
    type: Private
    zone: eu-central-1c
  - cidr: 172.20.0.0/22
    name: utility-eu-central-1a
    type: Utility
    zone: eu-central-1a
  - cidr: 172.20.4.0/22
    name: utility-eu-central-1b
    type: Utility
    zone: eu-central-1b
  - cidr: 172.20.8.0/22
    name: utility-eu-central-1c
    type: Utility
    zone: eu-central-1c
  topology:
    bastion:
      bastionPublicName: bastion.kubernetes.xxx.xxx
    dns:
      type: Public
    masters: private
    nodes: private
---
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2017-09-07T12:12:21Z
  labels:
    kops.k8s.io/cluster: kubernetes.xxx.xxx
  name: bastions
spec:
  image: kope.io/k8s-1.7-debian-jessie-amd64-hvm-ebs-2017-07-28
  machineType: t2.micro
  maxSize: 1
  minSize: 1
  role: Bastion
  subnets:
  - utility-eu-central-1a
  - utility-eu-central-1b
  - utility-eu-central-1c
---
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2017-09-07T12:12:21Z
  labels:
    kops.k8s.io/cluster: kubernetes.xxx.xxx
  name: master-eu-central-1a
spec:
  image: kope.io/k8s-1.7-debian-jessie-amd64-hvm-ebs-2017-07-28
  machineType: m4.large
  maxSize: 1
  minSize: 1
  role: Master
  subnets:
  - eu-central-1a
---
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2017-09-07T12:12:21Z
  labels:
    kops.k8s.io/cluster: kubernetes.xxx.xxx
  name: master-eu-central-1b
spec:
  image: kope.io/k8s-1.7-debian-jessie-amd64-hvm-ebs-2017-07-28
  machineType: m4.large
  maxSize: 1
  minSize: 1
  role: Master
  subnets:
  - eu-central-1b
---
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2017-09-07T12:12:21Z
  labels:
    kops.k8s.io/cluster: kubernetes.xxx.xxx
  name: master-eu-central-1c
spec:
  image: kope.io/k8s-1.7-debian-jessie-amd64-hvm-ebs-2017-07-28
  machineType: m4.large
  maxSize: 1
  minSize: 1
  role: Master
  subnets:
  - eu-central-1c
---
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2017-12-13T20:46:55Z
  labels:
    beta.kubernetes.io/fluentd-ds-ready: "true"
    kops.k8s.io/cluster: kubernetes.xxx.xxx
  name: nodes-base
spec:
  image: kope.io/k8s-1.7-debian-jessie-amd64-hvm-ebs-2017-12-02
  machineType: m4.2xlarge
  maxSize: 4
  minSize: 4
  nodeLabels:
    kops.k8s.io/instancegroup: nodes-base
  role: Node
  subnets:
  - eu-central-1a
  - eu-central-1b
  - eu-central-1c
---
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-01-22T10:26:14Z
  labels:
    beta.kubernetes.io/fluentd-ds-ready: "true"
    kops.k8s.io/cluster: kubernetes.xxx.xxx
  name: nodes-general-purpose
spec:
  image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2018-01-05
  machineType: m4.2xlarge
  maxSize: 0
  minSize: 0
  nodeLabels:
    kops.k8s.io/instancegroup: nodes-general-purpose
  role: Node
  subnets:
  - eu-central-1a
  - eu-central-1b
  - eu-central-1c
Please run the commands with most verbose logging by adding the -v 10 flag (kubelet logs from one of the nodes):
Jan 23 14:20:07 ip-172-20-96-21 kubelet[9704]: I0123 14:20:07.640211 9704 aws.go:1051] Could not determine public DNS from AWS metadata.
Jan 23 14:20:12 ip-172-20-96-21 kubelet[9704]: E0123 14:20:12.259174 9704 summary.go:92] Failed to get system container stats for "/system.slice/kubelet.service": failed to get cgroup stats for "/system.slice/kubelet.service": failed to get container info for "/system.slice/kubelet.service": unknown container "/system.slice/kubelet.service"
Jan 23 14:20:12 ip-172-20-96-21 kubelet[9704]: E0123 14:20:12.259200 9704 summary.go:92] Failed to get system container stats for "/system.slice/docker.service": failed to get cgroup stats for "/system.slice/docker.service": failed to get container info for "/system.slice/docker.service": unknown container "/system.slice/docker.service"
Jan 23 14:20:17 ip-172-20-96-21 kubelet[9704]: I0123 14:20:17.667982 9704 aws.go:1051] Could not determine public DNS from AWS metadata.
Jan 23 14:20:20 ip-172-20-96-21 kubelet[9704]: I0123 14:20:20.553308 9704 qos_container_manager_linux.go:320] [ContainerManager]: Updated QoS cgroup configuration
Jan 23 14:20:22 ip-172-20-96-21 kubelet[9704]: E0123 14:20:22.352318 9704 summary.go:92] Failed to get system container stats for "/system.slice/kubelet.service": failed to get cgroup stats for "/system.slice/kubelet.service": failed to get container info for "/system.slice/kubelet.service": unknown container "/system.slice/kubelet.service"
Jan 23 14:20:22 ip-172-20-96-21 kubelet[9704]: E0123 14:20:22.352346 9704 summary.go:92] Failed to get system container stats for "/system.slice/docker.service": failed to get cgroup stats for "/system.slice/docker.service": failed to get container info for "/system.slice/docker.service": unknown container "/system.slice/docker.service"
Jan 23 14:20:22 ip-172-20-96-21 kubelet[9704]: I0123 14:20:22.512209 9704 server.go:779] GET /metrics: (32.208317ms) 200 [[Prometheus/1.8.1] 172.20.91.21:34458]
Jan 23 14:20:27 ip-172-20-96-21 kubelet[9704]: I0123 14:20:27.687652 9704 aws.go:1051] Could not determine public DNS from AWS metadata.
Jan 23 14:20:30 ip-172-20-96-21 kubelet[9704]: E0123 14:20:30.431621 9704 remote_runtime.go:115] StopPodSandbox "4799a6a8c867bc324480b64df7221f13d6b83e8171a14c527b9c0559cf4b6426" from runtime service failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Jan 23 14:20:30 ip-172-20-96-21 kubelet[9704]: E0123 14:20:30.431669 9704 kuberuntime_manager.go:781] Failed to stop sandbox {"docker" "4799a6a8c867bc324480b64df7221f13d6b83e8171a14c527b9c0559cf4b6426"}
Jan 23 14:20:30 ip-172-20-96-21 kubelet[9704]: E0123 14:20:30.431708 9704 kubelet_pods.go:1063] Failed killing the pod "nginx-deployment-569477d6d8-jcbjz": failed to "KillPodSandbox" for "406a218c-0048-11e8-b572-026c39b367e0" with KillPodSandboxError: "rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Jan 23 14:20:32 ip-172-20-96-21 kubelet[9704]: E0123 14:20:32.431681 9704 summary.go:92] Failed to get system container stats for "/system.slice/kubelet.service": failed to get cgroup stats for "/system.slice/kubelet.service": failed to get container info for "/system.slice/kubelet.service": unknown container "/system.slice/kubelet.service"
Jan 23 14:20:32 ip-172-20-96-21 kubelet[9704]: E0123 14:20:32.431728 9704 summary.go:92] Failed to get system container stats for "/system.slice/docker.service": failed to get cgroup stats for "/system.slice/docker.service": failed to get container info for "/system.slice/docker.service": unknown container "/system.slice/docker.service"
Jan 23 14:20:37 ip-172-20-96-21 kubelet[9704]: I0123 14:20:37.712114 9704 aws.go:1051] Could not determine public DNS from AWS metadata.
Jan 23 14:20:42 ip-172-20-96-21 kubelet[9704]: E0123 14:20:42.513956 9704 summary.go:92] Failed to get system container stats for "/system.slice/kubelet.service": failed to get cgroup stats for "/system.slice/kubelet.service": failed to get container info for "/system.slice/kubelet.service": unknown container "/system.slice/kubelet.service"
Jan 23 14:20:42 ip-172-20-96-21 kubelet[9704]: E0123 14:20:42.513986 9704 summary.go:92] Failed to get system container stats for "/system.slice/docker.service": failed to get cgroup stats for "/system.slice/docker.service": failed to get container info for "/system.slice/docker.service": unknown container "/system.slice/docker.service"
Jan 23 14:20:47 ip-172-20-96-21 kubelet[9704]: I0123 14:20:47.734079 9704 aws.go:1051] Could not determine public DNS from AWS metadata.
Jan 23 14:20:52 ip-172-20-96-21 kubelet[9704]: E0123 14:20:52.743703 9704 summary.go:92] Failed to get system container stats for "/system.slice/docker.service": failed to get cgroup stats for "/system.slice/docker.service": failed to get container info for "/system.slice/docker.service": unknown container "/system.slice/docker.service"
Jan 23 14:20:52 ip-172-20-96-21 kubelet[9704]: E0123 14:20:52.743728 9704 summary.go:92] Failed to get system container stats for "/system.slice/kubelet.service": failed to get cgroup stats for "/system.slice/kubelet.service": failed to get container info for "/system.slice/kubelet.service": unknown container "/system.slice/kubelet.service"
Jan 23 14:20:57 ip-172-20-96-21 kubelet[9704]: I0123 14:20:57.761509 9704 aws.go:1051] Could not determine public DNS from AWS metadata.
Jan 23 14:21:02 ip-172-20-96-21 kubelet[9704]: E0123 14:21:02.826386 9704 summary.go:92] Failed to get system container stats for "/system.slice/kubelet.service": failed to get cgroup stats for "/system.slice/kubelet.service": failed to get container info for "/system.slice/kubelet.service": unknown container "/system.slice/kubelet.service"
Jan 23 14:21:02 ip-172-20-96-21 kubelet[9704]: E0123 14:21:02.826413 9704 summary.go:92] Failed to get system container stats for "/system.slice/docker.service": failed to get cgroup stats for "/system.slice/docker.service": failed to get container info for "/system.slice/docker.service": unknown container "/system.slice/docker.service"
Jan 23 14:21:05 ip-172-20-96-21 kubelet[9704]: E0123 14:21:05.083079 9704 summary.go:92] Failed to get system container stats for "/system.slice/kubelet.service": failed to get cgroup stats for "/system.slice/kubelet.service": failed to get container info for "/system.slice/kubelet.service": unknown container "/system.slice/kubelet.service"
Jan 23 14:21:05 ip-172-20-96-21 kubelet[9704]: E0123 14:21:05.083105 9704 summary.go:92] Failed to get system container stats for "/system.slice/docker.service": failed to get cgroup stats for "/system.slice/docker.service": failed to get container info for "/system.slice/docker.service": unknown container "/system.slice/docker.service"
Jan 23 14:21:05 ip-172-20-96-21 kubelet[9704]: I0123 14:21:05.089787 9704 server.go:779] GET /stats/summary/: (61.282407ms) 200 [[Go-http-client/1.1] 172.20.66.110:33458]
Jan 23 14:21:07 ip-172-20-96-21 kubelet[9704]: I0123 14:21:07.780547 9704 aws.go:1051] Could not determine public DNS from AWS metadata.
Jan 23 14:21:12 ip-172-20-96-21 kubelet[9704]: E0123 14:21:12.904377 9704 summary.go:92] Failed to get system container stats for "/system.slice/kubelet.service": failed to get cgroup stats for "/system.slice/kubelet.service": failed to get container info for "/system.slice/kubelet.service": unknown container "/system.slice/kubelet.service"
Jan 23 14:21:12 ip-172-20-96-21 kubelet[9704]: E0123 14:21:12.904407 9704 summary.go:92] Failed to get system container stats for "/system.slice/docker.service": failed to get cgroup stats for "/system.slice/docker.service": failed to get container info for "/system.slice/docker.service": unknown container "/system.slice/docker.service"
Jan 23 14:21:17 ip-172-20-96-21 kubelet[9704]: I0123 14:21:17.800646 9704 aws.go:1051] Could not determine public DNS from AWS metadata.
Jan 23 14:21:20 ip-172-20-96-21 kubelet[9704]: I0123 14:21:20.554402 9704 qos_container_manager_linux.go:320] [ContainerManager]: Updated QoS cgroup configuration
Jan 23 14:21:22 ip-172-20-96-21 kubelet[9704]: I0123 14:21:22.502214 9704 server.go:779] GET /metrics: (9.704924ms) 200 [[Prometheus/1.8.1] 172.20.91.21:34458]
Jan 23 14:21:22 ip-172-20-96-21 kubelet[9704]: E0123 14:21:22.995951 9704 summary.go:92] Failed to get system container stats for "/system.slice/kubelet.service": failed to get cgroup stats for "/system.slice/kubelet.service": failed to get container info for "/system.slice/kubelet.service": unknown container "/system.slice/kubelet.service"
Jan 23 14:21:22 ip-172-20-96-21 kubelet[9704]: E0123 14:21:22.995978 9704 summary.go:92] Failed to get system container stats for "/system.slice/docker.service": failed to get cgroup stats for "/system.slice/docker.service": failed to get container info for "/system.slice/docker.service": unknown container "/system.slice/docker.service"
Jan 23 14:21:27 ip-172-20-96-21 kubelet[9704]: I0123 14:21:27.823773 9704 aws.go:1051] Could not determine public DNS from AWS metadata.
Jan 23 14:21:33 ip-172-20-96-21 kubelet[9704]: E0123 14:21:33.062525 9704 summary.go:92] Failed to get system container stats for "/system.slice/kubelet.service": failed to get cgroup stats for "/system.slice/kubelet.service": failed to get container info for "/system.slice/kubelet.service": unknown container "/system.slice/kubelet.service"
Jan 23 14:21:33 ip-172-20-96-21 kubelet[9704]: E0123 14:21:33.062556 9704 summary.go:92] Failed to get system container stats for "/system.slice/docker.service": failed to get cgroup stats for "/system.slice/docker.service": failed to get container info for "/system.slice/docker.service": unknown container "/system.slice/docker.service"
Jan 23 14:21:43 ip-172-20-96-21 kubelet[9704]: E0123 14:21:43.159664 9704 summary.go:92] Failed to get system container stats for "/system.slice/kubelet.service": failed to get cgroup stats for "/system.slice/kubelet.service": failed to get container info for "/system.slice/kubelet.service": unknown container "/system.slice/kubelet.service"
Jan 23 14:21:43 ip-172-20-96-21 kubelet[9704]: E0123 14:21:43.159715 9704 summary.go:92] Failed to get system container stats for "/system.slice/docker.service": failed to get cgroup stats for "/system.slice/docker.service": failed to get container info for "/system.slice/docker.service": unknown container "/system.slice/docker.service"
Jan 23 14:21:47 ip-172-20-96-21 kubelet[9704]: I0123 14:21:47.881168 9704 aws.go:1051] Could not determine public DNS from AWS metadata.
Jan 23 14:21:49 ip-172-20-96-21 kubelet[9704]: E0123 14:21:49.208647 9704 remote_runtime.go:115] StopPodSandbox "9dfd449d99efe66115045c5557efba54d57cab1b3617fb67fb412fc11487d266" from runtime service failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Jan 23 14:21:49 ip-172-20-96-21 kubelet[9704]: E0123 14:21:49.208684 9704 kuberuntime_gc.go:152] Failed to stop sandbox "9dfd449d99efe66115045c5557efba54d57cab1b3617fb67fb412fc11487d266" before removing: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Jan 23 14:21:53 ip-172-20-96-21 kubelet[9704]: E0123 14:21:53.238500 9704 summary.go:92] Failed to get system container stats for "/system.slice/kubelet.service": failed to get cgroup stats for "/system.slice/kubelet.service": failed to get container info for "/system.slice/kubelet.service": unknown container "/system.slice/kubelet.service"
Jan 23 14:21:53 ip-172-20-96-21 kubelet[9704]: E0123 14:21:53.238527 9704 summary.go:92] Failed to get system container stats for "/system.slice/docker.service": failed to get cgroup stats for "/system.slice/docker.service": failed to get container info for "/system.slice/docker.service": unknown container "/system.slice/docker.service"
This started happening suddenly a week ago. At first it manifested as 'slow deployments' with intermittent sandbox failures. Three days later, deployments stopped finishing altogether, always ending in sandbox errors. From what I've searched, it is probably related to the CNI, but all related issues point to a fix in 1.8.5, yet I still hit this problem.
I'm also using Weave Net 2.0.1.
The deployment I used for this test is:
apiVersion: apps/v1beta2 # for versions before 1.9.0 use apps/v1beta1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 3 # tells deployment to run 3 pods matching the template
  template: # create pods using pod definition in this template
    metadata:
      # unlike pod-nginx.yaml, the name is not included in the metadata, as a unique name is
      # generated from the deployment name
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80
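To reproduce, I simply apply the manifest and watch the rollout, roughly like this (the file name is just what I called it locally):

kubectl apply -f nginx-deployment.yaml
kubectl rollout status deployment/nginx-deployment
kubectl get pods -l app=nginx -w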
See these two links:
https://github.com/weaveworks/weave/issues/2797
https://github.com/weaveworks/weave/issues/2797
At the company I work for, we just ran into this yesterday :)
Basically, weave does not reclaim unused IP addresses after nodes have been removed from the cluster. This was fixed in weave 2.1.1, but kops release 1.8.0 ships weave 2.0.5. The master branch of kops has weave 2.3.1, so we're waiting for a new release.
Meanwhile, we're removing unused peers manually:
kubectl exec -n kube-system {MASTER_NODE_WEAVE_POD_ID} -c weave -- /home/weave/weave --local status ipam
kubectl exec -n kube-system {MASTER_NODE_WEAVE_POD_ID} -c weave -- /home/weave/weave --local rmpeer {MAC_OF_UNREACHABLE_NODE}
The node on which you run rmpeer will claim the unused addresses, so we run the command on the master nodes; a scripted version is sketched below.
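A rough sketch of that cleanup as a script (it assumes the standard weave-net DaemonSet label name=kube-system/weave-net pods carry, i.e. name=weave-net, and the "unreachable!" marker in the ipam output; pod selection is simplified to the first weave pod):

# sketch: remove every peer that `status ipam` reports as unreachable
WEAVE_POD=$(kubectl get pods -n kube-system -l name=weave-net -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n kube-system "$WEAVE_POD" -c weave -- /home/weave/weave --local status ipam \
  | awk '/unreachable!/ {print $1}' | cut -d'(' -f1 \
  | while read -r peer; do
      # the pod that runs rmpeer claims the freed addresses
      kubectl exec -n kube-system "$WEAVE_POD" -c weave -- /home/weave/weave --local rmpeer "$peer"
    done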
You can also upgrade kops to 1.8.1: https://github.com/kubernetes/kops/releases/tag/1.8.1
or you can upgrade the weave image: https://github.com/kubernetes/kops/issues/3575 (look for AlexRRR's comment)
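If you go the image-upgrade route, one way to bump it in place looks roughly like the following (the tags here are only examples, and kops may roll the change back when it re-applies the networking addon):

kubectl -n kube-system set image daemonset/weave-net weave=weaveworks/weave-kube:2.1.3 weave-npc=weaveworks/weave-npc:2.1.3
kubectl -n kube-system rollout status daemonset/weave-net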
I have seen this same issue with 1.9.0-beta-2
@sstarcher any idea on root cause? Are you using weave?
yep, using weave. I see they have a few issues opened up.
https://github.com/weaveworks/weave/issues/3275
https://github.com/kubernetes/kops/issues/3575
already closed
https://github.com/weaveworks/weave/pull/3149
@yoz2326 Your steps fixed the issue for me. I hit it after testing what would happen when I rebooted the nodes in a cluster. Note that I'm actually using kubicorn with DigitalOcean, but I thought I'd post here to thank you and maybe help someone else who has the same issue :slightly_smiling_face:
You mentioned, though, that this was fixed in weave 2.1.1. From what I can see, it is still an issue when using:
weaveworks/weave-kube:2.3.0
weaveworks/weave-npc:2.3.0
For me, the output of the command was as follows:
kubectl exec -n kube-system weave-net-m6btt -c weave -- /home/weave/weave --local status ipam
e2:c2:ee:4f:09:4f(myfirstk8s-master-0) 2 IPs (06.2% of total) (2 active)
7e:12:35:ae:6a:2d(myfirstk8s-node-1) 12 IPs (37.5% of total) - unreachable!
72:91:37:2c:c8:e9(myfirstk8s-node-0) 8 IPs (25.0% of total) - unreachable!
22:00:53:3b:f3:4b(myfirstk8s-node-0) 4 IPs (12.5% of total)
51:82:5b:e0:91:16(myfirstk8s-node-1) 6 IPs (18.8% of total)
I followed your second command and removed the two unreachable entries (they are the same nodes, which appear to have got different MAC addresses after the reboot).
As soon as this happened the cluster sprang back into life.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
The same issue also occurs with weave 2.4.
Same issue with version 2.4.1
I see it on weave 2.4.0 as well, with the cluster at k8s 1.10. This should probably be reopened.
We have autoscaling turned on, so nodes get added and removed pretty frequently, which I think aggravates this issue. Some kind of automation around the rmpeer workaround would also help in the meantime.
@JaveriaK please upgrade to weave 2.5; this release has fixes that automate rmpeer and forget nodes as they get deleted from the ASG.
Still seeing unreachable IPs in 2.5.0, although less frequently than before. It's also possible that the underlying cause has changed. Just wanted to comment here first; please let me know if you'd like me to open a new issue.
I also see some weave pods restarting continuously. A log snippet from one of these is below:
INFO: 2019/02/18 23:20:05.099273 overlay_switch ->[ae:2a:da:c3:28:36(ip-172-31-42-187.us-west-1.compute.internal)] fastdp send InitSARemote: write tcp4 172.31.43.46:6783->172.31.42.187:19492: write: connection reset by peer
INFO: 2019/02/18 23:20:05.099305 overlay_switch ->[ae:2a:da:c3:28:36(ip-172-31-42-187.us-west-1.compute.internal)] using sleeve
INFO: 2019/02/18 23:20:05.102149 ->[172.31.42.187:19492|ae:2a:da:c3:28:36(ip-172-31-42-187.us-west-1.compute.internal)]: connection shutting down due to error: Inconsistent entries for 10.100.64.0: owned by f2:c1:39:11:01:d4 but incoming message says 1e:76:31:2a:3a:d1
INFO: 2019/02/18 23:35:02.335169 ->[172.31.42.208:19561|02:d0:fa:07:f7:ca(ip-172-31-42-208.us-west-1.compute.internal)]: connection shutting down due to error: cannot connect to ourself
INFO: 2019/02/18 23:35:02.335278 ->[172.31.42.208:6783|02:d0:fa:07:f7:ca(ip-172-31-42-208.us-west-1.compute.internal)]: connection shutting down due to error: cannot connect to ourself
@JaveriaK Please open a new issue in weave repo with relevant logs
It might be something as simple as opening the CNI plugin's ports. I am using Weave Net, so the relevant ports to allow in your firewall are 6783/tcp, 6783/udp, and 6784/udp on the master node(s).
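On AWS, for example, that roughly means allowing those ports between the node security groups (the group IDs below are placeholders, not real IDs):

aws ec2 authorize-security-group-ingress --group-id sg-MASTERS --protocol tcp --port 6783 --source-group sg-NODES
aws ec2 authorize-security-group-ingress --group-id sg-MASTERS --protocol udp --port 6783 --source-group sg-NODES
aws ec2 authorize-security-group-ingress --group-id sg-MASTERS --protocol udp --port 6784 --source-group sg-NODES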