1. What kops version are you running? The command kops version will display
this information.
% kops version
Version 1.10.0-alpha.1 (git-7f70266f5)
2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.4", GitCommit:"5ca598b4ba5abb89bb773071ce452e33fb66339d", GitTreeState:"clean", BuildDate:"2018-06-06T08:13:03Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.6", GitCommit:"a21fdbd78dde8f5447f5f6c331f7eb6f80bd684e", GitTreeState:"clean", BuildDate:"2018-07-26T10:04:08Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
3. What cloud provider are you using?
AWS
4. What commands did you run? What is the simplest way to reproduce this issue?
Edited kops Cluster resource to set CNI to amazonvpc:
networking:
  amazonvpc: {}
Then I used kops update to apply that resource template (YAML) and generate the Terraform output.
Next, I ran terraform plan and terraform apply, which proceeded normally. At that point I started a rolling update, but noticed immediately that something was awry: as soon as I had updated kops with the new CNI configuration, an aws-node DaemonSet was launched on my cluster alongside the existing canal DaemonSet. This broke networking for pods on the cluster.
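Roughly, the command sequence was as follows (flags approximate, $NAME standing in for the cluster name, cluster.yaml a placeholder filename):
kops replace -f cluster.yaml                           # apply the edited cluster spec
kops update cluster $NAME --target=terraform --out=.   # regenerate the Terraform config
terraform plan
terraform apply
kops rolling-update cluster $NAME --yes                # rolling update; the aws-node DaemonSet had already appeared before this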
I ended up fixing it by deleting the aws-node DaemonSet and then deleting all of the canal pods, allowing the canal DaemonSet to start fresh pods, which seem to work. I'm going to have to revert my changes to the Cluster resource and hope that it stays fixed.
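Concretely, the cleanup was along these lines (the kube-system namespace and the canal pod label are assumptions based on the standard kops add-ons; adjust to your cluster):
kubectl -n kube-system delete daemonset aws-node      # remove the aws-node DaemonSet the CNI change created
kubectl -n kube-system delete pods -l k8s-app=canal   # assumed label; delete the canal pods so the DaemonSet recreates them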
6. What did you expect to happen?
I expected that nothing would change at the pod level until the rolling update was performed.
7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.
9. Anything else we need to know?
I'm chris.snell on Slack #kops-users if you have any questions.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle rotten
I was facing a similar issue today when I wanted to switch from weave to flannel-vxlan. I couldn't find anything about changing/migrating the CNI provider in the docs. So I just tried. And it failed. Maybe just a docs issue? A small disclaimer "do not change the cni provider in a running cluster" would have been enough.
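(For reference, the change in question is presumably just the networking block of the cluster spec, along these lines:
networking:
  flannel:
    backend: vxlan
which is exactly the kind of edit that triggers the duplicate-DaemonSet behaviour described above.)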
I had a similar experience. The new DaemonSet pods started while the old DaemonSet was left in place, and then the networking issues started. I had trouble rolling back to weave but eventually succeeded.
There is actually a section about switching CNI providers https://github.com/kubernetes/kops/blob/master/docs/networking.md#switching-between-networking-providers
However, it says "Switching from kubenet to a CNI network provider has not been tested at this time.", so by extension you can expect that switching from one CNI provider to another hasn't been tested either.
Hello everyone,
Not sure how relevant this is now, but I've recently changed overlay networks, from weave and cilium to calico. This is a disruptive operation, since masters and nodes have to be rolled. If you can afford 10-30 minutes of downtime, this is how I did it:
1. Change the networking section of the cluster spec from
   ...
   networking:
     weave: {}
   ...
   to
   ...
   networking:
     cni: {}
   ...
   Save the file and run kops update cluster $NAME --yes
2. Remove everything that pertains to weave (services, cluster roles, DaemonSets, ...). This can be accomplished by running kubectl delete -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
3. Roll your masters and nodes as fast as possible: kops rolling-update cluster --cloudonly --force --master-interval=1s --node-interval=1s --yes
4. Wait until the new masters pop up (they will show as NotReady) and then apply your desired overlay. In my case it was calico, as described at https://docs.projectcalico.org/v3.6/getting-started/kubernetes/installation/calico (the "50 nodes or less" / "more than 50 nodes" variant, not the etcd method).
5. Wait until your calico pods are ready; your masters and nodes will then show as Ready (see the verification commands after this list).
If you configured your cluster with networking=cni to begin with, skip step 1.
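To verify those last two steps, something like the following is enough (the calico-node label is an assumption; use whatever selector your chosen CNI's DaemonSet carries):
kubectl get nodes -w                                     # watch nodes go from NotReady back to Ready
kubectl -n kube-system get pods -l k8s-app=calico-node   # assumed label for the calico DaemonSet pods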
HTH
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
What @ssro suggested worked for me also. I switched from calico to weave.
The reason was that calico had previously been installed with etcdv2, and the upgrade to etcd-manager broke it (calico relied on that etcd and could no longer find the certificates it needed to boot).
Ditched it and replaced it with weave. Everything's working as expected 😉
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.
In response to this:
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.