I followed this guide to install Calico on my EKS cluster: https://docs.aws.amazon.com/eks/latest/userguide/calico.html
If I later remove Calico and restart all of the containers behind a deployment, networking breaks.
Some simple steps that reproduce the problem:
kubectl apply -f https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/release-1.5/config/v1.5/calico.yaml
kubectl delete -f https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/release-1.5/config/v1.5/calico.yaml
It seems that iptables rules are not being reset when Calico is uninstalled. A quick workaround is to recreate the nodes, but until then the nodes will not function correctly.
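For anyone who wants to confirm this on an affected node, here is a minimal sketch that lists leftover chains from `iptables-save` output. It assumes Calico's chains carry the `cali-` prefix and that you have shell access to the node; the function name `leftover_calico_chains` is made up for illustration and is not part of any AWS or Calico tooling.

```shell
#!/bin/sh
# Hypothetical helper: reads `iptables-save` output on stdin and prints the
# names of chains that look like they were created by Calico.
# Assumption: Calico chain names start with "cali-".
leftover_calico_chains() {
  # Chain declarations in iptables-save output look like ":NAME POLICY [pkts:bytes]".
  # Keep the declarations for cali- chains, strip the leading colon, de-duplicate.
  grep -E '^:cali-' | cut -d' ' -f1 | cut -c2- | sort -u
}

# Usage on a node (needs root), after Calico has been deleted:
#   sudo iptables-save | leftover_calico_chains
```

If this prints anything after the `kubectl delete`, the node probably still has stale Calico rules. An empty result only means no chains with that prefix were found, not that the node is guaranteed clean.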
Also had the same issue!
I think in general, we need to figure out a) what our policy is towards supporting Calico as a first-class option in the VPC CNI and b) if we agree that Calico should be supported with the same rigour as the non-Calico deployments, set up full e2e testing of Calico and c) write a support script for Calico troubleshooting similar to https://github.com/aws/amazon-vpc-cni-k8s/blob/master/scripts/aws-cni-support.sh.
Obviously, that is a bigger thing than just this particular GH issue :) But I think that to truly support Calico, we need to step back and look at the bigger picture.
https://github.com/projectcalico/calico/blob/master/hack/remove-calico-policy/remove-policy.md
This one helps with the reset by adding a DaemonSet that cleans up after Calico; it is at least a temporary fix.
I have faced a similar issue and totally agree with jaypipes: either EKS supports it or it doesn't, and if it's supported then it should work properly.
We have improved the documentation around uninstalling Calico. It is correct that some iptables rules are not being reset when Calico is uninstalled and that is now reflected in the docs.
Closing this issue for now; the documentation has been updated, as @mogren mentioned. We will discuss the points Jay brought up internally, and we have started adding e2e tests for Calico with #1230. We will add Calico-specific cases soon.
Thank you!