Amazon-vpc-cni-k8s: Uninstalling Calico breaks networking (iptables rules not reset on uninstall)

Created on 13 Aug 2019 · 6 Comments · Source: aws/amazon-vpc-cni-k8s

I followed this guide to install Calico on my EKS cluster: https://docs.aws.amazon.com/eks/latest/userguide/calico.html

If I later remove Calico and restart all of the containers behind a deployment, networking breaks.

Some simple steps that reproduce the problem:

  1. Set up a simple deployment with a LoadBalancer service.
  2. Install Calico: kubectl apply -f https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/release-1.5/config/v1.5/calico.yaml
  3. Verify that you can reach the service through the load balancer.
  4. Uninstall Calico: kubectl delete -f https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/release-1.5/config/v1.5/calico.yaml
  5. Restart all of the pods behind the deployment created in step 1.
  6. The service can no longer be reached through the load balancer, and all targets are reported as unhealthy.
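For anyone who wants to script the cluster-side steps (2, 4 and 5), here is a rough sketch. The deployment name is a placeholder, and using `kubectl rollout restart` for step 5 is my assumption (it needs kubectl 1.15 or newer; deleting the pods by hand should work just as well). Steps 1, 3 and 6 still have to be done and checked manually.

```python
import subprocess

# Manifest URL taken from the steps above.
CALICO_MANIFEST = (
    "https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/"
    "release-1.5/config/v1.5/calico.yaml"
)


def repro_commands(deployment, manifest=CALICO_MANIFEST):
    """Build the kubectl commands for steps 2, 4 and 5.

    `deployment` is a hypothetical name for the deployment from step 1.
    """
    return [
        ["kubectl", "apply", "-f", manifest],    # step 2: install Calico
        ["kubectl", "delete", "-f", manifest],   # step 4: uninstall Calico
        # step 5: restart the pods (assumption: kubectl >= 1.15)
        ["kubectl", "rollout", "restart", f"deployment/{deployment}"],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling `run(repro_commands("my-app"))` only prints the commands. Remember to verify the load balancer (step 3) between the install and the uninstall when running them for real.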

It seems that iptables rules are not being reset when Calico is uninstalled. A quick workaround is to recreate the nodes, but until then the nodes will not function correctly.
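To check whether a node still carries Calico rules after the uninstall, something like the sketch below may help. It relies on Calico naming its iptables chains with a `cali-` prefix; the function names are made up for illustration, and `iptables-save` has to run as root on the node itself.

```python
import re
import subprocess

# Calico names its iptables chains with a "cali-" prefix (e.g. cali-INPUT).
CALICO_CHAIN = re.compile(r"\bcali-[A-Za-z0-9_-]+")


def dump_rules():
    """Return the output of `iptables-save` (needs root on the node)."""
    result = subprocess.run(
        ["iptables-save"], capture_output=True, text=True, check=True
    )
    return result.stdout


def find_calico_leftovers(iptables_save_output):
    """Return leftover Calico chains and the rules that reference them."""
    chains = set()
    rules = []
    for line in iptables_save_output.splitlines():
        line = line.strip()
        names = CALICO_CHAIN.findall(line)
        if not names:
            continue
        if line.startswith(":"):
            # Chain declaration, e.g. ":cali-INPUT - [0:0]"
            chains.update(names)
        elif line.startswith("-A"):
            # Rule that lives in, or jumps to, a Calico chain
            rules.append(line)
    return {"chains": sorted(chains), "rules": rules}
```

On an affected node, `find_calico_leftovers(dump_rules())` should come back with non-empty lists if the rules were left behind. An empty result does not prove that networking is healthy, only that no `cali-` chains remain.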

Labels: bug, calico, documentation, priority/P1

All 6 comments

Also had the same issue!

I think in general, we need to figure out a) what our policy is towards supporting Calico as a first-class option in the VPC CNI and b) if we agree that Calico should be supported with the same rigour as the non-Calico deployments, set up full e2e testing of Calico and c) write a support script for Calico troubleshooting similar to https://github.com/aws/amazon-vpc-cni-k8s/blob/master/scripts/aws-cni-support.sh.

Obviously, that is a bigger thing than just this particular GH issue :) But, I think to truly support Calico, we need to revisit things as a big picture.

https://github.com/projectcalico/calico/blob/master/hack/remove-calico-policy/remove-policy.md
This helps reset things by adding a DaemonSet that cleans up after Calico, at least as a temporary fix.

I have faced a similar issue and totally agree with jaypipes: either EKS supports it or it doesn't, and if it's supported then it should work properly.

We have improved the documentation around uninstalling Calico. It is correct that some iptables rules are not being reset when Calico is uninstalled and that is now reflected in the docs.

Closing this issue for now; the documentation is updated as @mogren mentioned. We will internally discuss the points Jay brought up, and we have started adding e2e tests for Calico with #1230. We will add Calico-specific cases soon.

Thank you!
