Calico: k8s-policy-no-match causing default-deny NetworkPolicy to be bypassed on upgrade from Calico 2.1.5 to Calico 2.6.3

Created on 7 Dec 2017 · 8 Comments · Source: projectcalico/calico

When I upgrade a cluster from kube 1.5.6 (with Calico 2.1.5) to kube 1.7.4 (with Calico 2.6.3) and then apply a "default-deny" kube NetworkPolicy (so that all traffic to all pods is blocked), traffic is still allowed. This appears to be caused by the k8s-policy-no-match policy, which is still present after the upgrade and lets traffic through before it can be dropped.

Expected Behavior

A default-deny NetworkPolicy (one that applies to a namespace but doesn't allow any traffic) should block all traffic.

Current Behavior

In this specific upgrade scenario (described above), the traffic is allowed.

Possible Solution

Deleting the k8s-policy-no-match policy during the upgrade might solve the problem.

Steps to Reproduce (for bugs)

Start with a cluster at kube 1.5.6 and Calico 2.1.5
Upgrade that cluster to kube 1.7.4 and Calico 2.6.3
Create a pod in namespace "test-ns-1"
Create the following NetworkPolicy:

kubectl create -n test-ns-1 -f - <<EOF
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: default-deny
  namespace: test-ns-1
spec:
  podSelector: {}
EOF

Try to connect to that pod: the connection succeeds, even though the policy should be blocking it.
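
The connectivity check in the last step could be scripted roughly as below. This is a minimal sketch, not part of the original report: it assumes the target pod serves HTTP on port 80 (for example an nginx pod), uses a throwaway busybox client pod with the hypothetical name `probe-client`, and relies on `kubectl run --rm -i --restart=Never` returning a non-zero exit status when the client container fails, which is worth confirming on your kubectl version.

```shell
# Hypothetical helper: try to reach the target pod's IP on port 80 from a
# throwaway busybox client pod in the same namespace.
# ASSUMPTION: the target pod serves HTTP on port 80; adjust the URL otherwise.
probe_pod() {
  local ns="$1" target_ip="$2"
  # --rm -i --restart=Never: run once, attach, then delete the client pod.
  # busybox wget -T 5 gives up after 5 seconds, so a silent DROP shows up
  # as a failure (so does any other network error).
  kubectl run probe-client -n "$ns" --rm -i --restart=Never \
    --image=busybox -- wget -q -O /dev/null -T 5 "http://${target_ip}/"
}

# Print a one-line verdict based on probe_pod's exit status.
report_probe() {
  if probe_pod "$@"; then
    echo "REACHABLE: default-deny is NOT being enforced"
  else
    echo "BLOCKED: default-deny is being enforced"
  fi
}
```

Usage, with `my-test-pod` as a hypothetical pod name: `POD_IP=$(kubectl get pod my-test-pod -n test-ns-1 -o jsonpath='{.status.podIP}')` followed by `report_probe test-ns-1 "$POD_IP"`. On the upgraded cluster described here, the expected (buggy) result is `REACHABLE`.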

Context

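Here is the iptables chain protecting the pod on two clusters: one that hasn't been upgraded (a clean install of Calico 2.6.1), which works as I would expect, and one that has been upgraded to 2.6.3, which shows the behavior I think is a bug. The key difference is the extra jump to cali-pi-k8s-policy-no-match (and its RETURN) in the upgraded chain, which runs before the "Drop if no policies passed packet" rule: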

2.6.1 cluster (not upgraded; clean install; working properly)

Chain cali-tw-caliab932bb1a38 (1 references)
 pkts bytes target     prot opt in     out     source               destination
    0     0 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* cali:wr_7R_2Ll4gXcpXS */ ctstate RELATED,ESTABLISHED
    0     0 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* cali:L-sEfYWoGlVgh-Lz */ ctstate INVALID
    0     0 MARK       all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* cali:W-fLKYSnYMhZd5Ja */ MARK and 0xfeffffff
    0     0 MARK       all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* cali:Uu1VC7XNaL2g8jPG */ /* Start of policies */ MARK and 0xfdffffff
    0     0 cali-pi-_fZoFCYDhDNhKISEPthv  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* cali:RCEQSmEz2bmx2wqi */ mark match 0x0/0x2000000
    0     0 RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* cali:i5LjpeGXCWw1Xxn5 */ /* Return if policy accepted */ mark match 0x1000000/0x1000000
    0     0 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* cali:HXgOyssdpP6KztWv */ /* Drop if no policies passed packet */ mark match 0x0/0x2000000
    0     0 cali-pri-k8s_ns.brad  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* cali:cYtJveSK-8-vntTn */
    0     0 RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* cali:Nr0JYr6_cj4zPWJK */ /* Return if profile accepted */ mark match 0x1000000/0x1000000
    0     0 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* cali:JT9V5j5nPunUBdLW */ /* Drop if no profiles matched */

2.6.3 cluster: upgraded from kube 1.5.6 (Calico 2.1.5) to kube 1.7.4 (Calico 2.6.3)

Chain cali-tw-calib1b1eec50b8 (1 references)
 pkts bytes target     prot opt in     out     source               destination
    0     0 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* cali:A6Z6ayobcObMw_kv */ ctstate RELATED,ESTABLISHED
    0     0 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* cali:sIXICXfOSGJQxypA */ ctstate INVALID
    0     0 MARK       all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* cali:P0o_KjVvl5ruXji5 */ MARK and 0xfeffffff
    0     0 MARK       all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* cali:uH4oV6L5fNbFkqLs */ /* Start of policies */ MARK and 0xfdffffff
    0     0 cali-pi-_fZoFCYDhDNhKISEPthv  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* cali:5ijObVeZuiCoI3wu */ mark match 0x0/0x2000000
    0     0 RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* cali:uElfmU_jr-vQ04ec */ /* Return if policy accepted */ mark match 0x1000000/0x1000000
    0     0 cali-pi-k8s-policy-no-match  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* cali:vkExxziUNt87hCpZ */ mark match 0x0/0x2000000
    0     0 RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* cali:3HZWWz28QthvFdvJ */ /* Return if policy accepted */ mark match 0x1000000/0x1000000
    0     0 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* cali:_XJqN6yxOkc4q2r6 */ /* Drop if no policies passed packet */ mark match 0x0/0x2000000
    0     0 cali-pri-k8s_ns.brad  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* cali:2r9Dk9gbua7Y_lln */
    0     0 RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* cali:TFpA50GV-66vFEr4 */ /* Return if profile accepted */ mark match 0x1000000/0x1000000
    0     0 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* cali:lneUbySYs_UDD9ij */ /* Drop if no profiles matched */
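
The difference between the two chains can also be checked mechanically on each node. The hypothetical helper below is a sketch that assumes `iptables-save` output is piped into it; it succeeds, and prints the offending rules, if any chain still jumps to `cali-pi-k8s-policy-no-match`, as the upgraded node's chain does above.

```shell
# Hypothetical helper: read an iptables-save dump on stdin and succeed
# (printing the matching rules) if any rule still jumps to the chain that
# Calico renders for the legacy k8s-policy-no-match policy.
has_stale_no_match_chain() {
  # -F: fixed-string match; -e: allows a pattern that starts with "-".
  grep -F -e '-j cali-pi-k8s-policy-no-match'
}
```

Usage on a node: `sudo iptables-save -t filter | has_stale_no_match_chain && echo "stale policy chain present"`.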

Your Environment

Calico version: See above info. Non-working cluster was upgraded from 2.1.5 to 2.6.3
Orchestrator version (e.g. kubernetes, mesos, rkt): Non-working cluster upgraded from kube 1.5.6 to 1.7.4
Operating System and version: Ubuntu 16.04
Link to your project (optional):

Source

bradbehle

Most helpful comment

@caseydavenport I did some more digging on this issue. The code to remove this policy does exist in v1.0.0 of calico-kube-controllers (the container log shows the policy being deleted), but that code does not exist (or does not get run) in v1.0.2. We can work around this in our deployment by deleting the policy ourselves as soon as calico-kube-controllers is running, but I would like to know whether this was taken out intentionally, and if so, why (so that we don't delete it and then find out the hard way that there is a reason it should be there). This change in behavior could also be seen as a serious issue, since after an upgrade pods are no longer protected by deny policies as they should be.

bradbehle on 2 Jan 2018
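
The workaround described above (delete the policy yourself once calico-kube-controllers is running) could be scripted roughly as follows. This is a minimal sketch, assuming kubectl and a calicoctl v1.x binary (the Calico 2.6 generation) configured for the cluster's datastore, and that the controllers run as the Deployment `calico-kube-controllers` in `kube-system`. The function name is hypothetical, and treating a non-zero exit from `calicoctl get` as "not found" is an assumption worth checking on your version.

```shell
# Hypothetical post-upgrade cleanup for the stale k8s-policy-no-match policy.
# ASSUMPTIONS: kubectl and calicoctl v1.x are on PATH and configured; the
# controllers Deployment is kube-system/calico-kube-controllers.
remove_stale_no_match_policy() {
  # Wait until the new calico-kube-controllers Deployment has rolled out.
  kubectl -n kube-system rollout status deployment/calico-kube-controllers || return 1

  # Delete the legacy policy only if it is still present. A non-zero exit from
  # "calicoctl get" is treated as "not found" (this also hides connection errors).
  if calicoctl get policy k8s-policy-no-match >/dev/null 2>&1; then
    calicoctl delete policy k8s-policy-no-match || return 1
    echo "deleted k8s-policy-no-match"
  else
    echo "k8s-policy-no-match not present; nothing to do"
  fi
}
```

Running it once, after calico-kube-controllers has been upgraded, keeps the step idempotent: a second run just reports that there is nothing to do.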

All 8 comments

It does look like someone else hit this problem (see issue https://github.com/projectcalico/kube-controllers/issues/198): the k8s-policy-no-match policy is not removed when upgrading from Calico 2.1.5 to 2.6.3. Can someone let me know whether the k8s-policy-no-match policy really should be deleted when a cluster is upgraded?

In our case, when we upgrade from kube 1.5.6 (Calico 2.1.5) to kube 1.7.4 (Calico 2.6.3), we upgrade one node at a time. Existing nodes run calico-node 2.1.5 managed by systemd (NOT kube-hosted), so when we upgrade a node, we remove the systemd-managed calico-node 2.1.5 and then start the calico-node 2.6.3 DaemonSet pod on that node. As a result, there is a period of time when some nodes run 2.6.3 and others run 2.1.5. At what point should the k8s-policy-no-match policy be removed? Once all the nodes have been upgraded? Once kube-controllers is updated?

This is preventing us from moving to 2.6.3, so any help would be appreciated. Thanks.

bradbehle on 7 Dec 2017

@bradbehle yes, I think it's probably the k8s-policy-no-match chain that is preventing the default-deny from working correctly.

The relevant code seems to exist in v0.7.0 of the policy controller.

However, corresponding code doesn't seem to exist in kube-controllers v1.0.0. Given the large version skew, it might be safest to do an incremental upgrade to Calico v2.5 first (which contains the above code), and then to v2.6.

That said, I think it is safe to remove the k8s-policy-no-match policy as soon as the new kube-controllers pod has been upgraded to v1.0.

caseydavenport on 11 Dec 2017
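
Following that advice, a deployment script could gate the cleanup on the controllers image tag. The sketch below is hypothetical: it assumes the image reference ends in a `vMAJOR.MINOR.PATCH` tag (as in `quay.io/calico/kube-controllers:v1.0.2`), checks only the major version, and returns 2 for tags it cannot parse, such as `latest`; digest references are not handled.

```shell
# Hypothetical gate: succeed if the controllers image tag is v1.x or later,
# fail (1) if it is older (e.g. the v0.7.0 policy controller), and return 2
# if the tag is not a numeric version.
# ASSUMPTION: the image reference ends in ":vMAJOR.MINOR.PATCH".
controllers_at_least_v1() {
  local image="$1" tag major
  tag="${image##*:}"   # text after the last ':'  e.g. "v1.0.2"
  tag="${tag#v}"       # drop a leading "v"       e.g. "1.0.2"
  major="${tag%%.*}"   # keep the major version   e.g. "1"
  case "$major" in
    ''|*[!0-9]*) return 2 ;;   # not numeric, e.g. "latest"
  esac
  [ "$major" -ge 1 ]
}
```

For example, `image=$(kubectl -n kube-system get deployment calico-kube-controllers -o jsonpath='{.spec.template.spec.containers[0].image}')` followed by `controllers_at_least_v1 "$image" && echo "safe to remove k8s-policy-no-match"`.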

@bradbehle thanks for the investigation.

Yes, I think we should fix this by re-implementing the missing code. I believe it was removed intentionally, the thinking being that users would not perform direct upgrades from an earlier version of the controllers code to v1.0, instead going through v0.7.0 which has the relevant code.

While that's still the recommended way, it should be low-cost to add this piece of code back into v1.0.x, hopefully saving some pain.

caseydavenport on 8 Jan 2018

The same behaviour happened to us (granted, we were on node 0.22 and a very old policy controller, upgrading to 2.6.5), and @mrrandrade discovered that removing k8s-policy-no-match fixes it.

The thing is that it worked with policy-controller 1.0.0 (the version we used for our migration test) but not with 1.0.2 (released after our tests; since no breaking changes were documented, we just used it).

Which raises a question: since the upgrade docs are not meant to require this 'migration' step, wouldn't it be better for them to point out that this policy should be removed after the migration, so that DefaultDeny starts working again?

Thank you very much!!

rikatz on 12 Jan 2018

@rikatz I think it was a bug to remove this from v1.0.2, and I'd like to add it back into a v1.0.3 release and supersede v1.0.2. That way no manual step will be required.

We can clean up the code again once the code is sufficiently old.

caseydavenport on 12 Jan 2018

Ok, I think I found the root cause of this and I've got a fix here: https://github.com/projectcalico/kube-controllers/pull/208

caseydavenport on 16 Jan 2018

I've released https://github.com/projectcalico/kube-controllers/releases/tag/v1.0.3 with that fix in it.

@rikatz hopefully using v1.0.3 will remove the need for a migration step in your scripts. LMK!

caseydavenport on 17 Jan 2018
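
With the fixed release, the manual step should reduce to upgrading the controllers and checking that the policy disappears. A hedged sketch: it assumes the Deployment and its container are both named `calico-kube-controllers` in `kube-system` (as in the Calico v2.6 manifests, but check yours) and that the image comes from `quay.io/calico/kube-controllers`. It polls with calicoctl for about a minute because the thread does not say how soon after startup the controller removes the policy. The function name is hypothetical.

```shell
# Hypothetical: move the controllers to v1.0.3, wait for the rollout, then
# confirm the legacy policy has been removed.
# ASSUMPTIONS: Deployment and container are both named calico-kube-controllers
# in kube-system; calicoctl v1.x is configured for the cluster's datastore.
upgrade_controllers_to_v1_0_3() {
  kubectl -n kube-system set image deployment/calico-kube-controllers \
    calico-kube-controllers=quay.io/calico/kube-controllers:v1.0.3 || return 1
  kubectl -n kube-system rollout status deployment/calico-kube-controllers || return 1

  # Poll up to 6 times, 10 seconds apart. A non-zero exit from "calicoctl get"
  # is treated as "policy gone" (this also hides connection errors).
  local i
  for i in 1 2 3 4 5 6; do
    if ! calicoctl get policy k8s-policy-no-match >/dev/null 2>&1; then
      echo "k8s-policy-no-match removed"
      return 0
    fi
    sleep 10
  done
  echo "k8s-policy-no-match still present" >&2
  return 1
}
```

If the policy is still present after the rollout, the manual deletion discussed earlier in the thread remains a fallback.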
