Calico: Blocking traffic via kube-ipvs0 using GlobalNetworkPolicies

Created on 9 Nov 2018 · 11 Comments · Source: projectcalico/calico

I'm currently looking at switching out custom iptables rules we've applied to hard-block certain traffic from cali+ interfaces to sensitive host ports, moving to use the GlobalNetworkPolicies in Calico v3 (with Kubernetes backend). Ran into a few issues with this and was wondering if anyone is able to advise. Scenario below:

Setup:

  • Kubernetes v1.12 w/ Canal (Calico v3.3, Flannel v0.9.0)
  • Taking sshd as a (bad) example, with it listening on all nodes, all interfaces and port 22
  • kube-proxy in IPVS mode
  • Built with kops latest

Previous configuration for iptables (example blocking ssh to nodes in cluster):

*filter
:INPUT ACCEPT [0:0]
-I INPUT 1 -i cali+ -p tcp -m tcp --dport 22 -m state --state NEW -j REJECT -m comment --comment "SSH"
:FORWARD ACCEPT [0:0]
-I FORWARD 1 -i cali+ -p tcp -m tcp -d {{ .SERVICE_CIDR }} --dport 22 -m state --state NEW -j REJECT -m comment --comment "SSH"
COMMIT

Previous custom configuration for Calico:

  chainInsertMode: append
  defaultEndpointToHostAction: RETURN

Previous behaviour (inside a pod running on node 10.200.0.10):

telnet 10.200.0.10 22 # blocked
telnet 10.200.0.20 22 # different node, blocked
telnet 172.17.0.1 22 # docker0 interface, blocked
telnet 10.10.0.10 22 # one of the kube-ipvs0 listening addresses on 10.200.0.10 node, blocked
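
The telnet checks above could be scripted for repeat runs against a rebuilt cluster. This is only a minimal sketch, not part of the original report: the helper names `probe` and `check_all` are made up for illustration, and a refused connection, a timeout and an unreachable host are all reported as "blocked".

```python
import socket

def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Return 'open' if a TCP connection succeeds, otherwise 'blocked'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except OSError:
        # Covers refused connections (REJECT), timeouts (DROP)
        # and unreachable hosts; socket.timeout is an OSError subclass.
        return "blocked"

def check_all(hosts, port: int = 22, timeout: float = 3.0) -> dict:
    """Probe every host on the given port and collect the results."""
    return {host: probe(host, port, timeout) for host in hosts}
```

Run from inside a pod, `check_all(["10.200.0.10", "10.200.0.20", "172.17.0.1", "10.10.0.10"])` would cover the same four addresses as the telnet commands.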

New configuration is a Calico GlobalNetworkPolicy being deployed (no custom iptables rules or calico config):

  apiVersion: crd.projectcalico.org/v1
  kind: GlobalNetworkPolicy
  metadata:
    name: deny-rules
  spec:
    order: 1001
    applyOnForward: true
    egress:
    - action: Deny
      destination:
        nets:
        - "10.200.0.0/21" # Cluster CIDR
        - "10.10.0.0/16" # Flannel Network
        - "172.17.0.0/16" # docker0
        ports:
        - 22
      protocol: TCP

New behaviour (inside a pod running on node 10.200.0.10):

telnet 10.200.0.10 22 # blocked
telnet 10.200.0.20 22 # different node, blocked
telnet 172.17.0.1 22 # docker0 interface, blocked
telnet 10.10.0.10 22 # one of the kube-ipvs0 listening addresses on 10.200.0.10 node, this request isn't blocked and I can successfully ssh via this address to the 10.200.0.10 node
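
As a sanity check on the policy's `nets` list, the sketch below (an illustration, not something from the original report) uses Python's standard `ipaddress` module to find which CIDR covers each tested destination. The helper name `matching_net` is hypothetical. It suggests that 10.10.0.10 is covered by 10.10.0.0/16, so a gap in the listed ranges is unlikely to explain the unblocked request.

```python
import ipaddress

# The `nets` entries from the GlobalNetworkPolicy above.
POLICY_NETS = ["10.200.0.0/21", "10.10.0.0/16", "172.17.0.0/16"]

def matching_net(dest: str, nets=POLICY_NETS):
    """Return the first policy CIDR that contains dest, or None."""
    addr = ipaddress.ip_address(dest)
    for cidr in nets:
        if addr in ipaddress.ip_network(cidr):
            return cidr
    return None

# The four destinations tested with telnet above.
for dest in ["10.200.0.10", "10.200.0.20", "172.17.0.1", "10.10.0.10"]:
    print(dest, "->", matching_net(dest))
```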

Are there any clear mistakes in my GlobalNetworkPolicy definition, or will I just have to keep the INPUT chain rules in place for now? It seems the GNP rules are being applied too late in the chains. Any help is appreciated, thanks!

All 11 comments

@KashifSaadat Thanks for your detailed information. I think I know what is going on here.

Basically, Calico applies policy after the service IP has been DNATed to a pod IP. (The exception is preDNAT policy, which is applied before any kube-proxy DNAT rules. However, preDNAT policy only applies to host endpoints (ethX interfaces), not workload endpoints (cali+ interfaces).)

Another fact is that kube-proxy in IPVS mode routes traffic for a service IP (10.10.x.x) to the INPUT chain, and DNAT happens after the INPUT chain. With your earlier setup, since you had a custom deny rule in the INPUT chain that rejects all traffic to port 22, ssh was blocked:
cali+ --> input ---> deny to port 22 --> DNAT to pod ip --> calico policy

With GlobalNetworkPolicy and no custom deny rules, Calico policy sees the pod IP rather than the service IP, so you probably have some allow rule in place that allows ssh to the pod IP.

AFAIU, you should avoid putting service IPs into Calico policy unless you are dealing with preDNAT policy. In your case, if you swap 10.10.0.0/16 for the pod CIDR, it should work for you.

Hey @song-jiang, thanks for the response!

I'm a bit confused by your suggestion, as in my example above I am already blocking the pod CIDR, which is 10.10.0.0/16. There are no specific allow rules allowing ssh into pods, and I'm also trying to prevent a pod from ssh'ing into the node itself (rather than into another pod).

@KashifSaadat Is 10.10.0.0/16 your pod CIDR? kube-ipvs0 only listens on addresses in the service IP CIDR. Can you let me know your pod CIDR and your service IP CIDR? The output of ip a on your node would also be helpful.

Hey @song-jiang, apologies for the confusion, details below:

With the configuration in kops you provide a nonMasqueradeCIDR which is the IP range for the cluster (services and pods), and then the VPC CIDR for the actual nodes (in AWS). The nonMasqueradeCIDR is then split up via kops with the result as follows:

  • Pod CIDR: 10.10.128.0/17
  • Service CIDR: 10.10.0.0/19

The rule I inserted covered both ranges so should have been fine anyways.
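
To make the split concrete, here is a small sketch using the standard `ipaddress` module (an illustration added for clarity, with variable names chosen arbitrarily). Under the CIDRs quoted above, it checks that both ranges sit inside the 10.10.0.0/16 entry in the policy and that 10.10.0.10 belongs to the service range rather than the pod range. `subnet_of` needs Python 3.7 or later.

```python
import ipaddress

pod_cidr = ipaddress.ip_network("10.10.128.0/17")
service_cidr = ipaddress.ip_network("10.10.0.0/19")
policy_net = ipaddress.ip_network("10.10.0.0/16")   # entry in the policy's nets
ipvs_addr = ipaddress.ip_address("10.10.0.10")      # address tested with telnet

# Both kops-derived ranges should fall inside the single /16 policy entry.
both_covered = pod_cidr.subnet_of(policy_net) and service_cidr.subnet_of(policy_net)

# The pod and service ranges should not overlap each other.
overlap = pod_cidr.overlaps(service_cidr)

print("both covered by policy net:", both_covered)
print("pod/service ranges overlap:", overlap)
print("10.10.0.10 is a service IP:", ipvs_addr in service_cidr)
print("10.10.0.10 is a pod IP:", ipvs_addr in pod_cidr)
```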

ip-10-200-0-10 ~ # ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
    link/ether 06:ea:4d:0b:32:ea brd ff:ff:ff:ff:ff:ff
    inet 10.200.0.10/25 brd 10.200.0.127 scope global dynamic eth0
       valid_lft 2554sec preferred_lft 2554sec
    inet6 fe80::4ea:4dff:fe0b:32ea/64 scope link
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:e3:4b:a4:d8 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
4: kube-ipvs0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
    link/ether 3a:63:48:45:5f:c9 brd ff:ff:ff:ff:ff:ff
    inet 10.10.0.1/32 brd 10.10.0.1 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.10.0.10/32 brd 10.10.0.10 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet6 fe80::3863:48ff:fe45:5fc9/64 scope link
       valid_lft forever preferred_lft forever
5: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8951 qdisc noqueue state UNKNOWN group default
    link/ether 7a:68:36:f9:5c:d8 brd ff:ff:ff:ff:ff:ff
    inet 10.10.130.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::7868:36ff:fef9:5cd8/64 scope link
       valid_lft forever preferred_lft forever
6: cali48750650425@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever

@KashifSaadat I see. Your pod CIDR and service CIDR are fine then. Could you please upload the output of sudo iptables-save -c and sudo ipset -L from your node?
Also, can you confirm that cali48750650425@if3 is the pod interface initiating the ssh packets?

Hey @song-jiang, apologies for the delayed response! I've attached files containing the output of those commands, below:

Some notes:

  • This is an ephemeral cluster periodically destroyed / recreated for testing purposes, so the IPs / interface names are a little different to the previous logs. In these logs the Cluster (VPC) CIDR is 10.250.0.0/21 (in my previous responses it was 10.200.0.0/21).
  • The above outputs are from a compute node running a test workload to validate the global network policy behaviour.
  • The pod interface initiating the ssh packet is calia32198b6323@if3 in the outputs attached.
  • I can't run ipset list on CoreOS due to a Kernel and userspace incompatible error so it was instead run inside a toolbox (Fedora) container on the OS, output should still be valid.

@KashifSaadat Thanks for your update. I believe we have got enough information. Will let you know once we have a solution for it.

@KashifSaadat Thanks for your patience. We have identified that this is a kube-proxy issue in IPVS mode. In your case, traffic to port 22 did not match a valid service port, so it was delivered to the sshd process on the local host. We will work with sig-network to get it resolved.
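
The diagnosis above can be pictured with a toy model. This is a rough sketch only, not kube-proxy or kernel code: the `deliver` function is hypothetical, and the virtual service entries are assumed for illustration because the thread does not list them. The kube-ipvs0 addresses are the ones shown in the `ip a` output earlier.

```python
# Addresses bound to kube-ipvs0 in the `ip a` output earlier in the thread.
KUBE_IPVS0_ADDRS = {"10.10.0.1", "10.10.0.10"}

# ASSUMPTION: example (address, port) virtual services; not taken from the thread.
VIRTUAL_SERVICES = {("10.10.0.1", 443), ("10.10.0.10", 53)}

def deliver(dst_ip: str, dst_port: int) -> str:
    """Toy model of where a packet to dst_ip:dst_port ends up on the node."""
    if (dst_ip, dst_port) in VIRTUAL_SERVICES:
        # Matches a service port: IPVS picks a backend pod.
        return "ipvs: load-balanced to a backend pod"
    if dst_ip in KUBE_IPVS0_ADDRS:
        # The address is local to the host but no service owns this port,
        # so any host process listening on all interfaces (sshd) receives it.
        return "local: delivered to a host process"
    return "not local: routed onward"

print(deliver("10.10.0.10", 53))
print(deliver("10.10.0.10", 22))
```

In this model the second call takes the "local" branch, which corresponds to the behaviour reported in the issue: the service IP is a local address on the node, so a port with no matching service falls through to the host's own listener.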

@song-jiang thanks very much for investigating, really appreciate it!

After discussing with the Kubernetes team, we think this needs to be fixed in kube-proxy: https://github.com/kubernetes/kubernetes/issues/72236

Looks like the outcome of this is that this was an upstream Kubernetes issue, so I'm going to close out this issue for now.
