I'm currently looking at switching out custom iptables rules we've applied to hard-block certain traffic from cali+ interfaces to sensitive host ports, moving to use the GlobalNetworkPolicies in Calico v3 (with Kubernetes backend). Ran into a few issues with this and was wondering if anyone is able to advise. Scenario below:
Setup:
Previous configuration for iptables (example blocking ssh to nodes in cluster):
*filter
:INPUT ACCEPT [0:0]
-I INPUT 1 -i cali+ -p tcp -m tcp --dport 22 -m state --state NEW -j REJECT -m comment --comment "SSH"
:FORWARD ACCEPT [0:0]
-I FORWARD 1 -i cali+ -p tcp -m tcp -d {{ .SERVICE_CIDR }} --dport 22 -m state --state NEW -j REJECT -m comment --comment "SSH"
COMMIT
Previous custom configuration for Calico:
chainInsertMode: append
defaultEndpointToHostAction: RETURN
Previous behaviour (inside a pod running on node 10.200.0.10):
telnet 10.200.0.10 22 # blocked
telnet 10.200.0.20 22 # different node, blocked
telnet 172.17.0.1 22 # docker0 interface, blocked
telnet 10.10.0.10 22 # one of the kube-ipvs0 listening addresses on 10.200.0.10 node, blocked
New configuration is a Calico GlobalNetworkPolicy being deployed (no custom iptables rules or calico config):
apiVersion: crd.projectcalico.org/v1
kind: GlobalNetworkPolicy
metadata:
  name: deny-rules
spec:
  order: 1001
  applyOnForward: true
  egress:
  - action: Deny
    destination:
      nets:
      - "10.200.0.0/21" # Cluster CIDR
      - "10.10.0.0/16" # Flannel Network
      - "172.17.0.0/16" # docker0
      ports:
      - 22
    protocol: TCP
New behaviour (inside a pod running on node 10.200.0.10):
telnet 10.200.0.10 22 # blocked
telnet 10.200.0.20 22 # different node, blocked
telnet 172.17.0.1 22 # docker0 interface, blocked
telnet 10.10.0.10 22 # one of the kube-ipvs0 listening addresses on 10.200.0.10 node, this request isn't blocked and I can successfully ssh via this address to the 10.200.0.10 node
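For anyone wanting to repeat the four checks above without telnet, here is a minimal sketch in Python. The helper names `port_reachable` and `probe_all` are made up for illustration, and the addresses are the ones from this scenario, so adjust them for your own cluster. It only reports whether a TCP connection completes; it does not distinguish a REJECT from a DROP.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts alike.
        return False


# Addresses taken from the scenario above (assumed values, edit as needed).
TARGETS = {
    "10.200.0.10": "local node",
    "10.200.0.20": "different node",
    "172.17.0.1": "docker0 interface",
    "10.10.0.10": "kube-ipvs0 address on the local node",
}


def probe_all(targets=TARGETS, port: int = 22, timeout: float = 3.0) -> dict:
    """Map each target address to True (connection succeeded) or False (blocked)."""
    return {host: port_reachable(host, port, timeout) for host in targets}
```

Calling `probe_all()` from inside the pod should return `False` for every address once the policy works as intended.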
Are there any clear mistakes in my global network policy definition, or will I just have to keep the INPUT chain rules in place for now? It seems the GNP rules are being evaluated too late in the chains. Any help is appreciated, thanks!
@KashifSaadat Thanks for your detailed information. I think I know what is going on here.
Basically, Calico applies policy after the service IP has been DNATed to a pod IP (unless you use a preDNAT policy, in which case the policy is applied before any kube-proxy DNAT rules; however, preDNAT policy only applies to host endpoints (ethX interfaces), not workload endpoints (cali+ interfaces)).
Another fact is that kube-proxy in ipvs mode routes traffic for a service IP (10.10.x.x) to the INPUT chain, and DNAT happens after the INPUT chain. With your earlier setup, since you had a custom deny rule in the INPUT chain rejecting all traffic to port 22, ssh was blocked.
cali+ --> input ---> deny to port 22 --> DNAT to pod ip --> calico policy
With the GlobalNetworkPolicy and no custom deny rules, Calico policy sees the pod IP rather than the service IP; you probably have some allow rule in place which allows ssh to the pod IP.
AFAIU, you should avoid putting service IPs into Calico policy unless you are dealing with preDNAT policy. In your case, if you swap 10.10.0.0/16 for the pod CIDR, it should work for you.
Hey @song-jiang, thanks for the response!
I'm a bit confused by your suggestion, as in my example above I am already blocking the pod CIDR, which is 10.10.0.0/16. There are no specific allow rules permitting ssh into pods, and I'm also trying to prevent a pod from ssh'ing into the node itself (rather than into another pod).
@KashifSaadat Is 10.10.0.0/16 your pod CIDR? kube-ipvs0 will only listen on addresses that are in the service IP CIDR. Can you let me know what your pod CIDR and your service IP CIDR are? Also, the output of ip a on your node would be helpful.
Hey @song-jiang, apologies for the confusion, details below:
With the configuration in kops you provide a nonMasqueradeCIDR which is the IP range for the cluster (services and pods), and then the VPC CIDR for the actual nodes (in AWS). The nonMasqueradeCIDR is then split up via kops with the result as follows:
Pod CIDR: 10.10.128.0/17
Service CIDR: 10.10.0.0/19
The rule I inserted covered both ranges, so it should have been fine anyway.
ip-10-200-0-10 ~ # ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
link/ether 06:ea:4d:0b:32:ea brd ff:ff:ff:ff:ff:ff
inet 10.200.0.10/25 brd 10.200.0.127 scope global dynamic eth0
valid_lft 2554sec preferred_lft 2554sec
inet6 fe80::4ea:4dff:fe0b:32ea/64 scope link
valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:e3:4b:a4:d8 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
4: kube-ipvs0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
link/ether 3a:63:48:45:5f:c9 brd ff:ff:ff:ff:ff:ff
inet 10.10.0.1/32 brd 10.10.0.1 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.10.0.10/32 brd 10.10.0.10 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet6 fe80::3863:48ff:fe45:5fc9/64 scope link
valid_lft forever preferred_lft forever
5: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8951 qdisc noqueue state UNKNOWN group default
link/ether 7a:68:36:f9:5c:d8 brd ff:ff:ff:ff:ff:ff
inet 10.10.130.0/32 scope global flannel.1
valid_lft forever preferred_lft forever
inet6 fe80::7868:36ff:fef9:5cd8/64 scope link
valid_lft forever preferred_lft forever
6: cali48750650425@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet6 fe80::ecee:eeff:feee:eeee/64 scope link
valid_lft forever preferred_lft forever
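As a cross-check of the output above, the following sketch uses Python's standard `ipaddress` module to show which of the two kops-derived ranges each 10.10.x.x address falls into, and that the 10.10.0.0/16 net in the policy covers both. The function name `classify` and the dictionary names are illustrative only; the addresses are copied from the `ip a` output.

```python
import ipaddress

# The two ranges kops carved out of the nonMasqueradeCIDR.
RANGES = [
    ipaddress.ip_network("10.10.128.0/17"),
    ipaddress.ip_network("10.10.0.0/19"),
]

# The net listed in the GlobalNetworkPolicy.
POLICY_NET = ipaddress.ip_network("10.10.0.0/16")

# Addresses copied from the `ip a` output above.
ADDRESSES = {
    "kube-ipvs0 #1": "10.10.0.1",
    "kube-ipvs0 #2": "10.10.0.10",
    "flannel.1": "10.10.130.0",
}


def classify(addr: str):
    """Return the range (as a string) that contains addr, or None."""
    ip = ipaddress.ip_address(addr)
    for net in RANGES:
        if ip in net:
            return str(net)
    return None


for name, addr in ADDRESSES.items():
    print(f"{name:14} {addr:12} -> {classify(addr)}")

# Both ranges sit inside the /16 used in the policy.
print(all(net.subnet_of(POLICY_NET) for net in RANGES))
```

The kube-ipvs0 addresses land in 10.10.0.0/19 and the flannel.1 address lands in 10.10.128.0/17, which is consistent with the first being the service range and the second the pod range.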
@KashifSaadat I see. Your pod CIDR and service CIDR are fine then. Could you please upload the output of sudo iptables-save -c and sudo ipset -L from your node?
Also, can you confirm that cali48750650425@if3 is the pod interface initiating the ssh packets?
Hey @song-jiang, apologies for the delayed response! I've attached files containing the output of those commands, below:
Some notes:
The cluster CIDR is 10.250.0.0/21 (in my previous responses it was 10.200.0.0/21).
The pod interface is calia32198b6323@if3 in the outputs attached.
I couldn't run ipset list on CoreOS due to a "Kernel and userspace incompatible" error, so it was instead run inside a toolbox (Fedora) container on the OS; the output should still be valid.
@KashifSaadat Thanks for your update. I believe we have got enough information. Will let you know once we have a solution for it.
@KashifSaadat Thanks for your patience. We have identified that this is a kube-proxy issue in ipvs mode. In your case, traffic going to port 22 on a service IP did not match a valid service port and was therefore delivered to the local host's sshd process. We will work with sig-network to get it resolved.
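To make the behaviour described above easier to follow, here is a toy model in Python. It is only a sketch of the reasoning, not how the kernel or kube-proxy is implemented. It assumes that sshd listens on all local addresses, and the 10.10.0.10:53 service entry is a hypothetical example; neither detail is confirmed in this thread. The function name `classify_delivery` is made up for illustration.

```python
def classify_delivery(dst_ip, dst_port, local_addrs, ipvs_services, wildcard_listeners):
    """Toy model of where a packet ends up on a node running kube-proxy in ipvs mode.

    local_addrs:        addresses bound to the node, including those on kube-ipvs0
    ipvs_services:      set of (virtual IP, port) pairs programmed into IPVS
    wildcard_listeners: ports with a host process listening on all addresses
    """
    if dst_ip not in local_addrs:
        return "forwarded"      # not a local address, so it is routed onwards
    if (dst_ip, dst_port) in ipvs_services:
        return "ipvs"           # valid service port, load-balanced to a backend pod
    if dst_port in wildcard_listeners:
        return "local-process"  # falls through to a host process such as sshd
    return "refused"            # local address, but nothing is listening


# Local addresses taken from the `ip a` output earlier in the thread.
LOCAL_ADDRS = {"10.200.0.10", "172.17.0.1", "10.10.0.1", "10.10.0.10", "10.10.130.0"}
IPVS_SERVICES = {("10.10.0.10", 53)}  # hypothetical service entry (assumption)
WILDCARD_LISTENERS = {22}             # assumes sshd listens on all addresses

print(classify_delivery("10.10.0.10", 53, LOCAL_ADDRS, IPVS_SERVICES, WILDCARD_LISTENERS))
print(classify_delivery("10.10.0.10", 22, LOCAL_ADDRS, IPVS_SERVICES, WILDCARD_LISTENERS))
```

In this model the second call returns "local-process": because the service IP is bound to kube-ipvs0 it counts as a local address, so a port with no matching service entry reaches whatever host process listens on it.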
@song-jiang thanks very much for investigating, really appreciate it!
After discussing with the Kubernetes team, we think this needs to be fixed in kube-proxy: https://github.com/kubernetes/kubernetes/issues/72236
Looks like the outcome of this is that this was an upstream Kubernetes issue, so I'm going to close out this issue for now.