In a K8s dual-stack cluster, traffic from an external machine to an IPv6 loadBalancerIP does not work. IPv6 traffic to the very same loadBalancerIP works from a K8s cluster node, and IPv4 traffic to a loadBalancerIP in the same system works from an external machine.
K8s >= 1.16.0 must be set up with dual-stack enabled, see https://kubernetes.io/docs/concepts/services-networking/dual-stack/
Cilium is installed from quick-install.yaml with enable-ipv6: "true" as the only change.
The mconnect program is used for testing and is installed with mconnect-dual.yaml.txt.
Service setup:
vm-003 ~ # kubectl get svc
NAME            TYPE           CLUSTER-IP        EXTERNAL-IP   PORT(S)          AGE
coredns         ClusterIP      12.0.0.2          <none>        53/UDP,53/TCP    7m7s
kubernetes      ClusterIP      12.0.0.1          <none>        443/TCP          7m9s
mconnect        LoadBalancer   12.0.136.235      10.0.0.0      5001:31344/TCP   5m58s
mconnect-ipv6   LoadBalancer   fd00:4000::33e0   1000::        5001:30317/TCP   5m58s
vm-003 ~ # kubectl get svc mconnect-ipv6 -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"mconnect-ipv6","namespace":"default"},"spec":{"ipFamily":"IPv6","loadBalancerIP":"1000::","ports":[{"port":5001}],"selector":{"app":"mconnect"},"type":"LoadBalancer"}}
  creationTimestamp: "2019-09-19T10:07:21Z"
  name: mconnect-ipv6
  namespace: default
  resourceVersion: "399"
  selfLink: /api/v1/namespaces/default/services/mconnect-ipv6
  uid: 0f539225-6b1b-414f-aade-e76bb6c6fb25
spec:
  clusterIP: fd00:4000::33e0
  externalTrafficPolicy: Cluster
  ipFamily: IPv6
  loadBalancerIP: '1000::'
  ports:
  - nodePort: 30317
    port: 5001
    protocol: TCP
    targetPort: 5001
  selector:
    app: mconnect
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
    - ip: '1000::'
# From a node within the cluster:
vm-003 ~ # mconnect -address mconnect.default.svc.xcluster:5001 -nconn 100
Failed connects; 0
Failed reads; 0
mconnect-deployment-54f999b8c9-rtxkt 25
mconnect-deployment-54f999b8c9-7lrf7 25
mconnect-deployment-54f999b8c9-ggj8w 25
mconnect-deployment-54f999b8c9-njrjg 25
vm-003 ~ # mconnect -address mconnect-ipv6.default.svc.xcluster:5001 -nconn 100
Failed connects; 0
Failed reads; 0
mconnect-deployment-54f999b8c9-njrjg 25
mconnect-deployment-54f999b8c9-7lrf7 25
mconnect-deployment-54f999b8c9-rtxkt 25
mconnect-deployment-54f999b8c9-ggj8w 25
vm-003 ~ # mconnect -address [1000::]:5001 -nconn 100
Failed connects; 0
Failed reads; 0
mconnect-deployment-54f999b8c9-ggj8w 25
mconnect-deployment-54f999b8c9-7lrf7 25
mconnect-deployment-54f999b8c9-njrjg 25
mconnect-deployment-54f999b8c9-rtxkt 25
# From an external machine:
vm-201 ~ # mconnect -address 10.0.0.0:5001 -nconn 100
Failed connects; 0
Failed reads; 0
mconnect-deployment-54f999b8c9-ggj8w 25
mconnect-deployment-54f999b8c9-njrjg 25
mconnect-deployment-54f999b8c9-7lrf7 25
mconnect-deployment-54f999b8c9-rtxkt 25
vm-201 ~ # mconnect -address [1000::]:5001 -nconn 100
Failed connects; 100
Failed reads; 0
Trace on the incoming node:
vm-002 ~ # tcpdump -eni any port 5001
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
12:17:59.663346 In 00:00:00:01:01:c9 ethertype IPv6 (0x86dd), length 96: 1000::1:c0a8:1c9.52826 > 1000::.5001: Flags [S], seq 201229737, win 64800, options [mss 1440,sackOK,TS val 1421917500 ecr 0,nop,wscale 6], length 0
12:17:59.663554 Out e2:bb:b9:a8:26:ba ethertype IPv6 (0x86dd), length 96: 1000::1:c0a8:102.43925 > f00d::b00:200:0:dd5b.5001: Flags [S], seq 201229737, win 64800, options [mss 1440,sackOK,TS val 1421917500 ecr 0,nop,wscale 6], length 0
12:17:59.663624 In 72:da:5c:a1:b8:51 ethertype IPv6 (0x86dd), length 96: f00d::b00:200:0:dd5b.5001 > 1000::1:c0a8:102.43925: Flags [S.], seq 3934384089, ack 201229738, win 64766, options [mss 1390,sackOK,TS val 1553754036 ecr 1421917500,nop,wscale 7], length 0
12:17:59.663638 Out e2:bb:b9:a8:26:ba ethertype IPv6 (0x86dd), length 96: f00d::b00:200:0:dd5b.5001 > 1000::1:c0a8:102.43925: Flags [S.], seq 3934384089, ack 201229738, win 64766, options [mss 1390,sackOK,TS val 1553754036 ecr 1421917500,nop,wscale 7], length 0
12:17:59.663641 In e2:bb:b9:a8:26:ba ethertype IPv6 (0x86dd), length 96: f00d::b00:200:0:dd5b.5001 > 1000::1:c0a8:102.43925: Flags [S.], seq 3934384089, ack 201229738, win 64766, options [mss 1390,sackOK,TS val 1553754036 ecr 1421917500,nop,wscale 7], length 0
12:18:00.674117 In 00:00:00:01:01:c9 ethertype IPv6 (0x86dd), length 96: 1000::1:c0a8:1c9.52826 > 1000::.5001: Flags [S], seq 201229737, win 64800, options [mss 1440,sackOK,TS val 1421918511 ecr 0,nop,wscale 6], length 0
12:18:00.674219 Out e2:bb:b9:a8:26:ba ethertype IPv6 (0x86dd), length 96: 1000::1:c0a8:102.43925 > f00d::b00:200:0:dd5b.5001: Flags [S], seq 201229737, win 64800, options [mss 1440,sackOK,TS val 1421918511 ecr 0,nop,wscale 6], length 0
12:18:00.674501 In 72:da:5c:a1:b8:51 ethertype IPv6 (0x86dd), length 96: f00d::b00:200:0:dd5b.5001 > 1000::1:c0a8:102.43925: Flags [S.], seq 3934384089, ack 201229738, win 64766, options [mss 1390,sackOK,TS val 1553755047 ecr 1421917500,nop,wscale 7], length 0
12:18:00.674523 Out e2:bb:b9:a8:26:ba ethertype IPv6 (0x86dd), length 96: f00d::b00:200:0:dd5b.5001 > 1000::1:c0a8:102.43925: Flags [S.], seq 3934384089, ack 201229738, win 64766, options [mss 1390,sackOK,TS val 1553755047 ecr 1421917500,nop,wscale 7], length 0
12:18:00.674526 In e2:bb:b9:a8:26:ba ethertype IPv6 (0x86dd), length 96: f00d::b00:200:0:dd5b.5001 > 1000::1:c0a8:102.43925: Flags [S.], seq 3934384089, ack 201229738, win 64766, options [mss 1390,sackOK,TS val 1553755047 ecr 1421917500,nop,wscale 7], length 0
12:18:01.678552 In 72:da:5c:a1:b8:51 ethertype IPv6 (0x86dd), length 96: f00d::b00:200:0:dd5b.5001 > 1000::1:c0a8:102.43925: Flags [S.], seq 3934384089, ack 201229738, win 64766, options [mss 1390,sackOK,TS val 1553756051 ecr 1421917500,nop,wscale 7], length 0
12:18:01.678577 Out e2:bb:b9:a8:26:ba ethertype IPv6 (0x86dd), length 96: f00d::b00:200:0:dd5b.5001 > 1000::1:c0a8:102.43925: Flags [S.], seq 3934384089, ack 201229738, win 64766, options [mss 1390,sackOK,TS val 1553756051 ecr 1421917500,nop,wscale 7], length 0
12:18:01.678581 In e2:bb:b9:a8:26:ba ethertype IPv6 (0x86dd), length 96: f00d::b00:200:0:dd5b.5001 > 1000::1:c0a8:102.43925: Flags [S.], seq 3934384089, ack 201229738, win 64766, options [mss 1390,sackOK,TS val 1553756051 ecr 1421917500,nop,wscale 7], length 0
^C
13 packets captured
13 packets received by filter
0 packets dropped by kernel
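To make the capture easier to follow, here is a small Python sketch (not part of the original report) that parses the first handshake packets above, trimmed after the TCP flags field, and checks where the SYN-ACK is addressed. The regular expression assumes exactly the "tcpdump -eni any" line layout shown in the capture; CLIENT and NODE are labels chosen for this illustration.

```python
import re

# First packets of the vm-002 capture above, trimmed after the TCP flags field.
TRACE = """\
12:17:59.663346 In 00:00:00:01:01:c9 ethertype IPv6 (0x86dd), length 96: 1000::1:c0a8:1c9.52826 > 1000::.5001: Flags [S]
12:17:59.663554 Out e2:bb:b9:a8:26:ba ethertype IPv6 (0x86dd), length 96: 1000::1:c0a8:102.43925 > f00d::b00:200:0:dd5b.5001: Flags [S]
12:17:59.663624 In 72:da:5c:a1:b8:51 ethertype IPv6 (0x86dd), length 96: f00d::b00:200:0:dd5b.5001 > 1000::1:c0a8:102.43925: Flags [S.]
12:17:59.663638 Out e2:bb:b9:a8:26:ba ethertype IPv6 (0x86dd), length 96: f00d::b00:200:0:dd5b.5001 > 1000::1:c0a8:102.43925: Flags [S.]
12:17:59.663641 In e2:bb:b9:a8:26:ba ethertype IPv6 (0x86dd), length 96: f00d::b00:200:0:dd5b.5001 > 1000::1:c0a8:102.43925: Flags [S.]
"""

# Assumes the "tcpdump -eni any" layout above; the port is whatever follows the last dot.
LINE = re.compile(
    r"^(?P<ts>\S+) (?P<dir>In|Out) (?P<mac>\S+) ethertype IPv6 \(0x86dd\), "
    r"length \d+: (?P<src>\S+)\.(?P<sport>\d+) > (?P<dst>\S+)\.(?P<dport>\d+): "
    r"Flags \[(?P<flags>[^\]]+)\]"
)

CLIENT = "1000::1:c0a8:1c9"  # source of the incoming SYN (external client)
NODE = "1000::1:c0a8:102"    # vm-002 eth1 address, seen as the forwarded source


def parse(trace):
    """Return one dict of named fields per non-empty tcpdump line."""
    packets = []
    for line in trace.splitlines():
        if not line:
            continue
        m = LINE.match(line)
        if m is None:
            raise ValueError(f"unexpected tcpdump line: {line}")
        packets.append(m.groupdict())
    return packets


packets = parse(TRACE)
for p in packets:
    print(f"{p['dir']:<3} {p['src']}.{p['sport']} > {p['dst']}.{p['dport']} [{p['flags']}]")

# Collect the destinations of all SYN-ACKs and check whether any packet goes back to the client.
syn_ack_dsts = {p["dst"] for p in packets if p["flags"] == "S."}
print("SYN-ACK destinations:", sorted(syn_ack_dsts))
print("SYN-ACKs all go to the node address:", syn_ack_dsts == {NODE})
print("any packet addressed to the client:", any(p["dst"] == CLIENT for p in packets))
```

In this capture the SYN is forwarded to the pod with the node address as source, and the pod answers that address. The pod retransmits its SYN-ACK, but no packet towards the client appears in the capture.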
Interfaces:
vm-002 ~ # ip -d link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 promiscuity 0 minmtu 0 maxmtu 0 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 00:00:00:01:00:02 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 00:00:00:01:01:02 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
4: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0 promiscuity 0 minmtu 0 maxmtu 0
    ipip any remote any local any ttl inherit nopmtudisc addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
5: gre0@NONE: <NOARP> mtu 1476 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/gre 0.0.0.0 brd 0.0.0.0 promiscuity 0 minmtu 0 maxmtu 0
    gre remote any local any ttl inherit nopmtudisc addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
6: gretap0@NONE: <BROADCAST,MULTICAST> mtu 1462 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 0
    gretap remote any local any ttl inherit nopmtudisc addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
7: erspan0@NONE: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 1500
    erspan remote any local any ttl inherit nopmtudisc okey 0.0.0.0 erspan_index 0 erspan_ver 1 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
8: sit0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/sit 0.0.0.0 brd 0.0.0.0 promiscuity 0 minmtu 1280 maxmtu 65555
    sit ip6ip remote any local any ttl 64 nopmtudisc addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
9: ip6tnl0@NONE: <NOARP> mtu 1452 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/tunnel6 :: brd :: promiscuity 0 minmtu 68 maxmtu 65503
    ip6tnl ip6ip6 remote any local any hoplimit inherit encaplimit 0 tclass 0x00 flowlabel 0x00000 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
10: ip6gre0@NONE: <NOARP> mtu 1448 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/gre6 :: brd :: promiscuity 0 minmtu 0 maxmtu 0
    ip6gre remote any local any hoplimit inherit encaplimit 0 tclass 0x00 flowlabel 0x00000 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
11: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 1e:24:fc:c7:28:85 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 0 maxmtu 0
    dummy addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
12: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default
    link/ether 6e:39:73:e5:13:84 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 0 maxmtu 0
    dummy addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
13: cilium_net@cilium_host: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 5e:77:df:5e:70:bb brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
    veth addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
14: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 3a:3e:58:da:d9:2e brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
    veth addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
15: cilium_vxlan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 56:12:af:1c:dd:64 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
    vxlan external addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
17: lxc_health@if16: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether fe:e5:e0:b1:6c:07 brd ff:ff:ff:ff:ff:ff link-netnsid 0 promiscuity 0 minmtu 68 maxmtu 65535
    veth addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
19: lxcda1bbba57e7f@if18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether e2:bb:b9:a8:26:ba brd ff:ff:ff:ff:ff:ff link-netnsid 1 promiscuity 0 minmtu 68 maxmtu 65535
    veth addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
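General Information
- Cilium version (cilium version): v1.6.1
- Kernel version (uname -a): Linux vm-201 5.3.0 #1 SMP Wed Sep 18 16:45:16 CEST 2019 x86_64 GNU/Linux
- Orchestration system version in use (kubectl version, Mesos, ...): K8s v1.16.0
How to reproduce the issue
- Create a Service with type: LoadBalancer and ipFamily: IPv6 (as for mconnect-ipv6 above).
BTW, I use assign-lb-ip to set the loadBalancerIP with: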
assign-lb-ip -svc mconnect; assign-lb-ip -svc mconnect-ipv6
The IPv4 trace (working) is in trace.txt.
The interface used for K8s communication is eth1:
vm-002 ~ # ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 00:00:00:01:01:02
          inet addr:192.168.1.2  Bcast:0.0.0.0  Mask:255.255.255.0
          inet6 addr: 1000::1:c0a8:102/120 Scope:Global
          inet6 addr: fe80::200:ff:fe01:102/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:201984 errors:0 dropped:0 overruns:0 frame:0
          TX packets:12298 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:294657219 (281.0 MiB)  TX bytes:1129006 (1.0 MiB)
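The addressing here appears to embed each node's IPv4 address in the low 32 bits of its IPv6 address (eth1 carries both 192.168.1.2 and 1000::1:c0a8:102). Assuming that convention, a short Python sketch using the standard ipaddress module decodes the addresses seen in the trace and checks them against the /120 on eth1; embedded_ipv4 is a helper written for this illustration.

```python
import ipaddress


def embedded_ipv4(addr):
    """Read the low 32 bits of an IPv6 address as an IPv4 address (assumed convention)."""
    return ipaddress.IPv4Address(int(ipaddress.IPv6Address(addr)) & 0xFFFFFFFF)


node_if = ipaddress.IPv6Interface("1000::1:c0a8:102/120")  # eth1 on vm-002
print(node_if.ip, "->", embedded_ipv4(str(node_if.ip)))
print("on-link /120 network:", node_if.network)

client = ipaddress.IPv6Address("1000::1:c0a8:1c9")  # source address of the SYN in the trace
print(client, "->", embedded_ipv4(str(client)), "| on-link:", client in node_if.network)

lb_ip = ipaddress.IPv6Address("1000::")  # loadBalancerIP of mconnect-ipv6
print(lb_ip, "on-link:", lb_ip in node_if.network)
```

Under this assumption the trace source 1000::1:c0a8:1c9 corresponds to 192.168.1.201, an on-link neighbour of eth1, while the loadBalancerIP 1000:: lies outside that /120.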
But the external IPv6 address 1000::1:c0a8:102/120 is also hijacked by cilium_host:
vm-002 ~ # ifconfig cilium_host
cilium_host Link encap:Ethernet  HWaddr 6E:37:50:B0:BF:52
          inet addr:11.0.2.253  Bcast:0.0.0.0  Mask:255.255.255.255
          inet6 addr: fe80::6c37:50ff:feb0:bf52/64 Scope:Link
          inet6 addr: 1000::1:c0a8:102/128 Scope:Global
          UP BROADCAST RUNNING NOARP MULTICAST  MTU:1500  Metric:1
          RX packets:1533 errors:0 dropped:0 overruns:0 frame:0
          TX packets:398 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:141091 (137.7 KiB)  TX bytes:25364 (24.7 KiB)
but with a different prefix length, /128.
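The observation above boils down to one IPv6 address being configured on two interfaces. A minimal Python sketch of that check, with the global addresses hard-coded from the ifconfig output above (in practice they could be collected with ip -6 addr show); duplicate_addresses is a hypothetical helper, not part of Cilium or iproute2.

```python
import ipaddress
from collections import defaultdict

# Global IPv6 addresses as reported by ifconfig on vm-002 (link-local omitted).
ADDRS = {
    "eth1": ["1000::1:c0a8:102/120"],
    "cilium_host": ["1000::1:c0a8:102/128"],
}


def duplicate_addresses(addrs):
    """Map each address found on more than one interface to its (interface, prefixlen) owners."""
    owners = defaultdict(list)
    for ifname, cidrs in addrs.items():
        for cidr in cidrs:
            iface = ipaddress.IPv6Interface(cidr)
            owners[str(iface.ip)].append((ifname, iface.network.prefixlen))
    return {ip: who for ip, who in owners.items() if len(who) > 1}


print(duplicate_addresses(ADDRS))
# {'1000::1:c0a8:102': [('eth1', 120), ('cilium_host', 128)]}
```

The check only reports the overlap; it does not say which interface the kernel prefers for a given packet.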
@uablrek Sorry for the late reply. It seems you are running kube-proxy, is that correct? kube-proxy should handle this type of request since you are not running Cilium with node-port.
No worries, I am just testing CNI plugins with K8s dual-stack. Cilium seems to work except for this case. I saw on Slack that somebody had got K8s dual-stack running with Cilium, so I will make another try with Cilium as soon as I can. Thanks for your time.
This bug is still present in k8s v1.17.0-beta.1 and cilium 1.6.3.
@brb was this supposed to work in 1.6.3?
@uablrek Can you please include iptables-save -c output?
@aanm No idea, haven't tried myself. It seems that the user is running with kube-proxy.
Sorry for the late reply. Output included below.
NOTE: External traffic to the lb-address works for IPv4, and traffic from a node to the IPv6 lb-address also works. Only traffic to the IPv6 lb-address that originates from outside the cluster fails.
I have upgraded to K8s v1.17.0-rc.1.
@uablrek Thanks, but I don't see the mconnect svc being provisioned. Can you create it, and then provide the iptables-save outputs?
It is provisioned:
# kubectl get svc
NAME            TYPE           CLUSTER-IP        EXTERNAL-IP   PORT(S)          AGE
kubernetes      ClusterIP      12.0.0.1          <none>        443/TCP          2m48s
mconnect-ipv4   LoadBalancer   12.0.183.80       10.0.0.0      5001:31941/TCP   104s
mconnect-ipv6   LoadBalancer   fd00:4000::70cb   1000::        5001:32280/TCP   104s
But K8s dual-stack requires kube-proxy in "ipvs" mode. Here is the output from ipvsadm -Ln:
ipvsadm.txt.
If I can find the time, I will apply the PR for dual-stack support in mode "iptables", rebuild K8s from "master", and test whether there is a difference today.
I have tried kube-proxy in mode=iptables with dual-stack. The PR https://github.com/kubernetes/kubernetes/pull/82462 was applied on K8s "master":
> kubectl version --short
Client Version: v1.17.0-rc.1
Server Version: v1.18.0-alpha.0.1678+9caece8bd9fab5-dirty
The result is worse than with mode=ipvs. Now traffic originating from outside the cluster fails for both IPv6 and IPv4. Traffic to the external addresses that originates from a node inside the cluster still works for both IPv6 and IPv4.
Here is the output from "iptables-save -c":
@uablrek Thanks. Can you attach a sysdump (https://docs.cilium.io/en/v1.6/troubleshooting/#automatic-diagnosis)?
@uablrek Any idea what assigned 1000::1:c0a8:105 to cilium_host (maybe you can check the systemd-networkd logs)? Can you try removing it and see whether your test works?
That would be cri-o working (correctly) in dual-stack mode. In dual-stack, a pod in the host network namespace shall get both IPv4 and IPv6 addresses, and if there are several candidates it shall be assigned the node addresses that K8s uses on the node.
I will try to find a way to "trick" cri-o into assigning only IPv4, but I am not sure I can.
@uablrek Can you try running with another container runtime?
Not easily, no. But I build cri-o locally, so I think I can modify it.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
@uablrek Is the issue still present?
I will try to test this week.
@aanm Seems to work now. I tested on K8s v1.17.3 and cilium 1.7.0.
I still run "kube-proxy". I can disable it (I actually replace the kube-proxy binary with a sleep) if that is valuable.
Tested on K8s v1.18.0-alpha.5 also. Works fine.
My compliments on the smooth dual-stack installation. I just set enable-ipv6: "true" and it works :smile:
Fantastic! Thanks for testing it out. I'll close the issue. Feel free to re-open it if the issue shows up.