Calico: cannot ping6 the IPv6 address of a pod

Created on 24 Sep 2019 · 24 Comments · Source: projectcalico/calico

In my K8S cluster (version 1.16), with only the IPv4 stack enabled, I run Calico 3.9.0.
I created a default IPv4 ipPool and a default IPv6 ipPool.
Then I created a pod that owns both an IPv4 and an IPv6 address:

3: eth0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default 
    link/ether 66:32:71:b9:a8:f1 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.28.156.7/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fc00::10:4511:66bb:a4e2:9c06/128 scope global 
       valid_lft forever preferred_lft forever

From another node, I failed to ping6 the pod IP fc00::10:4511:66bb:a4e2:9c06.
Eventually I found the reason: the IPv6 route on the pod's node, like "fc00::10:4511:66bb:a4e2:9c06 via calicoXXXX", is gone. When I add this route manually, ping6 from another node succeeds. The strange thing is that the manually added route disappears after about 30 seconds.
So I guess something removes the pod's IPv6 route on the node, which leaves the node unable to forward packets to the pod.

BTW, the following sysctl config has been checked:

net.ipv6.conf.all.forwarding = 1
net.ipv6.conf.dce.accept_ra = 0
net.ipv6.conf.all.disable_ipv6 = 0

I also ran 2 other test cases:

1. With Calico running, I manually add a route like "fc00::10:4511:66bb:a4e2:9c06 via ens192"; the route does not disappear.
2. With Calico removed, I manually add a route like "fc00::10:4511:66bb:a4e2:9c06 via calicXXXX"; the route does not disappear.

So I guess this is related to calico-node. After checking the bird6 config on calico-node, I still cannot figure it out.

Does anyone have an idea? Thanks.

Your Environment

Calico version
3.9.0

Orchestrator version (e.g. kubernetes, mesos, rkt):
k8s 1.16.0
docker 18.09
Operating System and version:
centos 7.6
Link to your project (optional):

kind/support

Source

weizhouBlue

All 24 comments

Have you ensured Calico/Felix is enabled for IPv6? Specifically, are IP6 and FELIX_IPV6SUPPORT set as outlined here: https://docs.projectcalico.org/v3.9/networking/ipv6#enabling-ipv6-support-in-calico ?
I do not think that whole section is relevant, since it sounds like you want to continue using IPv4 in Kubernetes but want your pods to also have IPv6 connectivity.

Also, I see in the prerequisites section "Each host must have a default IPv6 route"; you may want to make sure you have that as well.

tmjd on 24 Sep 2019

Hi @tmjd

The configuration you mentioned is listed below; it should be correct.

# cat /etc/cni/net.d/10-calico.conflist 
{
  "name": "k8s-pod-network",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "calico",
      "log_level": "INFO",
      "datastore_type": "kubernetes",
      "nodename": "dce-10-6-185-80",
      "mtu": 1480,
      "ipam": {
          "type": "calico-ipam",
          "assign_ipv4": "true",
          "assign_ipv6": "true"
      },
      "container_settings": {
          "allow_ip_forwarding": true
      },
      "policy": {
          "type": "k8s"
      },
      "kubernetes": {
          "kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
      }
    },
    {
      "type": "portmap",
      "snat": true,
      "capabilities": {"portMappings": true}
    }
  ]
}
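
As a quick sanity check on a conflist like the one above, the following minimal Python sketch reads the "calico" plugin entry and reports which address families its IPAM section requests. The function name `ipam_address_families` is hypothetical, and treating a missing key as "not requested" is a simplifying assumption of this sketch, not a statement about Calico's actual defaults.

```python
import json

def ipam_address_families(conflist_text):
    """Report which address families the calico plugin's IPAM section requests.

    Assumption: a missing assign_ipv4/assign_ipv6 key is reported as False here;
    Calico's real defaults may differ.
    """
    conf = json.loads(conflist_text)
    for plugin in conf.get("plugins", []):
        if plugin.get("type") == "calico":
            ipam = plugin.get("ipam", {})
            return {
                # The conflist stores these flags as the strings "true"/"false".
                "ipv4": str(ipam.get("assign_ipv4", "false")).lower() == "true",
                "ipv6": str(ipam.get("assign_ipv6", "false")).lower() == "true",
            }
    return None  # no calico plugin entry found
```

For the conflist shown above, both flags would be reported as requested, which matches the pod receiving one address from each pool.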




# kubectl get pod --all-namespaces
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
default       test-nginx-58d5746967-krbpn                1/1     Running   1          16h
kube-system   calico-kube-controllers-64dc69495b-h9cqr   1/1     Running   0          3m22s
kube-system   calico-node-fcn8z                          1/1     Running   0          3m22s
kube-system   calico-node-p999q                          1/1     Running   0          3m22s
kube-system   calico-typha-ggnx7                         1/1     Running   0          3m22s
kube-system   coredns-5644d7b6d9-bfb8n                   1/1     Running   3          21h
kube-system   coredns-5644d7b6d9-lcjhx                   1/1     Running   3          21h
kube-system   etcd-dce-10-6-185-80                       1/1     Running   4          21h
kube-system   kube-apiserver-dce-10-6-185-80             1/1     Running   5          21h
kube-system   kube-controller-manager-dce-10-6-185-80    1/1     Running   5          21h
kube-system   kube-proxy-tftvl                           1/1     Running   4          21h
kube-system   kube-proxy-xllxr                           1/1     Running   3          21h
kube-system   kube-scheduler-dce-10-6-185-80             1/1     Running   4          21h

# kubectl exec calico-node-fcn8z printenv -n kube-system
CALICO_IPV6POOL_CIDR=fc00:0:0:10::/64
FELIX_IPV6SUPPORT=true
IP6=autodetect
......




# calicoctl get ipPool
NAME                  CIDR               SELECTOR   
default-ipv4-ippool   172.27.0.0/16      all()      
default-ipv6-ippool   fc00:0:0:10::/64   all()

BTW, you mentioned the default route on each node. I also set it, but the gateway fc00::1 does not exist. That should not matter, right?

# ip a show dev dce
2: dce: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:b4:10:df brd ff:ff:ff:ff:ff:ff
    inet 10.6.185.90/16 brd 10.6.255.255 scope global dce
       valid_lft forever preferred_lft forever
    inet6 fc00::11/64 scope global 
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:feb4:10df/64 scope link 
       valid_lft forever preferred_lft forever
[root@dce-10-6-185-90 ~]# 
[root@dce-10-6-185-90 ~]# ip -6 r
unreachable ::/96 dev lo metric 1024 error -113 pref medium
unreachable ::ffff:0.0.0.0/96 dev lo metric 1024 error -113 pref medium
unreachable 2002:a00::/24 dev lo metric 1024 error -113 pref medium
unreachable 2002:7f00::/24 dev lo metric 1024 error -113 pref medium
unreachable 2002:a9fe::/32 dev lo metric 1024 error -113 pref medium
unreachable 2002:ac10::/28 dev lo metric 1024 error -113 pref medium
unreachable 2002:c0a8::/32 dev lo metric 1024 error -113 pref medium
unreachable 2002:e000::/19 dev lo metric 1024 error -113 pref medium
unreachable 3ffe:ffff::/32 dev lo metric 1024 error -113 pref medium
fc00::10:22 dev dce metric 1024 pref medium
fc00::/64 dev dce proto kernel metric 256 pref medium
blackhole fc00::10:4511:66bb:a4e2:9c00/122 dev lo proto bird metric 1024 error -22 pref medium
fe80::/64 dev dce proto kernel metric 256 pref medium
default via fc00::1 dev dce metric 1024 pref medium

So the strange thing is that the forwarding route on the node is set by Calico immediately after the pod is created, but it disappears after about 15 seconds.
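
To illustrate why the missing /128 route matters, here is a small Python sketch of longest-prefix matching over a few of the routes from the `ip -6 r` output above. It is only a model of the selection rule, not how the kernel is implemented; the helper name `best_route` and the interface name `cali-example` are made up for the example.

```python
import ipaddress

def best_route(dest, routes):
    """Return the (network, action) pair with the longest prefix containing dest."""
    addr = ipaddress.ip_address(dest)
    matches = []
    for prefix, action in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net:
            matches.append((net, action))
    if not matches:
        return None
    # Longest-prefix match: the most specific route wins.
    return max(matches, key=lambda m: m[0].prefixlen)

pod_ip = "fc00::10:4511:66bb:a4e2:9c06"

# Routes taken from the node's table above; the per-pod /128 is absent.
routes = [
    ("fc00::10:4511:66bb:a4e2:9c00/122", "blackhole"),
    ("fc00::/64", "dev dce"),
    ("::/0", "via fc00::1 dev dce"),
]

print(best_route(pod_ip, routes))
# Adding the per-pod route back (hypothetical interface name) changes the winner.
print(best_route(pod_ip, routes + [(pod_ip + "/128", "dev cali-example")]))
```

Without the per-pod entry, the most specific match for the pod address is the /122 blackhole for the block, so traffic arriving at the node for that pod is dropped rather than forwarded.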

weizhouBlue on 25 Sep 2019

Do you have something else that might be setting or removing routes, like NetworkManager?

As for the default gateway, I'm not actually sure if it is needed in your setup or not. That might only be needed when dealing with IPv6 only. I think I wrote that doc but I do not remember specifically when that was needed.

tmjd on 25 Sep 2019

Maybe check system logs to see if there is anything logged about removing that route, that might help identify if something else is doing that.
I'd probably take a look at the calico-node logs also to see if it is removing it too. It is possible that the CNI plugin is the one adding the route but then calico-node is removing it, but 15 seconds feels like a long time for it to react.

tmjd on 25 Sep 2019

I do not run any process that manipulates the route table.
After I delete the calico-node daemonset and manually add a route entry like "fc00::10:4511:66bb:a4e2:9c06 via calicXXXX" on the node, the entry does not disappear and I can always ping6 the pod from another node. In contrast, while the calico-node daemonset is running, even a route entry I set manually disappears soon.
So I think this must be related to calico-node.

I used the following command to monitor the route table while the pod was being created, and the route entry was indeed removed by something:

# ip -ts monitor route
[2019-09-27T00:04:51.845521] 172.25.156.10 dev calib880d25eca9 scope link 
[2019-09-27T00:04:51.845924] fc00::10:4511:66bb:a4e2:9c09 dev calib880d25eca9 metric 1024 pref medium
[2019-09-27T00:04:51.846271] ff00::/8 dev calib880d25eca9 table local metric 256 pref medium
[2019-09-27T00:04:51.846412] fe80::/64 dev calib880d25eca9 proto kernel metric 256 pref medium
[2019-09-27T00:04:51.848800] Deleted fe80::/64 dev calib880d25eca9 proto kernel metric 256 pref medium
[2019-09-27T00:04:53.568373] local fe80::ecee:eeff:feee:eeee dev lo table local proto unspec metric 0 pref medium
[2019-09-27T00:04:53.568466] local fe80:: dev lo table local proto unspec metric 0 pref medium

[2019-09-27T00:05:07.966301] Deleted fc00::10:4511:66bb:a4e2:9c09 dev calib880d25eca9 metric 1024 pref medium
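
The delay between the route being added and deleted can be measured directly from output like the above. The following Python sketch pairs each added route with its later "Deleted" line and reports the lifetime in seconds. The function name `route_lifetimes` is hypothetical, and the regular expression assumes the line layout seen in this transcript (timestamp, optional "Deleted", destination, then `dev <iface>`); lines with another shape, such as the `local ...` entries, are skipped.

```python
import re
from datetime import datetime

# Matches: [timestamp] [Deleted ]<dest> dev <iface> ...
LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<deleted>Deleted\s+)?(?P<dest>\S+)\s+dev\s+(?P<dev>\S+)"
)

def route_lifetimes(monitor_output):
    """Map (dest, iface) to the seconds between a route's add and delete events."""
    added = {}
    lifetimes = {}
    for line in monitor_output.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue  # blank lines and entries in a different layout
        ts = datetime.fromisoformat(m["ts"])
        key = (m["dest"], m["dev"])
        if m["deleted"]:
            if key in added:
                lifetimes[key] = (ts - added.pop(key)).total_seconds()
        else:
            added[key] = ts
    return lifetimes
```

Applied to the transcript above, the pod's IPv6 route lives for roughly 16 seconds, which is consistent with the "about 15 seconds" observed by hand.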

weizhouBlue on 27 Sep 2019

Hi @tmjd
After checking the calico-node log, I found that calico-node itself did this. It found an unexpected route and removed it at 2019-09-27 04:10:05.977:

2019-09-27 04:10:02.481 [INFO][43] route_table.go 556: Syncing routes: found unexpected route; ignoring due to grace period. dest=fc00::10:4511:66bb:a4e2:9c0b/128 ifaceName="calib1d550aa66e" ipVersion=0x6
2019-09-27 04:10:02.481 [INFO][43] table.go 830: Invalidating dataplane cache ipVersion=0x6 reason="refresh timer" table="raw"
2019-09-27 04:10:02.481 [INFO][43] route_table.go 364: Interface in cleanup grace period, will retry after. ifaceName="calib1d550aa66e" ipVersion=0x6
2019-09-27 04:10:02.484 [INFO][43] table.go 518: Loading current iptables state and checking it is correct. ipVersion=0x6 table="mangle"
2019-09-27 04:10:02.561 [INFO][43] table.go 518: Loading current iptables state and checking it is correct. ipVersion=0x6 table="raw"
2019-09-27 04:10:02.658 [INFO][43] int_dataplane.go 978: Finished applying updates to dataplane. msecToApply=177.874252
2019-09-27 04:10:02.735 [INFO][43] int_dataplane.go 964: Applying dataplane updates
2019-09-27 04:10:02.736 [INFO][43] table.go 830: Invalidating dataplane cache ipVersion=0x6 reason="refresh timer" table="nat"
2019-09-27 04:10:02.736 [INFO][43] route_table.go 556: Syncing routes: found unexpected route; ignoring due to grace period. dest=fc00::10:4511:66bb:a4e2:9c0b/128 ifaceName="calib1d550aa66e" ipVersion=0x6
2019-09-27 04:10:02.736 [INFO][43] route_table.go 364: Interface in cleanup grace period, will retry after. ifaceName="calib1d550aa66e" ipVersion=0x6
2019-09-27 04:10:02.739 [INFO][43] table.go 518: Loading current iptables state and checking it is correct. ipVersion=0x6 table="nat"
2019-09-27 04:10:02.743 [INFO][43] int_dataplane.go 978: Finished applying updates to dataplane. msecToApply=7.2843659999999995
2019-09-27 04:10:05.977 [INFO][43] int_dataplane.go 964: Applying dataplane updates
2019-09-27 04:10:05.977 [INFO][43] ipsets.go 223: Asked to resync with the dataplane on next update. family="inet"
2019-09-27 04:10:05.977 [INFO][43] ipsets.go 223: Asked to resync with the dataplane on next update. family="inet6"
2019-09-27 04:10:05.977 [INFO][43] ipsets.go 306: Resyncing ipsets with dataplane. family="inet6"
2019-09-27 04:10:05.977 [INFO][43] ipsets.go 306: Resyncing ipsets with dataplane. family="inet"
2019-09-27 04:10:05.977 [INFO][43] route_table.go 561: Syncing routes: removing old route. dest=fc00::10:4511:66bb:a4e2:9c0b/128 ifaceName="calib1d550aa66e" ipVersion=0x6 routeProblems=[]string{"unexpected route"}
2019-09-27 04:10:05.979 [INFO][43] conntrack.go 78: Removing conntrack flows ip=fc00::10:4511:66bb:a4e2:9c0b
2019-09-27 04:10:05.982 [INFO][43] ipsets.go 356: Finished resync family="inet" numInconsistenciesFound=0 resyncDuration=4.691732ms
2019-09-27 04:10:06.059 [INFO][43] ipsets.go 356: Finished resync family="inet6" numInconsistenciesFound=0 resyncDuration=81.654688ms
2019-09-27 04:10:06.059 [INFO][43] int_dataplane.go 978: Finished applying updates to dataplane. msecToApply=82.500614

weizhouBlue on 27 Sep 2019

@fasaxc do you know why this might be happening?
Am I correct in that we expect felix to support pods with both IPv4 and IPv6 addresses?
Is there something else that must be enabled here that I am missing?

tmjd on 30 Sep 2019

Felix does support dual stack but I don't think libcalico-go does.

I think libcalico-go assumes single stack for k8s right now because when we wrote libcalico-go k8s only supported single-stack and there was no place in the Pod to write a second IP to. I know k8s has been doing some dual stack work but I don't know where that's got to and I don't think we've done any enabling work on our side to integrate with that.

fasaxc on 1 Oct 2019

@caseydavenport might know more ^^

fasaxc on 1 Oct 2019

Yeah, I think that's right. In etcd mode this is supported since the CNI plugin will just write multiple IPs to the workload endpoint.

Using the k8s API, we need to change this code: https://github.com/projectcalico/libcalico-go/blob/master//lib/backend/k8s/conversion/conversion.go#L180-L196

Right now it only expects a single IP (so can do IPv4 or IPv6 single-stack). It will need to be updated to support the new dual-stack APIs.
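
To make the difference concrete: since Kubernetes 1.16 the Pod status carries a `podIPs` list alongside the single `podIP` field. The sketch below is not the libcalico-go code (which is Go); it is only a Python illustration, under that assumption about the status layout, of how a conversion step could take at most one IPv4 and one IPv6 address and fall back to `podIP` when the list is absent. The function name `pod_ip_networks` is hypothetical.

```python
import ipaddress

def pod_ip_networks(pod_status):
    """Return the pod's addresses as /32 and /128 networks, at most one per family.

    pod_status is a dict shaped like the `status` section of a Pod object.
    """
    ips = [entry["ip"] for entry in pod_status.get("podIPs", [])]
    if not ips and pod_status.get("podIP"):
        ips = [pod_status["podIP"]]  # single-stack fallback
    seen_versions = set()
    networks = []
    for ip in ips:
        addr = ipaddress.ip_address(ip)
        if addr.version in seen_versions:
            continue  # keep only the first address of each family
        seen_versions.add(addr.version)
        networks.append(f"{addr}/{addr.max_prefixlen}")
    return networks
```

With only `podIP` populated, a dual-addressed pod yields a single network, so a consumer that programs routes from this list would have no reason to expect the second address.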

caseydavenport on 2 Oct 2019

Great, @caseydavenport. Will this be scheduled for a follow-up release? K8S supports dual-stack starting with version 1.16.

weizhouBlue on 9 Oct 2019

Yeah, I'm hoping to get this into a release in the near future.

If anyone is willing to help out on this, it would be much appreciated. I'd be happy to review PRs, tests, designs if someone has the bandwidth to work on them :)

caseydavenport on 9 Oct 2019

Is there any progress now? We would also like to use K8S dual stack.

yuchunyun on 21 Nov 2019

@caseydavenport I have seen that both the libcalico-go and calico/cni projects have been updated for dual stack. May I ask when Calico will release them?

yuchunyun on 28 Nov 2019

We're expecting to have the next release of Calico in a couple weeks which should include dual stack support.

tmjd on 2 Dec 2019

@weizhouBlue Dual stack support was released in Calico v3.11, so please retry with that version, and open a new issue if you see any problem.

neiljerram on 16 Jan 2020

@weizhouBlue Any updates on this issue? I tried Calico 3.13.4 and hit the same issue you pasted above:
[2020-06-20T21:59:47.055464] fe80::/64 dev cali435d5d84a47 proto kernel metric 256 pref medium
[2020-06-20T21:59:47.055483] 10.244.36.210 dev cali435d5d84a47 scope link
[2020-06-20T21:59:47.057954] fc00:f00:0:24fe:200:8fa7:f4c7:af12 dev cali435d5d84a47 metric 1024 pref medium
[2020-06-20T21:59:47.121225] ff00::/8 dev cali2cdfcc6cfe1 table local metric 256 pref medium
[2020-06-20T21:59:47.121277] fe80::/64 dev cali2cdfcc6cfe1 proto kernel metric 256 pref medium
[2020-06-20T21:59:47.121292] 10.244.36.211 dev cali2cdfcc6cfe1 scope link
[2020-06-20T21:59:47.121303] fc00:f00:0:24fe:200:8fa7:f4c7:af13 dev cali2cdfcc6cfe1 metric 1024 pref medium
[2020-06-20T21:59:48.255069] local fe80::ecee:eeff:feee:eeee dev lo table local proto unspec metric 0 pref medium
[2020-06-20T21:59:48.255125] local fe80:: dev lo table local proto unspec metric 0 pref medium
[2020-06-20T21:59:48.597408] local fe80::ecee:eeff:feee:eeee dev lo table local proto unspec metric 0 pref medium
[2020-06-20T21:59:48.597490] local fe80:: dev lo table local proto unspec metric 0 pref medium
[2020-06-20T22:00:03.167886] Deleted fc00:f00:0:24fe:200:8fa7:f4c7:af13 dev cali2cdfcc6cfe1 metric 1024 pref medium
[2020-06-20T22:00:03.169382] Deleted fc00:f00:0:24fe:200:8fa7:f4c7:af12 dev cali435d5d84a47 metric 1024 pref medium

timyl on 20 Jun 2020

@timyl Could you open a new issue with full details of your setup and of what is not working for you?

neiljerram on 20 Jun 2020

@timyl
I think there are 2 ways:
(1) Use the etcd datastore type, with any version of k8s. I tested this method and it worked.
(2) Use k8s 1.18 and a supported version of Calico. I have not tried this; maybe you could give it a shot and tell me.

weizhouBlue on 21 Jun 2020

appreciate it, will try that.

timyl on 21 Jun 2020

@weizhouBlue @neiljerram Following weizhou's suggestion, I switched to the etcd datastore and the problem was fixed. I also tried k8s v1.18.2 with Calico v3.12.1, and the issue did not occur there either.

timyl on 21 Jun 2020

By the way, my setup is not a real dual-stack scenario: I just want containers to handle some IPv6 traffic and do not need every pod to have both IPv4 and IPv6 addresses. The pod IP is still IPv4, but inside the container there are both IPv4 and IPv6 addresses. I did try Kubernetes's official dual-stack proposal (https://kubernetes.io/docs/concepts/services-networking/dual-stack/), but it failed at the kube-proxy part.

timyl on 21 Jun 2020

@weizhouBlue
I installed k8s dual stack with Calico 3.14.1, and I can ping6 the IPv6 address of a pod. But I encountered a new problem: when I create a service of type NodePort, the NodePort is not mapped on the host, so I cannot access the service via hostIp:NodePort. Have you encountered the same problem, and can you offer any suggestions?