I have Calico configured and it works very well; it's a very cool add-on to K8s. Access to my pods works directly, using BGP to my router.
While pod access is cool, it would be better to use my service IPs, since they are more stable than pod IPs. Is there any way to expose the service CIDR range, either through BGP or possibly using a front-end BIRD router? Or any other crazy ideas someone may have?
Thanks for any thoughts.
Sorry for the long delay on this.
This is something that can't be done out of the box today, but some users have managed to make it work with other mechanisms.
I think this is something we should try to do.
@caseydavenport, would you happen to have links to examples of how others have done this?
We'd like our Service CIDR to be routable as well.
I struggle to understand what the proper routing setup for this would be.
Would you just route to all nodes?
edit: I got this working https://github.com/projectcalico/calico/issues/1008#issuecomment-403242510
@stealthybox I'm afraid I don't have much on hand. This article touches on it, but doesn't go into many implementation details: https://kubernetes.io/blog/2016/10/kubernetes-and-openstack-at-yahoo-japan/
A simple solution would be for each node to advertise the entire service IP range. Calico won't do this by default, but some minor modifications to the BIRD configuration should do it.
The obvious next solution is to only advertise the routes from nodes that actually have pods in that service - that's a bit trickier and probably needs some more thought.
Here's another anecdote with manual route configuration in the network core:
https://medium.com/@kyralak/accessing-kubernetes-services-without-ingress-nodeport-or-loadbalancer-de6061b42d72
Related issues for modifying export filters:
^ These do not solve adding custom, arbitrary static routes.
There is also this issue which proposes a more freeform approach:
@caseydavenport
I got this working by modifying the BIRD config templates in the calico/node container.
- /etc/calico/confd/templates/bird_aggr.cfg.template
  - define a variable, $servicesubnet, for the k8s service CIDR
  - add $servicesubnet as a static blackhole route
  - create functions that accept and reject $servicesubnet
- /etc/calico/confd/templates/bird_ipam.cfg.template
  - in calico_pools, which serves as the BGP session export filter, call accept_servicesubnet()
  - in calico_ipip, which serves as the kernel routing table export filter, call reject_servicesubnet()

We need to reject the service CIDR from being exported to the kernel routing table because it will interfere with kube-proxy when used in iptables mode. (I think it would also break ipvs).
This is for ipv4. For ipv6, use the bird6 templates.
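To make the intent of the two filter hooks concrete, here is a minimal Python sketch of the decision they make for a single route. It is not Calico or BIRD code: the function names `bgp_export` and `kernel_export` and the service CIDR `10.96.0.0/12` are assumptions chosen for illustration. It assumes BIRD's `net = X` means an exact prefix match and `net ~ X` means the route lies inside `X`.

```python
import ipaddress

def bgp_export(route, service_subnets):
    """Mimic the BGP-side hook: advertise the service subnet itself,
    but nothing more specific beneath it. None means "no decision"."""
    net = ipaddress.ip_network(route)
    for subnet in service_subnets:
        svc = ipaddress.ip_network(subnet)
        if net == svc:
            return "accept"
        if net.version == svc.version and net.subnet_of(svc):
            return "reject"
    return None

def kernel_export(route, service_subnets):
    """Mimic the kernel-side hook: never install the service subnet
    (or anything inside it) into the kernel routing table."""
    net = ipaddress.ip_network(route)
    for subnet in service_subnets:
        svc = ipaddress.ip_network(subnet)
        if net.version == svc.version and net.subnet_of(svc):
            return "reject"
    return None

SERVICE_SUBNETS = ["10.96.0.0/12"]  # hypothetical service CIDR
for route in ["10.96.0.0/12", "10.96.0.10/32", "192.168.0.0/26"]:
    print(route, bgp_export(route, SERVICE_SUBNETS), kernel_export(route, SERVICE_SUBNETS))
```

A result of `None` stands for falling through to the rest of the filter, which is where the existing pool rules take over.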
To retemplate the configs and reload bird, run this in the container:
NODENAME=$HOSTNAME confd -confdir=/etc/calico/confd -onetime
You can debug the results with the birdcl command:
birdcl -s /run/calico/bird.ctl show route # ipv4
birdcl -s /run/calico/bird6.ctl show route # ipv6
birdcl -s /run/calico/bird.ctl # interactive shell, no readline
I'd be happy to contribute my changes to calico/node if we want to support a mechanism for exporting arbitrary BGP subnets without exporting them to the kernel.
I actually split $servicesubnet up into a list and loop over the range for the routes and functions. (We're using this to shard our service subnet over many nodes)
There may be a more robust way to acceptably represent this configuration using some of the ideas in the previous issues.
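As a rough illustration of splitting the service subnet into a list, the sketch below uses Python's standard `ipaddress` module to cut one CIDR into equal shards and join them into a space-separated string. The CIDR `10.96.0.0/12`, the shard size `/14`, and the helper name `shard_service_cidr` are assumptions, not values from the deployment described above.

```python
import ipaddress

def shard_service_cidr(cidr, new_prefix):
    """Split one service CIDR into equally sized, non-overlapping shards."""
    network = ipaddress.ip_network(cidr)
    return [str(subnet) for subnet in network.subnets(new_prefix=new_prefix)]

# Hypothetical values: a /12 service range cut into four /14 shards.
shards = shard_service_cidr("10.96.0.0/12", 14)

# A space-separated string is convenient for a template that splits on " ".
template_value = " ".join(shards)
print(template_value)
```

Each node (or group of nodes) could then be given a different subset of the shards, so that routing for the service range is spread across the cluster.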
> The obvious next solution is to only advertise the routes from nodes that actually have pods in that service - that's a bit trickier and probably needs some more thought.
This would be really cool.
A good way to do it would be to manage a separate set of pools that are sourced from a Services Controller on the Kubernetes API.
The same controller could be responsible for ClusterIP and ExternalIP.
Maybe even use Calico IPAM for services of type LoadBalancer? (although MetalLB is already good at this part).
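The core of such a controller would be a mapping from services to the nodes that host their pods. The following Python sketch shows only that mapping step, on plain dictionaries. It does not talk to the Kubernetes API, and every name and address in it (`routes_per_node`, the pod names, the IPs) is hypothetical.

```python
from collections import defaultdict

def routes_per_node(service_pods, pod_nodes):
    """Return {node: sorted list of /32 service routes to advertise}.

    service_pods: {cluster_ip: [pod names backing that service]}
    pod_nodes:    {pod name: node the pod is scheduled on}
    """
    routes = defaultdict(set)
    for cluster_ip, pods in service_pods.items():
        for pod in pods:
            node = pod_nodes.get(pod)
            if node is not None:  # skip pods that are not scheduled yet
                routes[node].add(f"{cluster_ip}/32")
    return {node: sorted(prefixes) for node, prefixes in routes.items()}

# Hypothetical cluster state.
service_pods = {"10.96.0.10": ["dns-1", "dns-2"], "10.96.5.20": ["web-1"]}
pod_nodes = {"dns-1": "node-a", "dns-2": "node-b", "web-1": "node-a"}
print(routes_per_node(service_pods, pod_nodes))
```

With this input, node-a would advertise both service IPs and node-b only the first one.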
By the way, if you do this, you'll want to run kube-proxy with --proxy-mode=ipvs.
kubeadm config snippet:
kubeProxy:
  config:
    mode: ipvs
    clusterCIDR: {{ your subnet here }}
When running in IPVS mode, the IPs actually "exist" virtually in the Linux kernel, which means they properly respond to ICMP and reject connections to non-listening ports.
When using iptables, the routing is more of a special-case side effect, which can get you into routing loops with ICMP and hanging connections when there's nothing on the port.
@stealthybox Can you share your templates (successful servicesubnet advertisement)? I've tried replicating what you've done but am having no success. I want to be able to advertise the service network (ClusterIP) via Bird/Calico for OpenShift 3.10 and Calico 3.1, but I'm hitting a wall.
@zeca-pereira This is what we have deployed.
bird_aggr.cfg.template
# Generated by confd
{{/* servicesubnet_split -- a list of subnet strings
     Implementation detail for exposing the k8s service subnet in this BIRD config
     details:
       https://github.com/projectcalico/calico/issues/1008#issuecomment-403242510
*/}}
{{ $servicesubnet_split := split "10.10.10.0/18" " " }}

# ------------- Static black hole addresses -------------
{{if ls "/"}}
protocol static {
{{range ls "/"}}
{{$parts := split . "-"}}
{{$cidr := join $parts "/"}}
  route {{$cidr}} blackhole;
{{end}}
  # kubernetes service subnets
{{range $servicesubnet_split}}
  route {{.}} blackhole;
{{end}}
}
{{else}}# No static routes configured.{{end}}

function accept_servicesubnet () {
{{range $servicesubnet_split}}
  if ( net = {{.}} ) then { accept; }
  if ( net ~ {{.}} ) then { reject; }
{{end}}
}

function deny_servicesubnet () {
{{range $servicesubnet_split}}
  if ( net = {{.}} ) then { reject; }
  if ( net ~ {{.}} ) then { reject; }
{{end}}
}

# Aggregation of routes on this host; export the block, nothing beneath it.
function calico_aggr ()
{
{{range ls "/"}}
{{$parts := split . "-"}}
{{$cidr := join $parts "/"}}
{{$affinity := json (getv (printf "/%s" .))}}
{{if $affinity.state}}
  # Block {{$cidr}} is {{$affinity.state}}
{{if eq $affinity.state "confirmed"}}
  if ( net = {{$cidr}} ) then { accept; }
  if ( net ~ {{$cidr}} ) then { reject; }
{{end}}
{{ else }}
  # Block {{$cidr}} is implicitly confirmed.
  if ( net = {{$cidr}} ) then { accept; }
  if ( net ~ {{$cidr}} ) then { reject; }
{{ end }}
{{end}}
}
bird_ipam.cfg.template
# Generated by confd
filter calico_pools {
  calico_aggr();
  accept_servicesubnet();
{{range ls "/v1/ipam/v4/pool"}}{{$data := json (getv (printf "/v1/ipam/v4/pool/%s" .))}}
  if ( net ~ {{$data.cidr}} ) then {
    accept;
  }
{{end}}
  reject;
}

{{$network_key := printf "/bgp/v1/host/%s/network_v4" (getenv "NODENAME")}}{{$network := getv $network_key}}
filter calico_ipip {
  deny_servicesubnet();
{{range ls "/v1/ipam/v4/pool"}}{{$data := json (getv (printf "/v1/ipam/v4/pool/%s" .))}}
  if ( net ~ {{$data.cidr}} ) then {
{{if $data.ipip_mode}}{{if eq $data.ipip_mode "cross-subnet"}}
    if defined(bgp_next_hop) && ( bgp_next_hop ~ {{$network}} ) then
      krt_tunnel = ""; {{/* Destination in ipPool, mode is cross sub-net, route from-host on subnet, do not use IPIP */}}
    else
      krt_tunnel = "{{$data.ipip}}"; {{/* Destination in ipPool, mode is cross sub-net, route from-host off subnet, set the tunnel (if IPIP not enabled, value will be "") */}}
    accept;
  } {{else}}
    krt_tunnel = "{{$data.ipip}}"; {{/* Destination in ipPool, mode not cross sub-net, set the tunnel (if IPIP not enabled, value will be "") */}}
    accept;
  } {{end}} {{else}}
    krt_tunnel = "{{$data.ipip}}"; {{/* Destination in ipPool, mode field is not present, set the tunnel (if IPIP not enabled, value will be "") */}}
    accept;
  } {{end}}
{{end}}
  accept; {{/* Destination is not in any ipPool, accept */}}
}
@stealthybox I can't thank you enough for this. It's working flawlessly.
sure thing, @zeca-pereira
that took me a while to put together, so I'm glad others can make the config workable
notice servicesubnet_split is a list, so if you'd like to shard your routing of that subnet more evenly across your nodes, you can split it up in that config.
it's also possible to load that value from etcd using confd, but it's not part of any calico API object yet.
/cc @caseydavenport -- lmk if this is still something you're interested in supporting in Calico
@stealthybox it certainly is!
From a quick look at the config, it looks like you've:
Did I miss anything? I think that's a really good start. Have you thought at all how it might be exposed through the Calico configuration model? Also, ultimately I'd like to support advertising only from nodes that have a pod in a given service.
yep, that's accurate:
- in calico_pools which serves as the BGP session export filter, call accept_servicesubnet()
- in calico_ipip which serves as the kernel routing table export filter, call reject_servicesubnet()
As far as advertising services from their specific nodes, we might be able to create a Service controller that manages a separate set of Calico IP pools? https://github.com/projectcalico/calico/issues/1008#issuecomment-403248155
This would be added to calico-kube-controllers and basically do what kube-proxy does with iptables/IPVS, but for exposing BGP with Calico instead.
This operates on individual ClusterIPs though, which is more granular than the subnet aggregation we use for Pod IPs.
If the advertisements took some time to change to other nodes, that would be okay since every Node should be capable of routing to the ClusterIP anyway, but you might want to wait some period before deleting routes since you would want to avoid a situation where a Service was not routable because no Nodes were advertising.
WDYT?
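The "wait some period before deleting routes" idea can be sketched as a small grace-period rule. This is only an illustration in Python: the function `advertisers`, the 30-second grace value, and the node names are all assumptions, and a real controller would track this state per ClusterIP.

```python
def advertisers(current_nodes, last_seen, now, grace_seconds):
    """Return the nodes that should advertise a ClusterIP right now.

    current_nodes: nodes that host an endpoint of the service at this moment
    last_seen:     {node: timestamp it last hosted an endpoint}, updated in place
    A node keeps advertising for grace_seconds after its last endpoint left,
    so the route is not withdrawn before other nodes have announced it.
    """
    for node in current_nodes:
        last_seen[node] = now
    return {node for node, seen in last_seen.items() if now - seen <= grace_seconds}

# Hypothetical timeline: the only pod moves from node-a to node-b at t=10.
last_seen = {}
print(advertisers({"node-a"}, last_seen, now=0, grace_seconds=30))   # only node-a
print(advertisers({"node-b"}, last_seen, now=10, grace_seconds=30))  # both overlap
print(advertisers({"node-b"}, last_seen, now=41, grace_seconds=30))  # node-a expired
```

During the overlap both nodes advertise the address, which is acceptable here because every node can route to the ClusterIP anyway.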
@stealthybox I think that sounds really good to me.
I think I'd like to structure the work like so:
Part 1: Start with the simple template changes, like you've done, but configured through an environment variable (e.g. CALICO_CLUSTERIP_RANGE)
Part 2: Add support for configuring static routes to a particular node to the Calico data model through the data store.
Part 3: Write a controller which programs the static routes based on k8s ClusterIPs and ExternalIPs.
Part 1 should be dead simple, and I think a really useful increment. I'd like to think through part 2 and 3 a bit more though, but I think that can happen in parallel with implementing 1.
Does that make sense?
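For Part 1, a hedged sketch of what "configured through an environment variable" might look like, written in Python rather than as a confd template. The variable name CALICO_CLUSTERIP_RANGE is the example suggested above; treating its value as a whitespace-separated list of CIDRs and the helper name `render_blackhole_routes` are assumptions.

```python
import ipaddress
import os

def render_blackhole_routes(env=os.environ):
    """Render a BIRD static protocol stanza from CALICO_CLUSTERIP_RANGE.

    The value is assumed to be a whitespace-separated list of CIDRs.
    ip_network() raises ValueError for malformed or misaligned prefixes,
    so a bad value fails before any config is written.
    """
    raw = env.get("CALICO_CLUSTERIP_RANGE", "")
    lines = []
    for item in raw.split():
        network = ipaddress.ip_network(item)
        lines.append(f"  route {network} blackhole;")
    if not lines:
        return "# No service routes configured."
    return "protocol static {\n" + "\n".join(lines) + "\n}"

# Hypothetical value; a real deployment would set it on the calico/node container.
print(render_blackhole_routes({"CALICO_CLUSTERIP_RANGE": "10.96.0.0/12"}))
```

Validating the prefix up front matters because a CIDR with host bits set (for example 10.96.0.1/12) is not a valid route prefix.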
@caseydavenport
I'm following.
We'll need to spec the API for part 2.
I suppose I will patch part 1 here and update docs somewhere: https://github.com/projectcalico/node/tree/master/filesystem/etc/calico/confd/templates
I believe @neiljerram and @briansan are working on this for Calico v3.4.0. 🎉
Hi,
I deployed a version 3.4.0 cluster this weekend. Etcd is now v3, so here is an updated version of @stealthybox's work,
in case it helps someone until an official release.
To apply it, I found that sending SIGHUP to confd (kill -HUP) regenerates the config files. The paths remain the same.
I've just found that advertising services with externalTrafficPolicy=Cluster from Route Reflector nodes (not using BGP full-mesh) puts the Route Reflectors in the data path (Note: not using ECMP)
Is this considered?
Thank you
@grandich I don't think that's expected, but we very well could have a bug there. We've had a similar issue with RRs in the past.
Is your RR peered with an eBGP peer? I could imagine we would be replacing the next hop for eBGP peerings on RRs. Probably worth opening another issue for - could you raise one?
thanks @caseydavenport opened #3365