It seems that IP failover is used to set up high availability for the router, but as I understand it, it keeps the virtual IP unchanged if the service is deployed to a new node.
My question: the router service needs a fixed host IP (is that true?), which is what gets configured in the DNS server. So how do I keep that IP unchanged when the router service is deployed to a new node?
Is this part of the HA router solution, or does it need to be set up outside of OpenShift? Any good suggestions if so?
Thanks.
@kargakis or @liggitt could you help me with this question? Many thanks :)
@deads2k do you know who can help with this question? Thanks.
@ramr @pweil-
@thincal so if you configure a DNS entry using a node's IP address - what happens if the node goes down? You would need some sort of IP failover mechanism to ensure something (in this case a router) is always serving on that IP. If your environment is tolerant of downtime, you need to either update the DNS entry or start up a new node with that IP address. Of course, this requires you to know when a problem happens and to be able to take the manual steps to fix it immediately.
The HA solution handles these failure cases but also automates the recovery, so there's really no manual intervention needed. In a nutshell, rather than use the node's IP address to configure the DNS entry, you would instead use VIPs (virtual IP addresses) to handle failures. The VIPs would float between any of the nodes where `ipfailover` (really `keepalived`) is running in your OpenShift cluster, and handle failure cases when router instances die and are re-born, or when nodes die.
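For instance, the DNS entry would simply list the VIPs as ordinary A records. A BIND-style sketch, using the zone name and VIP addresses from the worked example below (all values are illustrative):

```
; BIND-style zone fragment (sketch; zone name, TTL and addresses are illustrative)
; www resolves to all three VIPs - each VIP is kept alive by ipfailover,
; so it no longer matters which node currently holds it
www.example.test.   300  IN  A  10.1.100.1
www.example.test.   300  IN  A  10.1.100.2
www.example.test.   300  IN  A  10.1.100.3
```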
Hopefully an example with steps might make it clearer:
1. Let's assume we have 5 infrastructure nodes allocated to run the routers (you could use `n > 5` of course).
2. Let's further assume that the IP addresses for those 5 infrastructure nodes are `10.1.1.1 - 10.1.1.5`.
3. So let's use 3 VIPs (`10.1.100.1`, `10.1.100.2` and `10.1.100.3`) -- note the `10.1.100.*` block we are using here, distinct from the node addresses.
4. Configure the DNS entry `www.example.test` to resolve to those 3 VIPs `10.1.100.[1-3]`.
5. Availability: remember we had 5 infrastructure nodes allocated to run the routers. Assuming these 5 nodes are tagged with `infra=red`, we can now run an `ipfailover` service to ensure those 3 VIPs are always on one or more of those 5 infrastructure nodes, and also monitor port 80 (which the router would run on - or you could use port 443), so that we can watch for router failures:

   ```
   oadm ipfailover --replicas=5 --virtual-ips="10.1.100.1-3" --selector="infra=red" --watch-port=80 ...
   ```
6. Run the routers - rather than use all the nodes, let's say we use 2 of these nodes, with the same selector (`infra=red` as above):

   ```
   oadm router ... --selector="infra=red" --replicas=2   # 2 router instances
   ```
7. The nodes running those 2 router instances would have the VIPs `10.1.100.[1-3]` allocated to them. Now, since the router binds to `0.0.0.0` - with some kernel magic it is binding on the virtual IP address as well, and so it serves requests on that VIP.
8. If a router dies, the `keepalived`s clustered together (they do need multicast, as they send VRRP advertisements) would notice a router is down. Either the local node's `keepalived` notices nothing bound on port 80/443, or a remote `keepalived` notices a peer is down. In either case, one of the peer `keepalived`s (which has a router instance running on its node) would pick up the VIP and now service it. And thanks to the aforementioned magic, since the router binds to `0.0.0.0`, it automatically starts servicing the newly added VIP on its node.
9. When a failed router is re-born (or a new one is added), the `keepalived`s would notice a new router in the mix and float the VIPs over to that node if needed.
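To peek under the hood of steps 5, 8 and 9: the `ipfailover` pods are really running `keepalived`, and a hand-rolled `keepalived.conf` doing roughly the same job for one of the VIPs might look like the sketch below. The interface name, router ID, priority and check script are illustrative assumptions, not necessarily what `ipfailover` generates:

```
# /etc/keepalived/keepalived.conf -- illustrative sketch, not ipfailover's actual output
vrrp_script chk_router {
    script "/usr/bin/curl -s -o /dev/null http://127.0.0.1:80"  # succeeds if anything answers on port 80, like --watch-port=80
    interval 2                                                  # check every 2 seconds
}

vrrp_instance VIP_100_1 {                # one instance for the VIP 10.1.100.1
    state BACKUP                         # peers elect a MASTER via VRRP
    interface eth0                       # assumed NIC name
    virtual_router_id 51                 # must match on all 5 peers
    priority 100
    advert_int 1                         # multicast VRRP advertisements, as noted above
    virtual_ipaddress {
        10.1.100.1                       # the VIP floats to whichever peer is MASTER
    }
    track_script {
        chk_router                       # give up the VIP if the router stops answering
    }
}
```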
Now, that said, you can also do this with a similar approach (non-VIP oriented solutions) outside of OpenShift - for example, run a highly available external load balancer service and point it at your routers. If you are already running on one of the cloud providers (Amazon, Rackspace, GCE), you could use their load balancers (e.g. ELB on Amazon EC2) to divert traffic to the OpenShift router.
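If you are not on a cloud provider, the same idea self-hosted could be as simple as haproxy in front of the routers. A minimal sketch, assuming the 2 router instances landed on nodes `10.1.1.1` and `10.1.1.2` from the example above:

```
# haproxy.cfg -- minimal external LB sketch in front of 2 OpenShift routers
defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend www
    bind *:80
    default_backend openshift_routers

backend openshift_routers
    balance roundrobin
    server router1 10.1.1.1:80 check    # node running router instance 1
    server router2 10.1.1.2:80 check    # node running router instance 2
```

The haproxy box itself then needs to be made highly available (e.g. a pair with its own keepalived/VIP), which is exactly what the managed cloud load balancers give you out of the box.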
HTH
@ramr thanks very much for the detailed explanation!