Flannel: Handle subnet lease getting expired

Created on 29 Aug 2014 · 6 Comments · Source: coreos/flannel

Although flannel starts renewing the lease an hour before it expires, the lease can still be lost, e.g. when the VM is suspended. Flannel should try to get the same subnet assignment if it's still available, and otherwise fall back to a new lease and signal that the subnet changed.
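
To make the requested behaviour concrete, here is a minimal sketch of "same subnet if still available, otherwise a new lease plus a signal". It is not flannel's code: `LeaseResult`, `reacquire_lease`, and the in-memory `free` set are hypothetical names standing in for whatever lease store is actually used.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class LeaseResult:
    subnet: str
    changed: bool  # True when the host could not get its previous subnet back


def reacquire_lease(previous: Optional[str], free: Set[str]) -> LeaseResult:
    """Prefer the previous subnet; otherwise fall back to any free subnet.

    `free` is a hypothetical in-memory stand-in for the set of unleased subnets.
    """
    if previous is not None and previous in free:
        free.remove(previous)
        return LeaseResult(subnet=previous, changed=False)
    if not free:
        raise RuntimeError("no free subnets available")
    subnet = min(free)  # deterministic pick, only for the sake of the sketch
    free.remove(subnet)
    # Signal the change only if there was a previous subnet to lose.
    return LeaseResult(subnet=subnet, changed=previous is not None)
```

A caller would check `changed` and, when it is set, reconfigure or restart whatever depends on the old subnet.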

area/documentation kind/enhancement

eyakubovich

👍2

All 6 comments

Is there any work under way for this? It'd be incredibly useful: right now, if a machine loses its lease and gets a new one, every container on that machine is left without network connectivity.

macb on 7 Mar 2016

👍3

One implementation idea for this is in #610

tomdee on 28 Apr 2017

Also see #520 for some good questions about how flannel handles this at the moment.

tomdee on 28 Apr 2017

When fixing this, we should make sure this failure scenario is discussed clearly in the docs.

tomdee on 28 Apr 2017

FWIW, the system design that we've converged on for Cloud Foundry is that hosts are preferentially assigned their prior lease, even if it "expired." And if a new host appears, it is assigned a lease in the following priority order:

1. Prefer subnets that have never been given out before, or subnets that were explicitly relinquished by a cleanly-terminating host.
2. If none of those exist, only then does the new host take over an expired lease, and in that case it chooses the oldest such lease.
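
The priority order above could be sketched roughly as follows. This is an illustration, not the Cloud Foundry implementation: `Lease` and `pick_subnet` are hypothetical names, an `owner` of `None` stands for "never given out or explicitly relinquished", and "oldest" is assumed to mean the lease whose expiry time is earliest.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Lease:
    subnet: str
    owner: Optional[str]     # None: never assigned, or explicitly relinquished
    expires_at: float = 0.0  # expiry timestamp in seconds (assumed representation)


def pick_subnet(host: str, leases: List[Lease], now: float) -> Optional[Lease]:
    # A returning host gets its prior lease back, even if it has expired.
    for lease in leases:
        if lease.owner == host:
            return lease
    # First choice for a new host: a subnet with no owner.
    for lease in leases:
        if lease.owner is None:
            return lease
    # Last resort: take over the expired lease that expired longest ago.
    expired = [lease for lease in leases if lease.expires_at <= now]
    if expired:
        return min(expired, key=lambda lease: lease.expires_at)
    return None  # every subnet is held by a host with an unexpired lease
```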

This is meant to minimize the probability that a lease is "stolen" from a live, but partitioned, container host. But if that does occur, once the partition heals and the "victim" host re-connects, it will discover that its lease is no longer valid. In this case, the victim host falls into a special, noisy failure mode which will (1) prevent any new workloads from being scheduled and (2) trigger the orchestration system to evacuate any existing workloads. Once the evacuation is complete, the host will clean up any leftover networking state (e.g. remove the VXLAN device), acquire a new lease for itself and begin accepting new workloads.
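
For illustration, the victim-host recovery path described above can be modelled as a small state machine. The names here (`HostState`, `Host`, `acquire_lease`, the `cleanups` counter) are all hypothetical, and the cleanup step is a placeholder for real work such as removing the VXLAN device.

```python
from enum import Enum
from typing import Callable, Set


class HostState(Enum):
    ACCEPTING = "accepting"    # normal operation
    EVACUATING = "evacuating"  # lease lost; no new workloads accepted


class Host:
    def __init__(self, lease: str, acquire_lease: Callable[[], str]):
        self.lease = lease
        self.acquire_lease = acquire_lease  # hypothetical callback into the lease store
        self.state = HostState.ACCEPTING
        self.workloads: Set[str] = set()
        self.cleanups = 0  # placeholder for cleaning up leftover networking state

    def can_schedule(self) -> bool:
        return self.state is HostState.ACCEPTING

    def on_reconnect(self, lease_still_valid: bool) -> None:
        # After a partition heals, the host finds out whether its lease survived.
        if not lease_still_valid:
            self.state = HostState.EVACUATING
            self._finish_if_drained()

    def on_workload_evacuated(self, name: str) -> None:
        self.workloads.discard(name)
        self._finish_if_drained()

    def _finish_if_drained(self) -> None:
        # Once evacuation is complete: clean up, take a new lease, accept work again.
        if self.state is HostState.EVACUATING and not self.workloads:
            self.cleanups += 1
            self.lease = self.acquire_lease()
            self.state = HostState.ACCEPTING
```

The point of the separate `EVACUATING` state is that the host never takes a new lease while workloads still depend on the old subnet.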

We think this is the right plan. Feedback welcome.