Per the etcd ops guide: "Once a majority of members works, the etcd cluster elects a new leader automatically and returns to a healthy state. The new leader extends timeouts automatically for all leases. This mechanism ensures no lease expires due to server side unavailability."
Kubernetes event leases have a 1-hour TTL by default and are never renewed. If they are not revoked according to their original TTL, the volume of events can eventually exceed the etcd storage space limit, or some other limit can be hit (e.g. the lease count can make revoke operations excessively expensive).
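A toy model of the quoted behavior may help show why a 1-hour lease can outlive its TTL indefinitely. If leader elections happen more often than once per TTL, each election pushes the deadline out again. The function `lease_expiry` and its parameters are hypothetical illustrations, not etcd APIs. Keepalives are deliberately left out because event leases are never renewed.

```python
# Toy model of the etcd ops guide behavior: when a new leader is elected,
# every lease's deadline is reset to (election time + TTL).
# Times are in seconds; all names are hypothetical, not etcd APIs.


def lease_expiry(ttl, leader_changes, horizon):
    """Return when a lease granted at t=0 expires, or None if it outlives `horizon`.

    ttl            -- lease TTL granted at t=0 (never renewed by keepalives)
    leader_changes -- sorted times at which a new leader is elected
    horizon        -- end of the simulated period
    """
    deadline = ttl
    for t in leader_changes:
        if t >= deadline:
            # The lease expired before this election; later elections don't matter.
            break
        # The new leader extends the lease to a full TTL from now.
        deadline = t + ttl
    return deadline if deadline <= horizon else None


if __name__ == "__main__":
    ONE_HOUR = 3600
    ONE_DAY = 24 * ONE_HOUR
    every_30_min = list(range(1800, ONE_DAY, 1800))
    print(lease_expiry(ONE_HOUR, every_30_min, ONE_DAY))  # None: never expires within a day
    print(lease_expiry(ONE_HOUR, [], ONE_DAY))            # 3600: expires on time
```

With no elections the lease expires on schedule; with an election every 30 minutes, it survives the entire simulated day.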
Would it be possible to either:
a) Persist the remaining TTL of long leases (say, TTLs of more than 5 min?) so that they expire on time after leader changes
b) Allow auto-renew to be disabled when creating leases
From @xiang90's comment on https://github.com/kubernetes/kubernetes/issues/65497 it sounds like this might need to persist remaining durations to support either.
Cf. coreos/etcd#9526. Thanks for finding this, @gyuho.
a) Do not auto-renew TTLs for long leases (say, TTLs of more than 5 min?)
As I suggested in the k8s issue, a better approach is to persist the remaining TTL of long leases lazily. It won't hurt performance :)
@xiang90 Sounds good, I'll update the description to match. What do you mean by "lazily"?
@jpbetz
If we persist the deadline of every lease through raft on each keepalive request, then a newly elected leader won't need to refresh all leases (since it already knows the deadlines). But this can be very expensive when there are many keepalives for short leases (I believe the Chubby paper mentioned this too). Yes, some users send keepalives every second.
However, if we persist nothing, then the new leader has to refresh every lease, which leads to the problem you just described. We can make a tradeoff by persisting the deadline of each long lease every X minutes (or every X minutes only if it has been refreshed by a keepalive).
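A minimal sketch of that tradeoff, assuming a simple in-memory model. `Lease`, `checkpoint`, `on_leader_elected`, `raft_proposals_per_second`, `LONG_LEASE_TTL` and `CHECKPOINT_INTERVAL` are all hypothetical names, not etcd's actual implementation, and a raft proposal is modeled as a plain counter. It shows the plain "every X minutes" variant, not the keepalive-triggered one. Long leases periodically persist their remaining TTL. After an election they resume from that checkpoint instead of getting a full TTL, so they can overrun by at most about one checkpoint interval. The lease counts in the load comparison are made-up example numbers.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical parameters mirroring the thread's "say, 5min" and "every X minutes".
LONG_LEASE_TTL = 5 * 60        # seconds; only leases at least this long are checkpointed
CHECKPOINT_INTERVAL = 5 * 60   # seconds between checkpoints ("X minutes")


@dataclass
class Lease:
    ttl: int                                        # granted TTL in seconds
    deadline: float                                 # expiry time tracked by the leader
    checkpointed_remaining: Optional[float] = None  # last remaining TTL persisted via raft


def checkpoint(leases: List[Lease], now: float) -> int:
    """Persist the remaining TTL of every long lease.

    Returns the number of raft proposals this costs (one per long lease).
    """
    proposals = 0
    for lease in leases:
        if lease.ttl >= LONG_LEASE_TTL:
            lease.checkpointed_remaining = max(0.0, lease.deadline - now)
            proposals += 1
    return proposals


def on_leader_elected(leases: List[Lease], now: float) -> None:
    """Recompute deadlines on the new leader.

    Checkpointed leases resume from their persisted remaining TTL, so they
    overrun by at most the time since the last checkpoint. Leases without a
    checkpoint keep the current behavior: a full TTL from now.
    """
    for lease in leases:
        if lease.checkpointed_remaining is not None:
            lease.deadline = now + lease.checkpointed_remaining
        else:
            lease.deadline = now + lease.ttl


def raft_proposals_per_second(n_short: int, short_keepalive_s: float,
                              n_long: int, long_keepalive_s: float,
                              persist_every_keepalive: bool) -> float:
    """Rough raft write load of the two strategies discussed above."""
    if persist_every_keepalive:
        # Every keepalive, for short and long leases alike, becomes a proposal.
        return n_short / short_keepalive_s + n_long / long_keepalive_s
    # Lazy checkpointing: only long leases, once per checkpoint interval.
    return n_long / CHECKPOINT_INTERVAL


if __name__ == "__main__":
    long_lease = Lease(ttl=3600, deadline=3600.0)  # granted at t=0, never renewed
    short_lease = Lease(ttl=10, deadline=1805.0)   # kept alive by its client (keepalives not modeled)
    checkpoint([long_lease, short_lease], now=1800)
    on_leader_elected([long_lease, short_lease], now=2000)
    print(long_lease.deadline)   # 3800.0: 200 s late, versus 5600.0 without a checkpoint
    print(short_lease.deadline)  # 2010
    print(raft_proposals_per_second(10_000, 1, 1_200, 60, True))   # 10020.0
    print(raft_proposals_per_second(10_000, 1, 1_200, 60, False))  # 4.0
```

Under these example numbers, persisting on every keepalive costs about 10,000 proposals per second, dominated by the short leases. Lazy checkpointing of long leases costs about 4 per second, at the price of a bounded overrun after each election.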
Got it. Thanks @xiang90!
+1 to what Xiang described - this sounds really reasonable.
Great, I'll write up a short "design doc" for how we plan to implement and circulate it for review.
@xiang90 Here's an approach that seems to work well and that I believe is in line with what was proposed above: https://github.com/coreos/etcd/pull/9924. Let me know if this differs from what you were thinking.