etcdserver: requested lease not found

Created on 28 Feb 2018 · 14 Comments · Source: etcd-io/etcd

Bug reporting

We use a 3-member etcd cluster (image quay.io/coreos/etcd:v3.2.0)
running in a Kubernetes cluster. The load is light: about 10k writes a day.

lease, err := client.Lease.Grant(myContext, int64(24 * time.Hour))
if err != nil {
    log.Error("Unable to get a lease from etcd", "error", err)
    return err
}
_, err = client.KV.Put(myContext, key, string(msg), clientv3.WithLease(lease.ID))
if err != nil {
    log.Error("Unable to put a record into etcd", "error", err) // <- it's logged here 
    return err
}

How is that even possible? We see this error a few times a day.
What is the best way to deal with it?


All 14 comments

Can you provide reproducible steps (locally)?

Not really. It's just a simple service that accepts records over gRPC and stores them in etcd for 24 hours.

If you could give an idiomatic example for this scenario, that would be nice.

And the main question: how is it possible that a successfully created lease can't be found?

@sh1ng Do you have etcd server logs when this happened?

@sh1ng etcd:v3.2.0? There was a restore bug fixed by 452628432673f2210b39967758ea83b165c207d9; try upgrading to the latest 3.2 release.

Even after upgrading to quay.io/coreos/etcd:v3.2.16, we still see the same error.

Is it possible that when the parent context has been canceled, etcd (or the etcd client) returns a revoked lease? Our service exposes a streaming API, and a client might cancel it from time to time.

@sh1ng Do you have server logs from when this happened?

int64(24 * time.Hour) is 86400000000000 and I think what you want here is 60 * 60 * 24 = 86400, because Grant() expects a TTL in seconds.

When a lease is created with a TTL of 86400000000000, it seems etcd loses the lease immediately. It is probably just due to an overflow: https://github.com/coreos/etcd/blob/master/lease/lessor.go#L599

We may want a maxLeaseTTL to avoid unexpected behavior caused by overflows.

@yudai Got it right.

It overflows and produces a negative expiry time, so the lease expires before the Put request arrives.

@yudai Do you want to send a fix?

@gyuho What value would you suggest for the max TTL? 10 years?
Or something like (math.MaxInt64 / time.Second) - someBuffer?

When the lessor is promoted, we usually extend leases by the election timeout (1 second by default), so maybe (math.MaxInt64 / time.Second) minus one minute's worth of seconds, to be safe?

@gyuho Thanks for the suggestion.
I chose 9,000,000,000 to make it very safe (a buffer of 223,372,036 seconds) and easier to document and remember.

The fix will be released in 3.2 and 3.3.

Thanks guys!
Now it works like a charm.
