Continued from https://github.com/etcd-io/etcd/issues/10700
Sent out upstream gRPC fixes
For
go test -v -run TestLeasingTxnOwnerIf
# or
=== RUN TestLeasingNonOwnerPutError
{"level":"warn","ts":"2019-08-03T07:18:10.564Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-99ab9134-5b05-46e0-965d-601c54493c53/localhost:1341936467916325720","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: Error while dialing dial unix localhost:1341936467916325720: connect: no such file or directory\""}
{"level":"warn","ts":"2019-08-03T07:18:10.565Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-99ab9134-5b05-46e0-965d-601c54493c53/localhost:1341936467916325720","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x68 pc=0xbc42f5]
goroutine 26716 [running]:
go.etcd.io/etcd/vendor/google.golang.org/grpc.(*csAttempt).sendMsg(0xc00017e700, 0x146c6c0, 0xc000038580, 0xc000038588, 0x5, 0x5, 0xc000038590, 0xa, 0xa, 0xc000038590, ...)
/go/src/go.etcd.io/etcd/vendor/google.golang.org/grpc/stream.go:827 +0x155
go.etcd.io/etcd/vendor/google.golang.org/grpc.(*clientStream).SendMsg.func2(0xc00017e700, 0x0, 0xc001cdca50)
/go/src/go.etcd.io/etcd/vendor/google.golang.org/grpc/stream.go:692 +0x186
go.etcd.io/etcd/vendor/google.golang.org/grpc.(*clientStream).withRetry(0xc000392120, 0xc001cdca50, 0xc0012b8d38, 0x20c62f8, 0x0)
/go/src/go.etcd.io/etcd/vendor/google.golang.org/grpc/stream.go:572 +0x43d
go.etcd.io/etcd/vendor/google.golang.org/grpc.(*clientStream).SendMsg(0xc000392120, 0x146c6c0, 0xc000038580, 0x0, 0x0)
/go/src/go.etcd.io/etcd/vendor/google.golang.org/grpc/stream.go:698 +0x644
go.etcd.io/etcd/etcdserver/etcdserverpb.(*leaseLeaseKeepAliveClient).Send(0xc001cb2560, 0xc000038580, 0x0, 0x0)
/go/src/go.etcd.io/etcd/etcdserver/etcdserverpb/rpc.pb.go:3828 +0x6b
go.etcd.io/etcd/clientv3.(*lessor).sendKeepAliveLoop(0xc001c68a00, 0x1768ae0, 0xc001cb2560)
/go/src/go.etcd.io/etcd/clientv3/lease.go:573 +0x2d5
created by go.etcd.io/etcd/clientv3.(*lessor).resetRecv
/go/src/go.etcd.io/etcd/clientv3/lease.go:489 +0x345
FAIL go.etcd.io/etcd/clientv3/integration 136.425s
=== RUN TestLeasingReconnectTxn
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x68 pc=0xbc42f5]
goroutine 32579 [running]:
go.etcd.io/etcd/vendor/google.golang.org/grpc.(*csAttempt).sendMsg(0xc0024e0980, 0x14974a0, 0xc001bc8b30, 0xc0017da388, 0x5, 0x5, 0xc0017da390, 0xb, 0xb, 0xc0017da390, ...)
/go/src/go.etcd.io/etcd/vendor/google.golang.org/grpc/stream.go:827 +0x155
go.etcd.io/etcd/vendor/google.golang.org/grpc.(*clientStream).SendMsg.func2(0xc0024e0980, 0x0, 0xc0009dcbe0)
/go/src/go.etcd.io/etcd/vendor/google.golang.org/grpc/stream.go:692 +0x186
go.etcd.io/etcd/vendor/google.golang.org/grpc.(*clientStream).withRetry(0xc001ac4000, 0xc0009dcbe0, 0xc0000c5aa8, 0x20c82f8, 0x0)
/go/src/go.etcd.io/etcd/vendor/google.golang.org/grpc/stream.go:572 +0x43d
go.etcd.io/etcd/vendor/google.golang.org/grpc.(*clientStream).SendMsg(0xc001ac4000, 0x14974a0, 0xc001bc8b30, 0x0, 0x0)
/go/src/go.etcd.io/etcd/vendor/google.golang.org/grpc/stream.go:698 +0x644
go.etcd.io/etcd/etcdserver/etcdserverpb.(*watchWatchClient).Send(0xc001bc8b10, 0xc001bc8b30, 0x0, 0x20c82f8)
/go/src/go.etcd.io/etcd/etcdserver/etcdserverpb/rpc.pb.go:3697 +0x6b
go.etcd.io/etcd/clientv3.(*watchGrpcStream).run(0xc000152750)
/go/src/go.etcd.io/etcd/clientv3/watch.go:547 +0xd9b
created by go.etcd.io/etcd/clientv3.(*watcher).newWatcherGrpcStream
/go/src/go.etcd.io/etcd/clientv3/watch.go:277 +0x58c
FAIL go.etcd.io/etcd/clientv3/integration 162.826s
=== RUN TestLeasingNonOwnerPutError
{"level":"warn","ts":"2019-08-04T02:13:05.300Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-1e17e7fb-c8b4-43a1-9d89-79ca429efaef/localhost:51782595789236259660","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: Error while dialing dial unix localhost:51782595789236259660: connect: no such file or directory\""}
{"level":"warn","ts":"2019-08-04T02:13:05.300Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-1e17e7fb-c8b4-43a1-9d89-79ca429efaef/localhost:51782595789236259660","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
[DEBUG] about to panic! true
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xbc1cb2]
goroutine 27073 [running]:
go.etcd.io/etcd/vendor/google.golang.org/grpc/internal/transport.(*Stream).Context(...)
/go/src/go.etcd.io/etcd/vendor/google.golang.org/grpc/internal/transport/transport.go:386
go.etcd.io/etcd/vendor/google.golang.org/grpc.(*clientStream).Context(0xc0017a4480, 0xc001a7d080, 0x7f8cbd54b6)
/go/src/go.etcd.io/etcd/vendor/google.golang.org/grpc/stream.go:565 +0x82
go.etcd.io/etcd/clientv3.(*lessor).sendKeepAliveLoop(0xc0000efb80, 0x1769b40, 0xc001a09d70)
/go/src/go.etcd.io/etcd/clientv3/lease.go:581 +0x349
created by go.etcd.io/etcd/clientv3.(*lessor).resetRecv
/go/src/go.etcd.io/etcd/clientv3/lease.go:489 +0x345
FAIL go.etcd.io/etcd/clientv3/integration 142.556s
Right now, we are debugging with gRPC tip https://github.com/grpc/grpc-go/pull/2958, which should fix the panic.
Now, only functional tests are flaky. Adding server output on failures:
https://github.com/etcd-io/etcd/blob/158354755a563ccddaaec4a8f20522121e19572d/test#L225-L240
Great. So we need grpc-go 1.23 to fix the panics?
Right now, we are debugging with gRPC tip grpc/grpc-go#2958, which should fix the panic.
Is it too late to get that into 1.23?
https://github.com/grpc/grpc-go/pull/2958#issuecomment-519207849
@jingyih @hexfusion gRPC usually releases from master branch cut. It should be included in 1.23.
I also saw TestRoundRobinBalancedResolvableFailoverFromRequestFail fail again in https://travis-ci.com/etcd-io/etcd/jobs/223131529
Also this failure (a functional test?) https://travis-ci.com/etcd-io/etcd/jobs/223267755
[...]
{"level":"fatal","ts":"2019-08-07T17:32:14.954Z","caller":"etcdmain/etcd.go:271","msg":"discovery failed","error":"cannot fetch cluster info from peer urls: could not retrieve cluster information from the given URLs","stacktrace":"go.etcd.io/etcd/etcdmain.startEtcdOrProxyV2\n\t/go/src/go.etcd.io/etcd/etcdmain/etcd.go:271\ngo.etcd.io/etcd/etcdmain.Main\n\t/go/src/go.etcd.io/etcd/etcdmain/main.go:46\nmain.main\n\t/go/src/go.etcd.io/etcd/main.go:28\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:200"}
./test: line 199: 8844 Terminated ./bin/etcd-agent --network tcp --address 127.0.0.1:${a}9027
FAILED! 'tail -1000 /tmp/etcd-functional-2/etcd.log'
tail: cannot open '/tmp/etcd-functional-2/etcd.log' for reading: No such file or directory
@tbg The functional test failures should be fixed by https://github.com/etcd-io/etcd/pull/11006.
Regarding TestRoundRobinBalancedResolvableFailoverFromRequestFail, @jpbetz any idea?
One more:
--- FAIL: TestSTMSerialize (0.28s)
v3_stm_test.go:204: etcdserver: leader changed
v3_stm_test.go:204: etcdserver: leader changed
Haven't seen an issue about this one, but also I don't think it's caused by my PR:
Closing this for now. Most notable ones are fixed.
Most helpful comment
Now, only functional tests are flaky. Adding server output on failures:
https://github.com/etcd-io/etcd/blob/158354755a563ccddaaec4a8f20522121e19572d/test#L225-L240