Etcd: Fix flaky tests for 3.4 release

Created on 5 Aug 2019  路  11Comments  路  Source: etcd-io/etcd

Most helpful comment

Now, only functional tests are flaky. Adding server output on failures:

https://github.com/etcd-io/etcd/blob/158354755a563ccddaaec4a8f20522121e19572d/test#L225-L240

All 11 comments

Sent out upstream gRPC fixes

For

go test -v -run TestLeasingTxnOwnerIf

# or

=== RUN   TestLeasingNonOwnerPutError
{"level":"warn","ts":"2019-08-03T07:18:10.564Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-99ab9134-5b05-46e0-965d-601c54493c53/localhost:1341936467916325720","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: Error while dialing dial unix localhost:1341936467916325720: connect: no such file or directory\""}
{"level":"warn","ts":"2019-08-03T07:18:10.565Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-99ab9134-5b05-46e0-965d-601c54493c53/localhost:1341936467916325720","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x68 pc=0xbc42f5]

goroutine 26716 [running]:
go.etcd.io/etcd/vendor/google.golang.org/grpc.(*csAttempt).sendMsg(0xc00017e700, 0x146c6c0, 0xc000038580, 0xc000038588, 0x5, 0x5, 0xc000038590, 0xa, 0xa, 0xc000038590, ...)
    /go/src/go.etcd.io/etcd/vendor/google.golang.org/grpc/stream.go:827 +0x155
go.etcd.io/etcd/vendor/google.golang.org/grpc.(*clientStream).SendMsg.func2(0xc00017e700, 0x0, 0xc001cdca50)
    /go/src/go.etcd.io/etcd/vendor/google.golang.org/grpc/stream.go:692 +0x186
go.etcd.io/etcd/vendor/google.golang.org/grpc.(*clientStream).withRetry(0xc000392120, 0xc001cdca50, 0xc0012b8d38, 0x20c62f8, 0x0)
    /go/src/go.etcd.io/etcd/vendor/google.golang.org/grpc/stream.go:572 +0x43d
go.etcd.io/etcd/vendor/google.golang.org/grpc.(*clientStream).SendMsg(0xc000392120, 0x146c6c0, 0xc000038580, 0x0, 0x0)
    /go/src/go.etcd.io/etcd/vendor/google.golang.org/grpc/stream.go:698 +0x644
go.etcd.io/etcd/etcdserver/etcdserverpb.(*leaseLeaseKeepAliveClient).Send(0xc001cb2560, 0xc000038580, 0x0, 0x0)
    /go/src/go.etcd.io/etcd/etcdserver/etcdserverpb/rpc.pb.go:3828 +0x6b
go.etcd.io/etcd/clientv3.(*lessor).sendKeepAliveLoop(0xc001c68a00, 0x1768ae0, 0xc001cb2560)
    /go/src/go.etcd.io/etcd/clientv3/lease.go:573 +0x2d5
created by go.etcd.io/etcd/clientv3.(*lessor).resetRecv
    /go/src/go.etcd.io/etcd/clientv3/lease.go:489 +0x345
FAIL    go.etcd.io/etcd/clientv3/integration    136.425s
=== RUN   TestLeasingReconnectTxn
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x68 pc=0xbc42f5]

goroutine 32579 [running]:
go.etcd.io/etcd/vendor/google.golang.org/grpc.(*csAttempt).sendMsg(0xc0024e0980, 0x14974a0, 0xc001bc8b30, 0xc0017da388, 0x5, 0x5, 0xc0017da390, 0xb, 0xb, 0xc0017da390, ...)
    /go/src/go.etcd.io/etcd/vendor/google.golang.org/grpc/stream.go:827 +0x155
go.etcd.io/etcd/vendor/google.golang.org/grpc.(*clientStream).SendMsg.func2(0xc0024e0980, 0x0, 0xc0009dcbe0)
    /go/src/go.etcd.io/etcd/vendor/google.golang.org/grpc/stream.go:692 +0x186
go.etcd.io/etcd/vendor/google.golang.org/grpc.(*clientStream).withRetry(0xc001ac4000, 0xc0009dcbe0, 0xc0000c5aa8, 0x20c82f8, 0x0)
    /go/src/go.etcd.io/etcd/vendor/google.golang.org/grpc/stream.go:572 +0x43d
go.etcd.io/etcd/vendor/google.golang.org/grpc.(*clientStream).SendMsg(0xc001ac4000, 0x14974a0, 0xc001bc8b30, 0x0, 0x0)
    /go/src/go.etcd.io/etcd/vendor/google.golang.org/grpc/stream.go:698 +0x644
go.etcd.io/etcd/etcdserver/etcdserverpb.(*watchWatchClient).Send(0xc001bc8b10, 0xc001bc8b30, 0x0, 0x20c82f8)
    /go/src/go.etcd.io/etcd/etcdserver/etcdserverpb/rpc.pb.go:3697 +0x6b
go.etcd.io/etcd/clientv3.(*watchGrpcStream).run(0xc000152750)
    /go/src/go.etcd.io/etcd/clientv3/watch.go:547 +0xd9b
created by go.etcd.io/etcd/clientv3.(*watcher).newWatcherGrpcStream
    /go/src/go.etcd.io/etcd/clientv3/watch.go:277 +0x58c
FAIL    go.etcd.io/etcd/clientv3/integration    162.826s
=== RUN   TestLeasingNonOwnerPutError
{"level":"warn","ts":"2019-08-04T02:13:05.300Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-1e17e7fb-c8b4-43a1-9d89-79ca429efaef/localhost:51782595789236259660","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: Error while dialing dial unix localhost:51782595789236259660: connect: no such file or directory\""}
{"level":"warn","ts":"2019-08-04T02:13:05.300Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-1e17e7fb-c8b4-43a1-9d89-79ca429efaef/localhost:51782595789236259660","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
[DEBUG] about to panic! true
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xbc1cb2]
goroutine 27073 [running]:
go.etcd.io/etcd/vendor/google.golang.org/grpc/internal/transport.(*Stream).Context(...)
    /go/src/go.etcd.io/etcd/vendor/google.golang.org/grpc/internal/transport/transport.go:386
go.etcd.io/etcd/vendor/google.golang.org/grpc.(*clientStream).Context(0xc0017a4480, 0xc001a7d080, 0x7f8cbd54b6)
    /go/src/go.etcd.io/etcd/vendor/google.golang.org/grpc/stream.go:565 +0x82
go.etcd.io/etcd/clientv3.(*lessor).sendKeepAliveLoop(0xc0000efb80, 0x1769b40, 0xc001a09d70)
    /go/src/go.etcd.io/etcd/clientv3/lease.go:581 +0x349
created by go.etcd.io/etcd/clientv3.(*lessor).resetRecv
    /go/src/go.etcd.io/etcd/clientv3/lease.go:489 +0x345
FAIL    go.etcd.io/etcd/clientv3/integration    142.556s

Right now, we are debugging with gRPC tip https://github.com/grpc/grpc-go/pull/2958, which should fix the panic.

Now, only functional tests are flaky. Adding server output on failures:

https://github.com/etcd-io/etcd/blob/158354755a563ccddaaec4a8f20522121e19572d/test#L225-L240

Great. So we need grpc-go 1.23 to fix the panics?

Right now, we are debugging with gRPC tip grpc/grpc-go#2958, which should fix the panic.

Is it too late to get that into 1.23?

https://github.com/grpc/grpc-go/pull/2958#issuecomment-519207849

@jingyih @hexfusion gRPC usually releases from master branch cut. It should be included in 1.23.

I also saw TestRoundRobinBalancedResolvableFailoverFromRequestFail fail again in https://travis-ci.com/etcd-io/etcd/jobs/223131529

Also this failure (a functional test?) https://travis-ci.com/etcd-io/etcd/jobs/223267755

[...]
{"level":"fatal","ts":"2019-08-07T17:32:14.954Z","caller":"etcdmain/etcd.go:271","msg":"discovery failed","error":"cannot fetch cluster info from peer urls: could not retrieve cluster information from the given URLs","stacktrace":"go.etcd.io/etcd/etcdmain.startEtcdOrProxyV2\n\t/go/src/go.etcd.io/etcd/etcdmain/etcd.go:271\ngo.etcd.io/etcd/etcdmain.Main\n\t/go/src/go.etcd.io/etcd/etcdmain/main.go:46\nmain.main\n\t/go/src/go.etcd.io/etcd/main.go:28\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:200"}

./test: line 199:  8844 Terminated              ./bin/etcd-agent --network tcp --address 127.0.0.1:${a}9027

FAILED! 'tail -1000 /tmp/etcd-functional-2/etcd.log'

tail: cannot open '/tmp/etcd-functional-2/etcd.log' for reading: No such file or directory

@tbg The functional test failures should be fixed by https://github.com/etcd-io/etcd/pull/11006.

Regarding TestRoundRobinBalancedResolvableFailoverFromRequestFail, @jpbetz any idea?

One more:

--- FAIL: TestSTMSerialize (0.28s)

    v3_stm_test.go:204: etcdserver: leader changed

    v3_stm_test.go:204: etcdserver: leader changed

Haven't seen an issue about this one, but also I don't think it's caused by my PR:

https://travis-ci.com/etcd-io/etcd/jobs/223465586

Closing this for now. Most notable ones are fixed.

Was this page helpful?
0 / 5 - 0 ratings