With clientv3 on the master branch (7648841bad1ef56d83e568ae006f0bb2dc9e917a), clientv3.New() won't return any error even when no endpoint is available to connect.
The following reproduction code doesn't panic with master, while the same code with v3.3.x panics.
conf := clientv3.Config{
Endpoints: []string{"invalid_endpoint:2379"},
DialTimeout: 10 * time.Second,
}
client, err := clientv3.New(conf)
if err != nil {
panic("connection failed")
}
I assume this behavior change has been introduced along with the new load balancer. It makes sense that the load balancer internally handles connection errors and hide them from users, because the load balancer is responsible for endpoint selection and reconnection.
(I myself is fine with this behavior, since following requests will fail when no endpoint is available and we can detect the problem at that timing)
However, this looks a breaking change for users. I assume some code has different error handlers when no endpoint is available at the startup of applications (like just die instead of retry).
I therefore think that at least this change is mentioned in the CHANGELOG as a breaking change.
@yudai Yeah, thanks for pointing out. We may have overlooked that change. Now balancer has internal routine for endpoint resolver, that could delay initial connection start. For now, you may need DialOptions: []grpc.DialOption{grpc.WithBlock()},. Will improve docs and change logs. /cc @jpbetz
grpc.WithBlock() is going to be required for sure. Should we set that by default to attempt to preserve 3.3- behavior?
That said, I'm not certain the right thing happens even with that option set.
Confirmed DialOptions: []grpc.DialOption{grpc.WithBlock()}, would return errors like old client, but without this dial option, connect happens in background returning the client object before initial connection.
We plan to document this clearly (ref https://github.com/coreos/etcd/issues/9829). Also, need more research about enabling it by default (since you cannot disable WithBlock once configured).
Using DialOptions: []grpc.DialOption{grpc.WithBlock()}, in my code leads to the following compile error: cannot use []"google.golang.org/grpc".DialOption literal (type []"google.golang.org/grpc".DialOption) as type []"go.etcd.io/etcd/vendor/google.golang.org/grpc".DialOption in field value (note: I'm a Go beginner)
@philippgille Looks like you've hit an issue with go dependency management in your project. In particular, it looks like a vendoring issue since google.golang.org/grpc.DialOption is the correct type for DialOption but I see it referenced in the compile error as go.etcd.io/etcd/vendor/google.golang.org/grpc.DialOption. Not knowing your you're doing go dependency management, I can't say much more, but ensure you always importing with the name google.golang.org/grpc and then review how your managing your dependencies if that doesn't fix it.
@philippgille, based on this comment, I solved the problem like this:
clientConfig := clientv3.Config{
Endpoints: []string{"http://" + *etcdHost},
DialTimeout: 2 * time.Second,
}
etcd, err := clientv3.New(clientConfig)
if err != nil {
return nil, errors.Wrap(err, "could not connect to etcd")
}
timeoutCtx, cancel := context.WithTimeout(context.Background(), 2 * time.Second)
defer cancel()
_, err = etcd.Status(timeoutCtx, clientConfig.Endpoints[0])
if err != nil {
return nil, errors.Wrapf(err, "error checking etcd status: %v", err)
}