Also see: https://github.com/grpc/grpc-go/issues/1741#issuecomment-366964806
cc: @gyuho
I'm using etcd embedded in https://github.com/purpleidea/mgmt and am trying to get Unix domain sockets working to create a single-server 'cluster' with no external port open.
I need to do some more extensive testing, but functionally everything seems to work fine in my case, except that I get this warning:
WARNING: 2018/02/20 13:24:57 grpc: addrConn.createTransport failed to connect to {clients.sock:0 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp: lookup clients.sock: no such host". Reconnecting...
Could it just be related to the clustering parts that are not needed or used when running a single instance?
I've boiled the issue down to a part of grpc where a "tcp" dialer is created if none is provided: https://github.com/grpc/grpc-go/blob/3926816d541db48f3e4c1c87cff75ceeb205309e/clientconn.go#L438
Changing this to "unix" seems to stop the warnings; however, I have not fully tested etcd functionality.
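To illustrate why the warning mentions a host lookup, here is a minimal sketch using only the Go standard library. `splitEndpoint` is a hypothetical helper, not part of etcd or grpc-go; it only shows how the endpoint from the warning splits into scheme, host and port. The assumption is that once the `unix` scheme is dropped and `clients.sock:0` is handed to a "tcp" dialer, `clients.sock` is treated as a hostname to resolve.

```go
package main

import (
	"fmt"
	"net/url"
)

// splitEndpoint is a hypothetical helper (not part of etcd or grpc-go) that
// breaks an endpoint URL into its scheme, host and port.
func splitEndpoint(ep string) (scheme, host, port string, err error) {
	u, err := url.Parse(ep)
	if err != nil {
		return "", "", "", err
	}
	return u.Scheme, u.Hostname(), u.Port(), nil
}

func main() {
	scheme, host, port, err := splitEndpoint("unix://clients.sock:0")
	if err != nil {
		panic(err)
	}
	// Without the scheme, a "tcp" dialer only sees "clients.sock:0" and
	// would try to resolve "clients.sock" as a hostname.
	fmt.Printf("scheme=%s host=%s port=%s\n", scheme, host, port)
}
```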
As suggested by @menghanl (https://github.com/grpc/grpc-go/issues/1741#issuecomment-367082157), I tried having etcd pass a full URL to grpc and letting grpc do the additional parsing, but that has not yet solved the problem. This may be due to my inexperience with the components involved.
I also noticed that the warning originates from multiple entry points in this case:
https://github.com/coreos/etcd/blob/master/clientv3/client.go#L352 and https://github.com/coreos/etcd/blob/master/embed/serve.go#L195
I'll continue investigating with the mgmt embedded etcd, as I'm currently not using etcd outside this context and so don't have any other functional reference in that regard.
@gyuho I have some extra cycles Thursday. If you are not too deep into this already, I can assist or take this on.
@hexfusion Sure go ahead. Thanks!
@aequitas would you mind sharing your mgmt startup flags so I can reproduce this and see the whole picture? Going to dig into this now, thanks.
@hexfusion sure, sorry for the late response, thanks for helping out on this.
You might want to run this branch (https://github.com/aequitas/mgmt/tree/unix-domain-sockets) instead of the current master. Both should show the warning message, but on my branch mgmt doesn't default to TCP for client connections, so it might expose additional problem paths.
./mgmt run --client-urls unix://clients.sock:0 --server-urls unix://servers.sock:0 --tmp-prefix
Running 'normal' etcd also exposes the warning:
etcd --listen-peer-urls unix://etcd:0 --listen-client-urls unix://etcd:1 --advertise-client-urls unix://etcd:1
$ etcd --version
etcd Version: 3.3.1
Git SHA: GitNotFound
Go Version: go1.9.4
Go OS/Arch: darwin/amd64
@aequitas I really think you hit the nail on the head with your assessment. I proposed an upstream patch to grpc-go allowing for unix proto support via DialContext, plus a minor patch to etcd. With these in place, I am able to have both peer and client work over Unix sockets. We will see where this goes from here, but please test when you get a chance. On a side note, mgmt is really cool; I hope to start using it :).
@hexfusion I think so too! ;) Thanks for your help, and please get involved! We've got a number of tagged open issues for new users who want to explore the project. Join us in #mgmtconfig on Freenode IRC.
@hexfusion awesome, thanks for your effort.
Do you know what functionality is broken by this, apart from the warning being shown? My guess is clustering, as that is the only part I have not tested yet.
Looking forward to your patches being merged. For me this is/was one of the blockers to using mgmt in production.
I'll update soon with testing results.
@hexfusion the short-term result (using: https://github.com/purpleidea/mgmt/pull/343/commits/75f323e95ff4d54db9c781310ea0419600e9ebed):
$ ./mgmt
panic: http: multiple registrations for /debug/requests
goroutine 1 [running]:
net/http.(*ServeMux).Handle(0x61c6480, 0x558bad9, 0xf, 0x6137720, 0x55e4ff0)
/Users/johan/.brew/Cellar/go/1.9.4/libexec/src/net/http/server.go:2270 +0x627
net/http.(*ServeMux).HandleFunc(0x61c6480, 0x558bad9, 0xf, 0x55e4ff0)
/Users/johan/.brew/Cellar/go/1.9.4/libexec/src/net/http/server.go:2302 +0x55
net/http.HandleFunc(0x558bad9, 0xf, 0x55e4ff0)
/Users/johan/.brew/Cellar/go/1.9.4/libexec/src/net/http/server.go:2314 +0x4b
golang.org/x/net/trace.init.0()
/Users/johan/.gopath/src/golang.org/x/net/trace/trace.go:115 +0x42
golang.org/x/net/trace.init()
<autogenerated>:1 +0x1cd
github.com/purpleidea/mgmt/vendor/google.golang.org/grpc.init()
<autogenerated>:1 +0xa0
github.com/purpleidea/mgmt/etcd.init()
<autogenerated>:1 +0xbe
github.com/purpleidea/mgmt/lib.init()
<autogenerated>:1 +0x7f
main.init()
<autogenerated>:1 +0x4e
This is probably due to the grpc imports in mgmt itself; commenting those out 'solved' the problem. But I have to learn a bit more about Go import mechanics before drawing proper conclusions. I also don't know whether the method I used for testing (changing the submodule heads in vendor/ to your branches) is the best approach.
@aequitas thanks for the report, I will dig in this evening. Maybe this patch will be useful for testing in the short term.
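For reference, the panic itself can be reproduced with the standard library alone. The trace shows golang.org/x/net/trace being initialised from the GOPATH while grpc comes from mgmt's vendor/ directory, so one plausible explanation (an assumption, not confirmed in this thread) is that two copies of the trace package each register /debug/requests on the default mux. The sketch below uses a private mux and a hypothetical helper, `registerTwice`, to show that a second registration of the same pattern panics.

```go
package main

import (
	"fmt"
	"net/http"
)

// registerTwice is a hypothetical helper: it registers the same pattern
// twice on a private ServeMux and reports whether the second call panicked.
func registerTwice(pattern string) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	mux := http.NewServeMux()
	handler := func(w http.ResponseWriter, r *http.Request) {}
	mux.HandleFunc(pattern, handler)
	mux.HandleFunc(pattern, handler) // duplicate pattern: this call panics
	return false
}

func main() {
	fmt.Println("second registration panicked:", registerTwice("/debug/requests"))
}
```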
diff --git a/vendor/google.golang.org/grpc/clientconn.go b/vendor/google.golang.org/grpc/clientconn.go
index 71de2e50d..1a747909c 100644
--- a/vendor/google.golang.org/grpc/clientconn.go
+++ b/vendor/google.golang.org/grpc/clientconn.go
@@ -378,7 +378,12 @@ func DialContext(ctx context.Context, target string, opts ...DialOption) (conn *
 	if cc.dopts.copts.Dialer == nil {
 		cc.dopts.copts.Dialer = newProxyDialer(
 			func(ctx context.Context, addr string) (net.Conn, error) {
-				return (&net.Dialer{}).DialContext(ctx, "tcp", addr)
+				proto := "tcp"
+				t := parseTarget(addr)
+				if t.Scheme == "unix" {
+					proto = t.Scheme
+				}
+				return (&net.Dialer{}).DialContext(ctx, proto, t.Endpoint)
 			},
 		)
 	}
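The same idea can be tried outside grpc with a standalone sketch. Everything here is illustrative: `splitTarget` is a hypothetical stand-in for the `parseTarget` call in the patch, `dialTarget` mirrors the patched dialer, and `roundTrip` is a throwaway echo test over a temporary Unix socket. It is not the grpc-go implementation, only a way to check that choosing the network from the scheme lets a `unix://` target connect.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"os"
	"path/filepath"
	"strings"
)

// splitTarget is a hypothetical stand-in for parseTarget in the patch: it
// returns "unix" plus the socket path for unix:// targets and falls back to
// "tcp" with the target unchanged otherwise.
func splitTarget(target string) (network, addr string) {
	if strings.HasPrefix(target, "unix://") {
		return "unix", strings.TrimPrefix(target, "unix://")
	}
	return "tcp", target
}

// dialTarget picks the network from the target's scheme before dialing.
func dialTarget(ctx context.Context, target string) (net.Conn, error) {
	network, addr := splitTarget(target)
	return (&net.Dialer{}).DialContext(ctx, network, addr)
}

// roundTrip starts an echo server on a Unix socket in a temporary directory,
// dials it through dialTarget and returns the echoed message.
func roundTrip(msg string) (string, error) {
	dir, err := os.MkdirTemp("", "uds")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	sock := filepath.Join(dir, "clients.sock")
	ln, err := net.Listen("unix", sock)
	if err != nil {
		return "", err
	}
	defer ln.Close()

	go func() {
		conn, err := ln.Accept()
		if err != nil {
			return
		}
		defer conn.Close()
		io.Copy(conn, conn) // echo everything back to the client
	}()

	conn, err := dialTarget(context.Background(), "unix://"+sock)
	if err != nil {
		return "", err
	}
	defer conn.Close()

	if _, err := conn.Write([]byte(msg)); err != nil {
		return "", err
	}
	buf := make([]byte, len(msg))
	if _, err := io.ReadFull(conn, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

func main() {
	got, err := roundTrip("ping")
	fmt.Println(got, err)
}
```

Targets without a `unix://` prefix still go through the "tcp" path unchanged, which matches the fallback behaviour in the diff.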
@hexfusion I applied both your patches manually and everything works as expected :). I will deploy this to my 'production' installation and do some real-world testing, but for now everything looks good.
@hexfusion At the moment I've stopped development on mgmt so my urgency for this change is gone. If however you need something tested let me know and I'll see if I can help.
@aequitas understood. As the issue is one of general support I would like to leave this ticket open for tracking, thanks!
Fixed by #9354
/cc @aequitas ^^