Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (go version)?
go1.7rc6 from Docker Hub:
$ docker run -ti golang:1.7 /bin/sh
# go version
go version go1.7rc6 linux/amd64
What operating system and processor architecture are you using (go env)?
Minikube - x86 VirtualBox VM on a Mac
# go env
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/go"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build828064041=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
Compiled Weave Scope with go1.7 (see https://github.com/weaveworks/scope/pull/1797), ran it on a minikube instance, and it couldn't connect to a NATS instance:
$ kubectl logs --namespace=scope query-1106217792-nv4ar
<app> INFO: 2016/08/16 14:04:32.999799 app starting, version 919c3be, ID 357f1fafb9b6ed6f
<app> INFO: 2016/08/16 14:04:33.000030 command line args: --app.collector=dynamodb://abc:[email protected]:8000/reports --app.collector.s3=s3://abc:[email protected]:4569/s3 --app.http.address=:80 --app.log.http=true --app.memcached.hostname=memcached.scope.svc.cluster.local --app.memcached.service=memcached --app.memcached.timeout=100ms --app.nats=nats://nats.scope.svc.cluster.local:4222 --app.userid.header=X-Scope-OrgID --logtostderr=true --mode=app --weave=false
<app> INFO: 2016/08/16 14:04:33.000514 Connecting nats to nats://nats.scope.svc.cluster.local:4222
2016/08/16 14:04:33 nats err: dial tcp: no suitable address found
<app> FATA: 2016/08/16 14:04:33.000747 Error creating collector: nats: no servers available for connection
The "nats: no servers available for connection" comes from the NATS client, which hides the original error. The "dial tcp: no suitable address found" line is from logging I added to the client to show the underlying error.
Running kubectl exec into a pod on the same machine shows NATS is indeed accessible:
$ kubectl exec -ti --namespace=scope pipe-4267260430-ci7f0 /bin/sh
/home/weave # nslookup nats.scope.svc.cluster.local
Name: nats.scope.svc.cluster.local
Address 1: 10.0.0.132
# telnet nats.scope.svc.cluster.local 4222
INFO {"server_id":"T452ED9wLfSbSr3lPntysO","version":"0.8.0","go":"go1.6.2","host":"0.0.0.0","port":4222,"auth_required":false,"ssl_required":false,"tls_required":false,"tls_verify":false,"max_payload":1048576}
-ERR 'Unknown Protocol Operation'
-ERR 'Parser Error'
Connection closed by foreign host
# apk add drill
(1/2) Installing ldns (1.6.17-r3)
(2/2) Installing drill (1.6.17-r3)
Executing busybox-1.24.2-r0.trigger
OK: 25 MiB in 45 packages
/home/weave # drill nats.scope.svc.cluster.local
;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 53508
;; flags: qr aa rd ra ; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;; nats.scope.svc.cluster.local. IN A
;; ANSWER SECTION:
nats.scope.svc.cluster.local. 30 IN A 10.0.0.132
;; AUTHORITY SECTION:
;; ADDITIONAL SECTION:
;; Query time: 1 msec
;; SERVER: 10.0.0.10
;; WHEN: Tue Aug 16 14:14:29 2016
;; MSG SIZE rcvd: 62
/home/weave # exit
I expect it to connect and work, as it does with go1.6
What does running with environment variable GODEBUG=netdns=1 say?
I suspect your Go 1.6 vs Go 1.7 differ in how they were built (cgo/netgo/etc), because I don't think anything changed in this area during Go 1.7.
/cc @mdempsky
I suspect your Go 1.6 vs Go 1.7 differ in how they were built (cgo/netgo/etc), because I don't think anything changed in this area during Go 1.7.
Possibly; although the only change I made was to use a different image (1.7 vs 1.6.2).
What does running with environment variable GODEBUG=netdns=1 say?
Will have a go now. If this doesn't help, I can try and provide a minimal reproduction tomorrow.
What does running with environment variable GODEBUG=netdns=1 say?
With go1.7rc6:
$ kubectl logs --namespace=scope query-846243711-ptwwy
<app> INFO: 2016/08/16 17:06:25.596993 app starting, version 919c3be, ID 336f06fd3b5d2bf3
<app> INFO: 2016/08/16 17:06:25.597260 command line args: --app.collector=dynamodb://abc:[email protected]:8000/reports --app.collector.s3=s3://abc:[email protected]:4569/s3 --app.http.address=:80 --app.log.http=true --app.memcached.hostname=memcached.scope.svc.cluster.local --app.memcached.service=memcached --app.memcached.timeout=100ms --app.nats=nats://nats.scope.svc.cluster.local:4222 --app.userid.header=X-Scope-OrgID --logtostderr=true --mode=app --weave=false
go package net: built with netgo build tag; using Go's DNS resolver
<app> INFO: 2016/08/16 17:06:25.597506 Connecting nats to nats://nats.scope.svc.cluster.local:4222
2016/08/16 17:06:25 nats: nats://nats.scope.svc.cluster.local:4222
2016/08/16 17:06:25 foo: nats.scope.svc.cluster.local:4222
2016/08/16 17:06:25 nats err: dial tcp: no suitable address found
<app> FATA: 2016/08/16 17:06:25.597841 Error creating collector: nats: no servers available for connection
Can you compare that with Go 1.6?
For your Go 1.7 output, I see:
go package net: built with netgo build tag; using Go's DNS resolver
I think the relevant change in Go 1.7 was 72c11808 (net: don't do DNS for onion and local addresses) for #13705.
When @mikioh mentioned RFC 6762, which says:
This document specifies that the DNS top-level domain ".local." is a
special domain with special semantics, namely that any fully
qualified name ending in ".local." is link-local, and names within
this domain are meaningful only on the link where they originate.
This is analogous to IPv4 addresses in the 169.254/16 prefix or IPv6
addresses in the FE80::/10 prefix, which are link-local and
meaningful only on the link where they originate.
Any DNS query for a name ending with ".local." MUST be sent to the
mDNS IPv4 link-local multicast address 224.0.0.251 (or its IPv6
equivalent FF02::FB). The design rationale for using a fixed
multicast address instead of selecting from a range of multicast
addresses using a hash function is discussed in Appendix B.
Implementers MAY choose to look up such names concurrently via other
mechanisms (e.g., Unicast DNS) and coalesce the results in some
fashion. Implementers choosing to do this should be aware of the
potential for user confusion when a given name can produce different
results depending on external network conditions (such as, but not
limited to, which name lookup mechanism responds faster).
I missed this part:
Implementers MAY choose to look up such names concurrently via other
mechanisms (e.g., Unicast DNS) and coalesce the results in some
fashion.
And currently we're just always skipping DNS for *.local addresses:
// avoidDNS reports whether this is a hostname for which we should not
// use DNS. Currently this includes only .onion and .local names,
// per RFC 7686 and RFC 6762, respectively. See golang.org/issue/13705.
func avoidDNS(name string) bool {
	if name == "" {
		return true
	}
	if name[len(name)-1] == '.' {
		name = name[:len(name)-1]
	}
	return stringsHasSuffixFold(name, ".onion") || stringsHasSuffixFold(name, ".local")
}
We should probably relax the *.local case, at least for the netgo case, but maybe in all cases.
I doubt it's relevant, but what is your /etc/resolv.conf and /etc/nsswitch.conf?
/cc @mikioh @mdempsky @ianlancetaylor for any opinions and whether this is Go 1.7.1 worthy.
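To make the effect of the avoidDNS check quoted above concrete, here is a standalone sketch that puts the Go 1.7 behaviour next to one possible relaxed variant. It is an illustration only, not the actual change: `hasSuffixFold`, `avoidDNSStrict` and `avoidDNSRelaxed` are names made up for this example, and `hasSuffixFold` merely stands in for the unexported `stringsHasSuffixFold` helper in package net.

```go
package main

import (
	"fmt"
	"strings"
)

// hasSuffixFold reports whether s ends in suffix, ignoring case.
// It is a stand-in for the unexported helper used inside package net.
func hasSuffixFold(s, suffix string) bool {
	return len(s) >= len(suffix) && strings.EqualFold(s[len(s)-len(suffix):], suffix)
}

// avoidDNSStrict mirrors the check quoted above: both .onion and .local
// names are kept away from DNS.
func avoidDNSStrict(name string) bool {
	if name == "" {
		return true
	}
	name = strings.TrimSuffix(name, ".")
	return hasSuffixFold(name, ".onion") || hasSuffixFold(name, ".local")
}

// avoidDNSRelaxed is a hypothetical relaxed variant: only .onion names
// are kept away from DNS, so .local names reach the configured servers.
func avoidDNSRelaxed(name string) bool {
	if name == "" {
		return true
	}
	name = strings.TrimSuffix(name, ".")
	return hasSuffixFold(name, ".onion")
}

func main() {
	names := []string{"nats.scope.svc.cluster.local", "example.onion.", "golang.org"}
	for _, name := range names {
		fmt.Printf("%-30s strict=%v relaxed=%v\n", name, avoidDNSStrict(name), avoidDNSRelaxed(name))
	}
}
```

With the strict check, the Kubernetes service name from this report is never sent to the nameserver, which would match the "no suitable address found" error seen above.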
I am definitely for relaxing the *.local case. We use DNS for resolving .local domains, and this is a blocker for us moving to Go 1.7. We actually have a DNS server set up specifically for this case, which is local to the machine and not propagated to upstream DNS servers.
Thanks for looking into this!
Relaxing the .local case sounds like a plan, as things like kubedns and weavedns depend heavily on it.
@bradfitz I put together a quick CL to relax the .local case: https://go-review.googlesource.com/#/c/27250/
Sorry, just realised there were some more outstanding questions:
Can you compare that with Go 1.6?
$ kubectl logs --namespace=scope query-846243711-d3dhl
<app> INFO: 2016/08/17 09:49:14.140623 app starting, version b3b160c, ID 24bcbcba11b7f7d3
<app> INFO: 2016/08/17 09:49:14.141131 command line args: --app.collector=dynamodb://abc:[email protected]:8000/reports --app.collector.s3=s3://abc:[email protected]:4569/s3 --app.http.address=:80 --app.log.http=true --app.memcached.hostname=memcached.scope.svc.cluster.local --app.memcached.service=memcached --app.memcached.timeout=100ms --app.nats=nats://nats.scope.svc.cluster.local:4222 --app.userid.header=X-Scope-OrgID --logtostderr=true --mode=app --weave=false
go package net: built with netgo build tag; using Go's DNS resolver
<app> INFO: 2016/08/17 09:49:14.146056 Connecting nats to nats://nats.scope.svc.cluster.local:4222
2016/08/17 09:49:14 nats: nats://nats.scope.svc.cluster.local:4222
2016/08/17 09:49:14 foo: nats.scope.svc.cluster.local:4222
<app> INFO: 2016/08/17 09:49:14.152539 listening on :80
I doubt it's relevant, but what is your /etc/resolv.conf and /etc/nsswitch.conf?
These jobs are running inside an Alpine container:
/home/weave # cat /etc/resolv.conf
search scope.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.0.0.10
options ndots:5
/home/weave # cat /etc/nsswitch.conf
cat: can't open '/etc/nsswitch.conf': No such file or directory
CL https://golang.org/cl/27250 mentions this issue.
Can we consider this for 1.7.1? I see the milestone has been set. Thanks!
Alas, but as mentioned in https://tools.ietf.org/html/draft-adpkja-dnsop-special-names-problem, it's better to stay away from the disturbance of TLD usage.
cgo DNS resolution of .local hostnames is still broken in Go 1.7.5, in Kubernetes scenarios.
I am working around this by setting the environment variable GODEBUG to netdns=go.
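For programs that cannot rely on the environment, newer Go releases allow a similar preference to be expressed in code through net.Resolver and its PreferGo field. The sketch below is an assumption-laden illustration rather than a fix for Go 1.7.5: the net.Resolver type does not exist in Go 1.7, and `goResolver` and `lookup` are names made up for this example.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// goResolver returns a resolver that asks for Go's built-in DNS client
// instead of the cgo/libc path, on platforms where that choice exists.
func goResolver() *net.Resolver {
	return &net.Resolver{PreferGo: true}
}

// lookup resolves host through goResolver with a short timeout.
func lookup(host string) ([]string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	return goResolver().LookupHost(ctx, host)
}

func main() {
	// The host is the service name from this report; it only resolves
	// inside a cluster that serves it.
	addrs, err := lookup("nats.scope.svc.cluster.local")
	fmt.Println(addrs, err)
}
```

Unlike GODEBUG=netdns=go, this only affects lookups made through that resolver value, so libraries that call the package-level net functions would not pick it up.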
@mcandre, if you have a bug report, please file a new bug. We don't track closed issues. I'm not sure what you're saying, though. I don't know what you mean by "broken", or what a Kubernetes scenario means.