From time to time I get this error when downloading new packages with stack build.
Trying to build the next day usually succeeds.
With the latest stack version there seems to be additional debugging output so I report this (again? I remember reading about this before but cannot find any issue about it).
Run command stack build
stack.yaml:
resolver: nightly-2016-07-13
packages:
- '.'
extra-deps: [SDL-0.6.5.1, SDL-image-0.6.1.2, svgcairo-0.13.1.0]
flags: {}
extra-package-dbs: []
Expected: the build succeeds.
Actual: nothing happens; after about 85 seconds the error (see below) appears.
$ stack build --verbose
Version 1.1.2 x86_64 hpack-0.14.1
2016-08-28 15:02:32.265620: [debug] Checking for project config at: /<somePath>/stack.yaml @(stack-1.1.2-K10UHJ5rbiiHm4AmYlkZE:Stack.Config src/Stack/Config.hs:811:9)
2016-08-28 15:02:32.265801: [debug] Loading project config file stack.yaml @(stack-1.1.2-K10UHJ5rbiiHm4AmYlkZE:Stack.Config src/Stack/Config.hs:829:13)
2016-08-28 15:02:32.266757: [debug] Checking whether stack was built with libgmp4 @(stack-1.1.2-K10UHJ5rbiiHm4AmYlkZE:Stack.Config src/Stack/Config.hs:326:5)
2016-08-28 15:02:32.266858: [debug] Run process: ldd /usr/bin/stack @(stack-1.1.2-K10UHJ5rbiiHm4AmYlkZE:System.Process.Read src/System/Process/Read.hs:283:3)
2016-08-28 15:02:32.271302: [debug] Stack was not built with libgmp4 @(stack-1.1.2-K10UHJ5rbiiHm4AmYlkZE:Stack.Config src/Stack/Config.hs:330:14)
2016-08-28 15:02:32.271417: [debug] Trying to decode ~/.stack/build-plan-cache/x86_64-linux/nightly-2016-07-13.cache @(stack-1.1.2-K10UHJ5rbiiHm4AmYlkZE:Data.Binary.VersionTagged src/Data/Binary/VersionTagged.hs:55:5)
2016-08-28 15:02:32.277384: [debug] Success decoding ~/.stack/build-plan-cache/x86_64-linux/nightly-2016-07-13.cache @(stack-1.1.2-K10UHJ5rbiiHm4AmYlkZE:Data.Binary.VersionTagged src/Data/Binary/VersionTagged.hs:64:13)
2016-08-28 15:02:32.277510: [debug] Getting system compiler version @(stack-1.1.2-K10UHJ5rbiiHm4AmYlkZE:Stack.Setup src/Stack/Setup.hs:341:17)
2016-08-28 15:02:32.281052: [debug] Run process: ghc --info @(stack-1.1.2-K10UHJ5rbiiHm4AmYlkZE:System.Process.Read src/System/Process/Read.hs:283:3)
2016-08-28 15:02:32.314816: [debug] Asking GHC for its version @(stack-1.1.2-K10UHJ5rbiiHm4AmYlkZE:Stack.Setup.Installed src/Stack/Setup/Installed.hs:94:13)
2016-08-28 15:02:32.314917: [debug] Run process: ghc --numeric-version @(stack-1.1.2-K10UHJ5rbiiHm4AmYlkZE:System.Process.Read src/System/Process/Read.hs:283:3)
2016-08-28 15:02:32.338385: [debug] Getting Cabal package version @(stack-1.1.2-K10UHJ5rbiiHm4AmYlkZE:Stack.GhcPkg src/Stack/GhcPkg.hs:165:5)
2016-08-28 15:02:32.338492: [debug] Run process: ghc-pkg --no-user-package-db field --simple-output Cabal version @(stack-1.1.2-K10UHJ5rbiiHm4AmYlkZE:System.Process.Read src/System/Process/Read.hs:283:3)
2016-08-28 15:02:32.352291: [debug] Resolving package entries @(stack-1.1.2-K10UHJ5rbiiHm4AmYlkZE:Stack.Setup src/Stack/Setup.hs:221:5)
2016-08-28 15:02:32.352603: [debug] Getting global package database location @(stack-1.1.2-K10UHJ5rbiiHm4AmYlkZE:Stack.GhcPkg src/Stack/GhcPkg.hs:48:5)
2016-08-28 15:02:32.352716: [debug] Run process: ghc-pkg --no-user-package-db list --global @(stack-1.1.2-K10UHJ5rbiiHm4AmYlkZE:System.Process.Read src/System/Process/Read.hs:283:3)
2016-08-28 15:02:32.373896: [debug] Run process: ghc-pkg --global --no-user-package-db dump --expand-pkgroot @(stack-1.1.2-K10UHJ5rbiiHm4AmYlkZE:System.Process.Read src/System/Process/Read.hs:283:3)
2016-08-28 15:02:32.404226: [debug] Ignoring package dlist due to wanting version 0.7.1.2 instead of 0.8.0.1 @(stack-1.1.2-K10UHJ5rbiiHm4AmYlkZE:Stack.Build.Installed src/Stack/Build/Installed.hs:189:5)
2016-08-28 15:02:32.404437: [debug] Ignoring package setlocale due to wanting version 1.0.0.4 instead of 1.0.0.3 @(stack-1.1.2-K10UHJ5rbiiHm4AmYlkZE:Stack.Build.Installed src/Stack/Build/Installed.hs:189:5)
2016-08-28 15:02:32.404521: [debug] Ignoring package gtk2hs-buildtools due to wanting version 0.13.2.1 instead of 0.13.1.0 @(stack-1.1.2-K10UHJ5rbiiHm4AmYlkZE:Stack.Build.Installed src/Stack/Build/Installed.hs:189:5)
2016-08-28 15:02:32.404896: [debug] Run process: ghc-pkg --user --no-user-package-db --package-db ~/.stack/snapshots/x86_64-linux/nightly-2016-07-13/8.0.1/pkgdb dump --expand-pkgroot @(stack-1.1.2-K10UHJ5rbiiHm4AmYlkZE:System.Process.Read src/System/Process/Read.hs:283:3)
2016-08-28 15:02:32.444971: [debug] Run process: ghc-pkg --user --no-user-package-db --package-db /<somePath>/.stack-work/install/x86_64-linux/nightly-2016-07-13/8.0.1/pkgdb dump --expand-pkgroot @(stack-1.1.2-K10UHJ5rbiiHm4AmYlkZE:System.Process.Read src/System/Process/Read.hs:283:3)
2016-08-28 15:02:32.460735: [debug] Trying to decode ~/.stack/indices/Hackage/00-index.cache @(stack-1.1.2-K10UHJ5rbiiHm4AmYlkZE:Data.Binary.VersionTagged src/Data/Binary/VersionTagged.hs:55:5)
2016-08-28 15:02:32.600878: [debug] Success decoding ~/.stack/indices/Hackage/00-index.cache @(stack-1.1.2-K10UHJ5rbiiHm4AmYlkZE:Data.Binary.VersionTagged src/Data/Binary/VersionTagged.hs:64:13)
2016-08-28 15:02:32.619791: [debug] Getting global package database location @(stack-1.1.2-K10UHJ5rbiiHm4AmYlkZE:Stack.GhcPkg src/Stack/GhcPkg.hs:48:5)
2016-08-28 15:02:32.619895: [debug] Run process: ghc-pkg --no-user-package-db list --global @(stack-1.1.2-K10UHJ5rbiiHm4AmYlkZE:System.Process.Read src/System/Process/Read.hs:283:3)
2016-08-28 15:02:32.634506: [debug] Precompiled cache input = ["--dependency=base=base-4.9.0.0","--dependency=bytestring=bytestring-0.10.8.1","--dependency=hxt-charproperties=hxt-charproperties-9.2.0.1-L1rhanzXQ304SfBOSkeXkX","--dependency=parsec=parsec-3.1.11-BCos4GEVCuDB8dnOCBHO6X","--dependency=text=text-1.2.2.1-5QpmrLQApEZ4Ly9nMHWY0s"] @(stack-1.1.2-K10UHJ5rbiiHm4AmYlkZE:Stack.Build.Cache src/Stack/Build/Cache.hs:261:13)
2016-08-28 15:02:32.634801: [debug] Precompiled cache input = ["--dependency=base=base-4.9.0.0","--dependency=hxt-charproperties=hxt-charproperties-9.2.0.1-L1rhanzXQ304SfBOSkeXkX"] @(stack-1.1.2-K10UHJ5rbiiHm4AmYlkZE:Stack.Build.Cache src/Stack/Build/Cache.hs:261:13)
2016-08-28 15:02:32.673020: [debug] Downloading /hackage.fpcomplete.com/package/hxt-unicode-9.0.2.4.tar.gz @(stack-1.1.2-K10UHJ5rbiiHm4AmYlkZE:Network.HTTP.Download.Verified src/Network/HTTP/Download/Verified.hs:231:9)
2016-08-28 15:02:32.674713: [debug] Downloading /hackage.fpcomplete.com/package/hxt-regex-xmlschema-9.2.0.2.tar.gz @(stack-1.1.2-K10UHJ5rbiiHm4AmYlkZE:Network.HTTP.Download.Verified src/Network/HTTP/Download/Verified.hs:231:9)
Progress: 2/4HttpExceptionRequest Request {
host = "s3.amazonaws.com"
port = 443
secure = True
requestHeaders = []
path = "/hackage.fpcomplete.com/package/hxt-unicode-9.0.2.4.tar.gz"
queryString = ""
method = "GET"
proxy = Nothing
rawBody = False
redirectCount = 10
responseTimeout = ResponseTimeoutDefault
requestVersion = HTTP/1.1
}
(ConnectionFailure getAddrInfo: does not exist (Name or service not known))
HttpExceptionRequest Request {
host = "s3.amazonaws.com"
port = 443
secure = True
requestHeaders = []
path = "/hackage.fpcomplete.com/package/hxt-regex-xmlschema-9.2.0.2.tar.gz"
queryString = ""
method = "GET"
proxy = Nothing
rawBody = False
redirectCount = 10
responseTimeout = ResponseTimeoutDefault
requestVersion = HTTP/1.1
}
(ConnectionFailure getAddrInfo: does not exist (Name or service not known))
$ stack --version
Version 1.1.2 x86_64 hpack-0.14.1
via Arch Linux repository:
$ yaourt -Qi stack
Name : stack
Version : 1.1.2-20
Description : The Haskell Tool Stack
Architecture : x86_64
URL : https://github.com/commercialhaskell/stack
Licenses : custom:BSD3
Groups : None
Provides : None
Depends On : gmp libffi zlib
Optional Deps : ghc [installed]
Required By : None
Optional For : None
Conflicts With : None
Replaces : None
Installed Size : 34.86 MiB
Packager : Felix Yan <[email protected]>
Build Date : Sa 27 Aug 2016 06:33:40 CEST
Install Date : So 28 Aug 2016 02:16:46 CEST
Install Reason : Explicitly installed
Install Script : Yes
Validated By : SHA-256 Sum
> Trying to build the next day usually succeeds.
Do you also mean that building again right away fails and building fails for the rest of the day?
In those moments, is the Internet reachable? Is s3.amazonaws.com reachable?
However, some of these questions are answered in #483; the problem persisted for a while, and the issue continued with this (probably unaddressed) comment:
> In Stack.Types.StackT, there's a usage of tlsManagerSettings. You want to modify the managerResponseTimeout field of it.
However, the issue seems to be about connections failing (which is normal) and Stack not retrying automatically. If so, retrying by hand right away should work.
> Do you also mean that building again right away fails and building fails for the rest of the day?
Yes, this time I tried about 10 times to build the project and it failed every time. A couple of hours later it still failed. Another hour later it worked on the second attempt.
> In those moments, is Internet reachable?
Yes, I couldn't live otherwise :)
> Is s3.amazonaws.com reachable?
It didn't occur to me it could be down for such a long time but I'll test it the next time I get the error.
If it bothers me enough I'll try to look into the tlsManagerSettings.
I'm facing this issue inside Travis' docker image. Stack fails to fetch build plan:
travis@051ccde25adf:~/transient-haskell/transient-universe$ stack setup
Downloading lts-7.7 build plan ...HttpExceptionRequest Request {
host = "raw.githubusercontent.com"
port = 443
secure = True
requestHeaders = []
path = "/fpco/lts-haskell/master//lts-7.7.yaml"
queryString = ""
method = "GET"
proxy = Nothing
rawBody = False
redirectCount = 10
responseTimeout = ResponseTimeoutDefault
requestVersion = HTTP/1.1
}
(ConnectionFailure getAddrInfo: does not exist (Name or service not known))
Is this because of secure HTTP? Maybe I'm missing some packages inside the container?
@geraldus Part of the message (ConnectionFailure getAddrInfo: does not exist (Name or service not known)) suggests DNS name resolution failed, and if so the use of secure HTTP can't be the problem: the connection is failing earlier.
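To make that reading of the error concrete, here is a minimal sketch that classifies captured stack output by looking for the getAddrInfo text. The function name classify_stack_failure is hypothetical (it is not part of stack), and the match relies only on the error text quoted in this thread.

```shell
# Hypothetical helper, not part of stack: reads captured error output on stdin
# and reports whether it looks like a name-resolution failure.
classify_stack_failure() {
    if grep -q 'getAddrInfo'; then
        # The lookup failed, so no TCP or TLS connection was ever attempted.
        echo "dns"
    else
        echo "other"
    fi
}

# Possible usage (assumes stack is on PATH):
#   stack build 2>&1 | classify_stack_failure
```

If this prints "dns", TLS settings and certificates are unlikely to be involved, because the failure happens before a connection is opened.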
Travis boxes have very flaky network connections, so unless it happens 100% of the time I assume it's normal. If command foo uses the network, you want to replace, in the build script, foo by travis_retry foo (that works at least in .travis.yml; google travis_retry for more info).
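Outside .travis.yml, where travis_retry is not available, a small wrapper can play a similar role. This is only a sketch: retry and RETRY_DELAY are hypothetical names chosen for illustration, and this is not how travis_retry itself is implemented.

```shell
# Hypothetical helper: run a command up to $1 times, sleeping between attempts.
# RETRY_DELAY (seconds, default 5) is an assumed knob, not a standard variable.
retry() {
    max_attempts=$1
    shift
    attempt=1
    while :; do
        "$@" && return 0
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "${RETRY_DELAY:-5}"
    done
}

# Possible usage:
#   retry 3 stack build
```

Retrying only helps with transient failures; if name resolution fails every time, as later comments in this thread describe, the wrapper will simply give up after the last attempt.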
@Blaisorblade sorry, it was not clear that I'm running the Travis container locally. I've tried stack setup several times and each attempt failed. I can open the requested URL in a browser on the host machine.
Network works:
travis@051ccde25adf:~/transient-haskell/transient-universe$ curl google.com
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>302 Moved</TITLE></HEAD><BODY>
<H1>302 Moved</H1>
The document has moved
<A HREF="http://www.google.ru/?gfe_rd=cr&ei=rKI1WOWaEY7DuAGU_YCoBQ">here</A>.
</BODY></HTML>
travis@051ccde25adf:~/transient-haskell/transient-universe$ curl http://raw.githubusercontent.com/fpco/lts-haskell/master//lts-7.7.yaml
travis@051ccde25adf:~/transient-haskell/transient-universe$ curl https://raw.githubusercontent.com/fpco/lts-haskell/master//lts-7.7.yaml
curl: (6) Couldn't resolve host 'raw.githubusercontent.com'
Opening requested link in browser with insecure protocol redirects me to HTTPS.
I bet I can simply copy build plan from host machine.
@geraldus Ah I retract my previous answer then, sorry for the misunderstanding. However:
> curl: (6) Couldn't resolve host 'raw.githubusercontent.com'
That's in the end the same error that stack is getting, so it sounds like an issue with the container and not with stack per se.
However, getting that error for HTTPS but not HTTP makes no sense so the message is probably misleading.
Indeed, googling it suggests (as a first hypothesis) you might be behind a proxy (maybe unknown to you). Possibly you have http_proxy set correctly, but https_proxy unset or set incorrectly — you probably want the same setting.
https://github.com/Homebrew/legacy-homebrew/issues/42631
http://stackoverflow.com/a/34514721/53974
There are unconfirmed reports of proxy problems (#2672), but #1673 suggests stack supports HTTP proxies correctly.
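A quick way to test the proxy hypothesis is to compare the two variables directly. The sketch below only inspects the lowercase http_proxy and https_proxy variables mentioned above; the function name proxy_env_status is hypothetical.

```shell
# Hypothetical helper: report how http_proxy and https_proxy relate to each other.
proxy_env_status() {
    if [ -z "${http_proxy:-}" ] && [ -z "${https_proxy:-}" ]; then
        echo "no proxy variables set"
    elif [ "${http_proxy:-}" = "${https_proxy:-}" ]; then
        echo "http_proxy and https_proxy match"
    else
        # Covers both "only one is set" and "set to different values".
        echo "http_proxy and https_proxy differ"
    fi
}

# Possible usage, in the same shell that runs stack or curl:
#   proxy_env_status
```

A "differ" result would fit the symptom of HTTP working while HTTPS fails, but it does not prove the proxy is the cause.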
> I bet I can simply copy build plan from host machine.
That might work but stack will download lots more stuff, so it's probably worth fixing the network.
@Blaisorblade thanks for the detailed answer. One other thing I forgot to mention is that I was able to git clone my project using HTTPS.
OK, folks, in my case I suppose my ISP is the culprit. Several months ago GitHub was blacklisted by most Russian ISPs (twice this year, if I recall correctly). A temporary workaround was to add alternative IPs for GitHub services in /etc/hosts. At the moment I have that IP list commented out in my hosts file and GitHub seems to work. However, I tried uncommenting the line for raw.githubusercontent.com:
151.101.12.133 raw.githubusercontent.com
and re-ran setup. This solved the network issue.
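For anyone checking whether such a pin is active on their own machine, here is a minimal sketch that looks up a hostname in a hosts file while ignoring commented-out lines. The function name hosts_entry_for is hypothetical.

```shell
# Hypothetical helper: print the address pinned for hostname $1 in hosts file $2
# (default /etc/hosts). Commented-out entries are ignored. Exits non-zero if
# no active entry exists.
hosts_entry_for() {
    awk -v host="$1" '
        { sub(/#.*/, "") }   # strip comments, so "# 1.2.3.4 name" does not count
        { for (i = 2; i <= NF; i++) if ($i == host) { print $1; found = 1 } }
        END { exit !found }
    ' "${2:-/etc/hosts}"
}

# Possible usage:
#   hosts_entry_for raw.githubusercontent.com || echo "no active pin"
```

Keep in mind that a pinned address goes stale if the service changes IPs, so this kind of entry is best treated as temporary.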
O_O Oh dear. Thanks @geraldus for solving the mystery on his side!
@Blaisorblade, I was too quick. Now stack fetched the build plan, but it fails to download GHCJS snapshot:
travis@051ccde25adf:~/transient-haskell/transient-universe$ stack setup $ARGS
Preparing to install GHCJS to an isolated location.
This will not interfere with any system-level installation.
Preparing to download ghcjs-0.2.1.9007007_ghc-8.0.1 ...HttpExceptionRequest Request {
host = "ghcjs.tolysz.org"
port = 80
secure = False
requestHeaders = []
path = "/ghc-8.0-2016-11-03-lts-7.7-9007007.tar.gz"
queryString = ""
method = "GET"
proxy = Nothing
rawBody = False
redirectCount = 10
responseTimeout = ResponseTimeoutDefault
requestVersion = HTTP/1.1
}
(ConnectionFailure getAddrInfo: does not exist (Name or service not known))
Maybe this is an http-client bug?
@geraldus That URL seems to be served by S3 (looking at headers from curl -v -I 'http://ghcjs.tolysz.org/ghc-8.0-2016-11-03-lts-7.7-9007007.tar.gz'), is that blacklisted too?
Otherwise: Well, maybe it's a bug in some software you're using, and it could be http-client, but I'd recommend an "innocent until proven guilty" attitude, also because we would need details to attempt a fix—between your ISP, your host, Docker and the particular container there's a bunch of things that can go wrong.
Agree, innocent until proven guilty :)
I have exactly the same issue when running from NixOS (installed in VirtualBox hosted by Windows 7) behind a cntlm proxy. I'm trying to do the following:
stack install --nix ghc-mod hlint stylish-haskell hindent pandoc
It fails with:
HttpExceptionRequest Request {
host = "s3.amazonaws.com"
port = 443
secure = True
requestHeaders = []
path = "/hackage.fpcomplete.com/package/ansi-terminal-0.6.2.3.tar.gz"
queryString = ""
method = "GET"
proxy = Nothing
rawBody = False
redirectCount = 10
responseTimeout = ResponseTimeoutDefault
requestVersion = HTTP/1.1
}
(ConnectionFailure getAddrInfo: does not exist (Name or service not known))
The call to stack setup --nix went fine before that and downloaded latest snapshot, ghc, etc.
I also have this issue from time to time in Ubuntu 16.10 running in Hyper-V (I had it in VirtualBox some time ago) with
$ stack --version
Version 1.3.2, Git revision 3f675146590da4f3edf768b89355f798229da2a5 (4395 commits) x86_64 hpack-0.15.0
At the same time I can resolve s3.amazonaws.com with host command and fetch actual file with wget.
I have the same on Ubuntu 16.10:
stack --version
Version 1.3.2, Git revision 3f675146590da4f3edf768b89355f798229da2a5 (4395 commits) x86_64 hpack-0.15.0
stack install ghc-mod
HttpExceptionRequest Request {
host = "s3.amazonaws.com"
port = 443
secure = True
requestHeaders = []
path = "/hackage.fpcomplete.com/package/refact-0.3.0.2.tar.gz"
queryString = ""
method = "GET"
proxy = Nothing
rawBody = False
redirectCount = 10
responseTimeout = ResponseTimeoutDefault
requestVersion = HTTP/1.1
}
(ConnectionFailure getAddrInfo: does not exist (Name does not resolve))
curl "https://s3.amazonaws.com/hackage.fpcomplete.com/package/refact-0.3.0.2.tar.gz" -I
HTTP/1.1 200 OK
x-amz-id-2: so0j4mMuBOPS/BGK0005/iV81xaQ4U97qgfRLHa83kzTIL9XvkMYK8Ly911h7jT099F22HKG2Os=
x-amz-request-id: 22A8D8420870FB29
Date: Sun, 29 Jan 2017 04:12:35 GMT
Last-Modified: Sat, 21 Nov 2015 23:15:27 GMT
ETag: "4947b12687e0b759cc7f318c79468484"
Accept-Ranges: bytes
Content-Type: binary/octet-stream
Content-Length: 2345
Server: AmazonS3
curl "https://s3.amazonaws.com/hackage.fpcomplete.com/package/refact-0.3.0.2.tar.gz" -o refact-0.3.0.2.tar.gz actually downloads the archive.
I am having this issue too, I can wget the addresses involved but stack build --install-ghc and stack setup do not succeed.
$ uname -a
Linux ubuntudev 4.8.0-34-generic #36-Ubuntu SMP Wed Dec 21 17:24:18 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.10
DISTRIB_CODENAME=yakkety
DISTRIB_DESCRIPTION="Ubuntu 16.10"
$ stack --version
Version 1.3.2, Git revision 3f675146590da4f3edf768b89355f798229da2a5 (4395 commits) x86_64 hpack-0.15.0
$ stack build --install-ghc
Preparing to install GHC (nopie) to an isolated location.
This will not interfere with any system-level installation.
Preparing to download ghc-nopie-8.0.1 ...HttpExceptionRequest Request {
host = "github.com"
port = 443
secure = True
requestHeaders = []
path = "/commercialhaskell/ghc/releases/download/ghc-8.0.1-release/ghc-8.0.1-x86_64-deb7-linux.tar.xz"
queryString = ""
method = "GET"
proxy = Nothing
rawBody = False
redirectCount = 10
responseTimeout = ResponseTimeoutDefault
requestVersion = HTTP/1.1
}
(ConnectionFailure getAddrInfo: does not exist (Name does not resolve))
$ stack setup
HttpExceptionRequest Request {
host = "raw.githubusercontent.com"
port = 443
secure = True
requestHeaders = []
path = "/fpco/stackage-content/master/stack/stack-setup-2.yaml"
queryString = ""
method = "GET"
proxy = Nothing
rawBody = False
redirectCount = 10
responseTimeout = ResponseTimeoutDefault
requestVersion = HTTP/1.1
}
(ConnectionFailure getAddrInfo: does not exist (Name does not resolve))
So I just reopened #483 since it contains info on what might be wrong.
New Ubuntu 16.10 install here. Just echoing previous comments. Same error, but running a wget on the files in question works totally fine.
These errors seem to come very fast. Is there a possibility that the default timeout value is too low?
I think running stack with something like strace -s 100 could help, at least we will see what is happening at the lower levels. Unfortunately, I can't reproduce this bug now, so if anyone can, please put it somewhere like gist.
@cvb This is the output of
strace -s 100 stack install extra 2>&1
@cryo28 Could you run strace -qq -q -yy -f -x -s 100 -o strace_output.txt -e trace=network,read,write stack install extra? Stack seems to use separate threads for downloading and other network communication, and to handle that strace needs -f. Also, I really should try what I'm suggesting before sending the comment.
I've been having the same issue. Every stack command I issue fails immediately, e.g.:
$ stack setup
HttpExceptionRequest Request {
host = "s3.amazonaws.com"
port = 443
secure = True
requestHeaders = []
path = "/haddock.stackage.org/snapshots.json"
queryString = ""
method = "GET"
proxy = Nothing
rawBody = False
redirectCount = 10
responseTimeout = ResponseTimeoutDefault
requestVersion = HTTP/1.1
}
(ConnectionFailure getAddrInfo: does not exist (Name does not resolve))
In my case at least it seems that it was certainly related to the use of a proxy. I was running stack inside a VM (Centos 6 running inside Virtualbox VM, Host OS was Windows). After disconnecting from the corporate network and unsetting http_proxy & https_proxy, it all started working correctly.
Using curl to fetch the above URL worked fine, with or without the proxy.
I notice that the request shows 'proxy = Nothing'. Does that mean it is not picking up the proxy settings from the environment?
I am having this issue too. Fetching the package with curl/wget works fine :(
Strace:
strace_output.txt
For those who, like me, have struggled with this on Ubuntu 16.10 (in my case, server edition running on EC2), it seems that some changes to DNS services are the culprit. In any case, I was able to get things working by following the directions at the following link, excluding any lines related to NetworkManager, which seems to not be a part of Ubuntu Server: http://www.mjblythe.com/hacks/2016/11/issues-upgrading-from-ubuntu-15-10-to-16-10-via-16-04/#Solution
I haven't done a truly thorough investigation of the issue, but things are working on my end without any obvious explosions. Hopefully, this helps a few people out.
@matthewleon Did curl/wget work for you prior to the changes? Knowing that could be extremely helpful. "Issues with systemd DNS proxying" could be a more promising venue for investigation.
@Blaisorblade Yes, both worked fine. FWIW stack never seemed to fail when downloading build plans, copies of the GHC source, etc. Just the packages hosted in S3 buckets.
@nikita-b Thanks for the strace — there, most name resolution requests are handled by some local DNS proxy on IP 127.0.0.53—that's apparently systemd since release 231 (as hinted by https://github.com/gdnsd/gdnsd/issues/128). Can you post the content of your /etc/nsswitch.conf, or at least the hosts entry? Since your issue is systemd-related, if you're using Ubuntu 16.10 maybe your problem is solved by http://www.mjblythe.com/hacks/2016/11/issues-upgrading-from-ubuntu-15-10-to-16-10-via-16-04/#Solution. Caveat: I still have little clue myself, so please proceed with care and back up affected files (or refrain if not confident).
Analyzing this further, the changelog entry is enlightening:
https://github.com/systemd/systemd/blob/8feabc46263079cffba8a39c4082563320aeffc0/NEWS#L899-L915
systemd-resolved now listens on the local IP address 127.0.0.53:53
for DNS requests. This improves compatibility with local programs
that do not use the libc NSS or systemd-resolved's bus APIs for name
resolution. [...] Note that this local
DNS service is not as fully featured as the libc NSS or
systemd-resolved's bus APIs. [...] It is thus strongly recommended for
all applications to use the libc NSS API or native systemd-resolved
bus API instead.
It's not clear why any of the limitations mentioned should cause this issue. getAddrInfo (https://www.stackage.org/haddock/lts-7.0/network-2.6.3.1/Network-Socket.html#v:getAddrInfo) appears to be implemented via NSS, so we seem to be using the recommended API, and it's unclear why it should fail.
Maybe affected systems are contacting systemd through this DNS server because of a misconfiguration (say, in nsswitch.conf) or a bug.
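Since the hosts entry in nsswitch.conf decides which sources NSS consults and in what order, it is the first thing worth extracting when comparing machines. A minimal sketch, with the hypothetical function name nss_hosts_sources:

```shell
# Hypothetical helper: print the lookup sources from the "hosts:" line of an
# nsswitch.conf-style file ($1, default /etc/nsswitch.conf).
nss_hosts_sources() {
    awk '$1 == "hosts:" { $1 = ""; sub(/^ +/, ""); print; exit }' \
        "${1:-/etc/nsswitch.conf}"
}

# Possible usage:
#   nss_hosts_sources
```

Note the caveat raised in this thread: a statically linked binary may not go through these NSS modules at all, so this line describes what curl and wget do rather than necessarily what stack does.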
/etc/nsswitch.conf
# /etc/nsswitch.conf
#
# Example configuration of GNU Name Service Switch functionality.
# If you have the `glibc-doc-reference' and `info' packages installed, try:
# `info libc "Name Service Switch"' for information about this file.
passwd: compat
group: compat
shadow: compat
hosts: files mdns4_minimal [NOTFOUND=return] resolve [!UNAVAIL=return] dns
networks: files
protocols: db files
services: db files
ethers: db files
rpc: db files
netgroup: nis
/etc/hosts
127.0.0.1 localhost
127.0.1.1 ubuntu
# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
Neither file was modified. I added entries to /etc/hosts, but removing them did not help.
But if I set the DNS server to Google's (8.8.8.8), then everything really works correctly :(
OK, I finally have an almost complete hypothesis that might explain this bug. I haven't had time to do follow-up experiments, but a "sensible theory" is progress. Thanks @matthewleon for the initial hint, @nikita-b for gathering compelling evidence, and others.
Questions for @nikita-b
What does /etc/resolv.conf look like in affected vs unaffected states? I think the fix is not so much using Google's DNS as removing the auto-added 127.0.0.53 from the list of resolvers. That appears to be a semi-working DNS server provided by systemd.
Candidate reproduction
Sorry, no time to try this myself.
Candidate workaround
@nikita-b and other Ubuntu 16.10 users: consider trying out https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1624320/comments/8.
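To compare affected and unaffected machines, it helps to check mechanically whether the systemd stub address is listed as a resolver. A minimal sketch; the function name uses_stub_resolver is hypothetical, and the only fact it relies on is the 127.0.0.53 address discussed above.

```shell
# Hypothetical helper: succeed if the resolv.conf-style file ($1, default
# /etc/resolv.conf) lists the systemd-resolved stub address as a nameserver.
uses_stub_resolver() {
    grep -Eq '^[[:space:]]*nameserver[[:space:]]+127\.0\.0\.53([[:space:]]|$)' \
        "${1:-/etc/resolv.conf}"
}

# Possible usage:
#   uses_stub_resolver && echo "resolv.conf points at the systemd stub"
```

This only detects the configuration; whether to change it, and how, is what the linked Launchpad comment covers.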
Underlying bug
Amazon (and half the Internet) uses CNAME records. The other half should keep working, which fits @matthewleon's note.
Unknown: why is stack not using NSS? I suspect some linking problem: NSS must be dynamically linked, whereas Haskell is statically linked. But that's not the full story.
Explanation
I googled "systemd dns compatibility" and found https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1624320. That, together with your strace log, suggests that sometimes stack contacts systemd as DNS server (which should NOT happen, but maybe there's some problem with NSS). Since that resolver is only semi-compliant, trouble results.
In particular, bug https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1647031 shows that systemd does not follow CNAME records. Somebody complained that Amazon fails but other hosts work. I verified Amazon S3 uses a CNAME record as suggested in that bug:
$ dig +no{cmd,comments,stats} s3.amazonaws.com
;s3.amazonaws.com. IN A
s3.amazonaws.com. 3573 IN CNAME s3-1.amazonaws.com.
s3-1.amazonaws.com. 2 IN A 54.231.49.67
The bug is open in Yakkety = 16.10.
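The same check can be applied to any host that stack fails to reach. The sketch below filters dig-style answer lines for CNAME records; the function name cname_targets is hypothetical, and it assumes the standard five-column answer format shown in the dig output above.

```shell
# Hypothetical helper: read dig answer lines on stdin and print each CNAME
# as "name -> target". Prints nothing when the answer has no CNAME.
cname_targets() {
    awk '$4 == "CNAME" { print $1 " -> " $5 }'
}

# Possible usage (needs network access and dig installed):
#   dig +noall +answer s3.amazonaws.com | cname_targets
```

Hosts for which this prints nothing resolve directly to A records, which would fit the observation that only some downloads fail.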
Follow-up: @nikita-b can you check if doing Ubuntu package upgrades (by apt-get upgrade or your preferred method) fixes the bug? It might. I've tried to read https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1647031 but I am confused if they fixed their part of the bug and on which release.
FWIW: rebuilding stack from source might also provide an inferior workaround, but you need a working setup first. But for now, stack official binaries are still affected.
@stulli Is your Arch Linux system using systemd as well? Any clue if you're hitting the same bug?
Yes, I am using systemd as well. Unfortunately I don't have access to that machine for two more weeks.
Might try downloading a dynamically linked stack executable from https://github.com/commercialhaskell/stack/releases/download/v1.3.2/stack-1.3.2-linux-x86_64.tar.gz. This is dynamically linked with glibc and libgmp.
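To find out which kind of executable is currently installed, the description printed by the file utility can be classified. This is a sketch under the assumption that file reports "statically linked" or "dynamically linked" for ELF executables, as it typically does; the function name linkage_kind is hypothetical.

```shell
# Hypothetical helper: read the output of file(1) on stdin and classify it.
linkage_kind() {
    description=$(cat)
    case $description in
        *"statically linked"*|*"static-pie linked"*) echo "static" ;;
        *"dynamically linked"*) echo "dynamic" ;;
        *) echo "unknown" ;;
    esac
}

# Possible usage (assumes stack is on PATH):
#   file -L "$(command -v stack)" | linkage_kind
```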
This worked for me: unset http_proxy and/or unset https_proxy. I had the same error message in the beginning. I typed these commands in, and then it worked right away.
Arch Linux users: make sure you have run "sudo ln -s /run/systemd/resolve/resolv.conf /etc/resolv.conf" if you are using systemd to handle your networking (use ls -a /etc to confirm the symbolic link exists). This fixed the issue for me.
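Before creating or replacing the link, it may be worth checking what /etc/resolv.conf currently is. A minimal sketch; the function name resolv_conf_kind is hypothetical, and it assumes readlink is available.

```shell
# Hypothetical helper: describe whether path $1 (default /etc/resolv.conf)
# is a symlink, a regular file, or missing.
resolv_conf_kind() {
    path=${1:-/etc/resolv.conf}
    if [ -L "$path" ]; then
        echo "symlink to $(readlink "$path")"
    elif [ -e "$path" ]; then
        echo "regular file"
    else
        echo "missing"
    fi
}

# Possible usage:
#   resolv_conf_kind
```

If a regular file already exists, ln -s without -f will refuse to overwrite it, so back the file up first rather than forcing the link.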
As an Arch Linux user, I was using stack installed by the installer script
(curl -sSL https://get.haskellstack.org/ | sh), which installs the statically linked version!
The solution from @jm4games didn't work for me because I'm using NetworkManager and not systemd-resolved (only one should be active at a time).
So the solution was the one given by @borsboom: download and install the dynamically linked version of stack.
I don't understand why more people don't get this error (the script always installs the static version).
@Blaisorblade thanks a lot for your analysis!
Had the same problem on Ubuntu 17.10 and following the steps outlined in https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1624320/comments/8 as proposed by @Blaisorblade fixed it. Thanks @Blaisorblade for the workaround.
@mkoerner Can you reliably reproduce this issue when not using the workaround?
It would be quite valuable for us to find out, as theoretically the systemd-resolved CNAME problem should be fixed in Ubuntu 17.10.
No response to last question, closing as it seems to be resolved.
(If anybody sees this issue reappear, please reopen or comment.)
I see this issue again, and I am running in a virtual machine, with newly installed stack via curl | sh.
> This worked for me: unset http_proxy and/or unset https_proxy. I had the same error message in the beginning. I typed these commands in, and then it worked right away.
Works for me too.
I see the same error again in WSL Ubuntu 18.04. There is a proxy running on Windows, but neither http_proxy nor https_proxy is set as an environment variable in WSL Ubuntu. Furthermore, it looks like the latest release only contains statically linked executables? Could you suggest how to fix this? :D