Lnd: Crash during startup on lookup soa.nodes.lightning.directory: server misbehaving

Created on 28 Jun 2018  ·  22 Comments  ·  Source: lightningnetwork/lnd

Background

My lnd started crashing after updating to master. Before this update I was running git master from a few days ago and didn't experience this problem.

Lnd crashes directly on startup with the line:

lookup soa.nodes.lightning.directory on 192.168.69.1:53: server misbehaving
(that IP is my router)

I tested the latest release, lnd-v0.4.2-beta, and everything works normally, so I suspect something wrong was committed to master recently.

Your environment

  • git master c344a3a642b9a8dc8303d7f6b4bd130d7eb747f5
  • Linux server01 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1 (2018-05-07) x86_64 GNU/Linux
  • bitcoind 0.16.1

Steps to reproduce

# ./lnd
lookup soa.nodes.lightning.directory on 192.168.69.1:53: server misbehaving

Expected behaviour

Node should start normally.

Actual behaviour

Process exits.

Most helpful comment

Seems like Cloudflare DNS (1.1.1.1) doesn't resolve:

dig soa.nodes.lightning.directory @1.1.1.1

; <<>> DiG 9.10.6 <<>> soa.nodes.lightning.directory @1.1.1.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 40711
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1452
;; QUESTION SECTION:
;soa.nodes.lightning.directory. IN  A

;; Query time: 4134 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Tue Jul 10 12:11:32 CEST 2018
;; MSG SIZE  rcvd: 58

ediskraba1@edismacbook:~$ dig soa.nodes.lightni

The Google DNS does resolve.

I think it's a problem with the nodes.lightning.directory DNS server configuration:

NS for .directory > demand.alpha.aridns.net.au.

> dig ns lightning.directory @demand.alpha.aridns.net.au.
...
;; AUTHORITY SECTION:
lightning.directory.    86400   IN  NS  nodes.lightning.directory.
lightning.directory.    86400   IN  NS  nodes2.lightning.directory.

;; ADDITIONAL SECTION:
nodes.lightning.directory. 86400 IN A   104.131.26.124
nodes2.lightning.directory. 86400 IN    A   104.131.26.124
> dig soa lightning.directory @104.131.26.124
; <<>> DiG 9.10.6 <<>> soa lightning.directory @104.131.26.124
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 18450
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;lightning.directory.       IN  SOA

;; Query time: 175 msec
;; SERVER: 104.131.26.124#53(104.131.26.124)
;; WHEN: Tue Jul 10 12:31:12 CEST 2018
;; MSG SIZE  rcvd: 37

No SOA record is configured for lightning.directory.
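
When comparing several resolvers, the two header lines of each dig reply carry most of the signal: the status, the flags and the ANSWER count. Below is a minimal Python sketch that pulls those out of captured dig output. The name summarize_dig is made up for this illustration, and the function only reads text; it does not run dig itself.

```python
import re

# Matches dig's header line, e.g. "status: SERVFAIL, id: 40711".
STATUS_RE = re.compile(r"status:\s*([A-Z]+)")
# Matches the header flags, e.g. "flags: qr rd ra;" (the empty EDNS
# "flags:;" entry further down does not match because it has no letters).
FLAGS_RE = re.compile(r"flags:\s*([a-z ]+);")
# Matches the counts line, e.g. "QUERY: 1, ANSWER: 0, AUTHORITY: 0".
ANSWER_RE = re.compile(r"ANSWER:\s*(\d+)")


def summarize_dig(output: str) -> dict:
    """Return the status, header flags and answer count of one dig reply."""
    status = STATUS_RE.search(output)
    flags = FLAGS_RE.search(output)
    answers = ANSWER_RE.search(output)
    return {
        "status": status.group(1) if status else None,
        "answers": int(answers.group(1)) if answers else None,
        "flags": flags.group(1).split() if flags else [],
    }
```

Fed the replies quoted above, this reports SERVFAIL with zero answers for both 1.1.1.1 and 104.131.26.124. The missing ra flag in the second reply matches dig's "recursion requested but not available" warning.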

All 22 comments

Has this always been the case? Or is this a new issue? Are you able to manually hit that hostname using dig?

⛰ dig soa.nodes.lightning.directory

; <<>> DiG 9.8.3-P1 <<>> soa.nodes.lightning.directory
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 65275
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 2

;; QUESTION SECTION:
;soa.nodes.lightning.directory. IN  A

;; ANSWER SECTION:
soa.nodes.lightning.directory. 60 IN    A   104.131.26.124

;; AUTHORITY SECTION:
lightning.directory.    22977   IN  NS  nodes.lightning.directory.
lightning.directory.    22977   IN  NS  nodes2.lightning.directory.

;; ADDITIONAL SECTION:
nodes.lightning.directory. 22977 IN A   104.131.26.124
nodes2.lightning.directory. 22977 IN    A   104.131.26.124

;; Query time: 104 msec
;; SERVER: 10.2.1.1#53(10.2.1.1)
;; WHEN: Thu Jun 28 11:59:37 2018
;; MSG SIZE  rcvd: 130

Do you have any special networking setup?

Hey, sorry for my fat fingers: this issue was submitted by mistake before it was filled in completely, so I copied the details from #1464 here and deleted #1464.

To answer your questions:

# dig soa.nodes.lightning.directory

; <<>> DiG 9.10.3-P4-Debian <<>> soa.nodes.lightning.directory
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 44827
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;soa.nodes.lightning.directory. IN  A

;; Query time: 0 msec
;; SERVER: fd45:d1e7:1965::1#53(fd45:d1e7:1965::1)
;; WHEN: Thu Jun 28 22:19:53 UTC 2018
;; MSG SIZE  rcvd: 58

Seems that it's not working, and no, I don't have any special network setup.

Has this always been the case? Or is this a new issue?

As the issue description now states, it does NOT work with recent master. However, everything DOES work correctly with lnd-v0.4.2-beta, and even with a git clone that was a few days old, so the problem was introduced in recent days.

Hmm, if you can't hit the host as normal, then that might point to an issue with your setup. The seed itself hasn't changed at all.

Will spin up a fresh node on master to see if I can repro.

That's quite strange now.

When I put the IP in /etc/hosts (I resolved it on another machine), both master and v0.4.2-beta work.

When I comment out the IP in /etc/hosts, master stops working while v0.4.2-beta still works, both on the same node directory.

So obviously something is wrong with my network, because the name doesn't resolve, but the two binaries still behave differently in that case.

I tried both versions with a new node setup (fresh lnddir) and the behavior is the same: latest master doesn't boot up, but v0.4.2 does.

Just booted up a fresh node on master, and was able to perform initial bootstrap (via the DNS seed) no problem.

If you change locations (and networks) does the issue persist?

Let me fix my DNS issues. I'll report back.

OK, after setting Google's DNS on my edge router, I can start the node normally. I have no explanation for this, or for why the two lnd binaries behave differently, but I suppose such a corner case is not worth debugging. Thanks for your time!

One thing that changed between 0.4.2 and master is that we'll randomize which bootstrapper we use. With 0.4.2, after the initial bootstrap, we would only consult the peers we know of on disk. In the current master, in order to continue to sample new fresh peers, we'll occasionally hit the DNS seed again.
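
The difference described here can be pictured with a small Python sketch. It is not lnd's code (lnd is written in Go) and every name in it is hypothetical. It only illustrates sampling peers from several sources in random order, where a failing DNS seed is skipped rather than treated as fatal.

```python
import random
from typing import Callable, List, Sequence

# A peer source takes the number of peers wanted and returns addresses.
# It may raise OSError (for example socket.gaierror on a failed lookup).
PeerSource = Callable[[int], List[str]]


def sample_peers(sources: Sequence[PeerSource], want: int,
                 rng: random.Random) -> List[str]:
    """Ask the sources in random order until `want` peers are collected."""
    order = list(sources)
    rng.shuffle(order)

    peers: List[str] = []
    failures: List[OSError] = []
    for source in order:
        if len(peers) >= want:
            break
        try:
            peers.extend(source(want - len(peers)))
        except OSError as exc:
            # One unreachable source should not stop the whole bootstrap.
            failures.append(exc)

    if not peers and failures:
        # Give up only when nothing was collected and a source failed.
        raise failures[0]
    return peers[:want]
```

With one source standing in for the DNS seed and one for the peers already on disk, a resolver failure in the first only matters when the second has nothing to offer. That matches the reports in this thread of a node with cached peers starting fine while a lookup failure stops a fresh start.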

Happy I was able to help!

@Roasbeef I'm having the same problem, maybe we should reopen the issue? For me, neither Google nor Cloudflare is working. There might be something wrong with the nodes.lightning.directory DNS config, as @edspiner pointed out:

$ host -t ns lightning.directory 8.8.8.8
Using domain server:
Name: 8.8.8.8
Address: 8.8.8.8#53
Aliases:

Host lightning.directory not found: 2(SERVFAIL)

Had to reboot the server. In general lnd should still boot when the seed is down; fixing that shortly in a new commit. Apologies for the issue.

Hmm, DNS still isn't working, but LND self-healed somehow. Is the lightning.directory domain name a SPOF for the whole lightning network?

Hmm, this still doesn't work with Cloudflare DNS (1.1.1.1) for some reason, while Google DNS is fine. Can we all look into this?

@KAMEHOB yes, that's true. Once again, though, neither is working:

$ dig soa lightning.directory @1.1.1.1

; <<>> DiG 9.10.6 <<>> soa lightning.directory @1.1.1.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 50764
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1452
;; QUESTION SECTION:
;lightning.directory.       IN  SOA

;; Query time: 641 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Wed Dec 19 08:10:58 +07 2018
;; MSG SIZE  rcvd: 48

$ dig soa lightning.directory @8.8.8.8

; <<>> DiG 9.10.6 <<>> soa lightning.directory @8.8.8.8
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 3571
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;lightning.directory.       IN  SOA

;; Query time: 454 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Wed Dec 19 08:11:07 +07 2018
;; MSG SIZE  rcvd: 48

But my LND is ok now, maybe it's because it has peers cached.

SOA records are missing, but the A records are there:

root@blockchain:~# dig a nodes.lightning.directory @1.1.1.1

; <<>> DiG 9.11.3-1ubuntu1.3-Ubuntu <<>> a nodes.lightning.directory @1.1.1.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 17174
;; flags: qr rd ra; QUERY: 1, ANSWER: 24, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1452
;; QUESTION SECTION:
;nodes.lightning.directory.     IN      A

;; ANSWER SECTION:
nodes.lightning.directory. 60   IN      A       35.207.7.46
nodes.lightning.directory. 60   IN      A       35.231.22.37
nodes.lightning.directory. 60   IN      A       35.231.50.143
nodes.lightning.directory. 60   IN      A       46.4.18.160
nodes.lightning.directory. 60   IN      A       73.140.245.30
nodes.lightning.directory. 60   IN      A       82.197.218.97
nodes.lightning.directory. 60   IN      A       83.162.151.227
nodes.lightning.directory. 60   IN      A       85.25.255.147
nodes.lightning.directory. 60   IN      A       86.91.55.71
nodes.lightning.directory. 60   IN      A       88.99.4.70
nodes.lightning.directory. 60   IN      A       88.99.36.224
nodes.lightning.directory. 60   IN      A       94.15.129.171
nodes.lightning.directory. 60   IN      A       98.29.202.246
nodes.lightning.directory. 60   IN      A       98.103.37.42
nodes.lightning.directory. 60   IN      A       104.41.141.41
nodes.lightning.directory. 60   IN      A       172.81.182.233
nodes.lightning.directory. 60   IN      A       178.62.237.239
nodes.lightning.directory. 60   IN      A       178.248.200.126
nodes.lightning.directory. 60   IN      A       184.164.175.135
nodes.lightning.directory. 60   IN      A       206.81.4.103
nodes.lightning.directory. 60   IN      A       213.174.156.78
nodes.lightning.directory. 60   IN      A       13.90.192.114
nodes.lightning.directory. 60   IN      A       35.188.204.213
nodes.lightning.directory. 60   IN      A       35.196.22.24

;; Query time: 137 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Wed Dec 19 01:26:56 UTC 2018
;; MSG SIZE  rcvd: 438

However, I was receiving "i/o timeout" errors on 1.1.1.1. Perhaps we could raise the time allowed for DNS resolution, if such a setting exists at all.
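
lnd's own resolver settings are not shown in this thread, so the following is only a standalone Python sketch of the two ideas in this comment: asking for a specific record type (A versus SOA) and putting an explicit, adjustable timeout on the query. It builds a raw DNS query with the standard library. The function names and the 5 second default are assumptions made for the illustration.

```python
import socket
import struct

# Record types and response codes used in this thread (RFC 1035 values).
QTYPES = {"A": 1, "NS": 2, "SOA": 6}
RCODES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN"}


def build_query(name: str, qtype: str, query_id: int = 0x1234) -> bytes:
    """Encode a single-question DNS query with recursion desired."""
    # Header: id, flags (0x0100 = RD), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # The name is sent as length-prefixed labels ending with a zero byte.
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    question = labels + b"\x00" + struct.pack("!HH", QTYPES[qtype], 1)  # 1 = IN
    return header + question


def parse_header(response: bytes) -> tuple:
    """Return (rcode name, answer count) from a DNS response header."""
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    rcode = flags & 0x000F
    return RCODES.get(rcode, str(rcode)), ancount


def query(name: str, qtype: str, server: str, timeout_s: float = 5.0) -> tuple:
    """Send one UDP query; raises a socket timeout if no reply arrives in time."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_s)
        sock.sendto(build_query(name, qtype), (server, 53))
        # This sketch does not check that the reply id matches the query id.
        response, _ = sock.recvfrom(4096)
    return parse_header(response)
```

For example, query("lightning.directory", "SOA", "1.1.1.1", timeout_s=10.0) sends the same question as the dig command above with a longer wait. A resolver that never answers then surfaces as a socket timeout rather than as an rcode.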

This issue still occurs with Cloudflare DNS.

I am currently getting similar errors when launching lnd after upgrading from 0.9.2 to 0.10.0-beta.rc5. Right now I can't resolve nodes.lightning.directory with any DNS server I tried (including the Google and Cloudflare public DNS). Launching lnd --nobootstrap doesn't help; it still tries to resolve the name and exits.

I can't resolve nodes.lightning.directory either; it looks like the service is down. cc @Roasbeef

I rechecked my config file and found tor.dns=nodes.lightning.directory was what was causing the attempt to resolve that on startup. I don't know where I got that from, but the latest documentation suggests using tor.dns=soa.nodes.lightning.directory:53, and changing it to that resolves the issue. So possibly it's not a problem that nodes.lightning.directory doesn't resolve.

It seems that either something changed with the configuration of the bootstrap servers recently such that nodes.lightning.directory no longer responds on tcp/53, or maybe something changed between 0.9 and 0.10 that caused this to be a fatal error when it wasn't before.
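
Whether a name server still answers on tcp/53 can be checked directly. Here is a minimal Python sketch, assuming only the standard library. tcp_port_open is a name invented here, and the connect parameter exists solely so the check can be exercised without network access.

```python
import socket


def tcp_port_open(host: str, port: int = 53, timeout_s: float = 3.0,
                  connect=socket.create_connection) -> bool:
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        # create_connection resolves the host and applies the timeout.
        with connect((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts and failed name lookups.
        return False
```

tcp_port_open("nodes.lightning.directory") returning False would be consistent with the first explanation above. A failed name lookup also yields False, though, so it is worth trying the server's IP address as well.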
