Gluon: package:gluon-wan-dnsmasq secondary dnsmasq (the one on port 54) is not starting automatically on rpi

Created on 23 Oct 2017 · 13 Comments · Source: freifunk-gluon/gluon

Normally, two instances of dnsmasq should be running on every Gluon node: one listening on the standard port 53 and handling all DNS requests from clients, and another one listening exclusively on 127.0.0.1:54, used e.g. for resolving the hostname of a VPN server. I have a Raspberry Pi v1 here running LEDE Reboot 17.01-SNAPSHOT r3535+34-ee32de4426, and ps | grep dnsmasq shows only one instance. Starting /usr/sbin/dnsmasq -x /var/run/gluon-wan-dnsmasq.pid -u root -i lo -p 54 -h -r /var/gluon/wan-dnsmasq/resolv.conf manually does work, and afterwards resolving gateway hostnames works as well. But for some reason this instance, unlike its primary companion, is not started automatically. I have no idea how to find the reason; maybe someone else does?
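A rough way to check this from a shell on the node is to classify the dnsmasq processes by their command line. This is a sketch under two assumptions: the WAN instance is the one started with -p 54 (as in the manual command above), and ps prints full command lines, as BusyBox ps on OpenWrt/LEDE appears to do. The helper name check_dnsmasq is made up for illustration.

```shell
#!/bin/sh
# Sketch: classify running dnsmasq processes from `ps` output read on stdin.
# Assumption: the WAN instance is the one whose command line contains "-p 54"
# (as in the manual start command above); every other dnsmasq process is
# counted as the main instance.
check_dnsmasq() {
    awk '
        /[d]nsmasq/ {
            # Pad with spaces so "-p 54" also matches at the end of the line.
            if ((" " $0 " ") ~ / -p 54 /) wan++
            else main++
        }
        END { printf "main=%d wan=%d\n", main, wan }
    '
}

# Usage on the node:
#   ps | check_dnsmasq        # "main=1 wan=0" means the WAN instance is missing
```

The [d]nsmasq pattern keeps the awk process itself from being counted, in case its own command line shows up in the ps output.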

bug

All 13 comments

Please provide the output of the logread command.

  • Which Gluon release do you use? (look at /lib/gluon/gluon-version)
  • Does starting gluon-wan-dnsmasq using /etc/init.d/gluon-wan-dnsmasq start work?

"normally there should be running two instances of dnsmasq on every gluon node."
That is not the "normal" configuration.

This happens only if you enable the DNS cache. This feature was added in Gluon 2017.1.x, and on most nodes/domains it is still turned off.

I assume there is some kind of issue with the RPi image, since a very similar issue popped up two days ago for "ddorf":

https://forum.freifunk.net/t/raspberry-pi-2-keine-verbindung-fehlersuche/15843

(The logread output there has been missing since yesterday, so it's hard to tell what's really happening. Verifying with a well-established TP-Link device would help to rule out issues on the WAN side of the environment, such as the home gateway at that site.)

Concerning "dnsmasq-wan not starting": I observed this issue while trying to backport it to 2016.2 (because of the various CVEs), but I attributed it to my own scripting and worked around it by using the "old" start script from 2016.2.

What @Adorfer writes is incorrect. There are always two dnsmasq instances, one for WAN, and one for everything else. The main instance is used by the node, but inaccessible for clients when the DNS cache is disabled.

Preamble: sorry for keeping you waiting. My browser tab did not update properly, and I did not see your comments until I posted this one.

I tried to track this down, so I ran logread | grep dnsmasq right after a reboot. Et voilà:
daemon.notice procd: /etc/rc.d/S60gluon-wan-dnsmasq: Segmentation fault
Running /etc/rc.d/S60gluon-wan-dnsmasq start manually segfaults, too.
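The same log check can be scripted as a small filter. A sketch, assuming crashes only show up in the two forms seen in this thread: procd's "Segmentation fault" message for the init script, and the kernel's "sending SIGSEGV to dnsmasq" line. The helper name dnsmasq_crashes is hypothetical.

```shell
#!/bin/sh
# Sketch: keep only dnsmasq crash reports from `logread` output on stdin.
# Assumption: crashes appear as procd's "Segmentation fault" message for
# the init script or as the kernel's "sending SIGSEGV to dnsmasq" line.
dnsmasq_crashes() {
    grep -E 'Segmentation fault|SIGSEGV' | grep dnsmasq
}

# Usage on the node, right after a reboot:
#   logread | dnsmasq_crashes
```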

Yes, /etc/init.d/gluon-wan-dnsmasq start also fails with "Segmentation fault".

cat /lib/gluon/gluon-version => v2017.1.3

Could the problem be that there was no DNS-cache-related entry in our site.conf when the image was built?

This is a segfault in the init script, i.e. busybox. Try the following commands:

cd /tmp
ulimit -c unlimited
/etc/init.d/gluon-wan-dnsmasq start

Please provide the core file that should be generated in /tmp.
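The core files attached in this thread (e.g. dnsmasq.1508720990.7117.11.core.gz) appear to encode the program name, a Unix timestamp, the PID and the signal number (11 is SIGSEGV). That would match a kernel core_pattern of /tmp/%e.%t.%p.%s.core, but the pattern is only inferred from the file names; cat /proc/sys/kernel/core_pattern shows the real one. A sketch of a helper (hypothetical name parse_core_name) that splits such a name:

```shell
#!/bin/sh
# Sketch: split a core file name of the (assumed) form
#   <program>.<unix time>.<pid>.<signal>.core[.gz]
# as in the attachments in this thread. Signal 11 is SIGSEGV.
parse_core_name() {
    name=${1##*/}          # drop any directory part
    name=${name%.gz}       # accept the gzipped attachment name as well
    name=${name%.core}
    sig=${name##*.};  name=${name%.*}
    pid=${name##*.};  name=${name%.*}
    ts=${name##*.};   prog=${name%.*}
    printf 'prog=%s time=%s pid=%s signal=%s\n' "$prog" "$ts" "$pid" "$sig"
}

# Usage on the node, reproducing the crash as described above:
#   cd /tmp && ulimit -c unlimited && /etc/init.d/gluon-wan-dnsmasq start
#   for f in /tmp/*.core; do parse_core_name "$f"; md5sum "$f"; gzip "$f"; done
```

For dnsmasq.1508720990.7117.11.core.gz this prints prog=dnsmasq time=1508720990 pid=7117 signal=11; the timestamp corresponds to 23 Oct 2017, 01:09:50 UTC. Taking the md5sum before gzip matches the checksum of the decompressed file being reported.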

dnsmasq.1508720990.7117.11.core.gz
The md5sum of the decompressed file should be 6945bc0d652e2a09066111bd4d0453b8.

Ah, it is dnsmasq that is crashing after all, not busybox. I'll have a look at it.

@lrnzo Please try replacing /usr/lib/libpacketmark.so with the version at https://home.universe-factory.net/neoraider/libpacketmark.so and check if the segfault disappears.
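When swapping a library on a running node for a test like this, it helps to keep the original so the change can be undone. A minimal sketch with hypothetical helper names swap_lib and restore_lib; only the URL and the library path come from the comment above, everything else is an assumption.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Gluon): install a replacement shared
# library while keeping a backup of the original for easy rollback.

swap_lib() {
    new=$1 target=$2
    [ -f "$new" ] || { echo "swap_lib: $new not found" >&2; return 1; }
    # Keep the very first original; don't overwrite the backup on repeated runs.
    [ -f "$target.orig" ] || cp "$target" "$target.orig" || return 1
    # Copy next to the target, then rename: rename(2) replaces the file in one
    # step instead of rewriting a library that running processes may have mapped.
    cp "$new" "$target.new" && mv "$target.new" "$target"
}

restore_lib() {
    target=$1
    [ -f "$target.orig" ] && mv "$target.orig" "$target"
}

# Usage on the node (URL and path taken from the comment above):
#   wget -O /tmp/libpacketmark.so https://home.universe-factory.net/neoraider/libpacketmark.so
#   swap_lib /tmp/libpacketmark.so /usr/lib/libpacketmark.so
#   /etc/init.d/gluon-wan-dnsmasq start
```

Copying to a temporary name and renaming is chosen over copying straight onto the target, because overwriting a mapped library in place can crash processes that are still using it.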

With the version of libpacketmark.so you linked above, there is no more segfault :) logread shows that gluon-wan-dnsmasq starts properly.

Hi Neo,

I'm just testing a master build (v2017.1-125-g207337b) that contains the fix on an Archer C25, and this issue still seems to be there.

:~# /etc/init.d/gluon-wan-dnsmasq start
Segmentation fault

:~# logread
...
Fri Nov 3 13:54:07 2017 kern.info kernel: [ 913.075436] do_page_fault(): sending SIGSEGV to dnsmasq for invalid read access from 00000000
Fri Nov 3 13:54:07 2017 kern.info kernel: [ 913.084317] epc = 00000000 in dnsmasq[400000+21000]
Fri Nov 3 13:54:07 2017 kern.info kernel: [ 913.089417] ra = 77616493 in libpacketmark.so[77616000+10000]
Fri Nov 3 13:54:07 2017 kern.info kernel: [ 913.095452]
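The kernel lines above give each faulting address together with the mapping it falls in, written as object[base+size] in hex. Subtracting the base gives the offset inside the object, e.g. 0x77616493 - 0x77616000 = 0x493 for ra. A sketch that does this for one log line; the helper name fault_offset is made up, and only the line format shown above is assumed.

```shell
#!/bin/sh
# Sketch: turn a MIPS kernel fault line such as
#   "ra = 77616493 in libpacketmark.so[77616000+10000]"
# into "<object>+0x<offset>". Addresses are hex without a 0x prefix.
fault_offset() {
    fields=$(printf '%s\n' "$1" |
        sed -n 's/.* = \([0-9a-f]*\) in \([^[]*\)\[\([0-9a-f]*\)+\([0-9a-f]*\)].*/\1 \2 \3 \4/p')
    [ -n "$fields" ] || return 1
    set -- $fields                      # addr, object, base, size
    addr=$((0x$1)) base=$((0x$3)) size=$((0x$4))
    if [ "$addr" -lt "$base" ] || [ "$addr" -ge $((base + size)) ]; then
        echo "$2: address 0x$1 outside mapping"
        return 1
    fi
    printf '%s+0x%x\n' "$2" $((addr - base))
}
```

For the ra line this gives libpacketmark.so+0x493, while the epc of 00000000 lies outside the dnsmasq mapping: the CPU jumped to address 0, and the return address points back into libpacketmark.so. That pattern is consistent with libpacketmark.so calling through a NULL function pointer, though the trace alone does not prove it.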

I'll attach the core dump. Thanks, ede

dnsmasq.1509714104.3576.11.core.gz

Are you sure that your build is clean/fresh?
I'm reopening this issue to be safe.

Well, the issue only affected ARM targets before. The new issue is closely related, but affects MIPS instead. I suspect a musl issue, but I'll have to investigate...

On 11/4/2017 21:03, Matthias Schiffer wrote:

Closed #1245 via 460830bea135001c039fb68f8ce069a1ba307001.

Just to confirm: I built and installed v2017.1-131-g460830b, and the issue is gone. Thanks! ede
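Gluon version strings like v2017.1-131-g460830b appear to follow git describe output (tag, number of commits since the tag, then "g" plus the abbreviated commit hash), so the hash in /lib/gluon/gluon-version can be compared with the fixing commit 460830bea135001c039fb68f8ce069a1ba307001. A sketch with hypothetical helper names describe_parts and hash_matches:

```shell
#!/bin/sh
# Sketch: split a `git describe`-style version string (tag-N-gHASH).
# Assumption: Gluon version strings use this format, as the ones in this
# thread suggest; plain release tags are reported with 0 commits.
describe_parts() {
    v=$1
    case $v in
        *-[0-9]*-g*)
            hash=${v##*-g}      # text after the last "-g"
            rest=${v%-g*}       # tag-N
            count=${rest##*-}
            tag=${rest%-*} ;;
        *)  # plain release tag such as v2017.1.3
            tag=$v count=0 hash= ;;
    esac
    printf 'tag=%s commits=%s hash=%s\n' "$tag" "$count" "$hash"
}

# Succeeds if the abbreviated hash $1 is a prefix of the full hash $2.
hash_matches() {
    [ -n "$1" ] || return 1
    case $2 in "$1"*) return 0 ;; esac
    return 1
}

# Usage on the node:
#   describe_parts "$(cat /lib/gluon/gluon-version)"
```

A matching hash only shows that the image was built from exactly that commit. Whether a later build contains the fix has to be checked in the source tree, e.g. with git merge-base --is-ancestor 460830b followed by the build's commit.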
