Nodemcu-firmware: mDNS stops after about 5 minutes

Created on 21 Mar 2016 · 11 Comments · Source: nodemcu/nodemcu-firmware

Initially mDNS service works correctly.
The .local address is correctly resolved, information supplied can be correctly retrieved using dns-sd in Windows, and I see the traffic using Wireshark.
But after about 5 minutes (always), the mDNS service stops.

Is there anyone with the same experience?

All 11 comments

Hmm. With dns-sd on Windows, I still see the service in the list even after a long time. Also, running

ping fishtank.local

works (after a pause while it tries an AAAA lookup first).

My mDNS setup is the example from the doc page.

Can you give an example list of commands that demonstrates the problem?

Actually, it stopped working for me too, but after about an hour. My suspicion is that it is a bug in the Espressif SDK, but I'll see if I can figure out what is going on.

See http://bbs.espressif.com/viewtopic.php?t=1610 for someone else with the same problem -- but he claims it is solved.

As a workaround, I suggest restarting mDNS every n minutes with a timer.

My issue seems clearly linked to the Espressif SDK issue reported at that link: my node stops responding after 5 minutes, which matches a TTL of 300 seconds.

As suggested, I also tried restarting it with a timer, but that does not work reliably.
It works a couple of times, but then stops again.

Shouldn't the service re-advertise once its TTL expires?

It seems I found the solution.
Checking the mDNS traffic of a Raspberry Pi, I found that with a TTL of 120 there is an "MDNS standard query response" every 100 seconds. This was not the case with the current NodeMCU firmware, which would explain why mDNS fails once the initial TTL has expired.
Going through the source code, I came across mdns.c in lwip\core.
There is a "loopback function for the multicast(224.0.0.251) messages received at port 5353".
Below this line, a call to mdns_enable() is commented out.
I uncommented the call and recompiled, and now I also see an "MDNS standard query response" from the NodeMCU node about every 260 seconds, with a TTL of 300.

I cannot judge why mdns_enable() was disabled, but for now this seems to resolve the issue.
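
The reasoning about TTLs above can be sketched with a small model. This is only an illustration, not firmware code: it assumes a resolver that caches a record when it hears an announcement and drops it once the TTL has elapsed, and it ignores any queries the resolver sends itself. The names cache_is_fresh and periodic are hypothetical helpers made up for this sketch.

```python
def cache_is_fresh(now, announcements, ttl):
    """Return True if a resolver that heard the given announcements
    still holds an unexpired record at time `now` (all in seconds)."""
    heard = [t for t in announcements if t <= now]
    if not heard:
        return False
    # The record is valid until `ttl` seconds after the latest announcement.
    return now - max(heard) < ttl


def periodic(interval, until):
    """Announcement times 0, interval, 2*interval, ... up to `until`."""
    return list(range(0, until + 1, interval))


if __name__ == "__main__":
    # One announcement at registration, TTL 300: gone after 5 minutes.
    print(cache_is_fresh(299, [0], 300))
    print(cache_is_fresh(301, [0], 300))
    # Re-announcing every 260 s with a TTL of 300 keeps the record alive.
    print(cache_is_fresh(3600, periodic(260, 3600), 300))
    # Same idea for the Raspberry Pi numbers: every 100 s with a TTL of 120.
    print(cache_is_fresh(3600, periodic(100, 3600), 120))
```

In this simplified model, any announcement interval shorter than the TTL keeps the name resolvable, which matches both the Raspberry Pi trace and the patched NodeMCU trace described above.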

That is really weird. In the current dev branch, mdns_enable just does a udp_recv call. The line after the commented-out call is a better udp_recv call (as it passes in the info structure).

I also realized that there is a potential memory overwrite, as the code does not check that the DNS response packet fits into the allocated space. The buffer can overflow if the user wants to send large attributes.
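
To illustrate the kind of check that is missing, here is a minimal sketch in Python rather than the firmware's C. It encodes service attributes as DNS TXT data (each entry is a length byte followed by at most 255 bytes) and refuses to write past a given limit. The function name encode_txt and the max_len parameter are assumptions for this example and do not correspond to anything in mdns.c.

```python
def encode_txt(attrs, max_len):
    """Encode key=value attributes as DNS TXT RDATA.

    Raises ValueError instead of exceeding `max_len`, which stands in
    for the space left in the (hypothetical) response buffer.
    """
    out = bytearray()
    for key, value in attrs.items():
        entry = f"{key}={value}".encode("utf-8")
        # A single TXT string carries a one-byte length, so 255 is the cap.
        if len(entry) > 255:
            raise ValueError(f"attribute {key!r} exceeds 255 bytes")
        # Check the length byte plus the payload before writing anything.
        if len(out) + 1 + len(entry) > max_len:
            raise ValueError("TXT data does not fit into the response buffer")
        out.append(len(entry))
        out += entry
    return bytes(out)


if __name__ == "__main__":
    print(encode_txt({"path": "/"}, 64))
    try:
        encode_txt({"note": "x" * 100}, 64)
    except ValueError as err:
        print("rejected:", err)
```

The point of the sketch is the order of operations: the size is checked before each write, so oversized attributes are rejected rather than overwriting adjacent memory.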

I'm working on a complete fix...

Unfortunately the fix didn't work.
But I see the following using Wireshark:

  • When calling mdns.register, my Windows PC registers the IP address of the .local logical address.
  • When I ping the logical address, the PC translates it to the registered IP address.
  • This works until the TTL of the mDNS registration expires.
  • When pinging after this period, a multicast query is sent to port 5353 of 224.0.0.251 to look up the IP address of the logical address again.
  • This is where NodeMCU should respond with its IP address, but doesn't.
    This is the message to which it should respond:
    Standard query 0x0000 A nodemcu.local, "QM" question AAAA nodemcu.local, "QM" question
  • When performing a dns-sd -B, I see that the NodeMCU does respond to the following message:
    Standard query 0x0000 PTR _http._tcp.local, "QM" question
  • So the service is still responding on the UDP port.

So for now my conclusion is that the firmware is not responding to the Standard query 0x0000 A message.
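
For anyone who wants to reproduce this without waiting for the PC's cache to expire, the query seen in Wireshark can be rebuilt by hand. The following is a rough sketch in Python. build_query and send_query are hypothetical helper names, and the host name nodemcu.local is taken from the trace above. Note one assumption: the script sends from an ephemeral port instead of 5353, so a responder is expected to answer it directly by unicast, and the firmware may treat that case differently from the PC's own query. Wireshark remains the reference.

```python
import socket
import struct

MDNS_GROUP = "224.0.0.251"
MDNS_PORT = 5353
TYPE_A = 1
TYPE_AAAA = 28
CLASS_IN = 1  # unicast-response bit clear, i.e. a "QM" question


def encode_name(name):
    """Encode a host name as length-prefixed DNS labels."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name, qtypes=(TYPE_A, TYPE_AAAA)):
    """Build a standard query with ID 0x0000 and one question per type."""
    # Header: ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT.
    packet = struct.pack("!HHHHHH", 0, 0, len(qtypes), 0, 0, 0)
    for qtype in qtypes:
        packet += encode_name(name) + struct.pack("!HH", qtype, CLASS_IN)
    return packet


def send_query(name, timeout=2.0):
    """Send the query to the mDNS group; return (sender_ip, data) or None."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        sock.sendto(build_query(name), (MDNS_GROUP, MDNS_PORT))
        data, addr = sock.recvfrom(1500)
        return addr[0], data
    except socket.timeout:
        return None
    finally:
        sock.close()


if __name__ == "__main__":
    print(send_query("nodemcu.local"))
```

A result of None after the timeout corresponds to the failure described above, where the node no longer answers the A query.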

@FrankX0 This should be resolved by PR #1192. I'd be interested if you could test that and see if it works better for you.

This is a BIG improvement!
It now seems to work stably over time.
I guess there could be some improvements in the future, but for now these are the results:

  • Windows 10 + Chrome: OK
  • Windows 10 + MS Edge: FAIL (cannot resolve the IP address)
  • Windows 10 + Firefox: OK
  • iOS 9 + Chrome: OK
  • iOS 9 + Safari: OK
  • iOS 9 + Firefox: OK

Thanks so far for your effort!

I have now put up a new PR, #1197, which reorganizes the code. I also fixed another bug where service browsers wouldn't see the service advertised by the NodeMCU.

Resolved by a merge.
