Zephyr: LwM2M: UDP local port setting not obeyed, random port doesn't work

Created on 11 May 2018 · 32 Comments · Source: zephyrproject-rtos/zephyr

I'm running into a number of fancy issues with LwM2M on a Nucleo-F429ZI board connected via Ethernet.

_If_ the app is actually capable of sending out packets over the network I'm getting the following output:

[lib/lwm2m_rd_client] [INF] sm_do_init: RD Client started with endpoint 'nucleo_f429zi' and client lifetime 30
[lib/lwm2m_rd_client] [DBG] sm_send_registration: registration sent [212.18.24.70]
[lib/lwm2m_engine] [DBG] retransmit_request: Resending message: 0x20004024
[lib/lwm2m_engine] [DBG] retransmit_request: Resending message: 0x20004024
[lib/lwm2m_engine] [DBG] retransmit_request: Resending message: 0x20004024
[lib/lwm2m_engine] [DBG] retransmit_request: Resending message: 0x20004024
[lib/lwm2m_rd_client] [WRN] do_registration_timeout_cb: Registration Timeout

Further investigation with a mirrored port and Wireshark shows that the packet is actually sent out and ACKed by the server. However, the source port chosen by Zephyr is random instead of the port configured via CONFIG_LWM2M_LOCAL_PORT=5683. For some reason, Zephyr also doesn't route the reply sent to that source port back to the app's receive function; instead it generates an ICMP "Destination unreachable" message.

Networking Networking Clients bug medium

All 32 comments

Hello @therealprof, I don't have a Nucleo-F429ZI board, but I do have a FRDM-K64F which I can use to dig into the issue you're describing (it also has an ethernet port). Let me take a look and get back to you.

It would be great to see the output of the following Zephyr shell commands:
net iface
net conn

I can confirm that CONFIG_LWM2M_LOCAL_PORT=5683 is completely ignored at the moment. It looks like the code that used to set the local port was re-written at some point and this config was never updated.

However, after adjusting the sample's settings to disable IPv6 and point at a local Leshan server instance, my FRDM-K64F connects just fine using the random port assigned.

Here's the prj.conf diff:
https://hastebin.com/cuquwewohu.diff

And working log from my device:
https://hastebin.com/tajesayeda.log

That being said, I'm looking into a patch to restore the CONFIG_LWM2M_LOCAL_PORT functionality.

Hm, weird. I also had to disable IPv6 because it had other issues. Also IPv4 doesn't work reliably and depends on how quickly a DHCP address can be obtained...

Let me try to obtain the other information you're looking for...

shell> net conn
     Context    Iface         Flags Local               Remote
[ 1] 0x2000365c 0x20011280    4DU   192.168.11.81:65297 212.18.24.70:5683
[lib/lwm2m_engine] [DBG] retransmit_request: Resending message: 0x20004024
shell> net iface
Hostname: zephyr


Interface 0x20011280 (Ethernet) [0]
===================================
Link addr : 00:80:E1:2C:02:DE
MTU       : 1500
IPv4 unicast addresses (max 5):
        192.168.11.81 DHCP preferred
IPv4 multicast addresses (max 5):
        <none>
IPv4 gateway : 192.168.11.200
IPv4 netmask : 255.255.255.0
DHCPv4 lease time : 72000
DHCPv4 renew time : 36000
DHCPv4 server     : 192.168.11.240
DHCPv4 requested  : 192.168.11.81
DHCPv4 state      : bound
DHCPv4 attempts   : 1
shell>

IPv6 has worked for me using QEMU whenever I've tested it. What issues were you seeing? The reason I disabled it above was that I've had to disable IPv6 on my home network for a while as it was causing issues with another workstation.

@mike-scott Outgoing ICMP only works sporadically, i.e. ping replies are sometimes not recognised while the other way around works nicely. And routed connections in general don't seem to work, only IPs within the same prefix discovered using neighbour discovery. Oh, and trying to ping local scoped addresses sends the shell off to an endless loop or something.

@therealprof what kind of LwM2M server are you connecting to? Normally we test with Leshan or Wakaama.

Also, to enable routing outside the local network w/o DHCP, add this config:
CONFIG_NET_APP_MY_IPV4_GW="{your gateway IP}"

See above for corrected config value

@mike-scott

what kind of LwM2M server are you connecting to? Normally we test with Leshan or Wakaama.

A completely new LwM2M server implementation written from scratch for my company. The server side works nicely, but the client side is a bit lacking so I'm trying to grab any straw I can find to throw at the server. ;)

Routing works just fine for IPv4 with DHCP, IPv6 with SLAAC seems to be broken in that regard.

Your server is returning 1 option value for COAP_OPTION_LOCATION_PATH:
[0] rd/95718949a6c440118201ccae40c81d61

Technically this is incorrect. Each portion of the path needs its own option. So it should be returning:
[0] rd
[1] 95718949a6c440118201ccae40c81d61
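The splitting rule can be sketched in plain C as follows. This is only an illustration: `split_location_path`, `struct path_segment` and `MAX_SEGMENTS` are hypothetical names, not part of Zephyr or of any CoAP library, and a real server would append each segment as its own Location-Path option instead of storing it in an array.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SEGMENTS 4   /* hypothetical limit for this sketch */

struct path_segment {
    const char *start;   /* points into the original string */
    size_t len;          /* segment length, without the '/' */
};

/*
 * Split "rd/abc" into {"rd", "abc"}: one entry per Location-Path option.
 * Empty segments (leading, trailing or doubled '/') are skipped.
 * Returns the number of segments found, or -1 if there are too many.
 */
int split_location_path(const char *path,
                        struct path_segment *out, size_t max)
{
    size_t count = 0;

    while (*path != '\0') {
        size_t len = strcspn(path, "/");   /* length up to next '/' */

        if (len > 0) {
            if (count == max) {
                return -1;
            }
            out[count].start = path;
            out[count].len = len;
            count++;
        }
        path += len;
        if (*path == '/') {
            path++;                        /* step over the separator */
        }
    }
    return (int)count;
}
```

For the value returned by the server above, this yields two segments: "rd" and the 32-character endpoint ID.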

Also the coap end point value of "95718949a6c440118201ccae40c81d61" is larger than the CoAP default option length of 12. In Zephyr we need to enable values longer than 12 by adding the following 2 configs in the prj.conf (in your case the value is 32 chars but I'm using 36 just to be safe):
CONFIG_COAP_EXTENDED_OPTIONS_LEN=y
CONFIG_COAP_EXTENDED_OPTIONS_LEN_VALUE=36

Yes, DHCP would be setting the network gateway value of the net iface. I'll make a note to see how SLAAC is setting this.

@mike-scott

Technically this is incorrect. Each portion of the path needs its own option. So it should be returning:

Thanks a lot for debugging our software. ;) I'll look into it but can't change this right away since this server is at the end of a longish CD chain...

Also the coap end point value of "95718949a6c440118201ccae40c81d61" is larger than the CoAP default option length of 12. In Zephyr we need to enable values longer than 12 by adding the following 2 configs in the prj.conf (in your case the value is 32 chars but I'm using 36 just to be safe):
CONFIG_COAP_EXTENDED_OPTIONS_LEN=y
CONFIG_COAP_EXTENDED_OPTIONS_LEN_VALUE=36

Okay, I'll test this but my gut tells me this is likely not related to Zephyr not accepting the reply on the port it was sent out from.

While you're testing, you can also add the following 2 configs to prj.conf to enable ALL of the sys log messages for the net subsys:
CONFIG_NET_LOG_GLOBAL=y
CONFIG_SYS_LOG_DEFAULT_LEVEL=4

Be warned that this will significantly increase the size of the binary and the output from UART.

A full log example of my client connecting to your server:
https://hastebin.com/saxugibilu.go

Note line 30: Check UDP listener for pkt 0x20004a48 src port 5683 dst port 60166 family 29

This mentions the incoming port which should be the same as the randomly assigned outgoing port.

O_O The picture is getting clearer... I think this is what is happening:

  • Zephyr boots up
  • Network is initialised with default IP 192.0.2.1
  • LwM2M registers UDP bindings:
    [net/ctx] [DBG] net_context_bind: (0x20005e00): Context 0x20003688 binding to UDP 192.0.2.1:57083 iface 0x20011920
    [net/conn] [DBG] net_conn_register: (0x20005e00): [1/2/17/0x33] remote 0x20003698/212.18.24.70/5683 local 0x2000fdc0/192.0.2.1/57083 cb 0x08006459 ud 0x20000000
  • LwM2M tries to send packets to destination, all of them failing
  • Meanwhile system does DHCP to obtain usable address
  • At some point LwM2M sends out packets using working connection but without rebinding the context first, so when reply packet comes in, it doesn't match any binding:
    [net/ipv4] [DBG] net_ipv4_process_pkt: (0x20003e08): IPv4 packet received from 212.18.24.70 to 192.168.11.79
    [net/conn] [DBG] net_conn_input: (0x20003e08): Check UDP listener for pkt 0x200069e0 src port 5683 dst port 57083 family 2 chksum 0xc0d2 len 69
    [net/conn] [DBG] net_conn_input: (0x20003e08): No match found.
    [net/icmpv4] [DBG] net_icmpv4_send_error: (0x20003e08): Sending ICMPv4 Error Message type 3 code 3 from 192.168.11.79 to 212.18.24.70

It seems to be possible to bind to any IP, at least that's what the DHCP client does:

[net/ctx] [DBG] net_context_bind: (0x20003e08): Context 0x200036d0 binding to UDP 0.0.0.0:49262 iface 0x20011920

Maybe another option would be to listen for configuration changes and rebind. A re-registration of the LwM2M client would also make a lot of sense, since the UDP connection has become unusable at that point.
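To illustrate the mismatch, here is a deliberately simplified model in plain C. It is not Zephyr's net_conn code: it assumes a listener is matched on local address and port only (ignoring remote address and family), and `struct udp_listener`, `listener_matches`, `IPV4` and `IPV4_ANY` are made-up names.

```c
#include <stdbool.h>
#include <stdint.h>

/* IPv4 addresses as host-order integers, for readability only. */
#define IPV4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))
#define IPV4_ANY 0u   /* 0.0.0.0: accept packets for any local address */

struct udp_listener {
    uint32_t local_addr;   /* address the context was bound to */
    uint16_t local_port;   /* port the context was bound to */
};

/* True if a packet sent to dst_addr:dst_port is delivered to the listener. */
bool listener_matches(const struct udp_listener *l,
                      uint32_t dst_addr, uint16_t dst_port)
{
    if (l->local_port != dst_port) {
        return false;
    }
    /* A wildcard bind matches every local address. */
    return l->local_addr == IPV4_ANY || l->local_addr == dst_addr;
}
```

In this model a listener still bound to 192.0.2.1:57083 rejects a reply addressed to 192.168.11.79:57083, while one bound to 0.0.0.0:57083 accepts it, which is consistent with the "No match found" line in the log above.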

I agree something is fishy with the DHCP flow. Can you temporarily swap to a static IP and continue testing?

Temporarily for kicks here at home: sure. Busy with some other stuff at the moment, though. ;)

I was thinking that would unblock you for testing your LwM2M server.

Not really, can't use static IP addresses anywhere at work. For some real testing I'd even need DNS lookups to work and preferably IPv6, too.

Can you try commenting the following in prj.conf:
CONFIG_NET_APP_MY_IPV4_ADDR="192.0.2.1"

That should stop the initial bind from happening till DHCP is set.

I have a rather large LwM2M related patch queue, but over the next 30 patches DNS support is being added to the example.

Fantastic. Removing CONFIG_NET_APP_MY_IPV4_ADDR makes the whole thing work with DHCP.

Created a new issue to track the DHCP bug. This issue can track the fix to the local port setting.

@jukkar How do you recommend setting the local port via the net_app_init_udp_client() function?

A call to net_app_init_udp_client() with a client_addr that has the port set requires a valid sa_family that will later be checked against the remote_addr->sa_family (after parsing). The remote_addr sa_family can't really be predicted prior to parsing.. so it's unlikely that client_addr check will match unless we add some handling for letting the sa_family of client_addr be AF_UNSPEC.

And the bind_local() call later in the function binds to the local port so I can't change it after the net_app_init_udp_client() call.
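A rough sketch of the AF_UNSPEC handling mentioned above, written against the POSIX socket headers instead of Zephyr's own types so that it compiles on a host. `client_family_ok` is a hypothetical helper, not an existing net_app function, and the real check in net_app may be structured differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Hypothetical check: a client address with family AF_UNSPEC carries only
 * a port and is treated as compatible with any remote family. A NULL
 * client address means "no preference". Otherwise the families must match.
 */
bool client_family_ok(const struct sockaddr *client,
                      const struct sockaddr *remote)
{
    if (client == NULL || client->sa_family == AF_UNSPEC) {
        return true;
    }
    return client->sa_family == remote->sa_family;
}
```

With a check like this, a caller could fill in only the port of client_addr and leave the family unspecified until the remote address has been parsed.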

@jukkar Going a bit further, the only thing that's used from the client_addr passed into net_app_init_client() is the sa_family. Everything else is ignored. Setting the port is currently not supported.

@jukkar This is the best I could come up with on the fly to set the port during a call to net_app_init_udp_client():
https://hastebin.com/hicafaqaso.php

Note that the first 2 changes in subsys/net/lib/app/client.c are a bugfix where the port isn't converted back to host byte-order when passing to _net_app_set_local_addr() where it's later converted back to network byte-order.
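The byte-order problem can be reproduced outside Zephyr with the POSIX htons()/ntohs() helpers. This is only a sketch of the pattern: `store_local_port`, `set_port_buggy` and `set_port_fixed` are made-up names standing in for the setter and its callers, not the actual code in client.c.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Stand-in for a setter that expects a host-order port and stores it in
 * network byte order. Hypothetical name. */
uint16_t store_local_port(uint16_t host_port)
{
    return htons(host_port);
}

/* Buggy caller: passes a port that is already in network byte order,
 * so the value is converted twice. */
uint16_t set_port_buggy(uint16_t net_port)
{
    return store_local_port(net_port);
}

/* Fixed caller: converts back to host order before handing it over,
 * so the stored value round-trips correctly. */
uint16_t set_port_fixed(uint16_t net_port)
{
    return store_local_port(ntohs(net_port));
}
```

On a little-endian target the buggy path swaps the bytes twice, so a configured port of 5683 (0x1633) would be stored as if it were 13078 (0x3316). On a big-endian target both paths behave the same, which makes this kind of bug easy to miss.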

I can submit this as a PR if you're OK with the approach. The API documentation in include/net/net_app.h probably needs to be updated as well. For client_addr it currently says: "@param client_addr Local address of the client. If set to NULL, then the API will figure out a proper address where to bind the context."

When it really needs to say something like:
@param client_addr Local family and port of the client. If set to NULL, then the API will use a random port to bind the context.

Hrm, makes me wonder if we need the client_addr param at all.. no one uses it in Zephyr. I checked every instance of:
net_app_init_client
net_app_init_udp_client
net_app_init_tcp_client

The client_addr param is always NULL.

Seems like we could replace it with local_port (u16_t) param. Here's what the patch would look like for that:
https://hastebin.com/ogemopocuz.m
But I'm not sure we want to change the netapp API at this point.

@mike-scott

Hrm, makes me wonder if we need the client_addr param at all.. no one uses it in Zephyr.

That's the outgoing address, right? Whereas the listening address has only limited uses in embedded, as mentioned in #7500, the outgoing address might be a whole lot more useful in case the device has multiple interfaces and/or addresses and you want to control exactly which one to use. I'm not surprised at all that no example in Zephyr uses that feature, since it constrains the use even more to very specific setups, which is not that great an idea.

However this may be crucial to have in custom applications so I'd be very much in favour of keeping that feature.

@jukkar This is the best I could come up with on the fly to set the port during a call to net_app_init_udp_client():
https://hastebin.com/hicafaqaso.php

The first proposal looks sane to me, so please send a PR for that one. I do not think it makes sense to convert the client_addr to port like you do in the second proposal as then that would be an API change.

As you noticed, the client_addr is always NULL, which imho makes more sense than having the user set the source address. Anyway, if needed the client_addr could be set, but of course this was not properly tested, as we see now.
