Riot: netdev: iolist not checked for empty elements on send

Created on 12 Mar 2019  路  40Comments  路  Source: RIOT-OS/RIOT

Description

It is not possible to send an UDP packet with an empty payload via an at86rf2xx-shipping board (considering #11161 is merged), as an assertion in the device driver is hit. I was able to find the same problem in cc2420 but did not check any more devices.

_Edit: the discussion came to the conclusion that (3) should be implemented (see Tracking list)_

I am raising this issue instead of fixing it, since I'm not sure where to fix it exactly, as I see three possible solutions. The assert (see "Actual results" section) itself is correct, so the deepest level to fix this would be to add a check for len in at86rf2xx_sram_write() (1).

https://github.com/RIOT-OS/RIOT/blob/b50ad9ed4cecf367f05e009f63b8fa5b5cc0b06e/drivers/at86rf2xx/at86rf2xx_internal.c#L71-L81

or at86rf2xx_tx_load() (2) if we just want to solve this for sending specifically by checking there for len

https://github.com/RIOT-OS/RIOT/blob/b50ad9ed4cecf367f05e009f63b8fa5b5cc0b06e/drivers/at86rf2xx/at86rf2xx.c#L158-L164

However, one typically knows what they are doing when they are sending with the device driver directly, so maybe we rather want to fix the netdev interface handling of the device(s) by checking in this loop for iol->iol_len == 0 (3).

https://github.com/RIOT-OS/RIOT/blob/b50ad9ed4cecf367f05e009f63b8fa5b5cc0b06e/drivers/at86rf2xx/at86rf2xx_netdev.c#L102-L110

Lastly, we could just fix it in GNRC, which would be the least intrusive fix, as we only need to fix the gnrc_pktsnip_t to iolist_t conversion instead of checking all device drivers somewhere here (4)

https://github.com/RIOT-OS/RIOT/blob/b50ad9ed4cecf367f05e009f63b8fa5b5cc0b06e/sys/net/gnrc/netif/ieee802154/gnrc_netif_ieee802154.c#L265-L290

(which would however remove the advantage of mapping the gnrc_pktsnip_t list directly to iolist_t).

Steps to reproduce the issue

Merge #11161 and (was merged) try to send a UDP packet with empty payload using gnrc_networking on samr21-xpro (with other boards the actual result might differ):

udp send ff02::1 1337 ""

Expected results

The packet is sent and can be received by another radio (or sniffed with a sniffer).

Actual results

The node aborts on a failed assertion in this line

https://github.com/RIOT-OS/RIOT/blob/b50ad9ed4cecf367f05e009f63b8fa5b5cc0b06e/cpu/sam0_common/periph/spi.c#L162

Versions

Current master (4af354852b555de41cb55b7c820c30a6c29a5a55) with #11161 merged.

Tracking

To implement (3) and close this issue the following tasks need to be completed:

  • [x] adapt and amend to doc of netdev_driver_t::_send #11195
  • [x] fix the device drivers

    • [x] [cc2538_rf](https://github.com/RIOT-OS/RIOT/blob/4af354852b555de41cb55b7c820c30a6c29a5a55/cpu/cc2538/radio/cc2538_rf_netdev.c#L274-L284) @kYc0o not affected https://github.com/RIOT-OS/RIOT/issues/11163#issuecomment-478632370

    • [x] [esp32/esp-eth](https://github.com/RIOT-OS/RIOT/blob/4af354852b555de41cb55b7c820c30a6c29a5a55/cpu/esp32/esp-eth/esp_eth_netdev.c#L198-L205) @gschorcht #11186

    • [x] [esp32/esp-wifi](https://github.com/RIOT-OS/RIOT/blob/4af354852b555de41cb55b7c820c30a6c29a5a55/cpu/esp32/esp-wifi/esp_wifi_netdev.c#L296-L303) @gschorcht #11184

    • [x] [esp8266/esp-wifi](https://github.com/RIOT-OS/RIOT/blob/master/cpu/esp8266/esp-wifi/esp_wifi_netdev.c#L421-L426); not affected as it copies the data byte-by-byte (@gschorcht)

    • [x] [esp-now](https://github.com/RIOT-OS/RIOT/blob/4af354852b555de41cb55b7c820c30a6c29a5a55/cpu/esp_common/esp-now/esp_now_netdev.c#L649-L661) @gschorcht, also should check if iol_base is a NULL pointer here #11187

    • [x] netdev_tap; not affected since it is wrapping writev()

    • [x] socket_zep; not affected since it is wrapping writev()

    • [x] [nrf802154](https://github.com/RIOT-OS/RIOT/blob/4af354852b555de41cb55b7c820c30a6c29a5a55/cpu/nrf52/radio/nrf802154/nrf802154.c#L265-L273) @bergzand #11176

    • [x] [nrfble](https://github.com/RIOT-OS/RIOT/blob/4af354852b555de41cb55b7c820c30a6c29a5a55/cpu/nrf5x_common/radio/nrfble/nrfble.c#L277); not affected, see https://github.com/RIOT-OS/RIOT/issues/11163#issuecomment-473216806

    • [x] [nrfmin](https://github.com/RIOT-OS/RIOT/blob/4af354852b555de41cb55b7c820c30a6c29a5a55/cpu/nrf5x_common/radio/nrfmin/nrfmin.c#L330-L337) @haukepetersen -> #11194 shown to be working without problems in current master

    • [x] [at86rf2xx](https://github.com/RIOT-OS/RIOT/blob/4af354852b555de41cb55b7c820c30a6c29a5a55/drivers/at86rf2xx/at86rf2xx_netdev.c#L102-L110) @miri64 #11196

    • [x] cc110x; not affected according to https://github.com/RIOT-OS/RIOT/issues/11163#issuecomment-472449069

    • [x] [cc2420](https://github.com/RIOT-OS/RIOT/blob/4af354852b555de41cb55b7c820c30a6c29a5a55/drivers/cc2420/cc2420.c#L122-L137) @kYc0o https://github.com/RIOT-OS/RIOT/pull/11331

    • [x] [enc28j60](https://github.com/RIOT-OS/RIOT/blob/4af354852b555de41cb55b7c820c30a6c29a5a55/drivers/enc28j60/enc28j60.c#L291-L294) @maribu https://github.com/RIOT-OS/RIOT/pull/11214

    • [x] [encx24j600](https://github.com/RIOT-OS/RIOT/blob/4af354852b555de41cb55b7c820c30a6c29a5a55/drivers/encx24j600/encx24j600.c#L296-L299) @kaspar030, @maribu https://github.com/RIOT-OS/RIOT/pull/11325

    • [x] ethos; not affected as it copies the data byte-by-byte, using iol_len for the length check

    • [x] [kw2xrf](https://github.com/RIOT-OS/RIOT/blob/4af354852b555de41cb55b7c820c30a6c29a5a55/drivers/kw2xrf/kw2xrf_netdev.c#L145-L153) @PeterKietzmann -> NOT AFFECTED https://github.com/RIOT-OS/RIOT/issues/11163#issuecomment-478569711

    • [x] [mrf24j40](https://github.com/RIOT-OS/RIOT/blob/4af354852b555de41cb55b7c820c30a6c29a5a55/drivers/mrf24j40/mrf24j40_netdev.c#L73-L88) @bergzand -> #11321

    • [x] slipdev; not affected as it copies the data byte-by-byte, using iol_len for the length check

    • [x] [sx127x](https://github.com/RIOT-OS/RIOT/blob/4af354852b555de41cb55b7c820c30a6c29a5a55/drivers/sx127x/sx127x_netdev.c#L79-L81) @jia200x -> #11215

    • [x] [w5100](https://github.com/RIOT-OS/RIOT/blob/4af354852b555de41cb55b7c820c30a6c29a5a55/drivers/w5100/w5100.c#L213-L217) @maribu; not affected, see https://github.com/RIOT-OS/RIOT/issues/11163#issuecomment-474748136

    • [x] [xbee](https://github.com/RIOT-OS/RIOT/blob/4af354852b555de41cb55b7c820c30a6c29a5a55/drivers/xbee/xbee.c#L666-L668); (checksum calculation not affected) @kYc0o https://github.com/RIOT-OS/RIOT/pull/11332

drivers network bug tracking

All 40 comments

Just talked with @kaspar030 and he thinks (3) is probably the most sensible fix.

Interesting bug. :wink:

I think we should rule out having to "fix" the iolists within gnrc. It would basically require every sent iolist to be traversed one more time.

I'd say that (1) and (2) are internal APIs, so the can simply expect len != 0, as every caller of that API is "under our control" and can be checked and fixed by us. But I'd say (3) is a global API and should be prepared to be called with list elements that have len == 0. (Keeping in mind that this case is something that will only happen under rare circumstances, someone using that API will unlikely test for that. Treating len == 0 gracefully there can potentially prevent a lot of pain and debugging when other code than GNRC uses the driver.)

Without thinking about the implications for a fix, I don't think that it makes sense if an iolist element calls any tx_load or whatever SPI triggering function if len is zero. However, IMO it's reasonable to check len under the user API so (3) would be the most sensible solution here from my perspective.

+1 for (3). As len is probably already in a register, that shouldn't add much in terms of number of instructions in that loop.

Another +1 for solution (3) here for the reasons mentioned above.

Ok, then let's convert this into a tracking issue. I am doing it right now.

@miri64: Feel free to just dump a list of network device drivers and assign people to each device to check and if needed to fix the issue. cc110x is not affected, I can easily check/fix w5100 and en28j60 as I have the hardware in reach.

That was what I was planning ;-).

mrf24j40 is affected too, similar construct and fix as the at86rf2xx.

Ok, tracking list is done. Please check if you can find the time to fix the device I assigned to you.

cc110x is not affected

Well it does a weird thing (as nrfble does) where it just takes the first packet snip for sending. So I'm not too sure about that.

cc110x uses a temporary buffer, as the on-device buffer is only 64 Bytes in size. The iolist is memcpyed into that buffer and memcpy is fine with a size of 0.

cc110x uses a temporary buffer, as the on-device buffer is only 64 Bytes in size. The iolist is memcpyed into that buffer and memcpy is fine with a size of 0.

:+1: memcpy() with size 0 seems to be undefined behavior with some compilers though (see PRs that spun out of https://github.com/RIOT-OS/RIOT/pull/10782).

Hm, my guess was also that memcpy called with a size of 0 does nothing.

Event though memcpy is probabely implemented in a way like

void *memcpy(void *_dst, const void *_src, size_t len)
{
    unsigned char *dst = _dst;
    const unsigned char *src = _src;
    while(len-- > 0)
        *dst++ = *src++;
    return _dst;
}

the C99 standard says in clause _7.21.1 String handling _ sentence (2):

"Where an argument declared as size_t n specifies the length of the array for a function, n can have the value zero on a call to that function. Unless explicitly stated otherwise in the description of a particular function in this subclause, pointer arguments on such a call shall still have valid values, as described in 7.1.4. On such a call, ..., and a function that copies characters copies zero characters."

For memcpy there is no explicit statement :worried: so that the pointers should be valid. The question is whether iol->iol_base is valid or simply NULL.

Please only mark the task as done, when the corresponding PR is merged. Otherwise, it will only lead to confusion.

nrf_ble is not effected by this issue. Its send call is encapsulated by netdev_ble_send(netdev_t *dev, netdev_ble_pkt_t *pkt), which makes sure that there is always a valid iotlist element passed.

nrfmin fix is PRed.

I will post tomorrow the sx127x fix

The w5100 is not affected (I checked both the code and tested to be sure). The code wastes a bunch of CPU cycles that could be avoided by handling the len == 0 case specifically. But I think this actually the right thing to do, as increasing ROM size to optimize for a special case that never occurs under normal circumstances looks like a bad deal to me. (Provided that the special case is only treated inefficient and not incorrect.)

fix for sx127x provided in #11215

Ping? We should aim to fix this until feature freeze.

Accidentally closed automatically via #11215

@PeterKietzmann Looking at this bit I think the kw2xrf driver is unaffected, but haven't confirmed yet.

I can confirm that the kw2xrf is not affected, as indicated by @bergzand. Tested with gnrc_networking on a pba-d-01-kw2x and udp send ff02::1 1337 "". There is no failed assertion here. Furthermore, I set up a UDP server on an other node which received the UDP packet without payload

# PKTDUMP: data received:
# ~~ SNIP  0 - size:   0 byte, type: NETTYPE_UNDEF (0)
# 00000000~~ SNIP  1 - size:   8 byte, type: NETTYPE_UDP (4)
#    src-port:  1337  dst-port:  1337
#    length: 8  cksum: 0xa8d5
# ~~ SNIP  2 - size:  40 byte, type: NETTYPE_IPV6 (2)
# traffic class: 0x00 (ECN: 0x0, DSCP: 0x00)
# flow label: 0x00000
# length: 8  next header: 17  hop limit: 64
# source address: fe80::4bf0:6d4f:52a8:432a
# destination address: ff02::1
# ~~ SNIP  3 - size:  18 byte, type: NETTYPE_NETIF (-1)
# if_pid: 7  rssi: -81  lqi: 255
# flags: BROADCAST 
# src_l2addr: 49:F0:6D:4F:52:A8:43:2A
# dst_l2addr: FF:FF
# ~~ PKT    -  4 snips, total size:  66 byte

@kaspar030 @kYc0o do you have time to at least check if the devices I assigned you above are affected?

@kaspar030, @miri64: encx24j600 is affected, I opened an untested PR to fix it

@miri64, @kYc0o: cc2538_rf is unaffected, it calls rfcore_write_fifo() which looks like this:

https://github.com/RIOT-OS/RIOT/blob/200ce496389146f9def0f4982c11f6b2de6bb3aa/cpu/cc2538/radio/cc2538_rf_internal.c#L115-L124

This should be just fine with len == 0

cc2420 is affected. cc2420_fifo_write() calls spi_transfer_regs().

@miri64 according to https://github.com/RIOT-OS/RIOT/blob/299a1903e63bebc657e36104edb780cd6b3b7155/drivers/cc2420/cc2420.c#L115-L116 pkt_len cannot bet 0 if I'm not mistaken. Otherwise I can provide a patch to check for iol->iol_len as agreed.

For the cc2538 the FIFO apparently accepts lengths of value 0:

https://github.com/RIOT-OS/RIOT/blob/299a1903e63bebc657e36104edb780cd6b3b7155/cpu/cc2538/radio/cc2538_rf_netdev.c#L269-L272

cc2420 is affected. cc2420_fifo_write() calls spi_transfer_regs().

According to the driver pkt_len is always at least 2 bytes long, so whenever the netdev layer is used there will be no calls to spi_transfer_regs() of length=0. Am I missing something maybe?

This loop is run in cc2420_tx_prepare() called by cc2420_send called by netdev_driver_t::send():

https://github.com/RIOT-OS/RIOT/blob/200ce496389146f9def0f4982c11f6b2de6bb3aa/drivers/cc2420/cc2420.c#L135-L137

If any iolist entry has length zero, spi_transfer_regs() will be called with len == 0 by cc2420_fifo_write()

Ok I'll provide a PR with the fix.

See #11331.

Great! Now only the remains xbee is unclear.

The doc of uart_write() doesn't mention anything about len == 0 so I would rather think that we can't assume that setting len to 0 is safe.

See #11332 for xbee. It's not a fix since we don't know if it's safe, but I'd say better to have it. I've checked some UART drivers and it seems none would write a 0 length byte to UART, I didn't check them all though.

Since nrfmin works in current master without #11194, I closed that PR and since all devices are now confirmed to be working for empty payloads, we can close the issue as well.

Thanks everybody for the help!

Good job guys, fixing ~20 network drivers in relatively short time!

And to not have to do this again, I amended the testing procedures to the release-specs https://github.com/RIOT-OS/Release-Specs/pull/112

Was this page helpful?
0 / 5 - 0 ratings

Related issues

nikosft picture nikosft  路  6Comments

miri64 picture miri64  路  3Comments

pietrotedeschi picture pietrotedeschi  路  4Comments

chrysn picture chrysn  路  5Comments

hcnhcn012 picture hcnhcn012  路  5Comments