Riot: shell/ping6: Incorrect handling of unexpected pongs

Created on 13 May 2019  路  31Comments  路  Source: RIOT-OS/RIOT

Description

ping6 does check received pongs properly

Steps to reproduce the issue

  • Set up two devices A and B that can exchange messages
  • Modify device B to send multiple responses over some extended period of time (e.g. seconds)
  • On A: ping6 node B more than once
  • On A: ping6 node B and immediately afterwards ping6 some unreachable/unassigned IPv6 address

Expected results

  • ping6 ignores any pongs not matching a ping it send and filters out duplicates

Actual results

  • ping6 correctly filters out duplicates (it does print them marked as duplicates but ignores them for the statistics - which is very nice design choice)
  • ping6 treats every pong it receives as response (skipping duplicates):

    • The first of each duplicated pong matching a ping of the last invocation of ping6 is counted as a response.

    • Duplicated pongs from a previous invocation of ping6 are even reported even if the pinged address no longer matches

    • Because the unrelated pongs are counted as responses, ping6 might end after receiving 3 pongs without having send 3 pings

E.g. see the stats here:

> 2019-05-13 17:32:42,837 - INFO #  ping6 fe80::1114:7221:1336:102a
2019-05-13 17:32:42,845 - INFO # 12 bytes from fe80::1114:7221:1336:102a: icmp_seq=1 ttl=64 rssi=-61 dBm time=32.115 ms
2019-05-13 17:32:42,852 - INFO # 12 bytes from fe80::1114:7221:1336:102a: icmp_seq=0 ttl=64 rssi=-67 dBm time=7.957 ms
2019-05-13 17:32:42,861 - INFO # 12 bytes from fe80::1114:7221:1336:102a: icmp_seq=0 ttl=64 rssi=-61 dBm time=15.542 ms (DUP!)
2019-05-13 17:32:42,869 - INFO # 12 bytes from fe80::1114:7221:1336:102a: icmp_seq=0 ttl=64 rssi=-62 dBm time=23.820 ms (DUP!)
2019-05-13 17:32:42,877 - INFO # 12 bytes from fe80::1114:7221:1336:102a: icmp_seq=0 ttl=64 rssi=-62 dBm time=32.099 ms (DUP!)
2019-05-13 17:32:43,854 - INFO # 12 bytes from fe80::1114:7221:1336:102a: icmp_seq=1 ttl=64 rssi=-77 dBm time=9.959 ms
2019-05-13 17:32:43,855 - INFO # 
2019-05-13 17:32:43,859 - INFO # --- fe80::1114:7221:1336:102a PING statistics ---
2019-05-13 17:32:43,866 - INFO # 2 packets transmitted, 3 packets received, 3 duplicates, 2147483598% packet loss
2019-05-13 17:32:43,870 - INFO # round-trip min/avg/max = 7.957/20.248/32.115 ms

Versions

(At least) current master

network bug

Most helpful comment

Mhhh indeed luid_custom() does not do what I thought it does. However, I think I go for a different approach then you propose, since a different ID is also not guaranteed if luid is not provided, so I think we need a global counter.

All 31 comments

@miri64: Care to take a look?

Will do, but only after this week at the earliest. If it is pressing, feel free to fix it earlier.

There is code to prevent this happening but it relies on id being unique per each execution of ping6.

It's generating id with luid_custom() which does not return unique values:
https://github.com/RIOT-OS/RIOT/blob/6e9612f1d1e41ef56b79c1fcddc3d7e3aae2b9d7/sys/shell/commands/sc_gnrc_icmpv6_echo.c#L244

I replaced luid_custom() with luid_get() and it appears to have resolved the problem. Of course this still doesn't handle the case where MODULE_LUID is not available.

Mh... I don't really get the problem. If there are duplicates ping (even the iputils implementation) shows them. If you spoof echo replies they are shown as well, but there is no way to distinguish spoofs from real if you use the correct addresses.

  • On _A_: ping6 node _B_ and immediately afterwards ping6 some unreachable/unassigned IPv6 address

What does the latter accomplish? I should not get any responses, as there is no node to send them.

  • On A: ping6 node B and immediately afterwards ping6 some unreachable/unassigned IPv6 address

What does the latter accomplish? I should not get any responses, as there is no node to send them.

If B sends duplicates for some extended time, and A stops pinging B but starts pinging unavailable node C, A will receive the delayed responses of B. But as the sender address is not matching C, A should ignore the replies (or give a debug message). ping6 however takes the delayed responses into account and will stop pinging when 3 responses (regardless of sender address) are receiced. If it only send 2 requests in that time, it will also print 133% success ratio...

If B sends duplicates for some extended time, and A stops pinging B but starts pinging unavailable node C, A will receive the delayed responses of B. But as the sender address is not matching C, A should ignore the replies (or give a debug message). ping6 however takes the delayed responses into account and will stop pinging when 3 responses (regardless of sender address) are receiced. If it only send 2 requests in that time, it will also print 133% success ratio...

Ahhh, so the sender address not matching is the problem?

@maribu #11933 should fix the issue.

@maribu #11933 should fix the issue.

It should fix 2 but it wouldn't fix 1:

1) On A: ping6 node B more than once
1) On A: ping6 node B and immediately afterwards ping6 some unreachable/unassigned IPv6 address

2. On A: ping6 node B and immediately afterwards ping6 some unreachable/unassigned IPv6 address

Ok, but then I don't understand what is happening. If I ping an unreachable address, there should be no reply. As we now with #11933 check the source of the packet against the pinged address the second case should also be fixed. Setting the ID to a more node-specific one is in my eyes only patch-work to fix a sympton, but not the underlying issue (if two nodes have the same ID by chance the problem would still exist)

I mean the implementation of our ping is largely based on the inetutil one and the original ping implementation (they just use the PID of the process) and they don't have these kind of problems.

@maribu can you supply with the code of your modified node B?

Yes the second case (pinging two different nodes) is fixed by source address checking, but not the first case (pinging the same node twice).

I don't actually know what is the intended use of id or whether it's supposed to be unique per node or unique per invocation. But if we make it unique per invocation then it helps solve case 1:

  • invoke ping6 to send a ping to any node
  • invoke ping6 again to ping the same node
  • any delayed pongs from the first invocation will be counted by the second invocation. Maybe not everyone agrees that this is wrong behavior, but if we can prevent it I think we should.

they don't have these kind of problems

I agree and I think maybe it's worth noting that delayed pongs are required to trigger this issue, and Riot is probably the first OS I've encountered so many delayed pongs with, and that's maybe why it's an issue for us and not others. Usually it's just due to xtimer issues or other highly experimental stuff. In most normal cases and for most normal OSs I think it's likely that delayed pongs are so uncommon that this would be a nonissue.

Ok, sorry then I misunderstood you. id should change "enough" to be distinguishable between invocations. As far as I understand luid_custom() this should be the case, but luid might be wrong there. I'll investigate.

Mhhh indeed luid_custom() does not do what I thought it does. However, I think I go for a different approach then you propose, since a different ID is also not guaranteed if luid is not provided, so I think we need a global counter.

(will provide a separate fix tomorrow)

@maribu can you supply with the code of your modified node B?

I used unmodified RIOT master with one node using an at86rf2xx (802.15.4 breakout header for RPi by OpenLabs) and a nrf52840-dk. I did not investigate further yet what the problem is yet, so I cannot provide any details.

But your steps to reproduce say

  • Modify device _B_ to send multiple responses over some extended period of time (e.g. seconds)

(will provide a separate fix tomorrow)

See https://github.com/RIOT-OS/RIOT/pull/11938

But your steps to reproduce say

  • Modify device B to send multiple responses over some extended period of time (e.g. seconds)

Yes, and this is how I would reproduce the issue in the most generalized and controlled way ;-) I could provide a small program to trigger the issue using e.g. Linux RAW sockets. It would require node B to be a Linux host and node A to use e.g. Ethernet. But I won't have time for that until next week.

I think it should be possible to use a debugger to halt the CPU either during or after a ping reception and then unhalt it to release the pong at a controlled time.

I think it should be possible to use a debugger to halt the CPU either during or after a ping reception and then unhalt it to release the pong at a controlled time.

I did not test it that way, but whoever reviews #11938 could do so :-)

Apparently, just rewording the OP of the PR was not enough.

@benemorius @maribu please check if this is still an issue with #11933 and #11938 being merged.

Will do in a few hours hopefully. (Sorry for being so unresponsive. This week is super busy :-/)

Thanks for your fixes!

With current RIOT master sending pings using an nrf52840dk to a Nucleo-F767ZI (with disabled stm32-eth) supplied with an AT86RF233 (OpenLabs RPi 802.15.4 breakout board):

> ping6 fe80::1114:7221:1327:102a
2019-08-09 22:22:58,708 - INFO #  ping6 fe80::1114:7221:1327:102a
2019-08-09 22:22:58,730 - INFO # 12 bytes from fe80::1114:7221:1327:102a: icmp_seq=0 ttl=64 rssi=-45 dBm time=14.173 ms
2019-08-09 22:22:58,738 - INFO # 12 bytes from fe80::1114:7221:1327:102a: icmp_seq=0 ttl=64 rssi=-41 dBm time=21.842 ms (DUP!)
2019-08-09 22:22:58,746 - INFO # 12 bytes from fe80::1114:7221:1327:102a: icmp_seq=0 ttl=64 rssi=-41 dBm time=30.122 ms (DUP!)
2019-08-09 22:22:58,754 - INFO # 12 bytes from fe80::1114:7221:1327:102a: icmp_seq=0 ttl=64 rssi=-41 dBm time=38.404 ms (DUP!)
2019-08-09 22:22:59,731 - INFO # 12 bytes from fe80::1114:7221:1327:102a: icmp_seq=1 ttl=64 rssi=-45 dBm time=15.761 ms
2019-08-09 22:22:59,739 - INFO # 12 bytes from fe80::1114:7221:1327:102a: icmp_seq=1 ttl=64 rssi=-41 dBm time=23.434 ms (DUP!)
2019-08-09 22:22:59,747 - INFO # 12 bytes from fe80::1114:7221:1327:102a: icmp_seq=1 ttl=64 rssi=-41 dBm time=31.714 ms (DUP!)
2019-08-09 22:22:59,756 - INFO # 12 bytes from fe80::1114:7221:1327:102a: icmp_seq=1 ttl=64 rssi=-41 dBm time=39.995 ms (DUP!)
2019-08-09 22:23:00,729 - INFO # 12 bytes from fe80::1114:7221:1327:102a: icmp_seq=2 ttl=64 rssi=-45 dBm time=14.172 ms
2019-08-09 22:23:00,730 - INFO # 
2019-08-09 22:23:00,734 - INFO # --- fe80::1114:7221:1327:102a PING statistics ---
2019-08-09 22:23:00,740 - INFO # 3 packets transmitted, 3 packets received, 6 duplicates, 0% packet loss
2019-08-09 22:23:00,745 - INFO # round-trip min/avg/max = 14.172/25.513/39.995 ms
> ping6 fe80::1114:7221:1327:102a
2019-08-09 22:23:03,651 - INFO #  ping6 fe80::1114:7221:1327:102a
2019-08-09 22:23:03,659 - INFO # 12 bytes from fe80::1114:7221:1327:102a: icmp_seq=2 ttl=64 rssi=-40 dBm time=2943.920 ms
2019-08-09 22:23:03,673 - INFO # 12 bytes from fe80::1114:7221:1327:102a: icmp_seq=0 ttl=64 rssi=-46 dBm time=14.483 ms
2019-08-09 22:23:03,681 - INFO # 12 bytes from fe80::1114:7221:1327:102a: icmp_seq=0 ttl=64 rssi=-42 dBm time=22.153 ms (DUP!)
2019-08-09 22:23:03,690 - INFO # 12 bytes from fe80::1114:7221:1327:102a: icmp_seq=0 ttl=64 rssi=-42 dBm time=30.433 ms (DUP!)
2019-08-09 22:23:03,698 - INFO # 12 bytes from fe80::1114:7221:1327:102a: icmp_seq=0 ttl=64 rssi=-42 dBm time=38.713 ms (DUP!)
2019-08-09 22:23:04,673 - INFO # 12 bytes from fe80::1114:7221:1327:102a: icmp_seq=1 ttl=64 rssi=-46 dBm time=14.481 ms
2019-08-09 22:23:04,674 - INFO # 
2019-08-09 22:23:04,678 - INFO # --- fe80::1114:7221:1327:102a PING statistics ---
2019-08-09 22:23:04,685 - INFO # 2 packets transmitted, 3 packets received, 3 duplicates, 2147483598% packet loss
2019-08-09 22:23:04,690 - INFO # round-trip min/avg/max = 14.481/510.697/2943.920 ms

(Notice how ping6 gets confused on the second run.)

Output of the Nucleo (debug in the AT86RF233 driver enabled):

2019-08-09 22:22:21,710 - INFO # 锟絧an0: 35, pan1: 0      
2019-08-09 22:22:21,712 - INFO # set option 128 to 1                                                                                    
2019-08-09 22:22:21,715 - INFO # [at86rf2xx] opt: enable auto ACKs                                                                      
2019-08-09 22:22:21,718 - INFO # set option 16 to 1      
2019-08-09 22:22:21,723 - INFO # [at86rf2xx] opt: enabling CSMA mode(4 retries, min BE: 3 max BE: 5)
2019-08-09 22:22:21,727 - INFO # [at86rf2xx] opt: Set CSMA seed to 0x13 0x14
2019-08-09 22:22:21,730 - INFO # [at86rf2xx] opt: Set CSMA retries to 4                                                 
2019-08-09 22:22:21,734 - INFO # [at86rf2xx] opt: Set min BE=3, max BE=5
2019-08-09 22:22:21,737 - INFO # at86rf2xx_reset(): reset complete.
2019-08-09 22:22:21,738 - INFO # set option 8 to 1     
2019-08-09 22:22:21,740 - INFO # set option 2 to 1       
2019-08-09 22:22:21,746 - INFO # [at86rf2xx] EVT - TX_END                                                                               
2019-08-09 22:22:21,748 - INFO # [at86rf2xx] TX SUCCESS                                                                                 
2019-08-09 22:22:21,751 - INFO # [at86rf2xx] return to idle state 0x16
2019-08-09 22:22:21,754 - INFO # [at86rf2xx] EVT - TX_END
2019-08-09 22:22:21,756 - INFO # [at86rf2xx] TX SUCCESS
2019-08-09 22:22:21,763 - INFO # main(): This is RIOT! (Version: 2019[at86rf2xx] return to idle state 0x16
2019-08-09 22:22:21,765 - INFO # [at86rf2xx] EVT - TX_END
2019-08-09 22:22:21,767 - INFO # [at86rf2xx] TX SUCCESS
2019-08-09 22:22:21,768 - INFO # .
2019-08-09 22:22:21,769 - INFO # 0-devel-369-gb48afd)
2019-08-09 22:22:21,773 - INFO # RIOT network stack example application
2019-08-09 22:22:21,775 - INFO # All up, running the shell now
> 2019-08-09 22:22:29,388 - INFO #  [at86rf2xx] EVT - RX_END
2019-08-09 22:22:29,396 - INFO # [at86rf2xx] LQI:255 high is good, RSSI:-52 high is either good ortoo much interference.
2019-08-09 22:22:31,764 - INFO # [at86rf2xx] return to idle state 0x16
2019-08-09 22:22:31,766 - INFO # [at86rf2xx] EVT - TX_END
2019-08-09 22:22:31,768 - INFO # [at86rf2xx] TX SUCCESS
2019-08-09 22:22:39,388 - INFO # [at86rf2xx] EVT - RX_END
2019-08-09 22:22:39,396 - INFO # [at86rf2xx] LQI:255 high is good, RSSI:-49 high is either good ortoo much interference.
2019-08-09 22:22:41,764 - INFO # [at86rf2xx] return to idle state 0x16
2019-08-09 22:22:41,766 - INFO # [at86rf2xx] EVT - TX_END
2019-08-09 22:22:41,769 - INFO # [at86rf2xx] TX SUCCESS
2019-08-09 22:22:53,388 - INFO # [at86rf2xx] EVT - RX_END
2019-08-09 22:22:53,396 - INFO # [at86rf2xx] LQI:255 high is good, RSSI:-53 high is either good ortoo much interference.
2019-08-09 22:22:55,763 - INFO # [at86rf2xx] return to idle state 0x16
2019-08-09 22:22:55,766 - INFO # [at86rf2xx] EVT - TX_END
2019-08-09 22:22:55,768 - INFO # [at86rf2xx] TX SUCCESS
2019-08-09 22:22:58,712 - INFO # [at86rf2xx] EVT - RX_END
2019-08-09 22:22:58,719 - INFO # [at86rf2xx] LQI:255 high is good, RSSI:-51 high is either good ortoo much interference.
2019-08-09 22:22:58,735 - INFO # [at86rf2xx] return to idle state 0x16
2019-08-09 22:22:58,737 - INFO # [at86rf2xx] EVT - TX_END

The potential "driver issue" was as simple to solve as running the following to commands:

> ifconfig 7 -ack_req
> ifconfig 7 -autoack

on the Nucleo with the AT86RF233.

Sorry I didn't manage to test any of that until now. My result agrees with @maribu: id has become unique per invocation now but the /* not our ping */ check fails to reject it.

It looks like #11938 worked correctly to reject same-node pongs with different ids but then #11933 broke the logic with a typo that only rejects a pong if it has both a wrong id and a wrong address:
https://github.com/RIOT-OS/RIOT/blob/4a723fe4fd7a8f3d8a3376ad9a4bbabf396d11be/sys/shell/commands/sc_gnrc_icmpv6_echo.c#L341

On master if I change && to || I get the expected result that same-node pongs and wrong source address pongs are ignored.

To test this in a controlled setting I captured and injected ping replies with a Linux node connected via a border router to a Riot node using this procedure:
1) Prepare Linux node to capture its own outgoing ping reply:
tcpdump -i eth0 "icmp6 && ip6[40] == 129" -n -c1 -w icmp-reply.pcap
1) Use the border router to block the genuine ping reply from reaching the Riot node:
ip6tables -I FORWARD -o lowpan0 -p icmpv6 --icmpv6-type echo-reply -j DROP
1) Ping Linux node from Riot node using 1 ping and a 60 second timeout:
ping6 2001:470:4bb0:ffff::1 -c 1 -W 60000
1) On Linux node modify the source address of the captured packet:
tcprewrite --infile=icmp-reply.pcap --outfile=icmp-reply2.pcap --pnat=[2001:470:4bb0:ffff::1]:[2001:470:4bb0:ffff::2]
1) On border router unblock ping replies:
ip6tables -D FORWARD -o lowpan0 -p icmpv6 --icmpv6-type echo-reply -j DROP
1) On Linux node send modified ping reply:
tcpreplay --intf1=eth0 icmp-reply2.pcap
1) Observe whether the Riot node prints the reply. It should not.
12 bytes from 2001:470:4bb0:ffff::2 id:0xbe45/0xbe45 icmp_seq=0 ttl=63 rssi=-85 dBm time=13823.374 ms
1) As before, block ping replies at the border router:
ip6tables -I FORWARD -o lowpan0 -p icmpv6 --icmpv6-type echo-reply -j DROP
1) As before, send a ping from the Riot node:
ping6 2001:470:4bb0:ffff::1 -c 1 -W 60000
1) As before, unblock ping replies:
ip6tables -D FORWARD -o lowpan0 -p icmpv6 --icmpv6-type echo-reply -j DROP
1) From the Linux node send the unmodified ping reply captured earlier:
tcpreplay --intf1=eth0 icmp-reply.pcap
1) Observe whether the Riot node prints the reply. It should not.
12 bytes from 2001:470:4bb0:ffff::1 id:0xbe45/0x2e08 icmp_seq=0 ttl=63 rssi=-86 dBm time=52156.860 ms

At 7 and 12 Riot should not print the ping reply. But on master it prints both 7 and 12. After changing && to || on master, 7 and 12 both stop printing.

@maribu @benemorius Thanks for testing, it seems this was some wrongful due diligence on my side when testing #11933, I was also able to reproduce the findings and test the proposed fix. Do you wan't to open the PR yourself @miri64?

Has someone already opened a PR to change the && to ||?

@maribu Im AFK for the next two weeks, do you think you or @benemorius can open the PR and review it?

@benemorius: Mind to open the PR? I'll test and review then.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

jue89 picture jue89  路  5Comments

sinkarharshad picture sinkarharshad  路  7Comments

nmeum picture nmeum  路  5Comments

jia200x picture jia200x  路  5Comments

chrysn picture chrysn  路  5Comments