We have had this issue for several weeks now, but for some reason no issue has been opened yet (only mentions in related issues). The symptom: the GUA of a node disappears, it is purged from the 6LBR's neighbor cache (as is the GUA of the 6LBR on the same prefix on the 6LN's side, IIRC), and even a new PIO does not re-add the address. Several issues have been found that might be related to (or causing) this:
Test case for this issue: run a border router + 1 node for about 15-20 min and the GUA will disappear on the node.
Just checked: the issue is still present with the simplified BR merged.
True. But as I already noted multiple times offline, this issue stems from the (non-existent) design of NDP, so fixing it with a hack for this release (if someone finds one) might be a quick fix, but not a permanent solution.
Disabling deregistration might be one ... (why didn't I think of it before?).
Just for further reference, this is what the neighbor cache on the remote node looked like before losing the entry (updated roughly every second), while in parallel pinging that node from a Linux host:
2016-04-15 10:41:42,419 - INFO # > ncache
2016-04-15 10:41:42,421 - INFO # IPv6 address if L2 address state type
2016-04-15 10:41:42,423 - INFO # ------------------------------------------------------------------------------
2016-04-15 10:41:42,424 - INFO # 2001:affe::585a:615d:a451:cbd6 7 5a:5a:61:5d:a4:51:cb:d6 REACHABLE REG
2016-04-15 10:41:45,694 - INFO # > ncache
2016-04-15 10:41:45,696 - INFO # IPv6 address if L2 address state type
2016-04-15 10:41:45,698 - INFO # ------------------------------------------------------------------------------
2016-04-15 10:41:45,700 - INFO # 2001:affe::585a:615d:a451:cbd6 7 5a:5a:61:5d:a4:51:cb:d6 DELAY REG
2016-04-15 10:41:46,486 - INFO # > ncache
2016-04-15 10:41:46,488 - INFO # IPv6 address if L2 address state type
2016-04-15 10:41:46,490 - INFO # ------------------------------------------------------------------------------
2016-04-15 10:41:46,492 - INFO # 2001:affe::585a:615d:a451:cbd6 7 5a:5a:61:5d:a4:51:cb:d6 DELAY REG
2016-04-15 10:41:46,910 - INFO # > ncache
2016-04-15 10:41:46,912 - INFO # IPv6 address if L2 address state type
2016-04-15 10:41:46,914 - INFO # ------------------------------------------------------------------------------
2016-04-15 10:41:46,916 - INFO # 2001:affe::585a:615d:a451:cbd6 7 5a:5a:61:5d:a4:51:cb:d6 DELAY REG
2016-04-15 10:41:47,586 - INFO # > ncache
2016-04-15 10:41:47,588 - INFO # IPv6 address if L2 address state type
2016-04-15 10:41:47,590 - INFO # ------------------------------------------------------------------------------
2016-04-15 10:41:47,591 - INFO # 2001:affe::585a:615d:a451:cbd6 7 5a:5a:61:5d:a4:51:cb:d6 DELAY REG
2016-04-15 10:41:49,530 - INFO # > ncache
2016-04-15 10:41:49,531 - INFO # IPv6 address if L2 address state type
2016-04-15 10:41:49,533 - INFO # ------------------------------------------------------------------------------
2016-04-15 10:41:49,535 - INFO # 2001:affe::585a:615d:a451:cbd6 7 5a:5a:61:5d:a4:51:cb:d6 PROBE REG
2016-04-15 10:41:50,067 - INFO # > ncache
2016-04-15 10:41:50,069 - INFO # IPv6 address if L2 address state type
2016-04-15 10:41:50,070 - INFO # ------------------------------------------------------------------------------
2016-04-15 10:41:50,072 - INFO # 2001:affe::585a:615d:a451:cbd6 7 5a:5a:61:5d:a4:51:cb:d6 PROBE REG
2016-04-15 10:41:50,690 - INFO # > ncache
2016-04-15 10:41:50,692 - INFO # IPv6 address if L2 address state type
2016-04-15 10:41:50,693 - INFO # ------------------------------------------------------------------------------
2016-04-15 10:41:50,695 - INFO # 2001:affe::585a:615d:a451:cbd6 7 5a:5a:61:5d:a4:51:cb:d6 PROBE REG
2016-04-15 10:41:51,149 - INFO # > ncache
2016-04-15 10:41:51,151 - INFO # IPv6 address if L2 address state type
2016-04-15 10:41:51,153 - INFO # ------------------------------------------------------------------------------
2016-04-15 10:41:51,154 - INFO # 2001:affe::585a:615d:a451:cbd6 7 5a:5a:61:5d:a4:51:cb:d6 PROBE REG
2016-04-15 10:41:51,597 - INFO # > ncache
2016-04-15 10:41:51,598 - INFO # IPv6 address if L2 address state type
2016-04-15 10:41:51,600 - INFO # ------------------------------------------------------------------------------
2016-04-15 10:41:51,602 - INFO # 2001:affe::585a:615d:a451:cbd6 7 5a:5a:61:5d:a4:51:cb:d6 PROBE REG
2016-04-15 10:41:52,003 - INFO # > ncache
2016-04-15 10:41:52,004 - INFO # IPv6 address if L2 address state type
2016-04-15 10:41:52,006 - INFO # ------------------------------------------------------------------------------
2016-04-15 10:41:52,008 - INFO # 2001:affe::585a:615d:a451:cbd6 7 5a:5a:61:5d:a4:51:cb:d6 PROBE REG
2016-04-15 10:41:52,458 - INFO # > ncache
2016-04-15 10:41:52,459 - INFO # IPv6 address if L2 address state type
2016-04-15 10:41:52,461 - INFO # ------------------------------------------------------------------------------
2016-04-15 10:41:52,957 - INFO # > ncache
2016-04-15 10:41:52,958 - INFO # IPv6 address if L2 address state type
2016-04-15 10:41:52,960 - INFO # ------------------------------------------------------------------------------
2016-04-15 10:41:53,625 - INFO # > ncache
2016-04-15 10:41:53,627 - INFO # IPv6 address if L2 address state type
2016-04-15 10:41:53,628 - INFO # ------------------------------------------------------------------------------
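For readers who don't have the NUD states in their head: the REACHABLE, DELAY, PROBE, gone sequence in the log matches what the plain RFC 4861 neighbor unreachability detection state machine does when probes go unanswered. Below is a minimal sketch of that state machine. It is not GNRC's code; all type and function names (`nud_entry_t`, `nud_step`, the `EV_*` events) are made up for illustration, and it ignores the 6Lo-ND registration type (REG) shown in the log.

```c
/* Neighbor cache states relevant here (RFC 4861, section 7.3.2),
 * plus a marker for "entry was deleted". Hypothetical names. */
typedef enum {
    NUD_REACHABLE,
    NUD_STALE,
    NUD_DELAY,
    NUD_PROBE,
    NUD_REMOVED
} nud_state_t;

/* Events that drive the state machine in this sketch. */
typedef enum {
    EV_REACHABLE_TIMEOUT,   /* reachable time expired */
    EV_PACKET_SENT,         /* traffic sent to a STALE neighbor */
    EV_DELAY_TIMEOUT,       /* delay before the first probe expired */
    EV_PROBE_TIMEOUT,       /* retransmit timer fired, no NA received */
    EV_CONFIRMATION         /* solicited NA or upper-layer confirmation */
} nud_event_t;

/* Number of unicast NS sent before giving up (RFC 4861, section 10). */
#define MAX_UNICAST_SOLICIT 3

typedef struct {
    nud_state_t state;
    unsigned probes_sent;
} nud_entry_t;

void nud_step(nud_entry_t *e, nud_event_t ev)
{
    if (e->state == NUD_REMOVED) {
        return;                     /* nothing left to update */
    }
    if (ev == EV_CONFIRMATION) {    /* reachability confirmed */
        e->state = NUD_REACHABLE;
        e->probes_sent = 0;
        return;
    }
    switch (e->state) {
    case NUD_REACHABLE:
        if (ev == EV_REACHABLE_TIMEOUT) {
            e->state = NUD_STALE;
        }
        break;
    case NUD_STALE:
        if (ev == EV_PACKET_SENT) {
            e->state = NUD_DELAY;
        }
        break;
    case NUD_DELAY:
        if (ev == EV_DELAY_TIMEOUT) {
            e->state = NUD_PROBE;
            e->probes_sent = 1;     /* first NS goes out on entering PROBE */
        }
        break;
    case NUD_PROBE:
        if (ev == EV_PROBE_TIMEOUT) {
            if (e->probes_sent >= MAX_UNICAST_SOLICIT) {
                e->state = NUD_REMOVED;   /* all probes unanswered */
            }
            else {
                e->probes_sent++;         /* retransmit the NS */
            }
        }
        break;
    default:
        break;
    }
}
```

With no confirmation arriving, three probe timeouts in PROBE delete the entry, which is roughly the few seconds of PROBE visible in the log before the table goes empty.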
Has anyone tried whether compiling the examples with RPL (initialized and started) would "solve" the problem? @cgundogan maybe?
Hmm, I can't think of anything in our RPL implementation that interacts with NDP (yet). So I guess trying with RPL shouldn't have an impact on this issue?
I thought with RPL the GUA should go into the FIB and no longer use the neighbor cache.
Why wouldn't it go into the neighbor cache? RPL should only replace router discovery in the long term but not the address resolution of NDP IMHO.
Yes, but if you have a FIB entry and perform 6lo L2 address resolution, the node just won't consult the neighbor cache.
How so? 6Lo-ND needs to resolve GUAs via the neighbor cache. But of course, ideally, merging NIB and FIB at some point in the future would remove the need for an NC lookup, since the L2 information would also be contained in the NIB entry.
But the node would first try to consult the FIB, which would return the link-local address as a default route, which in turn can be resolved without the neighbor cache.
Ah, okay. This way we would circumvent the NC of course. Carry on :-)
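To spell out why a link-local next hop can be resolved without the NC: when the interface identifier was derived from an EUI-64, the L2 address can be recovered from the last 8 bytes of the IPv6 address by flipping the universal/local bit again. A minimal sketch follows, assuming 64-bit (long) L2 addresses only. The function name `l2addr_from_ipv6_iid` is hypothetical and not a GNRC API.

```c
#include <stdint.h>
#include <string.h>

#define IID_LEN 8   /* interface identifier / EUI-64 length in bytes */

/* Recover an EUI-64 L2 address from the interface identifier of an
 * IPv6 address. Hypothetical helper; assumes the IID was formed from
 * the EUI-64 by flipping the universal/local bit (RFC 4291, appendix A). */
void l2addr_from_ipv6_iid(const uint8_t ipv6_addr[16], uint8_t l2addr[IID_LEN])
{
    /* the IID is the lower 64 bits of the address */
    memcpy(l2addr, &ipv6_addr[8], IID_LEN);
    /* undo the universal/local bit flip */
    l2addr[0] ^= 0x02;
}
```

Applied to the address from the log above, an IID of `585a:615d:a451:cbd6` maps back to the L2 address `5a:5a:61:5d:a4:51:cb:d6`, which is what the ncache output shows for that neighbor.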
Partly solved in #5309. I'll mark it for the next release.
To make it clearer: it doesn't really solve it, it just removes the symptom of removal by not allowing automatic removal at all ;-)
Is there any proposal on how to solve the issue? It's a big issue for anyone who wants to use a border router setup.
Yeah.
Did you guys:
?
The current quick fix in master and the release (not removing anything automatically from the neighbor cache/address registry [in 6Lo-ND the former doubles as the latter]) should be sufficient for that setup (if your nodes have more than 8 neighbors you need to resize the neighbor cache in any case). For the next release I'm working on a redesign of the whole ND mechanism so that timed operations on the data structures for next hop determination and address resolution are easier to predict. The first step in that is the unified NIB proposed in https://github.com/RIOT-OS/RIOT/pull/5293
Is there any news on this?
Postponed. Hoping this one will be addressed in the new design of NDP.
I suppose we could move this one again?
Move to where? It's a known issue. Please don't move milestones for bugs as long as the release notes are not in ;-).
Ok I was just asking ;-)
Can someone test whether the following quick fix solves the issue (until I'm done with the new implementation):
diff --git a/sys/net/gnrc/network_layer/ndp/gnrc_ndp.c b/sys/net/gnrc/network_layer/ndp/gnrc_ndp.c
index 9e097b9..858fbb5 100644
--- a/sys/net/gnrc/network_layer/ndp/gnrc_ndp.c
+++ b/sys/net/gnrc/network_layer/ndp/gnrc_ndp.c
@@ -54,23 +54,6 @@ static void _stale_nc(kernel_pid_t iface, ipv6_addr_t *ipaddr, uint8_t *l2addr,
     if (l2addr_len != -ENOTSUP) {
         gnrc_ipv6_nc_t *nc_entry = gnrc_ipv6_nc_get(iface, ipaddr);
         if (nc_entry == NULL) {
-#ifdef MODULE_GNRC_SIXLOWPAN_ND_ROUTER
-            /* tentative type see https://tools.ietf.org/html/rfc6775#section-6.3 */
-            gnrc_ipv6_netif_t *ipv6_iface = gnrc_ipv6_netif_get(iface);
-            if ((ipv6_iface->flags & GNRC_IPV6_NETIF_FLAGS_SIXLOWPAN) &&
-                (ipv6_iface->flags & GNRC_IPV6_NETIF_FLAGS_ROUTER)) {
-                if ((nc_entry = gnrc_ipv6_nc_add(iface, ipaddr, l2addr,
-                                                 (uint16_t)l2addr_len,
-                                                 GNRC_IPV6_NC_STATE_STALE |
-                                                 GNRC_IPV6_NC_TYPE_TENTATIVE)) != NULL) {
-                    xtimer_set_msg(&nc_entry->type_timeout,
-                                   (GNRC_SIXLOWPAN_ND_TENTATIVE_NCE_LIFETIME * SEC_IN_USEC),
-                                   &nc_entry->type_timeout_msg,
-                                   gnrc_ipv6_pid);
-                }
-                return;
-            }
-#endif
             gnrc_ipv6_nc_add(iface, ipaddr, l2addr, (uint16_t)l2addr_len,
                              GNRC_IPV6_NC_STATE_STALE);
         }
Explanation: The removed code is only supposed to be executed for incoming router advertisements (which isn't checked). Since the 6LBR usually doesn't receive any, this would explain the behavior. The RFC states that this behavior is a MAY, not a MUST (and we don't have multihop DAD anyway, which is the only way to get out of tentative mode), so removing it is the quickest solution.
(@smlng e.g.)
Fixed by #7925