Some of my end devices (Hue motion sensor/dimmer switch) frequently lose their connection. I did some sniffing, and it seems to be a shepherd (or firmware) issue: some required route requests appear to be missing.
Hue end devices seem to switch their parent frequently. That's fine as long as the coordinator notices the change.
The problem is that shepherd does not seem to handle "lazy" route updates, i.e. cases where the "parent switch" message got lost (due to a weak link or whatever). The coordinator does not seem to handle this correctly.
Here is an example:
Device 0xf0ce (blue) first had 0x35a3 (red) as its parent. This is the parent the coordinator has in its routing table.
Then blue changed its parent to 0xbb5e (green),
but the coordinator (CO) missed this change! (This is still a valid scenario.)
This is the result:

blue sends a request to the CO (destination broadcast) through its parent green (WPAN layer). green forwards it to the CO. The CO responds (destination blue), but sends the packet to the wrong/old parent red!
The old parent red replies with a Source Route Failure, since it no longer "owns" the child.
After receiving a "Source Route Failure", the CO should search for the new parent of blue (using a "Route Request"). But it does not, so all following requests fail. All requests initiated by the CO also fail permanently.
Workaround:
If I restart and replug the CC2531, the first interaction with blue is the expected "Route Request".

The new parent green replies, the CO now knows the correct parent, and the "Read Attribute" request is sent to the correct parent and successfully delivered. The device works again.
My bulbs are still connected to a normal light switch. Sometimes someone switches it off accidentally, so my routers may go offline and the motion sensor is forced to search for a new parent.
Latest GitHub ioBroker adapter (should be the same for zigbee2mqtt)
CC2531_MAX_STABILITY_20190315
(Issues are not enabled in the forked shepherd repository, which is why I posted this here.)
I think this happens because the routes are never expired by the coordinator (https://github.com/Koenkk/Z-Stack-firmware/blob/master/coordinator/firmware.patch#L246). Could you compile a new firmware with #define SRC_RTG_EXPIRY_TIME 255?
I'm currently working on some Z-Stack 3 firmware so I will also take this into account.
I tried to compile, but Workbench gives me the following error after buildAll:
Error[e104]: Failed to fit all segments into specified ranges. Problem discovered in segment XDATA_N. Unable to place 2 block(s) (0x200 byte(s) total) in 0x1aa byte(s) of memory. The problem occurred while processing the segment placement command
"-P(XDATA)XDATA_N=_XDATA_START-_XDATA_END", where at the moment of placement the available memory ranges were "XDATA:1d56-1eff"
Error while running Linker
Any suggestions?
Git patch applied (some whitespace errors)
CC2531-ProdHex
FIRMWARE_CC2531
FIRMWARE_MAX_STABILITY
Can you reduce MAXMEMHEAP by 150?
Seems to help. Thanks.
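To see why reducing MAXMEMHEAP by 150 was enough, the numbers in the e104 message can be checked directly: the free XDATA window 0x1d56 to 0x1eff holds 0x1aa (426) bytes, while 0x200 (512) bytes had to be placed, a shortfall of 86 bytes. The small C sketch below only does that arithmetic; it assumes that every byte taken off MAXMEMHEAP becomes available to the linker in that same XDATA range, which is plausible on the CC2531 but not something the linker message itself confirms.

```c
#include <stdint.h>

/* Size of an inclusive address range [start, end]. */
static uint32_t range_size(uint32_t start, uint32_t end)
{
    return end - start + 1u;
}

/* Bytes missing when `needed` bytes must fit into `available` bytes
 * (0 if they already fit). */
static uint32_t shortfall(uint32_t needed, uint32_t available)
{
    return needed > available ? needed - available : 0u;
}

/* Figures taken from the linker error e104 above. */
uint32_t xdata_shortfall(void)
{
    uint32_t available = range_size(0x1d56u, 0x1effu); /* 0x1aa = 426 bytes */
    uint32_t needed    = 0x200u;                       /* 512 bytes */
    return shortfall(needed, available);               /* 86 bytes */
}
```

Under that assumption, freeing 150 bytes gives 576 bytes of room, so the 512-byte blocks fit with 64 bytes to spare.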
Now I get "znp.js not found" at "Perform Post-Build Action".
Strange, was the .hex file created?
No.
But znp.js exists in ZNP\CC253x\tools.
I hoped you had run into this error before ;-)
Compiling works now.
I moved the Z-Stack..1.2.2.a... folder to the root of drive C and gave it a short name. That way it worked.
It seems Xiaomi end devices show the same behavior. I keep losing some of my sensors on a regular basis. I can reset them so they work for some time, but eventually they lose their route and become unreachable. @allofmex Did the adjusted firmware solve your issue?
I didn't have much time to test yet, but I don't see much change so far :-(
I tried SRC_RTG_EXPIRY_TIME = 255 and 2, and also CONCENTRATOR_ROUTE_CACHE FALSE.
It looks the same as before: the sensor works for some time, then it changes its parent (rejoin request/response), followed by a Device Announcement.
Next comes a Network Address Request, but the coordinator's Network Address Response fails because it uses the wrong/old parent.
What follows is a permanent cycle of Network Address Requests and failed responses. No Route Requests are visible.
It feels to me like SRC_RTG_EXPIRY_TIME just means that the route entry is marked as "expired" and may be garbage-collected if the route count exceeds MAX_RTG_SRC_ENTRIES. But it does not seem to trigger a new route request.
Is there a COMPLETE summary of all the possible compiler options anywhere?
I found "Breaking the 400-Node ZigBee Network Barrier", but that's not the complete list.
I've got my network 99% stable now, but it's more a workaround than a solution :-(
I'm using these firmware settings:
#define MAXMEMHEAP 3229
...
#define CONCENTRATOR_ENABLE TRUE
#define CONCENTRATOR_ROUTE_CACHE FALSE
#define CONCENTRATOR_DISCOVERY_TIME 120
#define MAX_RTG_SRC_ENTRIES 1
#define SRC_RTG_EXPIRY_TIME 2
#undef MAX_RTG_ENTRIES
#define MAX_RTG_ENTRIES 4
#undef ROUTE_EXPIRY_TIME
#define ROUTE_EXPIRY_TIME 2
Summary:
Disadvantage:
Explanation:
There are 2 different routing tables used in firmware
MAX_RTG_ENTRIES
is the standard table, built by the CO broadcasting to all end devices; every router forwards any broadcast to all its siblings, and the shortest route to each device gets selected.
-> Causes a lot of network traffic
MAX_RTG_SRC_ENTRIES
is a second table (source routing): the end device sends a route record message. While it travels through the mesh, every router it passes adds its address to the message. When it arrives at the coordinator, the message contains a valid route.
-> More efficient, only one message needed to get a route.
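The route record mechanism described above can be modelled in a few lines of C. This is a simplified sketch, not Z-Stack code: the route_record_t type, the MAX_RELAYS limit and the function names are hypothetical. Each forwarding router appends its own address, so when the frame reaches the coordinator, the last router that appended is the coordinator's first hop back towards the device.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_RELAYS 8  /* hypothetical limit for this sketch */

/* Hypothetical model of a route record frame on its way to the coordinator. */
typedef struct {
    uint16_t source;             /* device that sent the route record */
    uint8_t  relayCount;
    uint16_t relays[MAX_RELAYS]; /* router addresses, in forwarding order */
} route_record_t;

/* Each router that forwards the frame appends its own short address.
 * Returns 0 on success, -1 if the relay list is already full. */
int route_record_forward(route_record_t *rr, uint16_t routerAddr)
{
    if (rr->relayCount >= MAX_RELAYS) {
        return -1;
    }
    rr->relays[rr->relayCount++] = routerAddr;
    return 0;
}

/* At the coordinator: write the hops in the order the coordinator uses
 * them when sending back to rr->source (nearest router first).
 * Returns the number of hops written. */
size_t route_record_hops_from_coordinator(const route_record_t *rr,
                                          uint16_t *out, size_t outLen)
{
    size_t n = rr->relayCount < outLen ? rr->relayCount : outLen;
    for (size_t i = 0; i < n; i++) {
        out[i] = rr->relays[rr->relayCount - 1 - i];
    }
    return n;
}
```

For example, if 0xf0ce's record passes hypothetical routers 0x1111 and then 0x2222, the coordinator's hop sequence back to 0xf0ce is 0x2222 followed by 0x1111. This models the hop order only, not the exact byte layout of the Zigbee frame.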
As I read in the docs, the coordinator first looks for a route in the MAX_RTG_ENTRIES table, then in MAX_RTG_SRC_ENTRIES. If nothing is found, it does the broadcast route query to find a route for the MAX_RTG_ENTRIES list.
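The lookup order described here can be sketched as a simple decision function. The table layouts and names (rtg_entry_t, src_rtg_entry_t, choose_route) are hypothetical simplifications; the real Z-Stack tables carry more state, and the ordering is taken from the description above rather than from the firmware source.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    ROUTE_VIA_NEXT_HOP,      /* hit in the regular table (MAX_RTG_ENTRIES) */
    ROUTE_VIA_SOURCE_ROUTE,  /* hit in the source route table (MAX_RTG_SRC_ENTRIES) */
    ROUTE_DISCOVERY_NEEDED   /* nothing found: broadcast a route request */
} route_decision_t;

/* Hypothetical, heavily simplified table entries. */
typedef struct { bool valid; uint16_t dst; uint16_t nextHop; } rtg_entry_t;
typedef struct { bool valid; uint16_t dst; } src_rtg_entry_t;

route_decision_t choose_route(const rtg_entry_t *rtg, size_t nRtg,
                              const src_rtg_entry_t *src, size_t nSrc,
                              uint16_t dst)
{
    /* 1. Regular routing table first. */
    for (size_t i = 0; i < nRtg; i++) {
        if (rtg[i].valid && rtg[i].dst == dst) {
            return ROUTE_VIA_NEXT_HOP;
        }
    }
    /* 2. Then the source route table. */
    for (size_t i = 0; i < nSrc; i++) {
        if (src[i].valid && src[i].dst == dst) {
            return ROUTE_VIA_SOURCE_ROUTE;
        }
    }
    /* 3. Otherwise fall back to route discovery. */
    return ROUTE_DISCOVERY_NEEDED;
}
```

Note that in this model a stale but still present source route entry always wins over discovery, which is exactly the situation described in this issue.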
ROUTE_EXPIRY_TIME and SRC_RTG_EXPIRY_TIME mean: mark the route in the list as "can be deleted if space is needed". It does NOT mean the route entry is discarded after this time. It stays in the list and will still be used. It is only replaced if the route is updated, or if the list is full and another entry needs to be added.
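A toy model of these expiry semantics: the aging step only sets an expired flag, lookups keep returning expired entries, and an expired slot is reused only when a new destination needs space and no free slot is left. The route_slot_t type, TABLE_SIZE and the function names are hypothetical; this mirrors the behaviour as described in this thread, not Z-Stack's actual implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 2  /* hypothetical, deliberately tiny */

typedef struct {
    bool     inUse;
    bool     expired;  /* set by aging; does NOT delete the route */
    uint16_t dst;
    uint16_t nextHop;
} route_slot_t;

/* Aging timer fired (e.g. after ROUTE_EXPIRY_TIME): only mark entries. */
void route_age_all(route_slot_t *t, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (t[i].inUse) {
            t[i].expired = true;
        }
    }
}

/* Lookup ignores the expired flag: an expired route is still used. */
const route_slot_t *route_lookup(const route_slot_t *t, size_t n, uint16_t dst)
{
    for (size_t i = 0; i < n; i++) {
        if (t[i].inUse && t[i].dst == dst) {
            return &t[i];
        }
    }
    return NULL;
}

/* Insert or update a route. Slot preference: existing entry for dst,
 * then a free slot, then an expired slot. Returns false when every slot
 * holds a non-expired route for some other destination. */
bool route_store(route_slot_t *t, size_t n, uint16_t dst, uint16_t nextHop)
{
    route_slot_t *target = NULL;
    for (size_t i = 0; i < n && target == NULL; i++) {
        if (t[i].inUse && t[i].dst == dst) target = &t[i];
    }
    for (size_t i = 0; i < n && target == NULL; i++) {
        if (!t[i].inUse) target = &t[i];
    }
    for (size_t i = 0; i < n && target == NULL; i++) {
        if (t[i].inUse && t[i].expired) target = &t[i];
    }
    if (target == NULL) {
        return false;
    }
    target->inUse = true;
    target->expired = false;
    target->dst = dst;
    target->nextHop = nextHop;
    return true;
}
```

In this model a route to 0xf0ce via 0x35a3 keeps being returned after aging, and only disappears once two other destinations have claimed the free slot and then the expired one.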
CONCENTRATOR_ROUTE_CACHE is related to source routing. FALSE just means that the coordinator tells devices "expect that I do not remember route records, so always send your route record along with any (to-be-acknowledged) message".
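The device-side consequence of this flag can be written as a small decision function. The function name is hypothetical, and only the FALSE branch comes from the description above; the TRUE branch (one route record per many-to-one route request) is an assumption about the usual Zigbee PRO concentrator behaviour and is labelled as such in the code.

```c
#include <stdbool.h>

/* Hypothetical helper: should a device send a route record before its
 * next unicast to the concentrator? */
bool needs_route_record(bool concentratorRouteCache,
                        bool recordSentSinceLastMtoRequest)
{
    if (!concentratorRouteCache) {
        /* CONCENTRATOR_ROUTE_CACHE FALSE: the coordinator does not keep
         * route records, so one accompanies every such message. */
        return true;
    }
    /* ASSUMPTION: with a route cache, one record per many-to-one route
     * request is enough until the coordinator asks again. */
    return !recordSentSinceLastMtoRequest;
}
```

The FALSE setting trades extra traffic for always-fresh routes at the coordinator, which is why it partially masks the stale-entry problem described next.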
This is the scenario that causes problems with the hue motion sensor (and maybe others):
… route record message to update the coordinator's routing table.
… route record anymore, and the coordinator keeps using the failing source route entry.
… route record, it will replace the faulty entry and the CO is forced to do the route broadcast. The sensor will "recover" after a short time.
In my opinion the real solution should be:
If coordinator receives 'Source route failure' message, it must discard its entry in MAX_RTG_SRC_ENTRIES table. That way it would broadcast for a new MAX_RTG_ENTRIES route and get a valid new route.
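The proposed fix can be illustrated with a deliberately tiny model of the 0xf0ce example from the start of this issue. Everything here is hypothetical (co_state_t, mesh_t, co_send, and a stub route_discovery that simply returns the real parent), and a route is reduced to "which parent the coordinator sends through". It shows that without invalidation every retry goes to the old parent 0x35a3, while discarding the entry on Source Route Failure makes the next attempt fall back to discovery and reach 0xbb5e.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: what the coordinator believes about one end device. */
typedef struct {
    bool     hasSrcRoute;     /* is there a source route entry at all? */
    uint16_t srcRouteParent;  /* parent the coordinator routes through */
} co_state_t;

/* Hypothetical model: what is actually true in the mesh. */
typedef struct {
    uint16_t device;
    uint16_t actualParent;
} mesh_t;

typedef enum { DELIVERED, SRC_ROUTE_FAILURE } send_result_t;

/* Stub for broadcast route discovery: assume it finds the real parent. */
static uint16_t route_discovery(const mesh_t *m)
{
    return m->actualParent;
}

/* Send one unicast from the coordinator to m->device.
 * discardOnFailure = false models the stock behaviour reported here,
 * discardOnFailure = true models the proposed fix. */
send_result_t co_send(co_state_t *co, const mesh_t *m, bool discardOnFailure)
{
    if (!co->hasSrcRoute) {
        co->srcRouteParent = route_discovery(m);
        co->hasSrcRoute = true;
    }
    if (co->srcRouteParent == m->actualParent) {
        return DELIVERED;
    }
    /* The old parent no longer owns the child and answers with a
     * Source Route Failure. */
    if (discardOnFailure) {
        co->hasSrcRoute = false;  /* forget the stale entry */
    }
    return SRC_ROUTE_FAILURE;
}
```

With discardOnFailure set to false, repeated calls fail forever; with it set to true, the first call fails and the second is delivered via 0xbb5e.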
Does anyone have an idea how we can achieve this behavior?
Then it will be possible to use better settings like
#define CONCENTRATOR_ROUTE_CACHE TRUE
#define MAX_RTG_SRC_ENTRIES 40
#define SRC_RTG_EXPIRY_TIME 2
#define MAX_RTG_ENTRIES 4
#define ROUTE_EXPIRY_TIME 2
@allofmex thanks for the investigation, I'm wondering if such an issue has been addressed in Z-Stack 3 (see #1445)
@allofmex based on your comment, I would propose to disable source routing.
Reasons:
NWK_MAX_DEVICE_LIST to a very low value (5); disabling source routing allows NWK_MAX_DEVICE_LIST to be increased.
EDIT: Perhaps we should also try to increase MAX_RTG_ENTRIES to compensate for the disabled source routing.
What do you think?
@Koenkk
I would propose to disable source routing
Do you know how to disable it completely? MAX_RTG_SRC_ENTRIES 0 does not work (some invalid array size crash).
increase MAX_RTG_ENTRIES to compensate for the disabled source routing
I thought of trying this too, since I (sometimes) experience slow responses from bulbs with my test firmware.
CC2652R
There is no CC253x-comparable hardware with this chip available yet, right? I mean a USB stick version, not a development board.
I've published the new firmwares: https://github.com/Koenkk/Z-Stack-firmware/tree/dev/coordinator/Z-Stack_Home_1.2/bin, could you test if that fixes this problem?
The CC2652R is only available as a usb key now.
Is it working fine with the latest dev firmware?
Seems to work. CC2531_20190425 has been running for a few days without major issues.
Hopefully Z-Stack 3 will work with source routing...
Seems to work. CC2531_20190425 has been running for a few days without major issues.
With this firmware my environment is now stable. Thanks!
The CC2652R is only available as a usb key now.
@Koenkk you meant "_is NOT available as usb..._" right?
I can only find this red TI development board.
@allofmex Typo indeed, it's only available as the board so far.
@allofmex I agree with you.
@Koenkk Could you provide a patch for your firmware modification here?
Looking into a similar issue on our 3.0 fork.
Righto @Koenkk, never mind, I found your patch.
This is what I ended up with. It is currently untested (I'll need to get out the JTAG in the office this week), but in theory it should behave much better. The src route tables will be cleared if an MTO network status is received (hooked in ZDApp.c). Please note that we have previously patched ZDApp.c for other functionality, so you may need to tweak the patch or apply it manually.
3ea6c68c8516f70325f1779981f6e3eeb9d18027.diff.txt
If someone with a JTAG (or a CC-Debugger on applicable devices) could get me the binary value of the srcRoute table when empty, I could verify the osal_memset portion. I've made an educated guess that it's zero-initialized.
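On the zero-initialization guess: in C, an object with static storage duration and no explicit initializer starts out zero-initialized (arithmetic members 0, pointers NULL). If the firmware's source route table is such an object, and the startup code is not told to skip it (for example via a no-init placement), an empty table should match the state the patch's osal_memset to 0 produces, on targets where NULL is all-bits-zero. The sketch below checks this on a host with a hypothetical entry_t layout; it is no substitute for reading the real table over JTAG.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry layout, for illustration only. */
typedef struct {
    uint16_t  dstAddress;
    uint8_t   relayCount;
    uint16_t *relayList;
} entry_t;

/* No initializer and static storage duration: C zero-initializes this
 * before main() runs (integers become 0, pointers become NULL). */
static entry_t table[4];

/* Returns 1 if every entry is in the zero/NULL state, 0 otherwise.
 * Members are compared one by one rather than with memcmp, so padding
 * bytes do not matter. */
int table_is_empty(void)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (table[i].dstAddress != 0 || table[i].relayCount != 0 ||
            table[i].relayList != NULL) {
            return 0;
        }
    }
    return 1;
}
```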
Some debugging happened yesterday and testing today. This is our resulting patch.
diff --git a/Components/stack/zdo/ZDApp.c b/Components/stack/zdo/ZDApp.c
index 20082d4..254b30e 100644
--- a/Components/stack/zdo/ZDApp.c
+++ b/Components/stack/zdo/ZDApp.c
@@ -3342,12 +3342,20 @@ void ZDO_NetworkStatusCB( uint16 nwkDstAddr, uint8 statusCode, uint16 dstAddr )
{
(void)dstAddr; // Remove this line if this parameter is used.
- if (nwkDstAddr == NLME_GetShortAddr()){
+ if (nwkDstAddr == NLME_GetShortAddr() || NLME_IsAddressBroadcast(nwkDstAddr) == ADDR_BCAST_FOR_ME){
MT_ZdoNetworkStatus(statusCode, dstAddr);
- if ( statusCode == NWKSTAT_NONTREE_LINK_FAILURE )
+
+ if ( statusCode == NWKSTAT_MANY_TO_ONE_ROUTE_FAILURE )
{
- // Routing error for dstAddr, this is informational and a Route
- // Request should happen automatically.
+ // MH: Need to confirm this does what we need it to, how is the entry marked as NULL?
+ // I am making a guess here that its the fact that one or more of the fields are NULL.
+ for(unsigned int i = 0; i < MAX_RTG_SRC_ENTRIES; i++){
+ if(rtgSrcTable[i].relayList){
+ osal_mem_free(rtgSrcTable[i].relayList);
+ }
+ }
+ osal_memset(&rtgSrcTable, 0, MAX_RTG_SRC_ENTRIES * sizeof(rtgSrcEntry_t));
+ // It may be better to use RTG_nextHopIsBad ?
}
}
}
diff --git a/Projects/zstack/Tools/CC2538DB/f8wConfig.cfg b/Projects/zstack/Tools/CC2538DB/f8wConfig.cfg
index 236690f..aa1eeaa 100644
--- a/Projects/zstack/Tools/CC2538DB/f8wConfig.cfg
+++ b/Projects/zstack/Tools/CC2538DB/f8wConfig.cfg
@@ -89,9 +89,6 @@
*/
-DLINK_STATUS_JITTER_MASK=0x007F
-/* in seconds; set to 0 to turn off route expiry */
--DROUTE_EXPIRY_TIME=30
-
/* This number is used by polled devices, since the spec'd formula
* doesn't work for sleeping end devices. For non-polled devices,
* a formula is used. Value is in 2 milliseconds periods
@@ -126,11 +123,6 @@
/* The maximum number of groups in the groups table */
-DAPS_MAX_GROUPS=16
-/* Number of entries in the regular routing table plus additional
- * entries for route repair
- */
--DMAX_RTG_ENTRIES=40
-
/* Maximum number of entries in the Binding table. */
-DNWK_MAX_BINDING_ENTRIES=4
@@ -197,4 +189,54 @@
-DCONCENTRATOR_ENABLE
/* Versioning based off GIT commit hash */
--DINCLUDE_REVISION_INFORMATION
\ No newline at end of file
+-DINCLUDE_REVISION_INFORMATION
+
+/****************************************
+ * Routing Control
+ ***************************************/
+
+/*
+* NOTE:
+* Source routing is broken in Z-Stack. Related Issue and interesting read: https://github.com/Koenkk/zigbee2mqtt/issues/1408
+* When a route breaks, it is marked as expired. A route effectively lasts forever. Also MTO errors should clear the route cache.
+* - MH
+*/
+
+/* Number of entries in the regular routing table plus additional
+ * entries for route repair
+ *
+ * This is standard table, created by ZC broadcasting to all end devices, every router forwards any broadcast to all its siblings,
+ * the shortest route to devices gets selected. This Causes a lot of network traffic.
+ */
+-DMAX_RTG_ENTRIES=64
+
+/* This is a second table (source routing), when the enddevice sends a route record message.
+ * While this is traveling through the mesh, all used router add there address to the message.
+ * When it arrives at coordinator, the message contains a valid route.
+ * -> More efficient, only one message needed to get a route.
+ *
+ * Unreliable in standard Z-Stack, and a must for larger networks to prevent overload of RREQ messages.
+ * This bug has been patched, however the table is reduced as this data has a tendency to get stale, and for our purposes increasing RTG entries is sufficient.
+ */
+-DMAX_RTG_SRC_ENTRIES=4
+
+/* Number of missed Link status messages before considering the route to a specific ZR dead.
+ * Default (3) decreased for a more responsive network in case of ZR failure (2x15=30s failure time by default).
+ */
+-DNWK_ROUTE_AGE_LIMIT=2
+
+/* Number of seconds before a route is considered expired within its respective table (standard or source).
+ * This does not remove or invalidate the respective route, instead the slot is available for re-use if MAX_*_ENTRIES is reached.
+ */
+-DSRC_RTG_EXPIRY_TIME=2
+-DROUTE_EXPIRY_TIME=90
+
+/* Number of devices this ZC knows are within radio range and can transmit to directly without performing route discovery.
+ * Default 16 increased for easier Zigbee routing for all devices within range of ZC
+ */
+-DMAX_NEIGHBOR_ENTRIES=32
+
+/* The number of seconds a MTO routing entry will last. Default to not expiring.
+ * Not sure about anything being indefinite, so I set a high limit (10 minutes) - MH
+ */
+-DMTO_ROUTE_EXPIRY_TIME=600
\ No newline at end of file
diff --git a/Projects/zstack/ZNP/Source/znp.cfg b/Projects/zstack/ZNP/Source/znp.cfg
index aec2c79..1c8b49b 100644
--- a/Projects/zstack/ZNP/Source/znp.cfg
+++ b/Projects/zstack/ZNP/Source/znp.cfg
@@ -77,7 +77,6 @@
//-DSRC_RTG_EXPIRY_TIME=255
//-DCONCENTRATOR_ENABLE=TRUE
//-DCONCENTRATOR_DISCOVERY_TIME=60
--DMAX_RTG_SRC_ENTRIES=50
// Define this flag to enable ZNP implementation of the ZCL_KEY_ESTABLISHMENT_ENDPOINT and task.
//-DTC_LINKKEY_JOIN
If anyone has any interesting results please do mention me :)
@splitice
Thank you very much for the patch. I didn't have time to test it yet :-(
for(unsigned int i = 0; i < MAX_RTG_SRC_ENTRIES; i++){
if(rtgSrcTable[i].relayList){
osal_mem_free(rtgSrcTable[i].relayList);
}
}
You are trying to clear the whole source routing table, right?
Is there a way to clear only the entry for the failed device (nwkDstAddr)?
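A per-destination variant of the loop quoted above could look like the sketch below. It is a host-side model, not a drop-in patch: the entry layout is an assumption (only relayList appears in the patch; dstAddress and relayCount are hypothetical names), malloc/free stand in for the firmware's allocator and osal_mem_free, and, like the patch, it treats a NULL relayList as an empty slot. Which callback parameter identifies the failed device would also need checking: in the patch, nwkDstAddr is compared against the coordinator's own short address, which suggests dstAddr is the address the status refers to.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SRC_TABLE_SIZE 4  /* hypothetical; the CC2538 patch uses MAX_RTG_SRC_ENTRIES=4 */

/* Hypothetical entry layout; only relayList is taken from the patch. */
typedef struct {
    uint16_t  dstAddress;
    uint8_t   relayCount;
    uint16_t *relayList;  /* heap allocated; NULL marks an empty slot */
} src_entry_t;

static src_entry_t srcTable[SRC_TABLE_SIZE];

/* Store a source route in the first empty slot. Returns 1 on success. */
int src_table_add(uint16_t dst, const uint16_t *relays, uint8_t count)
{
    for (size_t i = 0; i < SRC_TABLE_SIZE; i++) {
        if (srcTable[i].relayList == NULL) {
            uint16_t *copy = malloc((size_t)count * sizeof *copy);
            if (copy == NULL) {
                return 0;
            }
            memcpy(copy, relays, (size_t)count * sizeof *copy);
            srcTable[i].dstAddress = dst;
            srcTable[i].relayCount = count;
            srcTable[i].relayList = copy;
            return 1;
        }
    }
    return 0;
}

/* Return the entry for dst, or NULL if there is none. */
src_entry_t *src_table_find(uint16_t dst)
{
    for (size_t i = 0; i < SRC_TABLE_SIZE; i++) {
        if (srcTable[i].relayList != NULL && srcTable[i].dstAddress == dst) {
            return &srcTable[i];
        }
    }
    return NULL;
}

/* Drop only the route to failedAddr, leaving all other entries intact.
 * Returns 1 if an entry was removed, 0 if none matched. */
int src_table_invalidate(uint16_t failedAddr)
{
    src_entry_t *e = src_table_find(failedAddr);
    if (e == NULL) {
        return 0;
    }
    free(e->relayList);       /* osal_mem_free() in the firmware */
    memset(e, 0, sizeof *e);  /* back to the empty-slot state */
    return 1;
}
```

Compared with clearing the whole table, this keeps working routes to other devices, so they do not all need a new route record or discovery at once.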
@allofmex Possibly. It's not something we tested.
@splitice Can you share all the modifications you made for the CC2538? I am trying to build an optimized ZNP 3.0.2 firmware.
MT_ZdoNetworkStatus: I could not find any function like that. Did you write your own?
@dzungpv Everything that is suitable for release has been released. I am working for a commercial client at the end of the day.
MT_ZdoNetworkStatus is not required for this patch, it's part of a different feature.
TI provided a possible fix for this: http://e2e.ti.com/support/wireless-connectivity/zigbee-and-thread/f/158/t/883629