Espeasy: [OpenHAB MQTT] Crash due to yield panic

Created on 3 Aug 2018  ·  98Comments  ·  Source: letscontrolit/ESPEasy

As reported here: https://github.com/letscontrolit/ESPEasy/issues/1574#issuecomment-410065300

Suggested fix:
Make the callback function just register handling of the MQTT as a new job in the scheduler.
This will result in all calls made during processing of the MQTT message being called from the loop() and thus be allowed to call yield() or delay()

It's due to a panic reboot. Happened after a MQTT publish to my Openhab home automation system. Here's the log:

11514348 : NEXTION075 : code: |s,i16,sOff
11514353 : EVENT: NEXTION#idx=16.00
11514394 : ACT  : Publish /ESPEZ_Laundry/NEXTION/idx,16
11514408 : Command: publish

Panic C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\
hardware\esp8266\2.4.1\cores\esp8266\core_esp8266_main.cpp:99 __yield

ctx: cont 
sp: 3fff3e00 end: 3fff4750 offset: 01b0

>>>stack>>>
3fff3fb0:  00000001 00000001 3fff40f4 3fff2419  
3fff3fc0:  00000001 00000001 3fff899c 4025fab8  
3fff3fd0:  00000001 00000001 3fff899c 40241651  
3fff3fe0:  00000c01 00000000 00000000 40100200  
3fff3ff0:  00000000 00000010 00000000 00000000  
3fff4000:  00000000 00000000 3fff8b14 0000000f  
3fff4010:  00000000 3fff8b2c 0000000f 00000000  
3fff4020:  3fff8b44 0000000f 00000000 3fff8b5c  
3fff4030:  0000000f 00000000 3fff8b74 0000000f  
3fff4040:  00000000 00000000 6c627550 00687369  
3fff4050:  00000046 3fff4110 0000000d 402173d7  
3fff4060:  00000600 4c5f1800 00000000 4e2f7901  
3fff4070:  00000000 00000010 00000000 00000000  
3fff4080:  00000000 00000000 00000000 00000000  
3fff4090:  00000000 00000000 50003600 4c5f5a45  
3fff40a0:  646e7561 4e2f7972 49545845 692f4e4f  
3fff40b0:  00007864 00000000 00000000 00000000  
3fff40c0:  00000000 00000000 0000032d 4025ea7b  
3fff40d0:  312c7864 00000036 3fff41dc 4025eaad  
3fff40e0:  3fff41dc 00000000 3fff26ae 00000006  
3fff40f0:  00000001 3fff8afc 0000000f 00000000  
3fff4100:  00000001 00000001 00000001 40241dd5  
3fff4110:  00000c00 00000000 00000000 3fff5300  
3fff4120:  00000000 00000010 00000000 00000000  
3fff4130:  00000000 00000000 3fff8a34 0000000f  
3fff4140:  00000000 3fff8a4c 0000000f 00000000  
3fff4150:  3fff8a7c 0000000f 00000000 3fff8a94  
3fff4160:  0000000f 00000000 3fff8aac 0000000f  
3fff4170:  00000000 00000000 4e5b2c78 49545845  
3fff4180:  69234e4f 0d5d7864 6520200a 6669646e  
3fff4190:  20200a0d 5b206669 5458454e 234e4f49  
3fff41a0:  5d786469 30353d3e 2f202030 6f54202f  
3fff41b0:  20686375 6e657645 40107b10 00000000  
3fff41c0:  000003e8 00000000 3fff87fc 3fff883c  
3fff41d0:  00000000 00000000 00000000 3fff8ac4  
3fff41e0:  0000002f 00000025 3fff89d4 0000002f  
3fff41f0:  00000025 3fff899c 0000002f 00000025  
3fff4200:  3fff87e4 0000000f 00000000 3fff8964  
3fff4210:  0000002f 00000025 3fff891c 0000003f  
3fff4220:  00000025 3fff8a0c 0000001f 0000001b  
3fff4230:  00000001 00000001 00000001 00000001  
3fff4240:  3fff42d0 00000010 3fff4280 00000040  
3fff4250:  00000000 3fff2419 3fff428c 4025ebb0  
3fff4260:  3fff0031 3fff2419 00000001 00afb1f1  
3fff4270:  3fff42d0 3fff2419 00000001 40241f93  
3fff4280:  00000000 00000000 00000000 3fff87cc  
3fff4290:  0000000f 0000000a 3fff879c 0000000f  
3fff42a0:  0000000a 00000030 3fff42d0 4025f058  
3fff42b0:  3fff259e 00000000 00000000 00000005  
3fff42c0:  00000010 00000004 00000000 40242b55  
3fff42d0:  3fff8774 0000001f 00000011 4010020c  
3fff42e0:  3fff23c4 3fff0e58 3fff4310 40100690  
3fff42f0:  00000000 00000000 00000000 00000011  
3fff4300:  3fff1b68 3fff26a8 3fff4610 40244435  
3fff4310:  00000000 00000000 00000000 00000011  
3fff4320:  3fff1b68 3fff0e58 3fff4610 40264db4  
3fff4330:  ae4f0759 3fff0e58 3fff4560 40262844  
3fff4340:  3fff0030 3fff0e58 3fff45a8 00000011  
3fff4350:  3fff1b68 3fff0e58 3fff4610 4024a0c7  
3fff4360:  692c737c 732c3631 0066664f 0083126e  
3fff4370:  3fff5ccc 3fff4670 3fff4478 00000030  
3fff4380:  4021515a 3fff4670 3fff64b4 3fff645c  
3fff4390:  0000072f 0000002f 00000000 00000089  
3fff43a0:  00000000 00000000 0000001f 40104ec9  
3fff43b0:  4000050c 3fff1b00 3fff4470 00000030  
3fff43c0:  40000f83 00000030 00000000 ffffffff  
3fff43d0:  40285942 00000023 3fffbff8 00000e71  
3fff43e0:  000006b0 00000e70 00000001 3fff8a78  
3fff43f0:  000000c2 3fff64dc 00000001 3fff4b5c  
3fff4400:  3fff3640 00000000 00007388 00000030  
3fff4410:  40103e13 3ffef740 50800000 4000050c  
3fff4420:  3fffc278 40101f8c 3fffc200 00000022  
3fff4430:  3fff4440 00000023 3fff84fc 40214afa  
3fff4440:  ffffe660 00000020 00000000 4010053d  
3fff4450:  401004ba 00000023 00000003 3fff4490  
3fff4460:  3fff1b00 3fff4468 3fff4670 40214bbc  
3fff4470:  3fff4470 3fff4470 3fff4478 3fff4478  
3fff4480:  3fff4480 3fff4480 3fff4488 3fff4488  
3fff4490:  3fff4490 3fff4490 3fff4498 3fff4498  
3fff44a0:  3fff44a0 3fff44a0 3fff44a8 3fff44a8  
3fff44b0:  3fff44b0 3fff44b0 3fff44b8 401071ee  
3fff44c0:  3fff44c0 3fff44c0 3fff4610 40210069  
3fff44d0:  3fff44d0 3fff44d0 3fff44d8 40244d60  
3fff44e0:  3fff44e0 3fff44e0 3fff44e8 3fff44e8  
3fff44f0:  3fff44f0 3fff44f0 3fff44f8 3fff44f8  
3fff4500:  00000000 000006c0 000006c0 4010020c  
3fff4510:  3fff3640 000006c0 000006c0 00000046  
3fff4520:  3fff3648 00000000 3fff16d0 40285968  
3fff4530:  00000014 000035d8 000006bb 40207b90  
3fff4540:  00000010 00000010 3fff178c 4028f34f  
3fff4550:  4025d1f4 3fff4610 3fff178c 4025ebb0  
3fff4560:  00000000 00000000 00000000 4020b47c  
3fff4570:  3fff3640 000006bb 000006bb 4010020c  
3fff4580:  00000004 3fff4610 3fff45b0 40100690  
3fff4590:  3fff875c 0000000f 0000000b 3fff8744  
3fff45a0:  0000000f 00000000 3fff8r‚)ŠLþ
ôŒ
JE0¸
†´ˆˆ9!9¯„ù„!ÁÎì)ÿ18Áù526 : INIT : Free RAM:21328
527 : INIT : I2C
527 : INIT : SPI not enabled
541 : NEXTION075 : serial pin config RX:13, TX:15, Plugin Enabled
543 : NEXTION075 : Using software serial
599 : INFO : Plugins: 71 [Normal] [Testing] (ESP82xx Core 2_4_1, NONOS SDK 2.2.1(cfd48f3), LWIP: 2.0.3)
696 : EVENT: System#Wake
745 : WIFI : Set WiFi to STA
Controller Stabiliy In Progress Bug

Most helpful comment

@TD-er: The test system is still running OK. Next I will turn on Serial log to see if it contributes to reboots or memory leaks.

I did an experiment with serial log on my second test device. It found evidence that the ESP8266's serial Tx buffer is being over-filled, which is something that can cause code blocking. My research (google) found that this is a situation that can cause a ESP8266 crash.

The experiment was simple. I added a Serial.availableForWrite() to the addToLog() function (in misc.ino). It checks to see how much room is left in the Tx buffer. If space is insufficient it trims the log string so that it can fit in the buffer (without causing a "full" condition).

This is the test code (EDIT: updated):

void addToLog(byte logLevel, const char *line)
{
  if (loglevelActiveFor(LOG_TO_SERIAL, logLevel)) {
    Serial.print(millis());
    Serial.print(F(" : "));
    if(strlen(line)+20 >  (uint16_t)(Serial.availableForWrite())) {
        uint16_t snip = (uint16_t)(Serial.availableForWrite());
        if(snip <22) snip = 22;
        String linestr = line;
        Serial.print(linestr.substring(0,snip-22));
        Serial.println(F(",snip!"));             // Sorry, but no room for your entire log message.
    }
    else {
        Serial.print(F("["));
        Serial.print(String(Serial.availableForWrite()));
        Serial.print(F("], "));
        Serial.println(line);
    }
  }
  if (loglevelActiveFor(LOG_TO_SYSLOG, logLevel)) {
    syslog(logLevel, line);
  }
  if (loglevelActiveFor(LOG_TO_WEBLOG, logLevel)) {
    Logging.add(logLevel, line);
  }

#ifdef FEATURE_SD
  if (loglevelActiveFor(LOG_TO_SDCARD, logLevel)) {
    File logFile = SD.open("log.dat", FILE_WRITE);
    if (logFile)
      logFile.println(line);
    logFile.close();
  }
#endif
}

I'm seeing periodic log entries that are being trimmed. So that means the native log message would have filled the buffer and caused a temporary block.

The log messages from the plugins are not a problem. The offending messages are from the system (PluginStats, Plugin call, etc.). Here's some examples, where trimmed messages end with _snip!_

20834 : [120], NEXTION075 : Cmd Statement Line-1 Sent: page0.j1.val=45
20856 : [120], SYS  : -79.00
20860 : [120], EVENT: RSSI#signal=-79.00
20880 : [120], EVENT: RSSI#signal=-79.00 Processing time:19 milliSeconds
20958 : [120], MQTT : /ESPEZ_Laundry/RSSI/signal -79
21094 : [120], SYS  : 15040.00
21096 : [116], EVENT: FreeRam#ram=15040.00
21117 : [120], EVENT: FreeRam#ram=15040.00 Processing time:21 milliSeconds
25449 : Current Time Zone:  DST time start: 2018-03-25 02:00:00 offset: -420 minSTD time start: 2018-10-28 02:00:00 offset: -480 min,snip!
25451 : EVENT: Time#Initialized,snip!
25473 : [120], EVENT: Time#Initialized Processing time:23 milliSeconds
25478 : [99], EVENT: Clock#Time=Wed,13:55
25498 : [120], EVENT: Clock#Time=Wed,13:55 Processing time:20 milliSeconds
26040 : [120], SYS  : 15152.00
26043 : [116], EVENT: FreeRam#ram=15152.00
26063 : [120], EVENT: FreeRam#ram=15152.00 Processing time:19 milliSeconds
30668 : [120], SW   : State 0.00
30670 : [114], EVENT: WASHER#ac=0.00
30678 : [120], [if 0=1]=false
30681 : [120], else = true
30682 : [109], ACT  : NEXTION1,page0.va_WasherAC.val=0
30714 : [120], NEXTION075 : WRITE, Command is page0.va_WasherAC.val=0
30732 : [120], EVENT: WASHER#ac=0.00 Processing time:62 milliSeconds
30793 : [120], MQTT : /ESPEZ_Laundry/WASHER/ac 0
31095 : [120], SYS  : 15040.00
31097 : [116], EVENT: FreeRam#ram=15040.00
31117 : [120], EVENT: FreeRam#ram=15040.00 Processing time:20 milliSeconds
31971 : [120], WD   : Uptime 1 ConnectFailures 0 FreeMem 15320
32863 : LoopStats: shortestLoop: 36 longestLoop: 1757031 avgLoopDuration: 54.40 loopCounterMax: 833333 loopCounterLast: 527198 countFindPluginId: 0,snip!
32865 : PluginStats P_1_Switch input - Switch READ                 Count: 3 Avg/min/max 592.67/591/594 usec,snip!
32875 : PluginStats P_1_Switch input - Switch ONCE_A_SECOND        Count: 30 Avg/min/max 22.80/15/66 usec,snip!
32885 : PluginStats P_1_Switch input - Switch TEN_PER_SECOND       Count: 270 Avg/min/max 42.77/32/68 usec,snip!
32895 : PluginStats P_1_Switch input - Switch WRITE                Count: 3 Avg/min/max 515.00/514/517 usec,snip!
32905 : PluginStats P_1_Switch input - Switch EVENT_OUT            Count: 10 Avg/min/max 45.50/35/51 usec,snip!
32915 : PluginStats P_1_Switch input - Switch FIFTY_PER_SECOND     Count: 1319 Avg/min/max 19.82/14/45 usec,snip!
32925 : PluginStats P_26_Generic - System Info READ                 Count: 7 Avg/min/max 698.14/552/748 usec,snip!
32935 : PluginStats P_26_Generic - System Info ONCE_A_SECOND        Count: 90 Avg/min/max 7.30/3/23 usec,snip!
32944 : PluginStats P_26_Generic - System Info TEN_PER_SECOND       Count: 810 Avg/min/max 7.13/3/23 usec,snip!
32954 : PluginStats P_26_Generic - System Info EVENT_OUT            Count: 30 Avg/min/max 7.43/2/19 usec,snip!
32964 : PluginStats P_26_Generic - System Info FIFTY_PER_SECOND     Count: 3957 Avg/min/max 6.05/2/23 usec,snip!
32974 : PluginStats P_71_Display - Nextion [TESTING] READ                 Count: 1 Avg/min/max 21051.00/21051/21051 usec,snip!
32985 : PluginStats P_71_Display - Nextion [TESTING] ONCE_A_SECOND        Count: 30 Avg/min/max 8.60/4/23 usec,snip!
32995 : PluginStats P_71_Display - Nextion [TESTING] TEN_PER_SECOND       Count: 270 Avg/min/max 88.16/78/124 usec,snip!
33006 : PluginStats P_71_Display - Nextion [TESTING] WRITE                Count: 3 Avg/min/max 29707.33/29694/29719 usec,snip!
33017 : PluginStats P_71_Display - Nextion [TESTING] EVENT_OUT            Count: 10 Avg/min/max 27.20/26/31 usec,snip!
33027 : PluginStats P_71_Display - Nextion [TESTING] FIFTY_PER_SECOND     Count: 1319 Avg/min/max 4.45/2/31 usec,snip!
33038 : Load File stats: Count: 20 Avg/min/max 22521.10/854/56695 usec,snip!
33044 : Plugin call 50 p/s   stats: Count: 1319 Avg/min/max 1864.93/1486/3664 usec,snip!
33052 : Plugin call 10 p/s   stats: Count: 270 Avg/min/max 1946.29/1615/2321 usec,snip!
33060 : Plugin call 10 p/s U stats: Count: 270 Avg/min/max 3395.21/2899/4695 usec,snip!
33068 : Plugin call  1 p/s   stats: Count: 30 Avg/min/max 64210.20/2101/1756522 usec,snip!
33076 : SensorSendTask()     stats: Count: 11 Avg/min/max 78003.37/24929/131677 usec,snip!
33084 : sendData()           stats: Count: 10 Avg/min/max 63515.10/24490/127186 usec,snip!
33092 : Compute formula      stats: Count: 10 Avg/min/max 139.00/10/1295 usec,snip!
33099 : setNewTimerAt()      stats: Count: 1759 Avg/min/max 187.72/158/283 usec,snip!
33106 : timeDiff()           stats: Count: 1666387 - CPU cycles per call: 8.44,snip!
33114 : Scheduler stats: (called/tasks/max_length/idle%) 527198/1751/12/74.60,snip!
36040 : [120], SYS  : 15152.00
36042 : [116], EVENT: FreeRam#ram=15152.00
36064 : [120], EVENT: FreeRam#ram=15152.00 Processing time:22 milliSeconds
40668 : [120], SW   : State 0.00
40671 : [114], EVENT: WASHER#ac=0.00
40679 : [120], [if 0=1]=false
40681 : [120], else = true
40683 : [109], ACT  : NEXTION1,page0.va_WasherAC.val=0
40717 : [120], NEXTION075 : WRITE, Command is page0.va_WasherAC.val=0
40734 : [120], EVENT: WASHER#ac=0.00 Processing time:63 milliSeconds
40795 : [120], MQTT : /ESPEZ_Laundry/WASHER/ac 0
40834 : [120], NEXTION075 : Cmd Statement Line-1 Sent: page0.j1.val=45
40857 : [120], SYS  : -77.00
40860 : [120], EVENT: RSSI#signal=-77.00
40882 : [120], EVENT: RSSI#signal=-77.00 Processing time:21 milliSeconds

  • Thomas

All 98 comments

Maybe you could test this PR: https://github.com/letscontrolit/ESPEasy/pull/1627

I just tested it will compile and did not crash my ESP32 node when pressing the test notification.
But since it is already close to 2am here, I see an opportunity for those still in day-light side of the globe ;)

I loaded PR 1627 into two nodeMCU boards. Identical configurations, except only one of them will do MQTT publish.

I'll run them overnight and check their uptime in the morning. If uptime is wrong then we will know that there was at least one reboot.

  • Thomas

13 hours later ...
The NodeMCU board that didn't do MQTT is still OK. No resets (uptime correct).

The other board was reset several times. Resets occurred about every 2-4 hours. I'm now running with serial log, Debug Dev. Maybe there will something in the expanded log that helps ID this.

  • Thomas

It's important to note that although both boards have identical software configurations, the one that is working well is missing the the MPU6050 sensor I use to detect machine movement. So later today I will disable the MPU6050 and see if that stops the resets.

  • Thomas

I captured two panics in the log. Maybe there's a bread crumb in there that helps to ID what is going on. See entries 11575248 and 899358.
panic2.log

Is there a possibility the broker is busy or for some other reason not answering?

That is a good question. Openhab is running correctly. It's installed on a RaspberryPi Model 3+.

I studied the log messages and they seem to suggest that the panic occurred while in misc.ino's rulesProcessingFile(). That's because the last log entry before the crash is "ACT :" which is in rulesProcessingFile(). This log output is printed immediately before the PluginCall().

Also, whenever the openhab controller _C005.ino does a MQTTpublish() it prints a "MQTT :" log entry. But we don't see that in the logs. So I don't know if the publish ever occurred.

Not sure what these debug crumbs are telling us. Do you have any ideas?

For completeness, here's the logs leading up to the panic crash:

11575136 : EVENT: NEXTION#idx=26.00
RuleDebug Processing:rules1.txt
     flags CMI  parse output:
RuleDebug: 001: endon
RuleDebug: 100: washer#ac do
RuleDebug: 101: if [washer#ac]=1
RuleDebug: 101: nextion,page0.va_washerac.val=1
RuleDebug: 101: else
RuleDebug: 101: nextion,page0.va_washerac.val=0
RuleDebug: 101: endif
RuleDebug: 000: endon
RuleDebug: 100: dryermov#detect do
RuleDebug: 101: if [dryermov#detect]=1
RuleDebug: 101: nextion,page0.va_dryerac.val=1
RuleDebug: 101: else
RuleDebug: 101: nextion,page0.va_dryerac.val=0
RuleDebug: 101: endif
RuleDebug: 000: endon
RuleDebug: 110: nextion#idx do
11575183 : Read settings: TaskSettings index: 1
11575189 : Read settings: TaskSettings index: 2
11575192 : Read settings: TaskSettings index: 4
11575196 : DEBUG DEV: Parsed String='  if 26'
11575199 : Read settings: TaskSettings index: 1
11575208 : Read settings: TaskSettings index: 2
11575211 : Read settings: TaskSettings index: 4
11575213 : DEBUG DEV: Parsed String='  if 26>=10 and 26'
RuleDebug: 111: if 26>=10 and 26<=30
11575220 : [if 26>=10 and 26<=30]=true
11575225 : Read settings: TaskSettings index: 1
11575230 : Read settings: TaskSettings index: 2
11575234 : Read settings: TaskSettings index: 4
11575238 : DEBUG DEV: Parsed String='      Publish /%sysname%/NEXTION/idx,26'
RuleDebug: 111: publish /espez_laundry/nextion/idx,26
11575248 : ACT  : Publish /ESPEZ_Laundry/NEXTION/idx,26

Panic C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266\core_esp8266_main.cpp:99 __yield

ctx: cont 
sp: 3fff3e70 end: 3fff47c0 offset: 01b0

>>>stack>>>

But it should have gone more like this:

190496 : EVENT: NEXTION#idx=26.00
RuleDebug Processing:rules1.txt
     flags CMI  parse output:
RuleDebug: 001: endon
RuleDebug: 100: washer#ac do
RuleDebug: 101: if [washer#ac]=1
RuleDebug: 101: nextion,page0.va_washerac.val=1
RuleDebug: 101: else
RuleDebug: 101: nextion,page0.va_washerac.val=0
RuleDebug: 101: endif
RuleDebug: 000: endon
RuleDebug: 100: dryermov#detect do
RuleDebug: 101: if [dryermov#detect]=1
RuleDebug: 101: nextion,page0.va_dryerac.val=1
RuleDebug: 101: else
RuleDebug: 101: nextion,page0.va_dryerac.val=0
RuleDebug: 101: endif
RuleDebug: 000: endon
RuleDebug: 110: nextion#idx do
190542 : Read settings: TaskSettings index: 1
190548 : Read settings: TaskSettings index: 2
190551 : Read settings: TaskSettings index: 4
190555 : DEBUG DEV: Parsed String='  if 26'
190558 : Read settings: TaskSettings index: 1
190563 : Read settings: TaskSettings index: 2
190567 : Read settings: TaskSettings index: 4
190571 : DEBUG DEV: Parsed String='  if 26>=10 and 26'
RuleDebug: 111: if 26>=10 and 26<=30
190578 : [if 26>=10 and 26<=30]=true
190585 : Read settings: TaskSettings index: 1
190588 : Read settings: TaskSettings index: 2
190591 : Read settings: TaskSettings index: 4
190595 : DEBUG DEV: Parsed String='      Publish /%sysname%/NEXTION/idx,26'
RuleDebug: 111: publish /espez_laundry/nextion/idx,26
190604 : ACT  : Publish /ESPEZ_Laundry/NEXTION/idx,26
190622 : Command: publish
RuleDebug: 111: endif
190628 : Read settings: TaskSettings index: 1
190631 : Read settings: TaskSettings index: 2
190633 : Read settings: TaskSettings index: 4
190636 : DEBUG DEV: Parsed String='  if 26'
RuleDebug: 111: if 26>=500
190638 : [if 26>=500]=false
190641 : Read settings: TaskSettings index: 1
190645 : Read settings: TaskSettings index: 2
190650 : Read settings: TaskSettings index: 4
190653 : DEBUG DEV: Parsed String='      Publish /%sysname%/NEXTION/idx,26'
RuleDebug: 111: publish /espez_laundry/nextion/idx,26
190664 : Read settings: TaskSettings index: 1
190668 : Read settings: TaskSettings index: 2
190672 : Read settings: TaskSettings index: 4
190676 : DEBUG DEV: Parsed String='      Publish /%sysname%/NEXTION/value,0'
RuleDebug: 111: publish /espez_laundry/nextion/value,0
RuleDebug: 111: endif
RuleDebug: 000: endon
RuleDebug: 100: nextion#idx=98 do
RuleDebug: 101: nextion,page7.t_wifi_ssid.txt="ssid: %ssid%"
RuleDebug: 101: nextion,page7.t_ip.txt="ip: %ip%"
RuleDebug: 101: nextion,page7.t_signal.txt="rssi: [rssi#signal]dbm"
RuleDebug: 101: nextion,page7.t_date.txt="date %sysmonth%:%sysday%:%sysyears%"
RuleDebug: 101: nextion,page7.t_time.txt="time %syshour%:%sysmin%:%syssec% "
RuleDebug: 101: nextion,page7.t_uptime.txt="uptime [runtime#days]days"
190729 : EVENT: NEXTION#idx=26.00 Processing time:233 milliSeconds
190735 : EVENT: NEXTION#value=0.00
RuleDebug Processing:rules1.txt
     flags CMI  parse output:
RuleDebug: 001: endon
RuleDebug: 100: washer#ac do
RuleDebug: 101: if [washer#ac]=1
RuleDebug: 101: nextion,page0.va_washerac.val=1
RuleDebug: 101: else
RuleDebug: 101: nextion,page0.va_washerac.val=0
RuleDebug: 101: endif
RuleDebug: 000: endon
RuleDebug: 100: dryermov#detect do
RuleDebug: 101: if [dryermov#detect]=1
RuleDebug: 101: nextion,page0.va_dryerac.val=1
RuleDebug: 101: else
RuleDebug: 101: nextion,page0.va_dryerac.val=0
RuleDebug: 101: endif
RuleDebug: 000: endon
RuleDebug: 100: nextion#idx do
RuleDebug: 101: if [nextion#idx]>=10 and [nextion#idx]<=30
RuleDebug: 101: publish /%sysname%/nextion/idx,[nextion#idx]
RuleDebug: 101: endif
RuleDebug: 101: if [nextion#idx]>=500
RuleDebug: 101: publish /%sysname%/nextion/idx,[nextion#idx]
RuleDebug: 101: publish /%sysname%/nextion/value,[nextion#value]
RuleDebug: 101: endif
RuleDebug: 000: endon
RuleDebug: 100: nextion#idx=98 do
RuleDebug: 101: nextion,page7.t_wifi_ssid.txt="ssid: %ssid%"
RuleDebug: 101: nextion,page7.t_ip.txt="ip: %ip%"
RuleDebug: 101: nextion,page7.t_signal.txt="rssi: [rssi#signal]dbm"
RuleDebug: 101: nextion,page7.t_date.txt="date %sysmonth%:%sysday%:%sysyears%"
RuleDebug: 101: nextion,page7.t_time.txt="time %syshour%:%sysmin%:%syssec% "
RuleDebug: 101: nextion,page7.t_uptime.txt="uptime [runtime#days]days"
190859 : EVENT: NEXTION#value=0.00 Processing time:124 milliSeconds
190865 : Read settings: ControllerSettings index: 0
190927 : MQTT : /ESPEZ_Laundry/NEXTION/idx 26
190928 : MQTT : /ESPEZ_Laundry/NEXTION/value 0
190930 : NEXTION075 : Pipe Command Sent: |s,i26,sOff26.00
190950 : Read settings: TaskSettings index: 1
190954 : SW   : State 0.00
190956 : EVENT: WASHER#ac=0.00
RuleDebug Processing:rules1.txt
     flags CMI  parse output:
  • Thomas

How do you send the publish commands?
Are they commands from a rule?

Yes the publish actions are in the ESPEasy rule file. To further debug this I've commented them out and restarted my tests. If the reboot appears then the problem is not caused by MQTT publish.

Here's the rule excerpt:

on NEXTION#idx do
  if [NEXTION#idx]>=10 and [NEXTION#idx]<=30
      Publish /%sysname%/NEXTION/idx,[NEXTION#idx]
  endif
  if [NEXTION#idx]>=500  // Touch Events
      Publish /%sysname%/NEXTION/idx,[NEXTION#idx]
      Publish /%sysname%/NEXTION/value,[NEXTION#value]
  endif
endon

I don't use Touch events (idx>=500), so only the first _if statement_ applies to this issue.

  • Thomas

It's been running trouble free for four hours after removing the publish statements from ESPEasy's rule file. So I am back to testing with MQTT publish.

I looked at the core and found the panic function. There's no Serial.flush before the exception, so my theory is that we lose the last buffered serial log messages at the panic reboot. So there is a chance the logs are incomplete before the reboot. Just a theory.

I edited _C005.ino and added a couple more addLog statements with Serial.flush on both sides of the MQTTpublish(). So hopefully that helps push out the debug messages before a panic reboot.

I've restore the rule file's publish statements and I am now restarting the testing.

I shut down my home control system (openhab) to see what would happen to the unanswered MQTT. EasyESP doesn't care, no problems appeared.

Lastly, I've increased the Advanced->Tools->_Controller Message Interval_ from 100mS to 250mS.

I'll collect new logs for a few hours. Maybe the new _C005.ino addLog messages will provide more debug insight.

  • Thomas

This "Controller Message Interval" variable is being used for multiple things and one of them is actually a blocking wait for that period.
It does freeze the system for that amount of time.
It is also used in places where another variable would make more sense instead of re-use.

So that's one thing I would like to change. It looks like it is meant to be used to give the MQTT broker some time to process requests, but I guess using a buffer at the controller side would make more sense.

This yield panic crash occurs when a yield or delay (which calls yield) is being used in a function that's being called from somewhere else.
Normally all functions are being called from either setup() or loop(). But callback functions are called from the core, so their scope does not include access to the yield function to run background tasks.

Yesterday I split the rules processing function to make it more readable.
See: https://github.com/TD-er/ESPEasy/commit/be7724d8c7c15582438af34877f100d63ad8efbe
Today I start looking at ExecuteCommand to see if there is something that can be handled asynchronous via the scheduler to make it being called from the loop() again.

@TD-er: Thanks for the explanation. I was aware that you were working on the _Controller Message Interval._

My test run last night was interesting. Only one reboot, whereas I would have expected several. Could have been blind luck, or something about my altered test conditions contributed to stability.

But the overnight test's panic reboot provided another clue. Here's the log excerpt:

13901389 : Before MQTT
13901393 : After MQTT
13901395 : MQTT : /ESPEZ_Laundry/RSSI/signal -79
13905306 : NEXTION075 : code: |s,i26,sOff
13905308 : Read settings: TaskSettings index: 4
13905312 : EVENT: NEXTION#idx=26.00
RuleDebug Processing:rules1.txt
     flags CMI  parse output:
RuleDebug: 001: endon
RuleDebug: 100: washer#ac do
RuleDebug: 101: if [washer#ac]=1
RuleDebug: 101: nextion,page0.va_washerac.val=1
RuleDebug: 101: else
RuleDebug: 101: nextion,page0.va_washerac.val=0
RuleDebug: 101: endif
RuleDebug: 000: endon
RuleDebug: 100: dryermov#detect do
RuleDebug: 101: if [dryermov#detect]=1
RuleDebug: 101: nextion,page0.va_dryerac.val=1
RuleDebug: 101: else
RuleDebug: 101: nextion,page0.va_dryerac.val=0
RuleDebug: 101: endif
RuleDebug: 000: endon
RuleDebug: 110: nextion#idx do
13905359 : Read settings: TaskSettings index: 1
13905368 : Read settings: TaskSettings index: 2
13905370 : Read settings: TaskSettings index: 4
13905372 : DEBUG DEV: Parsed String='  if 26'
13905375 : Read settings: TaskSettings index: 1
13905381 : Read settings: TaskSettings index: 2
13905385 : Read settings: TaskSettings index: 4
13905389 : DEBUG DEV: Parsed String='  if 26>=10 and 26'
RuleDebug: 111: if 26>=10 and 26<=30
13905396 : [if 26>=10 and 26<=30]=true
13905401 : Read settings: TaskSettings index: 1
13905406 : Read settings: TaskSettings index: 2
13905410 : Read settings: TaskSettings index: 4
13905414 : DEBUG DEV: Parsed String='      Publish /%sysname%/NEXTION/idx,26'
RuleDebug: 111: publish /espez_laundry/nextion/idx,26
13905424 : ACT  : Publish /ESPEZ_Laundry/NEXTION/idx,26
13905441 : Command: publish

Panic C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266\core_esp8266_main.cpp:99 __yield

ctx: cont 
sp: 3fff3e70 end: 3fff47c0 offset: 01b0

>>>stack>>>

The "Before MQTT" and "After MQTT" are addLog message wrappers I placed before and after the MQTTpublish() in _C005.ini. I only mention this because you are probably wondering what they are.

The interesting part is the _Command: publish_ log message immediately before the panic reboot. It is log message that occurs inside misc.ino's ExecuteCommand().

And it's even more interesting is that you are starting to look at ExecuteCommand. So hopefully this rabbit hole journey is taking us in the right direction.

  • Thomas

This afternoon I added some commit just to make the commands return a String stating their return status.
That has nothing really to do with this issue, but it gave me time to look at all the commands to see if there is something that cannot be done asynchronous.
So I will now try to add an entry to schedule these command executions as a scheduled job.
This will change the scheduled job a bit to keep the extra text line and the events in sync, but that's just a minor change.

Hopefully I find enough time this evening to implement it so you can test it.
This time zone difference between us is nice to get some tests done :)

Let me know when your changes are complete and need me to try them out.

  • Thomas

I did change some this morning, but there are still some issues with that, which I will fix.... now ;)

I just pushed a fix to the existing pull request.
Could you (stress) test it?

In short:

  • All commands will be executed from the scheduler, except the commands sent from the Tools webpage.
  • Commands run from the web page will reply with a more meaningful reply.
  • Rules should be executed faster, since some commands like "publish" will now be started from the scheduler.
  • Commands will now be queued and thus not lost. (this queue may take quite some memory per item)

Still to do:

  • Maybe introduce some artificial delay between commands when sending a lot at once.

I have it running now. Everything is working fine. I'll keep an eye out for reboots or other odd behavior and report back tomorrow (or sooner if something goes wrong tonight).

FWIW, the updated test branch doesn't seem to have the latest src files. For example, it did not have my recent Nextion.ino. I didn't check any other files for "freshness."

Thank you for your efforts to make the code more stable. It is appreciated.

  • Thomas

After running for about one hour it rebooted. I'll run it again with serial log enabled and capture the exception.

  • Thomas

Sent from Yahoo Mail on Android

On Mon, Aug 6, 2018 at 8:25 PM, Thomasnotifications@github.com wrote:
After running for about one hour it rebooted. I'll run it again with serial log enabled and capture the exception.

  • Thomas


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the ythread.

@thomastech
Could you also test without actively using the publish method in the Nextion plugin?
And maybe stress test it also, then I can merge this code and we can have a look at your Nextion code.
I didn't have a proper look at it yet.

@TD-er: My overnight run had 100% success. No reboots, still running (~12 hours). That's a record! I will continue to let it run and leave it alone.

I have a second NodeMCU running without publish in the rules. It has not rebooted, 15 hours so far. So the latest PR has not caused any new stability issues.

  • Thomas

And the 12-hour running version is using Nextion code that was involved before when it crashed?

Yes, same Nextion code.

During this test I shut down my PC and left the device to run without anything messing with it, including me. Even when I don't touch it, there's plenty of MQTT actions being executed since the device constantly publishes status messages every 20 seconds.

I'm hoping that yesterday's reboot was unrelated to the rule's publish panic problem. But it's too early to know anything. It's best to watch and wait, then check the serial log next time it reboots.

In the meantime, it would be great if you reviewed the Nextion code. Although the panic reboot occurs while in the rules (not in Nextion), maybe something in the Nextion plugin is involved.

  • Thomas

It looks like it is the only plugin calling the ExecuteCommand function directly.

@TD-er: I have to agree that ExecuteCommand is involved with this issue. If the Nextion module is the only one that directly calls it then that certainly makes it (Nextion) look like a bad guy.

The "12-hour" NodeMCU is still running OK. I've been randomly interacting with it to stress test the code and perhaps provoke a reboot.

  • Thomas

While writing that last comment it did a reboot. This time it was Exception (3) rather than a panic. But still a rule file publish, as before.

Here's a log excerpt leading up to the crash:

44981799 : MQTT : /ESPEZ_Laundry/RSSI/signal -78
44982911 : NEXTION075 : code: |s,i26,sOff
44982916 : EVENT: NEXTION#idx=26.00
44982947 : [if 26>=10 and 26<=30]=true
44982958 : ACT  : Publish /ESPEZ_Laundry/NEXTION/idx,26
44982984 : [if 26>=500]=false
44983015 : EVENT: NEXTION#idx=26.00 Processing time:99 milliSeconds
44983017 : EVENT: NEXTION#value=0.00
44983041 : EVENT: NEXTION#value=0.00 Processing time:24 milliSeconds
44983045 : NEXTION075 : Pipe Command Sent: |s,i26,sOff26.00
44983053 : Command: publish
44989810 : NEXTION075 : code: |s,i98,sOff
44989813 : EVENT: NEXTION#idx=98.00
44989849 : [if 98>=10 and 98<=30]=false
44989871 : [if 98>=500]=false
44989900 : ACT  : NEXTION,page7.t_wifi_ssid.txt="SSID: Wizard1_2G4"
44989951 : NEXTION075 : WRITE, Command is page7.t_wifi_ssid.txt="SSID: Wizard1_2G4"
44989956 : ACT  : NEXTION,page7.t_ip.txt="IP: 192.168.1.217"
44989998 : NEXTION075 : WRITE, Command is page7.t_ip.txt="IP: 192.168.1.217"
44990014 : ACT  : NEXTION,page7.t_signal.txt="RSSI: -78dBm"
44990055 : NEXTION075 : WRITE, Command is page7.t_signal.txt="RSSI: -78dBm"
44990061 : ACT  : NEXTION,page7.t_date.txt="Date 08:07:18"
44990102 : NEXTION075 : WRITE, Command is page7.t_date.txt="Date 08:07:18"
44990106 : ACT  : NEXTION,page7.t_time.txt="Time 08:35:16 "
44990147 : NEXTION075 : WRITE, Command is page7.t_time.txt="Time 08:35:16 "
44990162 : ACT  : NEXTION,page7.t_uptime.txt="Uptime 749.0days"
44990207 : NEXTION075 : WRITE, Command is page7.t_uptime.txt="Uptime 749.0days"
44990209 : EVENT: NEXTION#idx=98.00 Processing time:396 milliSeconds
44990211 : EVENT: NEXTION#value=0.00
44990236 : EVENT: NEXTION#value=0.00 Processing time:24 milliSeconds
44990239 : NEXTION075 : Pipe Command Sent: |s,i98,sOff98.00
44991053 : NEXTION075 : code: |s,i98,sOff
44991056 : EVENT: NEXTION#idx=98.00
44991088 : [if 98>=10 and 98<=30]=false
44991110 : [if 98>=500]=false
44991140 : ACT  : NEXTION,page7.t_wifi_ssid.txt="SSID: Wizard1_2G4"
44991190 : NEXTION075 : WRITE, Command is page7.t_wifi_ssid.txt="SSID: Wizard1_2G4"
44991196 : ACT  : NEXTION,page7.t_ip.txt="IP: 192.168.1.217"
44991239 : NEXTION075 : WRITE, Command is page7.t_ip.txt="IP: 192.168.1.217"
44991254 : ACT  : NEXTION,page7.t_signal.txt="RSSI: -78dBm"
44991299 : NEXTION075 : WRITE, Command is page7.t_signal.txt="RSSI: -78dBm"
44991304 : ACT  : NEXTION,page7.t_date.txt="Date 08:07:18"
44991344 : NEXTION075 : WRITE, Command is page7.t_date.txt="Date 08:07:18"
44991349 : ACT  : NEXTION,page7.t_time.txt="Time 08:35:18 "
44991390 : NEXTION075 : WRITE, Command is page7.t_time.txt="Time 08:35:18 "
44991404 : ACT  : NEXTION,page7.t_uptime.txt="Uptime 749.0days"
44991450 : NEXTION075 : WRITE, Command is page7.t_uptime.txt="Uptime 749.0days"
44991452 : EVENT: NEXTION#idx=98.00 Processing time:396 milliSeconds
44991454 : EVENT: NEXTION#value=0.00
44991481 : EVENT: NEXTION#value=0.00 Processing time:27 milliSeconds
44991484 : NEXTION075 : Pipe Command Sent: |s,i98,sOff98.00
44991501 : SW   : State 0.00
44991504 : EVENT: WASHER#ac=0.00
44991515 : [if 0=1]=false
44991518 : else = true
44991520 : ACT  : NEXTION,page0.va_WasherAC.val=0
44991552 : NEXTION075 : WRITE, Command is page0.va_WasherAC.val=0
44991572 : EVENT: WASHER#ac=0.00 Processing time:68 milliSeconds
44992044 : MQTT : /ESPEZ_Laundry/WASHER/ac 0
45001103 : SW   : State 0.00
45001106 : EVENT: WASHER#ac=0.00
45001117 : [if 0=1]=false
45001120 : else = true
45001121 : ACT  : NEXTION,page0.va_WasherAC.val=0
45001154 : NEXTION075 : WRITE, Command is page0.va_WasherAC.val=0
45001172 : EVENT: WASHER#ac=0.00 Processing time:67 milliSeconds
45001235 : MQTT : /ESPEZ_Laundry/WASHER/ac 0
45001274 : NEXTION075 : Cmd Statement Line-1 Sent: page0.j1.val=45
45001288 : SYS  : 750.00
45001292 : EVENT: RUNTIME#days=750.00
45001319 : EVENT: RUNTIME#days=750.00 Processing time:27 milliSeconds
45001748 : SYS  : -76.00
45001752 : EVENT: RSSI#signal=-76.00
45001776 : EVENT: RSSI#signal=-76.00 Processing time:24 milliSeconds
45002300 : MQTT : /ESPEZ_Laundry/RSSI/signal -76
45002309 : NEXTION075 : code: |s,i16,sOff
45002314 : EVENT: NEXTION#idx=16.00
45002348 : [if 16>=10 and 16<=30]=true
45002359 : ACT  : Publish /ESPEZ_Laundry/NEXTION/idx,16
45002389 : [if 16>=500]=false
45002425 : EVENT: NEXTION#idx=16.00 Processing time:111 milliSeconds
45002426 : EVENT: NEXTION#value=0.00
45002451 : EVENT: NEXTION#value=0.00 Processing time:25 milliSeconds
45002807 : NEXTION075 : Pipe Command Sent: |s,i16,sOff16.00
45002816 : WD   : Uptime 750 ConnectFailures 0 FreeMem 14064
45002823 : Command: publish
45002901 : LoopStats: shortestLoop: 34 longestLoop: 1011174 avgLoopDuration: 59.82 loopCounterMax: 882352 loopCounterLast: 483258 countFindPluginId: 0
45002904 : PluginStats P_1_Switch input - Switch READ                 Count: 3 Avg/min/max 476.00/467/485 usec
45002913 : PluginStats P_1_Switch input - Switch ONCE_A_SECOND        Count: 30 Avg/min/max 18.73/11/30 usec
45002923 : PluginStats P_1_Switch input - Switch TEN_PER_SECOND       Count: 265 Avg/min/max 37.36/24/68 usec
45002933 : PluginStats P_1_Switch input - Switch WRITE                Count: 17 Avg/min/max 527.47/440/662 usec
45002942 : PluginStats P_1_Switch input - Switch EVENT_OUT            Count: 10 Avg/min/max 45.40/31/62 usec
45002952 : PluginStats P_1_Switch input - Switch FIFTY_PER_SECOND     Count: 1280 Avg/min/max 14.63/10/43 usec
45002962 : PluginStats P_26_Generic - System Info READ                 Count: 3 Avg/min/max 519.00/478/573 usec
45002972 : PluginStats P_26_Generic - System Info ONCE_A_SECOND        Count: 60 Avg/min/max 12.72/3/28 usec
45002981 : PluginStats P_26_Generic - System Info TEN_PER_SECOND       Count: 530 Avg/min/max 11.19/3/27 usec
45002991 : PluginStats P_26_Generic - System Info WRITE                Count: 4 Avg/min/max 10.25/3/19 usec
45003000 : PluginStats P_26_Generic - System Info EVENT_OUT            Count: 20 Avg/min/max 10.50/3/19 usec
45003010 : PluginStats P_26_Generic - System Info FIFTY_PER_SECOND     Count: 2560 Avg/min/max 7.72/2/23 usec
45003019 : PluginStats P_44_Gyro - MPU 6050 [TESTING] READ                 Count: 1 Avg/min/max 80.00/80/80 usec
45003029 : PluginStats P_44_Gyro - MPU 6050 [TESTING] ONCE_A_SECOND        Count: 30 Avg/min/max 2123.13/1951/3778 usec
45003040 : PluginStats P_44_Gyro - MPU 6050 [TESTING] TEN_PER_SECOND       Count: 265 Avg/min/max 15.00/11/19 usec
45003050 : PluginStats P_44_Gyro - MPU 6050 [TESTING] WRITE                Count: 17 Avg/min/max 17.71/14/21 usec
45003060 : PluginStats P_44_Gyro - MPU 6050 [TESTING] EVENT_OUT            Count: 10 Avg/min/max 15.10/3/22 usec
45003070 : PluginStats P_44_Gyro - MPU 6050 [TESTING] FIFTY_PER_SECOND     Count: 1280 Avg/min/max 10.84/7/19 usec
45003080 : PluginStats P_71_Display - Nextion [TESTING] READ                 Count: 1 Avg/min/max 21004.00/21004/21004 usec
45003091 : PluginStats P_71_Display - Nextion [TESTING] ONCE_A_SECOND        Count: 30 Avg/min/max 13.33/7/29 usec
45003101 : PluginStats P_71_Display - Nextion [TESTING] TEN_PER_SECOND       Count: 265 Avg/min/max 5784.80/83/502502 usec
45003111 : PluginStats P_71_Display - Nextion [TESTING] WRITE                Count: 17 Avg/min/max 33955.77/572/46868 usec
45003122 : PluginStats P_71_Display - Nextion [TESTING] EVENT_OUT            Count: 10 Avg/min/max 23.10/11/31 usec
45003132 : PluginStats P_71_Display - Nextion [TESTING] FIFTY_PER_SECOND     Count: 1280 Avg/min/max 4.32/3/27 usec
45003142 : Load File stats: Count: 144 Avg/min/max 2990.17/810/55416 usec
45003149 : Plugin call 50 p/s   stats: Count: 1280 Avg/min/max 1988.25/1879/2330 usec
45003156 : Plugin call 10 p/s   stats: Count: 265 Avg/min/max 7776.78/2020/504623 usec
45003164 : Plugin call 10 p/s U stats: Count: 265 Avg/min/max 3536.26/3126/4121 usec
45003172 : Plugin call  1 p/s   stats: Count: 30 Avg/min/max 5612.77/4338/33045 usec
45003179 : SensorSendTask()     stats: Count: 8 Avg/min/max 297917.50/3997/558267 usec
45003187 : sendData()           stats: Count: 10 Avg/min/max 381084.50/133337/552572 usec
45003194 : Compute formula      stats: Count: 6 Avg/min/max 693.83/15/1393 usec
45003202 : setNewTimerAt()      stats: Count: 1716 Avg/min/max 204.25/181/1176 usec
45003209 : timeDiff()           stats: Count: 1639431 - CPU cycles per call: 8.46
45003216 : Scheduler stats: (called/tasks/max_length/idle%) 483258/1709/13/68.90
45011103 : SW   : State 0.00
45011106 : EVENT: WASHER#ac=0.00
45011117 : [if 0=1]=false
45011120 : else = true
45011121 : ACT  : NEXTION,page0.va_WasherAC.val=0
45011155 : NEXTION075 : WRITE, Command is page0.va_WasherAC.val=0
45011174 : EVENT: WASHER#ac=0.00 Processing time:69 milliSeconds
45011235 : MQTT : /ESPEZ_Laundry/WASHER/ac 0
45012033 : NEXTION075 : code: |s,i26,sOff
45012039 : EVENT: NEXTION#idx=26.00
45012072 : [if 26>=10 and 26<=30]=true
45012083 : ACT  : Publish /ESPEZ_Laundry/NEXTION/idx,26
45012109 : [if 26>=500]=false
45012143 : EVENT: NEXTION#idx=26.00 Processing time:103 milliSeconds
45012144 : EVENT: NEXTION#value=0.00
45012169 : EVENT: NEXTION#value=0.00 Processing time:25 milliSeconds
45012173 : NEXTION075 : Pipe Command Sent: |s,i26,sOff26.00
45012186 : Command: publish
45021104 : SW   : State 0.00
45021107 : EVENT: WASHER#ac=0.00
45021118 : [if 0=1]=false
45021121 : else = true
45021122 : ACT  : NEXTION,page0.va_WasherAC.val=0
45021155 : NEXTION075 : WRITE, Command is page0.va_WasherAC.val=0
45021174 : EVENT: WASHER#ac=0.00 Processing time:67 milliSeconds
45021236 : MQTT : /ESPEZ_Laundry/WASHER/ac 0
45021297 : SYS  : -78.00
45021301 : EVENT: RSSI#signal=-78.00
45021324 : EVENT: RSSI#signal=-78.00 Processing time:23 milliSeconds
45021798 : MQTT : /ESPEZ_Laundry/RSSI/signal -78
45022010 : NEXTION075 : code: |s,i16,sOff
45022016 : EVENT: NEXTION#idx=16.00
45022052 : [if 16>=10 and 16<=30]=true
45022064 : ACT  : Publish /ESPEZ_Laundry/NEXTION/idx,16

Exception (3):
epc1=0x401003e9 epc2=0x00000000 epc3=0x40000e4a excvaddr=0x4007c1f9 depc=0x00000000

ctx: cont 
sp: 3fff3bc0 end: 3fff4780 offset: 01a0

>>>stack>>>
3fff3d60:  3fff3668 000007f6 000007f6 4010020c  
3fff3d70:  ffffffff 3ffe91bc 3fff3e5c 3fff3668  
3fff3d80:  00000010 00000010 00000000 4010053d  
3fff3d90:  ffffffff 3ffeaca0 3fff3f9c 4021b777  
3fff3da0:  3fff3668 000007fb 000007fb 00000000  
3fff3db0:  00000010 3fff3e1c 3fff3e1c 4025f86f  
3fff3dc0:  3ffeaca0 00000000 3fff3e1c 4025f8bb  
3fff3dd0:  00000022 00000000 3fff3e1c 4025f8ed  
3fff3de0:  00000001 3fff4114 3fff3e1c 4025f93a  
3fff3df0:  3ffeaca0 00000010 00000000 00000000  
3fff3e00:  00000001 3fff4114 3fff3fac 4020caca  
3fff3e10:  00000000 00000001 3fff3e80 00000000  
3fff3e20:  00000000 00000000 00000000 4010053d  
3fff3e30:  00000008 00000000 3fff3e80 00000023  
3fff3e40:  3fff3660 000007ee 000007ee 0000002d  
3fff3e50:  0000000d 3fff4114 3fff3fac 4020cb36  
3fff3e60:  0000000d 3fff4114 3fff40a0 40225962  
3fff3e70:  3fff3668 000007ee 000007ee 4010020c  
3fff3e80:  3fff4114 00000000 3fff3fa8 40100690  
3fff3e90:  00000010 3fff40a0 3fff40a0 4025f814  
3fff3ea0:  3fff4114 3fff40a0 3fff40a0 40236d3c  
3fff3eb0:  00000000 00000000 00000000 00000000  
3fff3ec0:  00000000 00000000 49545845 692f4e4f  
3fff3ed0:  312c7864 00000036 00000060 40000000  
3fff3ee0:  00000000 00000700 00000000 00000000  
3fff3ef0:  3fff8c00 00000000 00000000 40100600  
3fff3f00:  00000000 00000000 00000700 00000007  
3fff3f10:  402a1058 00000006 00000000 3fff3668  
3fff3f20:  00000000 00000010 00000000 0000053d  
3fff3f30:  3fff8c00 00000022 3fff3f00 4025f860  
3fff3f40:  00000000 00000000 3fff3fb0 00000027  
3fff3f50:  c029f1cb 3fff3fb0 3fff3fb0 4025f86f  
3fff3f60:  4029f1cc 0000000b 3fff3fb0 00000027  
3fff3f70:  3fff3668 000007ee 000007ee 4010020c  
3fff3f80:  00000000 00000000 3fff3fbc 40100690  
3fff3f90:  00000000 00000000 3fff40a0 4025f814  
3fff3fa0:  00000000 00000000 00000000 00000000  
3fff3fb0:  00000000 00000000 00000000 00000000  
3fff3fc0:  00000000 00000000 00000000 00000000  
3fff3fd0:  00000000 00000000 00000000 0000002d  
3fff3fe0:  3fff4114 3fff40a0 0000000d 4021780d  
3fff3ff0:  00000600 4c5f1800 00000000 4e2f7901  
3fff4000:  00000000 00000010 00000000 00000000  
3fff4010:  00000000 00000000 3fff8b3c 0000000f  
3fff4020:  00000000 3fff8b54 0000000f 00000000  
3fff4030:  3fff8b6c 0000000f 00000000 3fff8be4  
3fff4040:  0000000f 00000000 3fff8bfc 0000000f  
3fff4050:  00000000 00000000 0000032d 4025f8bb  
3fff4060:  312c7864 00000036 3fff4114 4025f8ed  
3fff4070:  0000000b 00000000 3fff26d2 00000006  
3fff4080:  3fff417c 3fff4188 3fff243d 3fff4300  
3fff4090:  3fff417c 3fff4188 3fff243d 40242a20  
3fff40a0:  00000c00 00000000 00000000 40206c00  
3fff40b0:  00000000 00000010 00000000 00000000  
3fff40c0:  00000000 00000000 3fff8aec 0000000f  
3fff40d0:  00000000 3fff8b84 0000000f 00000000  
3fff40e0:  3fff8b9c 0000000f 00000000 3fff8bb4  
3fff40f0:  0000000f 00000000 3fff8bcc 0000000f  
3fff4100:  00000000 00000000 00000000 00000000  
3fff4110:  00000000 3fff8b04 0000002f 00000025  
3fff4120:  00000000 ffffffff 3fff4170 40100690  
3fff4130:  3fff427e 3fff427d 3fff4260 00000000  
3fff4140:  3fff427e 3fff427d 3fff4260 40242cfc  
3fff4150:  3fff427c 3fff427b 3fff427a 3fff4279  
3fff4160:  3fff4278 00000000 3fff4254 4025f8bb  
3fff4170:  00000000 00000000 00000000 3fff8a7c  
3fff4180:  0000002f 00000025 3fff8a44 0000002f  
3fff4190:  00000025 3fff8954 0000000f 00000000  
3fff41a0:  3fff8a0c 0000002f 00000025 4010020c  
3fff41b0:  3fff427c 3fff4300 00000001 3fff426c  
3fff41c0:  3ffeaca0 3fff0f64 3fff42b0 3ffeaca0  
3fff41d0:  00000009 3fff0f64 3fff42b0 40242f19  
3fff41e0:  3fff427b 3fff427a 3fff4279 3fff4278  
3fff41f0:  234e4f49 5d786469 20200a0d 69646e65  
3fff4200:  200a0d66 20666920 58454e5b 4e4f4954  
3fff4210:  78646923 353d3e5d 20203030 54202f2f  
3fff4220:  6863756f 65764520 2073746e 6e655328  
3fff4230:  40107b10 00000000 000003e8 4025ea81  
3fff4240:  3fff896c 3fff89ac 00000000 00000000  
3fff4250:  00000000 00000000 00000000 00000000  
3fff4260:  3fff89c4 0000003f 00000025 3fff8ac4  
3fff4270:  0000001f 0000001b 01010101 40010101  
3fff4280:  00000040 3fff4300 00000001 4025f9f0  
3fff4290:  3fff0031 3fff243d 00000001 02aefb40  
3fff42a0:  3fff4300 3fff243d 00000001 4024309b  
3fff42b0:  00000000 00000000 00000000 3fff893c  
3fff42c0:  0000000f 0000000a 3fff8924 0000000f  
3fff42d0:  0000000a 00000030 3fff4300 4025fe98  
3fff42e0:  3fff25c2 00000000 00000000 00000005  
3fff42f0:  00000010 00000004 00000000 40243c85  
3fff4300:  3fff853c 0000001f 00000011 4010020c  
3fff4310:  3fff23e8 3fff0e78 3fff4340 40100690  
3fff4320:  00000000 00000000 00000000 00000011  
3fff4330:  3fff1b8c 3fff26cc 3fff4640 40245559  
3fff4340:  00000000 00000000 00000000 00000011  
3fff4350:  3fff1b8c 3fff0e78 3fff4640 40265c0c  
3fff4360:  7b856b8a 3fff0e78 3fff4590 4026369c  
3fff4370:  3fff0030 3fff0e78 3fff45d8 00000011  
3fff4380:  3fff1b8c 3fff0e78 3fff4640 4024b323  
3fff4390:  692c737c 732c3631 0066664f 4010020c  
3fff43a0:  ffffffff 3ffe91bc 3fff43d0 40100690  
3fff43b0:  3fff3668 00000707 00000707 4010020c  
3fff43c0:  ffffffff 3ffe91bc 3fff44ac 40100690  
3fff43d0:  00000000 00000000 0000001f 40104ec9  
3fff43e0:  4000050c 3ffeaca0 3fff45ec 4021b777  
3fff43f0:  40000f83 00000030 00000000 ffffffff  
3fff4400:  4028679e 00000023 3fffbff8 00000e6b  
3fff4410:  000006db 00000e6a 00000001 3fff8920  
3fff4420:  00000080 3fff61e4 00000001 3fff4b8c  
3fff4430:  3fff3668 00000000 00007358 00000030  
3fff4440:  00000180 3fff26cc 3fff4628 40100690  
3fff4450:  00000000 00000000 3fff5c14 40214fc2  
3fff4460:  ffffd570 3fff26cc 3fff4600 3fff3668  
3fff4470:  00000020 00000020 00000000 3fff44b0  
3fff4480:  3fff1b24 3fff4488 3fff4690 40215084  
3fff4490:  3fff4490 3fff4490 3fff4498 3fff4498  
3fff44a0:  3fff44a0 3fff44a0 3fff44a8 3fff44a8  
3fff44b0:  3fff44b0 3fff44b0 3fff44b8 3fff44b8  
3fff44c0:  3fff44c0 3fff44c0 3fff44c8 3fff44c8  
3fff44d0:  3fff44d0 3fff44d0 3fff44d8 3fff44d8  
3fff44e0:  3fff44e0 3fff44e0 3fff44e8 401071ee  
3fff44f0:  3fff44f0 3fff44f0 3fff4640 4021053d  
3fff4500:  3fff4500 3fff4500 3fff4508 40245e9a  
3fff4510:  3fff4510 3fff4510 3fff4518 3fff4518  
3fff4520:  3fff4520 3fff4520 3fff4528 3fff4528  
3fff4530:  00000000 0000070c 0000070c 4010020c  
3fff4540:  3fff3668 0000070c 0000070c 00000004  
3fff4550:  3fff3670 00000000 3fff16f8 402867c4  
3fff4560:  00000014 00003838 00000707 40207b90  
3fff4570:  00000010 00000010 3fff1760 402901ab  
3fff4580:  4025e044 3fff4640 3fff1760 4025f9f0  
3fff4590:  00000000 00000000 00000000 4020b7b8  
3fff45a0:  3fff3668 00000707 00000707 4010020c  
3fff45b0:  3fff192c 3fff4640 3fff45e0 40100690  
3fff45c0:  3fff8524 0000000f 0000000b 3fff850c  
3fff45d0:  0000000f 00000000 3fff84f4 0000000f  
3fff45e0:  00000001 3fff84dc 0000000f 00000003  
3fff45f0:  3fff84c4 0000000f 00000002 3fff859c  
3fff4600:  0000001f 0000001e 00000000 4025f814  
3fff4610:  3fff43cf c000bc71 4029b2f8 4020bd3a  
3fff4620:  00000000 00000000 00000000 00000004  
3fff4630:  3fff192c 3fff4640 00000005 40217965  
3fff4640:  00000400 3fff1000 00000000 3fff4605  
3fff4650:  00000000 00000000 00000000 00000000  
3fff4660:  00000000 3fff4604 3fff82a4 0000000f  
3fff4670:  00000000 3fff8254 0000000f 00000000  
3fff4680:  3fff8634 0000000f 00000000 3fff82d4  
3fff4690:  0000000f 00000000 3fff8584 0000000f  
3fff46a0:  00000000 00000000 00000565 4020d7df  
3fff46b0:  02aefb98 3fff1b24 3fff6308 402156b8  
3fff46c0:  00000046 7b855899 3fff26d0 00000030  
3fff46d0:  7b8552fd 0000000b 02aefb98 02aefb34  
3fff46e0:  7b8553cb 3fff192c 3ffea808 40243689  
3fff46f0:  00000001 3fff1b04 00000002 40215781  
3fff4700:  02aefb98 3fff1b04 3fff5ce4 02aefb34  
3fff4710:  00000001 3fff1b04 3ffea808 40250793  
3fff4720:  00000000 00000000 3fff5ce4 4025087e  
3fff4730:  3fffdad0 00000000 3fff1914 3fff3750  
3fff4740:  3fffdad0 00000000 3fff1914 40250d95  
3fff4750:  00000000 00000000 00000001 4026088d  
3fff4760:  3fffdad0 00000000 3fff374a 402608b8  
3fff4770:  feefeffe feefeffe 3fff3760 40100710  
<<<stack<<<

 ets Jan  8 2013,rst cause:2, boot mode:(3,7)

load 0x4010f000, len 1384, room 16 
tail 8
chksum 0x2d
csum 0x2d
v614f7c32
~ld
ªU105 : 


INIT : Booting version: (custom) (ESP82xx Core 2_4_1, NONOS SDK 2.2.1(cfd48f3), LWIP: 2.0.3)
106 : INIT : Warm boot #2 - Restart Reason: Exception
110 : FS   : Mounting...
184 : FS   : Mount successful, used 77057 bytes of 2949250
227 : CRC  : No program memory checksum found. Check output of crc2.py
276 : CRC  : SecuritySettings CRC   ...OK 
553 : INIT : Free RAM:21280
554 : INIT : I2C
554 : INIT : SPI not enabled
568 : NEXTION075 : serial pin config RX:13, TX:15, Plugin Enabled
568 : NEXTION075 : Using software serial
581 : INFO : Plugins: 71 [Normal] [Testing] (ESP82xx Core 2_4_1, NONOS SDK 2.2.1(cfd48f3), LWIP: 2.0.3)
582 : EVENT: System#Wake
607 : WIFI : Set WiFi to STA
  • Thomas

Do you also happen to trace/log the memory usage?

@TD-er: No, but I can do that. Do you want me to use the Arduino IDE monitor and install this :
https://github.com/me-no-dev/EspExceptionDecoder

  • Thomas

That would be great.
I probably can't decode your stack trace, since it is your own build

If you can't decode it then won't the stack trace be a waste of time? If you mean you want me to decipher it then we're gonna need a bigger boat.

Stepping back a bit, this latest reboot is a bit out of the ordinary. That's because the most common crash on my build is a Panic with _Restart Reason: Software Watchdog_. But the latest crash/reboot (using your new PR) was an Exception 3 with _Restart Reason: Exception_. I've seen those before while debugging this problem, but they have been uncommon.

Google pointed to some discussions that suggested that sometimes power supply issues can cause this kind of crash. I've moved the NodeMCU off my PC's USB port and it's now running on the dedicated supply that I normally use.

Let's give it some time to see what happens. If the crash still occurs then the trace/log memory usage is next.

  • Thomas

Power supply can indeed cause these issues.
But also a failed memory allocation, that's why I asked for tracking memory usage to see if it is decreasing over time

@TD-er:
Does ESPEasy have a built-in feature for tracking memory usage?

Or do I need to use a special tool like the one I mentioned earlier? Do you have a better recommendation that works with Arduino IDE?

  • Thomas

You can add a general sysinfo plugin and set it to log memory usage.

Just a espeasy plugin ;)

Just a espeasy plugin ;)

That's the ticket. Installed and I'm now monitoring memory. Only 14K bytes and dips to 12K on occasion; Wiki says >20K is ideal.

  • Thomas

There was a reboot while I was watching the memory. Never got below 12K.

I'm moving it back to my PC so I can gather the next reboot's information with serial log.

  • Thomas

This recommendation of 20k is from before the new webserver we're now using.
About a year ago, the entire web page was rendered in memory, which needed a String containing the entire page (up-to 11 k)
Now we render the entire page in 400 bytes of memory :) (@s0170071 made this happen)

The number of included plugins also has impact on the amount of free memory.
The left-over free memory also decreased over time with newer core libraries.

With "normal" builds, you get around 15k free.
The events I buffer for the commands are also quite huge, which I should also have a look at.
And we're still holding a lot in memory about the present plugins. A lot of that information can be loaded from flash at runtime. (flash is about as fast as RAM when reading on ESP8266)

@TD-er: Thanks for the clarification on memory requirements. My conclusion is that the publish related crash/reboot isn't a dirty power or low memory issue.

Other than performing tests, I'm not much use to you. If you need me to do anything special then please let me know.

  • Thomas

Could you make it less hot over here? That would be great. :)
They were pouring salt on the roads (something you normally see in winter), to protect them from melting.

So if you want to help... please do that, will you? ;)

I made a build of the latest merges, maybe you could test one of those?
It will be the same as tomorrow's, unless I will merge something else too ;)

My USA state is on fire (literally). And also suffering from high temperature weather. So hot salty roads seem like a dream paradise to me.

Regarding your _Generic HTTP Advanced crash #1536_, my home automation system sometimes responds back to ESPEasy's publish actions using HTTP. It also uses HTTP messages to ESPEasy during remotely activated features that use Alexa voice actions.

I think it is important to mention that during the most recent crash I was testing some functions that involved HTTP command messages to ESPEasy. It was part of my stress testing.

EDIT: During last night's successful 12-hour run I didn't do any actions that would have involved a HTTP.

Does any of this sound like it could be related to your #1536. Is it possible there are multiple reasons for the crash/reboots and we're chasing all the symptoms like a cat chasing a laser pointer?

I made a build of the latest merges, maybe you could test one of those?

I will test your new build.

  • Thomas

@thomastech Hmm fires is indeed a bit worse.
We (fortunately) only had a few and today a big one.
@Grovkillen also had a lot of (really big) fires in Sweden. (they bombed one to fight the fire)

And I think the issue reported in #1536 indeed may also need to have a look at.

Looks like we're all suffering in similar ways; Fixing software bugs is a welcome distraction.

I've flashed with your ESP_Easy_mega-20180804-11-g7783cf4_test_ESP8266_4096.bin. It reverted to AP mode and I had to reload config and rules. The "20180804" date in the name seems odd, so hopefully I used the correct file.

I'm doing the usual testing. If there's anything special you want me to do then let me know.

  • Thomas

I built it on my "Bash on Windows" environment using almost the same script as being used during the nightly builds.
That one bases the name on the last tag and I just fetched the repo. 20180804 is the last tag.

@TD-er: The latest test run lasted a little over 4 hours before a crash/reboot. This time it was _Restart Reason: Hardware Watchdog_ See attached log, time 60189 for the start of the crash.

At the beginning of the test the memory was >14K bytes. It slowly decreased and towards the end it was hovering around 10K. The memory values are in the log file as entries marked with _EVENT: FreeRam#ram_. So besides the reboot issue there appears to be a memory leak.

  • Thomas

log3.zip

@TD-er: After the previous reboot I restarted the test. This time I disabled all logs, switched Nextion from softserial to UART serial, avoided all HTTP access to ESPEasy, and ran on a dedicated power supply. It's been running for >13 hours without any issues. The memory leak is gone too.

My second NodeMCU test system has been running for 40 hours. No memory leak. Log is disabled, Nextion is UART serial, MQTT is disabled, and no HTTP. This system is really not doing much activity, other than breathing. Not an ideal reference since it isn't being stressed. But still useful info.

I have not experienced a "Panic" reboot since installing the scheduler PR firmware. Going down the list of possible culprits to the recent reboots, I recall reading a post about stability issues caused by enabling serial log. Is that still a concern?

  • Thomas

Serial log may still act strange, when there is no reader.
I am not sure what's happening then. It may be something stalling and thus causing a software/hardware watchdog reboot, or just filling memory. I don't know.

@TD-er: That's good to know. I'm still suspicious of Serial log, so I'll do further investigation.

In the meantime I've turned Nextion Soft Serial back on. It's been running for about an hour. No reboots or memory leaks yet. I'll let it run for a few more hours and if all is well I'll move down the suspect list.

Lastly, you mentioned that Nextion is the only plugin that directly calls ExecuteCommand(). I had forgotten that early-on in my Nextion hardware serial update I removed that function because it appeared unnecessary. Was removing ExecuteCommand() a bad idea and perhaps the reason for all this trouble? Or accidental genius? :)

  • Thomas

Was removing ExecuteCommand() a bad idea and perhaps the reason for all this trouble? Or accidental genius? :)

I guess the best answer to that question is: "yes" ;)

I guess the best answer to that question is: "yes" ;)

I see what you did there. I will ask my Magic 8-ball for a more decisive opinion.

The test system is still running (~17 hours) and no memory leaks. Yet.

  • Thomas

@TD-er: The test system is still running OK. Next I will turn on Serial log to see if it contributes to reboots or memory leaks.

I did an experiment with serial log on my second test device. It found evidence that the ESP8266's serial Tx buffer is being over-filled, which is something that can cause code blocking. My research (google) found that this is a situation that can cause a ESP8266 crash.

The experiment was simple. I added a Serial.availableForWrite() to the addToLog() function (in misc.ino). It checks to see how much room is left in the Tx buffer. If space is insufficient it trims the log string so that it can fit in the buffer (without causing a "full" condition).

This is the test code (EDIT: updated):

void addToLog(byte logLevel, const char *line)
{
  if (loglevelActiveFor(LOG_TO_SERIAL, logLevel)) {
    Serial.print(millis());
    Serial.print(F(" : "));
    if(strlen(line)+20 >  (uint16_t)(Serial.availableForWrite())) {
        uint16_t snip = (uint16_t)(Serial.availableForWrite());
        if(snip <22) snip = 22;
        String linestr = line;
        Serial.print(linestr.substring(0,snip-22));
        Serial.println(F(",snip!"));             // Sorry, but no room for your entire log message.
    }
    else {
        Serial.print(F("["));
        Serial.print(String(Serial.availableForWrite()));
        Serial.print(F("], "));
        Serial.println(line);
    }
  }
  if (loglevelActiveFor(LOG_TO_SYSLOG, logLevel)) {
    syslog(logLevel, line);
  }
  if (loglevelActiveFor(LOG_TO_WEBLOG, logLevel)) {
    Logging.add(logLevel, line);
  }

#ifdef FEATURE_SD
  if (loglevelActiveFor(LOG_TO_SDCARD, logLevel)) {
    File logFile = SD.open("log.dat", FILE_WRITE);
    if (logFile)
      logFile.println(line);
    logFile.close();
  }
#endif
}

I'm seeing periodic log entries that are being trimmed. So that means the native log message would have filled the buffer and caused a temporary block.

The log messages from the plugins are not a problem. The offending messages are from the system (PluginStats, Plugin call, etc.). Here's some examples, where trimmed messages end with _snip!_

20834 : [120], NEXTION075 : Cmd Statement Line-1 Sent: page0.j1.val=45
20856 : [120], SYS  : -79.00
20860 : [120], EVENT: RSSI#signal=-79.00
20880 : [120], EVENT: RSSI#signal=-79.00 Processing time:19 milliSeconds
20958 : [120], MQTT : /ESPEZ_Laundry/RSSI/signal -79
21094 : [120], SYS  : 15040.00
21096 : [116], EVENT: FreeRam#ram=15040.00
21117 : [120], EVENT: FreeRam#ram=15040.00 Processing time:21 milliSeconds
25449 : Current Time Zone:  DST time start: 2018-03-25 02:00:00 offset: -420 minSTD time start: 2018-10-28 02:00:00 offset: -480 min,snip!
25451 : EVENT: Time#Initialized,snip!
25473 : [120], EVENT: Time#Initialized Processing time:23 milliSeconds
25478 : [99], EVENT: Clock#Time=Wed,13:55
25498 : [120], EVENT: Clock#Time=Wed,13:55 Processing time:20 milliSeconds
26040 : [120], SYS  : 15152.00
26043 : [116], EVENT: FreeRam#ram=15152.00
26063 : [120], EVENT: FreeRam#ram=15152.00 Processing time:19 milliSeconds
30668 : [120], SW   : State 0.00
30670 : [114], EVENT: WASHER#ac=0.00
30678 : [120], [if 0=1]=false
30681 : [120], else = true
30682 : [109], ACT  : NEXTION1,page0.va_WasherAC.val=0
30714 : [120], NEXTION075 : WRITE, Command is page0.va_WasherAC.val=0
30732 : [120], EVENT: WASHER#ac=0.00 Processing time:62 milliSeconds
30793 : [120], MQTT : /ESPEZ_Laundry/WASHER/ac 0
31095 : [120], SYS  : 15040.00
31097 : [116], EVENT: FreeRam#ram=15040.00
31117 : [120], EVENT: FreeRam#ram=15040.00 Processing time:20 milliSeconds
31971 : [120], WD   : Uptime 1 ConnectFailures 0 FreeMem 15320
32863 : LoopStats: shortestLoop: 36 longestLoop: 1757031 avgLoopDuration: 54.40 loopCounterMax: 833333 loopCounterLast: 527198 countFindPluginId: 0,snip!
32865 : PluginStats P_1_Switch input - Switch READ                 Count: 3 Avg/min/max 592.67/591/594 usec,snip!
32875 : PluginStats P_1_Switch input - Switch ONCE_A_SECOND        Count: 30 Avg/min/max 22.80/15/66 usec,snip!
32885 : PluginStats P_1_Switch input - Switch TEN_PER_SECOND       Count: 270 Avg/min/max 42.77/32/68 usec,snip!
32895 : PluginStats P_1_Switch input - Switch WRITE                Count: 3 Avg/min/max 515.00/514/517 usec,snip!
32905 : PluginStats P_1_Switch input - Switch EVENT_OUT            Count: 10 Avg/min/max 45.50/35/51 usec,snip!
32915 : PluginStats P_1_Switch input - Switch FIFTY_PER_SECOND     Count: 1319 Avg/min/max 19.82/14/45 usec,snip!
32925 : PluginStats P_26_Generic - System Info READ                 Count: 7 Avg/min/max 698.14/552/748 usec,snip!
32935 : PluginStats P_26_Generic - System Info ONCE_A_SECOND        Count: 90 Avg/min/max 7.30/3/23 usec,snip!
32944 : PluginStats P_26_Generic - System Info TEN_PER_SECOND       Count: 810 Avg/min/max 7.13/3/23 usec,snip!
32954 : PluginStats P_26_Generic - System Info EVENT_OUT            Count: 30 Avg/min/max 7.43/2/19 usec,snip!
32964 : PluginStats P_26_Generic - System Info FIFTY_PER_SECOND     Count: 3957 Avg/min/max 6.05/2/23 usec,snip!
32974 : PluginStats P_71_Display - Nextion [TESTING] READ                 Count: 1 Avg/min/max 21051.00/21051/21051 usec,snip!
32985 : PluginStats P_71_Display - Nextion [TESTING] ONCE_A_SECOND        Count: 30 Avg/min/max 8.60/4/23 usec,snip!
32995 : PluginStats P_71_Display - Nextion [TESTING] TEN_PER_SECOND       Count: 270 Avg/min/max 88.16/78/124 usec,snip!
33006 : PluginStats P_71_Display - Nextion [TESTING] WRITE                Count: 3 Avg/min/max 29707.33/29694/29719 usec,snip!
33017 : PluginStats P_71_Display - Nextion [TESTING] EVENT_OUT            Count: 10 Avg/min/max 27.20/26/31 usec,snip!
33027 : PluginStats P_71_Display - Nextion [TESTING] FIFTY_PER_SECOND     Count: 1319 Avg/min/max 4.45/2/31 usec,snip!
33038 : Load File stats: Count: 20 Avg/min/max 22521.10/854/56695 usec,snip!
33044 : Plugin call 50 p/s   stats: Count: 1319 Avg/min/max 1864.93/1486/3664 usec,snip!
33052 : Plugin call 10 p/s   stats: Count: 270 Avg/min/max 1946.29/1615/2321 usec,snip!
33060 : Plugin call 10 p/s U stats: Count: 270 Avg/min/max 3395.21/2899/4695 usec,snip!
33068 : Plugin call  1 p/s   stats: Count: 30 Avg/min/max 64210.20/2101/1756522 usec,snip!
33076 : SensorSendTask()     stats: Count: 11 Avg/min/max 78003.37/24929/131677 usec,snip!
33084 : sendData()           stats: Count: 10 Avg/min/max 63515.10/24490/127186 usec,snip!
33092 : Compute formula      stats: Count: 10 Avg/min/max 139.00/10/1295 usec,snip!
33099 : setNewTimerAt()      stats: Count: 1759 Avg/min/max 187.72/158/283 usec,snip!
33106 : timeDiff()           stats: Count: 1666387 - CPU cycles per call: 8.44,snip!
33114 : Scheduler stats: (called/tasks/max_length/idle%) 527198/1751/12/74.60,snip!
36040 : [120], SYS  : 15152.00
36042 : [116], EVENT: FreeRam#ram=15152.00
36064 : [120], EVENT: FreeRam#ram=15152.00 Processing time:22 milliSeconds
40668 : [120], SW   : State 0.00
40671 : [114], EVENT: WASHER#ac=0.00
40679 : [120], [if 0=1]=false
40681 : [120], else = true
40683 : [109], ACT  : NEXTION1,page0.va_WasherAC.val=0
40717 : [120], NEXTION075 : WRITE, Command is page0.va_WasherAC.val=0
40734 : [120], EVENT: WASHER#ac=0.00 Processing time:63 milliSeconds
40795 : [120], MQTT : /ESPEZ_Laundry/WASHER/ac 0
40834 : [120], NEXTION075 : Cmd Statement Line-1 Sent: page0.j1.val=45
40857 : [120], SYS  : -77.00
40860 : [120], EVENT: RSSI#signal=-77.00
40882 : [120], EVENT: RSSI#signal=-77.00 Processing time:21 milliSeconds

  • Thomas

This is really great, since I tried it a while back and then I could not get a reliable detection whether the output was read or not.

My idea was to temporary disable log to serial when the buffer stays full.

This is really great, since I tried it a while back and then I could not get a reliable detection whether the output was read or not.

@TD-er: I believe the ESP8266's USB serial port still sends the Tx data if there's nothing reading it.

I made a mistake in the log message trimming code. It was just adding _trim!_ at the end of the offending log messages. I've edited my previous post and it has the code corrections.

Now I can see that the problem is much worse than expected. There are long periods of possible code blocks if the log messages are not trimmed. Here's a revised log excerpt (serial log baud rate set to 115.2K):

122828 : [119], EVENT: NEXTION#idx=16.00
122861 : [119], [if 16>=10 and 16<=30]=true
122873 : [119], ACT  : Publish /ESPEZ_Laundry/NEXTION/idx,16
122899 : [119], [if 16>=500]=false
122933 : [119], EVENT: NEXTION#idx=16.00 Processing time:105 milliSeconds
122935 : [60], EVENT: NEXTION#value=0.00
122958 : [119], EVENT: NEXTION#value=0.00 Processing time:23 milliSeconds
123315 : [119], NEXTION075 : Pipe Command Sent: |s,i16,sOff16.00
123335 : LoopStats: shortestLoop: 36 longestLoop: 1924098 avgLoopDuration: 58.37 loopCounterMax: 833333 loo,snip!
123336 : ,snip!
123336 : ,snip!
123337 : ,snip!
123338 : ,snip!
123340 : ,snip!
123341 : ,snip!
123343 : ,snip!
123344 : ,snip!
123346 : ,snip!
123347 : ,snip!
123349 : ,snip!
123350 : ,snip!
123352 : ,snip!
123353 : ,snip!
123355 : ,snip!
123356 : ,snip!
123358 : ,snip!
123359 : ,snip!
123361 : ,snip!
123362 : ,snip!
123364 : ,snip!
123365 : ,snip!
123366 : ,snip!
123368 : ,snip!
123369 : ,snip!
123371 : ,snip!
123372 : ,snip!
123374 : ,snip!
123375 : ,snip!
123377 : ,snip!
123378 : ,snip!
123380 : ,snip!
123381 : ,snip!
123383 : ,snip!
123384 : ,snip!
123394 : [86], Command: publish
131101 : [119], SW   : State 0.00
131104 : [114], EVENT: WASHER#ac=0.00
131115 : [119], [if 0=1]=false
131118 : [119], else = true
131120 : [108], ACT  : NEXTION,page0.va_WasherAC.val=0
131153 : [119], NEXTION075 : WRITE, Command is page0.va_WasherAC.val=0
131171 : [119], EVENT: WASHER#ac=0.00 Processing time:67 milliSeconds
131232 : [119], MQTT : /ESPEZ_Laundry/WASHER/ac 0
132691 : [119], NEXTION075 : code: |s,i26,sOff
132697 : [119], EVENT: NEXTION#idx=26.00
132729 : [119], [if 26>=10 and 26<=30]=true
132741 : [119], ACT  : Publish /ESPEZ_Laundry/NEXTION/idx,26
132769 : [119], [if 26>=500]=false
132802 : [119], EVENT: NEXTION#idx=26.00 Processing time:104 milliSeconds
132803 : [60], EVENT: NEXTION#value=0.00
132827 : [119], EVENT: NEXTION#value=0.00 Processing time:24 milliSeconds
132831 : [86], NEXTION075 : Pipe Command Sent: |s,i26,sOff26.00
141100 : [119], SW   : State 0.00
141103 : [115], EVENT: WASHER#ac=0.00
141115 : [119], [if 0=1]=false
141118 : [119], else = true
141119 : [109], ACT  : NEXTION,page0.va_WasherAC.val=0
141152 : [119], NEXTION075 : WRITE, Command is page0.va_WasherAC.val=0
141172 : [119], EVENT: WASHER#ac=0.00 Processing time:69 milliSeconds
141233 : [119], MQTT : /ESPEZ_Laundry/WASHER/ac 0
141252 : [119], Command: publish
141298 : [119], SYS  : -80.00
141302 : [119], EVENT: RSSI#signal=-80.00
141326 : [119], EVENT: RSSI#signal=-80.00 Processing time:23 milliSeconds
141796 : [119], MQTT : /ESPEZ_Laundry/RSSI/signal -80
142708 : [119], NEXTION075 : code: |s,i16,sOff
142714 : [119], EVENT: NEXTION#idx=16.00
142746 : [119], [if 16>=10 and 16<=30]=true
142757 : [119], ACT  : Publish /ESPEZ_Laundry/NEXTION/idx,16
142786 : [119], [if 16>=500]=false
142818 : [119], EVENT: NEXTION#idx=16.00 Processing time:104 milliSeconds
142819 : [60], EVENT: NEXTION#value=0.00
142843 : [119], EVENT: NEXTION#value=0.00 Processing time:24 milliSeconds
142847 : [87], NEXTION075 : Pipe Command Sent: |s,i16,sOff16.00
151100 : [119], SW   : State 0.00
151103 : [115], EVENT: WASHER#ac=0.00
151114 : [119], [if 0=1]=false
151117 : [119], else = true
151119 : [109], ACT  : NEXTION,page0.va_WasherAC.val=0
151152 : [119], NEXTION075 : WRITE, Command is page0.va_WasherAC.val=0
151171 : [119], EVENT: WASHER#ac=0.00 Processing time:67 milliSeconds
151232 : [119], MQTT : /ESPEZ_Laundry/WASHER/ac 0
151274 : [119], NEXTION075 : Cmd Statement Line-1 Sent: page0.j1.val=45
151283 : [119], Command: publish
152013 : [119], WD   : Uptime 3 ConnectFailures 0 FreeMem 14360
152710 : [119], NEXTION075 : code: |s,i26,sOff
152713 : [108], EVENT: NEXTION#idx=26.00
152745 : [119], [if 26>=10 and 26<=30]=true
152757 : [119], ACT  : Publish /ESPEZ_Laundry/NEXTION/idx,26
152783 : [119], [if 26>=500]=false
152817 : [119], EVENT: NEXTION#idx=26.00 Processing time:103 milliSeconds
152818 : [60], EVENT: NEXTION#value=0.00
152842 : [119], EVENT: NEXTION#value=0.00 Processing time:24 milliSeconds
152846 : [87], NEXTION075 : Pipe Command Sent: |s,i26,sOff26.00
152901 : LoopStats: shortestLoop: 36 longestLoop: 1924098 avgLoopDuration: 55.05 loopCounterMax: 833333 loo,snip!
152902 : ,snip!
152903 : ,snip!
152903 : ,snip!
152905 : ,snip!

I also tested 9600 baud with the trim code removed (reverted back to the original misc.ino). That gave repeating crash/reboots ever few seconds. That's the smoking gun we needed!

Re-installing the trim code allowed 9600 baud to run. But the log messages are severely trimmed. Serial log baud rates under 57.6K baud may prove to be a problem. Plus we really need a sufficiently sized local software circular buffer dedicated to serial log Tx.

Latest status:
I am using your PR with the scheduler patch (Aug 6). From what I have seen it has fixed the Panic crash/reboot issue. Thanks!

Next on the list is a practical fix on the serial log's Tx buffer. That is to say, something more elegant than trimming useful log messages.

Finding and fixing these crash provoking bugs is like peeling an onion. Hopefully there's not many remaining layers to peel.

  • Thomas

@TD-er I know that you found in some experiments that there should be a serial reader for your output.
However, afaik there is no explaination for that behaviour. There is a technique that is called handshaking https://www.codrey.com/embedded-systems/rs232-serial-communication/#What_is_Handshaking but it is not used by the ESP and can merely slow down the transmission. It would not lead to any crashes.

I think @thomastech is on to something. The TX buffer overflow. Of course it makes no sense to write data faster than it can be sent through the wire. This would cause the TX buffer to overflow. Data is lost then.
We should have a strategy for handling this. It could be:

  • delay the system and just wait for the buffer to become available (gives uncertain timing)
  • enlarge the buffer (see #1588) (uses memory and is still not guaranteed to work)
  • trim the messages by half (loss of information)
  • smart trim ( trimm more the fuller the buffer is- e.g. divide free buffer by 3 or so )

Other than that there may also be a bug in the core. There is a new release out: https://github.com/esp8266/Arduino/releases/tag/2.4.2 Maybe try this.

I would opt for option 1 (delay the system) as we do already have the timing checks and could flag chatty pieces of code ;-) Maybe use the last 10 bytes of the buffer to send a warning ?

We already have a log buffer for the web log viewer.
That piece of code should be changed a little to allow to get new lines since the last log timestamp (given as argument by the requester).
Then this buffer can also be placed in front of the serial output.

And I will try to move my 'chatty' piece of statistics code to a JSON web output. That's easier to request :)

The TX buffer overflow. Of course it makes no sense to write data faster than it can be sent through the wire. This would cause the TX buffer to overflow. Data is lost then.

@s0170071 : Fortunately there's never any Tx data loss with the ESP8266. Unfortunately, while the buffer is full it code blocks until the buffer empties out, then execution continues. It's these code blocking periods that provoke a crash.

I would opt for option 1 (delay the system)

I am working on the delay strategy. It would be a good temporary solution. But for fastest ESPEasy performance I think a big serial buffer would be nice. Plus we need to prevent the user from enabling slow serial log baud rates.

Speaking of serial log, I'm finding some issues with the Tools->Advanced->Baud setting. The highest baud that works for me is 115.2K. Setting a higher baud seems to do nothing, it remains at 115.2K. And changing the baud requires a reboot. When the dust settles I'll either open a bug ticket or submit a fix.

We already have a log buffer for the web log viewer. ... Then this buffer can also be placed in front of the serial output.

@TD-er: Very nice.

  • Thomas

@thomastech Just to be clear (you probably already got the idea), my idea is to use the same buffer, not just another instance of that buffer.

@TD-er: That's what I hoped you would do. But I have never been a fan of the web log. For me, it is slow and sometimes unreliable. And if the serial log is synced to the web log output then I fear those issues will affect serial log too.

I implemented the "delay" workaround for the serial log. The only log message that is trimmed is the overly verbose loopstats. All other messages are intact. Works with 9600 baud too, which is a worst case condition for filling the Tx buffer.

But TD-er will make this workaround obsolete, so no need to post it.

For now I'm going to flash the modified log code into both of my NodeMCU's and let them run for a few hours. I'm doubt it's eliminated all sources of the crash/reboots, but I expect an improvement.

  • Thomas

The web log view isn't perfect and has a few technicalities to deal with, the serial port doesn't.
For example, requesting a JSON file containing the log takes between 50 and a few 100 msec, mainly depending on the load of the ESP and the size to output (and CPU-power, network connectivity and presence of virusscanner on the viewing side, the browser)
This means the log on the web-side (spelling with a 'd' is intended) of things can only be updated every now and then, which will lead to lost log lines.
This output is per N lines.

The log output on the serial port can update per character and continue as soon as the buffer has a character free to write to. Well you may want to wait to do a bit more in bursts to be non blocking, but you get the idea.
So I am convinced the same buffer will perform better on the serial output than it does on the web-viewer.

@TD-er : Sounds promising. Thanks for working on this!

BTW, I noticed that the web log excludes some messages that are seen in Serial Log. For example, the _Debug_ setting skips PluginStats and LoopStats. Probably deliberate because they consume a lot of buffer space and flood the log.

My gut feeling is that the serial log would need to have N-lines increased to 35 if heavy hitters like PluginStats are allowed. I saw that line length is 128, which is ideal for the ESP8266's Tx FIFO (also 128 bytes). A couple log messages will need to be altered so they can fit without being post-trimmed.

BTW, when I type messages here I often lose the cursor and have to fly blind. Is there any keyboard tricks to make the cursor come back while I finish typing my post?

  • Thomas

The cursor comes back of you toggle the preview once again.

The log buffer is now working using N lines of M characters (fixed length)
N is depending on the build type (normal/dev/testing/ESP32).
So it is wasting a lot of memory on this, due to its fixed size.

My ideas to improve this buffer:
I will make it either using a circular buffer, or something more dynamic.
Messages will be kept up to 2 or 3 seconds, or mainly limited by some max. amount of memory used and the log-reader (web/serial) will have to send the timestamp of the last read log line. This allows lost requests, which happen sometimes on the web log viewer.
And you can have more log readers, like a second web client connected, or even new output methods like MQTT. There is already a syslog, but that one also cannot keep up at some times. So that one may also benefit from such a new log buffer.

The bursty statistics should not be sent via the log buffer, but rather generated on the fly when queried. The data structure containing all timings is much smaller than the lines themselves.
Output via JSON on the web interface should be much more memory efficient.

@s0170071: Thanks for the lost cursor tip.

@TD-er: You ideas sound amazing! If you are looking for a show of hands, I like circular buffers. Simple and efficient. But it's your baby, so choose what you feel is best.

  • Thomas

Circular buffers are indeed simple, as long as they are used with a fixed object size.
But here we have variable log lines, so looking for new space may involve deleting multiple log lines to make space.
The log buffer object also needs an index to keep track of the start of each log entry and store the timestamp for easy access.
So it is already a bit more complicated than just a read and a write pointer.

I am currently looking into the MessageDelay variable.
This variable does have a lot of impact on the timing of just about anything sent to a controller.
My idea is to strip this variable and place it as a setting in any controller.
This will also involve a small buffer per controller to keep up-to N messages.
So the controller itself has to manage the minimal interval between sending values to the other end of the controller.
Just another use of the scheduler to keep things run smooth and get rid of this horrible delayBackground, which will crash the ESP if it is delaying long enough.
See https://github.com/letscontrolit/ESPEasy/issues/1600
This current MessageDelay value ("Message Interval" in Advanced settings) has effect on all calls to sendData and is also being used on a number of protocols, which also happen to appear in a number of issues reported here and on the forum.
I also suspect this issue to be related to this setting.

The log buffer object also needs an index to keep track of the start of each log entry and store the timestamp for easy access.
So it is already a bit more complicated than just a read and a write pointer.

I was thinking that you would store the fully annotated log message in the circular buffer. That is, store the entire line, like this:

    String msg = String(millis());
    msg += F(" : ");
    msg += line;         // Log Message String
    msg += F("\n");

There's no need to keep track of anything special. All the stored log data in the circular buffer will be formatted and ready for print.

I am currently looking into the MessageDelay variable.

The MessageDelay has come up in your other discussions. More peeling the onion, more fixes to do.

  • Thomas

Stability has improved with the temporary serial log fix. So that is a relief.

Over the past 18 hours I've experienced two exception (28) crashes; I don't recall seeing those before. But no more yield panics or watchdog related reboots. Should we close this "Panic" issue ticket and start a new one for the serial log induced crashes?

Regarding exception 28, the docs say it is LoadProhibitedCause: _A load referenced a page mapped with an attribute that does not permit loads_. So it's a memory related issue, such as invalid pointer.

I installed ESPExceptionDecoder (https://github.com/me-no-dev/EspExceptionDecoder). After a lot of frustration I got it to run. But the decoded stack trace it provides seems to be nonsense. Other users have experienced this too. So back to square one, looking for suggestions on how to debug it.

  • Thomas

If you self compiled the source the decoder usually works fine.

@s0170071: I have self compiled the code.

The file names and line numbers that it gives are nonsense. I found this:
https://github.com/jcmvbkbc/binutils-gdb-xtensa/issues/5

It also references file names to plugins I don't use (example, Plugin_048). Even if the information is correct, finding the reason for the exception in the stack trace would be like finding a needle in a haystack.

EDIT: It seems that this problem was fixed in EspExceptionDecoder version 1.1.0. But I can't get that version to run, so like many others I downgraded to version 1.0.6. The old version is the reason for the nonsense stack trace.

Just to give a peek at my haystack:

0x4010020c: _umm_free at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266\umm_malloc/umm_malloc.c line 1291
0x4010053d: _umm_realloc at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266\umm_malloc/umm_malloc.c line 1495
:  (inlined by) realloc at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266\umm_malloc/umm_malloc.c line 1715
0x4021b86b: parseTemplate(String&, unsigned char) at C:\Users\Tom\AppData\Local\Temp\arduino_build_934241\sketch\Commands/Tasks.h line 49
0x4025f977: String::changeBuffer(unsigned int) at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266/WString.cpp line 714
0x4025f9c3: String::reserve(unsigned int) at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266/WString.cpp line 714
0x4025f9f5: String::copy(char const*, unsigned int) at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266/WString.cpp line 714
0x4025fa42: String::String(char const*) at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266/WString.cpp line 714
0x4020cad2: parseString(String const&, unsigned char, bool, bool) at C:\Users\Tom\AppData\Local\Temp\arduino_build_934241\sketch\Commands/Tasks.h line 49
0x4010053d: _umm_realloc at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266\umm_malloc/umm_malloc.c line 1495
:  (inlined by) realloc at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266\umm_malloc/umm_malloc.c line 1715
0x4020cb3e: parseString(String const&, unsigned char) at C:\Users\Tom\AppData\Local\Temp\arduino_build_934241\sketch\Commands/Tasks.h line 49
0x40225a62: Plugin_048(unsigned char, EventStruct*, String&) at C:\Users\Tom\AppData\Local\Temp\arduino_build_934241\sketch\Commands/Tasks.h line 49
0x4010020c: _umm_free at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266\umm_malloc/umm_malloc.c line 1291
0x40100690: free at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266\umm_malloc/umm_malloc.c line 1739
0x4025f91c: String::~String() at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266/WString.cpp line 714
0x40236e44: Plugin_042(unsigned char, EventStruct*, String&) at C:\Users\Tom\AppData\Local\Temp\arduino_build_934241\sketch\Commands/Tasks.h line 49
0x40100600: _umm_realloc at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266\umm_malloc/umm_malloc.c line 1595
:  (inlined by) realloc at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266\umm_malloc/umm_malloc.c line 1715
0x402a1160: sleep_reset_analog_rtcreg_8266 at ?? line ?
0x4025f960: String::changeBuffer(unsigned int) at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266/WString.cpp line 714
0x4025f977: String::changeBuffer(unsigned int) at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266/WString.cpp line 714
0x4029f2d4: sleep_reset_analog_rtcreg_8266 at ?? line ?
0x4010020c: _umm_free at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266\umm_malloc/umm_malloc.c line 1291
0x40100690: free at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266\umm_malloc/umm_malloc.c line 1739
0x4025f91c: String::~String() at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266/WString.cpp line 714
0x402178f4: PluginCall(unsigned char, EventStruct*, String&) at C:\Users\Tom\AppData\Local\Temp\arduino_build_934241\sketch\Commands/Tasks.h line 49
0x4025f9c3: String::reserve(unsigned int) at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266/WString.cpp line 714
0x4025f9f5: String::copy(char const*, unsigned int) at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266/WString.cpp line 714
0x40242b34: processMatchedRule(String&, String&, String&, String&, bool&, bool&, bool&, bool&, bool&, bool&, bool&) at C:\Users\Tom\AppData\Local\Temp\arduino_build_934241\sketch\Commands/Tasks.h line 49
0x40206c00: spiffs_phys_rd at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266\spiffs/spiffs_cache.c line 131
0x40100690: free at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266\umm_malloc/umm_malloc.c line 1739
0x40242e10: parseCompleteNonCommentLine(String&, String&, String&, bool&, bool&, bool&, bool&, bool&, bool&, bool&) at C:\Users\Tom\AppData\Local\Temp\arduino_build_934241\sketch\Commands/Tasks.h line 49
0x4025f9c3: String::reserve(unsigned int) at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266/WString.cpp line 714
0x4010020c: _umm_free at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266\umm_malloc/umm_malloc.c line 1291
0x4024302d: rulesProcessingFile(String, String&) at C:\Users\Tom\AppData\Local\Temp\arduino_build_934241\sketch\Commands/Tasks.h line 49
0x40107b10: sntp_get_current_timestamp at ?? line ?
0x4025faf8: String::operator=(String const&) at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266/WString.cpp line 714
0x402431af: rulesProcessing(String&) at C:\Users\Tom\AppData\Local\Temp\arduino_build_934241\sketch\Commands/Tasks.h line 49
0x4025ffa0: String::concat(__FlashStringHelper const*) at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266/WString.cpp line 714
0x40243d99: createRuleEvents(unsigned char) at C:\Users\Tom\AppData\Local\Temp\arduino_build_934241\sketch\Commands/Tasks.h line 49
0x4010020c: _umm_free at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266\umm_malloc/umm_malloc.c line 1291
0x40100690: free at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266\umm_malloc/umm_malloc.c line 1739
0x40245675: sendData(EventStruct*) at C:\Users\Tom\AppData\Local\Temp\arduino_build_934241\sketch\Commands/Tasks.h line 49
0x40265d14: strtod at /Users/igrokhotkov/e/newlib-xtensa/xtensa-lx106-elf/newlib/libc/stdlib/../../../.././newlib/libc/stdlib/strtod.c line 1261
0x402637a4: atof at /Users/igrokhotkov/e/newlib-xtensa/xtensa-lx106-elf/newlib/libc/stdlib/../../../.././newlib/libc/stdlib/atof.c line 72
0x4024b433: Plugin_075(unsigned char, EventStruct*, String&) at D:\Lang\Arduino\Projects\ESPEasy\NodeMCU\Laundry_Buddy\Source\src\ESPEasySchedulerTest\ESPEasy/_P075_Nextion.ino line 571
0x4010020c: _umm_free at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266\umm_malloc/umm_malloc.c line 1291
0x40100690: free at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266\umm_malloc/umm_malloc.c line 1739
0x4010020c: _umm_free at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266\umm_malloc/umm_malloc.c line 1291
0x40104ec9: ets_timer_disarm at ?? line ?
0x4025f91c: String::~String() at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266/WString.cpp line 714
0x402868a6: umm_info at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266\umm_malloc/umm_malloc.c line 1098
0x4010020c: _umm_free at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266\umm_malloc/umm_malloc.c line 1291
0x40100690: free at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266\umm_malloc/umm_malloc.c line 1739
0x40214fca: timer_id_couple::operator(timer_id_couple const&) at C:\Users\Tom\AppData\Local\Temp\arduino_build_934241\sketch\Commands/Tasks.h line 49
:  (inlined by) std::list   >::merge(std::list   >&&) at c:\users\tom\appdata\local\arduino15\packages\esp8266\tools\xtensa-lx106-elf-gcc\1.20.0-26-gb404fb9-2\xtensa-lx106-elf\include\c++\4.8.2\bits/list.tcc line 304
0x4021508c: std::list   >::sort() at C:\Users\Tom\AppData\Local\Temp\arduino_build_934241\sketch\Commands/Tasks.h line 49
0x401071ee: __digitalRead at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266/core_esp8266_wiring_digital.c line 95
0x40210545: Plugin_001_read_switch_state(EventStruct*) at C:\Users\Tom\AppData\Local\Temp\arduino_build_934241\sketch\Commands/Tasks.h line 49
0x40245fb2: Plugin_001(unsigned char, EventStruct*, String&) at D:\Lang\Arduino\Projects\ESPEasy\NodeMCU\Laundry_Buddy\Source\src\ESPEasySchedulerTest\ESPEasy/_P001_Switch.ino line 175
0x4010020c: _umm_free at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266\umm_malloc/umm_malloc.c line 1291
0x402868cc: umm_free_heap_size at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266\umm_malloc/umm_malloc.c line 1745
0x40207b90: xPortGetFreeHeapSize at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266/heap.c line 162
0x402902b3: system_get_free_heap_size at ?? line ?
0x4025e14c: EspClass::getFreeHeap() at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266/Esp.cpp line 418
0x4025faf8: String::operator=(String const&) at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266/WString.cpp line 714
0x4020b7c0: RamTracker::registerRamState(String&) at C:\Users\Tom\AppData\Local\Temp\arduino_build_934241\sketch\Commands/Tasks.h line 49
0x4010020c: _umm_free at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266\umm_malloc/umm_malloc.c line 1291
0x40100690: free at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266\umm_malloc/umm_malloc.c line 1739
0x4025f91c: String::~String() at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266/WString.cpp line 714
0x4029b400: sleep_reset_analog_rtcreg_8266 at ?? line ?
0x4020bd42: checkRAM(__FlashStringHelper const*, int) at C:\Users\Tom\AppData\Local\Temp\arduino_build_934241\sketch\Commands/Tasks.h line 49
0x40217a4d: PluginCall(unsigned char, EventStruct*, String&) at C:\Users\Tom\AppData\Local\Temp\arduino_build_934241\sketch\Commands/Tasks.h line 49
0x4020d7e7: usecPassedSince(unsigned long) at C:\Users\Tom\AppData\Local\Temp\arduino_build_934241\sketch\Commands/Tasks.h line 49
0x402156c0: setNewTimerAt(unsigned long, unsigned long) at C:\Users\Tom\AppData\Local\Temp\arduino_build_934241\sketch\Commands/Tasks.h line 49
0x40260a30: optimistic_yield at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266/core_esp8266_main.cpp line 57
0x402437a1: run10TimesPerSecond() at C:\Users\Tom\AppData\Local\Temp\arduino_build_934241\sketch\Commands/Tasks.h line 49
0x40215789: setIntervalTimer(unsigned long, unsigned long) at C:\Users\Tom\AppData\Local\Temp\arduino_build_934241\sketch\Commands/Tasks.h line 49
0x4025089d: process_interval_timer(unsigned long, unsigned long) at D:\Lang\Arduino\Projects\ESPEasy\NodeMCU\Laundry_Buddy\Source\src\ESPEasySchedulerTest\ESPEasy/Scheduler.ino line 105
0x4025098d: handle_schedule() at D:\Lang\Arduino\Projects\ESPEasy\NodeMCU\Laundry_Buddy\Source\src\ESPEasySchedulerTest\ESPEasy/Scheduler.ino line 53
0x40250ea1: loop at D:\Lang\Arduino\Projects\ESPEasy\NodeMCU\Laundry_Buddy\Source\src\ESPEasySchedulerTest\ESPEasy/ESPEasy.ino line 481
0x40260995: esp_schedule at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266/core_esp8266_main.cpp line 57
0x402609c0: loop_wrapper at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266/core_esp8266_main.cpp line 57
0x40100710: cont_norm at C:\Users\Tom\AppData\Local\Arduino15\packages\esp8266\hardware\esp8266\2.4.1\cores\esp8266/cont.S line 109

My inexperience with this tool is likely the biggest show stopper. Unfortunately I can't find any online information that helps educate me on how to investigate the bug in these stack details.

  • Thomas

Such errors may very well be caused by some memory allocation that failed.
For example, when I try to allocate some memory and assign it to a pointer, it may fail to do so if there isn't enough free memory.
When the result of this allocation is not checked (look in the code, it is almost never checked), crashes like these may occur when such an invalid pointer is being used.

That's just one example of what may happen to get exception 28.
It can also be some pointer to an element in a string or vector. But when that container (string/vector) is increased, the pointer may become invalid, since those containers usually allocate a larger block, copy all data over to the new block and delete the old block. Any pointer to that old block has become invalid. It may still point to some of the old data, so it may seem to work OK, but as soon as another allocation is done, that block may be re-used.
If you were just trying to copy a char array, that copy may try to continue until a '0' is found. But with the new data it may be that '0' takes a while and in the meantime you're writing beyond the allocated space of the target memory structure.

These exceptions are the worst, since they are:

  • Hard to reproduce
  • Hard to get a proper stack trace (the bug may even overwrite code, so you'll end up somewhere else)
  • Seem to occur at other (unrelated) locations in the code.

But judging your stack trace, I guess it is a combination of the two I mention above and caused by something in the parseTemplate function.

@TD-er: Thanks for your insight into my stack trace. Seems like with each incremental success, something else comes along that creates a detour from achieving stable operation.

  • Thomas

My ESPEasy device achieved a major milestone; 40 hours of continuous operation (and still going) without a reboot.

But to earn this success I had to build a special ESPEasy version that only has the plugins I need. All unused plugins were removed from the define_plugin_sets.h file.

The list of plugins used in my build are:

    #define USES_P001   // Switch
    #define USES_P026   // SysInfo
    #define USES_P033   // Dummy
    #define USES_P075   // Nextion
    #define USES_P045   // MPU6050

The FreeRam memory is about 20KB. Controller is OpenHab MQTT. Message Interval is 500mS. Rules and NPT are enabled. Serial log is disabled (but has my "delay" patch).

This solution is drastic and not practical for everyone. But until the exception 28 issue is resolved I think it's the best temporary workaround for my installation.

  • Thomas

Since this is about MQTT and WDT resets...

Yesterday I added a delay(1) to the readByte part of MQTT client PubSubClient.
Can you please test if this is now still an issue?

I'll test it for a few days (or until reboot) and report back.

BTW, some feedback on testing recent DEV releases:
mega-20181015: reboots don't occur very often.
mega-20181011: Lots of reboots.
mega-20181004: Has been the best for me so far (one device went a week without reboots), but 20181015 is a close second and may surpass it.

  • Thomas

Feedback on _ESP_Easy_mega-20181023_dev_ESP8266_4096.bin_
Today one of my devices rebooted due to Hardware Watchdog Reset. Details:
https://github.com/letscontrolit/ESPEasy/issues/1656#issuecomment-433089325

  • Thomas

Is this one now fixed?
Please test with latest build, which will be made tonight. (20181128)

No problem, I'll test the 20181128 pre-compiled build when it becomes available. I've tested most of the previous builds and all experienced one or more reboots each day. I'm currently running 20181124 on two nodeMCU's and one has had 8 resets and the other 12 resets so far.

The web GUI reports the random manual resets caused by software or hardware watchdog (varies). But the GUI's reset reason has never specifically stated a panic yield reset. So either this specific reset reason is not being reported in the GUI or the panic yield issue is fixed.

  • Thomas

It might be the panic yield is back and if so, please report that also.
Then I know what to add ;) (very simple fix if it is needed)

I can do that if you add reset cause "panic yield" to the web GUI System Info page. That is to say, I don't think it specifically reports it with the current code.

I don't want to use serial log for monitoring panic yield. That is because serial log usage introduces more resets and skews the testing.

  • Thomas

Good point.
A crash as result of yield panic is hard to recognize as reboot cause.
At least it differs from the Watchdog resets, but I guess it is just an "exception" in the reboot description.

Upcoming build will have a lot more free stack, so let's hope things work out even better now :)

Looking forward to checking it out.

  • Thomas

For the last few days I've been testing ESP_Easy_mega-20181207_test_ESP8266_4096.bin on two NodeMCU's. I still experience reboots.

I'm using the OpenHab MQTT controller. Recently I reduced the Controller Message interval (Advanced Settings) to 75mS (was 100mS) and so far I've seen less reboots. If this truly did cause an improvement then it is suggesting that the MQTT controller is involved in my reboot issue. Perhaps there's some code blocking during MQTT communication that is hanging the core too long.

I've setup my rules so that I get an email when a reboot occurs. I would like to have the email include the reboot reason. I've looked at the system variable list and did not find one for the reboot cause. Is there a hidden Easter egg variable that provides the reboot error code? If not, no worries.

  • Thomas

I have continued to test new releases. Still experiencing random reboots every day. I no longer log the reboot messages, so I cannot say if they are related specifically to the "yield panic" exception.

I recommend closing this old issue. There are now several open tickets on ESPEasy's random reboot problems. No need for multiple open cases on it.

  • Thomas

It looks like you're also following other issues, but I would suggest you also test some of the newly added WiFi related experimental settings (Tools => Advanced, at the bottom)

I previously tried the Tools=Advanced->Force WiFi B/G and did not observe any improvement to the reboot issues. Any other settings you think I should try?

  • Thomas

The Gratuitous ARP.
It does send ARP packets to announce its own MAC/ARP combination to make sure the switch/AP does know what MAC can be reached via what port on the switch.
One of the (many) issues is that the ESP is sometimes missing the ARP broadcasts and a typical time for a switch to start forgetting MAC addresses is 30 seconds.

By sending those "answers to questions nobody asked for" you make it less likely the ESP cannot be reached anymore.

_Gratuitous ARP_ is currently enabled (default setting). Do you mean you want me to disable it?

BTW, I read the comments in another issue ticket about the possibility of voltage drop during WiFi startup. So a few minutes ago I reworked one of my NodeMCU's as follows:
Added 10uF tantalum to ESP8266MODH module's Vcc Pin-8, short leads.
Added heavy jumper wire from 3.3V linear VReg output pin to Vcc Pin-8 on ESP8266MODH module.
Added 10K Pullup on RST (recommended by a random ESP8266 user).

I will be surprised if these mods help, but it's only 10 minutes work. So worth checking out.

  • Thomas

Nope, leave it on.
And about the voltage drop, it is indeed quite a strong current peak the ESP uses during RF calibration at startup.
The peak can be as high as 500 mA (very short peaks) and the RF calibration is relating its parameters to the seen "3v3" during this calibration. So a dip in voltage can result in incorrect RF calibration.

I've also seen some strange issues with missing packets as soon as you enable the "eco mode" I added recently. This eco mode is nothing more than calling delay() during the idle time of the scheduler.

Ok, thanks for the advice. I'll continue to monitor the problem. If any miracles occur I will let you know.

  • Thomas

Miracles probably deserve a new issue, don't you think?

Miracles on the reboot issue would deserve a celebration with beverages and party hats.

Unfortunately my NodeMCU with the latest code and hardware mods rebooted a few minutes ago. So the miracle didn't happen yet.

  • Thomas

Hmm too bad :(
Well at least that's one other thing we can move down on the list of most likely causes.

Is this still an issue?
We also now have the ADC plugin no longer reading the internal ADC value while WiFi connection is being setup.

Reboots have been an ongoing issue. But after updating one of my NodeMCU's with ESP_Easy_mega-20191003_test_core_260_sdk222_alpha_ESP8266_4M1M.b last week things are looking much better. No reboots on this test subject in 6 days! A new record for this device.

I'd suggest closing this issue since it's over a year old.

  • Thomas

Great and maybe you can already order the "beverages and party hats." ;)

Great and maybe you can already order the "beverages and party hats." ;)

I was about to do that, but I discovered that the new release has broken the Nextion plugin (sends NULLS instead of idx data). So I'm back to the older ESP_Easy_mega-20190830_test_core_260_sdk3_alpha_ESP8266_4M.bin.

  • Thomas

Can you make an issue for that? Then I can have a look. (a bit more descriptive than this :) )

Creating an issue ticket will need to wait for some spare time (the required list of info takes a lot of time to prepare).

But here's a head start. I found that the problem is in the rules parsing. I have rule that publishes the Nextion's IDX data to my Openhab2 broker. When it sends this:
Publish /%sysname%/NEXTION/idx,[NEXTION#idx]
It sends it twice, first with idx NULL data, then ~60mS later is sends it again with valid idx data. Just to be clear, this rule should only publish the data once, not twice.

My OpenHab log shows this:
_2019-10-11 08:17:23.396 [WARN ] [b.core.events.EventPublisherDelegate] - given new state is NULL, couldn't post update for 'NextionIDX'_
_2019-10-11 08:17:23.462 [vent.ItemStateChangedEvent] - NextionIDX changed from 17 to 26_

The problem is averted if I do this:
let,10,[NEXTION#idx]
Publish /%sysname%/NEXTION/idx,[VAR#10]

The previous release (ESP_Easy_mega-20190928_test_core_260_sdk222_alpha_ESP8266_4M1M) does not have this problem. The change log on the new release mentions several changes to the rules parser. Seems to me those changes have broken something.

  • Thomas

I created a new issue about it.
See #2657
As soon as you may have new info. please let me know.

Was this page helpful?
0 / 5 - 0 ratings