ESPEasy: Feature request: show stack size on main page

Created on 29 Sep 2018 · 25 Comments · Source: letscontrolit/ESPEasy

It seems very likely that the cont stack is the root of a lot of stability issues.
I put a printf in the main loop and did some testing:

After reset, rules disabled: 1600 bytes free.
After reset, rules enabled but empty: 1600 bytes free.
After reset, rules enabled, some rules that do not fire: 1312-1296 bytes free.
------- using this config, open the Notifications tab: 1188 bytes free.
------- hit Edit: 1136 bytes free.
------- select Email: 624 bytes free.

If you want to get lower than that, use self-triggering rules. They drive the stack to 0. Not sure why.
The stack never recovers.

I tracked the memory loss to the SPIFFS write commands. Writing single bytes in a loop is worse than writing the whole buffer in one call (f.write(ptr, size)).

Once you're low enough on stack, the ESP throws all sorts of exceptions pointing to various locations in the code. I've seen kernel panic outputs too. I would suggest to:

  1. Find out what SPIFFS is doing with the stack. Is it a bug? Is there a workaround?
  2. Monitor the stack and send a message when it drops below 500 bytes. 500 bytes is just enough to leave a hint in a SPIFFS file and show it on the next occasion.
  3. In general: give the stack the same amount of attention as the free heap size.
  4. Find out why self-triggering rules ruin the stack totally. Are there maybe nested SPIFFS accesses?
  5. Show the stack size next to the free memory on the main page and include it in the checkRAM routine.
Stability Bug

All 25 comments

What command do you use to determine the stack? Only ESP.getFreeContStack()?
Just as a reference, a link to @devyte's explanation: https://github.com/esp8266/Arduino/issues/5148#issuecomment-424329183

Yes, that is what I use.

> The stack never recovers.

Note that this stack function returns the stack high water mark, that is, the deepest point the stack has ever reached. Therefore this value should only 'recover' after a reboot. The only thing you know about the current stack usage is that it is less than or equal to that mark. If you want to know the current stack usage, make a function call and look at the address of a local variable in that function.
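
The local-variable trick can be sketched roughly as follows. This is a host-side illustration, not ESPEasy code: the names mark_stack_base, stack_used_since_base and probe_with_buffer are made up for this example, and __attribute__((noinline)) assumes a GCC-compatible compiler.

```cpp
#include <cstddef>
#include <cstdint>

// Reference address, captured near the top of the call chain
// (for example at the start of loop()). Hypothetical helper names throughout.
static std::uintptr_t g_stack_base = 0;

// Remember the address of a local variable as the baseline.
__attribute__((noinline)) void mark_stack_base() {
  volatile char marker = 0;
  g_stack_base = reinterpret_cast<std::uintptr_t>(&marker);
}

// Distance in bytes between the baseline and a local variable of this call.
// The absolute difference is used so the growth direction does not matter.
__attribute__((noinline)) std::size_t stack_used_since_base() {
  volatile char marker = 0;
  std::uintptr_t here = reinterpret_cast<std::uintptr_t>(&marker);
  return here > g_stack_base ? here - g_stack_base : g_stack_base - here;
}

// A function that keeps a 512-byte buffer on the stack while it measures.
__attribute__((noinline)) std::size_t probe_with_buffer() {
  volatile char buffer[512];
  buffer[0] = 1;
  std::size_t used = stack_used_since_base();
  buffer[511] = 1;  // touch the buffer after the call so it stays live
  return used;
}
```

Calling mark_stack_base() once and then comparing stack_used_since_base() with probe_with_buffer() should show the second value larger by roughly the size of the buffer, and the reading drops again as soon as the function returns, unlike the high water mark.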

@sakinit good point. What is the address range that the stack may occupy?

We could use a call in the loop function as a baseline for starters?
That would still allow us to find jumps while in function calls.
I suspect the rules use a lot of the stack.

For the ESP8266 I use something like:

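#ifdef ESP8266
extern "C" {
  #include <user_interface.h>
  #include <cont.h>
  extern cont_t* g_pcont;  // continuation context holding the sketch stack
}
#endif

void show_stack()
{
#ifdef ESP8266
  // Free bytes at the high water mark (scans the stack on every call)
  int stack_size = cont_get_free_stack(g_pcont);
  char *head = (char *) g_pcont->stack;           // lowest stack address
  char *stack_end = (char *) g_pcont->stack_end;  // highest stack address
  char *pstack_curr = (char *) &stack_size;       // approximate current stack position

  if (cont_check(g_pcont) != 0) {
    // stack overflow
  }
  // etc.
#endif
}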

The g_pcont variable is defined in core_esp8266_main.cpp.
In cont.h: #define CONT_STACKSIZE 4096
In cont_util.c the code is self-explanatory.
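
For readers without the core sources at hand, here is a minimal host-side sketch of the idea behind a painted stack and its high water mark. It is a simulation under assumptions, not the code from cont_util.c: FakeCont, the fake_* functions and the pattern value are stand-ins chosen for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed sizes and fill pattern; stand-ins for the real core definitions.
constexpr std::size_t kStackWords = 4096 / 4;
constexpr std::uint32_t kPattern = 0xfeefeffe;

struct FakeCont {
  std::uint32_t guard_low;           // sits just below the stack area
  std::uint32_t stack[kStackWords];  // usage grows from the end towards index 0
  std::uint32_t guard_high;          // sits just above the stack area
};

// Paint the whole stack area and both guard words with the pattern.
void fake_cont_init(FakeCont &c) {
  c.guard_low = kPattern;
  c.guard_high = kPattern;
  for (std::size_t i = 0; i < kStackWords; ++i) c.stack[i] = kPattern;
}

// Pretend the program used `bytes` of stack by overwriting the top of the area.
void fake_use_stack(FakeCont &c, std::size_t bytes) {
  std::size_t words = (bytes + 3) / 4;
  if (words > kStackWords) words = kStackWords;
  for (std::size_t i = 0; i < words; ++i) c.stack[kStackWords - 1 - i] = 0;
}

// Count still-painted words from the low end: bytes that were never used.
std::size_t fake_free_stack(const FakeCont &c) {
  std::size_t free_words = 0;
  while (free_words < kStackWords && c.stack[free_words] == kPattern) ++free_words;
  return free_words * 4;
}

// Non-zero when a guard word was overwritten, which suggests an overflow.
int fake_cont_check(const FakeCont &c) {
  return (c.guard_low != kPattern || c.guard_high != kPattern) ? 1 : 0;
}
```

In this model a later, shallower use of the stack leaves the reported free size unchanged, because the deeper words have already lost their paint. That is why a value obtained this way only goes down until the next reboot.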

Where do you suggest calling all these functions?
For example, inside some function you would expect to use a lot of stack?
Or just as an extra in the checkRAM function?

I used the second option. I would not use cont_get_free_stack in a checkRAM function, as it searches for the high water mark on every call. However, comparing the current position with head to see how much stack is still available is lightweight.
I'm not familiar enough with the code to know whether the checkRAM calls sit at the points that also use the most stack space. The two do not necessarily coincide.
The function cont_get_free_stack might be called and displayed in runEach30Seconds, like the RAM. However, by then it might already be too late, and it does not tell much more than an outright crash with a panic. What is the additional gain? The stack is not about memory leaks that one has to keep tracking, like the heap, but much more a local issue.
In recursive calls I would opt for stack monitoring combined with bailing out before the stack runs out.
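
Bailing out of a recursion on a stack budget could look roughly like this. It is a host-side sketch under assumptions: recurse_with_budget and stack_distance are hypothetical names, the 64-byte scratch buffer is arbitrary, and __attribute__((noinline)) assumes a GCC-compatible compiler.

```cpp
#include <cstddef>
#include <cstdint>

// Distance in bytes between two stack addresses, independent of growth direction.
static std::size_t stack_distance(const volatile char *a, const volatile char *b) {
  auto x = reinterpret_cast<std::uintptr_t>(a);
  auto y = reinterpret_cast<std::uintptr_t>(b);
  return x > y ? x - y : y - x;
}

// Hypothetical recursive worker: every level keeps a 64-byte scratch buffer.
// `base` is the address of a local variable in the outermost caller.
// Returns the depth at which the recursion stopped because the budget ran out.
__attribute__((noinline))
int recurse_with_budget(const volatile char *base, std::size_t budget, int depth) {
  volatile char scratch[64];
  scratch[0] = static_cast<char>(depth);
  if (stack_distance(base, scratch) + sizeof(scratch) > budget) {
    return depth;  // bail out before the stack runs out
  }
  int reached = recurse_with_budget(base, budget, depth + 1);
  scratch[1] = static_cast<char>(reached);  // keep this frame live across the call
  return reached;
}
```

The check costs only a pointer comparison per level, so it can stay in production code. A larger budget simply allows a deeper recursion before the function returns early.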

Looking at the stack high water mark in the loop function might be interesting during code development.

... and before SPIFFS access. Watermark or not, the SPIFFS operations caused a 500-byte change of that mark. I think it's fair to assume that they require these 500 bytes each time.

So we have to make sure the stack has 500 bytes available when reading or writing flash.

For inspiration, see this issue and code changes:
https://github.com/esp8266/Arduino/issues/2557

Edit:
Candidates to move to heap allocation:

  ESP8266WebServer WebServer(80);
  ESP8266HTTPUpdateServer httpUpdater(true);
  WiFiClient mqtt;
  PubSubClient MQTTclient(mqtt);

Hmm. Maybe use SPIFFS with new()?

And the structs given to the read/write routines, I guess they are also created on the stack.

For example, this one caused crashes on the ESP32 when saving:
char P036_deviceTemplate[P36_Nlines][P36_Nchars];

Write takes a pointer to the buffer. I don't think the contents of the buffer are copied to the stack...

> Write takes a pointer to the buffer. I don't think the contents of the buffer are copied to the stack...

But where the call is made, the object given may be created on the stack.

If we know the address range of the heap or stack, I could add a check to the read/write routines to show which calls are made with an object allocated on the stack.

We do know it. For the ESP8266 stack, look at g_pcont: the stack spans from head to stack_end, starting at stack_end and growing down towards head, see https://github.com/letscontrolit/ESPEasy/issues/1824#issuecomment-425668252.
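
A check like the one proposed for the read/write routines might be sketched as below. StackRange and overlaps_stack are hypothetical names. On the ESP8266 the bounds would presumably be taken from g_pcont->stack and g_pcont->stack_end, which is an assumption here and not tested ESPEasy code.

```cpp
#include <cstddef>
#include <cstdint>

// Half-open address range [low, high) occupied by a stack.
struct StackRange {
  std::uintptr_t low;   // corresponds to "head" in the comment above
  std::uintptr_t high;  // corresponds to "stack_end"
};

// True when an object of `size` bytes at `ptr` lies at least partly inside the range.
bool overlaps_stack(const StackRange &r, const void *ptr, std::size_t size) {
  auto begin = reinterpret_cast<std::uintptr_t>(ptr);
  std::uintptr_t end = begin + size;
  return begin < r.high && end > r.low;
}
```

A read/write wrapper could call overlaps_stack with the buffer pointer and size it receives and log the caller when the result is true. Globals and heap objects fall outside the range and would pass silently.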

I added the stats as requested in this issue (see PR #1829). Since it is now clearer where the main issues might be, I will look into the code suggested by @sakinit later this evening, to see if we can also add stack overflow detection and a check for large stack allocations when calling read/write operations on SPIFFS.

With respect to https://github.com/letscontrolit/ESPEasy/issues/1824#issuecomment-425678272, note that these variables are global variables, so they are neither on the heap nor on the stack but in global RAM.

> And the structs given to the read/write routines, I guess they are also created on the stack.
>
> For example, this one caused crashes on the ESP32 when saving:
> char P036_deviceTemplate[P36_Nlines][P36_Nchars];

Saving on the ESP8266 takes about 1.3 kB of stack when I test it with my SPIFFS configuration.
The usage is triggered at 256 bytes (page size?) and when closing the file. I expect that is when the cache is full and the page has to be saved. This is much more than I expected, as SPIFFS_COPY_BUFFER_STACK is defined as 64 in the config.
In the underlying spiffs_hal_write there is an optimistic_yield, which lets other tasks do their jobs (and also use the stack). Checks show that in my current test environment about 600 bytes are used during the yield.
Finally, EspClass::flashWrite (calling spi_flash_write) also uses about 600 bytes of stack.

> when calling read/write operations to the SPIFFS.

With the method I used, tracking into SPIFFS is only possible when you hack into the library. I restored the high water marks on the stack before calling SPIFFS to get an idea of its stack usage.

It has also been added for the ESP32, but I guess for the ESP32 it only makes sense to output a single value when the RTOS flag is disabled.
When enabled, it will show the value for the stack it is being called from, so then it should report for all tasks, I guess.

To show the high watermarks on the ESP32 I use vTaskList.

@sakinit I also saw that one yesterday in my quest for a simple function to get the stack usage (or free stack), but I didn't find one.
I will experiment with it, but I'm afraid it will cause other issues, since it disables interrupts.

It is now shown on the main page and has already proven very successful.
Implementing it for the ESP32 is for later.
I will now close this issue.
