Esp-idf: Memory leak when using sockets APIs and unrelated threads.

Created on 10 Dec 2016  路  8Comments  路  Source: espressif/esp-idf

This report describes a situation I encountered when working with my ESP32 project that has taken me nearly 8 working hours to get to this level of documentation. It was a strange problem and had many potential distractions that could have contributed but I believe I have it stripped down to its smallest description.

This will be wordy report in order to fully describe and provide an understanding of what I am experiencing and have done so far to capture.

At the core, we have a memory leak. It originally manifested in my primary project when I ran a soak test on the app and it crashed due to being out of memory. I then instrumented my project with logs of the free heap space over time and, sure enough, the amount of free heap was diminishing. This then took me down the path of attempting to find where within my code the heap was being lost ... and after hours and hours of examination ... I am at a belief that the heap is being leaked in ESP32 ESP-IDF code.

Let us start at the beginning ...

I have a macro which is called "HEAP_CHANGE_START()". When called, it obtains and remembers the current heap value. There is a second macro called "HEAP_CHANGE_END()". When it is called, it compares the current heap with the last one remembered. If they are different, there has been a heap leakage and we log the changed value. If the heap hasn't changed (which is the most common case) then we increment a counter that says how many times we ran the test and the heap didn't change.

I use these in a function that looks as follows:

#include "esp32_memory.h"
#include <esp_log.h>
#include "sdkconfig.h"

static char tag[] = "heap_test";

void heap_test(void *ignore) {
    ESP_LOGD(tag, "Starting heap test");
    while(1) {
        HEAP_CHANGE_START();
        int i;
        int j=0;
        for (i=0; i<10000; i++) {
            j=j+1;
        }
        HEAP_CHANGE_END("end_item");
    }
}

This is a function that loops forever. In each iteration of the loop, we do 10000 numeric operations, just to burn time. Surrounding the 10000 operations, we examine the HEAP values to see if there has been a change between the start and the end.

When I run this logic, I get output that looks as follows:

D (760206) memory: end_item: Heap changed by 36, new size: 53616, counter: 811
D (760411) memory: end_item: Heap changed by 36, new size: 53580, counter: 798
D (760616) memory: end_item: Heap changed by 36, new size: 53544, counter: 811
D (760821) memory: end_item: Heap changed by 36, new size: 53508, counter: 812
D (761025) memory: end_item: Heap changed by 36, new size: 53472, counter: 812
D (761230) memory: end_item: Heap changed by 36, new size: 53436, counter: 811
D (761435) memory: end_item: Heap changed by 36, new size: 53400, counter: 811
D (761640) memory: end_item: Heap changed by 36, new size: 53364, counter: 812
D (761845) memory: end_item: Heap changed by 36, new size: 53328, counter: 812
D (762049) memory: end_item: Heap changed by 36, new size: 53292, counter: 812

If we look at each line, we see that in each case, the free heap has shrunk by 36 bytes. But notice also, that the heap doesn't change for EACH iteration of the loop, rather it changes at variable numbers of iterations through the loop (with the predominant number being ~812).

So what we have here is code which is reporting a heap leakage where there should be no possibility of any being leaked. I did some further digging.

Since the ESP32 is using FreeRTOS, LWIP and other technologies, a question that would appear to be needing to be asked is "what other tasks are running" ... and what I am about to describe appears to be the recipe needed to manifest the problem:

I have a standard ESP32-Template app that connects to an access point. When it has connected and we have been allocated an IP address, I then start a socket server. In the socket server I perform:

socket();
bind();
listen()
accept();
recv() <- until all data received;
close(clientSocket);
close(ServerSocket);
call heap_test(); <---- here is where the heap test loop is performed

If I do NOT perform the sockets APIs prior to calling heap_test, no leakage is found.

I am attaching a tar file (renamed as a ZIP ... rename back to .tar before extraction) containing the project as part of this issue. The version of ESP-IDF that I am using is the latest as of this date (https://github.com/espressif/esp-idf/commit/f807d16310719b0dcef07de0276934653b6e48ed)

To test, run the app and wait for it to connect and pause ...

Send in some data:

$ nc <ipaddress> 8001
Hello World
^D

And now look at the console of the ESP32.

heap.tar.ZIP

Most helpful comment

Hi,

Sorry, I didn't properly read the comments on this last week before marking as a bug. @negativekelvin's diagnosis is correct, the "heap check" while loop task is starving lower priority tasks and causing data to accumulate in background buffers, hence causing the memory leak in the test program. (The leaks I was able to trigger were on incoming packets, waiting to be processed by the server socket.)

One clue in esp-idf is the "task watchdog", which is enabled by default, picks up that the heap_task task is not yielding within the timeout period and prints watchdog timeout warnings.

This kind of bug is easy to write in RTOS environments due to real-time scheduler effects, but some good design habits will prevent it in most cases.

The best way in this case is as suggested by @negativekelvin , add a vTaskDelay(1) or similar inside any tight while() loop that has to poll rather than blocking. This isn't necessary if the system is blocking waiting on data via RTOS primitives (ie reading from a queue, or blocking on a socket as the sockets use RTOS primitives behind the scenes.) In general it's best to design systems that can use blocking rather than polling, for things like checking for free heap there is no other option.

Another way is to lower the priority of the heap checker task to run at tskIdlePriority or tskIdlePriority+1, so other system tasks will always preempt it and run. Even if doing this it is best to also add a delay, to help the low priority task coexist with other low priority tasks.

The "idle hook" functionality is another way to have something run on the CPU only when no other task is waiting to run. Or FreeRTOS timers or the tick hook gives another way to have something poll periodicially.

All 8 comments

First, you forgot to delete connection_info.h.

I confirm what you are seeing. I notice if you call esp_wifi_disconnect before running heap_test that there is no memory leak.

Also if you move heap_test to another task there doesn't appear to be a memory leak.

But it is very easy to generate a crash when heap_test is running.
EDIT: This was with a task stack size of 1024 which I guess is too small for printf?

Thank you kind sir for replicating. It is always welcomed when someone other than the issue reporter manages to duplicate ... there is always the potentiality that the issue is local to my environment and hearing that it can be recreated is great news.

Here's what happens when it gets to zero:
E (559173) wifi: error malloc bcn_param

I also noticed that if you replace your numeric loop operations with vTaskDelay(1) that there is no apparent memory leak.

... and I JUST noticed this is configured to run in single core mode so this makes perfect sense. In dual core mode there is no memory leak because the task that needs to free that memory can run on the other core. In single core mode your while(1) loop is blocking and the task priority does not allow the other task to preempt yours.

I had assumed that the two cores were partitioned ... one was meant to run networking related housekeeping while the other ran application code. Is my model wrong? Can you help me understand what would need to be changed to run in "dual core mode"? What are the implications of running in dual core mode?

config FREERTOS_UNICORE
bool "Run FreeRTOS only on first core"
default y

In dual core mode unless you pin a task to a specific core, it will run on whatever core has CPU time available. Your task is pinned to cpu0 where I believe most of the system tasks are pinned, but I think lwip has no affinity in dual core mode so that may be what helps in this case.

Hi,

Sorry, I didn't properly read the comments on this last week before marking as a bug. @negativekelvin's diagnosis is correct, the "heap check" while loop task is starving lower priority tasks and causing data to accumulate in background buffers, hence causing the memory leak in the test program. (The leaks I was able to trigger were on incoming packets, waiting to be processed by the server socket.)

One clue in esp-idf is the "task watchdog", which is enabled by default, picks up that the heap_task task is not yielding within the timeout period and prints watchdog timeout warnings.

This kind of bug is easy to write in RTOS environments due to real-time scheduler effects, but some good design habits will prevent it in most cases.

The best way in this case is as suggested by @negativekelvin , add a vTaskDelay(1) or similar inside any tight while() loop that has to poll rather than blocking. This isn't necessary if the system is blocking waiting on data via RTOS primitives (ie reading from a queue, or blocking on a socket as the sockets use RTOS primitives behind the scenes.) In general it's best to design systems that can use blocking rather than polling, for things like checking for free heap there is no other option.

Another way is to lower the priority of the heap checker task to run at tskIdlePriority or tskIdlePriority+1, so other system tasks will always preempt it and run. Even if doing this it is best to also add a delay, to help the low priority task coexist with other low priority tasks.

The "idle hook" functionality is another way to have something run on the CPU only when no other task is waiting to run. Or FreeRTOS timers or the tick hook gives another way to have something poll periodicially.

Hello projectgus,

I was having the exact same issue, with a HTTP server listener in a tight while loop. I kept seeing decreasing heap but I could not figure out why.

After adding a vTaskDelay(1) to that listener loop I no longer have a memory leak.

So I would like to thank you very much for this answer that has tremendously helped me understand the inner and finer workings of freeRTOS.

Cheers!

adding the vTaskDelay(1) in my code seemed to fix the memory leak as well. i was using lwIP to spawn new threads upon receiving a connection via netconn_connect(). i figured because it's blocking it wouldn't starve the idle thread - nope. thanks all! btw, this was on esp8266, but i figure they're similar enough.

Was this page helpful?
0 / 5 - 0 ratings