Nodemcu-firmware: Infinite loop in os_malloc() on out-of-memory causes WDT in SDK 3.0

Created on 8 May 2019 · 12 Comments · Source: nodemcu/nodemcu-firmware

Expected behavior

On SDK 2.2.1 builds, the following script produces the output shown:

    > function f() a[#a+1]=string.rep('string'..#a,32) print(#a, node.heap()) end
    > a={};for i=1,200 do f() end
    1   41640
    2   41368
    ...
    127 1656
    128 1336
    E:M 320
    not enough memory

That is, allocation steadily consumes heap, and on exhaustion malloc returns 0 so that the Lua GC can correctly recover the RAM.
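A minimal sketch of that expected failure path, in C: the allocator returns NULL on exhaustion, which gives the caller the chance to run a collection and retry before reporting "not enough memory". The names `sim_malloc`, `sim_collect` and `alloc_with_gc` and the two byte counters are hypothetical stand-ins for illustration; they are not the SDK or Lua allocator API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bookkeeping for illustration only. */
static size_t heap_free = 1024; /* simulated free heap, in bytes */
static size_t garbage = 0;      /* bytes held by unreachable objects */

/* Simulated allocator: returns NULL when the request exceeds the free heap. */
static void *sim_malloc(size_t n) {
    if (n > heap_free) {
        return NULL;
    }
    heap_free -= n;
    return malloc(n);
}

/* Simulated collector: hands all unreachable bytes back to the free heap. */
static void sim_collect(void) {
    heap_free += garbage;
    garbage = 0;
}

/* Allocate; on failure run one collection and retry once. */
static void *alloc_with_gc(size_t n) {
    assert(n > 0); /* zero-size requests are out of scope for this sketch */
    void *p = sim_malloc(n);
    if (p == NULL) {
        sim_collect();
        p = sim_malloc(n); /* may still be NULL: the caller reports out-of-memory */
    }
    return p;
}
```

The recovery step is only reachable because the first `sim_malloc` call returns. An allocator that loops instead of returning NULL never gives the collector that chance.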

Actual behavior on SDK 3.0

    > function f() a[#a+1]=string.rep('string'..#a,32) print(#a, node.heap()) end
    > a={};for i=1,200 do f() end
    1   57176
    2   56904
    3   56608
    ...
    117 19544
    118 19216
    119 18888

     ets Jan  8 2013,rst cause:4, boot mode:(3,6)

    wdt reset
    ...

That is, attempting to allocate the last 18Kb of heap hangs malloc and causes a WDT reset. node.heap() should correctly report the available RAM, and the allocator should report out-of-memory instead of hanging.

Test code

as above

Hardware

Wemos D1 Mini

Source

TerryE

All 12 comments

Is it my case?

EgorOA on 8 May 2019

#2743 is definitely related to this, and I was meaning to add this link later. Certainly, exploring the reasons for that issue led me to isolate this more specific and yet more widely applicable issue -- hence the separate number.

TerryE on 8 May 2019

I am a bit embarrassed that I didn't pick this up in my SDK 3.0 testing. I have to do some testing and binary chop here. We probably want to force DRAM malloc mode (-DMEM_DEFAULT_USE_DRAM), and I need to consider all of the other random residual malloc use, though this test case is creating the issue with the default Lua allocator.

TerryE on 8 May 2019

See espressif/ESP8266_NONOS_SDK#243

TerryE on 11 May 2019

I've just done builds with the following SDK versions which report the following free heap at boot on this test harness and also don't exhibit this bug:

2.2.1(release): 2.2.1(6ab97e9) 41608 free
3.0.0 (latest on branch): SDK 3.0.1-dev(fce080e) 40712 free
master (current dev): SDK 3.1.0-dev(3b41fcf) 40664 free

Any thoughts on which SDK version we should bind to dev (as the controlled 3.0.0 release version isn't usable in my view)?

TerryE on 11 May 2019

Boxes within boxes: my concerns about the sudden drop of memory are premature. To put this in context recall the basic ESP8266 memory map (excluding address space allocated to Boot ROM):

Base Addr | Size | Use
----------|------|---------------------
0x3ffe8000 | 80K | user dRAM
0x40100000 | 32K | iRAM1
0x40108000 | 32K | iRAM2 + iRAM3. Not available: used as L1 cache for accessing iROM0
0x40200000 | 1024K | iROM0 memory mapped SPI flash

The dRAM segment holds the .data, .rodata and .bss segments, which on my test build take up the first ~32K. The top 16K of this segment is allocated for stack space, which grows down to 0x3fffc000. The remaining 48K is available for heap, though about 7K of the heap is consumed by the SDK and our libraries at startup, thus leaving the remaining 41K heap available for the application.

The bulk of the SDK and firmware runs from iROM0, but executing this code needs the L1 cache at 0x40108000 to be enabled. However, some SDK and ISR routines must be able to run without the L1 cache enabled, so this code is located in the .text segment in iRAM1.

The big change in the 3.x SDK is to move a lot more code and data from iRAM1 to iROM0, though the iROM0 segment has also grown with the extra functionality added in the new version. As a side-by-side comparison:

Segment | SDK 2.2.1 | SDK 3.0.0
--------|-----------|----------
.text | 0x07993 | 0x0441b
.data | 0x0088c | 0x00a10
.rodata | 0x00088 | 0x00008
.bss | 0x07120 | 0x07280
.irom0.text | 0x7c000 | 0x66000

Hence on my test build the .text segment has dropped from 31Kb on an SDK 2.2.1 build to 17Kb on SDK 3.0.1, making about 14Kb of extra RAM available for application use. This 14Kb of freed iRAM1 will vary with the configuration of libraries, etc., but at the moment with -DMEM_DEFAULT_USE_DRAM we are leaving it unused.
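As a quick check of the figures quoted from the table, the two .text sizes can be converted from hex and subtracted. This is only a sketch of the arithmetic; the constant and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* .text sizes in bytes, copied from the segment table. */
enum {
    TEXT_SDK_2_2_1 = 0x07993, /* 31123 bytes */
    TEXT_SDK_3_0_0 = 0x0441b  /* 17435 bytes */
};

/* Bytes released when a segment shrinks from `before` to `after`. */
static uint32_t bytes_saved(uint32_t before, uint32_t after) {
    assert(before >= after);
    return before - after;
}
```

`bytes_saved(TEXT_SDK_2_2_1, TEXT_SDK_3_0_0)` evaluates to 13688 bytes, which is in line with the "about 14Kb" quoted when counting in thousands of bytes.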

TerryE on 12 May 2019

I would hope that the .rodata would be in the IROM0 segment. Also, your arithmetic doesn't work out: 80k for dRAM - 16k for stack - ~32k .data/.bss leaves 32k for heap. Unless there is space available in IRAM1 -- but I know that space there has been tight in the past.

In your table of numbers above, the .rodata seems very small -- I would have thought that all the C strings would have ended up there.

pjsg on 12 May 2019

There are two RAM slots usable by the application: iRAM1 (32Kb) and dRAM (80Kb), and you are right: there is something not adding up about the heap being reported -- perhaps the SDK allocates less RAM for the stack, but looking in the debugger at my current 3.0.1-dev(fce080e) build, BSS is 28Kb and the heap is reporting as under 41Kb, so maybe the stack is only 12Kb?
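The two heap budgets under discussion can be compared with a small sketch. It only performs the subtraction and ignores whatever the SDK and libraries allocate at startup; the function name is hypothetical.

```c
#include <assert.h>

/* Heap left in dRAM after statically allocated data and the stack, in Kb. */
static int heap_budget_kb(int dram_kb, int static_kb, int stack_kb) {
    assert(static_kb + stack_kb <= dram_kb);
    return dram_kb - static_kb - stack_kb;
}
```

With the original figures, `heap_budget_kb(80, 32, 16)` gives 32. With the values read from the debugger and the guessed 12Kb stack, `heap_budget_kb(80, 28, 12)` gives 40, which is at least compatible with a reported heap of just under 41Kb.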

.rodata is a bit misleading: it is the RO data used by the routines _mapped in iRAM1_ (a.k.a .text). Most RO data in our builds ends up in .irom0.text.

But the main issue here is that there is roughly 15Kb of iRAM1 unused. The SDK now has two variant heap allocation algos: one of which just uses the free dRAM and one which preferentially uses iRAM1 first. I am not sure how stable this latter is.
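The difference between the two allocation strategies can be sketched as a choice of region per request. This is an illustrative model only, with hypothetical types and names; it is not the SDK's allocator, and it tracks byte counts without handing out memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one RAM region. */
typedef struct {
    size_t used; /* bytes already handed out */
    size_t size; /* total bytes in the region */
} region_t;

typedef enum { FROM_NONE, FROM_DRAM, FROM_IRAM } source_t;

/* Reserve n bytes in a region if they fit; returns 1 on success. */
static int take(region_t *r, size_t n) {
    assert(r->used <= r->size);
    if (r->size - r->used < n) {
        return 0;
    }
    r->used += n;
    return 1;
}

/* prefer_iram == 0 models the dRAM-only mode; non-zero tries iRAM first. */
static source_t choose_region(region_t *dram, region_t *iram,
                              int prefer_iram, size_t n) {
    if (prefer_iram && take(iram, n)) {
        return FROM_IRAM;
    }
    if (take(dram, n)) {
        return FROM_DRAM;
    }
    return FROM_NONE; /* out of memory: must be reported, not retried forever */
}
```

In the dRAM-only mode the free iRAM1 is never touched, which matches the observation that it is currently sitting unused.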

BTW, I had to regress some of the changes to eagle.app.v6.ld into nodemcu.ld as this latter was forcing a couple of SDK libraries into iRAM1 that can now live in iROM0. I might try moving some more BSS data out of dRAM into iRAM1.

TerryE on 12 May 2019

One of the complications here -- as I've discovered -- is that dRAM and iRAM are handled differently, so there are performance implications for putting data into iRAM:

Only 32-bit loads and stores are executed by the Xtensa CPU, so just like flash, any unaligned, 16-bit or 8-bit accesses need to be handled by the unaligned exception handler. Unlike flash, 32-bit writes also work, but AFAIK the unaligned exception handler only handles reads, so 16-bit and 8-bit write accesses will fail.
Whilst the CPU does handle 32-bit data accesses to iRAM in hardware, these cause it to flush its instruction pipeline, and then restart the load in the case of a read or restart the following instruction in the case of a write. This has a performance hit of a few clocks per access -- perhaps a ¼ of the equivalent flash overhead.

Hence you can't safely put data for code running in iRAM into iRAM, unless this code has been specifically written only to use 32-bit accesses, as this code is in general used to execute in ISRs, etc., and cannot depend on the unaligned exception handler.
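A minimal sketch of what "only 32-bit accesses" means for byte-sized data: each byte is read or updated through the aligned word that contains it. The helper names are hypothetical, and the byte numbering assumes little-endian order within a word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read byte `index` of a buffer using an aligned 32-bit load only. */
static uint8_t load_byte_32(const volatile uint32_t *words, size_t index) {
    unsigned shift = (unsigned)(index % 4) * 8u;
    uint32_t word = words[index / 4];
    return (uint8_t)(word >> shift);
}

/* Write byte `index` using a 32-bit load followed by a 32-bit store. */
static void store_byte_32(volatile uint32_t *words, size_t index, uint8_t value) {
    unsigned shift = (unsigned)(index % 4) * 8u;
    uint32_t word = words[index / 4];
    word &= ~((uint32_t)0xFFu << shift); /* clear the target byte */
    word |= (uint32_t)value << shift;    /* merge in the new value */
    words[index / 4] = word;
}
```

The store is a read-modify-write and therefore not atomic, so a buffer shared with an ISR would need extra care beyond this sketch.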

So the tl;dr is that data in iRAM is possible but with some material restrictions on its use. This is definitely _not_ the seamless option implied by the Espressif SDK 3.0 documentation. We need to select with care what data can be moved into this ~15 Kb iRAM to make more dRAM available to the application.

TerryE on 13 May 2019

As a footnote to the above hidden in Sec 2.5 of the SDK API guide:

> RAM and flash access have to be word-aligned (4 byte boundary aligned access only). Casting pointers directly is not recommended. Please use os_memcpy, or other APIs for memory operations.

Well this isn't true for loads, as these will be handled in S/W thanks to the exception handler. The trouble is that with storing R/W data in iRAM, you can't use 8-bit stores, etc., and our code does tend to generate these. For example if I want to move the SPIFFS buffers into iRAM, then code fragments like this will need to be recoded:

    c_strncpy( buf->name, stat.name, FS_OBJ_NAME_LEN+1 );
    buf->name[FS_OBJ_NAME_LEN] = '\0';

as the code to add the null terminator generates an s8i instruction. So before I can move data structures into iRAM, I have to do a code review to check for byte and short stores.
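One possible shape for such a recoding, as a sketch only: build each destination word in a register and write it with a single 32-bit store, so that no byte store into the buffer is needed and the terminator falls out of the zero padding. `copy_name_words` is a hypothetical helper, not existing firmware code, and it assumes the destination is word-aligned, a multiple of 4 bytes long, and little-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy at most cap_bytes - 1 characters of src into dst_words, four
   characters per 32-bit word. Every remaining byte, including the last
   one, is written as zero, so the result is always NUL-terminated. */
static void copy_name_words(volatile uint32_t *dst_words, size_t cap_bytes,
                            const char *src) {
    assert(cap_bytes > 0 && cap_bytes % 4 == 0);
    size_t i = 0;
    int ended = 0; /* set once the source string or the capacity is exhausted */
    for (size_t w = 0; w < cap_bytes / 4; w++) {
        uint32_t word = 0;
        for (unsigned b = 0; b < 4; b++, i++) {
            if (!ended && i < cap_bytes - 1 && src[i] != '\0') {
                word |= (uint32_t)(unsigned char)src[i] << (8u * b);
            } else {
                ended = 1; /* stop reading src; pad with zeros */
            }
        }
        dst_words[w] = word; /* the only store: one aligned 32-bit write */
    }
}
```

The source is still read byte by byte, which is fine as long as it lives somewhere byte loads are supported.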

TerryE on 14 May 2019

OK, I have added some "low hanging fruit" to iRAM which has freed up 2½ Kb dRAM:

    app/spiffs/spiffs.c:31:static u8_t IRAM_DATA_ATTR spiffs_work_buf[LOG_PAGE_SIZE*2];
    app/spiffs/spiffs.c:34:static u8_t IRAM_DATA_ATTR myspiffs_cache[20 + (LOG_PAGE_SIZE+20)*4];
    app/lua/lrotable.c:47:static cache_line_t IRAM_DATA_ATTR cache[LA_LINES][LA_SLOTS];

Moving any more is going to involve recoding to avoid any s8i instructions.

I will also use the latest releasev3.0.0 commit: espressif/ESP8266_NONOS_SDK@e4434aa as the default SDK for the dev make.

TerryE on 14 May 2019

> The trouble is that with storing R/W data in iRAM, you can't use 8-bit stores, etc., and our code does tend to generate these.

Trust but verify -- the Espressif unaligned exception handler _does_ handle unaligned stores to iRAM. So I've moved the lwIP DNS cache into iRAM, freeing up another 1Kb of dRAM for application use, and a quick test shows that this is working fine.

TerryE on 16 May 2019
