On SDK 2.2.1 builds, the following script produces the output shown:
> function f() a[#a+1]=string.rep('string'..#a,32) print(#a, node.heap()) end
> a={};for i=1,200 do f() end
1 41640
2 41368
...
127 1656
128 1336
E:M 320
not enough memory
That is, allocation slowly consumes the heap and, on exhaustion, malloc returns 0 so that the Lua GC correctly recovers the RAM.
> function f() a[#a+1]=string.rep('string'..#a,32) print(#a, node.heap()) end
> a={};for i=1,200 do f() end
1 57176
2 56904
3 56608
...
117 19544
118 19216
119 18888
ets Jan 8 2013,rst cause:4, boot mode:(3,6)
wdt reset
...
That is, attempting to allocate the last 18Kb of heap hangs malloc and causes a WDT reset. node.heap() should correctly report the available RAM, and out-of-memory should be correctly reported by the allocator.
as above
Wemos D1 Mini
Is it my case?
I am a bit embarrassed that I didn't pick this up in my SDK 3.0 testing. I have to do some testing and binary chop here. We probably want to force DRAM malloc mode (-DMEM_DEFAULT_USE_DRAM), and I need to consider all of the other random residual malloc use, though this test case is creating the issue with the default Lua allocator.
See espressif/ESP8266_NONOS_SDK#243
I've just done builds with the following SDK versions which report the following free heap at boot on this test harness and also don't exhibit this bug:
Any thoughts on which SDK version we should bind to dev (as the controlled 3.0.0 release version isn't usable in my view)?
Boxes within boxes: my concerns about the sudden drop of memory are premature. To put this in context recall the basic ESP8266 memory map (excluding address space allocated to Boot ROM):
Base Addr | Size | Use
----------|------|---------------------
0x3ffe8000 | 80K | user dRAM
0x40100000 | 32K | iRAM1
0x40108000 | 32K | iRAM2 + iRAM3. Not available: used as L1 cache for accessing iROM0
0x40200000 | 1024K | iROM0 memory mapped SPI flash
The dRAM segment holds the .data, .rodata and .bss segments, which on my test build take up the first ~32K. The top 16K of this segment is allocated for stack space, which grows down to 0x3fffc000. The remaining 48K is available for heap, though about 7K of the heap is consumed by the SDK and our libraries at startup, leaving the remaining 41K of heap available for the application.
The bulk of the SDK and firmware runs from iROM0, but executing this code needs the L1 cache at 0x40108000 to be enabled. However, some SDK and ISR routines must be able to run without the L1 cache enabled, so these are located in the .text segment in iRAM1.
The big change in the 3.x SDK is to move a lot more code and data from iRAM1 to iROM0, though the iROM0 segment has also grown with the extra functionality added in the new version. As a side-by-side comparison:
Segment | SDK 2.2.1 | SDK 3.0.0
--------|-----------|----------
.text | 0x07993 | 0x0441b
.data | 0x0088c | 0x00a10
.rodata | 0x00088 | 0x00008
.bss | 0x07120 | 0x07280
.irom0.text | 0x7c000 | 0x66000
Hence on my test build the .text segment has dropped from 31Kb on an SDK 2.2.1 build to 17Kb on SDK 3.0.1, making about 14Kb of extra RAM available for application use. This 14K figure will vary with the configuration of libraries, etc., but at the moment, with -DMEM_DEFAULT_USE_DRAM, we are leaving it unused.
I would hope that the .rodata would be in the IROM0 segment. Also, your arithmetic doesn't work out. 80k for dRAM - 16k for stack - ~32k .data/.bss leaves 32k for heap. Unless there is space available in IRAM1 -- but I know that space there has been tight in the past.
In your table of numbers above, the .rodata seems very small -- I would have thought that all the C strings would have ended up there.
There are two RAM slots usable by the application: iRAM1 (32Kb) and dRAM (80Kb), and you are right: there is something not adding up about the heap being reported -- perhaps the SDK allocates less RAM for the stack, but looking in the debugger at my current 3.0.1-dev(fce080e) build, BSS is 28Kb and the heap is reporting as under 41Kb, so maybe the stack is only 12Kb?
.rodata is a bit misleading: it is the RO data used by the routines _mapped in iRAM1_ (a.k.a .text). Most RO data in our builds ends up in .irom0.text.
But the main issue here is that there is roughly 15Kb of iRAM1 unused. The SDK now has two variant heap allocation algos: one of which just uses the free dRAM and one which preferentially uses iRAM1 first. I am not sure how stable this latter is.
BTW, I had to regress some of the changes to eagle.app.v6.ld into nodemcu.ld as this latter was forcing a couple of SDK libraries into iRAM1 that can now live in iROM0. I might try moving some more BSS data out of dRAM into iRAM1.
One of the complications here -- as I've discovered -- is that dRAM and iRAM are handled differently, so there are performance implications for putting data into iRAM:
Hence you can't safely put data for code running in iRAM into iRAM, unless this code has been specifically written only to use 32-bit accesses, as this code is in general used to execute in ISRs, etc., and cannot depend on the unaligned exception handler.
So the tl;dr is that data in iRAM is possible but with some material restrictions on its use. This is definitely _not_ the seamless option implied by the Espressif SDK 3.0 documentation. We need to select with care what data can be moved into this ~15 Kb iRAM to make more dRAM available to the application.
As a footnote to the above hidden in Sec 2.5 of the SDK API guide:
- RAM and flash access have to be word-aligned (4 byte boundary aligned access only). Casting pointers directly is not recommended. Please use os_memcpy, or other APIs for memory operations.
Well this isn't true for loads, as these will be handled in S/W thanks to the exception handler. The trouble is that with storing R/W data in iRAM, you can't use 8-bit stores, etc., and our code does tend to generate these. For example if I want to move the SPIFFS buffers into iRAM, then code fragments like this will need to be recoded:
c_strncpy( buf->name, stat.name, FS_OBJ_NAME_LEN+1 );
buf->name[FS_OBJ_NAME_LEN] = '\0';
as the code to add the null terminator generates an s8i instruction. So before I can move data structures into iRAM, I have to do a code review to check for byte and short stores.
OK, I have added some "low hanging fruit" to iRAM which has freed up 2½ Kb dRAM:
app/spiffs/spiffs.c:31:static u8_t IRAM_DATA_ATTR spiffs_work_buf[LOG_PAGE_SIZE*2];
app/spiffs/spiffs.c:34:static u8_t IRAM_DATA_ATTR myspiffs_cache[20 + (LOG_PAGE_SIZE+20)*4];
app/lua/lrotable.c:47:static cache_line_t IRAM_DATA_ATTR cache[LA_LINES][LA_SLOTS];
Moving any more is going to involve recoding to avoid any s8i instructions.
I will also use the latest releasev3.0.0 commit: espressif/ESP8266_NONOS_SDK@e4434aa as the default SDK for the dev make.
The trouble is that with storing R/W data in iRAM, you can't use 8-bit stores, etc., and our code does tend to generate these.
Trust but verify -- the Espressif unaligned exception handler _does_ handle unaligned stores to iRAM. So I've moved the lwIP DNS cache into iRAM, freeing up another 1Kb of dRAM for application use, and a quick test shows that this is working fine.