Sorry for the cross-post - I just remembered that I was told to post issues here instead on the sub projects as they are less monitored. I'll close the other issue.
I'm trying to figure out why the wifi driver is running out of memory, when there is 4MB free:
W (115021) wifi: alloc eb len=24 type=3 fail, heap:4098144
I can't find where this is logged so I assume that it is part of the still closed-source libraries(?)
What does type=3 mean in this case and what other possible values are there? Desired type DRAM/IRAM or memory capability?
Since updating to the latest master as of today, I'm no longer getting this issue. I can't find anything in the recent commit history that relates to this - thoughts?
@PerMalmberg type 3 means the WiFi management buffer. The debug log indicates that the WiFi management buffer allocation fails.
We didn't have any fixes related to WiFi management buffer.
@liuzfesp Thanks. What about the two remaining questions?
Hi @PerMalmberg,
Why would this fail when there are 4Mb available?
WiFi management buffer is allocated from internal memory, although the total memory is 4Mb (internal/external memory), the internal memory may not be enough.
What other type values can be logged and what do they mean?
1 - TX buffer
2 - Long management buffer
3 - Short management buffer
4 - reserved
5 - reserved
6 - RX buffer
Is there a way to make these memory buffers use PSRAM?
HI @PerMalmberg, for memory type 1/2/3, they are used by hardware for TX, only the internal memory can be used for WiFi TX by hardware.
Ok, but why is the log message including the entire heap size and not just memory with the MALLOC_CAP_INTERNAL capability? It is confusing as-is.
Also, just how much memory is needed for the management buffers? Anyway to calculate and reserve that?
Also, please help me explain the following - there seems to be plenty of both 8-bit and 32-bit alignable memory available compared to the few bytes it tries to allocate so why does the allocation fail?. Legend:
F:<free> LB:<largest block> M:<minimum free>
I (437871) MemStat: [INTERNAL]
I (437879) MemStat: 8-bit F:33892 LB:32664 M:33236 | 32-bit: F:47728 LB:32664 M:47052
I (437880) MemStat: [SPIRAM]
I (437903) MemStat: 8-bit F:3872088 LB:3867876 M:3864996 | 32-bit: F:3872088 LB:3867876 M:3864996
I (437904) MemStat: Name Stack Min free stack
I (437909) MemStat: MainTask 25600 16300
I (437914) MemStat: SocketDispatcher 20480 11984
I (438166) SocketDispatcher: Active sockets: 3
I (442102) wifi: bcn_timout,ap_probe_send_start
W (442103) wifi: alloc eb len=76 type=2 fail, heap:3873224
W (442104) wifi: m f probe req l=0
W (442605) wifi: alloc eb len=76 type=2 fail, heap:3873224
W (442605) wifi: m f probe req l=0
I (442918) MemStat: [INTERNAL]
I (442927) MemStat: 8-bit F:33892 LB:32664 M:33236 | 32-bit: F:47728 LB:32664 M:47052
I (442928) MemStat: [SPIRAM]
I (442951) MemStat: 8-bit F:3872088 LB:3867876 M:3864996 | 32-bit: F:3872116 LB:3867876 M:3864996
I (442953) MemStat: Name Stack Min free stack
I (442958) MemStat: MainTask 25600 16300
I (442963) MemStat: SocketDispatcher 20480 11984
W (443106) wifi: alloc eb len=76 type=2 fail, heap:3873320
W (443107) wifi: m f probe req l=0
W (443608) wifi: alloc eb len=76 type=2 fail, heap:3873220
W (443608) wifi: m f probe req l=0
W (444109) wifi: alloc eb len=76 type=2 fail, heap:3873220
W (444109) wifi: m f probe req l=0
It is MALLOC_CAP_8BIT but is it MALLOC_CAP_DMA?
Well, I didn't have that MALLOC_CAP_DMA in the bit mask, but even with it set, there seems to be enough memory when allocation fails:
I (231637) MemStat: [INTERNAL]
I (231643) MemStat: 8-bit F:35888 LB:32664 M:34760 | 32-bit: F:49724 LB:32664 M:48576
I (231644) MemStat: [INTERNAL & DMA]
I (231649) MemStat: 8-bit F:35888 LB:32664 M:34760 | 32-bit: F:35888 LB:32664 M:34760
I (231654) MemStat: [SPIRAM]
I (231662) MemStat: 8-bit F:3761128 LB:3736232 M:3753204 | 32-bit: F:3761128 LB:3736232 M:3753204
I (231667) MemStat: Name Stack Min free stack
I (231674) MemStat: MainTask 25600 16364
I (231678) MemStat: SocketDispatcher 20480 3392
I (232447) ServerSocket: Connection accepted
W (235184) wifi: alloc eb len=24 type=3 fail, heap:3726272
W (235184) wifi: m f null
I (236683) MemStat: [INTERNAL]
I (236693) MemStat: 8-bit F:35304 LB:32664 M:34548 | 32-bit: F:49140 LB:32664 M:48364
I (236694) MemStat: [INTERNAL & DMA]
I (236701) MemStat: 8-bit F:35304 LB:32664 M:34548 | 32-bit: F:35304 LB:32664 M:34548
I (236704) MemStat: [SPIRAM]
I (236713) MemStat: 8-bit F:3711340 LB:3702760 M:3709632 | 32-bit: F:3711340 LB:3702760 M:3709632
I (236717) MemStat: Name Stack Min free stack
I (236724) MemStat: MainTask 25600 16364
I (236728) MemStat: SocketDispatcher 20480 3392
W (237393) wifi: alloc eb len=24 type=3 fail, heap:3719836
W (237394) wifi: m f null
I (241733) MemStat: [INTERNAL]
I (241740) MemStat: 8-bit F:34828 LB:32664 M:34284 | 32-bit: F:48664 LB:32664 M:48100
I (241741) MemStat: [INTERNAL & DMA]
I (241747) MemStat: 8-bit F:34832 LB:32664 M:34284 | 32-bit: F:34832 LB:32664 M:34284
I (241751) MemStat: [SPIRAM]
I (241761) MemStat: 8-bit F:3695716 LB:3691092 M:3694128 | 32-bit: F:3696108 LB:3691092 M:3694128
There is a block 32664 which could be SPIRAM_MALLOC_RESERVE_INTERNAL and isn't being used properly?
Or, it is all used up? I have no idea how to tell if that is the case, and since this is all in the binary blobs, there's no source to look at.
My SPIRAM_MALLOC_RESERVE_INTERNAL is set to 32768, the number 32664 in the log is the largest available free block if that is what you're reffering to, @negativekelvin. Or do you mean that these allocations are limited to the memory defined by SPIRAM_MALLOC_RESERVE_INTERNAL and allocated by esp_spiram_reserve_dma_pool? I suppose that is a possibility.
@liuzfesp, Can you please clarify from where the WiFi management buffers are allocated and if they in fact are related to SPIRAM_MALLOC_RESERVE_INTERNAL? If they are, how do we calculate the correct amount for SPIRAM_MALLOC_RESERVE_INTERNAL?
Yes I mean it is suspicious that 32768 and 32664 are so close such that if for some reason that reserved block is not getting used then there may not be another block large enough in the remaining fragmented 2k. Although 24 bytes is really small it is hard to believe.
@liuzfesp any thought on this matter?
@PerMalmberg, we just found a bug about WiFi/LWIP using PSRAM, it may be related to this issue, will let you know once the fix is merged.
@liuzfesp Any progress on fixing/merging the issue? This is preventing me from doing pretty much anything with my device.
Hi @PerMalmberg fix on master branch: 40e808c6, but it may need few days to automatically sync to github. For other IDF releases, the code has been merged yet.
Great news, thank you.
@liuzfesp I can see the change via the link, but it doesn't seem to be merged to master yet? (I'm still experiencing the problem on v4.0-dev-1443-g39f090a4f)
@liuzfesp I'm sorry to say that even with the updated lwip the error persists:
W (78383) wifi: alloc eb len=24 type=3 fail, heap:4090072
W (78384) wifi: m f null
After this no sockets are functional.
Hi @PerMalmberg could you provide your sdkconfig please?
@liuzfesp, this is the one:
https://github.com/PerMalmberg/IO-Card-G3/blob/master/software/sdkconfig
@PerMalmberg your sdkconfig looks ok, I suspect that it's caused by your application memory usage. How much internal memory does the application use? How many tasks the the application create? What's the task stack size of each task?
@liuzfesp can you explain how allocation of a 24 byte buffer fails when there is 35k free dma heap? What are the other reasons for failure besides nomem?
W (235184) wifi: alloc eb len=24 type=3 fail, heap:3726272
I (236694) MemStat: [INTERNAL & DMA]
I (236701) MemStat: 8-bit F:35304 LB:32664 M:34548 | 32-bit: F:35304 LB:32664 M:34548
@liuzfesp The memory numbers that @negativekelvin gave you in the post above represents a typical run. I don't see how it matters, but there are four application tasks. I don't remember their stack sizes, but there's 35k DMA-capable available so allocating 24 bytes can't be the actual issue.
Also, the wifi-log shows the number of available PSRAM - and there's loads of memory available there. Is it trying to allocate memory in DMA-capable or och PSRAM?
As much as I would like to dig deeper, it's not possible to since this is one of the binary blobs we don't have the source for. So we're entirely depending on Espressif to help out. Needless to say, a non-functioning Wifi on an ESP32 is a big deal as it's such a vital port of the platform.
As such, I'm asking in the most possibly kind way - can we please have some focused attention to this matter?
@negativekelvin @PerMalmberg I just can't reproduce this issue in our auto-test and release test, including throughput and traffic stress test. I wonder is it possible to give me your demo code that can reproduce this issue, then I can debug it directly.
@liuzfesp
Devkit v3.x (or any compatible device), with latest IDF and gcc8.
set(selected_test_project http_server_test).test/http_server_test/web_root folder (including the folder itself) to the root of an SDCard and put that into the device.Sometimes it happens on the first page-load, sometimes it takes a few reloads but it happens very consistently.
@liuzfesp Sorry, forgot to mention that you'll have to set up the Wifi credentials in wifi_creds.h too.
Here's an output from a run I just did. As you can see, there has never been less than 24908 bytes DMA-capable memory available. It's not a huge amount, but it is certainly more than the 24 bytes it tries to allocate.
And, talking about memory types - we've never gotten an answer on the question from where the memory is requested. The log displays the available heap - is that from where it is trying to allocate?
Let me know if there's any more information I can provide.
I (43020) HTTPServer: Reply: OK
I (43041) HTTPServer: Request: GET: '/img/application_view_icons.png'
W (43048) wifi: alloc eb len=24 type=3 fail, heap:4066848
W (43049) wifi: m f null
I (43076) HTTPServer: Reply: OK
I (43099) HTTPServer: Request: GET: '/img/application_view_list.png'
I (43133) HTTPServer: Reply: OK
I (43151) HTTPServer: Request: GET: '/img/application_view_tile.png'
W (43154) wifi: alloc eb len=24 type=3 fail, heap:4066964
W (43155) wifi: m f null
W (43156) wifi: alloc eb len=24 type=3 fail, heap:4065408
W (43160) wifi: m f null
I (43182) HTTPServer: Reply: OK
I (43198) HTTPServer: Request: GET: '/img/application_xp.png'
I (43231) HTTPServer: Reply: OK
I (43248) HTTPServer: Request: GET: '/img/application_xp_terminal.png'
I (43272) HTTPServer: Reply: OK
I (43289) HTTPServer: Request: GET: '/img/arrow_branch.png'
I (43313) HTTPServer: Reply: OK
I (43334) HTTPServer: Request: GET: '/img/arrow_divide.png'
I (43358) HTTPServer: Reply: OK
I (46890) SocketDispatcher: Active sockets: 8
W (48160) SocketDispatcher: Receive timeout on socket 0x3fffe71c (5000 ms)
I (48162) Socket: [, 0, 56, 0x3fffe71c]: Receive timeout
I (48164) Socket: [, 0, 56, 0x3fffe71c]: Socket stopping
I (48185) Socket: [, 0, -1, 0x3fffe71c]: Disconnected
W (48202) SocketDispatcher: Receive timeout on socket 0x3ffe3dfc (5000 ms)
I (48204) Socket: [, 0, 58, 0x3ffe3dfc]: Receive timeout
I (48206) Socket: [, 0, 58, 0x3ffe3dfc]: Socket stopping
I (48226) MemStat: [INTERNAL]
I (48232) Socket: [, 0, -1, 0x3ffe3dfc]: Disconnected
I (48232) MemStat: 8-bit F:31736 LB:24952 M:24908 | 32-bit: F:61944 LB:30208 M:55108
I (48236) MemStat: [INTERNAL | DMA]
I (48242) MemStat: 8-bit F:31740 LB:24952 M:24908 | 32-bit: F:31740 LB:24952 M:24908
I (48249) MemStat: [SPIRAM]
I (48256) MemStat: 8-bit F:4057176 LB:4050444 M:3957572 | 32-bit: F:4057176 LB:4050444 M:3957572
@PerMalmberg we are now debugging this issue, will update your once we have some progress
@PerMalmberg Is there a way to reproduce the problem without using SD card?
@Rhulin I don't know, I haven't tried. Why do you ask?
Hi @PerMalmberg How can I solve this issue
E (761) esp_image: Image length 1672160 doesn't fit in partition length 1048576
E (761) boot: Factory app partition is not bootable
E (763) boot: No bootable app partitions in the partition table
@Rhulin run "flash" instead of "app-flash" to write a new partition table (assuming you have it configured). Are you debugging this issue or just trying out my library? If the later, then this isn't the place to ask for help for that.
@liuzfesp Any progress? A "yes" or "no" will do fine for now, thank you :)
Hi @PerMalmberg I am trying to reproduce the problem and debug
@Rhulin Ok. Unless you're working for Espressif, there's the small issue that we don't have access to the wifi source code to debug this - at this time it appears the problem is likely located in the closed-source part of IDF. You're of course more than welcome to see if you can find anything, but you should at least know this.
@liuzfesp Is it too early to ask for a progress update?
@liuzfesp Could we please have an update?
I'm currently developing the web UI for my product but trying to test-run it against the ESP32 is near to impossible since the Wifi constantly breaks and requires a restart of the device.
Not sure if it helps, but I just got this in the log; I can't remembering ever seeing type 2 failing before:
W (5217413) wifi: alloc eb len=76 type=2 fail, heap:3032912
Memory stats when it happened; F:(free) LB:(largest block): M:(minimum free)
I (5220816) MemStat: [INTERNAL]
I (5220894) MemStat: 8-bit F:20400 LB:17596 M:17904 | 32-bit: F:42436 LB:22036 M:39920
I (5220908) MemStat: [INTERNAL | DMA]
I (5220958) MemStat: 8-bit F:20424 LB:17596 M:17904 | 32-bit: F:20424 LB:17596 M:17904
I (5220974) MemStat: [SPIRAM]
I (5221084) MemStat: 8-bit F:3028588 LB:2941692 M:3026568 | 32-bit: F:3028684 LB:2941692 M:3026568
@projectgus Sorry to ping you, but @liuzfesp seems to have gone radio silent. Has any progress been made since the last statement from @liuzfesp 20 days ago?
Also, shouldn't this issue be on the list of known issues in IDF v.4.0-beta 1 which you announced earlier today? (and the PSRAM cache issue too, since v4.0 uses gcc 8.2?)
HI @PerMalmberg,
@Rhulin has been debugging this and will update with any progress.
Some other ideas from looking at the case:
I notice that some time passes between the "MemStat" dumps and the errors. Are you able to turn on memory tracing (of all allocations) in between MemStat calls? You should be able to store the trace buffer in PSRAM provided nothing is trying to allocate while writing/erasing flash.
Also, could you please try enabling "Comprehensive" heap poisoning and running a heap_caps_check_integrity(MALLOC_CAP_DMA) at some point after the error, to make sure the heap metadata is correct?
Also, shouldn't this issue be on the list of known issues in IDF v.4.0-beta 1 which you announced earlier today? (and the PSRAM cache issue too, since v4.0 uses gcc 8.2?)
We've assumed that issues open on GitHub don't need to be mentioned here, especially if the root cause isn't confirmed (until we can reproduce without your code involved, it's not certain that this is an issue in IDF). But you're right we should at least mention GitHub Issues in the known issues, and link to major issues like the PSRAM cache one. Will fix for the next pre-release.
Ah, so @Rhulin is working for Espressif, it wasn't obvious from the questions s/he asked, my apologies for that.
Yes, I can and will do what you ask, @projectgus but isn't it already clear that this happens even without my code; there are already three other issues linked by @negativekelvin : #3976, #2184 and #3840, all of them showing the same symptoms.
Yes, I can and will do what you ask, @projectgus but isn't it already clear that this happens even without my code; there are already three other issues linked by @negativekelvin : #3976, #2184 and #3840, all of them showing the same symptoms.
Apologies, you're right and I should have looked through the full links.
Am I right in summarising that the root causes mentioned in the other issues are either the PSRAM bug with confirmed toolchain fix, or that running out of internal DMA memory causes things not to recover when memory frees up? Do you think one of these might be the root cause here, as well?
It is of course possible that the PSRAM cache issue is the underlying cause for this issue, at least if memory in fact is being allocated from PSRAM. Two of the linked issues uses idf-master, i.e. v.4x. and as such gcc 8 unless I'm mistaken. The third one uses commit 52f9a5c from April 24th 2018 so that's likely gcc 5. Since fixes for gcc 8 are not available, only Espressif can say anything about whether or not the PSRAM cache is the cause.
Regarding running out of DMA memory - I'd say that is not the case. Based on the memory statistics there have always been ~20ish kB available (all-time minimum free), and the logs complain about failing to allocate <30b so there obviously were enough memory to allocate at the time.
Btw, I'm a bit confused re. the relation between DMA memory/PSRAM and Wifi. On failure the Wifi log displays the amount of PSRAM available, implying that it tries to allocate from PSRAM. Is it really, or is it misleading information?
Regarding running out of DMA memory - I'd say that is not the case. Based on the memory statistics there have always been ~20ish kB available (all-time minimum free), and the logs complain about failing to allocate <30b so there obviously were enough memory to allocate at the time.
I agree. The only other possibility is that some other larger buffers are being allocated (between the two events) and then deallocated after the operation fails mid-way. A heap trace will confirm for sure this isn't happening.
Btw, I'm a bit confused re. the relation between DMA memory/PSRAM and Wifi. On failure the Wifi log displays the amount of PSRAM available, implying that it tries to allocate from PSRAM. Is it really, or is it misleading information?
I believe it's a bug when PSRAM is enabled. The log line should only only show free internal RAM.
A heap trace will confirm for sure this isn't happening.
Alright. I'll try to get one this evening.
Turns out getting a heap trace for the particular interval we initially wanted was hard, I couldn't get the timing right. Either the wifi flipped out just before the trace began, or just after.
I did however stumble upon a new situation I've not seen before, one that I could reproduce - the wifi failure happens very early in the startup procedure so I enabled heap trace even earlier and managed to capture a log where trace was enabled while the failure happened.
This log is with HEAP_TRACE_ALL and heap corruption detection set to "comprehensive" (never got a heap corrupt issue btw). the error is at line 119. Heap tracing is enabled from line 95 and stopped at line 123 where it is printed just before the memory statistics which are all the way down at line 1837.
https://gist.github.com/PerMalmberg/4a523001b99a44907529f244667aba06
I was never able to make heads from tails of the trace and since I've not received a reply in more than a month, I did another small test - I maxed out the amount of reserved DMA memory (262144), reduced CONFIG_SPIRAM_MALLOC_ALWAYSINTERNAL to 64 and configured the system to run in unicore mode to avoid any issues with #2892. Also, SPIRAM_TRY_ALLOCATE_WIFI_LWIP is enabled.
What now happens is that wifi-allocations fails immediately at start up, which is unexpected since we now have
I (5911) wifi: state: auth -> assoc (0)
I (5918) wifi: state: assoc -> run (10)
I (5935) wifi: connected with Our place, channel 1, 40U, bssid = f0:9f:c2:aa:b0:3f
I (5936) wifi: pm start, type: 1
W (5987) wifi: alloc eb len=24 type=3 fail, heap:4093504
W (5987) wifi: m f null
I (6467) SocketDispatcher: Active sockets: 0
W (6478) wifi: alloc eb len=24 type=3 fail, heap:4055916
W (6478) wifi: m f null
I (6505) Stora hallen: Triggered.
I (12627) MemStat: Mem type | 8-bit free | Smallest block | Minimum free | 32-bit free | Smallest block | Minimum free
I (12630) MemStat: INTERNAL | 129240 | 127492 | 128704 | 191320 | 127492 | 190764
I (12641) MemStat: DMA | 129240 | 127492 | 128704 | 129240 | 127492 | 128704
I (12654) MemStat: SPIRAM | 4056312 | 4055140 | 4052560 | 4056312 | 4055140 | 4052560
As you can see, there has never been less than 128704 of DMA available - I'd say this proves that there are no huge buffers consuming the DMA and preventing wifi-allocations from succeeding.
This question is now why does the wifi-allocation fail even when there is such a large amount of DMA available?
@projectgus @Rhulin Have you done any new findings?
When the memory gets reserved, it loses the MALLOC_CAP_DEFAULT attribute flag. If the wifi driver is using this flag then it won't match the reserved pool?
Also affecting Bluetooth
So does changing those functions to use MALLOC_CAP_8BIT|MALLOC_CAP_DMA|MALLOC_CAP_INTERNAL solve it for you, @PerMalmberg ?
Theoretically it probably will, though I haven't tried. I'd like to have some feedback on what @Rhulin and @projectgus have found before I start messing around with the inner workings of IDF.
While you have a good point, @negativekelvin , is it really the right solution? I don't know, I don't know the reasoning behind the current design so I can't say if it is the right way to go or not.
Seems obvious, it is exactly what I suggested at the beginning. Do we want wifi to have access to the reserved pool? Yes, of course. Then we have to change the flags. It will still try to use the other unreserved heaps first because they are matching MALLOC_CAP_8BIT so it should be exactly what we want.
Updating to use MALLOC_CAP_8BIT|MALLOC_CAP_DMA|MALLOC_CAP_INTERNAL does seem to solve the problem.
Updating to use
MALLOC_CAP_8BIT|MALLOC_CAP_DMA|MALLOC_CAP_INTERNALdoes seem to solve the problem.
I think I'm having the very same problem, but this solution did not help. ~It just seems to postpone the error. Before my application failed after 10 minutes, now it was alive for 19 minutes.~
@falkoschindler well you're still going to have to tune SPIRAM_MALLOC_RESERVE_INTERNAL and other parameters to fit your application and make sure there are no memory leaks. There is a physical limit on ram so it will always be possible to run out.
Did some further testing today and I can no longer cause this issue to occur.
So how quickly can we get the proposed changes into master, @projectgus and @Rhulin ? I'm happy to do a PR if it helps.
Hi @PerMalmberg ,
The fix suggested by @negativekelvin looks correct to me, thanks for persisting with this. Feel free to send a PR, although we can also make the change internally.
Due to some other major changes coming down the pipeline, we may not be able to merge this for a short time.
Angus
Thanks for the update, @projectgus. Can you share some information on what those major changes are? Is it something we can/need to prepare for?
@PerMalmberg We're merging ESP32-S2 Beta support onto master. The intention is for no impact on ESP32 users, but there may be some unexpected regressions. Suggest moving to release/v4.0 branch if you need more stability than master.
@negativekelvin. Today we updated the SDK with git pull --rec and it has fixed some errors on iram0, but the size needed in flash has increased more than 200k so something still not right. But iram0 fits now ok and a lot of components stop using a lot of iram
We now have a fix for the root cause (DMA reserved pool not included in "internal" Wi-Fi/BT allocations) queued for review & merge internally. Should be available on GitHub in the next week or so.
The misleading log message (logging all free memory when the allocation failed from internal-only memory) will also be fixed but separately, may take a little longer.
Will keep this issue open until both fixes are merged.
We now have a fix for the root cause (DMA reserved pool not included in "internal" Wi-Fi/BT allocations) queued for review & merge internally. Should be available on GitHub in the next week or so.
hi @projectgus , thank you, please merge if possible on release/v4.0 as well
@enricop Yes, this will be backported to v4.0 as well.
Sorry for the delay in updating master, folks.
@projectgus Hi, we are using the v3.3 LTS release and are suffering from this problem. When will the 3.3 LTS release have this fix released?
We are also awaiting the PSRAM fix for the 3.3LTS, but that's another story..
Hi @Patrik-Berglund,
Sorry, this missed the v3.3.1 release cutoff. It will be fixed in v3.3.2, but I'll let you know once the commit is merged to the prerelease release/v3.3 branch and you can cherry-pick those commits if you want.
Angus
W (239871) wifi:alloc eb len=76 type=2 fail, heap:100
W (239871) wifi:m f probe req l=0
Still getting the error i dont see this fixed im using v4.2-dev-1262-g4a558131e
@rebeliousconformist It looks like you are actually running out of heap memory (heap: 100 indicates that there is only 100 bytes of heap left).
This original issue was about the discrepancy between the log message (wifi: alloc eb len=24 type=3 fail, heap:4090072, i.e. 4MB free heap) and the Wi-Fi stack being unable to allocate a buffer. This has been fixed, but the log message may still appear if there is actually not enough heap.
Please check the heap tracing feature for finding memory leaks and tracing memory allocations in the app. If you believe that you are seeing a bug in ESP-IDF, please open a new issue and provide some information for us to reproduce it.
@igrr understood. I will get back to you. with more details
@igrr: I'm getting this error, but only under certain conditions. For lack of a better way to say it, it seems to happen when I begin sending messages a little too soon after startup. But, according to my error, I have plenty of heap:
wifi:alloc eb len=24 type=3 fail, heap:4130424
Has a fix been applied to release/v4.0 yet? If not, is there an ETA for this?
Thanks...
@mzimmers The root cause of failing to allocate from all DMA-capable memory was fixed for v4.0 in 947fc14
However, the log message still shows the total heap and this can be misleading because the allocation will only accept internal & DMA-capable memory (MALLOC_CAP_8BIT|MALLOC_CAP_DMA), so it's possible to see a large number here and still be out of internal memory. This issue will be closed when that is fixed.
Suggest calling heap_caps_get_minimum_free_size(MALLOC_CAP_8BIT|MALLOC_CAP_DMA) some time after the error log line, and see what the value is. If it's very low, the system was legitimately out of internal memory at this time. (Details).
To fix it, see if there's anything in your firmware that can be moved from internal memory to PSRAM.
Hi all,
we did an endurance test by shutting down the Wifi AP, and after 13 hours, turning it back to on:
The result is that something leaks memory while the AP is off, so the esp32 wifi library is unable to reconnect when the AP in avalable again:
We _do not observe memory leaks_ in internal memory while the esp32 is _connected correctly_ to the Wifi AP. (we have a a print for free internal memory that is triggered very frequently, but not for DMA mem)
W (65250911) wifi:alloc eb len=76 type=2 fail, heap:2872652
W (65250921) wifi:m f probe req l=0
W (65251641) wifi_event_handler: WIFI DISCONNECTED from SSID minnominni for REASON 201
W (65251661) wifi:alloc eb len=76 type=2 fail, heap:2872492
W (65251661) wifi:m f probe req l=0
W (65251911) wifi:alloc eb len=76 type=2 fail, heap:2872624
W (65251911) wifi:m f probe req l=0
W (65252151) wifi:alloc eb len=76 type=2 fail, heap:2872624
W (65252151) wifi:m f probe req l=0
W (65252391) wifi:alloc eb len=76 type=2 fail, heap:2872568
W (65252401) wifi:m f probe req l=0
W (65252641) wifi:alloc eb len=76 type=2 fail, heap:2872568
W (65252641) wifi:m f probe req l=0
W (65252881) wifi:alloc eb len=76 type=2 fail, heap:2872500
W (65252881) wifi:m f probe req l=0
W (65253121) wifi:alloc eb len=76 type=2 fail, heap:2872568
W (65253121) wifi:m f probe req l=0
W (65253361) wifi:alloc eb len=76 type=2 fail, heap:2872568
W (65253361) wifi:m f probe req l=0
W (65253611) wifi:alloc eb len=76 type=2 fail, heap:2872568
W (65253611) wifi:m f probe req l=0
W (65253851) wifi:alloc eb len=76 type=2 fail, heap:2872568
W (65253851) wifi:m f probe req l=0
W (65254091) wifi:alloc eb len=76 type=2 fail, heap:2872568
W (65254091) wifi:m f probe req l=0
W (65254821) wifi_event_handler: WIFI DISCONNECTED from SSID minnominni for REASON 201
thanks for your support
@enricop Thanks for reporting this. Could you please open a new issue with the details of this bug? The memory leak is a separate bug to the misleading log message, and it's going to be easiest to track and resolve it separately.
@mzimmers The root cause of failing to allocate from all DMA-capable memory was fixed for v4.0 in 947fc14
However, the log message still shows the total heap and this can be misleading because the allocation will only accept internal & DMA-capable memory (
MALLOC_CAP_8BIT|MALLOC_CAP_DMA), so it's possible to see a large number here and still be out of internal memory. This issue will be closed when that is fixed.Suggest calling
heap_caps_get_minimum_free_size(MALLOC_CAP_8BIT|MALLOC_CAP_DMA)some time after the error log line, and see what the value is. If it's very low, the system was legitimately out of internal memory at this time. (Details).To fix it, see if there's anything in your firmware that can be moved from internal memory to PSRAM.
Just before my call to esp_wifi_init():
I (5075) memreport: memReport(): there are 68424 MALLOC_CAP_8BIT | MALLOC_CAP_DMA bytes free.
I'm running release/v4.0. Is this fix in there?
One other tidbit: I disable one of my tasks, and here's the memreports:
(4987) memreport: memReport(): there are 77492 MALLOC_CAP_8BIT | MALLOC_CAP_DMA bytes free.
(execute esp_wifi_init() here)
I (7037) memreport: memReport(): there are 4324 MALLOC_CAP_8BIT | MALLOC_CAP_DMA bytes free.
First observation is: that's a lot of memory consumed by that call. Second observation is: How can I have less that what was reported when the other task was active? Do we have some weird uint16 rollover occurring that is giving me a false value?
To fix it, see if there's anything in your firmware that can be moved from internal memory to PSRAM.
Can you provide any suggestions on what might be movable? Also, any idea on how much I need to free up?
Thanks...
@mzimmers Make sure that you have enabled PSRAM for use by the memory manager via menu config.
You can also reduce the minimum size of allocations that are always placed in internal memory. This is ofc a balance between performance and memory.
If you have large items or many small items on your thread stacks you can allocated them on the heap instead and reduce the stack size of your tasks to free up internal memory.
@PerMalmberg performance isn't an issue. How do I go about reducing these allocations you mention?
I'm still interested in knowing (if possible) how much more memory I need. If it's on the order of 30-40K more, then fine-tuning heap vs. stack isn't going to do it.
And, I'm reluctant to use menuconfig, because whenever I do now, it causes my builds to fail.
How do I go about reducing these allocations you mention?
I don't know your code, but I can give a simple example.
Instead of this
char buffer[1024]
you do, in C++:
auto buffer = std::make_unique<char[]>(1024)
This places your buffer on the heap instead and reduces your needed stack for this particular method by sizeof(char) * 1024 - sizeof(std::unique Edit: This only works if you have enabled PSRAM for use by the memory allocator, otherwise it still ends up in internal memory. I'm still interested in knowing (if possible) how much more memory I need. I have a vague memory of it being specified in the docs, but I really don't know, sorry. And, I'm reluctant to use menuconfig, because whenever I do now, it causes my builds to fail. Then you should spend some time fixing that so that you can use the tools that are available to you. There are also other options in menuconfig that can be tweaked adjust amount of internal memory used by various parts of the system.
I (5062) memreport: memReport(): there are 80728 MALLOC_CAP_8BIT | MALLOC_CAP_DMA bytes free.
(run wifi_task_init();)
I (7002) memreport: memReport(): there are 9472 MALLOC_CAP_8BIT | MALLOC_CAP_DMA bytes free.
So, about 70,000 bytes. I'm no expert, but that sure seems like a big number.
Regarding menuconfig: I think there's a bug in it. When I run it, and make a change, it records my change, but it also puts in about 6K of stuff at the end, under the label "# Deprecated options for backward compatibility." It also adds a line "CONFIG_FREERTOS_TASK_FUNCTION_WRAPPER=y." And when I try to build (or even run menuconfig again), I get errors. The only fix is to restore my old sdkconfig file.
On the subject of the PSRAM, I have these two lines in my sdkconfig file:
# CONFIG_SPIRAM_TYPE_ESPPSRAM32 is not set
# CONFIG_SPIRAM_TYPE_ESPPSRAM64 is not set
Assuming I could get menuconfig to work right, where would I go to change these?
Thanks for the help.
70k _is_ huge. I don't remember mine using that much, but I've not looked at it for a long time.
Re. menu config. Looks like you've not kept your config up-to-date while updating ESP-IDF. I really have no easy solution for you, other than working through the changes. If you know the requirements for your app, then you could simply start scratch instead of updating.
I don't remember the exact menu names, but there are a section in menuconfig that specifically relates to PSRAM.
Has this been merged into v3.3 yet?
This bug not only results in a communication lock-up but also leaves the WiFi radio on, which on battery powered devices kills the battery fast!
Most helpful comment
Updating to use
MALLOC_CAP_8BIT|MALLOC_CAP_DMA|MALLOC_CAP_INTERNALdoes seem to solve the problem.