Nodemcu-firmware: Wifi connection lost..... No reconnection

Created on 13 Jul 2020  ·  76 Comments  ·  Source: nodemcu/nodemcu-firmware

NodeMCU 3.0.0.0 built on nodemcu-build.com provided by frightanic.com
branch: dev
commit: 2fa63a1303493aa66831af5466311436d11cf827
release:
release DTS: 202007071335
SSL: false
build type: integer
LFS: 0x40000 bytes total capacity
modules: file,gpio,mqtt,net,node,rtctime,sntp,tmr,uart,wifi
build 2020-07-08 00:54 powered by Lua 5.1.4 on SDK 3.0.1-dev(fce080e)

I am using the build above.
Over long runs, wifi gets disconnected and does not reconnect automatically, even if I do a soft reset using node.restart(). It reconnects only if I toggle the power supply.
I have had this issue with 4 ESP-07 devices.
Not sure whether it is a bug or not.

All 76 comments

I've noticed the same issue in some cases (I have not found the exact ones, so I have not reported it yet). The Wi-Fi monitor reports 201 when the connection is lost and not restored. It will not attempt to reconnect until a power cycle. Another strange behavior is with #define WIFI_STA_HOSTNAME: after writing the firmware you have to power cycle for the change to be reflected. I've checked the 5.3 version only. Any suggestions on how to catch the fault?

There is definitely something wrong with the wifi sta module: wifi gets disconnected after some time, reports 201 and never reconnects until a power cycle.
I think this is a bug.

@chathurangawijetunge Do you have any steps to reproduce this consistently? On the dev version with Lua 5.3, normal reconnection under normal circumstances works okay. I can't remember now the exact reason the module got stuck on 201 and would not reconnect until power cycled.

It only happens in long run over 24 hours...

And It happens with both master and dev LUA5.3

I am getting wifi.eventmon.reason.AUTH_EXPIRE (2)
and
wifi.eventmon.reason.ASSOC_EXPIRE (4)
and sometimes
wifi.eventmon.reason.NO_AP_FOUND (201)

The device will not reconnect even with wifi.sta.disconnect() followed by wifi.sta.connect(),
or even with node.restart().
Only a hard reset with the rts pin or a power cycle makes it connect.
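The numeric reason codes above are easier to follow in a log when printed with their names. A minimal sketch, assuming wifi.eventmon.reason can be iterated with pairs() on the device; when the wifi module is absent (for example on desktop Lua) it falls back to just the three codes quoted in this comment. describe_reason is a hypothetical helper name, not part of the firmware.

```lua
-- Reverse lookup for disconnect reason codes, so logs show names, not numbers.
-- On NodeMCU, wifi.eventmon.reason maps NAME -> code (assumed iterable here).
-- Off-device we fall back to the three codes quoted in this thread.
local reasons = (wifi and wifi.eventmon and wifi.eventmon.reason) or {
  AUTH_EXPIRE = 2,
  ASSOC_EXPIRE = 4,
  NO_AP_FOUND = 201,
}

-- Build code -> NAME once.
local reason_name = {}
for name, code in pairs(reasons) do
  reason_name[code] = name
end

-- Hypothetical helper: format a reason code for logging.
function describe_reason(code)
  return string.format("%s (%d)", reason_name[code] or "UNKNOWN", code)
end

print(describe_reason(201))
```

Inside a STA_DISCONNECTED callback this could be called as describe_reason(T.reason).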

3.0-master_20190907 works fine.

I've tried
wifi.sta.autoconnect(0)
wifi.sta.config({ssid="ssid",pwd="pwd",auto=false})
wifi.sta.connect()

With this the wifi connection is a little more stable. Not sure why this is happening.

Any updates regarding the above situation?

Glancing at the logs, I don't see anything interesting happening to wifi between 3.0-master_20190907 (i.e. 310faf7fcc9130a296f7f17021d48c6d717f5fb6) and 2fa63a1303493aa66831af5466311436d11cf827, but you might have a go with git bisect to see what happens.

Yes,
3.0-master_20190907 works fine but the issue is with the new dev.

Is anyone else having this issue?
It happens in long runs (>24 hours).

It may well be that nobody but you is experiencing this problem (yet); perhaps all our long-running esp8266es are still back on master. Because you are able to reliably see it, please git bisect and tell us when the problem first appeared. Otherwise, I'm afraid you'll just be waiting for someone else to notice, and that might not happen.

I don't know exactly how to do a git bisect,
but
3.0-master_20190907 works fine
and the issue is with
3.0-master_20200610 and (dev):

NO_AP_FOUND (201) after about 12-24 hours, with no way to recover until a power cycle.

I will leave 2 devices running #995114b with Lua 5.3 (I've obsoleted 5.1 in my head already) on a weak wifi connection (<-85 dBm) and will report back with results after a few days. As I mentioned before, I've also had the same issue with a few boards, but somehow I could not identify the problem, and now it works stably for me (at first glance).
Config will be:

station_cfg = {}
wifi.sta.sethostname("WiFitest")
wifi.sta.autoconnect(1)
station_cfg.ssid = "ssid"
station_cfg.pwd = "password"
station_cfg.save = true
wifi.sta.config(station_cfg)

With two eventmon callbacks:

wifi.eventmon.register(wifi.eventmon.STA_CONNECTED, function(T)
    print("\n\tSTA - CONNECTED" .. "\n\tSSID: " .. T.SSID .. "\n\tBSSID: " ..
              T.BSSID .. "\n\tChannel: " .. T.channel)
end)
wifi.eventmon.register(wifi.eventmon.STA_DISCONNECTED, function(T)
    print("\n\tSTA - DISCONNECTED" .. "\n\tSSID: " .. T.SSID .. "\n\tBSSID: " ..
              T.BSSID .. "\n\treason: " .. T.reason)
    connectedToMqtt = false
end)

Because it will use mqtt, I will add additional check:

function MyMqtt.watch_mqtt()
    tmr.create():alarm(10000, tmr.ALARM_AUTO, function()
        if not connectedToMqtt and wifi.sta.getip() ~= nil and wifi.eventmon.STA_CONNECTED == 0 then
            m:close() print('Reconnecting to Mqtt!') collectgarbage()
            tmr.create():alarm(1000, tmr.ALARM_SINGLE, function()
                MyMqtt.Connect()
            end)
        elseif not connectedToMqtt and wifi.sta.getip() == nil then
            wifi.sta.config(station_cfg)
        end
    end)
end

MQTT will also serve as an indicator of a lost and not restored connection, if that happens.

At what point is MyMqtt.watch_mqtt() called?
Do we need to call wifi.sta.autoconnect(1) separately, given that auto defaults to true in wifi.sta.config()?
And do we have to call wifi.sta.config(station_cfg) again if the connection gets lost, given that wifi.sta.autoconnect was enabled at the beginning?

@KT819GM: wifi.eventmon.STA_CONNECTED == 0 is a comparison between two constants; I do not think it means what you think it means?
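Since that comparison involves only constants, it cannot reflect the link state. One way the watchdog's decision could be written instead is sketched below, using only the MQTT flag and the result of wifi.sta.getip(). next_action is a hypothetical helper, and the action names are illustrative; this is not the code the poster ended up using.

```lua
-- Decide what the periodic watchdog should do.
--   mqtt_up: boolean maintained by the MQTT callbacks (connectedToMqtt above)
--   ip:      result of wifi.sta.getip(); nil when the station has no address
function next_action(mqtt_up, ip)
  if mqtt_up then
    return "none"              -- everything is fine
  end
  if ip ~= nil then
    return "reconnect_mqtt"    -- wifi is up, only the broker link is down
  end
  return "reconfigure_wifi"    -- no address: wifi itself needs attention
end

-- Off-device demonstration of the three cases.
print(next_action(true, "192.168.1.20"))
print(next_action(false, "192.168.1.20"))
print(next_action(false, nil))
```

On the device, the timer callback would call next_action(connectedToMqtt, wifi.sta.getip()) and then run MyMqtt.Connect() or wifi.sta.config(station_cfg) depending on the result.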

Also, wifi.eventmon.STA_CONNECTED was always 0.

@chathurangawijetunge Please either edit your earlier comments or merely refrain from making duplicate comments like that. They are, like the duplicate issues, not conducive to conversation.

@KT819GM: wifi.eventmon.STA_CONNECTED == 0 is a comparison between two constants; I do not think it means what you think it means?

Yeah, it was a bit of a brain fart. I got stuck experimenting with wifi.NULLMODE, so I did even more stupid things like asking a question about it on gitter, and that's the reason why I don't like to provide code examples, not being a programmer.

@KT819GM: At what point does MyMqtt.watch_mqtt() is called..?
do we need to call wifi.sta.autoconnect(1) separately as in wifi.sta.config() defaults is true ...?
and do we have to reconfigure wifi.sta.config(station_cfg) if connection gets lost as wifi.sta.autoconnect is enabled in the beginning...?
and wifi.eventmon.STA_CONNECTED is all was 0

MyMqtt.watch_mqtt() is started after mqtt has connected. I have not posted the full code, for the reasons I gave a bit higher up.
wifi.sta.autoconnect(1): I usually declare what I use; yes, it defaults to true anyway.
wifi.sta.config(station_cfg): this is what does the reconnection when wifi.sta.autoconnect(1) fails. As an example:
connect to wifi, switch to wifi.NULLMODE, then switch back to wifi.STATION; wifi.sta.autoconnect(1) will not be active, and only wifi.sta.config(station_cfg) will put it back online. This is my dirty workaround for more stable wifi.

p.s. both units online, one at constant -86 / -92 dBm

There is definitely something weird going on on the dev branch. Today, the node that I'm working on got into a state where it wouldn't connect to the AP. It kept on giving eventmon reason 23 (a type of auth fail) and the AP also reported

Aug 11 22:21:43.273 | Debugging | Station 2c3a.e835.f4eb Authentication failed

I tried redoing the wifi.sta.config({ssid="correct ssid", pwd="correct pw"}) and it didn't help. I tried changing modes to wifi.NULLMODE and back to wifi.STATION -- no help. I tried power cycling the board. No help.

I tried disabling the AP that it was trying to connect to so that it would switch to a different AP. No help.

I switched to wifi.sta.config({ssid="correct ssid", pwd="incorrect pw"}) and then back to correct. No help.

I switched to wifi.sta.config({ssid="incorrect ssid", pwd="correct pw"}) and then back to correct. This worked.

The eventmon data showed the correct ssid.

This started after I loaded a new LFS image. I have no idea whether this is related -- I include it for completeness.

Not just with dev this happens in 3.0-master_20200610
too.....
Only stable version for me is 3.0-master_20190907

I'm adding some code so that I can read out the last 12k of flash (where the wifi setup information is stored) and see when/if it changes unexpectedly.

Any luck in finding the bug?
Or could you share the code so I can also try it with some modules? Losing wifi is a major issue.

I managed to reproduce it today. It turns out that (I think) SPIFFS writes into the flash area at the end of the flash chip and overwrites the wifi settings. This is ugly.

Normally the last 12k doesn't change -- even on a reboot. However, sometimes it does -- it could be to do with reloading the LFS region -- that was when it happened. However, the LFS partition also got corrupted at that time, so I don't know whether I can really blame it. It was SPIFFS data that was found in the last 12k. I suppose that I ought to check that the SPIFFS partition doesn't overlap the end of the flash.

@pjsg, this shouldn't make any difference, because the SDK is supposed to use the PT now. See my comments in #3260. If there are some bits of code in our current SDK that are still writing to the old locations then we have wider issues that we need to scope and understand. We are currently running an old 3.0 SDK version. My first instinct would be to rebaseline to a current version and see if that fixes the problem before abandoning use of the Partition Table.

We currently use SDK 3.0.1.

From what you say, saving the default wifi.sta.config ssid + password does not use the PT partition but writes to this old SDK 2.x area. We can just erase a chip and set the SPIFFS at 0x100000; then the 5 page region should be left at FF. I will try to see if saving SSID credentials corrupts this. I'll also rebaseline our SDK to 3.0.4 and see if this repeats the issue.

This is my partition table:

[{"size":4096,"address":45056,"type":4},{"size":4096,"address":49152,"type":5},{"size":12288,"address":53248,"type":6},{"size":45056,"address":0,"type":101},{"size":507904,"address":65536,"type":102},{"size":131072,"address":573440,"type":103},{"size":3469312,"address":704512,"type":106}]

I just iterated through all the partition types and these were the values that were returned. This looks plausible, but nevertheless, when you do wifi.sta.config, it does overwrite the last 12k of flash (I have a 4MB flash chip).

This is as below and this looks pretty typical. It is worth moving SPIFFS to 1M for 1M, say, so you can see exactly what is writing to the forbidden region. Let me have a play.

type|address|size
----|---|----
SYSTEM_PARTITION_RF_CAL|0xB000|0x1000
SYSTEM_PARTITION_PHY_DATA|0xC000|0x1000
SYSTEM_PARTITION_SYSTEM_PARAMETER|0xD000|0x3000
NODEMCU_PARTITION_EAGLEROM|0x0000|0xB000
NODEMCU_PARTITION_IROM0TEXT|0x10000|0x7C000
NODEMCU_PARTITION_LFS0|0x8C000|0x20000
NODEMCU_PARTITION_SPIFFS|0xAC000|0x34F000

I've just tried various combinations of setmode(), sta.config(), getconfig, connect and autoconnect, including valid and invalid SSID / password combos. All works as expected and it only seems to be the valid system partitions that are getting updated. Every dump of 0xFB000 is 20 KB × 0xFF, so if the SDK is writing to this, it's going to be on some weird error path.

When I program a new SSID, it shows up in the following places in the flash:

Address 3fe010
Address 3fe144

It looks as though it is ignoring the partition table.
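The arithmetic behind that conclusion can be checked off-device. A minimal sketch using the partition addresses and sizes from the table above and the 4 MB flash size mentioned earlier: it reports which listed partition (if any) contains each address where the SSID turned up, and whether the address lies in the last three 4 KB pages of the chip. The helper names are hypothetical and only three of the partitions are listed.

```lua
local FLASH_SIZE = 4 * 1024 * 1024   -- 4 MB chip, as stated in the thread
local PAGE = 4096

-- Subset of the partition table posted above (address, size).
local parts = {
  { name = "SYSTEM_PARAMETER", addr = 0xD000,  size = 0x3000 },
  { name = "LFS0",             addr = 0x8C000, size = 0x20000 },
  { name = "SPIFFS",           addr = 0xAC000, size = 0x34F000 },
}

-- Return the name of the listed partition containing addr, or nil.
function partition_of(addr)
  for _, p in ipairs(parts) do
    if addr >= p.addr and addr < p.addr + p.size then
      return p.name
    end
  end
  return nil
end

-- True when addr falls in the last three 4 KB pages (12 KB) of the chip.
function in_legacy_param_area(addr)
  return addr >= FLASH_SIZE - 3 * PAGE and addr < FLASH_SIZE
end

for _, addr in ipairs({ 0x3fe010, 0x3fe144 }) do
  print(string.format("0x%x partition=%s last_12k=%s",
    addr, tostring(partition_of(addr)), tostring(in_legacy_param_area(addr))))
end
```

With these numbers SPIFFS ends at 0x3FB000, so both addresses sit outside every listed partition and inside the last 12 KB, which starts at 0x3FD000. That fits the observation that the write does not go to the SYSTEM_PARAMETER partition at 0xD000.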

I added

static int node_peek(lua_State *L) {
  luaL_Buffer b;
  luaL_buffinit( L, &b );
  char *p = luaL_prepbuffer(&b);

  uint32_t addr = luaL_checkint(L, 1);
  uint32_t len = luaL_checkint(L, 2);

  if (len > LUAL_BUFFERSIZE) {
    len = LUAL_BUFFERSIZE;
  }

  int size = platform_s_flash_read(p, addr, len);

  luaL_addsize(&b, size);
  luaL_pushresult( &b );
  return 1;
}

to node.c and added

  LROT_FUNCENTRY( peek, node_peek )

And then ran:

local last = 0
for addr = 0, 4 * 1024 * 1024 - 256, 128 do
  local data = node.peek(addr, 256)
  local _start, _end = string.find(data, "test1234")
  if _start then
    if last ~= addr + _start - 1 then
        last = addr + _start - 1
        print (string.format("Address %8x", last))
    end
  end
  tmr.wdclr()
end   

to search for the test1234 ssid
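The loop above reads 256 bytes every 128 bytes, so consecutive windows overlap and a match that straddles a read boundary is still seen whole in one of them. A small off-device sketch of the same idea, with the flash replaced by a Lua string; scan and the fake reader are hypothetical stand-ins for the loop and for node.peek.

```lua
-- read(addr, len) must return up to len bytes starting at addr
-- (node.peek plays this role in the thread).
function scan(read, total, needle, window, step)
  local hits, last = {}, -1
  for addr = 0, total - 1, step do
    local data = read(addr, window)
    local s = string.find(data, needle, 1, true)  -- plain search, no patterns
    if s then
      local abs = addr + s - 1
      if abs ~= last then          -- overlapping windows report a hit once
        hits[#hits + 1] = abs
        last = abs
      end
    end
  end
  return hits
end

-- Fake 1 KB "flash": erased bytes with the needle straddling offset 128.
local flash = string.rep("\255", 124) .. "test1234" .. string.rep("\255", 892)
local function fake_read(addr, len)
  return flash:sub(addr + 1, addr + len)
end

for _, a in ipairs(scan(fake_read, #flash, "test1234", 256, 128)) do
  print(string.format("Address %8x", a))
end
```

Like the original loop, this reports only the first match in each window, so two copies closer together than the step size could be reported as one.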

Bugger. My D1 Pros have 16Mb and the pages at 0xFD000, 0xFE000 and 0xFF000 are getting written to as well as the PT partition copies. Clearly some code uses the PT, some ignores it and uses the fixed locations. I have tried upping the SDK version to the latest V3.0.4 and it makes no difference to this issue. Here is the Makefile patch to do this:

@@ -20,12 +20,10 @@ else ifeq ("$(RELEASE)","master")
   SDK_ZIP_ROOT   := ESP8266_NONOS_SDK-$(SDK_VER)
   SDK_FILE_VER   := $(SDK_VER)
 else
-# SDK_VER        := 3.0
-# SDK_FILE_VER   := v$(SDK_VER)
-  SDK_FILE_VER   := e4434aa730e78c63040ace360493aef420ec267c
-  SDK_VER        := 3.0-e4434aa
-  SDK_FILE_SHA1  := ac6528a6a206d3d4c220e4035ced423eb314cfbf
-  SDK_ZIP_ROOT   := ESP8266_NONOS_SDK-$(SDK_FILE_VER)
+  SDK_VER        := 3.0.4
+  SDK_FILE_VER   := v$(SDK_VER)
+  SDK_FILE_SHA1  := 24c702348ee5b9ae3ae640c1aead60c97768c881
+  SDK_ZIP_ROOT   := ESP8266_NONOS_SDK-$(SDK_VER)
 endif
 SDK_REL_DIR      := sdk/esp_iot_sdk_v$(SDK_VER)
 SDK_DIR          := $(TOP_DIR)/$(SDK_REL_DIR)

If you unpack the SDK from the cache to get the examples/IoT_demo/user/user_main.c which is what the documentation refers you to: the RFcall, PhysData and SP partitions are implicitly assumed to be the _last_ 1+1+3 pages of physical flash memory. We need to update our user_main code to reflect this. I will do this tomorrow. Time for bed for me.

PS. Double checking, the RFcall, PhysData partitions are based on the locations in the PT. It is only ignored for the SP, which is where the wifi stuff is written to.

PS. Philip, you really need to do a flash erase before reflashing to make sure that you've got rid of legacy stuff and use nodemcu-partition to set the spiffs out of the way. e.g. 1Mb for ½Mb

Even with

NodeMCU 3.0.0.0
branch: dev
commit: a0d27059bcb8663bb9c8a9367142dbdc8aa7ad9d
release: 3.0-master_20200610 +25
release DTS: 202009031150
SSL: false
build type: integer
LFS: 0x40000 bytes total capacity
modules: file,gpio,mqtt,net,node,rtctime,sntp,tmr,uart,wifi
build 2020-09-04 17:10 powered by Lua 5.1.4 on SDK 3.0.1-dev(fce080e)

I am getting the same error: wifi will not auto reconnect.
I am using an ESP-12 with 4M, and since I'm on Windows I use PyFlasher with the option 'yes, wipe all data' enabled to flash the firmware, but no luck.
And since the issue is still there, why is this issue closed?

I tried with esptool.py erase_flash but no luck; after some time I get error 201 [wifi.eventmon.reason.NO_AP_FOUND].
With wifi.sta.getconfig() I get a valid SSID, PWD and BSSID, but there is no reconnection, even with node.restart(), until a power cycle.

This is a major issue, guys.

I'm not sure now, but it's possible that I got that behavior when I've compiled version without limiting spiffs (I always do #define SPIFFS_MAX_FILESYSTEM_SIZE and #define SPIFFS_SIZE_1M_BOUNDARY on ESP-07S 4M devices). So maybe because of that I do not get this problem anymore (I have approx 15 units now living their life on MQTT).

p.s. one of devices I left 26 days ago is still alive and has active connection. Sadly second one had to go, to do some other work 😄

OK I will do the same and let u guys know....

After nearly 3 months I have finally managed to run for over 24 hours without connection issues by enabling #define SPIFFS_SIZE_1M_BOUNDARY.
I really don't know how enabling SPIFFS_SIZE_1M_BOUNDARY affects the wifi connection,
so I thought of sharing my findings with you guys.

So it seems it has something to do with Philip's observations. Terry will have to look at this again (I hope).

We have 9 maintainers on the project and at least 4 are active and C developers. I am the one who focuses on the Lua core, and that is where we are doing most of the development ATM. I usually leave networking to the others. If as @chathurangawijetunge says, lifting SPIFFS above the 1M boundary removes this failure mode then this could imply that some code is accessing the high 5 pages in the first 1Mb. Something for Nathaniel or Philip to check?

The same error popped up after about 48 hours on 1 device (I have 2 devices connected).
Now I'm trying with both

#define SPIFFS_SIZE_1M_BOUNDARY

and

#define SPIFFS_MAX_FILESYSTEM_SIZE 0x20000

Will update with the findings.

No, even with

#define SPIFFS_SIZE_1M_BOUNDARY

and

#define SPIFFS_MAX_FILESYSTEM_SIZE 0x20000

the error is still there: after about 18 hours both devices got disconnected and did not reconnect.

@chathurangawijetunge if your SPIFFS is 1Mb for 2Mb then nothing is writing to the system parameter area, AFAIK, other than the SDK. Have you considered that this isn't a NodeMCU bug at all but something to do with your network config / router? If this was a NodeMCU issue then I'd expect more users to be reporting it or to be able to reproduce it. For example, do you have a short default DHCP lease which is expiring, so that your router is ignoring your ESPs?

My router DHCP lease time is 259200 sec.
The thing is, when I flash 3.0-master_20190907 on the same ESP8266 it works fine.
It happens only with 3.0-master_20200610 and dev, and I am using the node.setonerror() function. The code is below:

node.setonerror(function(e)         
      print("'Panic Error: '",e)  
      file.open("log", "a+")
      file.write(e)
      file.close()
      node.restart()
end)

I will remove node.setonerror() and try.
I am also using gpio.serout() to show the wifi status on an LED; not sure if gpio.serout() has anything to do with the wifi connection.

OK, so now it seems I can see the exact scenario for this error. As @chathurangawijetunge is using SPIFFS for error logging, it definitely has something to do with it. I've just recreated the same scenario with a ~122 KB file: I was uploading a *.gz file to SPIFFS with ESPlorer. The write failed at some point (~40 %) and at the same moment

   STA - DISCONNECTED
   SSID: ssid
   BSSID: 00:00:00:00:00:00
   reason: 201

was fired. Now, in that state the ESP does not reconnect even after a restart. Manually calling wifi.sta.config(station_cfg) does not help either. After a power cycle the module reconnected.
I've observed such behavior only with SPIFFS. All modules that are working OK do not use it; everything is in LFS. If anyone has time to help on gitter or elsewhere with GDB: I'm thinking that if I physically short-circuit the hardware pins on the flash I could imitate a flash write failure and catch the exact error. Or maybe @pjsg could add his 2 cents on how to catch that error.

All my code is also in LFS. I think this happens when code runs from LFS. Not sure; I will do some more testing.

What is slightly surprising to me is that a power-cycle brings the module back to life. This implies that the flash is not corrupted. I'm wondering if the act of writing to the flash is causing some interference with the wifi controller. Maybe interrupts get disabled for too long.

I've played with it for about an hour. On a heavily used MCU (TLS MQTT running) I could easily force it to disconnect at the time of file writing, but in most cases it returns to the connected state. Still, twice I reproduced the fatal disconnection where only a power cycle helped.

Now I think I see where I made a mistake: the WiFi credentials were stored / hardcoded in LFS. Will try again tomorrow with enduser setup.

Are you doing the wifi.sta.config() on every startup? Or do you rely on the fact that they are saved in the flash between boots?

On this unit wifi.sta.config() was running on every reboot and it had the SSID and password saved in LFS. That's possibly why it reconnected after a power cycle. But it did not reconnect on a software restart, even though wifi.sta.config() was run then as well.

We are currently running on 3.0.1 and the current is 3.0.4. We'll rebaseline the SDK immediately after the next master drop. This might help.

I have the following code running on 3 modules:

 local Status_led = 4 
 for i=0, 8, 1 do gpio.mode(i, gpio.OUTPUT) gpio.write(i,0) end 
 wifi.setmode(wifi.STATION) 
 wifi.sta.clearconfig() 
 wifi.sta.config({ssid="SSID" ,pwd="PWD"}) 
 --~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
 local Status_led_map={0,2,3,4,0,1} -- 1=Connected 2=Connecting 3=password error 4=AP not found 
 tmr.create():alarm(5000, 1, function() 
       tmr.softwd(30) 
       local WiFi_Status=Status_led_map[wifi.sta.status()+1] 
       if WiFi_Status==0 then 
          gpio.write(Status_led,0)  
       else    
          gpio.serout(Status_led,gpio.LOW,{180000,180000},WiFi_Status,nil) 
      end      
      print(wifi.sta.status())   
 end) 
 --~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ `

After running overnight all 3 went offline with error '201 AP not found'. This is in my init.lua, not in LFS (now).
I will try the same with SDK 3.0.4 and update with the findings.

I've also bumped the SDK to 3.0.4 and left one device alive with wifi.sta.config() inactive. I attached it to broker.hivemq.com on channel 9741000/#. It sends data every 20 sec, with content: {"Serial": "9741000", "Boot reason": "1", "Heap": "32776", "rssi": "-84", "Live": 269, "powered by Lua 5.3.5 on SDK 3.0.4(9532ceb)"} where Boot reason indicates 1 - power cycle, 2 - software restart, and Live is the tmr.time() counter in seconds.

It seems that the device has had a WDT restart...
2020-09-11 18:38:08 Topic: 9741000/data Qos: 0
{"Serial": "9741000", "Boot reason": "4", "Heap": "32512", "rssi": "-81", "Live": 6968, "powered by Lua 5.3.5 on SDK 3.0.4(9532ceb)"}

Yeah, because "somebody" did an mqtt publish without checking whether mqtt is available at all 😄. I've left it on battery and sadly can't fix that now. Still, if wifi fails it should not reconnect even after a restart.

True... but in my experience this error happens only after more than about 24 hours, so if the device reboots in between it might not show up.

Not sure if this is related to this issue, but I have noticed that wifi.sta.clearconfig() won't clear the MAC address of the previously connected router.
And once I get the error '201 AP not found',

function listap(t)
    for k,v in pairs(t) do
        print(k.." : "..v)
    end
end
wifi.sta.getap(listap)

does nothing.

wifi.sta.clearconfig() should clear all wifi config, but after wifi.sta.clearconfig() and wifi.sta.getdefaultconfig() I get

ssid = "" (empty string)
pwd = "" (should be nil)
bssid_set = 0
bssid = 34:e8:94:04:7a:a0 (old bssid; should be ff:ff:ff:ff:ff:ff)
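A check like this can be automated so that a stale field is flagged on every boot. The sketch below takes a table with the four fields listed above and returns the names of any that still hold old data. The expected "cleared" values are taken from this comment and are an assumption, not something verified against the SDK; an empty pwd is treated as cleared here. stale_fields is a hypothetical helper name.

```lua
-- Return the names of config fields that still look populated after
-- wifi.sta.clearconfig(). Expected cleared state (assumption from the
-- comment above): empty ssid, empty or nil pwd, bssid_set 0, bssid all ff.
function stale_fields(cfg)
  local stale = {}
  if cfg.ssid ~= nil and cfg.ssid ~= "" then
    stale[#stale + 1] = "ssid"
  end
  if cfg.pwd ~= nil and cfg.pwd ~= "" then
    stale[#stale + 1] = "pwd"
  end
  if cfg.bssid_set ~= nil and cfg.bssid_set ~= 0 then
    stale[#stale + 1] = "bssid_set"
  end
  if cfg.bssid ~= nil and cfg.bssid:lower() ~= "ff:ff:ff:ff:ff:ff" then
    stale[#stale + 1] = "bssid"
  end
  return stale
end

-- Values reported in the comment above.
local reported = { ssid = "", pwd = "", bssid_set = 0, bssid = "34:e8:94:04:7a:a0" }
print(table.concat(stale_fields(reported), ","))
```

On the device the table would come from the firmware rather than being typed in, for example from wifi.sta.getdefaultconfig(true) if the build returns a table for that call.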

Your device is re starting.... not running continuously

Even with LUA 5.3 wifi issue is still there. I guess I have to stick with 3.0-master_20190907 for my projects 🙁 where wifi is stable..

@chathurangawijetunge, are you saying that 3.0-master_20190907 doesn't manifest this issue but 3.0-master_20200610 does? If so this piece of data will help to work out any underlying failure.

Your device is re starting.... not running continuously

To be honest, this is the first time I have seen a watchdog restart, and I think it came from the code I added from your example, tmr.softwd(30). Still, let's treat the test as unsuccessful and go further.

@TerryE Please consider checking WiFi while UART writing is in progress - I can get disconnections reliably when simply transferring files to SPIFFS. It reconnects most times, but sometimes it does not.

Yes @TerryE, I have devices that have been running on 3.0-master_20190907 for over 6 months without any issue.
But 3.0-master_20200610 gets wifi disconnects, sometimes after about 2 hours and sometimes after 24 hours or more.
Yesterday I even bought a new ESP-12 and tested it. It got disconnected after about 14 hours with error 201 AP not found (only a power cycle or hard reset reconnects).
(3.0-master_20190907 is working perfectly.)
modules: file,gpio,mqtt,net,node,rtctime,sntp,tmr,uart,wifi

I'm going to try the example above and see if it does anything strange. I'm using a regular nodemcu board with nothing attached.

However, I've been running a node off the dev branch for a while and after fixing the issue with spiffs overwriting the config, it has been rock solid.

Thanks Philip 😊

After 24 hours, it is still running fine. Note that this code doesn't do anything except check for the status of the wifi. Some of the comments in this thread associated the failures with writing to spiffs.

Another aspect that is different is the radio environment. I have a number of Ubiquiti APs and I have pretty strong signal and I'm running WPA2.

When this fails, what environment does it fail in? @chathurangawijetunge

My wifi settings are as follows:
RTS/CTS Threshold = 2347
Wireless Mode = 802.11b+g+n
Channel Bandwidth = 20/40 MHz
Authentication Type = WPA2-PSK
Encryption = AES

I am also running the following code to check the switch status with a timer of 50 ms:

if #(Out_Pin or {})~=3 then
Out_Pin={}
      Out_Pin[1] = 5 --GPIO-14  
      Out_Pin[2] = 6 --GPIO-12
      Out_Pin[3] = 7 --GPIO-13
else print("'user Define out put pin set") end

--if table.getn(Sw_Pin or {})~=3 then
if #(Sw_Pin or {})~=3 then
Sw_Pin={}
      Sw_Pin[1] = 0 --GPIO-4
      Sw_Pin[2] = 1 --GPIO-5
      Sw_Pin[3] = 2 --GPIO-16      
else print("'user Define Switch pin set") end

Timer_status = {} 
local Sw_Master = 3  -- GPIO0 and (+3.3v) 
local prese_ctn=0
----------------------------------------------------------------------
gpio.mode(Sw_Master,gpio.INPUT)
gpio.write(Sw_Master,0)
--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
if #(file.getcontents("led") or "")~=6 then file.remove("led") end
if file.open("led", "r") then
   for i=1, 3, 1 do
     gpio.write(Out_Pin[i],string.gsub(file.readline(),"\n",""))
   end     
   file.close()
end
--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
function write_led_Status()
   file.open("led", "w")  
   for i=1,3, 1 do
     file.writeline(Timer_status[i]==nil and gpio.read(Out_Pin[i]) or 0)
   end
   file.close()
end
--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
local function manual_on_off(pin) 
     gpio.write(Out_Pin[pin],gpio.read(Out_Pin[pin]) == 1 and 0 or 1)
     write_led_Status() 
     pcall(LED_ON_OFF,pin,gpio.read(Out_Pin[pin]) == 1 and "on" or "off",1)
end
--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
local Sw_Clicks=0
local mytimer = tmr.create()
--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
local mytimer1 = tmr.create()
local sw_sta=gpio.read(Sw_Master)
local debounce=0
--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
local function check_master() 
   local function Process_switch_Press()
         if prese_ctn>=10 then 
            pcall(system_reboot,1)                  
         elseif prese_ctn>=1 and prese_ctn<=3 then
            manual_on_off(prese_ctn)
         elseif prese_ctn~=0 then
            pcall(Beep,3)
         end
         prese_ctn=0 
   end                
   if sw_sta==0 and gpio.read(Sw_Master)==1 and math.abs(tmr.now()-debounce)>250000 then
      debounce=tmr.now()
      prese_ctn=prese_ctn+1
      pcall(Beep,1)
      mytimer1:alarm(750, tmr.ALARM_SINGLE, function () 
           if gpio.read(Sw_Master)==1 then
              local long_press=tmr.time()
              mytimer1:alarm(500,1,function(t)
                 if math.abs(tmr.time()-long_press)==4 then 
                    t:stop() prese_ctn=0
                    pcall(Go_AP_Mode) 
                 elseif gpio.read(Sw_Master)==0 then
                      t:stop()
                      Process_switch_Press()
                 end
              end) 
           else    
           Process_switch_Press()
          end     
      end)
   end 
   sw_sta=gpio.read(Sw_Master)
end
--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
local blink_ctn=0
tmr.create():alarm(50, 1, function()
    check_master()
    blink_ctn=blink_ctn>8 and 0 or blink_ctn+1
    for i=1,3, 1 do
        if Timer_status[i]=="timer" then
           gpio.write(Sw_Pin[i], blink_ctn<=4 and 1 or 0)
        else
           gpio.write(Sw_Pin[i],gpio.read(Out_Pin[i])==1 and 0 or 1) 
        end
    end 
end)
--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I will remove the above code, check whether this problem is related to it, and post an update.

I'm wondering if this is due to your 50ms timer: maybe the wifi stack is not getting enough CPU to actually maintain things. If this is the case, then maybe the Lua firmware could detect this case, and warn about it.
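Until the firmware does something like that, the check can be approximated from the Lua side. The following is only a sketch: the helper name timed and the 40 ms budget are made up for illustration, and it relies on tmr.now() returning a microsecond counter that wraps at 31 bits.

```lua
-- Wrap a callback so that it reports when its run time exceeds a budget.
-- now_us    : function returning a microsecond counter (tmr.now on NodeMCU)
-- budget_us : warn when the callback takes longer than this
-- warn      : function(elapsed_us), called on overrun
local function timed(now_us, budget_us, fn, warn)
  return function(...)
    local t0 = now_us()
    fn(...)
    local elapsed = now_us() - t0
    if elapsed < 0 then
      -- The counter wrapped at 2^31; add it back in two steps so the
      -- arithmetic also stays in range on integer-only firmware builds.
      elapsed = elapsed + 2147483647 + 1
    end
    if elapsed > budget_us then warn(elapsed) end
    return elapsed
  end
end

-- Usage on a device (skipped when the tmr module is not present):
if tmr then
  local cb = timed(tmr.now, 40000,
    function() --[[ the real 50 ms work goes here ]] end,
    function(us) print("timer callback took " .. us .. " us") end)
  tmr.create():alarm(50, tmr.ALARM_AUTO, cb)
end
```

If the warning fires regularly, the callback is using most of its 50 ms slot and leaves little time for anything else.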

I'm wondering if this is due to your 50ms timer

You are calling a pretty complicated function with various nested function calls every 50 ms. What happens if its execution time is near or over 50 ms? You will always have a task ready to run and so are breaking SDK scheduling rules, as you are starving the WiFi stack of the ability to run low priority housekeeping. My FAQ and the SDK API guides warn that this might happen.
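One way to guarantee an idle gap, sketched here under the assumption that the work does not need a strict 50 ms period: use tmr.ALARM_SEMI and re-arm the timer only after the callback has returned, so the interval is measured from the end of one run to the start of the next. The helper name run_with_gap is hypothetical.

```lua
-- Run work() repeatedly with at least gap_ms of idle time between runs.
-- timer_mod is the tmr module; it is passed in so the helper can be
-- exercised off-device with a stand-in.
local function run_with_gap(timer_mod, gap_ms, work)
  local t = timer_mod.create()
  t:register(gap_ms, timer_mod.ALARM_SEMI, function(timer)
    -- pcall keeps the loop alive if work() raises an error
    local ok, err = pcall(work)
    if not ok then print("work failed: " .. tostring(err)) end
    timer:start()  -- re-arm only now, after the work has finished
  end)
  t:start()
  return t
end

-- Usage on a device (skipped when the tmr module is not present):
if tmr then
  run_with_gap(tmr, 50, function() --[[ switch polling goes here ]] end)
end
```

With tmr.ALARM_AUTO the next run is due 50 ms after the previous one started, however long it took; with this pattern a slow callback simply lowers the polling rate instead of queuing work back to back.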

I am really tempted to close this unless you can do what the issue template asks for and that is to provide a minimal complete example that shows the failure mode.

@pjsg, the task scheduling rules are what they are. IMO, it would be impractical to try to detect when a Lua developer isn't following them.

I understand... but if this code has worked continuously for many months on 3.0-master_20190907, why not with the new firmware?
As I said before, I have removed the above code and have about 3 ESP-12 modules running now. I will update with the results.

provide a minimal complete example that shows the failure mode

👍

if this code has worked continuously for many months on 3.0-master_20190907, why not with the new firmware?

That's a totally different question - a valid one, but kind of OT here. Nothing is ever going to be infinitely backwards compatible. Either our code or the Espressif SDK may change the behavior of your code. For our code we strive to mention breaking changes in the release notes.

I understand... but if this code has worked continuously for many months on 3.0-master_20190907, why not with the new firmware?

Feel free to ask the Q, and even try to answer it yourself. However if you want one of the maintainers to answer it for you and to fix the issue, then the first step is (as we ask) to supply a minimal, complete, and verifiable example that we can use to examine the core issue and determine a fix.

First, with all respect to the devs: please don't take this as a "cry to developers / hammer developers to find a non-existing bug" thread. I've spent the last two days checking the commit history, so on the bright side I've learned to use git more than I thought I would ever need. As you already found, I'm also facing issues with WIFI, but unlike @chathurangawijetunge I can't tell exactly when I started facing them. I will try to describe them in as much detail and as properly as my non-native English allows:

  1. Like the thread OP, I'm facing issues with the WiFi module that recur occasionally but are not consistently reproducible, which makes it hard to provide a verifiable example. Mostly:

    • wifi.eventmon.reason.NO_AP_FOUND 201

    • wifi.eventmon.reason.ASSOC_FAIL 203

  2. These events fire when the CPU is under heavy load, with timers firing repeatedly and, in my case, badly written code. Still, I'm fairly convinced that the C side of the firmware should be responsible for maintaining the WIFI connection, regardless of whether the Lua part was written by a highly skilled dev like Terry or by me. Bad Lua code should be rewarded with a panic and reboot.

_Leaving literary part aside_

After Philip joined this thread, I removed most of the static WIFI config and left the first-time connection to be made by enduser_setup.start(), with these static config leftovers:

wifi.setmode(wifi.STATION)
wifi.sta.autoconnect(1)
wifi.sta.sethostname("TLStest")
wifi.setcountry({
    country = "LT",
    start_ch = 1,
    end_ch = 13,
    policy = wifi.COUNTRY_MANUAL
})

wifi.eventmon.register(wifi.eventmon.STA_CONNECTED, function(T)
    print("\n\tSTA - CONNECTED" .. "\n\tSSID: " .. T.SSID .. "\n\tBSSID: " ..
              T.BSSID .. "\n\tChannel: " .. T.channel)
end)
wifi.eventmon.register(wifi.eventmon.STA_DISCONNECTED, function(T)
    print("\n\tSTA - DISCONNECTED" .. "\n\tSSID: " .. T.SSID .. "\n\tBSSID: " ..
              T.BSSID .. "\n\treason: " .. T.reason)
    _G.connectedToMqtt = false
end)

I compiled the firmware with Lua 5.3, 0x40000 for LFS and 0x80000 for spiffs (SPIFFS_FIXED_LOCATION and SPIFFS_SIZE_1M_BOUNDARY), 28 modules, and SSL and TLS enabled but not used in this code. I added Lua code which uses 2 tmr[auto], connects to a non-TLS mqtt server and every 10 seconds sends a string of < 100 bytes of data into LFS. I left it sending data and logged the data into "influxdb". After ~18 hours it stopped sending data and on the UART I got code 203. For comparison, a few other devices were connected to the same network (some with exactly the same firmware, some with arduino) and the same mqtt server, and they were still sending data. I'm pretty sure it is safe to say that it wasn't the WIFI router's fault.

_Moving further_
I've observed some random disconnects with failure 201 when uploading a file into spiffs while Lua code was already running. I thought it was a coincidence, but I was able to repeat it, and on rare occasions it went to a catastrophic failure where only a hard reset helped (power disconnect / reconnect). I still have no exact procedure to repeat this failure, as it seems to depend on the code running on the ESP8266 and the disconnect still occurs randomly, so I started looking for changes made to wifi.c in the commit history with git log --follow wifi.c. As a non-programmer I couldn't make much sense of it, but I did notice:

commit 98e428f12edb7869993b5fa3d0eda3976f52a8f4
Author: Terry Ellison <[email protected]>
Date:   Fri May 15 12:45:54 2020 +0100

    Update wifi..c to fix #3106

Which led me to read about:

Wifi resume occurs asynchronously, this means that the resume request will only be processed when control of the processor is passed back to the SDK (after MyResumeFunction() has completed). The resume callback also executes asynchronously and will only execute after wifi has resumed normal operation.

and

// If your application uses the light sleep functions and you wish the
// firmware to manage timer rescheduling over sleeps (the CPU clock is
// suspended so timers get out of sync) then enable the following options

Though I'm not using _light sleep_, "timers get out of sync" still makes a lot of sense to me for this problem, and could explain the problems which it seems only the thread OP and I are facing.

So, given all these (quite possibly wrong) observations, could any C dev check the following: if bad Lua user code hogs CPU cycles at the moment when WIFI events should have been fired on the C side but failed because the CPU was busy, does the Lua interpreter still get the callback from the C module? If not, maybe that is why _reconnection_ is never triggered, leaving the module in a not-connected state (201, 203) with eventmon showing the failure status. Also, what should I use to try to reconnect when a failure code occurs if I'm not using wifi.sta.config()?

Thank you to whoever takes a look at this "essay".

@KT819GM Modestas, I really appreciate this type of constructive feedback.

A couple of comments:

  • Forget about 98e428f1. This was a one-line change to fix a compile error introduced when I was making the modules compilable under both Lua 5.1 and 5.3. (I missed this because I personally don't use the wifi.eventmon feature and it's not enabled by default.)
  • I discuss the SDK in my Lua Developer FAQ. Well worth a read, even though it is time for a major update. The SDK uses a non-preemptive FIFO within priority scheduler. Leaving aside the small realtime ISRs, everything is split into tasks which run until completion. 3 of the 32 priority levels are allocated to application use (our low, medium and high priorities). Most SDK services run at a higher level than these and will be scheduled preferentially, but some run below these, so if we continually repost at an application priority, then we can starve out some of the lower priority SDK housekeeping tasks. I note that wifi_eventmon.c breaks these rules. If you want to repost _continually_ then you should use a timer to repost rather than task.post(). Even a 10 mSec gap will be enough to allow any pending low priority task to start, and thus avoid this starvation.
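As an illustration of the "timer rather than repost" advice, here is a minimal sketch that works through a list one item per timer tick instead of reposting itself immediately. The helper name process_in_slices and the 10 ms default are illustrative assumptions, not part of the firmware.

```lua
-- Handle one item per timer tick so that the SDK gets an idle gap
-- between slices of work.
-- timer_mod is the tmr module; it is passed in so the helper can be
-- exercised off-device with a stand-in.
local function process_in_slices(timer_mod, items, handle, gap_ms)
  local i = 0
  local t = timer_mod.create()
  t:alarm(gap_ms or 10, timer_mod.ALARM_AUTO, function(timer)
    i = i + 1
    if i > #items then
      timer:unregister()  -- all items handled: stop and free the timer
      return
    end
    handle(items[i])
  end)
  return t
end

-- Usage on a device (skipped when the tmr module is not present):
if tmr then
  process_in_slices(tmr, {"a", "b", "c"}, print, 10)
end
```

Each tick does a bounded amount of work and then returns control, which is the property the scheduling rules ask for.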

Incidentally my time on the project is pro-bono as and when available; I am currently having some yard-work done by some contractors and doing some of the associated tasks myself, so my NodeMCU work is itself being starved out a bit until this work is concluded. I will post further when I have time 😄

@KT819GM see #3285. Try doing what I do, which is to leave wifi.eventmon disabled and roll your own monitor code in Lua on a 100 mSec timer, say, and see if this removes the issue. Thanks Terry.
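A minimal sketch of such a poll-based monitor follows. It assumes the station credentials are already stored in flash so that wifi.sta.connect() can reuse them, and it only retries after a run of failed polls so that it does not hammer the stack. The names make_monitor and retry_after are hypothetical.

```lua
-- Build a polling function that checks the station status and asks for
-- a reconnect after retry_after consecutive polls without an IP address.
-- wifi_mod is the wifi module; it is passed in so the logic can be
-- exercised off-device with a stand-in.
local function make_monitor(wifi_mod, retry_after)
  local misses = 0
  return function()
    local status = wifi_mod.sta.status()
    if status == wifi_mod.STA_GOTIP then
      misses = 0
    else
      misses = misses + 1
      if misses >= retry_after then
        misses = 0
        wifi_mod.sta.connect()  -- assumption: reuses the stored config
      end
    end
    return status
  end
end

-- Usage on a device: poll every 100 ms and retry after about 10 s offline
-- (skipped when the wifi and tmr modules are not present).
if wifi and tmr then
  tmr.create():alarm(100, tmr.ALARM_AUTO, make_monitor(wifi, 100))
end
```

Whether an explicit reconnect recovers the stuck 201 state reported in this thread is not established; this only shows the polling pattern.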

Thank you, I will disable wifi.eventmon and leave a few devices running with wifi.sta.autoconnect() enabled on the latest dev.

Seems it would be better for some of us to come and do some of your yard-work, so you would have more time for nodemcu things. I'm pretty sure I would be better at digging than I currently am at programming 😄

I have no idea what you guys have done to fix this issue, or whether it came with other fixes, but I'm so happy to tell you that the new dev commit ebfce4a9111dd6e1c1c35352c2198a8f566639f8 has been working fine with my same old code, without any wifi issue, for over 3 days now.

@chathurangawijetunge We have, I think, done nothing to address this issue, which lends credence to the theory that your code is treading dangerously close to instability by occupying so much CPU time and denying the Espressif SDK stack the opportunity to run its tasks. If you have done nothing to correct your code, you should expect it to break again in the future, and I would ask you to please not file a similar issue with us until you can persuasively argue that your code is not starving the Espressif stack.

I think that all our tasks run below the Espressif tasks. I suspect that the root cause was the missing IRAM_CACHE_ATTR on one of the functions being called at interrupt level.
