esp-idf 🚀 - [TW#17765] I2C crashing - watchdog timeout (master & v3.0 branch)

In addition, I've not seen the problem occur with WiFi disabled, but when I enable WiFi I've now seen it happen about five times.

DavidAntliff on 15 Jan 2018

Another dump with slightly different info:

Guru Meditation Error: Core  0 panic'ed (Interrupt wdt timeout on CPU0)
Register dump:
PC      : 0x4008550d  PS      : 0x00060034  A0      : 0x800859a0  A1      : 0x3ffb0570  
0x4008550d: prvCopyDataToQueue at /Users/david/esp32/esp-idf-master/components/freertos/./queue.c:2037

A2      : 0x3ffbc928  A3      : 0x00000000  A4      : 0x00000002  A5      : 0x3ffbadc0  
A6      : 0x00000000  A7      : 0x400849fc  A8      : 0x3ffbc97c  A9      : 0x3ffbc978  
0x400849fc: i2c_isr_handler_default at /Users/david/esp32/esp-idf-master/components/driver/./i2c.c:1023

A10     : 0x3ffbc978  A11     : 0x3ffb05b8  A12     : 0x00000004  A13     : 0x3ffbc97c  
A14     : 0x00000001  A15     : 0x00000000  SAR     : 0x00000020  EXCCAUSE: 0x00000005  
EXCVADDR: 0x00000000  LBEG    : 0x4000c2e0  LEND    : 0x4000c2f6  LCOUNT  : 0xffffffff  

Backtrace: 0x4008550d:0x3ffb0570 0x4008599d:0x3ffb0590 0x40084682:0x3ffb05b0 0x40084a89:0x3ffb05e0 0x40082ba5:0x3ffb0610 0x4000bfed:0x00000000
0x4008550d: prvCopyDataToQueue at /Users/david/esp32/esp-idf-master/components/freertos/./queue.c:2037

0x4008599d: xQueueGenericSendFromISR at /Users/david/esp32/esp-idf-master/components/freertos/./queue.c:2037

0x40084682: i2c_master_cmd_begin_static at /Users/david/esp32/esp-idf-master/components/driver/./i2c.c:1023

0x40084a89: i2c_isr_handler_default at /Users/david/esp32/esp-idf-master/components/driver/./i2c.c:1023

0x40082ba5: _xt_lowint1 at /Users/david/esp32/esp-idf-master/components/freertos/./xtensa_vectors.S:1105


Rebooting...

DavidAntliff on 15 Jan 2018

Also, if I hold down the reset button on my ESP32 board for about a second (rather than just pressing it quickly) then the issue does clear on next boot. Pressing it too quickly doesn't help.

DavidAntliff on 15 Jan 2018

From what I can tell so far, this only occurs when there's a glitch on the I2C bus and WiFi is enabled. If WiFi is disabled, the glitch may happen but it doesn't cause this cascade-reset issue.

DavidAntliff on 16 Jan 2018

I've now seen this happen with the v3.0 branch (e7dc749e2f953a976325964b5ac49b53c8bafc42).

DavidAntliff on 16 Jan 2018

To be clear, I am using a fairly long cable with my I2C light sensor (TSL2561) - this cable has the side effect of causing the occasional I2C error (missing slave ACK or timeout). Normally this doesn't cause a problem (apart from the failed transaction) however it appears to trigger this issue occasionally. If I use a much shorter cable, the occasional I2C error does not occur, and neither does this issue. So the cable is clearly related to the issue, but I believe it is triggering the problem by inducing I2C errors, rather than _causing_ it. I think the I2C master shouldn't fail like this in the event of a slave or comms issue.

DavidAntliff on 16 Jan 2018

I have now observed this issue occur when WiFi is _disabled_.

DavidAntliff on 17 Jan 2018

Thanks for the detailed descriptions.

I have a question that how large is the pull-up resister?

We will have some tests under similar condition and get back to you.

costaud on 29 Jan 2018

Thanks for the detailed descriptions.
I would like to ask what is esp_mqtt.h? I could not find it in either esp-idf3.0 or esp-idf2.1.
If you could share it, that will help us debug this issue.

panfeng-espressif on 29 Jan 2018

@panfeng-espressif https://github.com/DavidAntliff/esp-mqtt/

It is a submodule of his other project

negativekelvin on 29 Jan 2018

👍1

SDA and SCK pull up resistance measured (with ESP32, and AVR, removed) at 7.4 kOhm each.

Not sure if it's relevant but I sometimes get a similar error (and resulting continual crash) involving RMT (this time with ESP IDF v3.0):

Guru Meditation Error: Core  0 panic'ed (LoadProhibited)
. Exception was unhandled.
Register dump:
PC      : 0x40084e99  PS      : 0x00060033  A0      : 0x80081aa4  A1      : 0x3ffb05c0  
0x40084e99: rmt_driver_isr_default at /Users/david/esp32/esp-idf-v3.0/components/driver/./rmt.c:497

A2      : 0x00000006  A3      : 0x00000040  A4      : 0x00000002  A5      : 0x00000000  
A6      : 0x00000000  A7      : 0x00001800  A8      : 0x00000006  A9      : 0x00000001  
A10     : 0x00000001  A11     : 0x3ffb05c0  A12     : 0x80082d8b  A13     : 0x00000001  
A14     : 0x00060021  A15     : 0x00020d90  SAR     : 0x00000014  EXCCAUSE: 0x0000001c  
EXCVADDR: 0x00000014  LBEG    : 0x4000c46c  LEND    : 0x4000c477  LCOUNT  : 0x00000000  

Backtrace: 0x40084e99:0x3ffb05c0 0x40081aa1:0x3ffb05f0 0x40082b25:0x3ffb0610 0x4000bfed:0x00000000
0x40084e99: rmt_driver_isr_default at /Users/david/esp32/esp-idf-v3.0/components/driver/./rmt.c:497

0x40081aa1: shared_intr_isr at /Users/david/esp32/esp-idf-v3.0/components/esp32/./intr_alloc.c:754

0x40082b25: _xt_lowint1 at /Users/david/esp32/esp-idf-v3.0/components/freertos/./xtensa_vectors.S:1105


Rebooting...

Might be a red herring though...

DavidAntliff on 29 Jan 2018

On several occasions the system has run for 3+ days without me seeing this error. Other times it can happen after just a few minutes since power up. I have noticed that it doesn't seem to crash, or enter this continual crash loop, if I don't have a serial console connected, but I need to do more tests to confirm this for sure.

DavidAntliff on 29 Jan 2018

Hi @DavidAntliff ,
Thanks for you feedback. we are now want to reproducing your problem. is there any place in your code that I should to modify ?

koobest on 30 Jan 2018

There isn't really any modification required - I am currently running from the head of my ESP-IDF_v3.0 branch, using ESP IDF v3.0. I've also seen the same problem with my ESP-IDF_master branch, using ESP IDF master @ 595688a32ad653d8e6cb1c7682b813f96125853e.

Extra info:

I have it in a crash-loop now, and it's failing over and over during or shortly after the call to i2c_driver_install:

W (2142) i2c: about to run i2c_driver_install
Guru Meditation Error: Core  0 panic'ed (Interrupt wdt timeout on CPU0)
Register dump:
PC      : 0x4000bff0  PS      : 0x00060034  A0      : 0x80086778  A1      : 0x3ffb0560  
A2      : 0x00000003  A3      : 0x00060023  A4      : 0x00060023  A5      : 0x3ffb9e8c  
A6      : 0x00000001  A7      : 0x00000000  A8      : 0x00000000  A9      : 0x3ffb9d5c  
A10     : 0x0000000a  A11     : 0x3ffbaf80  A12     : 0x800d1954  A13     : 0x3ffbaf60  
A14     : 0x00000031  A15     : 0x0000000e  SAR     : 0x0000001f  EXCCAUSE: 0x00000005  
EXCVADDR: 0x00000000  LBEG    : 0x4000c2e0  LEND    : 0x4000c2f6  LCOUNT  : 0xffffffff  

Backtrace: 0x4000bff0:0x3ffb0560 0x40086775:0x3ffb0570 0x40085c00:0x3ffb0590 0x40084852:0x3ffb05b0 0x40084c7d:0x3ffb05e0 0x40082bbd:0x3ffb0610 0x4000bfed:0x00000000

Rebooting...

It ran without errors, performing many I2C operations (one or two a second) for 4 hours, 3 minutes and 43 seconds before this occurred.

DavidAntliff on 30 Jan 2018

Hi, @DavidAntliff
Can you provide the log when the system power on ?
I found that your code does not initialize nvs_flash before wifi start up, but I'm not sure if this is the cause.

koobest on 30 Jan 2018

I'm mostly using my ESP-IDF_v3.0 branch (with ESP IDF v3.0) at the moment (which has the same problem as my branch that uses the ESP IDF master branch), and I do call nvs_flash_init() there.

Here's the log when my application (commit 801dca9ce8833707fa6bb44c6b614fc0f24fc7b4) starts:

ets Jun  8 2016 00:22:57

rst:0x1 (POWERON_RESET),boot:0x16 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0018,len:4
load:0x3fff001c,len:5576
load:0x40078000,len:0
ho 12 tail 0 room 4
load:0x40078000,len:13716
entry 0x40078fb4
I (29) boot: ESP-IDF v3.0-rc1-151-gb6b8af49-dirty 2nd stage bootloader
I (30) boot: compile time 12:35:35
I (31) boot: Enabling RNG early entropy source...
I (36) boot: SPI Speed      : 40MHz
I (40) boot: SPI Mode       : DIO
I (44) boot: SPI Flash Size : 4MB
I (48) boot: Partition Table:
I (52) boot: ## Label            Usage          Type ST Offset   Length
I (59) boot:  0 nvs              WiFi data        01 02 00009000 00006000
I (66) boot:  1 phy_init         RF data          01 01 0000f000 00001000
I (74) boot:  2 factory          factory app      00 00 00010000 00100000
I (81) boot: End of partition table
I (86) esp_image: segment 0: paddr=0x00010020 vaddr=0x3f400020 size=0x17e40 ( 97856) map
I (129) esp_image: segment 1: paddr=0x00027e68 vaddr=0x3ffb0000 size=0x033a0 ( 13216) load
I (134) esp_image: segment 2: paddr=0x0002b210 vaddr=0x40080000 size=0x00400 (  1024) load
0x40080000: _iram_start at /Users/david/esp32/esp-idf-v3.0/components/freertos/./xtensa_vectors.S:1685

I (136) esp_image: segment 3: paddr=0x0002b618 vaddr=0x40080400 size=0x049f8 ( 18936) load
I (152) esp_image: segment 4: paddr=0x00030018 vaddr=0x400d0018 size=0x64b58 (412504) map
0x400d0018: _stext at ??:?

I (296) esp_image: segment 5: paddr=0x00094b78 vaddr=0x40084df8 size=0x0def0 ( 57072) load
0x40084df8: i2c_isr_handler_default at /Users/david/esp32/esp-idf-v3.0/components/driver/./i2c.c:1023

I (320) esp_image: segment 6: paddr=0x000a2a70 vaddr=0x400c0000 size=0x00000 (     0) load
I (331) boot: Loaded app from partition at offset 0x10000
I (331) boot: Disabling RNG early entropy source...
I (332) cpu_start: Pro cpu up.
I (335) cpu_start: Starting app cpu, entry point is 0x4008102c
0x4008102c: call_start_cpu1 at /Users/david/esp32/esp-idf-v3.0/components/esp32/./cpu_start.c:215

I (0) cpu_start: App cpu up.
I (346) heap_init: Initializing. RAM available for dynamic allocation:
D (353) heap_init: New heap initialised at 0x3ffae6e0
I (358) heap_init: At 3FFAE6E0 len 00001920 (6 KiB): DRAM
D (364) heap_init: New heap initialised at 0x3ffb9140
I (369) heap_init: At 3FFB9140 len 00026EC0 (155 KiB): DRAM
I (375) heap_init: At 3FFE0440 len 00003BC0 (14 KiB): D/IRAM
I (382) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
D (389) heap_init: New heap initialised at 0x40092ce8
I (393) heap_init: At 40092CE8 len 0000D318 (52 KiB): IRAM
I (399) cpu_start: Pro cpu start user code
D (411) clk: RTC_SLOW_CLK calibration value: 3302912
D (85) intr_alloc: Connected src 46 to int 2 (cpu 0)
D (86) intr_alloc: Connected src 57 to int 3 (cpu 0)
D (86) stack_chk: Intialize random stack guard
D (91) intr_alloc: Connected src 24 to int 9 (cpu 0)
I (96) cpu_start: Starting scheduler on PRO CPU.
D (0) intr_alloc: Connected src 25 to int 2 (cpu 1)
I (0) cpu_start: Starting scheduler on APP CPU.
D (111) heap_init: New heap initialised at 0x3ffe0440
D (121) heap_init: New heap initialised at 0x3ffe4350
D (121) intr_alloc: Connected src 16 to int 12 (cpu 0)
W (2141) app_main: about to run i2c_master_init
W (2141) i2c: about to run i2c_param_config
W (2141) i2c: about to run i2c_driver_install
D (2141) intr_alloc: Connected src 49 to int 13 (cpu 0)
W (2141) app_main: about to run i2c_master_scan
W (8181) sensor_temp: about to run owb_rmt_initialize
D (8181) intr_alloc: Connected src 47 to int 17 (cpu 0)
W (8181) sensor_temp: about to run find_owb_rom_codes
W (8261) sensor_temp: about to run associate_ds18b20_devices
D (11291) avr_support: avr_support_init
D (11291) avr_support: request AVR reset
I (11291) avr_support: Core ID 1
D (11291) avr_support: I2C 0, REG 0x01: 0x02
D (11291) avr_support: Publishing all switch states
D (11301) avr_support: Publishing all pump states
D (11301) avr_support: command 0xc7, 0b11000111
I (11301) avr_support: CP off
I (11311) avr_support: PP off
I (11311) avr_support: Alarm off
I (11321) avr_support: AVR reset
D (11521) avr_support: I2C 0, REG 0x00: 0x00
D (11521) avr_support: I2C 0, REG 0x01: 0x00
D (11521) avr_support: last_switch_states 0x00, new_switch_states 0x00, changed 0x00
D (11521) avr_support: last_pump_states 0x02, new_pump_states 0x00, changed 0x02
...

DavidAntliff on 30 Jan 2018

If I hold down the reset button for a few seconds, I get a slightly different log:

ets Jun  8 2016 00:22:57

rst:0x1 (POWERON_RESET),boot:0x17 (SPI_FAST_FLASH_BOOT)
flash read err, 1000
Falling back to built-in command interpreter.
OK
>ets Jun  8 2016 00:22:57

rst:0x10 (RTCWDT_RTC_RESET),boot:0x16 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0018,len:4
load:0x3fff001c,len:5576
load:0x40078000,len:0
ho 12 tail 0 room 4
load:0x40078000,len:13716
entry 0x40078fb4
I (30) boot: ESP-IDF v3.0-rc1-151-gb6b8af49-dirty 2nd stage bootloader
I (30) boot: compile time 12:35:35
I (31) boot: Enabling RNG early entropy source...
I (36) boot: SPI Speed      : 40MHz
I (41) boot: SPI Mode       : DIO
I (45) boot: SPI Flash Size : 4MB
I (49) boot: Partition Table:
I (52) boot: ## Label            Usage          Type ST Offset   Length
I (59) boot:  0 nvs              WiFi data        01 02 00009000 00006000
I (67) boot:  1 phy_init         RF data          01 01 0000f000 00001000
I (74) boot:  2 factory          factory app      00 00 00010000 00100000
I (82) boot: End of partition table
I (86) esp_image: segment 0: paddr=0x00010020 vaddr=0x3f400020 size=0x17e40 ( 97856) map
I (129) esp_image: segment 1: paddr=0x00027e68 vaddr=0x3ffb0000 size=0x033a0 ( 13216) load
I (134) esp_image: segment 2: paddr=0x0002b210 vaddr=0x40080000 size=0x00400 (  1024) load
I (136) esp_image: segment 3: paddr=0x0002b618 vaddr=0x40080400 size=0x049f8 ( 18936) load
I (152) esp_image: segment 4: paddr=0x00030018 vaddr=0x400d0018 size=0x64b58 (412504) map
I (297) esp_image: segment 5: paddr=0x00094b78 vaddr=0x40084df8 size=0x0def0 ( 57072) load
I (320) esp_image: segment 6: paddr=0x000a2a70 vaddr=0x400c0000 size=0x00000 (     0) load
I (332) boot: Loaded app from partition at offset 0x10000
I (332) boot: Disabling RNG early entropy source...
I (332) cpu_start: Pro cpu up.
I (336) cpu_start: Starting app cpu, entry point is 0x4008102c
I (0) cpu_start: App cpu up.
I (346) heap_init: Initializing. RAM available for dynamic allocation:
D (353) heap_init: New heap initialised at 0x3ffae6e0
I (358) heap_init: At 3FFAE6E0 len 00001920 (6 KiB): DRAM
D (365) heap_init: New heap initialised at 0x3ffb9140
I (369) heap_init: At 3FFB9140 len 00026EC0 (155 KiB): DRAM
I (376) heap_init: At 3FFE0440 len 00003BC0 (14 KiB): D/IRAM
I (382) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
D (390) heap_init: New heap initialised at 0x40092ce8
I (394) heap_init: At 40092CE8 len 0000D318 (52 KiB): IRAM
I (400) cpu_start: Pro cpu start user code
D (412) clk: RTC_SLOW_CLK calibration value: 3297498
D (86) intr_alloc: Connected src 46 to int 2 (cpu 0)
D (87) intr_alloc: Connected src 57 to int 3 (cpu 0)
D (87) stack_chk: Intialize random stack guard
D (91) intr_alloc: Connected src 24 to int 9 (cpu 0)
I (97) cpu_start: Starting scheduler on PRO CPU.
D (0) intr_alloc: Connected src 25 to int 2 (cpu 1)
I (0) cpu_start: Starting scheduler on APP CPU.
D (111) heap_init: New heap initialised at 0x3ffe0440
D (121) heap_init: New heap initialised at 0x3ffe4350
D (121) intr_alloc: Connected src 16 to int 12 (cpu 0)
W (2141) app_main: about to run i2c_master_init
W (2141) i2c: about to run i2c_param_config
W (2141) i2c: about to run i2c_driver_install
D (2141) intr_alloc: Connected src 49 to int 13 (cpu 0)
W (2141) app_main: about to run i2c_master_scan
W (8181) sensor_temp: about to run owb_rmt_initialize
D (8181) intr_alloc: Connected src 47 to int 17 (cpu 0)
W (8181) sensor_temp: about to run find_owb_rom_codes
W (8261) sensor_temp: about to run associate_ds18b20_devices
D (11291) avr_support: avr_support_init
D (11291) avr_support: request AVR reset
I (11291) avr_support: Core ID 1
D (11291) avr_support: I2C 0, REG 0x01: 0x00
D (11291) avr_support: Publishing all switch states
D (11301) avr_support: Publishing all pump states
D (11301) avr_support: command 0xc7, 0b11000111
I (11301) avr_support: CP off
I (11311) avr_support: PP off
I (11311) avr_support: Alarm off
I (11321) avr_support: AVR reset
D (11521) avr_support: I2C 0, REG 0x00: 0x00
D (11521) avr_support: I2C 0, REG 0x01: 0x00
D (11521) avr_support: last_switch_states 0x00, new_switch_states 0x00, changed 0x00
D (11521) avr_support: last_pump_states 0x00, new_pump_states 0x00, changed 0x00
...

DavidAntliff on 30 Jan 2018

What do these lines mean during boot?

0x40080000: _iram_start at /Users/david/esp32/esp-idf-v3.0/components/freertos/./xtensa_vectors.S:1685
...
0x400d0018: _stext at ??:?
...
0x40084df8: i2c_isr_handler_default at /Users/david/esp32/esp-idf-v3.0/components/driver/./i2c.c:1023
...
0x4008102c: call_start_cpu1 at /Users/david/esp32/esp-idf-v3.0/components/esp32/./cpu_start.c:215
...

They don't appear on a first boot after power cycle or a long board reset, but they do appear on a make monitor reset or a short board reset.

DavidAntliff on 30 Jan 2018

Let me configure that you are using three I2C light sensors(TSL2561) as your I2C slave and ESP32 as I2C master.
How do they connect with each other? We are going to buy some sensors that you use.
Thank you

panfeng-espressif on 31 Jan 2018

Here's my schematic: https://drive.google.com/file/d/1Tl_T7OjuoEaj_AxchCVL4cJy9trVFNh_/view?usp=sharing

Less importantly, it's currently built on vero board: https://drive.google.com/open?id=1ERjawt9woHZCbJvHvDzopnqYay1ns9jU

On the I2C side of things I'm using one TSL2561 sensor, and one ATTiny84A as an I2C slave (see code here), and one LCD 1602 display with I2C adapter.

I'm also using 3 DS18B20 temperature sensors on the OneWire bus, and using RMT to drive this, and I'm sometimes seeing issues with that too - not sure yet if it's related to I2C.

DavidAntliff on 31 Jan 2018

The strange thing about this problem is that sometimes it can run for several days without error (thousands of successful I2C operations), and other times it encounters an error in just a few minutes. I've been unable to determine any real pattern to this at all. The only thing that seems certain is that once it crashes, it just keeps crashing over-and-over.

I'm planning to run some tests with fewer I2C devices to see if I can work out which combination causes the issue. This seems like an obvious thing to do, and I've already started doing that, but since days can elapse between instances, it's very difficult to know for certain whether a particular combination is OK or not. What I really need to do is find a 100% reliable way to trigger the fault, and then I can eliminate combinations very quickly.

Is there any way that an I2C slave (such as the AVR) can hold the bus in such a way that the ESP32 is unable to recover?

DavidAntliff on 31 Jan 2018

Where is the definition of the function 'rmt_get_ringbuf_handler'?
When I compile yesterday, there is no error come out. But this time, I was told this function is missing.

panfeng-espressif on 31 Jan 2018

@panfeng-espressif the function has been renamed recently to rmt_get_ringbuf_handle, see #274 and https://github.com/espressif/esp-idf/commit/0eb59844bbd72931b3fcf2bb55f6a2bde7c54402

igrr on 31 Jan 2018

This thread on the forum is about the same thing: https://esp32.com/viewtopic.php?f=13&t=4275

With help from ESP_Angus I decoded the watchdog timeout address to be at line 1045 in i2c.c.

This section of code deals with ACK and TIMEOUT errors, then tries to push a command to the command event queue. If the queue is full, and the call blocks, would this cause a watchdog event? However xQueueOverwriteFromISR isn't meant to block according to the docs , so why would this call result in a watchdog event?

Is there a hardware state that can happen that would cause many multiple failures (and queue posts) in the same call to the handler? What if the I2C bus was a bit unstable, and errors were not only possible but fairly common? If this situation can persist over a reset, then that suggests it's being left in an unexpected hardware state - should the driver attempt to reset the peripheral (perhaps via i2c_hw_fsm_reset?) in i2c_driver_install?

I notice the loop in the interrupt handler beginning at line 363 only terminates when status == 0. If the FSM is stuck, would this loop ever terminate? Will it yield in all cases? If not, would this cause a watchdog event?

I recall that there is a hardware issue that was worked around in 3.0 with i2c_hw_fsm_reset - is it possible that I've encountered another scenario that triggers this issue but isn't covered by the current workaround? Note that when I was using ESP IDF 2.1, that particular hardware issue was happening a _lot_, which is why I moved to 3.0rc1.

DavidAntliff on 31 Jan 2018

Also, not sure if it's useful, but after observing with an oscilloscope I see that there is no edge activity on the I2C bus once this crash (and subsequent rolling crashes) occurs. Both SDA and SCL stay high (not sure if driven by a bad slave or master, or just by pull-ups - I can probably test that with a known resistive load). So the second and subsequent crashes are happening without any real I2C operations taking place.

DavidAntliff on 31 Jan 2018

Can you call xQueueOverwriteFromISR from non-isr context?

negativekelvin on 31 Jan 2018

@negativekelvin I was just looking at that too - the first call to i2c_master_cmd_begin_static is made in a non-ISR context, and then the rest of the linked list is picked up by the interrupt handler. I can only assume that it's OK, since it seems to work in almost all cases except this one, but I am not in a position to say really. There is a mutex within xQueueOverwriteFromISR but I can't see how that could get stuck.

DavidAntliff on 31 Jan 2018

Well in normal cases I don't think the first call would reach those code paths. Have you tried removing the attiny device from the equation? Does calling i2c_hw_fsm_reset before starting the first transaction stop the boot loop?

negativekelvin on 1 Feb 2018

Could the first non-ISR call reach them if the FSM was in a bad state?

In the backtrace I can see that the watchdog did actually fire with the CPU at this line:

        taskENTER_CRITICAL_ISR(&pxQueue->mux);

(there's a bug with addr2line apparently, so I used gdb's info line *0x4... to get the line numbers instead, which is why they don't match those in the console log).

So it is blocked by the mutex. I think I can imagine a situation where this could happen - if the non-ISR call to i2c_master_cmd_begin_static calls xQueueOverwriteFromISR and takes the mutex, but the ISR interrupts and tries to write to the queue, then the ISR will be blocked.

Yes, I have tried without the ATTiny, and I wasn't able to cause the problem. However I've also tried without the LCD 1602 and without the TSL2561 and I don't see the problem with those absent either (and sometimes it runs for _days_ before the error occurs so it's hard to conclude anything in those cases), so I haven't been able to isolate it to a specific slave unfortunately. I only see this problem when all three are present, although with more time I can do more exhaustive testing. I suspect that it's interaction between transactions to multiple I2C slaves that might be causing the problem, but I do need to prove that.

I will try a few things:

your suggestion of an early call to i2c_hw_fsm_reset to try and stop the boot loop.
an experimental rewrite of i2c_master_cmd_begin to do the same operations but without the ISR-specific calls.
try to eliminate I2C slave devices by removing them and wait for the error to occur.
determine whether the SDA & SCL lines are pulled up or driven high by a slave (or the master) when the fault occurs.

DavidAntliff on 1 Feb 2018

In answer to 1, yes, calling i2c_hw_fsm_reset at the beginning of i2c_driver_install does stop the crash-boot-crash loop.

(Although it's uncovered another crash-loop involving RMT and rmt_driver_isr_default, which I hinted at earlier but now seems much clearer).

DavidAntliff on 1 Feb 2018

Hi,

I have a simular problem since IDF3.0rc1. With IDF2.1.1 there was no crash.
When a tmp102 is disconnected I got the following error until it's reconnected:
ESP_ERROR_CHECK failed: esp_err_t 0xfffffffff at 0x401068f1
0x401068f1: tmp102_init1 at C:/msys32/home/projects/ulp_cc_templogger/main/main.c:164

file: "C:/msys32/home/projects/ulp_cc_templogger/main/tmp102.c" line 199
func: tmp102_init1
expression: i2c_master_ccmd_begin(I2C_NUM_0, cmdconfig, 100 / portTICK_PERIOD_MS)

Backtrace: 0x400883bb:0x3ffc1f60 0x400887f1:0x3ffc1f80 0x401068f1:0x3ffc1fa0 0x40107261:0x3ffc1fc0 0x400d13f8:0x3ffc1ff0
0x400883bb: invoke_abort at Y:/VM/Share/GitHub/esp-idf-v3.0-rc1/components/esp32/panic.c:572

0x400887f1: _esp_error_check_failed at Y:/VM/Share/GitHub/esp-idf-v3.0-rc1/components/esp32/panic.c:584

0x401068f1: tmp102_init1 at C:/msys32/home/projects/ulp_cc_templogger/main/main.c:164

0x40107261: app_main at C:/msys32/home/projects/ulp_cc_templogger/main/main.c:460

0x400d13f8: main_task at Y:/VM/Share/GitHub/esp-idf-v3.0-rc1/components/esp32/cpu_start.c:456

Rebooting...
ets Jun 8 2016 00:22:57

rst:0xc (SW_CPU_RESET),boot:0x17 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv::0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:1
load:0x3fff0018,len:4

Code:

void setup_i2c()
{
//Configure as I2C master
i2c_config_t conf;
conf.mode = I2C_MODE_MASTER;
conf.sda_io_num = SDA_PIN;
conf.scl_io_num = SCL_PIN;
conf.sda_pullup_en = GPIO_PULLUP_ENABLE;
conf.scl_pullup_en = GPIO_PULLUP_ENABLE;
conf.master.clk_speed = 100000;
ESP_ERROR_CHECK(i2c_param_config(I2C_NUM_0, &conf));

ESP_ERROR_CHECK(i2c_driver_install(I2C_NUM_0, I2C_MODE_MASTER, 0, 0, 0));

}

void tmp102_init1()
{
i2c_cmd_handle_t cmdconfig = i2c_cmd_link_create();

ESP_ERROR_CHECK(i2c_master_start(cmdconfig));
ESP_ERROR_CHECK(i2c_master_write_byte(cmdconfig, (TMP102_ADDR1 << 1 | I2C_MASTER_WRITE), ACK_CHECK_ON));
ESP_ERROR_CHECK(i2c_master_write_byte(cmdconfig, 0x01, ACK)); //configuration register
ESP_ERROR_CHECK(i2c_master_write_byte(cmdconfig, 0xE1, ACK)); //powersave + trigger oneshot
ESP_ERROR_CHECK(i2c_master_write_byte(cmdconfig, 0xA0, ACK)); //
ESP_ERROR_CHECK(i2c_master_stop(cmdconfig));
ESP_ERROR_CHECK(i2c_master_cmd_begin(I2C_NUM_0, cmdconfig, 100 / portTICK_PERIOD_MS));// Crash when tmp102 is not connected.
i2c_cmd_link_delete(cmdconfig);

i2c_cmd_handle_t cmdreadreg = i2c_cmd_link_create();
ESP_ERROR_CHECK(i2c_master_start(cmdreadreg));
ESP_ERROR_CHECK(i2c_master_write_byte(cmdreadreg, (TMP102_ADDR1 << 1 | I2C_MASTER_WRITE), ACK_CHECK_ON));
ESP_ERROR_CHECK(i2c_master_write_byte(cmdreadreg, 0x00, ACK)); //Set to read temperature
ESP_ERROR_CHECK(i2c_master_stop(cmdreadreg));
ESP_ERROR_CHECK(i2c_master_cmd_begin(I2C_NUM_0, cmdreadreg, 100 / portTICK_PERIOD_MS));
i2c_cmd_link_delete(cmdreadreg);
}

My problem is always reproducable. If there is a new test i2c driver, I can test it.

ginodecock on 1 Feb 2018

@ginodecock your ESP32 reboots because a call to ESP_ERROR_CHECK has failed - the function within has returned a non-ESP_OK code. As written, if any of your i2c function calls hit an error, your ESP32 will reboot. What you'll need to do is remove all the ESP_ERROR_CHECK wrappers and see if your code triggers a watchdog panic in the i2c interrupt handler.

And/or, modify the source for ESP_ERROR_CHECK and have it print out the error code rather than assert - then you can at least see what the bad error code is, which will be useful to know.

DavidAntliff on 1 Feb 2018

@igrr does the ESP-IDF FreeRTOS port support "interrupt nesting"? Apparently calling a FromISR function outside of an ISR context is only "probably" safe if interrupt nesting is supported. I'm wondering if the problem I'm seeing is caused by an I2C hardware FSM failure in the non-ISR context of i2c_master_cmd_begin_static that causes it to call xQueueOverwriteFromISR, taking the internal mutex, only to be interrupted by the ISR, which then blocks itself on xQueueOverwriteFromISR and fires the watchdog, causing the ESP32 to reboot. Then, because the FSM freeze isn't addressed by i2c_hw_fsm_reset until after a subsequent transaction has started, this never gets to happen because any subsequent I2C transaction fails immediately. I'm not sure why this always results in a watchdog+reset at that point, but an early preventative call to i2c_hw_fsm_reset seems to avoid it.

There's still some gaps in my understanding. Any comments from Espressif would be helpful please. I'm going to try some tests this weekend to develop this hypothesis and then attempt to prove or disprove it.

DavidAntliff on 1 Feb 2018

Hi,

I don't see a watchdog panic in the i2c interrupt handler.
When modifying the ESP_ERROR_CHECK I got -1 as reported error.

ginodecock on 1 Feb 2018

HI, Sorry I am coming late into this discussion but some of the problems look familiar. Some background.

We have custom designed and built ESP32 controller with 8x I2C/1-Wire (DS2482-800 bridge) inputs and 16x I2C/GPIO (PCA9555) outputs with various other inputs. The 1-Wire inputs are used to read iButtons (room access and bathroom water access) and the 16x GPIO's outputs are used to drive 8x LEDs (in the iButton readers) and 8x Relays (switching door strikers and water solenoid valves).

In a test environment with short leads all worked perfectly but in the live environment with longer UTP cabling driving the LED & Striker and reading the 1-Wire we had catastrophic results. Similarly on the bathrooms where the DC solenoid and LED/1Wire cabling was separated.

The solution (so far) has been very simple. We simply soldered an 1N4007 diode with reverse bias as close as possible to the door striker and the water solenoid valve. This has the effect of absorbing (shorting) the reverse EMF introduced when power is removed from the striker or solenoid valve.

I have not been able to look at your schematic since the link is not working.
Hope this helps a bit.

Andre

ammaree on 2 Feb 2018

@ammaree Hi André, thank you for your comments. I'll fix the link shortly.

It's fairly clear to me that I have some electrical issues on one of my I2C sensors, and your suggestion may well help with that. However, I tried an experiment: I set up four I2C slave devices (AVR ATtiny84 and 85s) on very short cables, and wrote an ESP32 program to fire off I2C transactions to each in parallel. The bus looks good and it runs well at 400 kHz. However, in the middle of the night, there was a single incident which triggered this bug, causing a watchdog timeout and then a subsequent crash loop.

So, my concern is that although I can make my own electrical side more reliable (and I will), there's a serious bug in ESP-IDF at the moment which causes a catastrophic and persistent failure once triggered. I'd like to track this down and solve it, because it seems that even with a good I2C bus, there's always the risk of an error that could cause this problem.

The challenge at the moment is finding a way to trigger this bug reliably. It's easy to create a noisy bus, but it seems that this bug is related to a possible race in the driver that is only apparent on a noisy bus, but is still relatively rare. Once I can get a simple circuit working that reliably triggers the bug, it should become a bit easier to track it down.

DavidAntliff on 2 Feb 2018

I fixed the schematic link earlier. Here it is too.

DavidAntliff on 2 Feb 2018

I've found a reliable way to cause the crash involving 4 x AVRs acting as I2C slaves, and a long cable to a TSL2561, however I will work on simplifying the set-up so that it can be reproduced by others - more on this soon.

Note: I disabled the crash loop by calling i2c_hw_fsm_reset at the beginning of i2c_driver_install. This helps because now it only crashes due to the "initial" fault.

It now crashes pretty reliably every 30 seconds or so. Using the gdb info line *0xaddress technique, I've inspected the stack traces and although the reported error is always "Core 0 panic'ed (Interrupt wdt timeout on CPU0)", and it always happens within the i2c driver, the exact location of each panic varies a little. Here's a selection of the places where it has failed - each one is the first frame reported in the backtrace, over a series of independent crashes in the same build:

(gdb) info line *0x400866ed
Line 157 of "/Users/david/esp32/esp-idf-v3.0/components/freertos/./portmux_impl.h"
   starts at address 0x400866ed <vTaskExitCritical+53> and ends at 0x40086705 <vTaskExitCritical+77>.

       assert(mux->count < 0x100); // Indicates memory corruption

(gdb) info line *0x40086604
Line 4203 of "/Users/david/esp32/esp-idf-v3.0/components/freertos/./tasks.c"
   starts at address 0x40086604 <vTaskEnterCritical> and ends at 0x40086607 <vTaskEnterCritical+3>.

#ifdef CONFIG_FREERTOS_PORTMUX_DEBUG
    void vTaskEnterCritical( portMUX_TYPE *mux, const char *function, int line )
#else
    void vTaskEnterCritical( portMUX_TYPE *mux )
#endif
    {    // <------ Line 4203
        BaseType_t oldInterruptLevel=0;

(gdb) info line *0x400849d9
Line 376 of "/Users/david/esp32/esp-idf-v3.0/components/driver/./i2c.c"
   starts at address 0x400849cf <i2c_isr_handler_default+111> and ends at 0x400849db <i2c_isr_handler_default+123>.

                p_i2c_obj[i2c_num]->status = I2C_STATUS_ACK_ERROR;

(gdb) info line *0x400845e0
Line 1168 of "/Users/david/esp32/esp-idf-v3.0/components/driver/./i2c.c"
   starts at address 0x400845dc <i2c_master_cmd_begin_static+124>
   and ends at 0x400845e9 <i2c_master_cmd_begin_static+137>.

        xQueueOverwriteFromISR(p_i2c->cmd_evt_queue, &evt, &HPTaskAwoken);

(gdb) info line *0x4008560b
Line 2065 of "/Users/david/esp32/esp-idf-v3.0/components/freertos/./queue.c"
   starts at address 0x40085606 <prvCopyDataToQueue+106> and ends at 0x40085612 <prvCopyDataToQueue+118>.

                --( pxQueue->uxMessagesWaiting );

Note that it stops in a different place each time - could this be due to memory corruption as one of the code comments above suggests?

DavidAntliff on 2 Feb 2018

Also, I think I have disproved my earlier hypothesis that the queue mutex is being held by the non-ISR context call to xQueueOverwriteFromISR, resulting in a watchdog timeout. I added some logging around the mutex take and give (for the non-ISR context only) and after many, many crashes I haven't seen any evidence that the ISR is blocked by the non-ISR's hold on the mutex. So I think we can discount this theory.

DavidAntliff on 2 Feb 2018

@panfeng-espressif @igrr I have found a way to reproduce this issue with a much simpler program and hardware setup.

Please see my new repository here: https://github.com/DavidAntliff/esp32-i2c-error-1503

With my two "DOIT" boards I'm able to reproduce simply by hooking up two wires to two GPIOs (as I2C SDA and SCL) and repeatedly touching them together. Not a true representation of any properly configured I2C bus, of course, but it does trigger the exact same issue I see in my "complete" system, without all the overhead of extra I2C devices, use of RMT, WiFi, etc.

With my "LoLin32 Lite" board it's harder to reproduce this way (they must be more electrically stable, perhaps), but with a bit of perseverance I'm able to get this board to show the same error. I've included notes in the project's README to explain more.

This should hopefully make it much easier for the Espressif team to reproduce and debug. Please let me know if this helps.

DavidAntliff on 3 Feb 2018

I would suggest you add some logging to the isr and see if there are multiple interrupts being generated at the same time. If so then instead of looping certain code paths in the isr should probably clear ALL interrupts before calling other functions.

negativekelvin on 3 Feb 2018

@negativekelvin does ESP-IDF support any logging from an ISR? I assume printf isn't going to work as it will be too slow. I can log simple values to an ad-hoc timestamped array but if something already exists I'll use that.

When you say "multiple interrupts being generated at the same time" do you mean that all-but-one interrupt is being ignored, or that they are being resolved in an incorrect order? What sort of bug do you anticipate here? The hardware FSM is getting "stuck" in this case, so I'm keen to better understand how this might relate to interrupts.

Or, are you suggesting that the ISR is spinning (looping) too much due to too many interrupts (in too short a time) and that causes the watchdog to fire? That seems plausible and fast logging might show that up.

DavidAntliff on 3 Feb 2018

ESP_EARLY_LOGI

Yes too many interrupts or some other race condition between two interrupts

negativekelvin on 3 Feb 2018

I haven't gotten nearly as deep into this as @DavidAntliff has but I believe I am also having the same issues. It behaves exactly like he described. It will work reliably for a day straight then suddenly go into an endless bootloop with the same Guru Meditation Error: Core 0 panic'ed (Interrupt wdt timeout on CPU0) error until I disconnect power completely.

Here are the complete logs right before the reboot:

Guru Meditation Error: Core  0 panic'ed (Interrupt wdt timeout on CPU0)
Core 0 register dump:
PC      : 0x400845fd  PS      : 0x00060034  A0      : 0x80084a7c  A1      : 0x3ffb05b0  
0x400845fd: i2c_master_cmd_begin_static at /home/khendricks/dev/esp32/tools/esp-idf/components/driver/./i2c.c:947

A2      : 0x00000001  A3      : 0x00000001  A4      : 0x800d17de  A5      : 0x3ffbb880  
A6      : 0x00000000  A7      : 0x00000001  A8      : 0x3ffbd0d8  A9      : 0x00000001  
A10     : 0x00000001  A11     : 0x00000000  A12     : 0x3ffb05b0  A13     : 0x00000002  
A14     : 0x3ffbd1e4  A15     : 0x00060021  SAR     : 0x00000011  EXCCAUSE: 0x00000005  
EXCVADDR: 0x00000000  LBEG    : 0x4000c2e0  LEND    : 0x4000c2f6  LCOUNT  : 0xffffffff  
Core 0 was running in ISR context:
EPC1    : 0x400d1b9a  EPC2    : 0x00000000  EPC3    : 0x00000000  EPC4    : 0x400845fd
0x400d1b9a: esp_vApplicationIdleHook at /home/khendricks/dev/esp32/tools/esp-idf/components/esp32/./freertos_hooks.c:85

0x400845fd: i2c_master_cmd_begin_static at /home/khendricks/dev/esp32/tools/esp-idf/components/driver/./i2c.c:947


Backtrace: 0x400845fd:0x3ffb05b0 0x40084a79:0x3ffb05e0 0x40082b95:0x3ffb0610 0x400d1b97:0x00000000
0x400845fd: i2c_master_cmd_begin_static at /home/khendricks/dev/esp32/tools/esp-idf/components/driver/./i2c.c:947

0x40084a79: i2c_isr_handler_default at /home/khendricks/dev/esp32/tools/esp-idf/components/driver/./i2c.c:947

0x40082b95: _xt_lowint1 at /home/khendricks/dev/esp32/tools/esp-idf/components/freertos/./xtensa_vectors.S:1105

0x400d1b97: esp_vApplicationIdleHook at /home/khendricks/dev/esp32/tools/esp-idf/components/esp32/./freertos_hooks.c:85


Core 1 register dump:
PC      : 0x400d1b9a  PS      : 0x00060034  A0      : 0x80086748  A1      : 0x3ffbbe40  
0x400d1b9a: esp_vApplicationIdleHook at /home/khendricks/dev/esp32/tools/esp-idf/components/esp32/./freertos_hooks.c:85

A2      : 0x00000008  A3      : 0x00000001  A4      : 0x00000001  A5      : 0x3ffbb984  
A6      : 0x00000000  A7      : 0x00000001  A8      : 0x3ffb371c  A9      : 0x3ffb36e0  
A10     : 0x00000000  A11     : 0x80000001  A12     : 0x00000000  A13     : 0x00000001  
A14     : 0x00060021  A15     : 0x00000000  SAR     : 0x00000000  EXCCAUSE: 0x00000005  
EXCVADDR: 0x00000000  LBEG    : 0x00000000  LEND    : 0x00000000  LCOUNT  : 0x00000000  

Backtrace: 0x400d1b9a:0x3ffbbe40 0x40086745:0x3ffbbe60
0x400d1b9a: esp_vApplicationIdleHook at /home/khendricks/dev/esp32/tools/esp-idf/components/esp32/./freertos_hooks.c:85

0x40086745: prvIdleTask at /home/khendricks/dev/esp32/tools/esp-idf/components/freertos/./tasks.c:4571

My current setup is a single ST VL53L0X Time-Of-Flight sensor hooked up to the ESP32 via a 56cm cable and 4.7kohm pull-up resistors. My esp-idf is on master at commit https://github.com/espressif/esp-idf/commit/daa8cfa800677a71a3447eba418a4336979a3cb8 .

kylehendricks on 4 Feb 2018

Appears I too have experienced this same issue. One SHT-31 on the i2c bus and seven devices on a 1-wire bus (via RMT). Log message immediately proceeding the WDT timeout indicated the read of the SHT-31 failed with esp_rc=-1

I (1265887) mcrTimestamp:  Sat Feb  3 23:45:50 2018 heap=166k first=259k min=163k max=170k delta=120
W (1265917) mcrI2c readSHT31: read failed for i2cDev(mcrDev(mcrDevAddr(0x44) id=i2c/542419a4ae30.01.sht31 desc=sht31 rus=20374 wus=0 reading=0x3ffd5cb4) bus=1 use_mplex=true) esp_rc=-1
Guru Meditation Error: Core  0 panic'ed (Interrupt wdt timeout on CPU0)
Core 0 register dump:
PC      : 0x400845ff  PS      : 0x00060034  A0      : 0x40082560  A1      : 0x3ffb05e0  
0x400845ff: i2c_isr_handler_default at /Users/thughey/devel/esp-idf/components/driver/./i2c.c:1023

0x40082560: _xt_lowint1 at /Users/thughey/devel/esp-idf/components/freertos/./xtensa_vectors.S:1105

A2      : 0x3ffbee50  A3      : 0x3ff53000  A4      : 0x00000400  A5      : 0x00000000  
A6      : 0x00000000  A7      : 0x00000001  A8      : 0x00000001  A9      : 0x00000000  
A10     : 0xabc4f586  A11     : 0x00000001  A12     : 0x800d3132  A13     : 0x3ffbdb20  
A14     : 0x00000000  A15     : 0x00000000  SAR     : 0x00000015  EXCCAUSE: 0x00000005  
EXCVADDR: 0x00000000  LBEG    : 0x4000c2e0  LEND    : 0x4000c2f6  LCOUNT  : 0xffffffff  
Core 0 was running in ISR context:
EPC1    : 0x400d3402  EPC2    : 0x00000000  EPC3    : 0x00000000  EPC4    : 0x400845ff
0x400d3402: esp_vApplicationIdleHook at /Users/thughey/devel/esp-idf/components/esp32/./freertos_hooks.c:85

0x400845ff: i2c_isr_handler_default at /Users/thughey/devel/esp-idf/components/driver/./i2c.c:1023


Backtrace: 0x400845ff:0x3ffb05e0 0x4008255d:0x3ffb0610 0x400d33ff:0x00000000
0x400845ff: i2c_isr_handler_default at /Users/thughey/devel/esp-idf/components/driver/./i2c.c:1023

0x4008255d: _xt_lowint1 at /Users/thughey/devel/esp-idf/components/freertos/./xtensa_vectors.S:1105

0x400d33ff: esp_vApplicationIdleHook at /Users/thughey/devel/esp-idf/components/esp32/./freertos_hooks.c:85


Core 1 register dump:
PC      : 0x400d3402  PS      : 0x00060334  A0      : 0x80086d08  A1      : 0x3ffbe0e0  
0x400d3402: esp_vApplicationIdleHook at /Users/thughey/devel/esp-idf/components/esp32/./freertos_hooks.c:85

A2      : 0x00000008  A3      : 0x00000001  A4      : 0x00000001  A5      : 0x3ffbdc18  
A6      : 0x00000000  A7      : 0x00000001  A8      : 0x3ffb34fc  A9      : 0x3ffb34c0  
A10     : 0x00000000  A11     : 0x00000001  A12     : 0x8008571f  A13     : 0x3ffc6080  
A14     : 0x00000003  A15     : 0x00060023  SAR     : 0x00000000  EXCCAUSE: 0x00000005  
EXCVADDR: 0x00000000  LBEG    : 0x00000000  LEND    : 0x00000000  LCOUNT  : 0x00000000  

Backtrace: 0x400d3402:0x3ffbe0e0 0x40086d05:0x3ffbe100
0x400d3402: esp_vApplicationIdleHook at /Users/thughey/devel/esp-idf/components/esp32/./freertos_hooks.c:85

0x40086d05: prvIdleTask at /Users/thughey/devel/esp-idf/components/freertos/./tasks.c:4554


CPU halted.

TimHughey on 4 Feb 2018

Unfortunately I haven't had a chance to investigate this further - in particular the possibility of a flood of interrupts starving the watchdog. I hope to revisit at some stage but I'm not sure when, as I've already spent three weeks chasing this and I need to move on with other tasks. Of course eventually I'll be forced to revisit if it isn't fixed in the meantime since my project is pretty much dead with this issue present.

In the meantime I believe I've demonstrated that the problem is real, is not related to my project specifically, and that there should be enough info here for Espressif to take over, reproduce and find the root cause.

On that note, has anyone had a chance to follow my reproduction instructions and reproduce the issue by simply brushing two wires together while running my example code? I've noticed it's more difficult to reproduce on certain boards (newer LoLin32 Lite boards seem much more resilient than my older "DOIT" boards) , however I've been able to eventually reproduce it on all of mine, so it would be interesting to see how easily anyone can reproduce it on other boards, since I think there's some sort of electrical component to this too (perhaps in how bus noise propagates onto the board and triggers interrupts).

DavidAntliff on 7 Feb 2018

I've not attempted to recreate the issue with your code or by brushing wires together. At this point my code is stable after I cleaned up my code to avoid hammering away at the i2c bus. I'll let me board soak for awhile to see if the issue resurfaces.

P.S. I'm using the Adafruit Feather ESP32 module.

TimHughey on 7 Feb 2018

Seem's I'm having the same (or very similar) issue as the rest here. In my case I always get timeouts reported by the I2C driver (the actual text is from my own I2CMasterDevice class) just before the timeout.

I'm communicating on two separate buses with two ADS1115 and one MCP23017. They are both read based on a timer tick() as well as a "ready interrupt" from the respective devices.

It happens very rarely (many days in between) so I don't have more than a stack trace.

```E (4356362) I2CMasterDevice: Error during read - Send command error, no ACK from slave
E (4356362) I2CMasterDevice: Error during read - Send command error, no ACK from slave
E (4356362) I2CMasterDevice: Error during write - Send command error, no ACK from slave
Guru Meditation Error: Core 0 panic'ed (Interrupt wdt timeout on CPU0)
Core 0 register dump:
PC : 0x40083f3c PS : 0x00060034 A0 : 0x400824f4 A1 : 0x3ffb09f0
0x40083f3c: i2c_master_cmd_begin_static at /home/permal/esp/esp-idf/components/driver/./i2c.c:778

0x400824f4: _xt_lowint1 at /home/permal/esp/esp-idf/components/freertos/./xtensa_vectors.S:1105

A2 : 0x3ffd9848 A3 : 0x3ff67000 A4 : 0x00000400 A5 : 0x00000001
A6 : 0x00000000 A7 : 0x00000000 A8 : 0x800843cc A9 : 0x00000000
A10 : 0x00000001 A11 : 0x00000001 A12 : 0x800d350a A13 : 0x3ffc0160
A14 : 0x00000000 A15 : 0x00000000 SAR : 0x00000012 EXCCAUSE: 0x00000005
EXCVADDR: 0x00000000 LBEG : 0x4000c2e0 LEND : 0x4000c2f6 LCOUNT : 0xffffffff
Core 0 was running in ISR context:
EPC1 : 0x400d2242 EPC2 : 0x00000000 EPC3 : 0x00000000 EPC4 : 0x40083f3c
0x400d2242: esp_vApplicationIdleHook at /home/permal/esp/esp-idf/components/esp32/./freertos_hooks.c:85

0x40083f3c: i2c_master_cmd_begin_static at /home/permal/esp/esp-idf/components/driver/./i2c.c:778

Backtrace: 0x40083f3c:0x3ffb09f0 0x400824f1:0x3ffb0a20 0x400d223f:0x00000000
0x40083f3c: i2c_master_cmd_begin_static at /home/permal/esp/esp-idf/components/driver/./i2c.c:778

0x400824f1: _xt_lowint1 at /home/permal/esp/esp-idf/components/freertos/./xtensa_vectors.S:1105

0x400d223f: esp_vApplicationIdleHook at /home/permal/esp/esp-idf/components/esp32/./freertos_hooks.c:85

Core 1 register dump:
PC : 0x400d2242 PS : 0x00060234 A0 : 0x8008b330 A1 : 0x3ffc0740
0x400d2242: esp_vApplicationIdleHook at /home/permal/esp/esp-idf/components/esp32/./freertos_hooks.c:85

A2 : 0x00000008 A3 : 0x00000001 A4 : 0x00000001 A5 : 0x3ffc0270
A6 : 0x00000000 A7 : 0x00000001 A8 : 0x3ffb34b0 A9 : 0x3ffb3474
A10 : 0x00000000 A11 : 0x00000000 A12 : 0x8008a578 A13 : 0x3ffdf6b0
A14 : 0x00000000 A15 : 0x3ffc0400 SAR : 0x00000000 EXCCAUSE: 0x00000005
EXCVADDR: 0x00000000 LBEG : 0x00000000 LEND : 0x00000000 LCOUNT : 0x00000000

Backtrace: 0x400d2242:0x3ffc0740 0x4008b32d:0x3ffc0760
0x400d2242: esp_vApplicationIdleHook at /home/permal/esp/esp-idf/components/esp32/./freertos_hooks.c:85

0x4008b32d: prvIdleTask at /home/permal/esp/esp-idf/components/freertos/./tasks.c:4702
```

PerMalmberg on 10 Feb 2018

Hello, I found that sometimes there was a strange low level on SCL when SDA was prepared to start a I2C transmission. This made I2C transmission failed sometimes. Have you met such kind of condition before?

panfeng-espressif on 11 Feb 2018

Edit: After more investigation I don't think this is related, but I'll leave it here just in case.

Pretty sure I am seeing this now as well. I've just updated IDF to a5eb369126e5cef31715d2755931cf933bb98bee and now am getting random watchdog crashes after a failed I2C read:

E (105377) FB_HAL_V1R (0107) fb_hal_battery_task fb_hal_adc_read_channels
ets Jun  8 2016 00:22:57

rst:0x7 (TG0WDT_SYS_RESET),boot:0x33 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0018,len:4
load:0x3fff001c,len:5648
ho 0 tail 12 room 4
load:0x40078000,len:0
load:0x40078000,len:13912
entry 0x40078fd8
W (91) boot: PRO CPU has been reset by WDT.
W (91) boot: WDT reset info: PRO CPU PC=0x40085652
0x40085652: spi_flash_op_block_func at /Users/jason/esp/esp-idf/components/spi_flash/./cache_utils.c:82 (discriminator 1)

W (91) boot: WDT reset info: APP CPU PC=0x40093ebf
0x40093ebf: is_free at /Users/jason/esp/esp-idf/components/heap/./multi_heap.c:339
 (inlined by) multi_heap_malloc_impl at /Users/jason/esp/esp-idf/components/heap/./multi_heap.c:370

For what it's worth, before upgrading this was perfectly stable. I was previously using:

esp-idf: 914ab075672216c15a5564d9ae54fc257b237c13
toolchain: 1.22.0-75-gbaf03c2

vonnieda on 11 Feb 2018

I appreciate that these things take time. I'm disappointed that after 15 days there hasn't been an Espressif reply to my reproduction instructions. Once I found a way to easily reproduce it I had what I thought to be a reasonable expectation that Espressif would at least try it out and report back. It only takes a few seconds to put the test system together, compile and flash my GitHub project, then brush a few wires together to reproduce the bug. I can reproduce it on all my ESP32 boards (across two manufacturers).

Unfortunately I lack the experience and equipment to debug a timing issue in an interrupt service routine, especially as ESP_EARLY_LOG adds delays to the routine that affect the timing. Someone at Espressif with a hardware debugger would make much faster work of this issue. However if there's additional testing I can do, I'm more than happy to do so - I just need some guidance please.

DavidAntliff on 18 Feb 2018

Hi David !

I suggest that you scope (with osciloscope) the SDA and SCL lines during your tests. Test one device at time. I noticed that both lines are very sensitive to the squaring of signals! In my case, I used 3K3 ohms resistors to pullup lines. I have seem that you are using internall pullups.
When I scoping the first test, the SCL produce constant clock 100 KHz ! - strange behavior. After 3k3 resistors , both SDA and SCL function correctly.

ESP32 - I2C Scanner using Arduino IDE (working)
https://www.esp32.com/viewtopic.php?f=18&t=4742&hilit=i2c+scanner

Gustavomurta on 18 Feb 2018

@Gustavomurta thanks for the advice. Here's the thing - I know that my I2C bus isn't perfect, and it would be good if I could condition my signals to avoid an issue, but the problem is that there's always a risk of errors on the bus due to noise. In the event of a failed I2C transaction, the bus will be in an error state, and that's fine if the software can detect that and return an error code to the caller. The problem is that the ESP32 I2C peripheral has a bug that causes its internal finite-state-machine (FSM) to lock up if SDA or SCK are electrically affected in certain ways. This is a known issue and acknowledged by Espressif. There is a fix in the 3.0 stream that attempts a FSM reset when there is a transaction timeout and the hardware busy flag is still raised. I see this fix activate sometimes and it seems to work. The issues that I have documented here are related to this, I think, but take it further:

there's some sort of condition that occurs when the bus is in an errored state that results in the interrupt service routine starving the scheduler and causing a watchdog reset.
almost every single time this happens, the subsequent reset causes another crash because the FSM is in the locked state, and there's no code in the I2C init function to reset the FSM state. The reset only occurs after a failed transaction, but at init time there are no transactions yet, so it doesn't get cleared. I added a FSM reset and that appeared to avoid this second issue (a reset due to 1. no longer causes a chain of resets due to 2.)

Because 2. happens almost every single time 1. does, I suspect that 1. is related to the FSM failure. It may be a cause, or it may be incidental, I'm not sure. I don't know enough about the FSM failure to know whether it can cause a flood of interrupts.

So my point is that although there's a lot I can do to improve the I2C bus in my particular circuit, there's a _real issue_ with the ESP32 software interaction with the hardware at the moment that is causing I2C for multiple people, and Espressif are in the best possible place to investigate this now that there's a way to reproduce it.

I am using external pull-ups BTW. The issue is also unrelated to bus speed. It happens at 10 kHz almost as often as it happens at 100 kHz.

DavidAntliff on 18 Feb 2018

👍1

Ok, I was curious to know what the value of the resistors you are using. Thank you.

Gustavomurta on 19 Feb 2018

I have also recently faced this problem, and I agree with @DavidAntliff that there is something wrong in i2c driver not being fully tolerant to errors in transmissions.

In my case, it was due to not using external pullups, so I was just using the internal ones, with just one peripheral and 5cm cables in a breadboard. I was seeing random communication errors, and at some point, the wdt interrupt timeout error would appear. Once this error appeared, ESP32 was stuck in a reboot loop. I had to unpower the ESP32, or make a long push on the reset button, in order to work again. When touching the cables with my fingers, this happened a lot more and earlier, that's how I realized I needed external pullups.

Once I put external pullups (4.7K), all communication errors and wdt interrupt errors are gone. But I understand there is something wrong with the driver, and there is the risk of entering this error state. Maybe if there is a lot of external EMI? Or maybe if some connection is a bit loose? Or in case the i2c pad of the wrover module I am using disconnects from my pcb and catches EMI noise... I dont know...

This scares me because my product has internal battery and there is no power switch, nor a reset button... and the battery and pcb is inside a housing, not accesible to the user... So if this happens, I am in a big trouble, and the product will have to be shipped back to me in order to open it and unpower it. Right now I am looking for a solution and I may add a way of resetting ESP32 from another MCU, my product has ESP32+nRF52.

This should be addressed looking for the real cause and fixing it, in order to make i2c driver fully tolerant to communication errors.

I have seen some people at the forum reporting that some gpio interrupts are happening more times than expected, could this be related?
Maybe the EMI noise + extra interrupts triggering are overflowing the interrupt handler or taking too much time?

luisonoff on 19 Feb 2018

Hi Luis, If there are many doubts, I suggest you to test with 3K3 resistors for 3.3V.
4K7 resistors are for 5V. Did you scope I2C interface with Logic analyzer ?

Gustavomurta on 19 Feb 2018

👍1

Thank you @Gustavomurta .
Using an oscilloscope was on my to-do list, but I did not get to do it because pullups seemed to fix the error.

luisonoff on 19 Feb 2018

@panfeng-espressif sorry, I missed your comment:

Hello, I found that sometimes there was a strange low level on SCL when SDA was prepared to start a I2C transmission. This made I2C transmission failed sometimes. Have you met such kind of condition before?

No, I haven't specifically noticed this - is this something you've seen with my circuit, or in general? My reproduction method ("brush the SDA/SCL wires together") could easily replicate this scenario though. I thought that it was a known issue that grounding SCL or SDA at the wrong time could cause the FSM to enter a locked state?

DavidAntliff on 19 Feb 2018

Just found this other issue at the arduino-esp32 repo, could be related?

https://github.com/espressif/arduino-esp32/pull/1065

Can anyone test if changing that pin declaration as they propose has any effect?

luisonoff on 21 Feb 2018

@DavidAntliff
Because we are celebrating the Spring Festival the last week, I'm quite sorry for the last response.
It is the phenomenon I saw with your code. When using the example code of ESP-IDF, it didn't come out. I would take your suggestion and do further test .

panfeng-espressif on 22 Feb 2018

@luisonoff thanks, I'll look into it with my reproduction code.

@panfeng-espressif ok, that sounds strange - I can't recall anything in my code that would drive SCL low - the SCL GPIO is only used for I2C operations. Do you see this often? I will keep an eye out for it.

DavidAntliff on 22 Feb 2018

I think I have had the same boot-loop issue a few times, using IDF da81b97 (git from 2017-12-31).

EDIT: confirmed, the stack trace is similar to other crashes, but has minor differences:

Register dump:
PC      : 0x4008748f  PS      : 0x00060034  A0      : 0x80088d00  A1      : 0x3ffc0570
0x4008748f: vTaskEnterCritical at ...esp-idf/components/freertos/tasks.c:5074

A2      : 0x00000000  A3      : 0x00000000  A4      : 0x00060021  A5      : 0x3ffc6700
A6      : 0x00000003  A7      : 0x000060d23  A8      : 0x000000fe  A9      : 0x00000001
A10     : 0x00000001  A11     : 0x00000000  A12     : 0x00060023  A13     : 0x00000001
A14     : 0x00060023  A15     : 0x00000000  SAR     : 0x00000013  EXCCAUSE: 0x00000005
EXCVADDR: 0x00000000  LBEG    : 0x4008df7c  LEND    : 0x4008dfaa  LCOUNT  : 0xffffffff
0x4008df7c: memcpy at /home/jeroen/esp8266/esp32/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/machine/xte
nsa/../../../../.././newlib/libc/machine/xtensa/memcpy.S:168

0x4008dfaa: memcpy at /home/jeroen/esp8266/esp32/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/machine/xte
nsa/../../../../.././newlib/libc/machine/xtensa/memcpy.S:201


Backtrace: 0x4008748f:0x3ffc0570 0x40088cfd:0x3ffc0590 0x40086228:0x3ffc05b0 0x4008671d:0x3ffc05e0 0x40084579:0x3ffc0610 0x400d2073:0
x00000000
0x4008748f: vTaskEnterCritical at ...esp-idf/components/freertos/tasks.c:5074

0x40088cfd: xQueueGenericSendFromISR at ...esp-idf/components/freertos/queue.c:2037

0x40086228: i2c_master_cmd_begin_static at ...esp-idf/components/driver/i2c.c:1023

0x4008671d: i2c_isr_handler_default at ...esp-idf/components/driver/i2c.c:1023

0x40084579: _xt_lowint1 at ...esp-idf/components/freertos/xtensa_vectors.S:1105

0x400d2073: esp_vApplicationIdleHook at ...esp-idf/components/esp32/freertos_hooks.c:85

And the boot-loop after that:

Register dump:
PC      : 0x40088d10  PS      : 0x00060034  A0      : 0x8008622b  A1      : 0x3ffc0590
0x40088d10: xQueueGenericSendFromISR at ...esp-idf/components/freertos/queue.c:2037

A2      : 0x3ffcb718  A3      : 0x3ffc05b4  A4      : 0xx3ffc05b0  A5      : 0x00000002
A6      : 0x3ffcb76c  A7      : 0x00060021  A8      : 0x00000001  A9      : 0x00000001
A10     : 0x00000001  A11     : 0x00000000  A12     : 0x00000002  A13     : 0x3ffc05c0
A14     : 0x00000000  A15     : 0x00000000  SAR     : 0x00000001c  EXCCAUSE: 0x00000005
EXCVADDR: 0x00000000  LBEG    : 0x4008df7c  LEND    : 0x4008dfaa  LCOUNT  : 0xffffffff
0x4008df7c: memcpy at /home/jeroen/esp8266/esp32/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/machine/xte
nsa/../../../../.././newlib/libc/machine/xtensa/memcpy.S:168

0x4008dfaa: memcpy at /home/jeroen/esp8266/esp32/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/machine/xte
nsa/../../../../.././newlib/libc/machine/xtensa/memcpy.S:201
Backtrace: 0x40088d10:0x3ffc0590 0x40086228:0x3ffc05b0 0x4008671d::0x3ffc05e0 0x40084579:0x3ffc0610 0x4000bfed:0x00000000
0x40088d10: xQueueGenericSendFromISR at ...esp-idf/components/freertos/queue.c:2037

0x40086228: i2c_master_cmd_begin_static at ...esp-idf/components/driver/i2c.c:1023

0x4008671d: i2c_isr_handler_default at ...esp-idf/components/driver/i2c.c:1023

0x40084579: _xt_lowint1 at ...esp-idf/components/freertos/xtensa_vectors.S:1105

Hardware: one TCA9535 I2C expander

chanibal on 22 Feb 2018

I am seeing this issue using esp-idf master 3ede9f (Feb 20, 2018) and using ESP32-DevKitC.

In my case, I first observed this issue while running a test whose side effect is some noise on the I2C bus. This noise in turn leads to some failed I2C transactions after which I immediately see the crash with
Guru Meditation Error: Core 0 panic'ed (Interrupt wdt timeout on CPU0)

During normal circumstances when all I2C transactions succeed (with a 1.2K pullup), I can keep running without any errors.

In trying to reproduce the same error without running my complicated test, I was able to use pull-ups of a higher value (4.7K) to cause the crash. In this case I observe quite a few failed I2C transactions during normal operation and then the crash.

Seems like the driver is not able to handle some error conditions and ultimately causes the crash. Would be great if someone can look into this since it is hard to guarantee that the I2C bus will always function without any errors.

Azriel-Samson on 24 Feb 2018

@luisonoff When the FSM gets hungup in a Bus Busy state, it can be recovered by manually stimulating SCL, SDA. I have connected 2 additional GPIO pins, one to SCL. one to SDA. By transmitting a START, (SDA going low while SCL high, then nine SCL High->Low, then (STOP) SCL High (100us delay)then SDA High. This bus busy state can be cleared.

Chuck.

stickbreaker on 28 Feb 2018

@stickbreaker interesting - I see you've done some work on this on the Arduino side. Can you explain how your FSM clearing approach is different to i2c_hw_fsm_reset()?

However the main issue, in my opinion at least, is the watchdog reset inside the interrupt service routine, and it's not 100% clear to me that this is caused by the FSM busy state (although all indicators so far suggest that it is). Is there an opportunity to trap and clear this FSM busy state before the watchdog timeout occurs? I suspect not though, since the interrupts have already been generated. But this is all guesswork on my part.

@panfeng-espressif @igrr Given that many people have now reported seeing this issue too, is Espressif able to give us an update on their investigation please? I'm hoping for a fix so my project can proceed.

DavidAntliff on 28 Feb 2018

@DavidAntliff i2c_hw_fsm_reset() is attempting to work around the problem that the fsm will see a glitch on SDA as a multiMaster bus capture. Once the FSM sees 'another' master transacting the bus it will initiate a Bus_Busy State until it sees a complete transaction with a Valid STOP. The basis of this problem is the FSM's interpretation of START:

A START condition is High SCL, HIGH SDA then SDA LOW then SCL LOW.
The FSM interprets SCL HIGH, SDA HIGH then SDA LOW then SDA HIGH.
This LOW going 'glitch' on SDA should not be interpreted as START, It should just be ignored. The reason the protocol defines START as a voltage level sequence is to immunize the FSM from signal glitches.

Since all of the problems have been encountered by people using the ESP32 in a SINGLE Master I2C configuration, the FSM will infinitely hang waiting for the 'other' master to complete it's transaction.

The use of additional GPIO pins to act as another I2C master will solve the Bus_Busy problems, Actual TIMEOUT interrupt cascades I haven't solved.

static esp_err_t i2c_master_clear_bus(i2c_port_t i2c_num) acts like an I2C master to send out a null transaction. Except, there is no initial START(It drops SCL before SDA). So, it is another illegal transaction.
i2c_master_clear_bus

 gpio_set_direction(scl_io, GPIO_MODE_OUTPUT_OD);
    gpio_set_direction(sda_io, GPIO_MODE_OUTPUT_OD);
    gpio_set_level(scl_io, 0);
    gpio_set_level(sda_io, 0);
    for (int i = 0; i < 9; i++) {
        gpio_set_level(scl_io, 1);
        gpio_set_level(scl_io, 0);
    }
    gpio_set_level(scl_io, 1);
    gpio_set_level(sda_io, 1);
    i2c_set_pin(i2c_num, sda_io, scl_io, 1, 1, I2C_MODE_MASTER);
return ESP_OK;

This code should be changed to something like this:

    gpio_set_level(scl_io, 1); // initial condition SCL needs to be High
    gpio_set_level(sda_io, 1);// initial conditions SDA needs to be high
    // a small delay (5us) 1/2 clock at 100khz
    gpio_set_level(sda_io, 0);// Issues START
    // a small delay (5us) 1/2 clock at 100khz
     for (int i = 0; i < 9; i++) {
        gpio_set_level(scl_io, 0);
    // a small delay (5us) 1/2 clock at 100khz
        gpio_set_level(scl_io, 1);
    // a small delay (5us) 1/2 clock at 100khz
    }
    gpio_set_level(sda_io, 1); // Issue STOP

The history of needing this function traces back to hardware glitches that occur when attaching the GPIO pins to the hardware peripheral. @ESP32DE and I solved these glitches for the Arduino environment proposal to i2c. I don't know the equivalent pin assignment sequence for IDF. I don't use directly use esp32-IDF.

In my testing I no longer need to execute this function at every boot.

with a quick looking through of the IDF i2c code, I see a few design idea I don't support.

First, exiting through portYIELD_FROM_ISR(); at every opportunity.
portYIELD_FROM_ISR(); should only be called at the end of the ISR. If one of the OS calls returns (HPTaskAwoken == pdTRUE), Then at the END of the ISR call portYIELD_FROM_ISR() instead of just exiting the ISR. HPTaskAwoken just means that the FOREGROUND process must Yield. not the ISR. so, as soon as you complete your ISR (which should be Short, Deterministic, and NO waits) Call portYIELD_FROM_ISR(); to task switch to the higher priority FOREGROUND task.

I think the multiple portYIELD_FROM_ISR() throughout the ISR is the cause/basis of this Issue. A single interrupt is not completing before the next byte moves. HPTaskAwoken only needs to be acted on ONCE, just before the end of the ISR. I would create a SINGLE HPTaskAwoken variable at the top of the ISR and pass it by reference to all sub-Functions. The FreeRTOS ISR functions will not clear it, only SET it. So, if any of the OS functions set it, it will cascade until that last step before the ISR returns. I haven't studied this ISR, in my verion, I have a subfunction that does my exit operations it is the only place that can call portYIELD_FROM_ISR().
This code is from my ISR statemachine.

 if (activeInt & I2C_TRANS_COMPLETE_INT_ST_M) { 
    i2cIsrExit(p_i2c,EVENT_DONE,false);
    return; // no more work to do

inside i2cIsrExit() all of my ISR cleanup, foreground notifications, and a potential portYIELD_FROM_ISR() is executed, the return; exits from my ISR back to the interrupted foreground task.

From my point of view, ISR's must complete. They are atomic. If an ISR can't complete in a short FINITE timespan, it is coded wrong.

Second, I have discovered the FSM.command[] list has some order rules. If the command I2C_CMD_END is used, when the command[] buffer is refilled, the positions of command[] at and beyond where I2C_CMD_END was placed cannot be reused.
In my implementation of I2C for Arduino, If I need to use I2C_CMD_END because of data block length I only use command[15] for end. The FSM allows each command to move upto 255 bytes, so before I need to use an 'END' a Master Write would have to send over 3060 bytes. A single Read is a little more limited because of the last byte NAK. I don't see any consideration for these 'rules'.

Chuck.

stickbreaker on 1 Mar 2018

👍3

@DavidAntliff I have reported this issue, and our engineer is working on the problem now. As soon as we have any conclusion, I will reply to you here.

panfeng-espressif on 2 Mar 2018

Is this Problem related to https://www.esp32.com/viewtopic.php?f=2&t=2632&p=16910&hilit=juergen#p16910?

I saw that the I2C Interrupt randomly stopped working and the i2C Interrupt monitoring timed out.

What I found out was: it never occured if I have no Network traffic through WLAN. (using REST requests). And when I am hammering the System with a high frequency of web requests the I2C fails much more often and the I2C timeout stopps.
I assume that there is a Problem with the Interrupt handler between I2C and somewhat of websocket activities through WLAN (as STA or AP does not matter).

The i2c Fails when having Websocket acticities. When there is more activity it fails more often And it does not without websocket activities.

Juergen

juergen1601 on 4 Mar 2018

Software reset does not reset peripherals. so, when the I2C state machine has a serious error, the system will restart repeatedly unless a system reset. I found, when the the communication environment is relatively poor, ack will be wrong, and then generate an interruption. we use a while loop to handle all interrupts, houwerver, for some reason, ack error caused this loop to execute more than 700 times in isr context. so the system crashed. we are looking for a solution to this problem and any progress will be notified here. thanks !!

koobest on 13 Mar 2018

@koobest I just found a way to reset the I2C peripherals that seems to do a power on reset. It clears out all registers, resets the Bus_busy flags, initializes the hardware state back to a Power On condition.
I work on the Arduino branch, a patch merge containing just this reset is arduino-esp32 pr 1201.

Chuck.

stickbreaker on 13 Mar 2018

👍3

@koobest if you can work out why the ISR loop runs 700 times (and presumably neglects kicking the watchdog resulting in the crash) then that would be very useful, as I believe that is the main issue here.

If you call i2c_hw_fsm_reset in i2c_driver_install then this also stops the repeated crashing. It will still crash once, of course.

DavidAntliff on 13 Mar 2018

We are trying to enable time_out and ack_err these two interrupt in i2c_master_cmd_begin, when these two interrupt happens, disable it. we need to do a test for this.

koobest on 14 Mar 2018

Just wanted to let you know that there is a new commit in master from today that seems to fix this issue. I have not tested it yet. :)

luisonoff on 27 Mar 2018

👍3

@luisonoff thank you for the alert!

I have tested ESP-IDF commit 391c3ff959f9eb1b2975cb0d7b29c0478f3b6a48 with my reproduction project on one of my "DOIT" boards and I can happily report that I am unable to reproduce my issue by rubbing SDA and SCL together rapidly. I spent maybe 4 minutes rapidly mechanically shorting them and did not see a single crash.

Then I reverted back to 2e7613b6560775b27c50eb81e81d5c3ff712b866 (just prior to the "fix") and verified that the issue can be reproduced. In fact it was extremely easy to reproduce it, many times per minute.

So I can conclude from this that merge 892f3907fa2e074943e865b68f2fda3da600584b appears to resolve the issue, for me at least, on this board.

I'll try it on my Wemos LoLin32 Lite next, and report back if the results are any different.

EDIT: looks good on the LoLin32 also - no crashes seen. Good work Espressif! Thank you.

DavidAntliff on 28 Mar 2018

👍1

I can also confirm that this fix works for me. My ESP32 does not crash anymore even though there is noise on the I2C line.
Thanks.

Azriel-Samson on 28 Mar 2018

Another successful story. I was using an SSD1306 via i2c combined with a UART device and it used to fail. Updated to the latest version that includes this fix and now things just work!!! :D

Thanks for this great fix. My hobby project can now continue!!

gdombiak on 28 Mar 2018

Hi DavidAntliff ,
I am trying to apply the patch 892f390 but unfortunately I am using arduino ide and I did not find the file i2c.c.
So my question is where is the file driver/i2c.c in arduino ide and how to apply the patch in arduino ide.
Please guide me to resolve the i2c issue due to watch dog timer.
Thanks and regards,
Satyabrata

psatya111 on 28 Apr 2018

@psatya111 the patch is for ESP-IDF not Arduino - you can't apply it, it's different code. You might be able to find help in the esp32 Arduino forum.

DavidAntliff on 29 Apr 2018

Esp-idf: [TW#17765] I2C crashing - watchdog timeout (master & v3.0 branch)

Most helpful comment

All 78 comments

Related issues