Zephyr: STM32F407 I2C driver hangs

Created on 12 Feb 2020  ·  14 Comments  ·  Source: zephyrproject-rtos/zephyr

The STM32F407 I2C driver occasionally hangs while waiting on the semaphore in the driver. When that happens, the signal on the bus looks like this (SDA = yellow, SCL = green):

[Image: i2c_error]

To work around it I added a timeout to the k_sem_take calls, but I have not found the real cause of the problem. The scope shows that it must go wrong very early, because the driver generates a start-stop without any data.
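
The timeout workaround can be sketched roughly as follows. This is not the driver's code: `xfer_done`, `xfer_wait` and the fake clock are hypothetical stand-ins so the idea can be compiled and run on a host. In Zephyr itself the equivalent step would be passing a finite timeout instead of K_FOREVER to k_sem_take, which then returns -EAGAIN when the wait times out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's completion semaphore. */
struct xfer_done {
    volatile bool given;   /* set by the ISR when the transfer completes */
};

/* Hypothetical tick source, e.g. a millisecond uptime counter. */
typedef uint32_t (*now_ms_fn)(void);

/* Fake clock for demonstration only: advances 1 ms per call. */
static uint32_t fake_ms;
static uint32_t fake_now_ms(void) { return fake_ms++; }

/*
 * Wait for the transfer to complete, but give up after timeout_ms
 * instead of blocking forever.
 * Returns 0 on completion, -1 on timeout.
 */
static int xfer_wait(struct xfer_done *d, now_ms_fn now_ms, uint32_t timeout_ms)
{
    uint32_t start = now_ms();

    while (!d->given) {
        /* Unsigned subtraction stays correct across counter wrap-around. */
        if ((uint32_t)(now_ms() - start) >= timeout_ms) {
            return -1;
        }
    }
    d->given = false;   /* consume the completion, like taking a semaphore */
    return 0;
}
```

With a bounded wait the caller gets an error back and can attempt a recovery, rather than hanging inside the driver.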

This is on the 1.14 branch, but I cherry-picked most commits (like the new timeout handling) from master. The target is an STM32F407, so it uses the V1 driver. Using polling instead of IRQs makes the problem happen less often, but it still happens.

I2C bug STM32 low

All 14 comments

@lowlander even if happening from time to time, would you have some code to reproduce?

@erwango I will try to "port" the problem to a 407 dev board. This will take me at least until early next week. As soon as I have more info I'll post it here.

@lowlander could you be so kind as to try playing with the optimization settings?
CONFIG_SIZE_OPTIMIZATIONS=n
CONFIG_NO_OPTIMIZATIONS/CONFIG_DEBUG_OPTIMIZATIONS=y
and vice versa

@pavlohamov I have tried every optimization setting and it doesn't seem to influence the problem, it happens with all settings.

Setting priority to low while we're waiting for a way to reproduce.

I have been trying to debug the problem, and the only thing I can say is that the first IRQ (SB) happens and the driver writes the address to the DR register, but the address never lands on the bus. No further IRQs are generated, so the driver "hangs".
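
For readers unfamiliar with the V1 peripheral, the step described here looks roughly like the sketch below. It is an illustration, not the Zephyr driver's code: the register struct is a mock, and the names `i2c_v1_regs`, `i2c_v1_handle_sb` and `SKETCH_SR1_SB` are made up. The SB flag is bit 0 of SR1, and on real hardware it is cleared by reading SR1 and then writing DR.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SR1_SB (1u << 0)   /* SR1 bit 0: start condition generated */

/* Minimal mock of the two registers involved in this step. */
struct i2c_v1_regs {
    volatile uint32_t sr1;
    volatile uint32_t dr;
};

/*
 * Handle the start-bit event: once START is on the bus, write the
 * 7-bit slave address plus the R/W bit to DR so it gets shifted out.
 * Returns true if the event was pending and the address was written.
 */
static bool i2c_v1_handle_sb(struct i2c_v1_regs *i2c, uint8_t addr7, bool read)
{
    if (!(i2c->sr1 & SKETCH_SR1_SB)) {
        return false;   /* no start event pending */
    }
    /* Address byte: 7-bit address in bits 7..1, R/W flag in bit 0. */
    i2c->dr = ((uint32_t)addr7 << 1) | (read ? 1u : 0u);
    return true;
}
```

In the failure described above, this write happens but the address byte never appears on the bus, so no follow-up event (ADDR, BTF, ...) is raised.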

What is more worrying is that I found others who reported the same issue, but with different software:
https://community.st.com/s/question/0D50X00009Xkhfn/stm32f2xx-i2c-not-sending-address-after-start

@erwango would it be possible to access SDA and SCL as GPIO pins inside the driver, so they can be used to "hard reset" the I2C bus in case a slave gets confused by the start-stop pulse and doesn't release the bus?

The ST errata only lists "reset the peripheral" as a software workaround, and then there is a risk of leaving the bus in a wrong state.

So I think it would be good to have a real bus-reset function in the driver, so not everybody has to write their own hacks.
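
A bus-reset function of this kind could look roughly like the sketch below. It is only an illustration under stated assumptions: the pin hooks in `i2c_bus_pins` are hypothetical (a real driver would map them to open-drain GPIO accesses), and the `sim_*` helpers are a small simulated slave so the routine can be exercised on a host. The routine clocks SCL until the slave releases SDA, giving up after nine pulses, and then signals a STOP.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pin-access hooks; "high" means released (open drain). */
struct i2c_bus_pins {
    void (*set_scl)(void *ctx, bool high);
    void (*set_sda)(void *ctx, bool high);
    bool (*get_sda)(void *ctx);
    void (*delay)(void *ctx);   /* roughly half an SCL period */
    void *ctx;
};

/* Returns 0 if the bus was freed, -1 if SDA is still held low. */
static int i2c_bus_recover(const struct i2c_bus_pins *p)
{
    p->set_sda(p->ctx, true);
    p->set_scl(p->ctx, true);
    p->delay(p->ctx);

    /* "Fake" clock: at most 9 pulses, stop as soon as SDA is released. */
    for (int i = 0; i < 9 && !p->get_sda(p->ctx); i++) {
        p->set_scl(p->ctx, false);
        p->delay(p->ctx);
        p->set_scl(p->ctx, true);
        p->delay(p->ctx);
    }
    if (!p->get_sda(p->ctx)) {
        return -1;   /* slave (or a short to GND) still holds SDA low */
    }

    /* With SCL high: SDA falling is a START, SDA rising is the STOP. */
    p->set_sda(p->ctx, false);
    p->delay(p->ctx);
    p->set_sda(p->ctx, true);
    p->delay(p->ctx);
    return 0;
}

/* Simulated bus for demonstration: the slave holds SDA low for a while. */
struct sim_bus {
    bool scl;
    bool sda_master;    /* true = master has released SDA */
    int stuck_clocks;   /* SCL rising edges until the slave lets go */
    int pulses;         /* SCL rising edges seen so far */
    bool stop_seen;     /* SDA rose while SCL was high */
};

static bool sim_get_sda(void *ctx)
{
    struct sim_bus *b = ctx;
    return b->sda_master && b->stuck_clocks == 0;
}

static void sim_set_scl(void *ctx, bool high)
{
    struct sim_bus *b = ctx;
    if (high && !b->scl) {
        b->pulses++;
        if (b->stuck_clocks > 0) {
            b->stuck_clocks--;
        }
    }
    b->scl = high;
}

static void sim_set_sda(void *ctx, bool high)
{
    struct sim_bus *b = ctx;
    bool before = sim_get_sda(b);
    b->sda_master = high;
    if (b->scl && !before && sim_get_sda(b)) {
        b->stop_seen = true;
    }
}

static void sim_delay(void *ctx) { (void)ctx; }
```

The -1 return covers the case where no amount of clocking helps, for example when SDA is shorted to GND.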

@lowlander indeed, we're working on it via the introduction of pinctrl definition via device tree.
In this model, each peripheral driver would have access to pin definitions and would be able to use them for multiple purposes: reset, low power, ...
Topic has just started, but you could find some info here: https://github.com/zephyrproject-rtos/zephyr/issues/22748

@erwango but how to fix this for 1.14.X? I can add the timeout and give the I2C hardware a reset, but I have no access to the GPIO pins to reset the bus in the worst-case scenario.

@lowlander

@erwango but how to fix this for 1.14.X

My bad, I forgot you were using 1.14. This would be a significant change (and not a simple fix), and I don't know the policy on adding enhancements to the LTS branch. @MaureenHelm ?

@erwango I think a peripheral soft-reset is the only option. If it keeps failing, 1.14 users will have to build their own hard-reset via GPIO in their own application.
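
A minimal sketch of such a soft-reset on the V1 peripheral, assuming direct access to CR1 (SWRST is bit 15). The function name and the `SKETCH_` macro are made up for illustration, and on a host the pointer can simply target a plain variable, which does not model the register reset itself.

```c
#include <stdint.h>

#define SKETCH_CR1_SWRST (1u << 15)   /* CR1 bit 15: software reset */

/*
 * Pulse SWRST to bring the I2C peripheral out of a locked state.
 * On real hardware this resets the peripheral's registers, so the
 * caller has to run its normal init (timing, enable) again afterwards.
 */
static void i2c_v1_soft_reset(volatile uint32_t *cr1)
{
    *cr1 |= SKETCH_CR1_SWRST;    /* hold the peripheral in reset */
    *cr1 &= ~SKETCH_CR1_SWRST;   /* release it again */
}
```

This only resets the master side. A slave that is still holding SDA low is not affected, which is why a GPIO-based bus reset remains necessary in the worst case.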

Your timeout patch can be backported, and a 100 ms timeout on the mutex should make sure the driver comes back to "userspace", where the user must then check via GPIO whether SDA or SCL is still low and then reset the bus via a "fake" clock signal and a "fake" stop condition.

Let me try to make a PR and then move the discussion there.

OK, after some more research it seems that the driver sets the STOP flag (not sure when/how). This flag stays pending until a data byte is finished or a START is generated, and then it will directly generate a STOP (and that is what I see on the bus).

I now check and reset the STOP bit in CR1 before doing a START, and this seems to work.
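
The check described here might look roughly like this. It is a sketch, not the actual patch: the function name and the `SKETCH_` macros are hypothetical, the bit positions (START = bit 8, STOP = bit 9 of CR1) are those of the STM32F4 V1 peripheral, and the pointer can target a plain variable for host testing.

```c
#include <stdint.h>

#define SKETCH_CR1_START (1u << 8)   /* CR1 bit 8: generate START */
#define SKETCH_CR1_STOP  (1u << 9)   /* CR1 bit 9: generate STOP */

/*
 * Request a START, but first drop any stale STOP request so it cannot
 * be emitted right after the START.
 * Returns 1 if a pending STOP had to be cleared, 0 otherwise.
 */
static int i2c_v1_generate_start(volatile uint32_t *cr1)
{
    int stop_was_pending = 0;

    if (*cr1 & SKETCH_CR1_STOP) {
        *cr1 &= ~SKETCH_CR1_STOP;
        stop_was_pending = 1;
    }
    *cr1 |= SKETCH_CR1_START;
    return stop_was_pending;
}
```

Returning whether a stale STOP was found makes it easy to log or count how often the condition actually occurs during long test runs.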

Not really sure where it goes wrong. It has to be some race condition, or else it would always cause a problem.

The fix I have now needs some more long-term testing (several days of run time) before I create a PR.

@erwango I made an initial patch set, #23663, that fixes my problems. The ECM problem was very tricky because it only happened every 20 to 30 hours, until I figured out that it can be triggered by simply short-circuiting SDA to GND, which immediately hangs the current driver.

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.
