Mbed-os: Issue using multiple SPI objects with software NSS

Created on 13 Jun 2019 · 24Comments · Source: ARMmbed/mbed-os

Description

Hello!

I'm using 3 different SPI objects in my application, all of them are using software NSS which means that I am toggling a GPIO (DigitalOut) before and after the transaction.

Now the problem is that when i use "spi.write()" it calls "acquire()" which in turn reconfigures the SPI peripheral (BTW it is not even the same peripheral, each SPI object uses different physical peripheral and there is no actual need to reconfigure it over and over again - is this on purpose?). Now the reconfiguration makes the CLK pin to act weird as the reconfiguration takes place, it won't be such a big deal but the CS is already down (and the slave is already listening to transaction) - as a result the slave can't parse the transaction correctly and it won't answer.

I was trying to inherit the SPI class with my own and to do the CS internally but some methods aren't virtual (such as "transfer()").

Is this a real bug or did I misuse it?

target - STM32F407 DISCOVERY

Issue request type

[] Question
[ ] Enhancement
[x] Bug

CLOSED st mirrored bug

Source

embeddedteam103

Most helpful comment

Only a couple of targets currently have this. I believe ST are planning a PR to do a bulk update of their targets
I will make a PR including this for all our targets.
@kjbracey-arm, do we have a way to easily test this feature ? (else, I will just pass CI shield tests)

VVESTM on 14 Jun 2019

🎉2

All 24 comments

Internal Jira reference: https://jira.arm.com/browse/MBOCUSTRIA-1284

ciarmcom on 13 Jun 2019

I believe I resolved this in Mbed OS 5.12. SPI now has exposed select and deselect methods which allow correct ordering:

  SPI spi(MOSI_PIN, MISO_PIN, SCLK_PIN);
  DigitalOut gpio(SSEL_PIN, 1);
  spi.select();
  gpio = 0;
  spi.write(...)
  gpio = 1;
  spi.deselect();

If you have any sort of bus/peripheral sharing, just asserting your SSEL manually then writing isn't thread safe anyway - what if another thread decided it was going to access its slave while your SSEL was on? In principle you've got to acquire exclusive master peripheral access before asserting the SSEL.

If you don't have any further special SSEL timing requirements, SPI can now control the SSEL for you as GPIO, permitting manual control via select, or ensuring it's done in the correct order with a standalone write.

Use the new constructor:

  SPI spi(MOSI_PIN, MISO_PIN, SCLK_PIN, SSEL_PIN, use_gpio_ssel);
  // 1 standalone transaction - no manual select required
  spi.write(...)
  // 2 transactions locked as one:
  spi.select()
  spi.write(...)
  spi.write(...)
  spi.deselect()

BTW it is not even the same peripheral, each SPI object uses different physical peripheral and there is no actual need to reconfigure it over and over again - is this on purpose?

Legacy pre-5.12 behaviour was that SPI works on the pessimistic assumption that all slaves might be on the same peripheral, thus using a global mutex and configuration.

As of 5.12, a target can provide a slightly extended API to distinguish peripherals, making the mutex and configuration per-peripheral. Search for DEVICE_SPI_COUNT. Only a couple of targets currently have this. I believe ST are planning a PR to do a bulk update of their targets, but you could easily experimentally add it to your build.

(Note that you might be on a separate wired bus, but still sharing a pin-multiplexed master peripheral, so SPI can't deduce different peripheral by different MOSI/MISO/SCLK pins - we need to ask the HAL).

kjbracey-arm on 14 Jun 2019

VVESTM on 14 Jun 2019

🎉2

There are currently no specific CI tests for this feature, but by defining DEVICE_SPI_COUNT, a certain amount of functionality is tested by SPI continuing to work at all - the SPI class adapts its implementation based on that macro.

I guess the only extra specific problems that might occur would be:

spi_get_peripheral_name returning more values than DEVICE_SPI_COUNT indicates
spi_get_peripheral_name returning different values for one peripheral

And it would be sub-optimal for spi_get_peripheral_name to return the same value for two different peripherals.

kjbracey-arm on 14 Jun 2019

Also note that memory usage is increased when DEVICE_SPI_COUNT is set - if SPI is used, by default static storage is allocated for that many peripherals.

Users can reduce this by setting the config option drivers.spi_count_max to indicate how many peripherals they know they're using. An assert is triggered if they actually use more than that, and that assert should never occur with the default DEVICE_SPI_COUNT-based allocation.

kjbracey-arm on 14 Jun 2019

Also note that memory usage is increased when DEVICE_SPI_COUNT is set - if SPI is used, by default static storage is allocated for that many peripherals.

Users can reduce this by setting the config option drivers.spi_count_max to indicate how many peripherals they know they're using. An assert is triggered if they actually use more than that, and that assert should never occur with the default DEVICE_SPI_COUNT-based allocation.

Interesting. Then shouldn't it be the other way ? Keep things as they are now and use drivers.spi_count_max from applications that require more than 1 SPI instance ?

LMESTM on 14 Jun 2019

I believe the emphasis was placed on guaranteeing performance - never unnecessarily mutexing two separate peripherals. So when we run out of storage slots, we assert. We guarantee full performance in all cases by always having enough slots.

If the fallback when running out of storage slots was to reuse a storage slot, then yes, it could have been done as you suggest. Default behaviour to have 1 slot, and the application can choose to have 2 slots to get enhanced performance if they know they're using 2 peripherals.

For the full RTOS (not bare-metal) builds we tend to default to full operation in APIs, with the option of tuning down, and I think the current set-up is consistent with that, but I acknowledge it could have been done the other way.

kjbracey-arm on 14 Jun 2019

We could still adjust the code to re-use slots, not assert, making drivers.spi_count_max set to 1 legal when using multiple peripherals, allowing people to get the 5.12 single mutex behaviour even when target is upgraded.

kjbracey-arm on 14 Jun 2019

Sounds like a good idea.
My concern is more around small targets, i.e. with low memory footprint.

LMESTM on 14 Jun 2019

On the lowest footprint devices using a single slave, there are further gains to be made by dropping the mutex infrastructure altogether.

kjbracey-arm on 14 Jun 2019

Hi
Is that issue still valid ?
Thx

jeromecoutant on 2 Sep 2019

@kjbracey-arm Hi! Looking at the examples you provided on your first comments I wonder if the current SPI implementation is not completely thread-safe. For example consider that you have 10 SPI devices on the same bus and also you have 10 threads trying to acquire exclusive master peripheral access, each one trying to communicate to a different SPI device with something like this:

  spi.select();
  CS = 0;
  spi.write(...)  //or write(tx_buffer, 4, ...)
  spi.write(...)
  spi.write(...)
  spi.write(...)
  CS = 1;
  spi.deselect();

The issue is that after every spi.write(...) the code calls deselect() which always unlocks the mutex for that particular peripheral granting green light to the next waiting thread to select() and put their respective CS down and now you will have two CSs lines down on the same bus, add more threads to the mix and it goes bad really quick. Using use_gpio_ssel will have the same behavior.

Tested on:
Target: STM32F7xx (6 SPI peripherals)
Version: mbed-os 5.13

m0cg on 20 Dec 2019

select/deselect are (or should be) reference counted. So that series of transactions is held together by the outer select/deselect.

The built-in selects only move the reference count between 1 and 2. The mutex and CS should only be deasserted by the final deselect taking the reference count from 1 to 0.

Although maybe there's a bug. Can you see one?

kjbracey-arm on 20 Dec 2019

The mutex and CS should only be deasserted by the final deselect taking the reference count from 1 to 0.

Then unlock should be inside of the if

void SPI::deselect()
{
    if (--_select_count == 0) {
        _set_ssel(1);
    }
    unlock();
}

m0cg on 20 Dec 2019

The lock and unlock are manipulating a re-entrant mutex, so that itself is reference counted. And the mutex has to be outside to protect _select_count.

kjbracey-arm on 20 Dec 2019

BTW, if you're seeing a problem, it may be a problem in the user of SPI. This one is known - did you come here from there? #11732

kjbracey-arm on 20 Dec 2019

Yeah I know the re-entrant mutex should be counted properly but I see on the rtos::Mutex class the private _count is not being used for anything... or am I missing something?

m0cg on 20 Dec 2019

The _count variable was just put there for debugging purposes - asserting that _count == 1 when doing a ConditionVariable::wait.

The actual re-entrancy is handled internally by the OS, as long as we set the osMutexRecursive flag in osMutexNew, which rtos::Mutex always does.

kjbracey-arm on 20 Dec 2019

Great, thanks for the fast responses! The issue I have is that I'm seeing this multiple CSs low on the oscilloscope and I thought this could be an issue but maybe I'm missing something on the user of SPI as you mention. I will keep looking and update if I find something.

One strange thing is that user code works correctly with the previous version of SPI(the one with the global mutex and not per SPI mutex)

m0cg on 20 Dec 2019

If there was an error in the SPI peripheral indication from the HAL, then maybe the code to separate the buses would go wrong.

Maybe add a bit of debug to check the SPIName passed to _lookup and the resulting _peripheral pointer for each SPI object.

If you have 6 devices on the same bus, they should be the same for all 6 objects, which then means they're all locked against the same _peripheral->mutex.

kjbracey-arm on 20 Dec 2019

Hi
Is there any pending action on that issue?
Or should we close it ?
Thx

jeromecoutant on 20 Feb 2020

@kjbracey-arm could you please answer @jeromecoutant 's question ?

adbridge on 10 Mar 2020

I'll close this.

@m0cg Please open a new issue with an update if you are still seeing an issue.

0xc0170 on 16 Mar 2020

👍1

Hi Guys,
I have a question on this topic and this felt apropriate thread.
After defining an SPI object, is there a way to control a pin (say the MOSI) as DigitalOut too?

When SPI is not needed, can I assign the MOSI pin as a GPIO to control something else?