Mbed-os: RTX5 Semaphore Observations

Created on 6 Jun 2017 · 9Comments · Source: ARMmbed/mbed-os

I was recently reviewing the latest changes in the CMSIS5 feature branch and I noticed a few things in the semaphore code that I thought I would bring up here to get community feedback.

osSemaphoreRelease() API Description

The documentation for the osSemaphoreRelease() in cmsis_os2.h could mislead novice users:

/// Release a Semaphore token that was acquired by \ref osSemaphoreAcquire.
/// \param[in]     semaphore_id  semaphore ID obtained by \ref osSemaphoreNew.
/// \return status code that indicates the execution status of the function.
osStatus_t osSemaphoreRelease (osSemaphoreId_t semaphore_id);

As seen above the documentation indicates that this API will release a semaphore token that was "acquired by osSemaphoreAcquire". This makes it sound like a thread shouldn't call osSemaphoreRelease() on a semaphore object that the same thread didn't previously acquire via osSemaphoreAcquire(). This would make sense if the user was attempting to use the semaphore in place of a mutex but not in a typical producer/consumer scenario. In such a scenario, the producer thread would typically release the semaphore to let a consumer thread know that it can now acquire and consume the produced object. The producer thread never calls osSemaphoreAcquire() and the consumer thread's call to osSemaphoreAcquire() can occur at a later point in time.

It is probably best to just keep the documentation similar to what it was in the previous version of RTX.

Release a Semaphore token.

Semaphore::wait() and Thread Safety

The token count returned from Semaphore::wait() is no longer thread safe.

This is the current implementation:

int32_t Semaphore::wait(uint32_t millisec) {
    osStatus_t stat = osSemaphoreAcquire(_id, millisec);
    switch (stat) {
        case osOK:
            return osSemaphoreGetCount(_id) + 1;
        case osErrorTimeout:
        case osErrorResource:
            return 0;
        case osErrorParameter:
        default:
            return -1;
    }
}

This is the previous implementation:

int32_t Semaphore::wait(uint32_t millisec) {
    return osSemaphoreWait(_osSemaphoreId, millisec);
}

In the previous implementation, the returned token count would come from osSemaphoreWait() and this was implemented in a thread safe manner. This means that the returned token count is the value it had just before it was decremented.

In the current implementation a call is made to osSemaphoreAcquire() and then a second call is made to osSemaphoreGetCount() to determine the token count value to be returned. There is nothing stopping another thread or ISR from modifying the token count between these two API calls. In the previous implementation, it wasn't possible for another thread to interrupt osSemaphoreWait() while it was decrementing and capturing the token count down in the SVC handler and there was no need to worry about higher priority interrupts modifying the token count since calling semaphore APIs from an ISR wasn't even allowed. As it is currently implemented, multiple calls to Semaphore::wait() from different concurrent threads may return the same value. This might not be expected by callers.

I can think of a few solutions:

Run the Semaphore::wait() code with interrupts disabled. I don't think this would actually work though since the semaphore code would fail with an osErrorResource error if it actually needed to place the thread into a waiting state.
Don't worry about backwards compatibility for this Semaphore::wait() method. Just change the return value type to osStatus. This will call attention to old user code which interprets the integer return value and this is probably a good thing since the old behavior can't be guaranteed anymore.

osStatus Semaphore::wait(uint32_t millisec) {
    return osSemaphoreAcquire(_id, millisec);
}

Modify the osSemaphoreAcquire() API to take an additional parameter, a uint16_t* pointer to be filled in with the token count value associated with this acquisition. It should be trivial to calculate this token count value in a thread safe matter down in RTX as part of the acquire code path. Allow the pointer to be NULL if the caller isn't interested in the token count value. If this was done, should it be done for osSemaphoreRelease() as well to maintain consistency?

osSemaphoreGetCount() and Thread Safety

Related to the previous item, I don't really see a good use for the osSemaphoreGetCount() API since it isn't guaranteed that the return value is in anyway consistent with the thread's previous calls to osSemaphoreAcquire() and/or osSemaphoreRelease(). It would be best to just remove this function if possible.

Source

adamgreen

Most helpful comment

Hi @adamgreen,

I just slightly enhanced the CMSIS-RTOS2 documentation about semaphores, see 2a4ab8.

Added a second example using semaphores for producer/consumer problems.
Removed the misleading "acquired by osSemaphoreAcquire" from osSemaphoreRelease brief.

May I ask you to take a look into the documentation on the devel branch and let me know about your opinion? Thanks.

cc @pan- @0xc0170

JonatanAntoni on 14 Jun 2017

👍3

All 9 comments

First of all, many thanks for this detailed report.

osSemaphoreRelease() API Description

This API has been improved (and fixed). In the previous version it was possible to release not provisioned tokens and therefore increase the total number of tokens available for the semaphore up to 65535 tokens( 😱 here ).

With the new API, it is not possible to release more tokens than the number of tokens provisioned during the semaphore initialization. Trying to do so will return an error.

In a producer/consumer scenario, the producer can continue to call release to wakeup the consumer. It will just fail if their is no consumer holding a token.

However, there is an issue with the error code return, it is no possible to differentiate an error resulting of an inactive semaphore object from an error resulting of a lack of tokens to release

Semaphore::wait() and Thread Safety

According to the documentation, this function returns the number of available tokens, or -1 in case of incorrect parameters .

With the RTX update, this statement remains valid, the function will return -1 in case of error and the number of available tokens. Does it really matter if concurrent call to Semaphore::wait returns the same result ? I'm not sure to understand why user would have expectations over the return of Semaphore::wait.

osSemaphoreGetCount() and Thread Safety

Related to the previous item, I don't really see a good use for the osSemaphoreGetCount() API since it isn't guaranteed that the return value is in anyway consistent with the thread's previous calls to osSemaphoreAcquire() and/or osSemaphoreRelease().

osSemaphoreGetCount() is consistent with calls to osSemaphoreAcquire()/osSemaphoreRelease() made across the whole system. A program expecting it to be consistent with the previous calls made in the local thread is doomed.

pan- on 8 Jun 2017

cc @RobertRostohar @JonatanAntoni

0xc0170 on 8 Jun 2017

@pan- Thanks for the response!

In a producer/consumer scenario, the producer can continue to call release to wakeup the consumer. It will just fail if their is no consumer holding a token.

But it is still possible to call osSemaphoreRelease() before another thread has gotten around to calling osSemaphoreAcquire()? It just can't do it more than the max count times.

According to the documentation, this function returns the number of available tokens, or -1 in case of incorrect parameters .

I guess I just don't know what available tokens is really supposed to mean? Before it was the number of available tokens at the time the acquire call was made. Now it is just whatever it's current value is +1.

osSemaphoreGetCount() is consistent with calls to osSemaphoreAcquire()/osSemaphoreRelease() made across the whole system. A program expecting it to be consistent with the previous calls made in the local thread is doomed.

Other than its use to give a return value more consistent with the old osSemaphoreWait() function, when would a developer use this new function?

adamgreen on 14 Jun 2017

Hi @adamgreen,

I just slightly enhanced the CMSIS-RTOS2 documentation about semaphores, see 2a4ab8.

Added a second example using semaphores for producer/consumer problems.
Removed the misleading "acquired by osSemaphoreAcquire" from osSemaphoreRelease brief.

May I ask you to take a look into the documentation on the devel branch and let me know about your opinion? Thanks.

cc @pan- @0xc0170

JonatanAntoni on 14 Jun 2017

👍3

@JonatanAntoni I checked out the commit mentioned above and it looked good to me. Thanks!

adamgreen on 14 Jun 2017

Based on feedback here, I wonder if this might not be a sufficient implementation for Semaphore::wait() using the new API?

int32_t Semaphore::wait(uint32_t millisec) {
    osStatus_t stat = osSemaphoreAcquire(_id, millisec);
    switch (stat) {
        case osOK:
            return 1;
        case osErrorTimeout:
        case osErrorResource:
            return 0;
        case osErrorParameter:
        default:
            return -1;
    }
}

If the acquisition was successful then just return 1, a positive integer.

adamgreen on 14 Jun 2017

@JonatanAntoni Many thanks for the documentation enhancements. I'm nitpicking but the enumeration of returns value from osSemaphoreRelease is inaccurate. osErrorResource is also returned when the semaphore is in an invalid semaphore state. Would it be possible to distinguish an error caused by the lack of tokens to release from an error due to an invalid state of the semaphore ?

For example, an IRQ handler signaling a thread via a osSemaphoRelease cannot distinguish an error due to the thread at the other end being awake and not holding any token from a semaphore invalid state error. In the first case, the error may happen and it won't hurt the program flow but in the second case it is a critical error.

@adamgreen

But it is still possible to call osSemaphoreRelease() before another thread has gotten around to calling osSemaphoreAcquire()? It just can't do it more than the max count times.

Unlike the RTX1 implementation, it is safe to call osSemaphoreRelease even if no tokens has been acquired via osSemaphoreAcquire.

About osSemaphoreGetCount and what available tokens were/are/will be, I'm not sure of there usefulness beside logging. Even if the access to the semaphore is protected (Mutex or critical section) and the value returned by osSemaphoreGetCount is consistent locally; it will always be faster to blindly call osSemaphoreReleaseorosSemaphoreAcquire` and lookup the error result rather than acquiring the semaphore count and do something based on the returned value.

int32_t Semaphore::wait(uint32_t millisec) {
    osStatus_t stat = osSemaphoreAcquire(_id, millisec);
    switch (stat) {
        case osOK:
            return 1;
        case osErrorTimeout:
        case osErrorResource:
            return 0;
        case osErrorParameter:
        default:
            return -1;
    }
}

It is a breaking change with the current API but it would be more useful than what we currently have. However I'm not sure it is necessary to factor error codes they are all meaningful.

osStatus_t Semaphore::wait(uint32_t millisec) {
    return osSemaphoreAcquire(_id, millisec);
}

pan- on 14 Jun 2017

Hi @adamgreen,

osErrorResource is used whenever the operation cannot be completed due to object state, such as a

uninitialized,
deleted, or
overflown

semaphore. Either case should be avoided. One should not release tokens that have not been acquired before.

The new producer/consumer example I added recently makes use of two semaphores. This way the producer always knows that he is allowed to release a token to the first semaphore after he was able to acquire one from the second one. This should work for an IRQ-based producer as well. Might that work for your scenario as well?

Changing the CMSIS-RTOS2 API must be done extremely carefully. Typically we try to avoid breaking changes as best as we can.

Kind regards,
Jonatan

JonatanAntoni on 14 Jun 2017

👍2

@JonatanAntoni,

Changing the CMSIS-RTOS2 API must be done extremely carefully. Typically we try to avoid breaking changes as best as we can.

Based on the above feedback, I think I would now only suggest the consideration of two possible CMSIS-RTOS changes:

As @pan- mentioned above, it might be nice to distinguish between the two osErrorResource error causes in the acquire and release paths.
Consider removing the new osSemaphoreGetCount() API. It didn't exist in earlier versions.

I also suggested a change to the mbed-os's Semaphore::wait() method above, changing it not to call osSemaphoreGetCount() and instead just return 1 on successful acquisition. I suspect that that is not a breaking change for most users of the existing API.

adamgreen on 14 Jun 2017

Was this page helpful?

0 / 5 - 0 ratings