Describe the bug
I was preparing some training material and noticed behavior that I believe is a bug.
I have three threads. Two are cooperative. The higher-priority one is waiting for a mutex; the lower-priority one is holding it. After the mutex is released, the low-priority thread continues executing until it sleeps. However, if the mutex is held by a preemptible thread, releasing the mutex causes an immediate swap to the high-priority thread.
[00:00:00.000,000] <inf> sched_demo: thread 0x20010438: id:2 prio:-2 ... 0 locked
[00:00:00.977,569] <inf> sched_demo: thread 0x20010438: id:2 prio:-2 ... 0 unlocked
[00:00:00.977,600] <inf> sched_demo: thread 0x20010438: id:2 prio:-2 ... 0 waiting <--- [-2] sleeps
[00:00:00.977,630] <inf> sched_demo: thread 0x200103bc: id:1 prio:-1 ... 0 locked <--- [-1] has the lock
[00:00:01.955,108] <inf> sched_demo: thread 0x20010438: id:2 prio:-2 ... 0 waited <--- [-1] is sleeping, [-2] is done sleeping and will execute until mutex_lock
[00:00:01.955,139] <inf> sched_demo: thread 0x200103bc: id:1 prio:-1 ... 0 unlocked <--- [-1] gets back the CPU as [-2] is waiting for the lock; [-1] releases the lock and continues execution
[00:00:01.955,169] <inf> sched_demo: thread 0x200103bc: id:1 prio:-1 ... 0 waiting <--- continues until sleep
[00:00:01.955,200] <inf> sched_demo: thread 0x20010438: id:2 prio:-2 ... 1 locked <--- [-1] went to sleep so [-2] can take over the CPU
[00:00:02.932,647] <inf> sched_demo: thread 0x20010438: id:2 prio:-2 ... 1 unlocked
[00:00:02.932,678] <inf> sched_demo: thread 0x20010438: id:2 prio:-2 ... 1 waiting
[00:00:02.932,708] <inf> sched_demo: thread 0x200103bc: id:1 prio:-1 ... 0 waited
[00:00:02.932,739] <inf> sched_demo: thread 0x20010340: id:0 prio:1 ... 0 locked
[00:00:03.910,186] <inf> sched_demo: thread 0x20010438: id:2 prio:-2 ... 1 waited
[00:00:03.910,217] <inf> sched_demo: thread 0x20010340: id:0 prio:1 ... 0 unlocked
[00:00:03.910,247] <inf> sched_demo: thread 0x20010438: id:2 prio:-2 ... 2 locked
[00:00:03.910,278] <inf> sched_demo: thread 0x20010340: id:0 prio:1 ... 0 waiting
[00:00:04.887,725] <inf> sched_demo: thread 0x20010438: id:2 prio:-2 ... 2 unlocked
[00:00:04.887,756] <inf> sched_demo: thread 0x20010438: id:2 prio:-2 ... 2 waiting
[00:00:04.887,786] <inf> sched_demo: thread 0x200103bc: id:1 prio:-1 ... 1 locked
To Reproduce
Steps to reproduce the behavior:
Run the sample code below.
Expected behavior
The high-priority thread should be swapped in as soon as the lock is released; i.e., k_mutex_unlock() must be a scheduling point!
Impact
Probably a showstopper.
Screenshots or console output
N/A
Environment (please complete the following information):
master
Additional context
N/A
```c
static void thread_fn(void *p1, void *p2, void *p3)
{
	size_t id = (size_t)p1;
	struct thread_config *thread = p2;
	int prio = thread->prio;
	s32_t locked_time_ms = thread->locked_time_ms;
	s32_t unlocked_time_ms = thread->unlocked_time_ms;
	size_t i = 0;

	while (true) {
		/* Resource is locked */
		k_mutex_lock(&mutex, K_FOREVER);
		LOG_INF("thread %p: id:%zu prio:%d ... %zu locked",
			k_current_get(), id, prio, i);
		k_sleep(locked_time_ms);
		LOG_INF("thread %p: id:%zu prio:%d ... %zu unlocked",
			k_current_get(), id, prio, i);
		k_mutex_unlock(&mutex);

		/* Thread not holding resources */
		LOG_INF("thread %p: id:%zu prio:%d ... %zu waiting",
			k_current_get(), id, prio, i);
		k_sleep(unlocked_time_ms);
		LOG_INF("thread %p: id:%zu prio:%d ... %zu waited",
			k_current_get(), id, prio, i);
		i++;
	}
}
```
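The issue doesn't show how the threads are configured. A plausible reconstruction, with the field names taken from the code above and the priorities taken from the log (all timing values here are hypothetical placeholders, not the reporter's actual values), might look like:

```c
/* Hypothetical reconstruction of the missing setup: field names come
 * from thread_fn() above, priorities from the log (1, -1, -2). The
 * sleep durations are placeholders. */
struct thread_config {
	int prio;
	s32_t locked_time_ms;
	s32_t unlocked_time_ms;
};

K_MUTEX_DEFINE(mutex);

static struct thread_config configs[] = {
	{ .prio = 1,  .locked_time_ms = 1000, .unlocked_time_ms = 1000 }, /* preemptible */
	{ .prio = -1, .locked_time_ms = 1000, .unlocked_time_ms = 1000 }, /* cooperative */
	{ .prio = -2, .locked_time_ms = 1000, .unlocked_time_ms = 1000 }, /* cooperative */
};
```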
@carlescufi , @ioannisg
Indeed, there's no reschedule done inside k_mutex_unlock(). I'm not sure this is a bug, though. Note that k_sem_give() has exactly the same behavior, as does k_poll_signal_raise() and a bunch of other "write-like" IPC APIs we have.
But not all: I note that an unblocking insert to a queue (e.g. k_lifo/k_fifo) does do a synchronous reschedule. We've never been good about documenting what our reschedule points are.
As far as k_mutex... I suspect the reason it's done this way is a performance thing. There's an existing controversy in the industry about "fair" locks. IIRC, the last time this came up it was pointed out that default POSIX mutexes in OS X are fair (i.e. have the behavior you want, and always yield if needed) and Linux/glibc ones are not, which is why they can minimize the number of context switches and thus blow Apple away in routine benchmarks.
I officially take no position here. If there's desire for the feature, this small patch should work to do it. It doesn't break existing tests, but I haven't tried it with your example above yet.
(Note that it also fixes what seem like really obvious races here -- it's changing mutex state after readying the thread, which means that an SMP core or interrupt might run it before the mutex is correct. I'll submit those separately once we decide what to do here.)
```diff
--- a/kernel/mutex.c
+++ b/kernel/mutex.c
@@ -233,18 +233,15 @@ void z_impl_k_mutex_unlock(struct k_mutex *mutex)
 		mutex, new_owner, new_owner ? new_owner->base.prio : -1000);

 	if (new_owner != NULL) {
-		z_ready_thread(new_owner);
-
-		k_spin_unlock(&lock, key);
-
-		arch_thread_return_value_set(new_owner, 0);
-
 		/*
 		 * new owner is already of higher or equal prio than first
 		 * waiter since the wait queue is priority-based: no need to
 		 * ajust its priority
 		 */
 		mutex->owner_orig_prio = new_owner->base.prio;
+		arch_thread_return_value_set(new_owner, 0);
+		z_ready_thread(new_owner);
+		z_reschedule(&lock, key);
 	} else {
 		mutex->lock_count = 0U;
 		k_spin_unlock(&lock, key);
```
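In the meantime, an application that needs the handoff behavior today can force a scheduling point itself by yielding right after the unlock. A minimal sketch (application-side workaround, not a kernel change, using the same `mutex` as the example above):

```c
/* Application-side workaround sketch: force a scheduling point right
 * after releasing the mutex, so an equal- or higher-priority thread
 * that was readied by the unlock runs immediately. */
k_mutex_unlock(&mutex);
k_yield();
```

This doesn't fix the ordering races noted above inside k_mutex_unlock() itself; it only restores the "unlock is a scheduling point" semantics from the caller's side.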
My vote is to have a reschedule point here. k_mutex should be considered a special case like the Linux RT-mutex, as opposed to the other IPC primitives, which have more traditional wait/wake semantics and aren't supposed to be priority-inheriting.
Lowering the priority of this issue. Based on @andyross's evaluation, even considering this a medium bug may be too much.
While this is currently an open question, it is blocking for #18970, as described in https://github.com/zephyrproject-rtos/zephyr/pull/20919#pullrequestreview-324937300: until it's answered, I can't identify the conditions under which a cooperative thread may transition from running to another state.
I think we're converging to a consensus that this is working as designed for co-op threads (even if we add a z_reschedule(), it's a no-op if the caller is coop) but definitely broken for preempt threads.