During Wound/Wait testing on PREEMPT_RT, a WARNING was hit:
WARNING: CPU: 0 PID: 0 at kernel/sched/core.c:7085 rt_mutex_pre_schedule+0xa8/0x108
Call trace:
rt_mutex_pre_schedule+0xa8/0x108
__ww_rt_mutex_lock+0x1d4/0x300
ww_mutex_lock+0x1c/0x30
The issue stems from the non-atomic `fetch_and_set` macro:
#define fetch_and_set(x, v) ({ int _x = (x); (x) = (v); _x; })
It lacks atomicity and memory ordering, leading to race conditions under
preemption or interrupts, where `current->sched_rt_mutex` may be corrupted.
Since this flag is only used for lockdep assertions and accessed per-task,
replace the unsafe macro with direct assignment and explicit state checks:
- In rt_mutex_pre_schedule(): assert the flag is 0 before setting it to 1.
- In rt_mutex_post_schedule(): assert the flag is 1 before clearing it to 0.
This fixes the false-positive warning without needing atomic operations.
Signed-off-by: cuiguoqi <cuiguoqi@kylinos.cn>
---
kernel/sched/core.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5e89a6eeadba..fb4c446e46f7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7078,11 +7078,11 @@ const struct sched_class *__setscheduler_class(struct task_struct *p, int prio)
  * name such that if someone were to implement this function we get to compare
  * notes.
  */
-#define fetch_and_set(x, v) ({ int _x = (x); (x) = (v); _x; })
 
 void rt_mutex_pre_schedule(void)
 {
-	lockdep_assert(!fetch_and_set(current->sched_rt_mutex, 1));
+	lockdep_assert(!current->sched_rt_mutex);
+	current->sched_rt_mutex = 1;
 	sched_submit_work(current);
 }
 
@@ -7095,7 +7095,9 @@ void rt_mutex_schedule(void)
 void rt_mutex_post_schedule(void)
 {
 	sched_update_worker(current);
-	lockdep_assert(fetch_and_set(current->sched_rt_mutex, 0));
+	lockdep_assert(current->sched_rt_mutex);
+	current->sched_rt_mutex = 0;
+
 }
 
 /*
--
2.25.1
On Tue, 26 Aug 2025 19:08:33 +0800
cuiguoqi <cuiguoqi@kylinos.cn> wrote:

> @@ -7078,11 +7078,11 @@ const struct sched_class *__setscheduler_class(struct task_struct *p, int prio)
>   * name such that if someone were to implement this function we get to compare
>   * notes.
>   */
> -#define fetch_and_set(x, v) ({ int _x = (x); (x) = (v); _x; })
> 
> void rt_mutex_pre_schedule(void)
> {
> -	lockdep_assert(!fetch_and_set(current->sched_rt_mutex, 1));
> +	lockdep_assert(!current->sched_rt_mutex);
> +	current->sched_rt_mutex = 1;
> 	sched_submit_work(current);
> }
> 
> @@ -7095,7 +7095,9 @@ void rt_mutex_schedule(void)
> void rt_mutex_post_schedule(void)
> {
> 	sched_update_worker(current);
> -	lockdep_assert(fetch_and_set(current->sched_rt_mutex, 0));
> +	lockdep_assert(current->sched_rt_mutex);
> +	current->sched_rt_mutex = 0;
> +
> }

Honestly, without adding a READ_ONCE() or barrier() I don't see how your
change would prevent the compiler from making the code any different?

It may have "fixed" your issue because the compiler may have done things
differently, but this change doesn't guarantee anything.

Also, could you show an example of how current->sched_rt_mutex could be
corrupted?

-- Steve
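[A minimal sketch of where such annotations would go, assuming the flag really
could be observed concurrently; READ_ONCE()/WRITE_ONCE() are the kernel's
existing helpers, and this variant is illustrative only, not something posted
in the thread:]

	void rt_mutex_pre_schedule(void)
	{
		/*
		 * If another context could read the flag, the plain load and
		 * store would need READ_ONCE()/WRITE_ONCE() (or a barrier)
		 * to constrain what the compiler may do with them.
		 */
		lockdep_assert(!READ_ONCE(current->sched_rt_mutex));
		WRITE_ONCE(current->sched_rt_mutex, 1);
		sched_submit_work(current);
	}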
The issue arises during EDEADLK testing in `lib/locking-selftest.c` when
`is_wait_die=1`. In this mode, the current thread's `debug_locks` flag is
disabled via `__debug_locks_off` (which calls `xchg(&debug_locks, 0)`) during
the blocking path of `rt_mutex_slowlock`, specifically in
`rt_mutex_slowlock_block()`:

  rt_mutex_slowlock()
    rt_mutex_pre_schedule()
    rt_mutex_slowlock_block()
      DEBUG_LOCKS_WARN_ON(ww_ctx->contending_lock)
        __debug_locks_off();        // xchg(&debug_locks, 0)

However, `rt_mutex_post_schedule()` still performs:

  lockdep_assert(fetch_and_set(current->sched_rt_mutex, 0));

Which expands to:

  do {
          WARN_ON(debug_locks && !(
                  ({ int _x = current->sched_rt_mutex; current->sched_rt_mutex = 0; _x; })
          ));
  } while (0)

The generated assembly shows that the entire assertion is conditional on
`debug_locks`:

  adrp  x0, debug_locks
  ldr   w0, [x0]
  cbz   w0, .label_skip_warn      // Skip WARN if debug_locks == 0

This means: if `debug_locks` was cleared earlier, the check on
`current->sched_rt_mutex` is effectively skipped, and the flag may remain set.
Later, when the same task re-enters `rt_mutex_slowlock`, it calls
`lockdep_reset()` to re-enable `debug_locks`, but the stale
`current->sched_rt_mutex` state (left over from the previous lock attempt)
causes a false-positive warning in `rt_mutex_pre_schedule()`:

  WARNING: CPU: 0 PID: 0 at kernel/sched/core.c:7085 rt_mutex_pre_schedule+0xa8/0x108

Because:

- `rt_mutex_pre_schedule()` asserts `!current->sched_rt_mutex`
- But the flag was never properly cleared due to the skipped post-schedule
  check.

This is not a data race on the flag itself, but a **state inconsistency caused
by conditional debugging logic** — the `fetch_and_set` macro is not atomic,
but more importantly, the assertion is bypassed when `debug_locks` is off,
breaking the expected state transition.
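[The skipped side effect can be reproduced outside the kernel with a minimal
userspace sketch. The simplified lockdep_assert() below is only a stand-in for
the real macro in include/linux/lockdep.h (WARN_ON(debug_locks && !(cond))),
and the two globals merely mimic debug_locks and current->sched_rt_mutex:]

  #include <stdio.h>

  /* Illustrative stand-ins for the kernel's debug_locks and
   * current->sched_rt_mutex. */
  static int debug_locks = 1;
  static int sched_rt_mutex;

  /* Simplified lockdep_assert(): like WARN_ON(debug_locks && !(cond)),
   * the condition (and any side effect in it) is skipped when
   * debug_locks == 0 because && short-circuits. */
  #define lockdep_assert(cond) \
          do { if (debug_locks && !(cond)) printf("WARN: %s\n", #cond); } while (0)

  #define fetch_and_set(x, v) ({ int _x = (x); (x) = (v); _x; })

  int main(void)
  {
          /* pre_schedule: flag goes 0 -> 1, assertion passes */
          lockdep_assert(!fetch_and_set(sched_rt_mutex, 1));

          debug_locks = 0;        /* __debug_locks_off() fires in the slowpath */

          /* post_schedule: condition never evaluated, flag stays 1 */
          lockdep_assert(fetch_and_set(sched_rt_mutex, 0));

          debug_locks = 1;        /* lockdep_reset() re-enables checking */

          /* next pre_schedule: stale flag triggers a false positive */
          lockdep_assert(!fetch_and_set(sched_rt_mutex, 1));
          return 0;
  }

[Built with gcc (the ({ ... }) statement expression is a GNU extension), only
the last assertion prints a warning, mirroring the false positive reported in
rt_mutex_pre_schedule().]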
On 2025-08-26 09:56:15 [-0400], Steven Rostedt wrote:
> Honestly, without adding a READ_ONCE() or barrier() I don't see how your
> change would prevent the compiler from making the code any different?

Other than that, that flag is supposed to be only set/cleared by the thread
itself. There should be no need for it to be atomic.

> It may have "fixed" your issue because the compiler may have done things
> differently, but this change doesn't guarantee anything.
> 
> Also, could you show an example of how current->sched_rt_mutex could be
> corrupted?

That would be important.

> -- Steve

Sebastian