As we add functionality to proxy execution, we may migrate a
donor task to a runqueue where it can't run due to cpu affinity.
Thus, we must be careful to ensure we return-migrate the task
back to a cpu in its cpumask when it becomes unblocked.

This means we need more than just a binary notion of the task
being blocked on a mutex or not.

So add a blocked_on_state value to the task, which allows the
task to move through BO_RUNNABLE -> BO_BLOCKED -> BO_WAKING
and back to BO_RUNNABLE. This provides a guard state in
BO_WAKING, so we know the task is no longer blocked but we
don't want to run it until we have potentially done return
migration back to a usable cpu.
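
The intended flow is roughly:

  BO_RUNNABLE -> BO_BLOCKED   when the task blocks on a mutex
  BO_BLOCKED  -> BO_WAKING    when the mutex unlock path (or the
                              ww_mutex die/wound code) wakes the waiter
  BO_WAKING   -> BO_RUNNABLE  once try_to_wake_up() completes, after
                              any return migration that may be needed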
Signed-off-by: John Stultz <jstultz@google.com>
---
v15:
* Split blocked_on_state into its own patch later in the
series, as the tri-state isn't necessary until we deal
with proxy/return migrations
v16:
* Handle case where task in the chain is being set as
BO_WAKING by another cpu (usually via ww_mutex die code).
Make sure we release the rq lock so the wakeup can
complete.
* Rework to use guard() in find_proxy_task() as suggested
by Peter
v18:
* Add initialization of blocked_on_state for init_task
v19:
* PREEMPT_RT build fixups and rework suggested by
K Prateek Nayak
v20:
* Simplify one of the blocked_on_state changes to avoid extra
PREEMPT_RT conditionals
v21:
* Slight reworks due to avoiding nested blocked_lock locking
* Be consistent in use of blocked_on_state helper functions
* Rework calls to proxy_deactivate() to do proper locking
around blocked_on_state changes, which we were cheating on in
previous versions.
* Minor cleanups, some comment improvements
Cc: Joel Fernandes <joelagnelf@nvidia.com>
Cc: Qais Yousef <qyousef@layalina.io>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Zimuzo Ezeozue <zezeozue@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Will Deacon <will@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Metin Kaya <Metin.Kaya@arm.com>
Cc: Xuewen Yan <xuewen.yan94@gmail.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: kuyo chang <kuyo.chang@mediatek.com>
Cc: hupu <hupu.gm@gmail.com>
Cc: kernel-team@android.com
---
include/linux/sched.h | 80 +++++++++++++++++++++++++++++----------
init/init_task.c | 1 +
kernel/fork.c | 1 +
kernel/locking/mutex.c | 15 ++++----
kernel/locking/ww_mutex.h | 20 ++++------
kernel/sched/core.c | 44 +++++++++++++++++++--
kernel/sched/sched.h | 2 +-
7 files changed, 120 insertions(+), 43 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 3ec0ef0d91603..5801de1a44a79 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -815,6 +815,12 @@ struct kmap_ctrl {
#endif
};
+enum blocked_on_state {
+ BO_RUNNABLE,
+ BO_BLOCKED,
+ BO_WAKING,
+};
+
struct task_struct {
#ifdef CONFIG_THREAD_INFO_IN_TASK
/*
@@ -1234,6 +1240,7 @@ struct task_struct {
struct rt_mutex_waiter *pi_blocked_on;
#endif
+ enum blocked_on_state blocked_on_state;
struct mutex *blocked_on; /* lock we're blocked on */
raw_spinlock_t blocked_lock;
@@ -2141,7 +2148,52 @@ extern int __cond_resched_rwlock_write(rwlock_t *lock);
__cond_resched_rwlock_write(lock); \
})
-#ifndef CONFIG_PREEMPT_RT
+static inline void __force_blocked_on_runnable(struct task_struct *p)
+{
+ lockdep_assert_held(&p->blocked_lock);
+ p->blocked_on_state = BO_RUNNABLE;
+}
+
+static inline void force_blocked_on_runnable(struct task_struct *p)
+{
+ guard(raw_spinlock_irqsave)(&p->blocked_lock);
+ __force_blocked_on_runnable(p);
+}
+
+static inline void __force_blocked_on_blocked(struct task_struct *p)
+{
+ lockdep_assert_held(&p->blocked_lock);
+ p->blocked_on_state = BO_BLOCKED;
+}
+
+static inline void __set_blocked_on_runnable(struct task_struct *p)
+{
+ lockdep_assert_held(&p->blocked_lock);
+ if (p->blocked_on_state == BO_WAKING)
+ p->blocked_on_state = BO_RUNNABLE;
+}
+
+static inline void set_blocked_on_runnable(struct task_struct *p)
+{
+ if (!sched_proxy_exec())
+ return;
+ guard(raw_spinlock_irqsave)(&p->blocked_lock);
+ __set_blocked_on_runnable(p);
+}
+
+static inline void __set_blocked_on_waking(struct task_struct *p)
+{
+ lockdep_assert_held(&p->blocked_lock);
+ if (p->blocked_on_state == BO_BLOCKED)
+ p->blocked_on_state = BO_WAKING;
+}
+
+static inline void set_blocked_on_waking(struct task_struct *p)
+{
+ guard(raw_spinlock_irqsave)(&p->blocked_lock);
+ __set_blocked_on_waking(p);
+}
+
static inline struct mutex *__get_task_blocked_on(struct task_struct *p)
{
lockdep_assert_held_once(&p->blocked_lock);
@@ -2163,24 +2215,23 @@ static inline void __set_task_blocked_on(struct task_struct *p, struct mutex *m)
lockdep_assert_held_once(&p->blocked_lock);
/*
* Check ensure we don't overwrite existing mutex value
- * with a different mutex. Note, setting it to the same
- * lock repeatedly is ok.
+ * with a different mutex.
*/
- WARN_ON_ONCE(p->blocked_on && p->blocked_on != m);
+ WARN_ON_ONCE(p->blocked_on);
p->blocked_on = m;
+ p->blocked_on_state = BO_BLOCKED;
}
static inline void __clear_task_blocked_on(struct task_struct *p, struct mutex *m)
{
+ /* The task should only be clearing itself */
+ WARN_ON_ONCE(p != current);
/* Currently we serialize blocked_on under the task::blocked_lock */
lockdep_assert_held_once(&p->blocked_lock);
- /*
- * There may be cases where we re-clear already cleared
- * blocked_on relationships, but make sure we are not
- * clearing the relationship with a different lock.
- */
- WARN_ON_ONCE(m && p->blocked_on && p->blocked_on != m);
+ /* Make sure we are clearing the relationship with the right lock */
+ WARN_ON_ONCE(m && p->blocked_on != m);
p->blocked_on = NULL;
+ p->blocked_on_state = BO_RUNNABLE;
}
static inline void clear_task_blocked_on(struct task_struct *p, struct mutex *m)
@@ -2188,15 +2239,6 @@ static inline void clear_task_blocked_on(struct task_struct *p, struct mutex *m)
guard(raw_spinlock_irqsave)(&p->blocked_lock);
__clear_task_blocked_on(p, m);
}
-#else
-static inline void __clear_task_blocked_on(struct task_struct *p, struct rt_mutex *m)
-{
-}
-
-static inline void clear_task_blocked_on(struct task_struct *p, struct rt_mutex *m)
-{
-}
-#endif /* !CONFIG_PREEMPT_RT */
static __always_inline bool need_resched(void)
{
diff --git a/init/init_task.c b/init/init_task.c
index 7e29d86153d9f..6d72ec23410a6 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -174,6 +174,7 @@ struct task_struct init_task __aligned(L1_CACHE_BYTES) = {
.mems_allowed_seq = SEQCNT_SPINLOCK_ZERO(init_task.mems_allowed_seq,
&init_task.alloc_lock),
#endif
+ .blocked_on_state = BO_RUNNABLE,
#ifdef CONFIG_RT_MUTEXES
.pi_waiters = RB_ROOT_CACHED,
.pi_top_task = NULL,
diff --git a/kernel/fork.c b/kernel/fork.c
index db6d08946ec11..4bd0731995e86 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2129,6 +2129,7 @@ __latent_entropy struct task_struct *copy_process(
lockdep_init_task(p);
#endif
+ p->blocked_on_state = BO_RUNNABLE;
p->blocked_on = NULL; /* not blocked yet */
#ifdef CONFIG_BCACHE
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index fac40c456098e..42e4d2e6e4ad4 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -682,11 +682,9 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
raw_spin_lock_irqsave(&lock->wait_lock, flags);
raw_spin_lock(¤t->blocked_lock);
/*
- * As we likely have been woken up by task
- * that has cleared our blocked_on state, re-set
- * it to the lock we are trying to acquire.
+ * Re-set blocked_on_state as unlock path set it to WAKING/RUNNABLE
*/
- __set_task_blocked_on(current, lock);
+ __force_blocked_on_blocked(current);
set_current_state(state);
/*
* Here we order against unlock; we must either see it change
@@ -705,14 +703,14 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
* and clear blocked on so we don't become unselectable
* to run.
*/
- __clear_task_blocked_on(current, lock);
+ __force_blocked_on_runnable(current);
raw_spin_unlock(¤t->blocked_lock);
raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
trace_contention_begin(lock, LCB_F_MUTEX | LCB_F_SPIN);
opt_acquired = mutex_optimistic_spin(lock, ww_ctx, &waiter);
raw_spin_lock_irqsave(&lock->wait_lock, flags);
raw_spin_lock(¤t->blocked_lock);
- __set_task_blocked_on(current, lock);
+ __force_blocked_on_blocked(current);
if (opt_acquired)
break;
trace_contention_begin(lock, LCB_F_MUTEX);
@@ -963,8 +961,11 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
next = waiter->task;
+ raw_spin_lock(&next->blocked_lock);
debug_mutex_wake_waiter(lock, waiter);
- clear_task_blocked_on(next, lock);
+ WARN_ON_ONCE(__get_task_blocked_on(next) != lock);
+ __set_blocked_on_waking(next);
+ raw_spin_unlock(&next->blocked_lock);
wake_q_add(&wake_q, next);
}
diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
index e4a81790ea7dd..f34363615eb34 100644
--- a/kernel/locking/ww_mutex.h
+++ b/kernel/locking/ww_mutex.h
@@ -285,11 +285,11 @@ __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER *waiter,
debug_mutex_wake_waiter(lock, waiter);
#endif
/*
- * When waking up the task to die, be sure to clear the
- * blocked_on pointer. Otherwise we can see circular
- * blocked_on relationships that can't resolve.
+ * When waking up the task to die, be sure to set the
+ * blocked_on_state to BO_WAKING. Otherwise we can see
+ * circular blocked_on relationships that can't resolve.
*/
- clear_task_blocked_on(waiter->task, lock);
+ set_blocked_on_waking(waiter->task);
wake_q_add(wake_q, waiter->task);
}
@@ -339,15 +339,11 @@ static bool __ww_mutex_wound(struct MUTEX *lock,
*/
if (owner != current) {
/*
- * When waking up the task to wound, be sure to clear the
- * blocked_on pointer. Otherwise we can see circular
- * blocked_on relationships that can't resolve.
- *
- * NOTE: We pass NULL here instead of lock, because we
- * are waking the mutex owner, who may be currently
- * blocked on a different mutex.
+ * When waking up the task to wound, be sure to set the
+ * blocked_on_state to BO_WAKING. Otherwise we can see
+ * circular blocked_on relationships that can't resolve.
*/
- clear_task_blocked_on(owner, NULL);
+ set_blocked_on_waking(owner);
wake_q_add(wake_q, owner);
}
return true;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0180853dd48c5..e0007660161fa 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4328,6 +4328,12 @@ int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
ttwu_queue(p, cpu, wake_flags);
}
out:
+ /*
+ * For now, if we've been woken up, set us as BO_RUNNABLE.
+ * We will need to be more careful later when handling
+ * proxy migration.
+ */
+ set_blocked_on_runnable(p);
if (success)
ttwu_stat(p, task_cpu(p), wake_flags);
@@ -6623,7 +6629,7 @@ static struct task_struct *proxy_deactivate(struct rq *rq, struct task_struct *d
* as unblocked, as we aren't doing proxy-migrations
* yet (more logic will be needed then).
*/
- donor->blocked_on = NULL;
+ force_blocked_on_runnable(donor);
}
return NULL;
}
@@ -6676,20 +6682,41 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
return NULL;
}
+ /*
+ * If a ww_mutex hits the die/wound case, it marks the task as
+ * BO_WAKING and calls try_to_wake_up(), so that the mutex
+ * cycle can be broken and we avoid a deadlock.
+ *
+ * However, if at that moment we are here on the cpu on which
+ * the die/wounded task is enqueued, we might loop on the cycle as
+ * BO_WAKING still causes task_is_blocked() to return true
+ * (since we want return migration to occur before we run the
+ * task).
+ *
+ * Unfortunately since we hold the rq lock, it will block
+ * try_to_wake_up from completing and doing the return
+ * migration.
+ *
+ * So when we hit a !BO_BLOCKED task, briefly schedule idle
+ * so we release the rq lock and let the wakeup complete.
+ */
+ if (p->blocked_on_state != BO_BLOCKED)
+ return proxy_resched_idle(rq);
+
owner = __mutex_owner(mutex);
if (!owner) {
- __clear_task_blocked_on(p, mutex);
+ __force_blocked_on_runnable(p);
return p;
}
if (!READ_ONCE(owner->on_rq) || owner->se.sched_delayed) {
/* XXX Don't handle blocked owners/delayed dequeue yet */
- return proxy_deactivate(rq, donor);
+ goto deactivate_donor;
}
if (task_cpu(owner) != this_cpu) {
/* XXX Don't handle migrations yet */
- return proxy_deactivate(rq, donor);
+ goto deactivate_donor;
}
if (task_on_rq_migrating(owner)) {
@@ -6749,6 +6776,15 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
WARN_ON_ONCE(owner && !owner->on_rq);
return owner;
+
+ /*
+ * NOTE: This logic is down here, because we need to call
+ * the functions with the mutex wait_lock and task
+ * blocked_lock released, so we have to get out of the
+ * guard() scope.
+ */
+deactivate_donor:
+ return proxy_deactivate(rq, donor);
}
#else /* SCHED_PROXY_EXEC */
static struct task_struct *
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index be9745d104f75..845454ec81a22 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2264,7 +2264,7 @@ static inline bool task_is_blocked(struct task_struct *p)
if (!sched_proxy_exec())
return false;
- return !!p->blocked_on;
+ return !!p->blocked_on && p->blocked_on_state != BO_RUNNABLE;
}
static inline int task_on_cpu(struct rq *rq, struct task_struct *p)
--
2.51.0.338.gd7d06c2dae-goog
Hello John,

On 9/4/2025 5:51 AM, John Stultz wrote:
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -815,6 +815,12 @@ struct kmap_ctrl {
> #endif
> };
>
> +enum blocked_on_state {
> + BO_RUNNABLE,
> + BO_BLOCKED,
> + BO_WAKING,
> +};
> +
> struct task_struct {
> #ifdef CONFIG_THREAD_INFO_IN_TASK
> /*
> @@ -1234,6 +1240,7 @@ struct task_struct {
> struct rt_mutex_waiter *pi_blocked_on;
> #endif
>
> + enum blocked_on_state blocked_on_state;

Is there any use of the "blocked_on_state" outside of CONFIG_PROXY_EXEC?
If not, should we start thinking about putting the proxy exec specific
members behind CONFIG_PROXY_EXEC to avoid bloating the task_struct?

--
Thanks and Regards,
Prateek

> struct mutex *blocked_on; /* lock we're blocked on */
> raw_spinlock_t blocked_lock;
>
On Mon, Sep 15, 2025 at 2:05 AM K Prateek Nayak <kprateek.nayak@amd.com> wrote:
> On 9/4/2025 5:51 AM, John Stultz wrote:
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -815,6 +815,12 @@ struct kmap_ctrl {
> > #endif
> > };
> >
> > +enum blocked_on_state {
> > + BO_RUNNABLE,
> > + BO_BLOCKED,
> > + BO_WAKING,
> > +};
> > +
> > struct task_struct {
> > #ifdef CONFIG_THREAD_INFO_IN_TASK
> > /*
> > @@ -1234,6 +1240,7 @@ struct task_struct {
> > struct rt_mutex_waiter *pi_blocked_on;
> > #endif
> >
> > + enum blocked_on_state blocked_on_state;
>
> Is there any use of the "blocked_on_state" outside of CONFIG_PROXY_EXEC?
> If not, should we start thinking about putting the proxy exec specific
> members behind CONFIG_PROXY_EXEC to avoid bloating the task_struct?

So yeah, your suggestion is a decent one, though it gets a little
messy in a few spots. I'm working on integrating this and propagating
it through the full series, and hopefully I can clean it up further.
There are a few spots where this and other proxy related values do get
checked, so wrapping those up so they can be ifdef'ed out will require
some extra logic.

Thanks for the review and suggestion!
-john
Hello John,

On 9/19/2025 4:37 AM, John Stultz wrote:
> On Mon, Sep 15, 2025 at 2:05 AM K Prateek Nayak <kprateek.nayak@amd.com> wrote:
>> On 9/4/2025 5:51 AM, John Stultz wrote:
>>> --- a/include/linux/sched.h
>>> +++ b/include/linux/sched.h
>>> @@ -815,6 +815,12 @@ struct kmap_ctrl {
>>> #endif
>>> };
>>>
>>> +enum blocked_on_state {
>>> + BO_RUNNABLE,
>>> + BO_BLOCKED,
>>> + BO_WAKING,
>>> +};
>>> +
>>> struct task_struct {
>>> #ifdef CONFIG_THREAD_INFO_IN_TASK
>>> /*
>>> @@ -1234,6 +1240,7 @@ struct task_struct {
>>> struct rt_mutex_waiter *pi_blocked_on;
>>> #endif
>>>
>>> + enum blocked_on_state blocked_on_state;
>>
>> Is there any use of the "blocked_on_state" outside of CONFIG_PROXY_EXEC?
>> If not, should we start thinking about putting the proxy exec specific
>> members behind CONFIG_PROXY_EXEC to avoid bloating the task_struct?
>
> So yeah, your suggestion is a decent one, though it gets a little
> messy in a few spots. I'm working on integrating this and propagating
> it through the full series, and hopefully I can clean it up further.
> There are a few spots where this and other proxy related values do get
> checked, so wrapping those up so they can be ifdef'ed out will require
> some extra logic.

Looking at proxy-exec-6.17-rc4, most of the direct references to
"p->blocked_on_state" are already under CONFIG_PROXY_EXEC. Only the
functions that modify it need a stub, plus a helper for the usage in
task_is_blocked().

--
Thanks and Regards,
Prateek
Hello John,

On 9/4/2025 5:51 AM, John Stultz wrote:
> static inline void __clear_task_blocked_on(struct task_struct *p, struct mutex *m)
> {
> + /* The task should only be clearing itself */
> + WARN_ON_ONCE(p != current);
> /* Currently we serialize blocked_on under the task::blocked_lock */
> lockdep_assert_held_once(&p->blocked_lock);
> - /*
> - * There may be cases where we re-clear already cleared
> - * blocked_on relationships, but make sure we are not
> - * clearing the relationship with a different lock.
> - */
> - WARN_ON_ONCE(m && p->blocked_on && p->blocked_on != m);
> + /* Make sure we are clearing the relationship with the right lock */
> + WARN_ON_ONCE(m && p->blocked_on != m);
> p->blocked_on = NULL;
> + p->blocked_on_state = BO_RUNNABLE;
> }
>

Maybe it is just me, but I got confused a couple of times only to
realize __clear_task_blocked_on() clears the "blocked_on" and sets
"blocked_on_state" to BO_RUNNABLE.

Can we decouple the two and only set "p->blocked_on" in
*_task_blocked_on_* and "p->blocked_on_state" in
*{set,clear,force}_blocked_on* functions so it becomes easier to
follow in the mutex path as:

    __set_task_blocked_on(p, mutex);    // blocked on mutex
    __force_blocked_on_blocked(p);      // blocked from running on CPU

    ...

    __clear_task_blocked_on(p, mutex);  // p is no longer blocked on a mutex
    __set_blocked_on_runnable(p);       // p is now runnable

> static inline void clear_task_blocked_on(struct task_struct *p, struct mutex *m)

Of the three {set,clear}_task_blocked_on() usages:

$ grep -r "\(set\|clear\)_task_blocked_on" include/
kernel/locking/mutex.c: __set_task_blocked_on(current, lock);
kernel/locking/mutex.c: __clear_task_blocked_on(current, lock);
kernel/locking/mutex.c: clear_task_blocked_on(current, lock);

two can be converted directly and perhaps clear_task_blocked_on() can be
renamed as clear_task_blocked_on_set_runnable()?

This is just me rambling on so feel free to ignore. I probably have to
train my mind enough to see __clear_task_blocked_on() not only clears
"blocked_on" but also sets task to runnable :)

[..snip..]

> @@ -6749,6 +6776,15 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
>
> WARN_ON_ONCE(owner && !owner->on_rq);
> return owner;
> +
> + /*
> + * NOTE: This logic is down here, because we need to call
> + * the functions with the mutex wait_lock and task
> + * blocked_lock released, so we have to get out of the
> + * guard() scope.
> + */

I didn't know that was possible! Neat. Since cleanup.h has a note
reading:

... the expectation is that usage of "goto" and cleanup helpers is
never mixed in the same function.

are there any concerns w.r.t. compiler versions etc. or am I just being
paranoid?

> +deactivate_donor:
> + return proxy_deactivate(rq, donor);
> }
> #else /* SCHED_PROXY_EXEC */
> static struct task_struct *

--
Thanks and Regards,
Prateek
On Mon, Sep 15, 2025 at 1:06 AM K Prateek Nayak <kprateek.nayak@amd.com> wrote:
>
> Hello John,
>
> On 9/4/2025 5:51 AM, John Stultz wrote:
> > static inline void __clear_task_blocked_on(struct task_struct *p, struct mutex *m)
> > {
> > + /* The task should only be clearing itself */
> > + WARN_ON_ONCE(p != current);
> > /* Currently we serialize blocked_on under the task::blocked_lock */
> > lockdep_assert_held_once(&p->blocked_lock);
> > - /*
> > - * There may be cases where we re-clear already cleared
> > - * blocked_on relationships, but make sure we are not
> > - * clearing the relationship with a different lock.
> > - */
> > - WARN_ON_ONCE(m && p->blocked_on && p->blocked_on != m);
> > + /* Make sure we are clearing the relationship with the right lock */
> > + WARN_ON_ONCE(m && p->blocked_on != m);
> > p->blocked_on = NULL;
> > + p->blocked_on_state = BO_RUNNABLE;
> > }
> >
>
> Maybe it is just me, but I got confused a couple of times only to
> realize __clear_task_blocked_on() clears the "blocked_on" and sets
> "blocked_on_state" to BO_RUNNABLE.
>
> Can we decouple the two and only set "p->blocked_on" in
> *_task_blocked_on_* and "p->blocked_on_state" in
> *{set,clear,force}_blocked_on* functions so it becomes easier to
> follow in the mutex path as:
>
> __set_task_blocked_on(p, mutex); // blocked on mutex
> __force_blocked_on_blocked(p); // blocked from running on CPU
>
> ...
>
> __clear_task_blocked_on(p, mutex); // p is no longer blocked on a mutex
> __set_blocked_on_runnable(p); // p is now runnable

Hrm. So I guess I was thinking of
set_task_blocked_on()/clear_task_blocked_on() as the main enter/exit
states, with set_blocked_on_waking(), set_blocked_on_runnable() as
transition states within.

But, the various force_blocked_on_<state>() cases do add more
complexity, so maybe handling it completely separately would be more
intuitive? I dunno.

> > static inline void clear_task_blocked_on(struct task_struct *p, struct mutex *m)
>
> Of the three {set,clear}_task_blocked_on() usages:
>
> $ grep -r "\(set\|clear\)_task_blocked_on" include/
> kernel/locking/mutex.c: __set_task_blocked_on(current, lock);
> kernel/locking/mutex.c: __clear_task_blocked_on(current, lock);
> kernel/locking/mutex.c: clear_task_blocked_on(current, lock);
>
> two can be converted directly and perhaps clear_task_blocked_on() can be
> renamed as clear_task_blocked_on_set_runnable()?
>
> This is just me rambling on so feel free to ignore. I probably have to
> train my mind enough to see __clear_task_blocked_on() not only clears
> "blocked_on" but also sets task to runnable :)

Yeah, the case where we don't already hold the lock and want to do
both gets more complex in that case.

Hrm. Maybe just the way the functions are organized in the header make
it seem like we're managing two separate bits of state, where really
they are ordered.
I'll try to re-arrange that introducing the
set_task_blocked_on/clear_task_blocked_on first, then the transition
set_blocked_on_<state>() helpers after?
Maybe that and some comments will make that clearer?

> > @@ -6749,6 +6776,15 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
> >
> > WARN_ON_ONCE(owner && !owner->on_rq);
> > return owner;
> > +
> > + /*
> > + * NOTE: This logic is down here, because we need to call
> > + * the functions with the mutex wait_lock and task
> > + * blocked_lock released, so we have to get out of the
> > + * guard() scope.
> > + */
>
> I didn't know that was possible! Neat. Since cleanup.h has a note
> reading:
>
> ... the expectation is that usage of "goto" and cleanup helpers is
> never mixed in the same function.
>
> are there any concerns w.r.t. compiler versions etc. or am I just being
> paranoid?

Hrrrrmmmm. I hadn't seen that detail. :/ I guess I was just lucky
it worked with my toolchain.

Oof. That may require reworking all this logic back to explicit
lock/unlock calls, away from the guard() usage.

Let me think on if there's a better way.

Thanks so much again for pointing this out!
-john
Hello John,

On 9/19/2025 4:27 AM, John Stultz wrote:
>> Of the three {set,clear}_task_blocked_on() usages:
>>
>> $ grep -r "\(set\|clear\)_task_blocked_on" include/
>> kernel/locking/mutex.c: __set_task_blocked_on(current, lock);
>> kernel/locking/mutex.c: __clear_task_blocked_on(current, lock);
>> kernel/locking/mutex.c: clear_task_blocked_on(current, lock);
>>
>> two can be converted directly and perhaps clear_task_blocked_on() can be
>> renamed as clear_task_blocked_on_set_runnable()?
>>
>> This is just me rambling on so feel free to ignore. I probably have to
>> train my mind enough to see __clear_task_blocked_on() not only clears
>> "blocked_on" but also sets task to runnable :)
>
> Yeah, the case where we don't already hold the lock and want to do
> both gets more complex in that case.
>
> Hrm. Maybe just the way the functions are organized in the header make
> it seem like we're managing two separate bits of state, where really
> they are ordered.
> I'll try to re-arrange that introducing the
> set_task_blocked_on/clear_task_blocked_on first, then the transition
> set_blocked_on_<state>() helpers after?
> Maybe that and some comments will make that clearer?

Again that was me rambling. Even a small comment above
clear_task_blocked_on() would also be sufficient if the whole rework
turns out to be more extensive.

>
>>> @@ -6749,6 +6776,15 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
>>>
>>> WARN_ON_ONCE(owner && !owner->on_rq);
>>> return owner;
>>> +
>>> + /*
>>> + * NOTE: This logic is down here, because we need to call
>>> + * the functions with the mutex wait_lock and task
>>> + * blocked_lock released, so we have to get out of the
>>> + * guard() scope.
>>> + */
>>
>> I didn't know that was possible! Neat. Since cleanup.h has a note
>> reading:
>>
>> ... the expectation is that usage of "goto" and cleanup helpers is
>> never mixed in the same function.
>>
>> are there any concerns w.r.t. compiler versions etc. or am I just being
>> paranoid?
>
> Hrrrrmmmm. I hadn't seen that detail. :/ I guess I was just lucky
> it worked with my toolchain.

I have been too. Maybe it is okay to use a goto if folks know what
they are doing ¯\_(ツ)_/¯

Another idea is to have:

    bool deactivate_donor = false;

    for (p = donor; task_is_blocked(p); p = owner) {
        guard(raw_spinlock)(...);
        ...
        if (<condition>) {
            deactivate_donor = true;
            break;
        }
        ...
    }
    if (deactivate_donor)
        return proxy_deactivate(rq, donor);

Can that work?

>
> Oof. That may require reworking all this logic back to explicit
> lock/unlock calls, away from the guard() usage.
>
> Let me think on if there's a better way.
>
> Thanks so much again for pointing this out!
> -john

--
Thanks and Regards,
Prateek
On Thu, Sep 18, 2025 at 8:28 PM K Prateek Nayak <kprateek.nayak@amd.com> wrote:
> On 9/19/2025 4:27 AM, John Stultz wrote:
> >> I didn't know that was possible! Neat. Since cleanup.h has a note
> >> reading:
> >>
> >> ... the expectation is that usage of "goto" and cleanup helpers is
> >> never mixed in the same function.
> >>
> >> are there any concerns w.r.t. compiler versions etc. or am I just being
> >> paranoid?
> >
> > Hrrrrmmmm. I hadn't seen that detail. :/ I guess I was just lucky
> > it worked with my toolchain.
>
> I have been too. Maybe it is okay to use a goto if folks know what
> they are doing ¯\_(ツ)_/¯
>
> Another idea is to have:
>
> bool deactivate_donor = false;
>
> for (p = donor; task_is_blocked(p); p = owner) {
> guard(raw_spinlock)(...);
> ...
> if (<condition>) {
> deactivate_donor = true;
> break;
> }
> ...
> }
> if (deactivate_donor)
> return proxy_deactivate(rq, donor);
>
> Can that work?

Yeah, I've reworked the logic to switch() on an action enum, which
will let us do something similar without gotos.

thanks
-john
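
[Editor's note: purely as an illustration of the switch()-on-an-action-enum
idea John mentions above; the enum names and structure here are guesses,
not the actual rework he posted:]

    enum proxy_action {
        PROXY_PICK_OWNER,       /* hypothetical names */
        PROXY_DEACTIVATE_DONOR,
        PROXY_RESCHED_IDLE,
    };

    enum proxy_action action = PROXY_PICK_OWNER;

    for (p = donor; task_is_blocked(p); p = owner) {
        guard(raw_spinlock)(&mutex->wait_lock);
        ...
        if (!READ_ONCE(owner->on_rq) || owner->se.sched_delayed) {
            /* record the decision, act only after leaving guard() scope */
            action = PROXY_DEACTIVATE_DONOR;
            break;
        }
        ...
    }

    /* guard() has dropped the locks by the time we get here */
    switch (action) {
    case PROXY_DEACTIVATE_DONOR:
        return proxy_deactivate(rq, donor);
    case PROXY_RESCHED_IDLE:
        return proxy_resched_idle(rq);
    default:
        break;
    }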