Donor Migration for Proxy Execution (v24)

[PATCH v24 08/11] sched: Avoid donor->sched_class->yield_task() null traversal

Posted by John Stultz 2 months, 2 weeks ago

With proxy-exec once we do return migration from ttwu(), if a
task is proxying for a waiting donor, and the donor is woken up,
we switch the rq->donor to point to idle briefly until we can
re-enter __schedule().

However, if a task that was acting as a proxy calls into
yield() right after the donor is switched to idle, it may
trip a null  pointer traversal, because the idle task doesn't
have a yield_task() pointer.

So add a conditional to ensure we don't try to call the
yield_task() pointer in that case.

This was only recently found because prior to commit
127b90315ca07 ("sched/proxy: Yield the donor task")
do_sched_yield() incorrectly called
current->sched_class_yield_task() instead of using
rq->donor.

Signed-off-by: John Stultz <jstultz@google.com>
---
Cc: Joel Fernandes <joelagnelf@nvidia.com>
Cc: Qais Yousef <qyousef@layalina.io>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Zimuzo Ezeozue <zezeozue@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Will Deacon <will@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Metin Kaya <Metin.Kaya@arm.com>
Cc: Xuewen Yan <xuewen.yan94@gmail.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: kuyo chang <kuyo.chang@mediatek.com>
Cc: hupu <hupu.gm@gmail.com>
Cc: kernel-team@android.com
---
 kernel/sched/syscalls.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
index bf360a6fbb800..4b2b81437b03b 100644
--- a/kernel/sched/syscalls.c
+++ b/kernel/sched/syscalls.c
@@ -1351,7 +1351,8 @@ static void do_sched_yield(void)
 	rq = this_rq_lock_irq(&rf);
 
 	schedstat_inc(rq->yld_count);
-	rq->donor->sched_class->yield_task(rq);
+	if (rq->donor->sched_class->yield_task)
+		rq->donor->sched_class->yield_task(rq);
 
 	preempt_disable();
 	rq_unlock_irq(rq, &rf);
-- 
2.52.0.487.g5c8c507ade-goog

Re: [PATCH v24 08/11] sched: Avoid donor->sched_class->yield_task() null traversal

Posted by K Prateek Nayak 1 month, 1 week ago

Hello John,

On 11/25/2025 4:01 AM, John Stultz wrote:
> With proxy-exec once we do return migration from ttwu(), if a
> task is proxying for a waiting donor, and the donor is woken up,
> we switch the rq->donor to point to idle briefly until we can
> re-enter __schedule().
> 
> However, if a task that was acting as a proxy calls into
> yield() right after the donor is switched to idle, it may
> trip a null  pointer traversal, because the idle task doesn't
> have a yield_task() pointer.

I thought that was a transient state that should not be observed by the
running task.

Since NEED_RESCHED is retained, we'll just go though another pass of
__schedule() loop and the task should not observe rq->donor as idle in
do_sched_yield() as only the current task can yield on the local CPU.

Do we have a splat that suggests this happens?

-- 
Thanks and Regards,
Prateek

Re: [PATCH v24 08/11] sched: Avoid donor->sched_class->yield_task() null traversal

Posted by K Prateek Nayak 1 month, 1 week ago

Hello John,

On 12/30/2025 11:31 AM, K Prateek Nayak wrote:
> On 11/25/2025 4:01 AM, John Stultz wrote:
>> With proxy-exec once we do return migration from ttwu(), if a
>> task is proxying for a waiting donor, and the donor is woken up,
>> we switch the rq->donor to point to idle briefly until we can
>> re-enter __schedule().
>>
>> However, if a task that was acting as a proxy calls into
>> yield() right after the donor is switched to idle, it may
>> trip a null  pointer traversal, because the idle task doesn't
>> have a yield_task() pointer.
> 
> I thought that was a transient state that should not be observed by the
> running task.
> 
> Since NEED_RESCHED is retained, we'll just go though another pass of
> __schedule() loop and the task should not observe rq->donor as idle in
> do_sched_yield() as only the current task can yield on the local CPU.
> 
> Do we have a splat that suggests this happens?

So I think I found my answer (at least one of them):

  find_proxy_task()
    /* case DEACTIVATE_DONOR */
    proxy_deactivate(rq, donor)
      proxy_resched_idle(rq); /* Switched donor to rq->idle. */
      try_to_block_task(rq, donor, &state, true)
        if (signal_pending_state(task_state, donor))
          WRITE_ONCE(p->__state, TASK_RUNNING)
          return false; /* Blocking fails. */

    /* If deactivate fails, force return */
    p = donor;
    return p

  next = p; /* Donor is rq->idle. */


This should be illegal and I think we should either force a "pick_again"
if proxy_deactivate() fails (can it get stuck on an infinite loop?) or
we should fix the donor relation before running "p".

We can also push the proxy_resched_idle() into try_to_block_task() and
only do it once we are past all the early returns and if
task_current_donor(rq, p).

... and as I write this I realize we can have this via
proxy_needs_return() so I guess we need this patch after all and
proxy_needs_return() should do resched_curr() for that case too so we
can re-evaluate the donor context on the CPU where we are stealing away
the donor from.

Quick question: Can we avoid that proxy_resched_idle() in
proxy_needs_return() by retaining PROXY_WAKING and letting
proxy_force_return() handle donor's migration too?

  (Seems to survive the same set of challenges I've been putting my
   suggestions through)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 700fd08b2392..e01a61f1955e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3720,6 +3720,12 @@ static inline bool proxy_needs_return(struct rq *rq, struct task_struct *p)
 	if (!sched_proxy_exec())
 		return false;
 
+	/* For current, and donor, let proxy_force_return() handle return. */
+	if (task_current(rq, p) || task_current_donor(rq, p)) {
+		resched_curr(rq);
+		return false;
+	}
+
 	guard(raw_spinlock)(&p->blocked_lock);
 
 	/* If task isn't PROXY_WAKING, we don't need to do return migration */
@@ -3728,20 +3734,12 @@ static inline bool proxy_needs_return(struct rq *rq, struct task_struct *p)
 
 	__clear_task_blocked_on(p, PROXY_WAKING);
 
-	/* If already current, don't need to return migrate */
-	if (task_current(rq, p))
-		return false;
-
 	/* If wake_cpu is targeting this cpu, don't bother return migrating */
 	if (p->wake_cpu == cpu_of(rq)) {
 		resched_curr(rq);
 		return false;
 	}
 
-	/* If we're return migrating the rq->donor, switch it out for idle */
-	if (task_current_donor(rq, p))
-		proxy_resched_idle(rq);
-
 	/* (ab)Use DEQUEUE_SPECIAL to ensure task is always blocked here. */
 	block_task(rq, p, DEQUEUE_NOCLOCK | DEQUEUE_SPECIAL);
 	return true;
---

I think you'll have a note somewhere that says why this is an absolutely
terrible idea :-)

-- 
Thanks and Regards,
Prateek

  “Program testing can be used to show the presence of bugs,
   but never to show their absence!”  ― Edsger W. Dijkstra

Re: [PATCH v24 08/11] sched: Avoid donor->sched_class->yield_task() null traversal

Posted by K Prateek Nayak 1 month, 1 week ago

Hello John,

On 12/30/2025 3:22 PM, K Prateek Nayak wrote:
> So I think I found my answer (at least one of them):
> 
>   find_proxy_task()
>     /* case DEACTIVATE_DONOR */
>     proxy_deactivate(rq, donor)
>       proxy_resched_idle(rq); /* Switched donor to rq->idle. */
>       try_to_block_task(rq, donor, &state, true)
>         if (signal_pending_state(task_state, donor))
>           WRITE_ONCE(p->__state, TASK_RUNNING)
>           return false; /* Blocking fails. */
> 
>     /* If deactivate fails, force return */
>     p = donor;

So I was looking at my tree with some modifications on top and that
above scenario cannot happen since "force return" just falls through to
to NEEDS_RETURN and that'll migrate the task away and return NULL
forcing a re-pick and donor will be back to normal.

>     return p
> 
>   next = p; /* Donor is rq->idle. */
> 
> 
> This should be illegal and I think we should either force a "pick_again"
> if proxy_deactivate() fails (can it get stuck on an infinite loop?) or
> we should fix the donor relation before running "p".
> 
> We can also push the proxy_resched_idle() into try_to_block_task() and
> only do it once we are past all the early returns and if
> task_current_donor(rq, p).
> 
> ... and as I write this I realize we can have this via
> proxy_needs_return() so I guess we need this patch after all and
> proxy_needs_return() should do resched_curr() for that case too so we
> can re-evaluate the donor context on the CPU where we are stealing away
> the donor from.

But this can definitely happen. FWIW I think we can just do:

static inline void proxy_reset_donor(struct rq *rq)
{
	put_prev_set_next_task(rq, rq->donor, rq->curr);
	rq_set_donor(rq, rq->curr);
	resched_curr(rq);
}

... instead of proxy_resched_idle() and we just account the balance time
between stealing the donor and the resched to the "rq->curr" instead of
accounting it to the idle task to stay fair.

-- 
Thanks and Regards,
Prateek