[RFC][PATCH v13 4/7] sched: Fix psi_dequeue for Proxy Execution

John Stultz posted 7 patches 2 weeks, 4 days ago
[RFC][PATCH v13 4/7] sched: Fix psi_dequeue for Proxy Execution
Posted by John Stultz 2 weeks, 4 days ago
Currently, if the sleep flag is set, psi_dequeue() doesn't
change any of the psi_flags.

This is because psi_switch_task() will clear TSK_ONCPU as well
as other potential flags (TSK_RUNNING), and the assumption is
that a voluntary sleep always consists of a task being dequeued
followed shortly there after with a psi_sched_switch() call.

Proxy Execution changes this expectation, as mutex-blocked tasks
that would normally sleep stay on the runqueue. In the case where
the mutex owning task goes to sleep, we will then deactivate the
blocked task as well.

However, in that case, the mutex-blocked task will have had
its TSK_ONCPU cleared when it was switched off the cpu, but it
will stay TSK_RUNNING. Then when we later dequeue it becaues of
a sleeping-owner, as it is sleeping psi_dequeue() won't change
any state (leaving it TSK_RUNNING), as it incorrectly expects a
psi_task_switch() call to immediately follow.

Later on when it get re enqueued, and psi_flags are set for
TSK_RUNNING, we hit an error as the task is already TSK_RUNNING:
  psi: inconsistent task state!

To resolve this, extend the logic in psi_dequeue() so that
if the sleep flag is set, we also check if psi_flags have
TSK_ONCPU set (meaning the psi_task_switch is imminient) before
we do the shortcut return.

If TSK_ONCPU is not set, that means we've already swtiched away,
and this psi_dequeue call needs to clear the flags.

Cc: Joel Fernandes <joelaf@google.com>
Cc: Qais Yousef <qyousef@layalina.io>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Zimuzo Ezeozue <zezeozue@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Will Deacon <will@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Metin Kaya <Metin.Kaya@arm.com>
Cc: Xuewen Yan <xuewen.yan94@gmail.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: kernel-team@android.com
Signed-off-by: John Stultz <jstultz@google.com>
---
v13:
* Reworked for collision
---
 kernel/sched/stats.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/stats.h b/kernel/sched/stats.h
index 8ee0add5a48a..c313fe76a772 100644
--- a/kernel/sched/stats.h
+++ b/kernel/sched/stats.h
@@ -176,8 +176,12 @@ static inline void psi_dequeue(struct task_struct *p, int flags)
 	 * avoid walking all ancestors twice, psi_task_switch() handles
 	 * TSK_RUNNING and TSK_IOWAIT for us when it moves TSK_ONCPU.
 	 * Do nothing here.
+	 * In the SCHED_PROXY_EXECUTION case we may do sleeping
+	 * dequeues that are not followed by a task switch, so check
+	 * TSK_ONCPU is set to ensure the task switch is imminent.
+	 * Otherwise clear the flags as usual.
 	 */
-	if (flags & DEQUEUE_SLEEP)
+	if ((flags & DEQUEUE_SLEEP) && (p->psi_flags & TSK_ONCPU))
 		return;
 
 	/*
-- 
2.47.0.199.ga7371fff76-goog