[PATCH 1/3] sched/core: Dequeue PSI signals for blocked tasks that are delayed

K Prateek Nayak posted 3 patches 1 month, 2 weeks ago
Posted by K Prateek Nayak 1 month, 2 weeks ago
From: Peter Zijlstra <peterz@infradead.org>

psi_dequeue() for a blocked task expects psi_sched_switch() to clear
the TSK_.*RUNNING PSI flags and set the TSK_IOWAIT flag. However,
psi_sched_switch() uses "!task_on_rq_queued(prev)" to detect whether the
task is blocked or still runnable, which is no longer reliable with
DELAY_DEQUEUE since a blocking task can be left queued on the runqueue.

This can lead to PSI splats similar to:

    psi: inconsistent task state! task=... cpu=... psi_flags=4 clear=0 set=4

when the task is requeued since the TSK_RUNNING flag was not cleared
when the task was blocked.

Explicitly communicate that the task was blocked to psi_sched_switch()
even if it was delayed and is still on the runqueue.
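
For illustration only (not part of the patch), the sequence above can be
sketched as a small user-space model. Every model_* name is hypothetical,
the flag values are assumed so that TSK_RUNNING matches the "4" in the
splat, and the helpers only approximate the PSI bookkeeping rather than
reproduce the kernel code:

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed flag values, chosen so TSK_RUNNING matches "psi_flags=4". */
#define TSK_IOWAIT  (1u << 0)
#define TSK_RUNNING (1u << 2)

/* Hypothetical stand-in for the few task fields this sketch needs. */
struct model_task {
	bool on_rq;             /* still queued on the runqueue? */
	bool in_iowait;
	unsigned int psi_flags;
};

/* Apply clear/set; return false where the kernel would print the splat. */
static bool model_psi_flags_change(struct model_task *t,
				   unsigned int clear, unsigned int set)
{
	bool consistent = !(t->psi_flags & set) &&
			  (t->psi_flags & clear) == clear;

	t->psi_flags &= ~clear;
	t->psi_flags |= set;
	return consistent;
}

/* Simplified switch-out: only a sleeping task drops TSK_RUNNING. */
static void model_psi_sched_switch(struct model_task *prev, bool sleep)
{
	unsigned int clear = 0, set = 0;

	if (sleep) {
		clear |= TSK_RUNNING;
		if (prev->in_iowait)
			set |= TSK_IOWAIT;
	}
	(void)model_psi_flags_change(prev, clear, set);
}

/* Simplified requeue on wakeup: the task becomes runnable again. */
static bool model_psi_requeue(struct model_task *t)
{
	unsigned int clear = t->in_iowait ? TSK_IOWAIT : 0;

	return model_psi_flags_change(t, clear, TSK_RUNNING);
}

/* Block a task that DELAY_DEQUEUE leaves queued, then requeue it. */
bool model_block_then_requeue(bool use_explicit_block)
{
	struct model_task prev = {
		.on_rq = true, .in_iowait = false, .psi_flags = TSK_RUNNING,
	};
	bool block = true;      /* __schedule() knows the task blocked */
	bool sleep;

	assert(prev.on_rq);     /* delayed dequeue: still on the runqueue */
	sleep = use_explicit_block ? block : !prev.on_rq;

	model_psi_sched_switch(&prev, sleep);
	return model_psi_requeue(&prev);
}
```

In this model, model_block_then_requeue(false) reports an inconsistency
with psi_flags=4 clear=0 set=4, while model_block_then_requeue(true) stays
consistent because TSK_RUNNING was cleared at switch-out.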

  [ prateek: Broke off the relevant part from [1], commit message ]

Link: https://lore.kernel.org/lkml/20241004123506.GR18071@noisy.programming.kicks-ass.net/ [1]
Fixes: 152e11f6df29 ("sched/fair: Implement delayed dequeue")
Closes: https://lore.kernel.org/lkml/20240830123458.3557-1-spasswolf@web.de/
Closes: https://lore.kernel.org/all/cd67fbcd-d659-4822-bb90-7e8fbb40a856@molgen.mpg.de/
Tested-by: Johannes Weiner <hannes@cmpxchg.org>
Not-yet-signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
 kernel/sched/core.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 43e453ab7e20..88cbfc671fb6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6537,6 +6537,7 @@ static void __sched notrace __schedule(int sched_mode)
 	 * as a preemption by schedule_debug() and RCU.
 	 */
 	bool preempt = sched_mode > SM_NONE;
+	bool block = false;
 	unsigned long *switch_count;
 	unsigned long prev_state;
 	struct rq_flags rf;
@@ -6622,6 +6623,7 @@ static void __sched notrace __schedule(int sched_mode)
 			 * After this, schedule() must not care about p->state any more.
 			 */
 			block_task(rq, prev, flags);
+			block = true;
 		}
 		switch_count = &prev->nvcsw;
 	}
@@ -6667,7 +6669,7 @@ static void __sched notrace __schedule(int sched_mode)
 
 		migrate_disable_switch(rq, prev);
 		psi_account_irqtime(rq, prev, next);
-		psi_sched_switch(prev, next, !task_on_rq_queued(prev));
+		psi_sched_switch(prev, next, block);
 
 		trace_sched_switch(preempt, prev, next, prev_state);
 
-- 
2.34.1
Re: [PATCH 1/3] sched/core: Dequeue PSI signals for blocked tasks that are delayed
Posted by Johannes Weiner 1 month, 2 weeks ago
On Thu, Oct 10, 2024 at 08:28:36AM +0000, K Prateek Nayak wrote:
> From: Peter Zijlstra <peterz@infradead.org>
> 
> psi_dequeue() for a blocked task expects psi_sched_switch() to clear
> the TSK_.*RUNNING PSI flags and set the TSK_IOWAIT flag. However,
> psi_sched_switch() uses "!task_on_rq_queued(prev)" to detect whether the
> task is blocked or still runnable, which is no longer reliable with
> DELAY_DEQUEUE since a blocking task can be left queued on the runqueue.
> 
> This can lead to PSI splats similar to:
> 
>     psi: inconsistent task state! task=... cpu=... psi_flags=4 clear=0 set=4
> 
> when the task is requeued since the TSK_RUNNING flag was not cleared
> when the task was blocked.
> 
> Explicitly communicate that the task was blocked to psi_sched_switch()
> even if it was delayed and is still on the runqueue.
> 
>   [ prateek: Broke off the relevant part from [1], commit message ]
> 
> Link: https://lore.kernel.org/lkml/20241004123506.GR18071@noisy.programming.kicks-ass.net/ [1]
> Fixes: 152e11f6df29 ("sched/fair: Implement delayed dequeue")
> Closes: https://lore.kernel.org/lkml/20240830123458.3557-1-spasswolf@web.de/
> Closes: https://lore.kernel.org/all/cd67fbcd-d659-4822-bb90-7e8fbb40a856@molgen.mpg.de/
> Tested-by: Johannes Weiner <hannes@cmpxchg.org>
> Not-yet-signed-off-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>