[PATCH 2/3] sched/core: Add ENQUEUE_WAKEUP flag alongside ENQUEUE_DELAYED

K Prateek Nayak posted 3 patches 1 month, 2 weeks ago
Posted by K Prateek Nayak 1 month, 2 weeks ago
With the fix to the dequeuing of PSI signals for delayed tasks in place, a
new inconsistent PSI task state splat was discovered during boot, similar to:

    psi: inconsistent task state! task=... cpu=... psi_flags=5 clear=4 set=1

Tracking the PSI changes along with the task's state revealed the following
series of events:

    psi_task_switch: psi_flags=4 clear=4 set=1 # sched_delayed is set to 1
    psi_enqueue:     psi_flags=1 clear=0 set=4 # requeue of delayed entity via ENQUEUE_DELAYED
    psi_task_switch: psi_flags=5 clear=4 set=1 # task is blocked again but 1 is already set
    psi: inconsistent task state! task=... cpu=... psi_flags=5 clear=4 set=1

The TSK_IOWAIT flag was never cleared on requeue since psi_enqueue() only
clears it on a "wakeup" which, in terms of enqueue flags, is defined as:

    (flags & ENQUEUE_WAKEUP) && !(flags & ENQUEUE_MIGRATED)

Add ENQUEUE_WAKEUP alongside ENQUEUE_DELAYED for requeue through
ttwu_runnable(). psi_enqueue() is the only observer of this flag in the
requeue path and it pairs with the DEQUEUE_SLEEP in block_task().
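
The sequence above can be replayed in a small userspace model. This is only a
sketch under stated assumptions, not kernel code: the ENQUEUE_* values are
placeholders chosen for illustration, the model_*() and replay_trace() helpers
are hypothetical names, and psi_enqueue() is reduced to the TSK_IOWAIT handling
relevant here. Only TSK_IOWAIT=1 and TSK_RUNNING=4 are taken from the trace.

```c
#include <stdbool.h>

/* PSI task state bits, matching the values seen in the trace above. */
#define TSK_IOWAIT   1u
#define TSK_RUNNING  4u

/* Placeholder enqueue flag values for this model only (assumption). */
#define ENQUEUE_WAKEUP    0x01
#define ENQUEUE_NOCLOCK   0x08
#define ENQUEUE_MIGRATED  0x40
#define ENQUEUE_DELAYED   0x200

/* The "wakeup" condition quoted in the commit message. */
static bool is_wakeup(int flags)
{
	return (flags & ENQUEUE_WAKEUP) && !(flags & ENQUEUE_MIGRATED);
}

/*
 * Apply a clear/set pair to *psi_flags. Returns true when the change is
 * inconsistent: a bit to be set is already set, or a bit to be cleared
 * is not currently set.
 */
static bool model_flags_change(unsigned int *psi_flags,
			       unsigned int clear, unsigned int set)
{
	bool bad = (*psi_flags & set) || ((*psi_flags & clear) != clear);

	*psi_flags &= ~clear;
	*psi_flags |= set;
	return bad;
}

/* Simplified enqueue: TSK_IOWAIT is only dropped on a wakeup. */
static bool model_psi_enqueue(unsigned int *psi_flags, int enq_flags)
{
	unsigned int clear = 0, set = TSK_RUNNING;

	if (is_wakeup(enq_flags) && (*psi_flags & TSK_IOWAIT))
		clear |= TSK_IOWAIT;
	return model_flags_change(psi_flags, clear, set);
}

/*
 * Replay: block in iowait, requeue the delayed entity with requeue_flags,
 * then block in iowait again. Returns true if any step was inconsistent.
 */
static bool replay_trace(int requeue_flags)
{
	unsigned int psi_flags = TSK_RUNNING;
	bool bad = false;

	bad |= model_flags_change(&psi_flags, TSK_RUNNING, TSK_IOWAIT);
	bad |= model_psi_enqueue(&psi_flags, requeue_flags);
	bad |= model_flags_change(&psi_flags, TSK_RUNNING, TSK_IOWAIT);
	return bad;
}
```

In this model, replay_trace(ENQUEUE_NOCLOCK | ENQUEUE_DELAYED) reports an
inconsistency at the third step, while adding ENQUEUE_WAKEUP to the requeue
flags lets the enqueue drop TSK_IOWAIT and the replay stays consistent.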

Fixes: 152e11f6df29 ("sched/fair: Implement delayed dequeue")
Closes: https://lore.kernel.org/lkml/20240830123458.3557-1-spasswolf@web.de/
Closes: https://lore.kernel.org/all/cd67fbcd-d659-4822-bb90-7e8fbb40a856@molgen.mpg.de/
Link: https://lore.kernel.org/lkml/f82def74-a64a-4a05-c8d4-4eeb3e03d0c0@amd.com/
Tested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
 kernel/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 88cbfc671fb6..52be38021ebb 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3733,7 +3733,7 @@ static int ttwu_runnable(struct task_struct *p, int wake_flags)
 	if (task_on_rq_queued(p)) {
 		update_rq_clock(rq);
 		if (p->se.sched_delayed)
-			enqueue_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_DELAYED);
+			enqueue_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_WAKEUP | ENQUEUE_DELAYED);
 		if (!task_on_cpu(rq, p)) {
 			/*
 			 * When on_rq && !on_cpu the task is preempted, see if
-- 
2.34.1