[PATCH v2 6/6] sched: warn for high latency with TIF_NEED_RESCHED_LAZY

Ankur Arora posted 6 patches 2 weeks, 3 days ago
Posted by Ankur Arora 2 weeks, 3 days ago
Add support for warning if the TIF_NEED_RESCHED_LAZY bit is set
without rescheduling for more than the latency_warn_ms period.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
 kernel/sched/core.c  | 3 ++-
 kernel/sched/debug.c | 7 +++++--
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5c47d70f4204..077ea42a17f1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5579,7 +5579,8 @@ static u64 cpu_resched_latency(struct rq *rq)
 	if (sysctl_resched_latency_warn_once && warned_once)
 		return 0;
 
-	if (!need_resched() || !latency_warn_ms)
+	if ((!need_resched() && !tif_test_bit(TIF_NEED_RESCHED_LAZY)) ||
+	    !latency_warn_ms)
 		return 0;
 
 	if (system_state == SYSTEM_BOOTING)
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index a48b2a701ec2..6c1a5305a1b3 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -1293,9 +1293,12 @@ void proc_sched_set_task(struct task_struct *p)
 void resched_latency_warn(int cpu, u64 latency)
 {
 	static DEFINE_RATELIMIT_STATE(latency_check_ratelimit, 60 * 60 * HZ, 1);
+	char *nr;
+
+	nr = tif_need_resched() ? "need_resched" : "need_resched_lazy";
 
 	WARN(__ratelimit(&latency_check_ratelimit),
-	     "sched: CPU %d need_resched set for > %llu ns (%d ticks) "
+	     "sched: CPU %d %s set for > %llu ns (%d ticks) "
 	     "without schedule\n",
-	     cpu, latency, cpu_rq(cpu)->ticks_without_resched);
+	     cpu, nr, latency, cpu_rq(cpu)->ticks_without_resched);
 }
-- 
2.43.5
Re: [PATCH v2 6/6] sched: warn for high latency with TIF_NEED_RESCHED_LAZY
Posted by Sebastian Andrzej Siewior 1 week, 2 days ago
On 2024-11-06 12:17:58 [-0800], Ankur Arora wrote:
> Add support for warning if the TIF_NEED_RESCHED_LAZY bit is set
> without rescheduling for more than the latency_warn_ms period.

You fail to explain _why_ it is required to also check that
TIF_NEED_RESCHED_LAZY is not set.

The problem with NEED_RESCHED set but no scheduling in 100ms is a long
preempt-off or IRQ-off region.
The problem with NEED_RESCHED_LAZY set but no scheduling in 100ms is a
missing timer tick in that time. The previously mentioned things
might also have happened.
And the code acting on NEED_RESCHED_LAZY is invoked before this check
is.

Sebastian
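
For reference, the change to the early-return test in
cpu_resched_latency() can be modelled in user space. This is only a
sketch under stated assumptions: struct flags_model and the three
helper functions are hypothetical names, not kernel code, and the two
booleans merely stand in for TIF_NEED_RESCHED and
TIF_NEED_RESCHED_LAZY. It covers the flag test only, not the tick
accounting or the ordering concern raised above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the thread flags; not a kernel type. */
struct flags_model {
	bool need_resched;      /* models TIF_NEED_RESCHED */
	bool need_resched_lazy; /* models TIF_NEED_RESCHED_LAZY */
};

/* Pre-patch test: only NEED_RESCHED keeps the latency check going. */
bool check_continues_old(struct flags_model f, int latency_warn_ms)
{
	if (!f.need_resched || !latency_warn_ms)
		return false;
	return true;
}

/* Post-patch test: either flag keeps the latency check going. */
bool check_continues_new(struct flags_model f, int latency_warn_ms)
{
	if ((!f.need_resched && !f.need_resched_lazy) || !latency_warn_ms)
		return false;
	return true;
}

/* Mirrors how the patched warning picks the name it prints. */
const char *flag_name(struct flags_model f)
{
	return f.need_resched ? "need_resched" : "need_resched_lazy";
}
```

With only the lazy flag set and a non-zero threshold, the old test
returns early while the new one proceeds to the latency accounting,
and the warning text would name need_resched_lazy.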