resched_latency_warn() now also warns if TIF_NEED_RESCHED_LAZY is set
without rescheduling for more than the latency_warn_ms period.
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Ziljstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
kernel/sched/core.c | 2 +-
kernel/sched/debug.c | 7 +++++--
2 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 694bfcf153cb..1229766b704e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5571,7 +5571,7 @@ static u64 cpu_resched_latency(struct rq *rq)
if (sysctl_resched_latency_warn_once && warned_once)
return 0;
- if (!need_resched() || !latency_warn_ms)
+ if ((!need_resched() && !tif_need_resched_lazy()) || !latency_warn_ms)
return 0;
if (system_state == SYSTEM_BOOTING)
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 9abcc6ead11b..f0d551ba64bb 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -1293,9 +1293,12 @@ void proc_sched_set_task(struct task_struct *p)
void resched_latency_warn(int cpu, u64 latency)
{
static DEFINE_RATELIMIT_STATE(latency_check_ratelimit, 60 * 60 * HZ, 1);
+ char *nr;
+
+ nr = tif_need_resched() ? "need_resched" : "need_resched_lazy";
WARN(__ratelimit(&latency_check_ratelimit),
- "sched: CPU %d need_resched set for > %llu ns (%d ticks) "
+ "sched: CPU %d %s set for > %llu ns (%d ticks) "
"without schedule\n",
- cpu, latency, cpu_rq(cpu)->ticks_without_resched);
+ cpu, nr, latency, cpu_rq(cpu)->ticks_without_resched);
}
--
2.43.5
On 10/9/24 22:24, Ankur Arora wrote: > resched_latency_warn() now also warns if TIF_NEED_RESCHED_LAZY is set > without rescheduling for more than the latency_warn_ms period. > I am bit confused here. Why do we need to warn if LAZY is set for a long time? If lazy set, the subsequent tick, it would be set to upgraded to NEED_RESCHED. Since the value of latency_warn_ms=100ms, that means even on system with HZ=100, that means 10 ticks before that warning would be printed no? IIUC, the changelog c006fac556e40 ("sched: Warn on long periods of pending need_resched") has the concern of need_resched set but if it is non-preemptible kernel it would spend a lot of time in kernel mode. In that case print a warning. If someone enables Lazy, that means it is preemptible and probably this whole notion of resched_latency_warn doesn't apply to lazy. Please correct me if i am not understanding this correctly. > Cc: Ingo Molnar <mingo@redhat.com> > Cc: Peter Ziljstra <peterz@infradead.org> > Cc: Juri Lelli <juri.lelli@redhat.com> > Cc: Vincent Guittot <vincent.guittot@linaro.org> > Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com> > --- > kernel/sched/core.c | 2 +- > kernel/sched/debug.c | 7 +++++-- > 2 files changed, 6 insertions(+), 3 deletions(-) > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index 694bfcf153cb..1229766b704e 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -5571,7 +5571,7 @@ static u64 cpu_resched_latency(struct rq *rq) > if (sysctl_resched_latency_warn_once && warned_once) > return 0; > > - if (!need_resched() || !latency_warn_ms) > + if ((!need_resched() && !tif_need_resched_lazy()) || !latency_warn_ms) > return 0; > > if (system_state == SYSTEM_BOOTING) > diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c > index 9abcc6ead11b..f0d551ba64bb 100644 > --- a/kernel/sched/debug.c > +++ b/kernel/sched/debug.c > @@ -1293,9 +1293,12 @@ void proc_sched_set_task(struct task_struct *p) > void resched_latency_warn(int cpu, u64 latency) > { > static DEFINE_RATELIMIT_STATE(latency_check_ratelimit, 60 * 60 * HZ, 1); > + char *nr; > + > + nr = tif_need_resched() ? "need_resched" : "need_resched_lazy"; > > WARN(__ratelimit(&latency_check_ratelimit), > - "sched: CPU %d need_resched set for > %llu ns (%d ticks) " > + "sched: CPU %d %s set for > %llu ns (%d ticks) " > "without schedule\n", > - cpu, latency, cpu_rq(cpu)->ticks_without_resched); > + cpu, nr, latency, cpu_rq(cpu)->ticks_without_resched); > }
Shrikanth Hegde <sshegde@linux.ibm.com> writes: > On 10/9/24 22:24, Ankur Arora wrote: >> resched_latency_warn() now also warns if TIF_NEED_RESCHED_LAZY is set >> without rescheduling for more than the latency_warn_ms period. >> > > I am bit confused here. Why do we need to warn if LAZY is set for a long time? > > If lazy set, the subsequent tick, it would be set to upgraded to NEED_RESCHED. > > Since the value of latency_warn_ms=100ms, that means even on system with HZ=100, > that means 10 ticks before that warning would be printed no? That's a fair point. However, the assumption there is that there are no bugs in upgrade on tick or that there's no situation in which the tick is off for a prolonged period. Ankur > IIUC, the changelog c006fac556e40 ("sched: Warn on long periods of pending > need_resched") has the concern of need_resched set but if it is non-preemptible > kernel it would spend a lot of time in kernel mode. In that case print a > warning. > > If someone enables Lazy, that means it is preemptible and probably this whole > notion of resched_latency_warn doesn't apply to lazy. Please correct me if i am > not understanding this correctly. > >> Cc: Ingo Molnar <mingo@redhat.com> >> Cc: Peter Ziljstra <peterz@infradead.org> >> Cc: Juri Lelli <juri.lelli@redhat.com> >> Cc: Vincent Guittot <vincent.guittot@linaro.org> >> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com> >> --- >> kernel/sched/core.c | 2 +- >> kernel/sched/debug.c | 7 +++++-- >> 2 files changed, 6 insertions(+), 3 deletions(-) >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c >> index 694bfcf153cb..1229766b704e 100644 >> --- a/kernel/sched/core.c >> +++ b/kernel/sched/core.c >> @@ -5571,7 +5571,7 @@ static u64 cpu_resched_latency(struct rq *rq) >> if (sysctl_resched_latency_warn_once && warned_once) >> return 0; >> - if (!need_resched() || !latency_warn_ms) >> + if ((!need_resched() && !tif_need_resched_lazy()) || !latency_warn_ms) >> return 0; >> if (system_state == SYSTEM_BOOTING) >> diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c >> index 9abcc6ead11b..f0d551ba64bb 100644 >> --- a/kernel/sched/debug.c >> +++ b/kernel/sched/debug.c >> @@ -1293,9 +1293,12 @@ void proc_sched_set_task(struct task_struct *p) >> void resched_latency_warn(int cpu, u64 latency) >> { >> static DEFINE_RATELIMIT_STATE(latency_check_ratelimit, 60 * 60 * HZ, 1); >> + char *nr; >> + >> + nr = tif_need_resched() ? "need_resched" : "need_resched_lazy"; >> WARN(__ratelimit(&latency_check_ratelimit), >> - "sched: CPU %d need_resched set for > %llu ns (%d ticks) " >> + "sched: CPU %d %s set for > %llu ns (%d ticks) " >> "without schedule\n", >> - cpu, latency, cpu_rq(cpu)->ticks_without_resched); >> + cpu, nr, latency, cpu_rq(cpu)->ticks_without_resched); >> } -- ankur
On 10/22/24 00:51, Ankur Arora wrote: > > Shrikanth Hegde <sshegde@linux.ibm.com> writes: > >> On 10/9/24 22:24, Ankur Arora wrote: >>> resched_latency_warn() now also warns if TIF_NEED_RESCHED_LAZY is set >>> without rescheduling for more than the latency_warn_ms period. >>> >> >> I am bit confused here. Why do we need to warn if LAZY is set for a long time? >> >> If lazy set, the subsequent tick, it would be set to upgraded to NEED_RESCHED. >> >> Since the value of latency_warn_ms=100ms, that means even on system with HZ=100, >> that means 10 ticks before that warning would be printed no? > > That's a fair point. However, the assumption there is that there are no > bugs in upgrade on tick or that there's no situation in which the tick > is off for a prolonged period. > ok. But if tick is off, then ticks_without_resched isn't incremented either. IIUC, this check is for situation when NR is set and tick is on. > Ankur > >> IIUC, the changelog c006fac556e40 ("sched: Warn on long periods of pending >> need_resched") has the concern of need_resched set but if it is non-preemptible >> kernel it would spend a lot of time in kernel mode. In that case print a >> warning. >> >> If someone enables Lazy, that means it is preemptible and probably this whole >> notion of resched_latency_warn doesn't apply to lazy. Please correct me if i am >> not understanding this correctly. >> >>> Cc: Ingo Molnar <mingo@redhat.com> >>> Cc: Peter Ziljstra <peterz@infradead.org> >>> Cc: Juri Lelli <juri.lelli@redhat.com> >>> Cc: Vincent Guittot <vincent.guittot@linaro.org> >>> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com> >>> --- >>> kernel/sched/core.c | 2 +- >>> kernel/sched/debug.c | 7 +++++-- >>> 2 files changed, 6 insertions(+), 3 deletions(-) >>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c >>> index 694bfcf153cb..1229766b704e 100644 >>> --- a/kernel/sched/core.c >>> +++ b/kernel/sched/core.c >>> @@ -5571,7 +5571,7 @@ static u64 cpu_resched_latency(struct rq *rq) >>> if (sysctl_resched_latency_warn_once && warned_once) >>> return 0; >>> - if (!need_resched() || !latency_warn_ms) >>> + if ((!need_resched() && !tif_need_resched_lazy()) || !latency_warn_ms) >>> return 0; >>> if (system_state == SYSTEM_BOOTING) >>> diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c >>> index 9abcc6ead11b..f0d551ba64bb 100644 >>> --- a/kernel/sched/debug.c >>> +++ b/kernel/sched/debug.c >>> @@ -1293,9 +1293,12 @@ void proc_sched_set_task(struct task_struct *p) >>> void resched_latency_warn(int cpu, u64 latency) >>> { >>> static DEFINE_RATELIMIT_STATE(latency_check_ratelimit, 60 * 60 * HZ, 1); >>> + char *nr; >>> + >>> + nr = tif_need_resched() ? "need_resched" : "need_resched_lazy"; >>> WARN(__ratelimit(&latency_check_ratelimit), >>> - "sched: CPU %d need_resched set for > %llu ns (%d ticks) " >>> + "sched: CPU %d %s set for > %llu ns (%d ticks) " >>> "without schedule\n", >>> - cpu, latency, cpu_rq(cpu)->ticks_without_resched); >>> + cpu, nr, latency, cpu_rq(cpu)->ticks_without_resched); >>> } > > > -- > ankur
Hi Ankur, kernel test robot noticed the following build errors: [auto build test ERROR on paulmck-rcu/dev] [also build test ERROR on powerpc/next powerpc/fixes tip/sched/core linus/master v6.12-rc2 next-20241011] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Ankur-Arora/sched-warn-for-high-latency-with-TIF_NEED_RESCHED_LAZY/20241010-005819 base: https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git dev patch link: https://lore.kernel.org/r/20241009165411.3426937-2-ankur.a.arora%40oracle.com patch subject: [PATCH 1/7] sched: warn for high latency with TIF_NEED_RESCHED_LAZY config: x86_64-kexec (https://download.01.org/0day-ci/archive/20241013/202410131726.Gi9qvUP8-lkp@intel.com/config) compiler: clang version 18.1.8 (https://github.com/llvm/llvm-project 3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241013/202410131726.Gi9qvUP8-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202410131726.Gi9qvUP8-lkp@intel.com/ All errors (new ones prefixed by >>): >> kernel/sched/core.c:5528:27: error: call to undeclared function 'tif_need_resched_lazy'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration] 5528 | if ((!need_resched() && !tif_need_resched_lazy()) || !latency_warn_ms) | ^ kernel/sched/core.c:5528:27: note: did you mean 'tif_need_resched'? include/linux/thread_info.h:182:29: note: 'tif_need_resched' declared here 182 | static __always_inline bool tif_need_resched(void) | ^ 1 error generated. vim +/tif_need_resched_lazy +5528 kernel/sched/core.c 5517 5518 #ifdef CONFIG_SCHED_DEBUG 5519 static u64 cpu_resched_latency(struct rq *rq) 5520 { 5521 int latency_warn_ms = READ_ONCE(sysctl_resched_latency_warn_ms); 5522 u64 resched_latency, now = rq_clock(rq); 5523 static bool warned_once; 5524 5525 if (sysctl_resched_latency_warn_once && warned_once) 5526 return 0; 5527 > 5528 if ((!need_resched() && !tif_need_resched_lazy()) || !latency_warn_ms) 5529 return 0; 5530 5531 if (system_state == SYSTEM_BOOTING) 5532 return 0; 5533 5534 if (!rq->last_seen_need_resched_ns) { 5535 rq->last_seen_need_resched_ns = now; 5536 rq->ticks_without_resched = 0; 5537 return 0; 5538 } 5539 5540 rq->ticks_without_resched++; 5541 resched_latency = now - rq->last_seen_need_resched_ns; 5542 if (resched_latency <= latency_warn_ms * NSEC_PER_MSEC) 5543 return 0; 5544 5545 warned_once = true; 5546 5547 return resched_latency; 5548 } 5549 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki
Hi Ankur, kernel test robot noticed the following build errors: [auto build test ERROR on paulmck-rcu/dev] [also build test ERROR on powerpc/next powerpc/fixes tip/sched/core linus/master v6.12-rc2 next-20241011] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Ankur-Arora/sched-warn-for-high-latency-with-TIF_NEED_RESCHED_LAZY/20241010-005819 base: https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git dev patch link: https://lore.kernel.org/r/20241009165411.3426937-2-ankur.a.arora%40oracle.com patch subject: [PATCH 1/7] sched: warn for high latency with TIF_NEED_RESCHED_LAZY config: x86_64-rhel-8.3 (https://download.01.org/0day-ci/archive/20241013/202410131715.HYC3WK5i-lkp@intel.com/config) compiler: gcc-12 (Debian 12.2.0-14) 12.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241013/202410131715.HYC3WK5i-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202410131715.HYC3WK5i-lkp@intel.com/ All errors (new ones prefixed by >>): kernel/sched/core.c: In function 'cpu_resched_latency': >> kernel/sched/core.c:5528:34: error: implicit declaration of function 'tif_need_resched_lazy'; did you mean 'tif_need_resched'? [-Werror=implicit-function-declaration] 5528 | if ((!need_resched() && !tif_need_resched_lazy()) || !latency_warn_ms) | ^~~~~~~~~~~~~~~~~~~~~ | tif_need_resched cc1: some warnings being treated as errors vim +5528 kernel/sched/core.c 5517 5518 #ifdef CONFIG_SCHED_DEBUG 5519 static u64 cpu_resched_latency(struct rq *rq) 5520 { 5521 int latency_warn_ms = READ_ONCE(sysctl_resched_latency_warn_ms); 5522 u64 resched_latency, now = rq_clock(rq); 5523 static bool warned_once; 5524 5525 if (sysctl_resched_latency_warn_once && warned_once) 5526 return 0; 5527 > 5528 if ((!need_resched() && !tif_need_resched_lazy()) || !latency_warn_ms) 5529 return 0; 5530 5531 if (system_state == SYSTEM_BOOTING) 5532 return 0; 5533 5534 if (!rq->last_seen_need_resched_ns) { 5535 rq->last_seen_need_resched_ns = now; 5536 rq->ticks_without_resched = 0; 5537 return 0; 5538 } 5539 5540 rq->ticks_without_resched++; 5541 resched_latency = now - rq->last_seen_need_resched_ns; 5542 if (resched_latency <= latency_warn_ms * NSEC_PER_MSEC) 5543 return 0; 5544 5545 warned_once = true; 5546 5547 return resched_latency; 5548 } 5549 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki
On 2024-10-09 09:54:05 [-0700], Ankur Arora wrote: > resched_latency_warn() now also warns if TIF_NEED_RESCHED_LAZY is set > without rescheduling for more than the latency_warn_ms period. The description is odd. I think you want to say that resched_latency_warn() does not warn if a task has TIF_NEED_RESCHED_LAZY has set for longer periods and you want to add this functionality. > Cc: Ingo Molnar <mingo@redhat.com> > Cc: Peter Ziljstra <peterz@infradead.org> > Cc: Juri Lelli <juri.lelli@redhat.com> > Cc: Vincent Guittot <vincent.guittot@linaro.org> > Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com> … > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index 694bfcf153cb..1229766b704e 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -5571,7 +5571,7 @@ static u64 cpu_resched_latency(struct rq *rq) > if (sysctl_resched_latency_warn_once && warned_once) > return 0; > > - if (!need_resched() || !latency_warn_ms) > + if ((!need_resched() && !tif_need_resched_lazy()) || !latency_warn_ms) tif_need_resched_lazy() is not doing what you think it is doing. Either PeterZ makes a helper for this or you need tif_test_bit(TIF_NEED_RESCHED_LAZY). > return 0; > > if (system_state == SYSTEM_BOOTING) Sebastian
Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes: > On 2024-10-09 09:54:05 [-0700], Ankur Arora wrote: >> resched_latency_warn() now also warns if TIF_NEED_RESCHED_LAZY is set >> without rescheduling for more than the latency_warn_ms period. > > The description is odd. I think you want to say that > resched_latency_warn() does not warn if a task has TIF_NEED_RESCHED_LAZY > has set for longer periods and you want to add this functionality. > >> Cc: Ingo Molnar <mingo@redhat.com> >> Cc: Peter Ziljstra <peterz@infradead.org> >> Cc: Juri Lelli <juri.lelli@redhat.com> >> Cc: Vincent Guittot <vincent.guittot@linaro.org> >> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com> > … >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c >> index 694bfcf153cb..1229766b704e 100644 >> --- a/kernel/sched/core.c >> +++ b/kernel/sched/core.c >> @@ -5571,7 +5571,7 @@ static u64 cpu_resched_latency(struct rq *rq) >> if (sysctl_resched_latency_warn_once && warned_once) >> return 0; >> >> - if (!need_resched() || !latency_warn_ms) >> + if ((!need_resched() && !tif_need_resched_lazy()) || !latency_warn_ms) > > tif_need_resched_lazy() is not doing what you think it is doing. Thanks. My bad. > tif_test_bit(TIF_NEED_RESCHED_LAZY). > Either PeterZ makes a helper for this or you need > tif_test_bit(TIF_NEED_RESCHED_LAZY). Yeah. tif_test_bit(TIF_NEED_RESCHED_LAZY) and need_resched() will be identical with !ARCH_HAS_PREEMPT_LAZY but that should be okay in this context. -- ankur
© 2016 - 2024 Red Hat, Inc.