Previous attempt to make HZ 1000 the default [1] to help with scheduler
responsiveness didn't get merged. But maybe for the best, as I think this idea
of decoupling sched_tick() from HZ makes more sense. We shouldn't need to make
a choice between how often timers should trigger vs how often should the
scheduler update its stats/take decisions.

With HRTICK now is default on, preemption checks are less of problem. But tasks
could need to migrate (specifically on HMP systems) and frequencies could need
updating for solo long running tasks.

The conversion is half complete. We need to invent a new jiffy (piffy?) and
move existing load balancing logic (and potentially other users if they want
to) to speed them up. BUT, with push load balancer patches circulating around,
I am not sure it is worth while or desired to speed it since we have a better
faster mechanism to keep the system balanced without the need for the 'heavy
handed' full load balance to trigger. The push lb can benefit from this ptick
too to ensure more responsiveness.

[1] https://lore.kernel.org/lkml/20250226000810.459547-1-qyousef@layalina.io/

Qais Yousef (3):
  sched/tick: Decouple sched_tick() from HZ
  sched/tick: Move TICK_NSEC users to PTICK_NSEC
  sched/tick: Turn on PTICK by default

 kernel/sched/clock.c             |  4 +-
 kernel/sched/core.c              | 78 +++++++++++++++++++++++++++++---
 kernel/sched/cpufreq_schedutil.c |  2 +-
 kernel/sched/fair.c              |  6 +--
 kernel/sched/features.h          |  8 ++++
 kernel/sched/sched.h             |  7 +++
 6 files changed, 93 insertions(+), 12 deletions(-)

--
2.34.1
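To make the coupling described in the cover letter concrete: as long as the scheduler tick period is derived from HZ, the interval between sched_tick() calls follows whatever HZ the kernel was built with. A minimal sketch of that arithmetic, assuming a rounded division of one second by HZ; the helper name tick_nsec_for_hz() is hypothetical and is not taken from the series:

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical helper (not from the patches): tick period in nanoseconds
 * for a given HZ, rounded to the nearest nanosecond.
 */
uint64_t tick_nsec_for_hz(uint64_t hz)
{
	assert(hz != 0);	/* a zero tick rate has no period */
	return (NSEC_PER_SEC + hz / 2) / hz;
}
```

Under this assumption a HZ=100 build ticks every 10 ms, HZ=250 every 4 ms and HZ=1000 every 1 ms, which is the spread in scheduler update rate that the series aims to remove.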
On Sun, May 17, 2026 at 05:07:37AM +0100, Qais Yousef wrote:

> With HRTICK now is default on,

.. for archs that use GENERIC_ENTRY. So for example, ARM64 doesn't (yet)
have this.
On 05/18/26 09:40, Peter Zijlstra wrote:
> On Sun, May 17, 2026 at 05:07:37AM +0100, Qais Yousef wrote:
>
> > With HRTICK now is default on,
>
> .. for archs that use GENERIC_ENTRY. So for example, ARM64 doesn't (yet)
> have this.

I remembering trying to trace down the deps on HRTIMER_REARM_DEFERRED and then
got distracted.

HRTIMER on arm are cheap. Do we need this deps on generic entry?

Anyways. More reason to decouple I take :)
On Tue, May 19, 2026 at 01:42:56AM +0100, Qais Yousef wrote:
> On 05/18/26 09:40, Peter Zijlstra wrote:
> > On Sun, May 17, 2026 at 05:07:37AM +0100, Qais Yousef wrote:
> >
> > > With HRTICK now is default on,
> >
> > .. for archs that use GENERIC_ENTRY. So for example, ARM64 doesn't (yet)
> > have this.
>
> I remembering trying to trace down the deps on HRTIMER_REARM_DEFERRED and then
> got distracted.
>
> HRTIMER on arm are cheap. Do we need this deps on generic entry?
>
> Anyways. More reason to decouple I take :)

ARM64 Generic Entry patches are en-route somewhere. I'll get sorted.
On Sun, 17 May 2026 05:07:37 +0100
Qais Yousef <qyousef@layalina.io> wrote:

> Previous attempt to make HZ 1000 the default [1] to help with scheduler
> responsiveness didn't get merged. But maybe for the best, as I think this idea
> of decoupling sched_tick() from HZ makes more sense. We shouldn't need to make
> a choice between how often timers should trigger vs how often should the
> scheduler update its stats/take decisions.

Have you also looked a decoupling HZ/jiffies from the timer interrupt rate?
It ought to be reasonable set HZ to 1000 and to do 'jiffies += 4' when the
timer interrupts 250 times a second.
I'd expect most code to handle that fine.

I know one architecture (forgotten which) traditionally used a 1024Hz clock,
and some very old ones 60Hz; but I don't think Linux supports either.

-- David
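The 'jiffies += 4' suggestion above can be sketched in a few lines. This is only an illustration under stated assumptions: the SKETCH_* constants and sketch_* names are hypothetical, and the real jiffies update path in the kernel is considerably more involved than a plain counter.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_HZ	1000u	/* resolution that jiffies users would see */
#define SKETCH_IRQ_RATE	250u	/* assumed timer interrupts per second */
#define JIFFIES_PER_IRQ	(SKETCH_HZ / SKETCH_IRQ_RATE)

/* The scheme only works cleanly when the interrupt rate divides HZ. */
static_assert(SKETCH_HZ % SKETCH_IRQ_RATE == 0,
	      "interrupt rate must divide HZ evenly");

uint64_t sketch_jiffies;

/* Called once per timer interrupt: advance by 4 instead of by 1. */
void sketch_timer_interrupt(void)
{
	sketch_jiffies += JIFFIES_PER_IRQ;
}

/* Reset the counter, simulate nr_irqs interrupts, return the result. */
uint64_t sketch_jiffies_after(unsigned int nr_irqs)
{
	sketch_jiffies = 0;
	for (unsigned int i = 0; i < nr_irqs; i++)
		sketch_timer_interrupt();
	return sketch_jiffies;
}
```

After one second's worth of interrupts (250) the counter has advanced by 1000, so code that converts jiffies with HZ=1000 still sees correct durations, only at a coarser 4-jiffy granularity.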
On 05/17/26 15:10, David Laight wrote:
> On Sun, 17 May 2026 05:07:37 +0100
> Qais Yousef <qyousef@layalina.io> wrote:
>
> > Previous attempt to make HZ 1000 the default [1] to help with scheduler
> > responsiveness didn't get merged. But maybe for the best, as I think this idea
> > of decoupling sched_tick() from HZ makes more sense. We shouldn't need to make
> > a choice between how often timers should trigger vs how often should the
> > scheduler update its stats/take decisions.
>
> Have you also looked a decoupling HZ/jiffies from the timer interrupt rate?
> It ought to be reasonable set HZ to 1000 and to do 'jiffies += 4' when the
> timer interrupts 250 times a second.
> I'd expect most code to handle that fine.

John actually had a stab at that [1].

I am not sure we can fully say we don't care about what TICK_NSEC means. And
all the open coding with HZ.

I don't see value in the overall potential treewide conversion and the
potential compile time math becoming 'runtime overhead' complaint someone might
throw out.

I did have a stab at making HZ a variable and update TICK_NSEC to depend on it.
Unfortunately I lost that work. It required a lot of fix ups to the math and
conversions (which I did all of that). And since HZ is assumed to be
a label/define; when it is a variable lots of code fails to compile. This is
where I was stuck getting all the conversion fixed when my machine died and
I forgot to save the work somewhere else and it was all gone.

> I know one architecture (forgotten which) traditionally used a 1024Hz clock,
> and some very old ones 60Hz; but I don't think Linux supports either.

Personally I think HZ=1000 is the only sensible option. But I am not going to
fight this battle :)

[1] https://lore.kernel.org/lkml/20250128063301.3879317-1-jstultz@google.com/
On Sun, 17 May 2026 16:44:01 +0100
Qais Yousef <qyousef@layalina.io> wrote:

> On 05/17/26 15:10, David Laight wrote:
> > On Sun, 17 May 2026 05:07:37 +0100
> > Qais Yousef <qyousef@layalina.io> wrote:
> >
> > > Previous attempt to make HZ 1000 the default [1] to help with scheduler
> > > responsiveness didn't get merged. But maybe for the best, as I think this idea
> > > of decoupling sched_tick() from HZ makes more sense. We shouldn't need to make
> > > a choice between how often timers should trigger vs how often should the
> > > scheduler update its stats/take decisions.
> >
> > Have you also looked a decoupling HZ/jiffies from the timer interrupt rate?
> > It ought to be reasonable set HZ to 1000 and to do 'jiffies += 4' when the
> > timer interrupts 250 times a second.
> > I'd expect most code to handle that fine.
>
> John actually had a stab at that [1].

I'd forgotten about that - even though I commented :-)

> I am not sure we can fully say we don't care about what TICK_NSEC means. And
> all the open coding with HZ.
>
> I don't see value in the overall potential treewide conversion and the
> potential compile time math becoming 'runtime overhead' complaint someone might
> throw out.

If HZ is fixed at 1000 then there is no extra maths.

> I did have a stab at making HZ a variable and update TICK_NSEC to depend on it.

I do remember that, and thinking it would add a lot of extra maths.

...
> > I know one architecture (forgotten which) traditionally used a 1024Hz clock,
> > and some very old ones 60Hz; but I don't think Linux supports either.
>
> Personally I think HZ=1000 is the only sensible option. But I am not going to
> fight this battle :)

I go for only allowing values that divide evenly into 1000.
That really only gives you interrupt rates of 1000Hz, 500Hz, 250Hz, 200Hz
and 100Hz (and maybe 50Hz).

-- David

>
> [1] https://lore.kernel.org/lkml/20250128063301.3879317-1-jstultz@google.com/
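The "divide evenly into 1000" constraint from the reply above amounts to a simple divisibility check. A minimal sketch, assuming HZ is fixed at 1000 as proposed; the function names are hypothetical and nothing here is taken from an existing kernel interface:

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_HZ 1000u	/* assumed fixed jiffies resolution */

/* True when every interrupt can advance jiffies by a whole number. */
bool irq_rate_divides_hz(unsigned int irq_rate)
{
	return irq_rate != 0 && SKETCH_HZ % irq_rate == 0;
}

/* Jiffies to add per interrupt for an accepted rate. */
unsigned int jiffies_per_irq(unsigned int irq_rate)
{
	assert(irq_rate_divides_hz(irq_rate));
	return SKETCH_HZ / irq_rate;
}
```

With this check the rates listed in the mail (1000, 500, 250, 200, 100 and 50) are all accepted, while a rate such as 300 is rejected because 1000 / 300 is not an integer.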
On 05/17/26 17:09, David Laight wrote:
> On Sun, 17 May 2026 16:44:01 +0100
> Qais Yousef <qyousef@layalina.io> wrote:
>
> > On 05/17/26 15:10, David Laight wrote:
> > > On Sun, 17 May 2026 05:07:37 +0100
> > > Qais Yousef <qyousef@layalina.io> wrote:
> > >
> > > > Previous attempt to make HZ 1000 the default [1] to help with scheduler
> > > > responsiveness didn't get merged. But maybe for the best, as I think this idea
> > > > of decoupling sched_tick() from HZ makes more sense. We shouldn't need to make
> > > > a choice between how often timers should trigger vs how often should the
> > > > scheduler update its stats/take decisions.
> > >
> > > Have you also looked a decoupling HZ/jiffies from the timer interrupt rate?
> > > It ought to be reasonable set HZ to 1000 and to do 'jiffies += 4' when the
> > > timer interrupts 250 times a second.
> > > I'd expect most code to handle that fine.
> >
> > John actually had a stab at that [1].
>
> I'd forgotten about that - even though I commented :-)
>
> > I am not sure we can fully say we don't care about what TICK_NSEC means. And
> > all the open coding with HZ.
> >
> > I don't see value in the overall potential treewide conversion and the
> > potential compile time math becoming 'runtime overhead' complaint someone might
> > throw out.
>
> If HZ is fixed at 1000 then there is no extra maths.
>
> > I did have a stab at making HZ a variable and update TICK_NSEC to depend on it.
>
> I do remember that, and thinking it would add a lot of extra maths.
>
> ...
> > > I know one architecture (forgotten which) traditionally used a 1024Hz clock,
> > > and some very old ones 60Hz; but I don't think Linux supports either.
> >
> > Personally I think HZ=1000 is the only sensible option. But I am not going to
> > fight this battle :)
>
> I go for only allowing values that divide evenly into 1000.
> That really only gives you interrupt rates of 1000Hz, 500Hz, 250Hz, 200Hz
> and 100Hz (and maybe 50Hz).

I don't know.

I think the HZ being tightly coupled with timers is fine and makes sense
logically. Trying to decouple this decades long assumptions might come with
a lot of un-intended consequences for something that can be solved much easier
by decoupling users that need better consistency and guarantees across all
configurations.

I don't see a great benefit trying to play games with jiffies. Load balancer
will still need to do work to decouple from jiffy in this instance too - though
as I stated with push lb I think this work can be skipped given we have
a superior alternative soon hopefully.

The PTICK can be generalized to allow other users potentially who can benefit
from decoupling from timer TICK.

But I'll defer to the gurus :)
On 05/17/26 05:07, Qais Yousef wrote:
> Previous attempt to make HZ 1000 the default [1] to help with scheduler
> responsiveness didn't get merged. But maybe for the best, as I think this idea
> of decoupling sched_tick() from HZ makes more sense. We shouldn't need to make
> a choice between how often timers should trigger vs how often should the
> scheduler update its stats/take decisions.
>
> With HRTICK now is default on, preemption checks are less of problem. But tasks
> could need to migrate (specifically on HMP systems) and frequencies could need
> updating for solo long running tasks.
>
> The conversion is half complete. We need to invent a new jiffy (piffy?) and
> move existing load balancing logic (and potentially other users if they want
> to) to speed them up. BUT, with push load balancer patches circulating around,
> I am not sure it is worth while or desired to speed it since we have a better
> faster mechanism to keep the system balanced without the need for the 'heavy
> handed' full load balance to trigger. The push lb can benefit from this ptick
> too to ensure more responsiveness.

Err, this whole series is RFC. Forgot to add the tag, sorry.

>
> [1] https://lore.kernel.org/lkml/20250226000810.459547-1-qyousef@layalina.io/
>
> Qais Yousef (3):
>   sched/tick: Decouple sched_tick() from HZ
>   sched/tick: Move TICK_NSEC users to PTICK_NSEC
>   sched/tick: Turn on PTICK by default
>
>  kernel/sched/clock.c             |  4 +-
>  kernel/sched/core.c              | 78 +++++++++++++++++++++++++++++---
>  kernel/sched/cpufreq_schedutil.c |  2 +-
>  kernel/sched/fair.c              |  6 +--
>  kernel/sched/features.h          |  8 ++++
>  kernel/sched/sched.h             |  7 +++
>  6 files changed, 93 insertions(+), 12 deletions(-)
>
> --
> 2.34.1
>