[ with 6.18 being an LTS release, it might be a good time for this ]
The introduction of PREEMPT_LAZY was for multiple reasons:
- PREEMPT_RT suffered from over-scheduling, hurting performance compared to
!PREEMPT_RT.
- the introduction of (more) features that rely on preemption; like
folio_zero_user() which can do large memset() without preemption checks.
(Xen already had a horrible hack to deal with long running hypercalls)
- the endless and uncontrolled sprinkling of cond_resched() -- mostly cargo
cult or in response to poor to replicate workloads.
By moving to a model that is fundamentally preemptable these things become
manageable and avoid needing to introduce more horrible hacks.
Since this is a requirement; limit PREEMPT_NONE to architectures that do not
support preemption at all. Further limit PREEMPT_VOLUNTARY to those
architectures that do not yet have PREEMPT_LAZY support (with the eventual goal
to make this the empty set and completely remove voluntary preemption and
cond_resched() -- notably VOLUNTARY is already limited to !ARCH_NO_PREEMPT.)
This leaves up-to-date architectures (arm64, loongarch, powerpc, riscv, s390,
x86) with only two preemption models: full and lazy (like PREEMPT_RT).
While Lazy has been the recommended setting for a while, not all distributions
have managed to make the switch yet. Force things along. Keep the patch minimal
in case of hard to address regressions that might pop up.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
kernel/Kconfig.preempt | 3 +++
kernel/sched/core.c | 2 +-
kernel/sched/debug.c | 2 +-
3 files changed, 5 insertions(+), 2 deletions(-)
--- a/kernel/Kconfig.preempt
+++ b/kernel/Kconfig.preempt
@@ -16,11 +16,13 @@ config ARCH_HAS_PREEMPT_LAZY
 
 choice
 	prompt "Preemption Model"
+	default PREEMPT_LAZY if ARCH_HAS_PREEMPT_LAZY
 	default PREEMPT_NONE
 
 config PREEMPT_NONE
 	bool "No Forced Preemption (Server)"
 	depends on !PREEMPT_RT
+	depends on ARCH_NO_PREEMPT
 	select PREEMPT_NONE_BUILD if !PREEMPT_DYNAMIC
 	help
 	  This is the traditional Linux preemption model, geared towards
@@ -35,6 +37,7 @@ config PREEMPT_NONE
 
 config PREEMPT_VOLUNTARY
 	bool "Voluntary Kernel Preemption (Desktop)"
+	depends on !ARCH_HAS_PREEMPT_LAZY
 	depends on !ARCH_NO_PREEMPT
 	depends on !PREEMPT_RT
 	select PREEMPT_VOLUNTARY_BUILD if !PREEMPT_DYNAMIC
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7553,7 +7553,7 @@ int preempt_dynamic_mode = preempt_dynam
 
 int sched_dynamic_mode(const char *str)
 {
-# ifndef CONFIG_PREEMPT_RT
+# if !(defined(CONFIG_PREEMPT_RT) || defined(CONFIG_ARCH_HAS_PREEMPT_LAZY))
 	if (!strcmp(str, "none"))
 		return preempt_dynamic_none;
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -243,7 +243,7 @@ static ssize_t sched_dynamic_write(struc
 
 static int sched_dynamic_show(struct seq_file *m, void *v)
 {
-	int i = IS_ENABLED(CONFIG_PREEMPT_RT) * 2;
+	int i = (IS_ENABLED(CONFIG_PREEMPT_RT) || IS_ENABLED(CONFIG_ARCH_HAS_PREEMPT_LAZY)) * 2;
 	int j;
 
 	/* Count entries in NULL terminated preempt_modes */
Hi Peter.
On 12/19/25 3:45 PM, Peter Zijlstra wrote:
>
> [ with 6.18 being an LTS release, it might be a good time for this ]
>
> The introduction of PREEMPT_LAZY was for multiple reasons:
>
> [...]
>
> While Lazy has been the recommended setting for a while, not all distributions
> have managed to make the switch yet. Force things along. Keep the patch minimal
> in case of hard to address regressions that might pop up.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Maybe only change the default to LAZY, but keep the other options available
via dynamic update?

- When the kernel changes to lazy being the default, the scheduling pattern
can change and that may affect workloads. Having the ability to dynamically
switch to none/voluntary could help figure out where a workload is
regressing. We could document cases where a regression is expected.

- With preempt=full/lazy we will likely never see softlockups. How are we
going to find longer kernel paths (some maybe by design, some may be bugs)
apart from observing workload regressions?

Also, is the softlockup code of any use with preempt=full/lazy?
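On the dynamic-switching point raised above: with CONFIG_PREEMPT_DYNAMIC the model can still be selected at boot or at runtime, so a regressing workload can be compared across modes without rebuilding. A sketch of the standard knobs (which modes are accepted depends on the options discussed in this patch):

```
# At boot, on the kernel command line:
#     preempt=none | preempt=voluntary | preempt=full | preempt=lazy

# At runtime, via debugfs (current mode is shown in parentheses):
cat /sys/kernel/debug/sched/preempt
echo lazy > /sys/kernel/debug/sched/preempt
```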
On Fri, 19 Dec 2025 11:15:02 +0100 Peter Zijlstra <peterz@infradead.org> wrote:

> --- a/kernel/Kconfig.preempt
> +++ b/kernel/Kconfig.preempt
> @@ -16,11 +16,13 @@ config ARCH_HAS_PREEMPT_LAZY
>
>  choice
>  	prompt "Preemption Model"
> +	default PREEMPT_LAZY if ARCH_HAS_PREEMPT_LAZY
>  	default PREEMPT_NONE

I think you can just make this:

	default PREEMPT_LAZY

and remove the PREEMPT_NONE. As PREEMPT_NONE now depends on ARCH_NO_PREEMPT
and all the other options depend on !ARCH_NO_PREEMPT, the default will be
PREEMPT_LAZY if it's available, but it will never be PREEMPT_NONE if it
isn't, unless PREEMPT_NONE is the only option available.

I added default PREEMPT_LAZY and did a:

  $ mkdir /tmp/build
  $ make O=/tmp/build ARCH=alpha defconfig

And the result is:

  CONFIG_PREEMPT_NONE_BUILD=y
  CONFIG_PREEMPT_NONE=y

-- Steve

>
>  config PREEMPT_NONE
>  	bool "No Forced Preemption (Server)"
>  	depends on !PREEMPT_RT
> +	depends on ARCH_NO_PREEMPT
>  	select PREEMPT_NONE_BUILD if !PREEMPT_DYNAMIC
>  	help
>  	  This is the traditional Linux preemption model, geared towards
> @@ -35,6 +37,7 @@ config PREEMPT_NONE
>
>  config PREEMPT_VOLUNTARY
>  	bool "Voluntary Kernel Preemption (Desktop)"
> +	depends on !ARCH_HAS_PREEMPT_LAZY
>  	depends on !ARCH_NO_PREEMPT
>  	depends on !PREEMPT_RT
>  	select PREEMPT_VOLUNTARY_BUILD if !PREEMPT_DYNAMIC
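The fallback Steve demonstrates follows from Kconfig's choice semantics: a `default` naming a symbol that is not visible is skipped, and when no default applies the first visible entry is selected. A reduced sketch of the resulting choice block (symbols abbreviated, purely illustrative, not the full Kconfig text):

```
choice
	prompt "Preemption Model"
	default PREEMPT_LAZY
	# On ARCH_NO_PREEMPT architectures PREEMPT_LAZY is not visible,
	# so this default is skipped and PREEMPT_NONE -- the only visible
	# entry there -- is selected as the first visible option.

config PREEMPT_NONE
	bool "No Forced Preemption (Server)"
	depends on ARCH_NO_PREEMPT

config PREEMPT_LAZY
	bool "Scheduler controlled preemption model"
	depends on ARCH_HAS_PREEMPT_LAZY

endchoice
```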
On 19/12/25 11:15, Peter Zijlstra wrote:
> [ with 6.18 being an LTS release, it might be a good time for this ]
>
> The introduction of PREEMPT_LAZY was for multiple reasons:
>
> [...]
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Reviewed-by: Valentin Schneider <vschneid@redhat.com>
The following commit has been merged into the sched/core branch of tip:
Commit-ID: 7dadeaa6e851e7d67733f3e24fc53ee107781d0f
Gitweb: https://git.kernel.org/tip/7dadeaa6e851e7d67733f3e24fc53ee107781d0f
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Thu, 18 Dec 2025 15:25:10 +01:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 08 Jan 2026 12:43:57 +01:00
sched: Further restrict the preemption modes
The introduction of PREEMPT_LAZY was for multiple reasons:
- PREEMPT_RT suffered from over-scheduling, hurting performance compared to
!PREEMPT_RT.
- the introduction of (more) features that rely on preemption; like
folio_zero_user() which can do large memset() without preemption checks.
(Xen already had a horrible hack to deal with long running hypercalls)
- the endless and uncontrolled sprinkling of cond_resched() -- mostly cargo
cult or in response to poor to replicate workloads.
By moving to a model that is fundamentally preemptable these things become
manageable and avoid needing to introduce more horrible hacks.
Since this is a requirement; limit PREEMPT_NONE to architectures that do not
support preemption at all. Further limit PREEMPT_VOLUNTARY to those
architectures that do not yet have PREEMPT_LAZY support (with the eventual goal
to make this the empty set and completely remove voluntary preemption and
cond_resched() -- notably VOLUNTARY is already limited to !ARCH_NO_PREEMPT.)
This leaves up-to-date architectures (arm64, loongarch, powerpc, riscv, s390,
x86) with only two preemption models: full and lazy.
While Lazy has been the recommended setting for a while, not all distributions
have managed to make the switch yet. Force things along. Keep the patch minimal
in case of hard to address regressions that might pop up.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Link: https://patch.msgid.link/20251219101502.GB1132199@noisy.programming.kicks-ass.net
---
kernel/Kconfig.preempt | 3 +++
kernel/sched/core.c | 2 +-
kernel/sched/debug.c | 2 +-
3 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/kernel/Kconfig.preempt b/kernel/Kconfig.preempt
index da32680..88c594c 100644
--- a/kernel/Kconfig.preempt
+++ b/kernel/Kconfig.preempt
@@ -16,11 +16,13 @@ config ARCH_HAS_PREEMPT_LAZY
 
 choice
 	prompt "Preemption Model"
+	default PREEMPT_LAZY if ARCH_HAS_PREEMPT_LAZY
 	default PREEMPT_NONE
 
 config PREEMPT_NONE
 	bool "No Forced Preemption (Server)"
 	depends on !PREEMPT_RT
+	depends on ARCH_NO_PREEMPT
 	select PREEMPT_NONE_BUILD if !PREEMPT_DYNAMIC
 	help
 	  This is the traditional Linux preemption model, geared towards
@@ -35,6 +37,7 @@ config PREEMPT_NONE
 
 config PREEMPT_VOLUNTARY
 	bool "Voluntary Kernel Preemption (Desktop)"
+	depends on !ARCH_HAS_PREEMPT_LAZY
 	depends on !ARCH_NO_PREEMPT
 	depends on !PREEMPT_RT
 	select PREEMPT_VOLUNTARY_BUILD if !PREEMPT_DYNAMIC
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5b17d8e..fa72075 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7553,7 +7553,7 @@ int preempt_dynamic_mode = preempt_dynamic_undefined;
 
 int sched_dynamic_mode(const char *str)
 {
-# ifndef CONFIG_PREEMPT_RT
+# if !(defined(CONFIG_PREEMPT_RT) || defined(CONFIG_ARCH_HAS_PREEMPT_LAZY))
 	if (!strcmp(str, "none"))
 		return preempt_dynamic_none;
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 41caa22..5f9b771 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -243,7 +243,7 @@ static ssize_t sched_dynamic_write(struct file *filp, const char __user *ubuf,
 
 static int sched_dynamic_show(struct seq_file *m, void *v)
 {
-	int i = IS_ENABLED(CONFIG_PREEMPT_RT) * 2;
+	int i = (IS_ENABLED(CONFIG_PREEMPT_RT) || IS_ENABLED(CONFIG_ARCH_HAS_PREEMPT_LAZY)) * 2;
 	int j;
 
 	/* Count entries in NULL terminated preempt_modes */