From nobody Sat Apr 4 07:48:55 2026
From: Mark Rutland
To: linux-arm-kernel@lists.infradead.org
Cc: ada.coupriediaz@arm.com, catalin.marinas@arm.com,
    linux-kernel@vger.kernel.org, luto@kernel.org, mark.rutland@arm.com,
    peterz@infradead.org, ruanjinjie@huawei.com, tglx@kernel.org,
    vladimir.murzin@arm.com, will@kernel.org
Subject: [PATCH 1/2] arm64/entry: Fix involuntary preemption exception masking
Date: Fri, 20 Mar 2026 11:30:25 +0000
Message-Id: <20260320113026.3219620-2-mark.rutland@arm.com>
In-Reply-To: <20260320113026.3219620-1-mark.rutland@arm.com>
References: <20260320113026.3219620-1-mark.rutland@arm.com>

On arm64, involuntary kernel preemption has been subtly broken since the
move to the generic irq entry code. When preemption occurs, the new task
may run with SError and Debug exceptions masked unexpectedly, leading to
a loss of RAS events, breakpoints, watchpoints, and single-step
exceptions.

We can fix this relatively simply by moving the preemption logic out of
irqentry_exit(), which is desirable for a number of other reasons on
arm64. Context and rationale below:

1) Architecturally, several groups of exceptions can be masked
   independently, including 'Debug', 'SError', 'IRQ', and 'FIQ', whose
   mask bits can be read/written via the 'DAIF' register. Other mask
   bits exist, including 'PM' and 'AllInt', which we will need to use in
   future (e.g. for architectural NMI support).

   The entry code needs to manipulate all of these, but the generic
   entry code only knows about interrupts (which means both IRQ and FIQ
   on arm64), and the other exception masks aren't generic.

2) Architecturally, all maskable exceptions MUST be masked during
   exception entry and exception return.

   Upon exception entry, hardware places exception context into
   exception registers (e.g. the PC is saved into ELR_ELx). Upon
   exception return, hardware restores exception context from those
   exception registers (e.g. the PC is restored from ELR_ELx).
   To ensure the exception registers aren't clobbered by recursive
   exceptions, all maskable exceptions must be masked early during entry
   and late during exit. Hardware masks all maskable exceptions
   automatically at exception entry. Software must unmask these as
   required, and must mask them prior to exception return.

3) Architecturally, hardware masks all maskable exceptions upon any
   exception entry. A synchronous exception (e.g. a fault on a memory
   access) can be taken from any context (e.g. where IRQ+FIQ might be
   masked), and the entry code must explicitly 'inherit' the unmasking
   from the original context by reading the exception registers (e.g.
   SPSR_ELx) and writing to DAIF, etc.

4) When 'pseudo-NMI' is used, Linux masks interrupts via a combination
   of DAIF and the 'PMR' priority mask register. At entry and exit,
   interrupts must be masked via DAIF, but most kernel code will
   mask/unmask regular interrupts using PMR (e.g. in local_irq_save()
   and local_irq_restore()). This requires more complicated transitions
   at entry and exit.

   Early during entry or late during return, interrupts are masked via
   DAIF, and kernel code which manipulates PMR to mask/unmask interrupts
   will not function correctly in this state. This also requires fairly
   complicated management of DAIF and PMR when handling interrupts, and
   arm64 has special logic to avoid preempting from pseudo-NMIs which
   currently lives in arch_irqentry_exit_need_resched().

5) Most kernel code runs with all exceptions unmasked. When scheduling,
   only interrupts should be masked (by PMR when pseudo-NMI is used, and
   by DAIF otherwise).
For most exceptions, arm64's entry code has a sequence similar to that
of el1_abort(), which is used for faults:

| static void noinstr el1_abort(struct pt_regs *regs, unsigned long esr)
| {
| 	unsigned long far = read_sysreg(far_el1);
| 	irqentry_state_t state;
|
| 	state = enter_from_kernel_mode(regs);
| 	local_daif_inherit(regs);
| 	do_mem_abort(far, esr, regs);
| 	local_daif_mask();
| 	exit_to_kernel_mode(regs, state);
| }

... where enter_from_kernel_mode() and exit_to_kernel_mode() are
wrappers around irqentry_enter() and irqentry_exit() which perform
additional arm64-specific entry/exit logic.

Currently, the generic irq entry code will attempt to preempt from any
exception under irqentry_exit() where interrupts were unmasked in the
original context. As arm64's entry code will have already masked
exceptions via DAIF, this results in the problems described above.

Fix this by opting out of preemption in irqentry_exit(), and restoring
arm64's old behaviour of explicitly preempting when returning from IRQ
or FIQ, before calling exit_to_kernel_mode() / irqentry_exit(). This
ensures that preemption occurs when only interrupts are masked, and
where that masking is compatible with most kernel code (e.g. using PMR
when pseudo-NMI is in use).

Fixes: 99eb057ccd67 ("arm64: entry: Move arm64_preempt_schedule_irq() into __exit_to_kernel_mode()")
Reported-by: Ada Couprie Diaz
Reported-by: Vladimir Murzin
Signed-off-by: Mark Rutland
Cc: Andy Lutomirski
Cc: Catalin Marinas
Cc: Jinjie Ruan
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Will Deacon
---
 arch/Kconfig                     | 3 +++
 arch/arm64/Kconfig               | 1 +
 arch/arm64/kernel/entry-common.c | 2 ++
 kernel/entry/common.c            | 4 +++-
 4 files changed, 9 insertions(+), 1 deletion(-)

Thomas, Peter, I have a couple of things I'd like to check:

(1) The generic irq entry code will preempt from any exception (e.g. a
    synchronous fault) where interrupts were unmasked in the original
    context.
    Is that intentional/necessary, or was that just the way the x86 code
    happened to be implemented? I assume that it'd be fine if arm64 only
    preempted from true interrupts, but if that was intentional/necessary
    I can go rework this.

(2) The generic irq entry code only preempts when RCU was watching in
    the original context. IIUC that's just to avoid preempting from the
    idle thread. Is it functionally necessary to avoid that, or is that
    just an optimization?

    I'm asking because historically arm64 didn't check that, and I
    haven't bothered checking here. I don't know whether we have a
    latent functional bug.

Mark.

diff --git a/arch/Kconfig b/arch/Kconfig
index 102ddbd4298ef..c8c99cd955281 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -102,6 +102,9 @@ config HOTPLUG_PARALLEL
 	bool
 	select HOTPLUG_SPLIT_STARTUP
 
+config ARCH_HAS_OWN_IRQ_PREEMPTION
+	bool
+
 config GENERIC_IRQ_ENTRY
 	bool
 
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 38dba5f7e4d2d..bf0ec8237de45 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -42,6 +42,7 @@ config ARM64
 	select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
 	select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
 	select ARCH_HAS_NONLEAF_PMD_YOUNG if ARM64_HAFT
+	select ARCH_HAS_OWN_IRQ_PREEMPTION
 	select ARCH_HAS_PREEMPT_LAZY
 	select ARCH_HAS_PTDUMP
 	select ARCH_HAS_PTE_SPECIAL
diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index 3625797e9ee8f..1aedadf09eb4d 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -497,6 +497,8 @@ static __always_inline void __el1_irq(struct pt_regs *regs,
 	do_interrupt_handler(regs, handler);
 	irq_exit_rcu();
 
+	irqentry_exit_cond_resched();
+
 	exit_to_kernel_mode(regs, state);
 }
 
 static void noinstr el1_interrupt(struct pt_regs *regs,
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index 9ef63e4147913..af9cae1f225e3 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -235,8 +235,10 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state)
 	}
 
 	instrumentation_begin();
-	if (IS_ENABLED(CONFIG_PREEMPTION))
+	if (IS_ENABLED(CONFIG_PREEMPTION) &&
+	    !IS_ENABLED(CONFIG_ARCH_HAS_OWN_IRQ_PREEMPTION)) {
 		irqentry_exit_cond_resched();
+	}
 
 	/* Covers both tracing and lockdep */
 	trace_hardirqs_on();
-- 
2.30.2

From nobody Sat Apr 4 07:48:55 2026
From: Mark Rutland
To: linux-arm-kernel@lists.infradead.org
Cc: ada.coupriediaz@arm.com, catalin.marinas@arm.com,
    linux-kernel@vger.kernel.org, luto@kernel.org, mark.rutland@arm.com,
    peterz@infradead.org, ruanjinjie@huawei.com, tglx@kernel.org,
    vladimir.murzin@arm.com, will@kernel.org
Subject: [PATCH 2/2] arm64/entry: Remove arch_irqentry_exit_need_resched()
Date: Fri, 20 Mar 2026 11:30:26 +0000
Message-Id: <20260320113026.3219620-3-mark.rutland@arm.com>
In-Reply-To: <20260320113026.3219620-1-mark.rutland@arm.com>
References: <20260320113026.3219620-1-mark.rutland@arm.com>

The only user of arch_irqentry_exit_need_resched() is arm64. As arm64
provides its own preemption logic, there's no need to indirect some of
this via the generic irq entry code.

Remove arch_irqentry_exit_need_resched(), and fold its logic directly
into arm64's entry code.
Signed-off-by: Mark Rutland
Cc: Ada Couprie Diaz
Cc: Andy Lutomirski
Cc: Catalin Marinas
Cc: Jinjie Ruan
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Vladimir Murzin
Cc: Will Deacon
---
 arch/arm64/include/asm/entry-common.h | 27 ---------------------------
 arch/arm64/kernel/entry-common.c      | 27 ++++++++++++++++++++++++++-
 kernel/entry/common.c                 | 16 +---------------
 3 files changed, 27 insertions(+), 43 deletions(-)

diff --git a/arch/arm64/include/asm/entry-common.h b/arch/arm64/include/asm/entry-common.h
index cab8cd78f6938..2b8335ea2a390 100644
--- a/arch/arm64/include/asm/entry-common.h
+++ b/arch/arm64/include/asm/entry-common.h
@@ -27,31 +27,4 @@ static __always_inline void arch_exit_to_user_mode_work(struct pt_regs *regs,
 
 #define arch_exit_to_user_mode_work arch_exit_to_user_mode_work
 
-static inline bool arch_irqentry_exit_need_resched(void)
-{
-	/*
-	 * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
-	 * priority masking is used the GIC irqchip driver will clear DAIF.IF
-	 * using gic_arch_enable_irqs() for normal IRQs. If anything is set in
-	 * DAIF we must have handled an NMI, so skip preemption.
-	 */
-	if (system_uses_irq_prio_masking() && read_sysreg(daif))
-		return false;
-
-	/*
-	 * Preempting a task from an IRQ means we leave copies of PSTATE
-	 * on the stack. cpufeature's enable calls may modify PSTATE, but
-	 * resuming one of these preempted tasks would undo those changes.
-	 *
-	 * Only allow a task to be preempted once cpufeatures have been
-	 * enabled.
-	 */
-	if (!system_capabilities_finalized())
-		return false;
-
-	return true;
-}
-
-#define arch_irqentry_exit_need_resched arch_irqentry_exit_need_resched
-
 #endif /* _ASM_ARM64_ENTRY_COMMON_H */
diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index 1aedadf09eb4d..c4481e0e326a7 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -486,6 +486,31 @@ static __always_inline void __el1_pnmi(struct pt_regs *regs,
 	irqentry_nmi_exit(regs, state);
 }
 
+static void arm64_irqentry_exit_cond_resched(void)
+{
+	/*
+	 * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
+	 * priority masking is used the GIC irqchip driver will clear DAIF.IF
+	 * using gic_arch_enable_irqs() for normal IRQs. If anything is set in
+	 * DAIF we must have handled an NMI, so skip preemption.
+	 */
+	if (system_uses_irq_prio_masking() && read_sysreg(daif))
+		return;
+
+	/*
+	 * Preempting a task from an IRQ means we leave copies of PSTATE
+	 * on the stack. cpufeature's enable calls may modify PSTATE, but
+	 * resuming one of these preempted tasks would undo those changes.
+	 *
+	 * Only allow a task to be preempted once cpufeatures have been
+	 * enabled.
+	 */
+	if (!system_capabilities_finalized())
+		return;
+
+	irqentry_exit_cond_resched();
+}
+
 static __always_inline void __el1_irq(struct pt_regs *regs,
 				      void (*handler)(struct pt_regs *))
 {
@@ -497,7 +522,7 @@ static __always_inline void __el1_irq(struct pt_regs *regs,
 	do_interrupt_handler(regs, handler);
 	irq_exit_rcu();
 
-	irqentry_exit_cond_resched();
+	arm64_irqentry_exit_cond_resched();
 
 	exit_to_kernel_mode(regs, state);
 }
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index af9cae1f225e3..28351d76cfeb3 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -171,20 +171,6 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
 	return ret;
 }
 
-/**
- * arch_irqentry_exit_need_resched - Architecture specific need resched function
- *
- * Invoked from raw_irqentry_exit_cond_resched() to check if resched is needed.
- * Defaults return true.
- *
- * The main purpose is to permit arch to avoid preemption of a task from an IRQ.
- */
-static inline bool arch_irqentry_exit_need_resched(void);
-
-#ifndef arch_irqentry_exit_need_resched
-static inline bool arch_irqentry_exit_need_resched(void) { return true; }
-#endif
-
 void raw_irqentry_exit_cond_resched(void)
 {
 	if (!preempt_count()) {
@@ -192,7 +178,7 @@ void raw_irqentry_exit_cond_resched(void)
 		rcu_irq_exit_check_preempt();
 		if (IS_ENABLED(CONFIG_DEBUG_ENTRY))
 			WARN_ON_ONCE(!on_thread_stack());
-		if (need_resched() && arch_irqentry_exit_need_resched())
+		if (need_resched())
 			preempt_schedule_irq();
 	}
 }
-- 
2.30.2