Set TAINT_MACHINE_CHECK when SError interrupts trigger a panic to
flag potential hardware faults. This tainting mechanism aids in
debugging and enables correlation of hardware-related crashes in
large-scale deployments.
This change aligns with similar patches[1] that mark machine check
events when the system crashes due to hardware errors.
Link: https://lore.kernel.org/all/20250702-add_tain-v1-1-9187b10914b9@debian.org/ [1]
Signed-off-by: Breno Leitao <leitao@debian.org>
---
arch/arm64/kernel/traps.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 9bfa5c944379d..7468b22585cef 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -931,6 +931,7 @@ void __noreturn panic_bad_stack(struct pt_regs *regs, unsigned long esr, unsigne
 
 void __noreturn arm64_serror_panic(struct pt_regs *regs, unsigned long esr)
 {
+	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
 	console_verbose();
 
 	pr_crit("SError Interrupt on CPU%d, code 0x%016lx -- %s\n",
---
base-commit: 8c2e52ebbe885c7eeaabd3b7ddcdc1246fc400d2
change-id: 20250710-arm_serror-77fca8d732d4
Best regards,
--
Breno Leitao <leitao@debian.org>
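The commit message motivates the taint as a way to correlate hardware-related crashes across a fleet. As a rough illustration of how tooling might consume it, the sketch below tests the machine-check bit in a taint mask such as the decimal value exposed in /proc/sys/kernel/tainted. It is not part of the patch. The helper names (parse_taint_mask, has_machine_check_taint, read_taint_mask) are hypothetical, and the bit index 4 for TAINT_MACHINE_CHECK is taken from the kernel's taint flag numbering.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Bit index of TAINT_MACHINE_CHECK in the kernel taint mask ('M' flag). */
#define TAINT_MACHINE_CHECK_BIT 4

/* Hypothetical helper: parse a decimal taint mask, as printed by procfs. */
unsigned long parse_taint_mask(const char *text)
{
	assert(text != NULL);
	return strtoul(text, NULL, 10);
}

/* Hypothetical helper: return 1 if the machine-check bit is set, else 0. */
int has_machine_check_taint(unsigned long mask)
{
	return (int)((mask >> TAINT_MACHINE_CHECK_BIT) & 1UL);
}

/*
 * Hypothetical helper: read the live taint mask of the running kernel.
 * Returns 0 on success, -1 if the file cannot be opened or read
 * (for example on a non-Linux system).
 */
int read_taint_mask(unsigned long *mask)
{
	char buf[32];
	FILE *f = fopen("/proc/sys/kernel/tainted", "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	*mask = parse_taint_mask(buf);
	return 0;
}
```

A caller would typically invoke read_taint_mask() and pass the result to has_machine_check_taint(). Reading the live mask only helps when the kernel is still running after the taint was set. For a panic, the taint would instead have to be recovered from the crash log or dump.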
On Thu, Jul 10, 2025 at 03:46:35AM -0700, Breno Leitao wrote:
> Set TAINT_MACHINE_CHECK when SError interrupts trigger a panic to
> flag potential hardware faults. This tainting mechanism aids in
> debugging and enables correlation of hardware-related crashes in
> large-scale deployments.
>
> This change aligns with similar patches[1] that mark machine check
> events when the system crashes due to hardware errors.
>
> Link: https://lore.kernel.org/all/20250702-add_tain-v1-1-9187b10914b9@debian.org/ [1]
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
>  arch/arm64/kernel/traps.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> index 9bfa5c944379d..7468b22585cef 100644
> --- a/arch/arm64/kernel/traps.c
> +++ b/arch/arm64/kernel/traps.c
> @@ -931,6 +931,7 @@ void __noreturn panic_bad_stack(struct pt_regs *regs, unsigned long esr, unsigne
>
>  void __noreturn arm64_serror_panic(struct pt_regs *regs, unsigned long esr)
>  {
> +	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
>  	console_verbose();
>
>  	pr_crit("SError Interrupt on CPU%d, code 0x%016lx -- %s\n",

If we're going to taint for SError, shouldn't we also taint for an
unclaimed SEA?

Will
Hello Will,

On Sun, Jul 13, 2025 at 11:46:06PM +0100, Will Deacon wrote:
> On Thu, Jul 10, 2025 at 03:46:35AM -0700, Breno Leitao wrote:
>
> > --- a/arch/arm64/kernel/traps.c
> > +++ b/arch/arm64/kernel/traps.c
> > @@ -931,6 +931,7 @@ void __noreturn panic_bad_stack(struct pt_regs *regs, unsigned long esr, unsigne
> >
> >  void __noreturn arm64_serror_panic(struct pt_regs *regs, unsigned long esr)
> >  {
> > +	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
> >  	console_verbose();
> >
> >  	pr_crit("SError Interrupt on CPU%d, code 0x%016lx -- %s\n",
>
> If we're going to taint for SError, shouldn't we also taint for an
> unclaimed SEA?

Yes. I was not very familiar with SEA errors, given I haven't seen one in
production yet, but, from reading about it, it is another error that seems
to crash the system due to hardware faults; thus, we want to taint
MACHINE_CHECK.

What about this?

Author: Breno Leitao <leitao@debian.org>
Date:   Mon Jul 14 05:16:55 2025 -0700

    arm64: Taint kernel on fatal hardware error in do_sea()

    This patch updates the do_sea() handler to taint the kernel with
    TAINT_MACHINE_CHECK when a fatal hardware error is detected and
    reported through Synchronous External Abort (SEA). By marking
    the kernel as tainted at the point of error, we improve
    post-mortem diagnostics and make it clear that a machine check
    or unrecoverable hardware fault has occurred.

    Suggested-by: Will Deacon <will@kernel.org>
    Signed-off-by: Breno Leitao <leitao@debian.org>

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 11eb8d1adc84..f590dc71ce99 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -838,6 +838,7 @@ static int do_sea(unsigned long far, unsigned long esr, struct pt_regs *regs)
 		 */
 		siaddr = untagged_addr(far);
 	}
+	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
 	arm64_notify_die(inf->name, regs, inf->sig, inf->code, siaddr, esr);
 
 	return 0;

Thanks for the suggestion,
--breno
On Mon, Jul 14, 2025 at 05:26:43AM -0700, Breno Leitao wrote:
> On Sun, Jul 13, 2025 at 11:46:06PM +0100, Will Deacon wrote:
> > On Thu, Jul 10, 2025 at 03:46:35AM -0700, Breno Leitao wrote:
> >
> > > --- a/arch/arm64/kernel/traps.c
> > > +++ b/arch/arm64/kernel/traps.c
> > > @@ -931,6 +931,7 @@ void __noreturn panic_bad_stack(struct pt_regs *regs, unsigned long esr, unsigne
> > >
> > >  void __noreturn arm64_serror_panic(struct pt_regs *regs, unsigned long esr)
> > >  {
> > > +	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
> > >  	console_verbose();
> > >
> > >  	pr_crit("SError Interrupt on CPU%d, code 0x%016lx -- %s\n",
> >
> > If we're going to taint for SError, shouldn't we also taint for an
> > unclaimed SEA?
>
> Yes. I was not very familiar with SEA errors, given I haven't seen one in
> production yet, but, from reading about it, it is another error that seems
> to crash the system due to hardware faults; thus, we want to taint
> MACHINE_CHECK.
>
> What about this?
>
> Author: Breno Leitao <leitao@debian.org>
> Date:   Mon Jul 14 05:16:55 2025 -0700
>
>     arm64: Taint kernel on fatal hardware error in do_sea()
>
>     This patch updates the do_sea() handler to taint the kernel with
>     TAINT_MACHINE_CHECK when a fatal hardware error is detected and
>     reported through Synchronous External Abort (SEA). By marking
>     the kernel as tainted at the point of error, we improve
>     post-mortem diagnostics and make it clear that a machine check
>     or unrecoverable hardware fault has occurred.
>
>     Suggested-by: Will Deacon <will@kernel.org>
>     Signed-off-by: Breno Leitao <leitao@debian.org>
>
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index 11eb8d1adc84..f590dc71ce99 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -838,6 +838,7 @@ static int do_sea(unsigned long far, unsigned long esr, struct pt_regs *regs)
>  		 */
>  		siaddr = untagged_addr(far);
>  	}
> +	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
>  	arm64_notify_die(inf->name, regs, inf->sig, inf->code, siaddr, esr);
>
>  	return 0;

Yeah, I reckon so. Probably just fold these into a single patch, though.

Cheers,

Will
On Tue, Jul 15, 2025 at 03:02:13PM +0100, Will Deacon wrote:
> On Mon, Jul 14, 2025 at 05:26:43AM -0700, Breno Leitao wrote:
> > On Sun, Jul 13, 2025 at 11:46:06PM +0100, Will Deacon wrote:
> > > On Thu, Jul 10, 2025 at 03:46:35AM -0700, Breno Leitao wrote:
> > >
> > > > --- a/arch/arm64/kernel/traps.c
> > > > +++ b/arch/arm64/kernel/traps.c
> > > > @@ -931,6 +931,7 @@ void __noreturn panic_bad_stack(struct pt_regs *regs, unsigned long esr, unsigne
> > > >
> > > >  void __noreturn arm64_serror_panic(struct pt_regs *regs, unsigned long esr)
> > > >  {
> > > > +	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
> > > >  	console_verbose();
> > > >
> > > >  	pr_crit("SError Interrupt on CPU%d, code 0x%016lx -- %s\n",
> > >
> > > If we're going to taint for SError, shouldn't we also taint for an
> > > unclaimed SEA?
> >
> > Yes. I was not very familiar with SEA errors, given I haven't seen one in
> > production yet, but, from reading about it, it is another error that seems
> > to crash the system due to hardware faults; thus, we want to taint
> > MACHINE_CHECK.
> >
> > What about this?
> >
> > Author: Breno Leitao <leitao@debian.org>
> > Date:   Mon Jul 14 05:16:55 2025 -0700
> >
> >     arm64: Taint kernel on fatal hardware error in do_sea()
> >
> >     This patch updates the do_sea() handler to taint the kernel with
> >     TAINT_MACHINE_CHECK when a fatal hardware error is detected and
> >     reported through Synchronous External Abort (SEA). By marking
> >     the kernel as tainted at the point of error, we improve
> >     post-mortem diagnostics and make it clear that a machine check
> >     or unrecoverable hardware fault has occurred.
> >
> >     Suggested-by: Will Deacon <will@kernel.org>
> >     Signed-off-by: Breno Leitao <leitao@debian.org>
> >
> > diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> > index 11eb8d1adc84..f590dc71ce99 100644
> > --- a/arch/arm64/mm/fault.c
> > +++ b/arch/arm64/mm/fault.c
> > @@ -838,6 +838,7 @@ static int do_sea(unsigned long far, unsigned long esr, struct pt_regs *regs)
> >  		 */
> >  		siaddr = untagged_addr(far);
> >  	}
> > +	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
> >  	arm64_notify_die(inf->name, regs, inf->sig, inf->code, siaddr, esr);
> >
> >  	return 0;
>
> Yeah, I reckon so. Probably just fold these into a single patch, though.

Thanks. I'll test it more thoroughly tomorrow, then send it.

Thanks for the suggestions,
--breno
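For readers less familiar with the call both hunks add, the following is a simplified userspace model of what add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK) is understood to do: set one bit in a global taint mask, without disabling lock debugging. It is a sketch only, not the kernel implementation. The names model_add_taint, model_serror_panic, model_do_sea, and lock_debugging_enabled are hypothetical, the real function also handles panic_on_taint, and the real handlers do considerably more than taint.

```c
#include <stdio.h>

/* Mirrors the kernel's two lockdep_ok values (model only). */
enum lockdep_ok { LOCKDEP_STILL_OK, LOCKDEP_NOW_UNRELIABLE };

/* Bit index used for the machine-check taint flag. */
#define TAINT_MACHINE_CHECK 4

/* Model state: the taint bitmask and whether lock debugging stays on. */
unsigned long tainted_mask;
int lock_debugging_enabled = 1;

/* Simplified model of add_taint(): record the flag as a bit in the mask. */
void model_add_taint(unsigned int flag, enum lockdep_ok lockdep_ok)
{
	/* Only LOCKDEP_NOW_UNRELIABLE turns lock debugging off. */
	if (lockdep_ok == LOCKDEP_NOW_UNRELIABLE)
		lock_debugging_enabled = 0;
	tainted_mask |= 1UL << flag;
}

/* Stand-in for the SError panic path touched by the first hunk. */
void model_serror_panic(void)
{
	model_add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
	printf("SError path: taint mask is now %lu\n", tainted_mask);
}

/* Stand-in for the unclaimed-SEA path touched by the second hunk. */
void model_do_sea(void)
{
	model_add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
	printf("SEA path: taint mask is now %lu\n", tainted_mask);
}
```

Because both paths set the same bit, a consumer of the taint mask cannot tell an SError from an SEA by the flag alone. The accompanying log messages would still be needed for that distinction.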