[PATCH v2] arm64: Mark kernel as tainted on SAE and SError panic

Breno Leitao posted 1 patch 2 months, 3 weeks ago
Posted by Breno Leitao 2 months, 3 weeks ago
Set TAINT_MACHINE_CHECK when SError or Synchronous External Abort (SEA)
interrupts trigger a panic to flag potential hardware faults. This
tainting mechanism aids in debugging and enables correlation of
hardware-related crashes in large-scale deployments.

This change aligns with similar patches[1] that mark machine check
events when the system crashes due to hardware errors.

Link: https://lore.kernel.org/all/20250702-add_tain-v1-1-9187b10914b9@debian.org/ [1]
Signed-off-by: Breno Leitao <leitao@debian.org>
---
Changes in v2:
- Also taint the kernel on Synchronous External Abort panics (Will Deacon)
- Link to v1: https://lore.kernel.org/r/20250714-vmcore_hw_error-v1-1-8cf45edb6334@debian.org
---
 arch/arm64/kernel/traps.c | 1 +
 arch/arm64/mm/fault.c     | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 9bfa5c944379d..7468b22585cef 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -931,6 +931,7 @@ void __noreturn panic_bad_stack(struct pt_regs *regs, unsigned long esr, unsigne
 
 void __noreturn arm64_serror_panic(struct pt_regs *regs, unsigned long esr)
 {
+	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
 	console_verbose();
 
 	pr_crit("SError Interrupt on CPU%d, code 0x%016lx -- %s\n",
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index ec0a337891ddf..004106ff4bd03 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -826,6 +826,7 @@ static int do_sea(unsigned long far, unsigned long esr, struct pt_regs *regs)
 		 */
 		siaddr  = untagged_addr(far);
 	}
+	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
 	arm64_notify_die(inf->name, regs, inf->sig, inf->code, siaddr, esr);
 
 	return 0;

---
base-commit: d7b8f8e20813f0179d8ef519541a3527e7661d3a
change-id: 20250707-vmcore_hw_error-322429e6c316

Best regards,
--  
Breno Leitao <leitao@debian.org>
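
For readers who want to correlate crashes as the commit message describes, the following is a minimal userspace sketch (not part of the patch) that tests a taint mask for the machine-check flag. It assumes the documented encoding in which TAINT_MACHINE_CHECK is bit 4 (value 16, letter 'M'), and that the mask is available as a decimal number, for example from /proc/sys/kernel/tainted on a live system. The helper names taint_has_machine_check() and read_taint_mask() are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Assumption: bit position of TAINT_MACHINE_CHECK as documented in
 * Documentation/admin-guide/tainted-kernels.rst (bit 4, value 16, 'M').
 */
#define TAINT_MACHINE_CHECK_BIT 4

/* Hypothetical helper: true if the machine-check bit is set in the mask. */
bool taint_has_machine_check(unsigned long mask)
{
	return (mask >> TAINT_MACHINE_CHECK_BIT) & 1UL;
}

/*
 * Hypothetical helper: read a decimal taint mask from `path`
 * (normally /proc/sys/kernel/tainted).
 * Returns 0 on success and -1 if the file cannot be opened or parsed.
 */
int read_taint_mask(const char *path, unsigned long *mask)
{
	FILE *f = fopen(path, "r");
	int parsed;

	if (!f)
		return -1;

	parsed = fscanf(f, "%lu", mask);
	fclose(f);

	return parsed == 1 ? 0 : -1;
}
```

A caller would typically pass "/proc/sys/kernel/tainted" to read_taint_mask() and feed the result to taint_has_machine_check(). The same bit test applies to a mask recovered from a panic log or vmcore, since the flag is set before the crash is reported.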
Re: [PATCH v2] arm64: Mark kernel as tainted on SAE and SError panic
Posted by Will Deacon 2 months, 3 weeks ago
On Wed, 16 Jul 2025 02:42:01 -0700, Breno Leitao wrote:
> Set TAINT_MACHINE_CHECK when SError or Synchronous External Abort (SEA)
> interrupts trigger a panic to flag potential hardware faults. This
> tainting mechanism aids in debugging and enables correlation of
> hardware-related crashes in large-scale deployments.
> 
> This change aligns with similar patches[1] that mark machine check
> events when the system crashes due to hardware errors.
> 
> [...]

Applied to arm64 (for-next/misc), thanks!

[1/1] arm64: Mark kernel as tainted on SAE and SError panic
      https://git.kernel.org/arm64/c/d7ce7e3a8464

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev
Re: [PATCH v2] arm64: Mark kernel as tainted on SAE and SError panic
Posted by Mark Rutland 2 months, 3 weeks ago
Hi Breno,

On Wed, Jul 16, 2025 at 02:42:01AM -0700, Breno Leitao wrote:
> Set TAINT_MACHINE_CHECK when SError or Synchronous External Abort (SEA)
> interrupts trigger a panic to flag potential hardware faults. This
> tainting mechanism aids in debugging and enables correlation of
> hardware-related crashes in large-scale deployments.
> 
> This change aligns with similar patches[1] that mark machine check
> events when the system crashes due to hardware errors.
> 
> Link: https://lore.kernel.org/all/20250702-add_tain-v1-1-9187b10914b9@debian.org/ [1]
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
> Changes in v2:
> - Also taint the kernel on Synchronous External Abort panics (Will Deacon)
> - Link to v1: https://lore.kernel.org/r/20250714-vmcore_hw_error-v1-1-8cf45edb6334@debian.org

I think something went wrong when respinning this patch, because the v1
link above is incorrect, and should be:

  https://lore.kernel.org/linux-arm-kernel/20250710-arm_serror-v1-1-2a3def3740d7@debian.org/

The Cc header for this posting matches that of the unrelated patch (and
excludes Will, Catalin, etc), rather than that of the real v1. The
change-id trailer also doesn't match v1.

The actual patch and commit message look fine to me, so:

Acked-by: Mark Rutland <mark.rutland@arm.com>

I assume that Will or Catalin will be happy to pick this up. I've added
those missing folk to this reply, so I don't imagine this should need a
respin.

Mark.

> ---
>  arch/arm64/kernel/traps.c | 1 +
>  arch/arm64/mm/fault.c     | 1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> index 9bfa5c944379d..7468b22585cef 100644
> --- a/arch/arm64/kernel/traps.c
> +++ b/arch/arm64/kernel/traps.c
> @@ -931,6 +931,7 @@ void __noreturn panic_bad_stack(struct pt_regs *regs, unsigned long esr, unsigne
>  
>  void __noreturn arm64_serror_panic(struct pt_regs *regs, unsigned long esr)
>  {
> +	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
>  	console_verbose();
>  
>  	pr_crit("SError Interrupt on CPU%d, code 0x%016lx -- %s\n",
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index ec0a337891ddf..004106ff4bd03 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -826,6 +826,7 @@ static int do_sea(unsigned long far, unsigned long esr, struct pt_regs *regs)
>  		 */
>  		siaddr  = untagged_addr(far);
>  	}
> +	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
>  	arm64_notify_die(inf->name, regs, inf->sig, inf->code, siaddr, esr);
>  
>  	return 0;
> 
> ---
> base-commit: d7b8f8e20813f0179d8ef519541a3527e7661d3a
> change-id: 20250707-vmcore_hw_error-322429e6c316
> 
> Best regards,
> --  
> Breno Leitao <leitao@debian.org>
> 
>
Re: [PATCH v2] arm64: Mark kernel as tainted on SAE and SError panic
Posted by Breno Leitao 2 months, 3 weeks ago
Hello Mark,

On Wed, Jul 16, 2025 at 11:19:38AM +0100, Mark Rutland wrote:
> On Wed, Jul 16, 2025 at 02:42:01AM -0700, Breno Leitao wrote:
> > Set TAINT_MACHINE_CHECK when SError or Synchronous External Abort (SEA)
> > interrupts trigger a panic to flag potential hardware faults. This
> > tainting mechanism aids in debugging and enables correlation of
> > hardware-related crashes in large-scale deployments.
> > 
> > This change aligns with similar patches[1] that mark machine check
> > events when the system crashes due to hardware errors.
> > 
> > Link: https://lore.kernel.org/all/20250702-add_tain-v1-1-9187b10914b9@debian.org/ [1]
> > Signed-off-by: Breno Leitao <leitao@debian.org>
> > ---
> > Changes in v2:
> > - Also taint the kernel on Synchronous External Abort panics (Will Deacon)
> > - Link to v1: https://lore.kernel.org/r/20250714-vmcore_hw_error-v1-1-8cf45edb6334@debian.org
> 
> I think something went wrong when respinning this patch, because the v1
> link above is incorrect, and should be:
> 
>   https://lore.kernel.org/linux-arm-kernel/20250710-arm_serror-v1-1-2a3def3740d7@debian.org/
> 
> The Cc header for this posting matches that of the unrelated patch (and
> excludes Will, Catalin, etc), rather than that of the real v1. The
> change-id trailer also doesn't match v1.
> 
> The actual patch and commit message look fine to me, so:

Sorry about that, it was entirely my mess with b4 across two different
machines/branches. I've been testing on an arm64 host that
has no email access. When I moved the patch to the machine with
email, I mixed up the cherry-pick and the branches.
 
> Acked-by: Mark Rutland <mark.rutland@arm.com>
> 
> I assume that Will or Catalin will be happy to pick this up. I've added
> those missing folk to this reply, so I don't imagine this should need a
> respin.

Thanks. I will not respin then (unless requested).

Sorry for the mess,
--breno
Re: [PATCH v2] arm64: Mark kernel as tainted on SAE and SError panic
Posted by Will Deacon 2 months, 3 weeks ago
On Wed, Jul 16, 2025 at 03:52:55AM -0700, Breno Leitao wrote:
> On Wed, Jul 16, 2025 at 11:19:38AM +0100, Mark Rutland wrote:
> > On Wed, Jul 16, 2025 at 02:42:01AM -0700, Breno Leitao wrote:
> > > Set TAINT_MACHINE_CHECK when SError or Synchronous External Abort (SEA)
> > > interrupts trigger a panic to flag potential hardware faults. This
> > > tainting mechanism aids in debugging and enables correlation of
> > > hardware-related crashes in large-scale deployments.
> > > 
> > > This change aligns with similar patches[1] that mark machine check
> > > events when the system crashes due to hardware errors.
> > > 
> > > Link: https://lore.kernel.org/all/20250702-add_tain-v1-1-9187b10914b9@debian.org/ [1]
> > > Signed-off-by: Breno Leitao <leitao@debian.org>
> > > ---
> > > Changes in v2:
> > > - Also taint the kernel on Synchronous External Abort panics (Will Deacon)
> > > - Link to v1: https://lore.kernel.org/r/20250714-vmcore_hw_error-v1-1-8cf45edb6334@debian.org
> > 
> > I think something went wrong when respinning this patch, because the v1
> > link above is incorrect, and should be:
> > 
> >   https://lore.kernel.org/linux-arm-kernel/20250710-arm_serror-v1-1-2a3def3740d7@debian.org/
> > 
> > The Cc header for this posting matches that of the unrelated patch (and
> > excludes Will, Catalin, etc), rather than that of the real v1. The
> > change-id trailer also doesn't match v1.
> > 
> > The actual patch and commit message look fine to me, so:
> 
> Sorry about that, it was entirely my mess with b4 across two different
> machines/branches. I've been testing on an arm64 host that
> has no email access. When I moved the patch to the machine with
> email, I mixed up the cherry-pick and the branches.
>  
> > Acked-by: Mark Rutland <mark.rutland@arm.com>
> > 
> > I assume that Will or Catalin will be happy to pick this up. I've added
> > those missing folk to this reply, so I don't imagine this should need a
> > respin.
> 
> Thanks. I will not respin then (unless requested).
> 
> Sorry for the mess,

No probs, I'll figure it out!

Will