arm64: traps: Mark kernel as tainted on SError panic

[PATCH] arm64: traps: Mark kernel as tainted on SError panic

Posted by Breno Leitao 7 months ago

Set TAINT_MACHINE_CHECK when SError interrupts trigger a panic to
flag potential hardware faults. This tainting mechanism aids in
debugging and enables correlation of hardware-related crashes in
large-scale deployments.

This change aligns with similar patches[1] that mark machine check
events when the system crashes due to hardware errors.

Link: https://lore.kernel.org/all/20250702-add_tain-v1-1-9187b10914b9@debian.org/ [1]
Signed-off-by: Breno Leitao <leitao@debian.org>
---
 arch/arm64/kernel/traps.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 9bfa5c944379d..7468b22585cef 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -931,6 +931,7 @@ void __noreturn panic_bad_stack(struct pt_regs *regs, unsigned long esr, unsigne
 
 void __noreturn arm64_serror_panic(struct pt_regs *regs, unsigned long esr)
 {
+	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
 	console_verbose();
 
 	pr_crit("SError Interrupt on CPU%d, code 0x%016lx -- %s\n",

---
base-commit: 8c2e52ebbe885c7eeaabd3b7ddcdc1246fc400d2
change-id: 20250710-arm_serror-77fca8d732d4

Best regards,
--  
Breno Leitao <leitao@debian.org>
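
[Editor's illustration, not part of the posted patch] The commit message
says the taint helps correlate hardware-related crashes across a fleet.
A minimal user-space sketch of that check follows. It assumes the
documented layout of the kernel taint mask, where TAINT_MACHINE_CHECK is
bit 4 (value 16, letter 'M'), and that the mask is readable as a decimal
number from /proc/sys/kernel/tainted. The names taint_has_machine_check()
and read_taint_mask() are hypothetical helpers invented for this example;
they do not exist in the kernel tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumption: bit 4 of the taint mask is TAINT_MACHINE_CHECK ('M'). */
#define TAINT_MACHINE_CHECK_BIT 4

/* Hypothetical helper: true if the machine-check bit is set in @mask. */
bool taint_has_machine_check(unsigned long mask)
{
	return (mask >> TAINT_MACHINE_CHECK_BIT) & 1UL;
}

/*
 * Hypothetical helper: read a decimal taint mask from @path, normally
 * /proc/sys/kernel/tainted. Returns 0 on success and -1 if the file
 * cannot be opened or does not start with a number.
 */
int read_taint_mask(const char *path, unsigned long *mask)
{
	FILE *f;
	int parsed;

	assert(path != NULL && mask != NULL);

	f = fopen(path, "r");
	if (!f)
		return -1;

	parsed = fscanf(f, "%lu", mask);
	fclose(f);

	return parsed == 1 ? 0 : -1;
}
```

A caller would pass "/proc/sys/kernel/tainted" to read_taint_mask() and
then test the result with taint_has_machine_check(). Since
arm64_serror_panic() never returns, a live system would not normally show
this bit from that path; the same bit test applies to a taint mask
recovered from a crash dump.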

Re: [PATCH] arm64: traps: Mark kernel as tainted on SError panic

Posted by Will Deacon 6 months, 4 weeks ago

On Thu, Jul 10, 2025 at 03:46:35AM -0700, Breno Leitao wrote:
> Set TAINT_MACHINE_CHECK when SError interrupts trigger a panic to
> flag potential hardware faults. This tainting mechanism aids in
> debugging and enables correlation of hardware-related crashes in
> large-scale deployments.
> 
> This change aligns with similar patches[1] that mark machine check
> events when the system crashes due to hardware errors.
> 
> Link: https://lore.kernel.org/all/20250702-add_tain-v1-1-9187b10914b9@debian.org/ [1]
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
>  arch/arm64/kernel/traps.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> index 9bfa5c944379d..7468b22585cef 100644
> --- a/arch/arm64/kernel/traps.c
> +++ b/arch/arm64/kernel/traps.c
> @@ -931,6 +931,7 @@ void __noreturn panic_bad_stack(struct pt_regs *regs, unsigned long esr, unsigne
>  
>  void __noreturn arm64_serror_panic(struct pt_regs *regs, unsigned long esr)
>  {
> +	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
>  	console_verbose();
>  
>  	pr_crit("SError Interrupt on CPU%d, code 0x%016lx -- %s\n",

If we're going to taint for SError, shouldn't we also taint for an
unclaimed SEA?

Will

Re: [PATCH] arm64: traps: Mark kernel as tainted on SError panic

Posted by Breno Leitao 6 months, 4 weeks ago

Hello Will,

On Sun, Jul 13, 2025 at 11:46:06PM +0100, Will Deacon wrote:
> On Thu, Jul 10, 2025 at 03:46:35AM -0700, Breno Leitao wrote:

> > --- a/arch/arm64/kernel/traps.c
> > +++ b/arch/arm64/kernel/traps.c
> > @@ -931,6 +931,7 @@ void __noreturn panic_bad_stack(struct pt_regs *regs, unsigned long esr, unsigne
> >  
> >  void __noreturn arm64_serror_panic(struct pt_regs *regs, unsigned long esr)
> >  {
> > +	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
> >  	console_verbose();
> >  
> >  	pr_crit("SError Interrupt on CPU%d, code 0x%016lx -- %s\n",
> 
> If we're going to taint for SError, shouldn't we also taint for an
> unclaimed SEA?

Yes. I was not very familiar with SEA errors, given that I haven't seen one
in production yet, but, from reading about it, that is another case that
seems to crash the system due to hardware errors, so we want to taint
MACHINE_CHECK there as well.

What about this?

	Author: Breno Leitao <leitao@debian.org>
	Date:   Mon Jul 14 05:16:55 2025 -0700

	arm64: Taint kernel on fatal hardware error in do_sea()

	This patch updates the do_sea() handler to taint the kernel with
	TAINT_MACHINE_CHECK when a fatal hardware error is detected and
	reported through Synchronous External Abort (SEA). By marking
	the kernel as tainted at the point of error, we improve
	post-mortem diagnostics and make it clear that a machine check
	or unrecoverable hardware fault has occurred.

	Suggested-by: Will Deacon <will@kernel.org>
	Signed-off-by: Breno Leitao <leitao@debian.org>

	diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
	index 11eb8d1adc84..f590dc71ce99 100644
	--- a/arch/arm64/mm/fault.c
	+++ b/arch/arm64/mm/fault.c
	@@ -838,6 +838,7 @@ static int do_sea(unsigned long far, unsigned long esr, struct pt_regs *regs)
			*/
			siaddr  = untagged_addr(far);
		}
	+	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
		arm64_notify_die(inf->name, regs, inf->sig, inf->code, siaddr, esr);

		return 0;

Thanks for the suggestion,
--breno

Re: [PATCH] arm64: traps: Mark kernel as tainted on SError panic

Posted by Will Deacon 6 months, 4 weeks ago

On Mon, Jul 14, 2025 at 05:26:43AM -0700, Breno Leitao wrote:
> On Sun, Jul 13, 2025 at 11:46:06PM +0100, Will Deacon wrote:
> > On Thu, Jul 10, 2025 at 03:46:35AM -0700, Breno Leitao wrote:
> 
> > > --- a/arch/arm64/kernel/traps.c
> > > +++ b/arch/arm64/kernel/traps.c
> > > @@ -931,6 +931,7 @@ void __noreturn panic_bad_stack(struct pt_regs *regs, unsigned long esr, unsigne
> > >  
> > >  void __noreturn arm64_serror_panic(struct pt_regs *regs, unsigned long esr)
> > >  {
> > > +	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
> > >  	console_verbose();
> > >  
> > >  	pr_crit("SError Interrupt on CPU%d, code 0x%016lx -- %s\n",
> > 
> > If we're going to taint for SError, shouldn't we also taint for an
> > unclaimed SEA?
> 
> Yes. I was not very familiar with SEA errors, given that I haven't seen one
> in production yet, but, from reading about it, that is another case that
> seems to crash the system due to hardware errors, so we want to taint
> MACHINE_CHECK there as well.
> 
> What about this?
> 
> 	Author: Breno Leitao <leitao@debian.org>
> 	Date:   Mon Jul 14 05:16:55 2025 -0700
> 
> 	arm64: Taint kernel on fatal hardware error in do_sea()
> 
> 	This patch updates the do_sea() handler to taint the kernel with
> 	TAINT_MACHINE_CHECK when a fatal hardware error is detected and
> 	reported through Synchronous External Abort (SEA). By marking
> 	the kernel as tainted at the point of error, we improve
> 	post-mortem diagnostics and make it clear that a machine check
> 	or unrecoverable hardware fault has occurred.
> 
> 	Suggested-by: Will Deacon <will@kernel.org>
> 	Signed-off-by: Breno Leitao <leitao@debian.org>
> 
> 	diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> 	index 11eb8d1adc84..f590dc71ce99 100644
> 	--- a/arch/arm64/mm/fault.c
> 	+++ b/arch/arm64/mm/fault.c
> 	@@ -838,6 +838,7 @@ static int do_sea(unsigned long far, unsigned long esr, struct pt_regs *regs)
> 			*/
> 			siaddr  = untagged_addr(far);
> 		}
> 	+	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
> 		arm64_notify_die(inf->name, regs, inf->sig, inf->code, siaddr, esr);
> 
> 		return 0;

Yeah, I reckon so. Probably just fold these into a single patch, though.

Cheers,

Will

Re: [PATCH] arm64: traps: Mark kernel as tainted on SError panic

Posted by Breno Leitao 6 months, 3 weeks ago

On Tue, Jul 15, 2025 at 03:02:13PM +0100, Will Deacon wrote:
> On Mon, Jul 14, 2025 at 05:26:43AM -0700, Breno Leitao wrote:
> > On Sun, Jul 13, 2025 at 11:46:06PM +0100, Will Deacon wrote:
> > > On Thu, Jul 10, 2025 at 03:46:35AM -0700, Breno Leitao wrote:
> > 
> > > > --- a/arch/arm64/kernel/traps.c
> > > > +++ b/arch/arm64/kernel/traps.c
> > > > @@ -931,6 +931,7 @@ void __noreturn panic_bad_stack(struct pt_regs *regs, unsigned long esr, unsigne
> > > >  
> > > >  void __noreturn arm64_serror_panic(struct pt_regs *regs, unsigned long esr)
> > > >  {
> > > > +	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
> > > >  	console_verbose();
> > > >  
> > > >  	pr_crit("SError Interrupt on CPU%d, code 0x%016lx -- %s\n",
> > > 
> > > If we're going to taint for SError, shouldn't we also taint for an
> > > unclaimed SEA?
> > 
> > Yes. I was not very familiar with SEA errors, given that I haven't seen one
> > in production yet, but, from reading about it, that is another case that
> > seems to crash the system due to hardware errors, so we want to taint
> > MACHINE_CHECK there as well.
> > 
> > What about this?
> > 
> > 	Author: Breno Leitao <leitao@debian.org>
> > 	Date:   Mon Jul 14 05:16:55 2025 -0700
> > 
> > 	arm64: Taint kernel on fatal hardware error in do_sea()
> > 
> > 	This patch updates the do_sea() handler to taint the kernel with
> > 	TAINT_MACHINE_CHECK when a fatal hardware error is detected and
> > 	reported through Synchronous External Abort (SEA). By marking
> > 	the kernel as tainted at the point of error, we improve
> > 	post-mortem diagnostics and make it clear that a machine check
> > 	or unrecoverable hardware fault has occurred.
> > 
> > 	Suggested-by: Will Deacon <will@kernel.org>
> > 	Signed-off-by: Breno Leitao <leitao@debian.org>
> > 
> > 	diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> > 	index 11eb8d1adc84..f590dc71ce99 100644
> > 	--- a/arch/arm64/mm/fault.c
> > 	+++ b/arch/arm64/mm/fault.c
> > 	@@ -838,6 +838,7 @@ static int do_sea(unsigned long far, unsigned long esr, struct pt_regs *regs)
> > 			*/
> > 			siaddr  = untagged_addr(far);
> > 		}
> > 	+	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
> > 		arm64_notify_die(inf->name, regs, inf->sig, inf->code, siaddr, esr);
> > 
> > 		return 0;
> 
> Yeah, I reckon so. Probably just fold these into a single patch, though.

Thanks. I'll test it more thoroughly tomorrow, then send it.

Thanks for the suggestions,
--breno