[PATCH] x86/CPU/AMD: Prevent reset reasons from being retained among boots

Rong Zhang posted 1 patch 2 weeks, 5 days ago
Documentation/arch/x86/amd-debugging.rst |  3 +++
arch/x86/kernel/cpu/amd.c                | 25 +++++++++++++++++++++---
2 files changed, 25 insertions(+), 3 deletions(-)
[PATCH] x86/CPU/AMD: Prevent reset reasons from being retained among boots
Posted by Rong Zhang 2 weeks, 5 days ago
The S5_RESET_STATUS register is parsed on boot and printed to kmsg.
However, this could sometimes be misleading and lead to users wasting a
lot of time on meaningless debugging for two reasons:

* Some bits are never cleared by hardware. It's the software's
responsibility to clear them as per the Processor Programming Reference
(see Link:).

* Some rare hardware-initiated platform resets do not update the
register at all.

In both cases, a previous reboot could leave its trace in the register,
resulting in users seeing unrelated reboot reasons while debugging
random reboots afterward.

Clearing all reason bits solves the issue. Since all reason bits are
write-1-to-clear and we must preserve all other bits, this is done by
writing the read value back to the register.

A debug message with the cleared register value is also printed to help
distinguish between non-reason bits set and present reserved values
defined in the future.

Fixes: ab8131028710 ("x86/CPU/AMD: Print the reason for the last reset")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537#attach_303991
Signed-off-by: Rong Zhang <i@rong.moe>
---
 Documentation/arch/x86/amd-debugging.rst |  3 +++
 arch/x86/kernel/cpu/amd.c                | 25 +++++++++++++++++++++---
 2 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/Documentation/arch/x86/amd-debugging.rst b/Documentation/arch/x86/amd-debugging.rst
index d92bf59d62c77..4723ad32ddcd3 100644
--- a/Documentation/arch/x86/amd-debugging.rst
+++ b/Documentation/arch/x86/amd-debugging.rst
@@ -366,3 +366,6 @@ There are 6 classes of reasons for the reboot:
 This information is read by the kernel at bootup and printed into
 the syslog. When a random reboot occurs this message can be helpful
 to determine the next component to debug.
+
+To prevent unrelated reboot reasons from being retained among boots,
+the kernel clears all reason bits once reading the register.
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 5398db4dedb4a..c2e3925eb6855 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -1344,7 +1344,7 @@ static const char * const s5_reset_reason_txt[] = {
 static __init int print_s5_reset_status_mmio(void)
 {
 	void __iomem *addr;
-	u32 value;
+	u32 value, cleared_value;
 	int i;
 
 	if (!cpu_feature_enabled(X86_FEATURE_ZEN))
@@ -1355,11 +1355,25 @@ static __init int print_s5_reset_status_mmio(void)
 		return 0;
 
 	value = ioread32(addr);
-	iounmap(addr);
 
 	/* Value with "all bits set" is an error response and should be ignored. */
-	if (value == U32_MAX)
+	if (value == U32_MAX) {
+		iounmap(addr);
 		return 0;
+	}
+
+	/*
+	 * Clear all reason bits so they won't be retained if the next reset
+	 * does not update the register. Besides, some bits are never cleared by
+	 * hardware so it's software's responsibility to clear them.
+	 *
+	 * Writing the value back effectively clears all reason bits as they are
+	 * write-1-to-clear.
+	 */
+	iowrite32(value, addr);
+	cleared_value = ioread32(addr);
+
+	iounmap(addr);
 
 	for (i = 0; i < ARRAY_SIZE(s5_reset_reason_txt); i++) {
 		if (!(value & BIT(i)))
@@ -1371,6 +1385,11 @@ static __init int print_s5_reset_status_mmio(void)
 		}
 	}
 
+	if (cleared_value != value) {
+		pr_debug("x86/amd: Cleared system reset reasons [0x%08x => 0x%08x]\n",
+			 value, cleared_value);
+	}
+
 	return 0;
 }
 late_initcall(print_s5_reset_status_mmio);

base-commit: f7a6ef198ded30b63810efdc923b919606ea65c8
-- 
2.51.0
Re: [PATCH] x86/CPU/AMD: Prevent reset reasons from being retained among boots
Posted by Borislav Petkov 2 weeks, 2 days ago
On Sat, Sep 13, 2025 at 10:42:45PM +0800, Rong Zhang wrote:
> The S5_RESET_STATUS register is parsed on boot and printed to kmsg.
> However, this could sometimes be misleading and lead to users wasting a
> lot of time on meaningless debugging for two reasons:
> 
> * Some bits are never cleared by hardware. It's the software's
> responsibility to clear them as per the Processor Programming Reference
> (see Link:).
> 
> * Some rare hardware-initiated platform resets do not update the
> register at all.
> 
> In both cases, a previous reboot could leave its trace in the register,
> resulting in users seeing unrelated reboot reasons while debugging
> random reboots afterward.

Just a heads-up: we're figuring out internally what the right thing to do here
would be.

Stay tuned.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH] x86/CPU/AMD: Prevent reset reasons from being retained among boots
Posted by Mario Limonciello (AMD) (kernel.org) 3 days, 11 hours ago

On 9/16/2025 9:02 AM, Borislav Petkov wrote:
> On Sat, Sep 13, 2025 at 10:42:45PM +0800, Rong Zhang wrote:
>> The S5_RESET_STATUS register is parsed on boot and printed to kmsg.
>> However, this could sometimes be misleading and lead to users wasting a
>> lot of time on meaningless debugging for two reasons:
>>
>> * Some bits are never cleared by hardware. It's the software's
>> responsibility to clear them as per the Processor Programming Reference
>> (see Link:).
>>
>> * Some rare hardware-initiated platform resets do not update the
>> register at all.
>>
>> In both cases, a previous reboot could leave its trace in the register,
>> resulting in users seeing unrelated reboot reasons while debugging
>> random reboots afterward.
> 
> Just a heads-up: we're figuring out internally what the right thing to do here
> would be.
> 
> Stay tuned.
> 
> Thx.
> 

The internal conversation points in the direction of your patch makes sense.

But I don't really see a lot of value in re-reading and printing a debug 
message about what was cleared and what's still there.  Do you see a 
reason to keep that around?
Re: [PATCH] x86/CPU/AMD: Prevent reset reasons from being retained among boots
Posted by Rong Zhang 2 days, 5 hours ago
Hi Mario,

On Sun, 2025-09-28 at 23:01 -0500, Mario Limonciello (AMD) (kernel.org)
wrote:
> 
> On 9/16/2025 9:02 AM, Borislav Petkov wrote:
> > On Sat, Sep 13, 2025 at 10:42:45PM +0800, Rong Zhang wrote:
> > > The S5_RESET_STATUS register is parsed on boot and printed to kmsg.
> > > However, this could sometimes be misleading and lead to users wasting a
> > > lot of time on meaningless debugging for two reasons:
> > > 
> > > * Some bits are never cleared by hardware. It's the software's
> > > responsibility to clear them as per the Processor Programming Reference
> > > (see Link:).
> > > 
> > > * Some rare hardware-initiated platform resets do not update the
> > > register at all.
> > > 
> > > In both cases, a previous reboot could leave its trace in the register,
> > > resulting in users seeing unrelated reboot reasons while debugging
> > > random reboots afterward.
> > 
> > Just a heads-up: we're figuring out internally what the right thing to do here
> > would be.
> > 
> > Stay tuned.
> > 
> > Thx.
> > 
> 
> The internal conversation points in the direction of your patch makes sense.

Thanks for your effort!

> But I don't really see a lot of value in re-reading and printing a debug 
> message about what was cleared and what's still there.  Do you see a 
> reason to keep that around?

In order that users don't need an up-to-date documentation, or even
don't need a documentation, to distinguish between reason bits and non-
reason ones.

Let's consider two examples.

(a)

   Previous system reset reason [0x08000800]: an uncorrected error...
                                    ^   ^

A user may feel confused: Two bits are set, but only one reason is
reported. Hmm... Is there a hidden failure?

Unless the user has read the PPR, it's hard to realize BIT(11) is
already set in the reset value. The debug message is here to help:

   Cleared system reset reasons [0x08000800 => 0x00000800]
                                    ^   ^         ^   ^

Now the user realizes that BIT(11) has nothing to do with reboot
reasons.

This was literally the confusion I experienced. I had to take some time
looking for an appropriate public PPR and reading the PPR before
realizing this fact.

(b)

Suppose BIT(7) is defined in the future.

   (nothing is printed to kmsg)

The debug message is here to help:

   Cleared system reset reasons [0x00000880 => 0x00000800]
                                        ^^            ^^

Now the user realizes that BIT(7) represents a new reboot reason.

Thanks,
Rong