[PATCH RESEND V2 1/2] x86/mce: Fix missing address mask in recovery for errors in TDX/SEAM non-root mode

Adrian Hunter posted 2 patches 1 month, 2 weeks ago
[PATCH RESEND V2 1/2] x86/mce: Fix missing address mask in recovery for errors in TDX/SEAM non-root mode
Posted by Adrian Hunter 1 month, 2 weeks ago
Commit 8a01ec97dc066 ("x86/mce: Mask out non-address bits from machine
check bank") introduced a new #define MCI_ADDR_PHYSADDR for the mask of
valid physical address bits within the machine check bank address register.

This is particularly needed in the case of errors in TDX/SEAM non-root mode
because the reported address contains the TDX KeyID.  Refer to TDX and
TME-MK documentation for more information about KeyIDs.

Commit 7911f145de5fe ("x86/mce: Implement recovery for errors in TDX/SEAM
non-root mode") uses the address to mark the affected page as poisoned, but
omits to use the aforementioned mask.

Investigation of user space expectations has concluded it would be more
correct for the address to contain only address bits in the first place.
Refer https://lore.kernel.org/r/807ff02d-7af0-419d-8d14-a4d6c5d5420d@intel.com

Mask the address when it is read from the machine check bank address
register.  Do not use MCI_ADDR_PHYSADDR because that will be removed in a
later patch.

It is assumed __log_error() in arch/x86/kernel/cpu/mce/amd.c does not need
similar treatment.

Amend struct mce addr member description slightly to reflect that it is
not, and never has been, an exact copy of the bank's MCi_ADDR MSR.

Fixes: 8a01ec97dc066 ("x86/mce: Mask out non-address bits from machine check bank")
Fixes: 7911f145de5fe ("x86/mce: Implement recovery for errors in TDX/SEAM non-root mode")
Link: https://lore.kernel.org/r/807ff02d-7af0-419d-8d14-a4d6c5d5420d@intel.com
Cc: stable@vger.kernel.org
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---


Changes in V2:

      Mask address when it is read
      Amend struct mce addr description


 arch/x86/include/uapi/asm/mce.h | 2 +-
 arch/x86/kernel/cpu/mce/core.c  | 3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/uapi/asm/mce.h b/arch/x86/include/uapi/asm/mce.h
index cb6b48a7c22b..abf6ee43f5f8 100644
--- a/arch/x86/include/uapi/asm/mce.h
+++ b/arch/x86/include/uapi/asm/mce.h
@@ -14,7 +14,7 @@
 struct mce {
 	__u64 status;		/* Bank's MCi_STATUS MSR */
 	__u64 misc;		/* Bank's MCi_MISC MSR */
-	__u64 addr;		/* Bank's MCi_ADDR MSR */
+	__u64 addr;		/* Address from bank's MCi_ADDR MSR */
 	__u64 mcgstatus;	/* Machine Check Global Status MSR */
 	__u64 ip;		/* Instruction Pointer when the error happened */
 	__u64 tsc;		/* CPU time stamp counter */
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 4da4eab56c81..deb47463a75d 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -699,6 +699,9 @@ static noinstr void mce_read_aux(struct mce_hw_err *err, int i)
 		}
 
 		smca_extract_err_addr(m);
+
+		/* Mask out non-address bits, such as TDX KeyID */
+		m->addr &= GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0);
 	}
 
 	if (mce_flags.smca) {
-- 
2.48.1
Re: [PATCH RESEND V2 1/2] x86/mce: Fix missing address mask in recovery for errors in TDX/SEAM non-root mode
Posted by Borislav Petkov 1 month, 2 weeks ago
On Tue, Aug 19, 2025 at 07:24:34PM +0300, Adrian Hunter wrote:
> Commit 8a01ec97dc066 ("x86/mce: Mask out non-address bits from machine
> check bank") introduced a new #define MCI_ADDR_PHYSADDR for the mask of
> valid physical address bits within the machine check bank address register.
> 
> This is particularly needed in the case of errors in TDX/SEAM non-root mode
> because the reported address contains the TDX KeyID.  Refer to TDX and
> TME-MK documentation for more information about KeyIDs.
> 
> Commit 7911f145de5fe ("x86/mce: Implement recovery for errors in TDX/SEAM
> non-root mode") uses the address to mark the affected page as poisoned, but
> omits to use the aforementioned mask.
> 
> Investigation of user space expectations has concluded it would be more
> correct for the address to contain only address bits in the first place.
> Refer https://lore.kernel.org/r/807ff02d-7af0-419d-8d14-a4d6c5d5420d@intel.com
> 
> Mask the address when it is read from the machine check bank address
> register.  Do not use MCI_ADDR_PHYSADDR because that will be removed in a
> later patch.

Why is this patch talking about TDX-something but doing "global" changes to
mce.addr?

Why don't you simply do a TDX-specific masking out when you're running on
in TDX env and leave the rest as is?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH RESEND V2 1/2] x86/mce: Fix missing address mask in recovery for errors in TDX/SEAM non-root mode
Posted by Adrian Hunter 1 month, 1 week ago
On 20/08/2025 00:32, Borislav Petkov wrote:
> On Tue, Aug 19, 2025 at 07:24:34PM +0300, Adrian Hunter wrote:
>> Commit 8a01ec97dc066 ("x86/mce: Mask out non-address bits from machine
>> check bank") introduced a new #define MCI_ADDR_PHYSADDR for the mask of
>> valid physical address bits within the machine check bank address register.
>>
>> This is particularly needed in the case of errors in TDX/SEAM non-root mode
>> because the reported address contains the TDX KeyID.  Refer to TDX and
>> TME-MK documentation for more information about KeyIDs.
>>
>> Commit 7911f145de5fe ("x86/mce: Implement recovery for errors in TDX/SEAM
>> non-root mode") uses the address to mark the affected page as poisoned, but
>> omits to use the aforementioned mask.
>>
>> Investigation of user space expectations has concluded it would be more
>> correct for the address to contain only address bits in the first place.
>> Refer https://lore.kernel.org/r/807ff02d-7af0-419d-8d14-a4d6c5d5420d@intel.com
>>
>> Mask the address when it is read from the machine check bank address
>> register.  Do not use MCI_ADDR_PHYSADDR because that will be removed in a
>> later patch.
> 
> Why is this patch talking about TDX-something but doing "global" changes to
> mce.addr?
> 
> Why don't you simply do a TDX-specific masking out when you're running on
> in TDX env and leave the rest as is?
> 

How about this?

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 4da4eab56c81..3963d4cd8fc1 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -699,6 +699,8 @@ static noinstr void mce_read_aux(struct mce_hw_err *err, int i)
 		}
 
 		smca_extract_err_addr(m);
+
+		tdx_extract_err_addr(m);
 	}
 
 	if (mce_flags.smca) {
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index b5ba598e54cb..fcf0b84a7c98 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -298,6 +298,16 @@ static inline bool amd_mce_usable_address(struct mce *m) { return false; }
 static inline void smca_extract_err_addr(struct mce *m) { }
 #endif
 
+#ifdef CONFIG_X86_MCE_INTEL
+static __always_inline void tdx_extract_err_addr(struct mce *m)
+{
+	if (boot_cpu_has(X86_FEATURE_TDX_HOST_PLATFORM))
+		m->addr &= GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0);
+}
+#else
+static inline void tdx_extract_err_addr(struct mce *m) { }
+#endif
+
 #ifdef CONFIG_X86_ANCIENT_MCE
 void intel_p5_mcheck_init(struct cpuinfo_x86 *c);
 void winchip_mcheck_init(struct cpuinfo_x86 *c);
Re: [PATCH RESEND V2 1/2] x86/mce: Fix missing address mask in recovery for errors in TDX/SEAM non-root mode
Posted by Borislav Petkov 1 month, 1 week ago
On Wed, Aug 27, 2025 at 11:22:07AM +0300, Adrian Hunter wrote:
> +#ifdef CONFIG_X86_MCE_INTEL
> +static __always_inline void tdx_extract_err_addr(struct mce *m)
> +{
> +	if (boot_cpu_has(X86_FEATURE_TDX_HOST_PLATFORM))
> +		m->addr &= GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0);

Right, you can stick that thing straight into mce_read_aux() since it is
simple enough and drop the ifdeffery and use cpu_feature_enabled():

mce_read_aux:

	...

	/* Remove TDX KeyID from the address */
	if (cpu_feature_enabled(X86_FEATURE_TDX_HOST_PLATFORM))
		m->addr &= GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0);

Something like that...

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH RESEND V2 1/2] x86/mce: Fix missing address mask in recovery for errors in TDX/SEAM non-root mode
Posted by Adrian Hunter 1 month, 1 week ago
On 20/08/2025 00:32, Borislav Petkov wrote:
> On Tue, Aug 19, 2025 at 07:24:34PM +0300, Adrian Hunter wrote:
>> Commit 8a01ec97dc066 ("x86/mce: Mask out non-address bits from machine
>> check bank") introduced a new #define MCI_ADDR_PHYSADDR for the mask of
>> valid physical address bits within the machine check bank address register.
>>
>> This is particularly needed in the case of errors in TDX/SEAM non-root mode
>> because the reported address contains the TDX KeyID.  Refer to TDX and
>> TME-MK documentation for more information about KeyIDs.
>>
>> Commit 7911f145de5fe ("x86/mce: Implement recovery for errors in TDX/SEAM
>> non-root mode") uses the address to mark the affected page as poisoned, but
>> omits to use the aforementioned mask.
>>
>> Investigation of user space expectations has concluded it would be more
>> correct for the address to contain only address bits in the first place.
>> Refer https://lore.kernel.org/r/807ff02d-7af0-419d-8d14-a4d6c5d5420d@intel.com
>>
>> Mask the address when it is read from the machine check bank address
>> register.  Do not use MCI_ADDR_PHYSADDR because that will be removed in a
>> later patch.
> 
> Why is this patch talking about TDX-something but doing "global" changes to
> mce.addr?

It falls a bit into the category of: easier to maintain a
global way of doing things than have lots of special-cases.

> 
> Why don't you simply do a TDX-specific masking out when you're running on
> in TDX env and leave the rest as is?
> 

It was kinda like that in V1:

	https://lore.kernel.org/r/20250618120806.113884-2-adrian.hunter@intel.com/

where the code change was dealing with SEAM_NR in the block starting:

	} else if (m->mcgstatus & MCG_STATUS_SEAM_NR) {

Then Dave asked about changing addr itself:

	https://lore.kernel.org/all/487c5e63-07d3-41ad-bfc0-bda14b3c435e@intel.com/
	https://lore.kernel.org/all/79eca29a-8ba4-4ad9-b2e0-54d8e668f731@intel.com/

And it seems like user space does expect addr to be a physical address:

	https://lore.kernel.org/r/807ff02d-7af0-419d-8d14-a4d6c5d5420d@intel.com

Something like below would work, but doesn't answer Dave's question
of why not do it in mce_read_aux()

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 4da4eab56c81..53c7ea3d0464 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1655,28 +1655,30 @@ noinstr void do_machine_check(struct pt_regs *regs)
 	} else if (m->mcgstatus & MCG_STATUS_SEAM_NR) {
 		/*
 		 * Saved RIP on stack makes it look like the machine check
 		 * was taken in the kernel on the instruction following
 		 * the entry to SEAM mode. But MCG_STATUS_SEAM_NR indicates
 		 * that the machine check was taken inside SEAM non-root
 		 * mode.  CPU core has already marked that guest as dead.
 		 * It is OK for the kernel to resume execution at the
 		 * apparent point of the machine check as the fault did
 		 * not occur there. Mark the page as poisoned so it won't
 		 * be added to free list when the guest is terminated.
 		 */
 		if (mce_usable_address(m)) {
-			struct page *p = pfn_to_online_page(m->addr >> PAGE_SHIFT);
+			struct page *p;
 
+			m->addr &= MCI_ADDR_PHYSADDR;
+			p = pfn_to_online_page(m->addr >> PAGE_SHIFT);
 			if (p)
 				SetPageHWPoison(p);
 		}
 	} else {
Re: [PATCH RESEND V2 1/2] x86/mce: Fix missing address mask in recovery for errors in TDX/SEAM non-root mode
Posted by Borislav Petkov 1 month, 1 week ago
On Thu, Aug 21, 2025 at 10:24:22AM +0300, Adrian Hunter wrote:
> Something like below would work, but doesn't answer Dave's question
> of why not do it in mce_read_aux()

So, let me see what I understand from all this bla: you want to zap the KeyID
from mci_addr because it is completely useless there. So zap it.

You can't make any other changes to mci_addr because that goes to luserspace.

So far so good.

Now, all that other bla leads me to believe that there might be some need to
dump the raw mci_addr value after all.

If so, your patch is not needed.

Which makes me think, all yall folks need to make up your mind here.

And you need to get rid of all that extraneous information in your commit
message:

"Investigation of user space expectations has concluded it..."

No investigation needed - this is exported to userspace so you can't touch it.

The one and only question you need to answer is, do you really need KeyID in
it or not. And whatever you do, once you do it, we're stuck with it because it
goes out to userspace.

Especially if you want this backported to stable.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH RESEND V2 1/2] x86/mce: Fix missing address mask in recovery for errors in TDX/SEAM non-root mode
Posted by Adrian Hunter 1 month, 1 week ago
On 21/08/2025 16:25, Borislav Petkov wrote:
> On Thu, Aug 21, 2025 at 10:24:22AM +0300, Adrian Hunter wrote:
>> Something like below would work, but doesn't answer Dave's question
>> of why not do it in mce_read_aux()

Thanks for looking at this.

> So, let me see what I understand from all this bla: you want to zap the KeyID
> from mci_addr because it is completely useless there. So zap it.

Not exactly.  I just want to fix the bug whereby the mce handler fails
to mark the affected page as poisoned because it does not remove the KeyID
from the address before looking-up the page.

> You can't make any other changes to mci_addr because that goes to luserspace.
> 
> So far so good.
> 
> Now, all that other bla leads me to believe that there might be some need to
> dump the raw mci_addr value after all.
> 
> If so, your patch is not needed.
> 
> Which makes me think, all yall folks need to make up your mind here.
> 
> And you need to get rid of all that extraneous information in your commit
> message:
> 
> "Investigation of user space expectations has concluded it..."
> 
> No investigation needed - this is exported to userspace so you can't touch it.

It is a bit of a grey area.  No one expects to find non-address bits in
struct mce addr, and the kernel already strips low-order bits, and in the
case of SMCA, high-order bits too, refer smca_extract_err_addr().

> The one and only question you need to answer is, do you really need KeyID in
> it or not. And whatever you do, once you do it, we're stuck with it because it
> goes out to userspace.

No, the KeyID is almost useless.  It is just a dynamically allocated
number. It might indicate which TDX VM encountered the error, except the
MCE is fatal to the VM, so a fatal error is reported to the VMM anyway.

However, it is allowed to extend struct mce, so adding KeyID or
raw MCI ADDR later is quite possible.

> Especially if you want this backported to stable.

The bug exists in stable.  Ideally it would get fixed there too somehow.
Re: [PATCH RESEND V2 1/2] x86/mce: Fix missing address mask in recovery for errors in TDX/SEAM non-root mode
Posted by Borislav Petkov 1 month, 1 week ago
On Fri, Aug 22, 2025 at 10:57:36AM +0300, Adrian Hunter wrote:
> Not exactly.  I just want to fix the bug whereby the mce handler fails
> to mark the affected page as poisoned because it does not remove the KeyID
> from the address before looking-up the page.

Lemme ask this differently then: are you ever going to need KeyID in mci_addr?

> No one expects to find non-address bits in struct mce addr,

You're preaching to the choir - I don't know whose idea it was to shove
a key ID in an address value... it sure sounds silly.

> However, it is allowed to extend struct mce, so adding KeyID or raw MCI ADDR
> later is quite possible.

Why would you want to do that? Do you have a use case?

If not, you can drop that whole angle about adding KeyID later. If yes, let's
hear it.

Just this hypothetically, maybe we'd need it, maybe not, it might be a good
idea ... bla is muddying the water unnecessarily. So let's focus pls and
address *only* the issue(s) at hand.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH RESEND V2 1/2] x86/mce: Fix missing address mask in recovery for errors in TDX/SEAM non-root mode
Posted by Adrian Hunter 1 month, 1 week ago
On 22/08/2025 16:54, Borislav Petkov wrote:
> On Fri, Aug 22, 2025 at 10:57:36AM +0300, Adrian Hunter wrote:
>> Not exactly.  I just want to fix the bug whereby the mce handler fails
>> to mark the affected page as poisoned because it does not remove the KeyID
>> from the address before looking-up the page.
> 
> Lemme ask this differently then: are you ever going to need KeyID in mci_addr?

No

> 
>> No one expects to find non-address bits in struct mce addr,
> 
> You're preaching to the choir - I don't know whose idea it was to shove
> a key ID in an address value... it sure sounds silly.
> 
>> However, it is allowed to extend struct mce, so adding KeyID or raw MCI ADDR
>> later is quite possible.
> 
> Why would you want to do that? Do you have a use case?
> 
> If not, you can drop that whole angle about adding KeyID later

Droppin' it.
Re: [PATCH RESEND V2 1/2] x86/mce: Fix missing address mask in recovery for errors in TDX/SEAM non-root mode
Posted by Yazen Ghannam 1 month, 2 weeks ago
On Tue, Aug 19, 2025 at 07:24:34PM +0300, Adrian Hunter wrote:
> Commit 8a01ec97dc066 ("x86/mce: Mask out non-address bits from machine
> check bank") introduced a new #define MCI_ADDR_PHYSADDR for the mask of
> valid physical address bits within the machine check bank address register.
> 
> This is particularly needed in the case of errors in TDX/SEAM non-root mode
> because the reported address contains the TDX KeyID.  Refer to TDX and
> TME-MK documentation for more information about KeyIDs.
> 
> Commit 7911f145de5fe ("x86/mce: Implement recovery for errors in TDX/SEAM
> non-root mode") uses the address to mark the affected page as poisoned, but
> omits to use the aforementioned mask.
> 
> Investigation of user space expectations has concluded it would be more
> correct for the address to contain only address bits in the first place.
> Refer https://lore.kernel.org/r/807ff02d-7af0-419d-8d14-a4d6c5d5420d@intel.com
> 
> Mask the address when it is read from the machine check bank address
> register.  Do not use MCI_ADDR_PHYSADDR because that will be removed in a
> later patch.
> 
> It is assumed __log_error() in arch/x86/kernel/cpu/mce/amd.c does not need
> similar treatment.
> 
> Amend struct mce addr member description slightly to reflect that it is
> not, and never has been, an exact copy of the bank's MCi_ADDR MSR.
> 

I think it would be more accurate to say that the MCi_ADDR MSR is not,
and never has been, guaranteed to be a system physical address.

We could introduce a new field that represents the system physical
address, if one exists for the error type. This way we can operate on a
value without assumption or additional checks. And we can keep the raw
MCi_ADDR MSR value in case it is of value to debug folks or hardware
designers. In my experience, they seem to appreciate having the full,
unfiltered data. We don't give them that today, but we can work towards
that goal.

I have some old work in this area:
https://github.com/AMDESE/linux/commit/76732c67cbf96c14f55ed1061804db9ff1505ea3

This isn't a quick fix, so maybe we can come back to it if folks are
happy with your current solution.

But I do think there's value in sharing the data as given to us by
hardware. And providing new interfaces to users if we need to modify
something for them to take action.

Thanks,
Yazen
Re: [PATCH RESEND V2 1/2] x86/mce: Fix missing address mask in recovery for errors in TDX/SEAM non-root mode
Posted by Luck, Tony 1 month, 2 weeks ago
On Tue, Aug 19, 2025 at 01:28:46PM -0400, Yazen Ghannam wrote:
> On Tue, Aug 19, 2025 at 07:24:34PM +0300, Adrian Hunter wrote:
> > Commit 8a01ec97dc066 ("x86/mce: Mask out non-address bits from machine
> > check bank") introduced a new #define MCI_ADDR_PHYSADDR for the mask of
> > valid physical address bits within the machine check bank address register.
> > 
> > This is particularly needed in the case of errors in TDX/SEAM non-root mode
> > because the reported address contains the TDX KeyID.  Refer to TDX and
> > TME-MK documentation for more information about KeyIDs.
> > 
> > Commit 7911f145de5fe ("x86/mce: Implement recovery for errors in TDX/SEAM
> > non-root mode") uses the address to mark the affected page as poisoned, but
> > omits to use the aforementioned mask.
> > 
> > Investigation of user space expectations has concluded it would be more
> > correct for the address to contain only address bits in the first place.
> > Refer https://lore.kernel.org/r/807ff02d-7af0-419d-8d14-a4d6c5d5420d@intel.com
> > 
> > Mask the address when it is read from the machine check bank address
> > register.  Do not use MCI_ADDR_PHYSADDR because that will be removed in a
> > later patch.
> > 
> > It is assumed __log_error() in arch/x86/kernel/cpu/mce/amd.c does not need
> > similar treatment.
> > 
> > Amend struct mce addr member description slightly to reflect that it is
> > not, and never has been, an exact copy of the bank's MCi_ADDR MSR.
> > 
> 
> I think it would be more accurate to say that the MCi_ADDR MSR is not,
> and never has been, guaranteed to be a system physical address.
> 
> We could introduce a new field that represents the system physical
> address, if one exists for the error type. This way we can operate on a
> value without assumption or additional checks. And we can keep the raw
> MCi_ADDR MSR value in case it is of value to debug folks or hardware
> designers. In my experience, they seem to appreciate having the full,
> unfiltered data. We don't give them that today, but we can work towards
> that goal.

Having and exact copy of MCi_ADDR might be useful. I recall some angst
about this code masking off low order bits:

		m->addr = mce_rdmsrq(mca_msr_reg(i, MCA_ADDR));

		/*
		 * Mask the reported address by the reported granularity.
		 */
		if (mca_cfg.ser && (m->status & MCI_STATUS_MISCV)) {
			u8 shift = MCI_MISC_ADDR_LSB(m->misc);
			m->addr >>= shift;
			m->addr <<= shift;
		}

this proposal masks some high order bits too.

I second Yazen's suggestion of a new field. One for the raw value,
another for the massaged phsical address derived from the MSR.

-Tony
Re: [PATCH RESEND V2 1/2] x86/mce: Fix missing address mask in recovery for errors in TDX/SEAM non-root mode
Posted by Adrian Hunter 1 month, 2 weeks ago
On 19/08/2025 20:51, Luck, Tony wrote:
> On Tue, Aug 19, 2025 at 01:28:46PM -0400, Yazen Ghannam wrote:
>> On Tue, Aug 19, 2025 at 07:24:34PM +0300, Adrian Hunter wrote:
>>> Commit 8a01ec97dc066 ("x86/mce: Mask out non-address bits from machine
>>> check bank") introduced a new #define MCI_ADDR_PHYSADDR for the mask of
>>> valid physical address bits within the machine check bank address register.
>>>
>>> This is particularly needed in the case of errors in TDX/SEAM non-root mode
>>> because the reported address contains the TDX KeyID.  Refer to TDX and
>>> TME-MK documentation for more information about KeyIDs.
>>>
>>> Commit 7911f145de5fe ("x86/mce: Implement recovery for errors in TDX/SEAM
>>> non-root mode") uses the address to mark the affected page as poisoned, but
>>> omits to use the aforementioned mask.
>>>
>>> Investigation of user space expectations has concluded it would be more
>>> correct for the address to contain only address bits in the first place.
>>> Refer https://lore.kernel.org/r/807ff02d-7af0-419d-8d14-a4d6c5d5420d@intel.com
>>>
>>> Mask the address when it is read from the machine check bank address
>>> register.  Do not use MCI_ADDR_PHYSADDR because that will be removed in a
>>> later patch.
>>>
>>> It is assumed __log_error() in arch/x86/kernel/cpu/mce/amd.c does not need
>>> similar treatment.
>>>
>>> Amend struct mce addr member description slightly to reflect that it is
>>> not, and never has been, an exact copy of the bank's MCi_ADDR MSR.
>>>
>>
>> I think it would be more accurate to say that the MCi_ADDR MSR is not,
>> and never has been, guaranteed to be a system physical address.
>>
>> We could introduce a new field that represents the system physical
>> address, if one exists for the error type. This way we can operate on a
>> value without assumption or additional checks. And we can keep the raw
>> MCi_ADDR MSR value in case it is of value to debug folks or hardware
>> designers. In my experience, they seem to appreciate having the full,
>> unfiltered data. We don't give them that today, but we can work towards
>> that goal.
> 
> Having and exact copy of MCi_ADDR might be useful. I recall some angst
> about this code masking off low order bits:
> 
> 		m->addr = mce_rdmsrq(mca_msr_reg(i, MCA_ADDR));
> 
> 		/*
> 		 * Mask the reported address by the reported granularity.
> 		 */
> 		if (mca_cfg.ser && (m->status & MCI_STATUS_MISCV)) {
> 			u8 shift = MCI_MISC_ADDR_LSB(m->misc);
> 			m->addr >>= shift;
> 			m->addr <<= shift;
> 		}
> 
> this proposal masks some high order bits too.
> 
> I second Yazen's suggestion of a new field. One for the raw value,
> another for the massaged phsical address derived from the MSR.

For struct mce?  Maybe that should be 2 new fields:

	__u64 addr;		/* Deprecated */
	...
	__u64 mci_addr;		/* Bank's MCi_ADDR MSR */
	__u64 phys_addr;	/* Physical address */
RE: [PATCH RESEND V2 1/2] x86/mce: Fix missing address mask in recovery for errors in TDX/SEAM non-root mode
Posted by Luck, Tony 1 month, 2 weeks ago
> For struct mce?  Maybe that should be 2 new fields:
>
>	__u64 addr;		/* Deprecated */
>	...
>	__u64 mci_addr;		/* Bank's MCi_ADDR MSR */
>	__u64 phys_addr;	/* Physical address */

Would "addr" keep the current (low bits masked, high bits preserved) value?

-Tony

Re: [PATCH RESEND V2 1/2] x86/mce: Fix missing address mask in recovery for errors in TDX/SEAM non-root mode
Posted by Adrian Hunter 1 month, 2 weeks ago
On 19/08/2025 21:03, Luck, Tony wrote:
>> For struct mce?  Maybe that should be 2 new fields:
>>
>> 	__u64 addr;		/* Deprecated */
>> 	...
>> 	__u64 mci_addr;		/* Bank's MCi_ADDR MSR */
>> 	__u64 phys_addr;	/* Physical address */
> 
> Would "addr" keep the current (low bits masked, high bits preserved) value?

Yeah, it wouldn't make much sense if phys_addr was the same as addr anyway.
Not really thinking
RE: [PATCH RESEND V2 1/2] x86/mce: Fix missing address mask in recovery for errors in TDX/SEAM non-root mode
Posted by Luck, Tony 1 month, 2 weeks ago
> >> For struct mce?  Maybe that should be 2 new fields:
> >>
> >>    __u64 addr;             /* Deprecated */
> >>    ...
> >>    __u64 mci_addr;         /* Bank's MCi_ADDR MSR */
> >>    __u64 phys_addr;        /* Physical address */
> >
> > Would "addr" keep the current (low bits masked, high bits preserved) value?
>
> Yeah, it wouldn't make much sense if phys_addr was the same as addr anyway.
> Not really thinking

The other option (but a bad one) would be:

        __u64 deprecated;       /* was "addr" */
        ...
        __u64 mci_addr;         /* Bank's MCi_ADDR MSR */
        __u64 phys_addr;        /* Physical address */

which would be good to force cleanup in the kernel, but bad for preserving
ABI (since "struct mce" is visible to user space via /dev/mcelog).

-Tony
Re: [PATCH RESEND V2 1/2] x86/mce: Fix missing address mask in recovery for errors in TDX/SEAM non-root mode
Posted by Yazen Ghannam 1 month, 2 weeks ago
On Wed, Aug 20, 2025 at 04:12:28PM +0000, Luck, Tony wrote:
> > >> For struct mce?  Maybe that should be 2 new fields:
> > >>
> > >>    __u64 addr;             /* Deprecated */
> > >>    ...
> > >>    __u64 mci_addr;         /* Bank's MCi_ADDR MSR */
> > >>    __u64 phys_addr;        /* Physical address */
> > >
> > > Would "addr" keep the current (low bits masked, high bits preserved) value?
> >
> > Yeah, it wouldn't make much sense if phys_addr was the same as addr anyway.
> > Not really thinking
> 
> The other option (but a bad one) would be:
> 
>         __u64 deprecated;       /* was "addr" */
>         ...
>         __u64 mci_addr;         /* Bank's MCi_ADDR MSR */
>         __u64 phys_addr;        /* Physical address */
> 
> which would be good to force cleanup in the kernel, but bad for preserving
> ABI (since "struct mce" is visible to user space via /dev/mcelog).
> 

/dev/mcelog has been deprecated for a while.

Is the mcelog app still in active development? Could it be updated to
use trace events for MCE info?

You could also just fix up the address value in the mcelog notifier's
copy. I believe it has its own cache separate from the MCE genpool.

Thanks,
Yazen
Re: [PATCH RESEND V2 1/2] x86/mce: Fix missing address mask in recovery for errors in TDX/SEAM non-root mode
Posted by Adrian Hunter 1 month, 1 week ago
On 20/08/2025 20:56, Yazen Ghannam wrote:
> On Wed, Aug 20, 2025 at 04:12:28PM +0000, Luck, Tony wrote:
>>>>> For struct mce?  Maybe that should be 2 new fields:
>>>>>
>>>>>    __u64 addr;             /* Deprecated */
>>>>>    ...
>>>>>    __u64 mci_addr;         /* Bank's MCi_ADDR MSR */
>>>>>    __u64 phys_addr;        /* Physical address */
>>>>
>>>> Would "addr" keep the current (low bits masked, high bits preserved) value?
>>>
>>> Yeah, it wouldn't make much sense if phys_addr was the same as addr anyway.
>>> Not really thinking
>>
>> The other option (but a bad one) would be:
>>
>>         __u64 deprecated;       /* was "addr" */
>>         ...
>>         __u64 mci_addr;         /* Bank's MCi_ADDR MSR */
>>         __u64 phys_addr;        /* Physical address */
>>
>> which would be good to force cleanup in the kernel, but bad for preserving
>> ABI (since "struct mce" is visible to user space via /dev/mcelog).
>>
> 
> /dev/mcelog has been deprecated for a while.

There is also mce_record tracepoint

> 
> Is the mcelog app still in active development? Could it be updated to
> use trace events for MCE info?
> 
> You could also just fix up the address value in the mcelog notifier's
> copy. I believe it has its own cache separate from the MCE genpool.

Is there an advantage to fixing up later rather than when addr is
initially assigned?