[PATCH] x86/mce: deal with UCE when copy clean pagecache to user space

wangchuanguo posted 1 patch 5 days, 16 hours ago
arch/x86/kernel/cpu/mce/severity.c | 30 +++++++++++++++++++++++++-----
1 file changed, 25 insertions(+), 5 deletions(-)
[PATCH] x86/mce: deal with UCE when copy clean pagecache to user space
Posted by wangchuanguo 5 days, 16 hours ago
Based on copy_from_user, extending the goal to unmap,discard,
and remap when errors occur in clean pagecache.

Signed-off-by: wangchuanguo <wangchuanguo@inspur.com>
---
 arch/x86/kernel/cpu/mce/severity.c | 30 +++++++++++++++++++++++++-----
 1 file changed, 25 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/severity.c b/arch/x86/kernel/cpu/mce/severity.c
index 2235a7477436..d1fc9568fa71 100644
--- a/arch/x86/kernel/cpu/mce/severity.c
+++ b/arch/x86/kernel/cpu/mce/severity.c
@@ -231,8 +231,22 @@ static struct severity {
 
 #define mc_recoverable(mcg) (((mcg) & (MCG_STATUS_RIPV|MCG_STATUS_EIPV)) == \
 				(MCG_STATUS_RIPV|MCG_STATUS_EIPV))
-
-static bool is_copy_from_user(struct pt_regs *regs)
+static bool is_clean_pagecache(unsigned long addr)
+{
+	if (virt_addr_valid(addr)) {
+		struct page *page;
+
+		page = virt_to_page(addr);
+		if (page) {
+			if (!PageSlab(page) && !PageAnon(page)) {
+				if (!PageDirty(page))
+					return true;
+			}
+		}
+	}
+	return false;
+}
+static bool is_copy_user(struct pt_regs *regs)
 {
 	u8 insn_buf[MAX_INSN_SIZE];
 	unsigned long addr;
@@ -264,8 +278,14 @@ static bool is_copy_from_user(struct pt_regs *regs)
 		return false;
 	}
 
-	if (fault_in_kernel_space(addr))
-		return false;
+	if (fault_in_kernel_space(addr)) {
+		if (is_clean_pagecache(addr) && !fault_in_kernel_space(regs->di)) {
+			//is copying clean pagecache to user space
+			addr = regs->di;
+		} else {
+			return false;
+		}
+	}
 
 	current->mce_vaddr = (void __user *)addr;
 
@@ -297,7 +317,7 @@ static noinstr int error_context(struct mce *m, struct pt_regs *regs)
 	/* Allow instrumentation around external facilities usage. */
 	instrumentation_begin();
 	fixup_type = ex_get_fixup_type(m->ip);
-	copy_user  = is_copy_from_user(regs);
+	copy_user  = is_copy_user(regs);
 	instrumentation_end();
 
 	if (copy_user) {
-- 
2.39.3
RE: [PATCH] x86/mce: deal with UCE when copy clean pagecache to user space
Posted by Luck, Tony 5 days, 6 hours ago
> Based on copy_from_user, extending the goal to unmap,discard,
> and remap when errors occur in clean pagecache.

This looks to be covering the case where an application does:

	n = read(fd, buf, size);

and the kernel gets a machine check because the data in the page
cache has an uncorrected error, which the kernel consumes while
copying out to user space.

I think this patch solves the immediate problem by avoiding a system
crash from that machine check.

But shouldn't the kernel be able to do better? The page is clean, so
the kernel could drop it from the page cache, allocate a new page,
fill the page by reading from filesystem, and then let the read(2)
syscall resume, copying the freshly acquired copy of the page.

-Tony
Re: [PATCH] x86/mce: deal with UCE when copy clean pagecache to user space
Posted by Borislav Petkov 5 days, 12 hours ago
On Fri, Sep 26, 2025 at 01:44:02PM +0800, wangchuanguo wrote:
> Based on copy_from_user, extending the goal to unmap,discard,
> and remap when errors occur in clean pagecache.

This does not even begin to explain why this patch exists:
 
A possible way to structure the commit message is:
 
1. Prepare the context for the explanation briefly.
 
2. Explain the problem at hand.
 
3. "It happens because of <...>"
 
4. "Fix it by doing X"
 
5. "(Potentially do Y)."
 
And some of those above are optional depending on the issue being
explained.
 
For more detailed info, see
Documentation/process/submitting-patches.rst,
Section "2) Describe your changes".
 
Also, to the tone, from Documentation/process/submitting-patches.rst:
 
 "Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
  instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
  to do frotz", as if you are giving orders to the codebase to change
  its behaviour."
 
Also, do not talk about what your patch does - that should (hopefully) be
visible from the diff itself. Rather, talk about *why* you're doing what
you're doing.

> @@ -264,8 +278,14 @@ static bool is_copy_from_user(struct pt_regs *regs)
>  		return false;
>  	}
>  
> -	if (fault_in_kernel_space(addr))
> -		return false;
> +	if (fault_in_kernel_space(addr)) {
> +		if (is_clean_pagecache(addr) && !fault_in_kernel_space(regs->di)) {
> +			//is copying clean pagecache to user space

Look at the comment style in the rest of the file before you add this //

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette