[PATCH v3 1/6] x86/uaccess: Avoid barrier_nospec() in 64-bit copy_from_user()

Posted by Josh Poimboeuf 3 weeks, 6 days ago
The barrier_nospec() in 64-bit copy_from_user() is slow.  Instead use
pointer masking to force the user pointer to all 1's if the access_ok()
mispredicted true for an invalid address.

The kernel test robot reports a 2.6% improvement in the per_thread_ops
benchmark (see link below).

To avoid regressing powerpc and 32-bit x86, move their barrier_nospec()
calls to their respective raw_copy_from_user() implementations so
there's no functional change there.

Note that for safety on some AMD CPUs, this relies on recent commit
86e6b1547b3d ("x86: fix user address masking non-canonical speculation
issue").

Link: https://lore.kernel.org/202410281344.d02c72a2-oliver.sang@intel.com
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
---
 arch/powerpc/include/asm/uaccess.h | 2 ++
 arch/x86/include/asm/uaccess_32.h  | 1 +
 arch/x86/include/asm/uaccess_64.h  | 1 +
 include/linux/uaccess.h            | 6 ------
 4 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index 4f5a46a77fa2..12abb8bf5eda 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -7,6 +7,7 @@
 #include <asm/extable.h>
 #include <asm/kup.h>
 #include <asm/asm-compat.h>
+#include <asm/barrier.h>
 
 #ifdef __powerpc64__
 /* We use TASK_SIZE_USER64 as TASK_SIZE is not constant */
@@ -341,6 +342,7 @@ static inline unsigned long raw_copy_from_user(void *to,
 {
 	unsigned long ret;
 
+	barrier_nospec();
 	allow_read_from_user(from, n);
 	ret = __copy_tofrom_user((__force void __user *)to, from, n);
 	prevent_read_from_user(from, n);
diff --git a/arch/x86/include/asm/uaccess_32.h b/arch/x86/include/asm/uaccess_32.h
index 40379a1adbb8..8393ba104b2c 100644
--- a/arch/x86/include/asm/uaccess_32.h
+++ b/arch/x86/include/asm/uaccess_32.h
@@ -23,6 +23,7 @@ raw_copy_to_user(void __user *to, const void *from, unsigned long n)
 static __always_inline unsigned long
 raw_copy_from_user(void *to, const void __user *from, unsigned long n)
 {
+	barrier_nospec();
 	return __copy_user_ll(to, (__force const void *)from, n);
 }
 
diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index b0a887209400..7ce84090f0ec 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -138,6 +138,7 @@ copy_user_generic(void *to, const void *from, unsigned long len)
 static __always_inline __must_check unsigned long
 raw_copy_from_user(void *dst, const void __user *src, unsigned long size)
 {
+	src = mask_user_address(src);
 	return copy_user_generic(dst, (__force void *)src, size);
 }
 
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 39c7cf82b0c2..dda9725a9559 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -160,12 +160,6 @@ _inline_copy_from_user(void *to, const void __user *from, unsigned long n)
 	unsigned long res = n;
 	might_fault();
 	if (!should_fail_usercopy() && likely(access_ok(from, n))) {
-		/*
-		 * Ensure that bad access_ok() speculation will not
-		 * lead to nasty side effects *after* the copy is
-		 * finished:
-		 */
-		barrier_nospec();
 		instrument_copy_from_user_before(to, from, n);
 		res = raw_copy_from_user(to, from, n);
 		instrument_copy_from_user_after(to, from, n, res);
-- 
2.47.0
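
For context, a conceptual C sketch of the pointer-masking idea the commit message describes: smear the sign bit of an out-of-range pointer so the whole pointer becomes all 1's, a guaranteed-faulting address, with no conditional branch for the CPU to mispredict. This is an illustration only (the helper name below is made up); the kernel's real mask_user_address() in <asm/uaccess_64.h> is a branchless inline-asm helper, further hardened against AMD non-canonical speculation by commit 86e6b1547b3d mentioned above.

/*
 * Illustration only -- not the kernel's actual implementation.
 * A kernel-half address has bit 63 set; smearing that bit across
 * the pointer turns it into all 1's, which is guaranteed to fault.
 */
static inline const void __user *mask_user_address_sketch(const void __user *ptr)
{
	long addr = (long)ptr;

	/* Branchless, so there is nothing for the predictor to get wrong. */
	return (const void __user *)(addr | (addr >> 63));
}
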
Re: [PATCH v3 1/6] x86/uaccess: Avoid barrier_nospec() in 64-bit copy_from_user()
Posted by Linus Torvalds 3 weeks, 5 days ago
On Mon, 28 Oct 2024 at 15:56, Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>
> The barrier_nospec() in 64-bit copy_from_user() is slow.  Instead use
> pointer masking to force the user pointer to all 1's if the access_ok()
> mispredicted true for an invalid address.
>
> The kernel test robot reports a 2.6% improvement in the per_thread_ops
> benchmark (see link below).

Hmm. So it strikes me that this still does the "access_ok()", but
that's pointless for the actual pointer masking case. One of the whole
points of the pointer masking is that we can just do this without
actually checking the address (or length) at all.

That's why the strncpy_from_user() has the pattern of

        if (can_do_masked_user_access()) {
                ... don't worry about the size of the address space ..

and I think this code should do that too.

IOW, I think we can do even better than your patch with something
(UNTESTED!) like the attached.

That will also mean that any other architecture that starts doing the
user address masking trick will pick up on this automatically.

Hmm?

                Linus
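
The attached patch itself is not reproduced in this excerpt. As a rough, untested sketch of the pattern being described (mirroring strncpy_from_user() and reusing the existing can_do_masked_user_access()/mask_user_address() helpers; the exact structure here is only an assumption), _inline_copy_from_user() could branch like this:

static __always_inline __must_check unsigned long
_inline_copy_from_user(void *to, const void __user *from, unsigned long n)
{
	unsigned long res = n;

	might_fault();
	if (should_fail_usercopy())
		goto fail;

	if (can_do_masked_user_access()) {
		/*
		 * Masking clamps a bad pointer to a guaranteed-faulting
		 * address, so neither access_ok() nor barrier_nospec()
		 * is needed on this path.
		 */
		from = mask_user_address(from);
	} else {
		if (!access_ok(from, n))
			goto fail;
		/* Legacy path: keep the speculation barrier. */
		barrier_nospec();
	}

	instrument_copy_from_user_before(to, from, n);
	res = raw_copy_from_user(to, from, n);
	instrument_copy_from_user_after(to, from, n, res);
	if (!res)
		return 0;
fail:
	memset(to + (n - res), 0, res);
	return res;
}

Any architecture that defines the masked-user-access helpers would then take the fast path automatically, which is the portability point made above.
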
Re: [PATCH v3 1/6] x86/uaccess: Avoid barrier_nospec() in 64-bit copy_from_user()
Posted by Josh Poimboeuf 3 weeks, 5 days ago
On Tue, Oct 29, 2024 at 04:03:31PM -1000, Linus Torvalds wrote:
> Hmm. So it strikes me that this still does the "access_ok()", but
> that's pointless for the actual pointer masking case. One of the whole
> points of the pointer masking is that we can just do this without
> actually checking the address (or length) at all.
> 
> That's why the strncpy_from_user() has the pattern of
> 
>         if (can_do_masked_user_access()) {
>                 ... don't worry about the size of the address space ..
> 
> and I think this code should do that too.
> 
> IOW, I think we can do even better than your patch with something
> (UNTESTED!) like the attached.
> 
> That will also mean that any other architecture that starts doing the
> user address masking trick will pick up on this automatically.
> 
> Hmm?

Yeah, it makes sense to hook into that existing
can_do_masked_user_access() thing.  The patch looks good, and it boots
without blowing up.  Thanks!

Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>

-- 
Josh
Re: [PATCH v3 1/6] x86/uaccess: Avoid barrier_nospec() in 64-bit copy_from_user()
Posted by Kirill A. Shutemov 3 weeks, 5 days ago
On Mon, Oct 28, 2024 at 06:56:14PM -0700, Josh Poimboeuf wrote:
> The barrier_nospec() in 64-bit copy_from_user() is slow.  Instead use
> pointer masking to force the user pointer to all 1's if the access_ok()
> mispredicted true for an invalid address.
> 
> The kernel test robot reports a 2.6% improvement in the per_thread_ops
> benchmark (see link below).
> 
> To avoid regressing powerpc and 32-bit x86, move their barrier_nospec()
> calls to their respective raw_copy_from_user() implementations so
> there's no functional change there.
> 
> Note that for safety on some AMD CPUs, this relies on recent commit
> 86e6b1547b3d ("x86: fix user address masking non-canonical speculation
> issue").
> 
> Link: https://lore.kernel.org/202410281344.d02c72a2-oliver.sang@intel.com
> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
  Kiryl Shutsemau / Kirill A. Shutemov