mask_user_address() currently returns ~0 for kernel addresses.
This is fine for avoiding speculative reads and for get_user(),
but is problematic for the code pattern:
	if (can_do_masked_user_access())
		from = masked_user_access_begin(from);
	else if (!user_read_access_begin(from, sizeof(*from)))
		return -EFAULT;
	unsafe_get_user(to->p, &from->p, Efault);
	unsafe_get_user(to->size, &from->size, Efault);
	user_read_access_end();
because of the requirement that the base address be accessed first.
Changing mask_user_address() to return the base of the guard page
means that any access within 4k of the returned address will fault,
so the order of the structure member accesses is no longer critical.
The change replaces the 'sbb, or' pair with a 'cmov', so the generated
code is also slightly shorter.
Signed-off-by: David Laight <david.laight@aculab.com>
---
I've built and run a kernel with it - so not broken!
Probably ought to be a follow-up patch to rename it to bound_user_access()
before there are too many users.
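
For illustration only, a stand-alone user-space model of the old and new
return values (the helper names and the guard-page constant below are made
up for the example; this is not the kernel code). It shows why the old ~0
return forced the base address to be dereferenced first, while the new
clamp keeps every access within 4k inside the guard page:

#include <stdio.h>

/* Illustrative guard page base, not the real USER_PTR_MAX constant. */
#define GUARD_PAGE_BASE	0x00007ffffffff000ul

/* Old behaviour: kernel addresses become ~0. */
static unsigned long mask_old(unsigned long ptr)
{
	return ptr > GUARD_PAGE_BASE ? ~0ul : ptr;
}

/* New behaviour: kernel addresses are clamped to the guard page base. */
static unsigned long mask_new(unsigned long ptr)
{
	return ptr > GUARD_PAGE_BASE ? GUARD_PAGE_BASE : ptr;
}

int main(void)
{
	unsigned long bad = 0xffff888000000000ul;	/* some kernel address */

	/* Old: ~0 + 8 wraps to 7, a low user address that need not fault. */
	printf("old + 8 -> %#lx\n", mask_old(bad) + 8);
	/* New: base + 8 is still inside the unmapped guard page and faults. */
	printf("new + 8 -> %#lx\n", mask_new(bad) + 8);
	return 0;
}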
arch/x86/include/asm/uaccess_64.h | 23 +++++++++++++----------
1 file changed, 13 insertions(+), 10 deletions(-)
diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index b0a887209400..4cdace8c93b3 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -57,19 +57,22 @@ static inline unsigned long __untagged_addr_remote(struct mm_struct *mm,
 	((__force unsigned long)(x) <= runtime_const_ptr(USER_PTR_MAX))
 
 /*
- * Masking the user address is an alternative to a conditional
- * user_access_begin that can avoid the fencing. This only works
- * for dense accesses starting at the address.
+ * Bound a user address to the base of the guard page.
+ * This can be used to avoid speculative accesses following access_ok(),
+ * or as an alternative to a conditional user_access_begin.
+ * Both without expensive fencing.
+ * Valid provided the accesses are 'reasonably sequential'
+ * (no jumps of a page size).
  */
 static inline void __user *mask_user_address(const void __user *ptr)
 {
-	unsigned long mask;
-	asm("cmp %1,%0\n\t"
-	    "sbb %0,%0"
-		:"=r" (mask)
-		:"r" (ptr),
-		 "0" (runtime_const_ptr(USER_PTR_MAX)));
-	return (__force void __user *)(mask | (__force unsigned long)ptr);
+	void __user *bounded = (__force void __user *)runtime_const_ptr(USER_PTR_MAX);
+
+	asm("cmp %0,%1\n\t"
+	    "cmovc %1,%0"
+		:"+r" (bounded)
+		:"r" (ptr));
+	return bounded;
 }
 #define masked_user_access_begin(x) ({				\
 	__auto_type __masked_ptr = (x);				\
--
2.17.1
On Sun, 1 Dec 2024 at 10:12, David Laight <David.Laight@aculab.com> wrote:
>
> I've built and run a kernel with it - so not broken!
I worry that 'cmov' could be predicted - making the whole sequence
pointless. It would be a stupid thing for a CPU core to do, but it
would be simple.
Of course, 'sbb' could be done using predicting the carry flag too.
There's a lot of ways to screw this up.
Intel at some point explicitly said
"Other instructions such as CMOVcc, AND, ADC, SBB and SETcc can also
be used to prevent bounds
check bypass by constraining speculative execution on current family
6 processors (Intel® Core™,
Intel® Atom™, Intel® Xeon® and Intel® Xeon Phi™ processors).
However, these instructions may not
be guaranteed to do so on future Intel processors"
so none of these are safe according to that.
Maybe there were newer updates on this, but in the meantime I'd rather
have just *one* pattern, not switch between multiple possibly
problematic ones. And sbb has been that traditional one.
Also, if sbb is ever made speculative, I think it's time to just jump ship.
Linus
From: Linus Torvalds
> Sent: 01 December 2024 20:03
>
> On Sun, 1 Dec 2024 at 10:12, David Laight <David.Laight@aculab.com> wrote:
> >
> > I've built and run a kernel with it - so not broken!
>
> I worry that 'cmov' could be predicted - making the whole sequence
> pointless. It would be a stupid thing for a CPU core to do, but it
> would be simple.
>
> Of course, 'sbb' could be done using predicting the carry flag too.
> There's a lot of ways to screw this up.

About the only register that it would make any sense to 'predict' is
the flags register. That is going to affect cmov and sbb the same.

> Intel at some point explicitly said
>
> "Other instructions such as CMOVcc, AND, ADC, SBB and SETcc can also
> be used to prevent bounds check bypass by constraining speculative
> execution on current family 6 processors (Intel® Core™, Intel® Atom™,
> Intel® Xeon® and Intel® Xeon Phi™ processors). However, these
> instructions may not be guaranteed to do so on future Intel processors"
>
> so none of these are safe according to that.
>
> Maybe there were newer updates on this, but in the meantime I'd rather
> have just *one* pattern, not switch between multiple possibly
> problematic ones. And sbb has been that traditional one.

I had some more thoughts while failing to sleep :-)

The current code relies on the cmp, sbb and or/and not being predicted.
This basically requires that all ALU instructions are not predicted.
If that isn't true then pretty much any memory access could
speculatively access almost any memory address.

The 'may not be guaranteed' part might be because someone has mooted
the (probably brain-dead) idea of speculatively executing ALU
instructions instead of implementing pipeline stalls.
Most ALU forwarding (result to next instruction) and stalls can be
determined during the decode clocks (they affect the instruction
scheduling on current x86 cpus). Perhaps it might help with variable
latency instructions (like memory reads), but they are the very ones
whose wrong values you don't want to use.

Mind you, the hardware engineers have done other horrid things to
increase cpu clock speed to get some benchmarks run faster without
considering the full effect on other software - which is why we are
in this mess.

	David
From: Linus Torvalds
> Sent: 01 December 2024 20:03
>
> On Sun, 1 Dec 2024 at 10:12, David Laight <David.Laight@aculab.com> wrote:
> >
> > I've built and run a kernel with it - so not broken!
>
> I worry that 'cmov' could be predicted - making the whole sequence
> pointless. It would be a stupid thing for a CPU core to do, but it
> would be simple.

Not really, 'prediction' is all about where the 'front end' reads
instructions from - and trying to avoid it depending on other cpu state.
Which is why some cpus don't even check that the instruction is a branch.

Whereas 'cmov' is pretty much an ALU instruction with the output
depending on the state of registers.
Agner's tables pretty much show that Intel implemented it as
	x = cond ? y : x
so it suffers from being a 2 u-op instruction (the same as sbb)
on older core-2 cpus.
OTOH AMD have it as '4 per clock' (the same as mov) so it could be
a 'mov' with the write disabled (but I'm not sure how that
would work if 'mov' is a register rename).

> Of course, 'sbb' could be done using predicting the carry flag too.
> There's a lot of ways to screw this up.
>
> Intel at some point explicitly said
>
> "Other instructions such as CMOVcc, AND, ADC, SBB and SETcc can also

Presumably that is 'as well as an lfence'?
Not sure how much use setcc could be - it only changes %al.

> be used to prevent bounds check bypass by constraining speculative
> execution on current family 6 processors (Intel® Core™, Intel® Atom™,
> Intel® Xeon® and Intel® Xeon Phi™ processors). However, these
> instructions may not be guaranteed to do so on future Intel processors"
>
> so none of these are safe according to that.

Well, they are all currently safe, but the future is undecided.
Sounds like just being non-committal...

> Maybe there were newer updates on this, but in the meantime I'd rather
> have just *one* pattern, not switch between multiple possibly
> problematic ones. And sbb has been that traditional one.

Well, array_index_nospec() could just be a cmp + cmov.
If you pass 'size - 1' that can be used for the 'out of range' value.
Otherwise you'd need to pass in a zero (that the compiler can generate
earlier). That would save an instruction (and register dependency) in
the 'index in range' path.

The only other use I spotted is in files_lookup_fd_raw(), which also
uses the 0/~0 from array_index_mask_nospec() to mask the value read
from xxx[0].

> Also, if sbb is ever made speculative, I think it's time to just jump ship.

Well, AND and (presumably) OR are also in the list.
You are relying on those as well as the sbb.

If you get issues on a mips-like cpu it all gets harder.
Without a 'carry flag' you can't use 'sbb'. It also makes any kind of
cmov hard to implement. I guess cmov-odd and cmov-even could be
implemented by disabling the register-file write (and used after a
cmp that sets 0/1). Whether the architecture defines/implements them
is another matter. They aren't in the one I implemented earlier in
the year.

	David
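[For what it's worth, a rough sketch of the cmp + cmov shape described
above. The function name is invented and this is not the kernel's
array_index_nospec(); it only illustrates clamping an out-of-range index
to 'size - 1'.]

/*
 * Sketch only - not the kernel's implementation.
 * Clamp 'index' to 'size - 1' with cmp + cmov so that even a
 * mispredicted bounds check can only speculatively read in range.
 * Assumes size != 0.
 */
static inline unsigned long clamp_index_nospec(unsigned long index,
					       unsigned long size)
{
	unsigned long limit = size - 1;

	asm("cmp %1,%0\n\t"	/* flags from index - limit */
	    "cmova %1,%0"	/* if index > limit, replace it with limit */
		:"+r" (index)
		:"r" (limit));
	return index;
}

[The architectural bounds check still has to happen before the array
access; the clamp only limits what a mis-speculated path can read.]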
On Sun, 1 Dec 2024 at 14:24, David Laight <David.Laight@aculab.com> wrote:
>
> Agner's tables pretty much show that Intel implemented it as
> 	x = cond ? y : x
> so it suffers from being a 2 u-op instruction (the same as sbb)
> on older core-2 cpus.
So I don't worry about a 2-cycle latency here, you'll find the same
for 'sbb' too, and there you have the additional 'or' operation that
then adds another cycle.
And Intel has documented that cmov is a data dependency, so it's
mainly just AMD that I'd worry about:
> OTOH AMD have it as '4 per clock' (the same as mov) so it could be
> a 'mov' with the write disabled (but I'm not sure how that
> would work if 'mov' is a register rename).
So that's the part that really worried me. "4 per clock, just like
'mov'" makes me worry it's a clever predicted mov instruction.
However, it looks like Agner is actually wrong here. Going to
https://uops.info/table.html
and looking up 'cmovbe' (which I think is the op we'd want), says that
ZEN 4 is 2 per cycle (I hate how they call that 0.5 "throughput" - at
least Agner correctly calls it the "reciprocal throughput").
So that actually looks ok.
I'd still be happier if I could find some official AMD doc that says
that cmov is a data dependency and is not predicted, but at least now
the numbers line up for it.
Linus
On Sun, 1 Dec 2024 at 14:24, David Laight <David.Laight@aculab.com> wrote:
>
> OTOH AMD have it as '4 per clock' (the same as mov) so it could be
> a 'mov' with the write disabled (but I'm not sure how that
> would work if 'mov' is a register rename).
It could work exactly by just predicting it one way or the other.
That's my point.
I don't think / hope anybody does it, but it's a particularly easy
mistake to make.
Linus