[PATCH] powerpc: Simplify access_ok()

Christophe Leroy (CS GROUP) posted 1 patch 2 weeks, 6 days ago
There is a newer version of this series
arch/powerpc/include/asm/uaccess.h | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
[PATCH] powerpc: Simplify access_ok()
Posted by Christophe Leroy (CS GROUP) 2 weeks, 6 days ago
With the implementation of masked user access, we always have a memory
gap between user memory space and kernel memory space, so use it to
simplify access_ok() by relying on access fault in case of an access
in the gap.

Most of the time the size is known at build time.

On powerpc64, the kernel space starts at 0x8000000000000000 which is
always more than two times TASK_USER_MAX so when the size is known at
build time and lower than TASK_USER_MAX, only the address needs to be
verified. If not, a binary or of address and size must be lower than
TASK_USER_MAX. As TASK_USER_MAX is a power of 2, just check that
there is no bit set outside of TASK_USER_MAX - 1 mask.

On powerpc32, there is a garanteed gap of 128KB so when the size is
known at build time and not greater than 128KB, just check that the
address is below TASK_SIZE. Otherwise use the original formula.

Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
---
 arch/powerpc/include/asm/uaccess.h | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index 570b3d91e2e4..ec210ae62be7 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -15,8 +15,34 @@
 #define TASK_SIZE_MAX		TASK_SIZE_USER64
 #endif
 
+#define __access_ok __access_ok
+
 #include <asm-generic/access_ok.h>
 
+/*
+ * On powerpc64, TASK_SIZE_MAX is 0x0010000000000000 then even if both ptr and size
+ * are TASK_SIZE_MAX we are still inside the memory gap. So make it simple.
+ */
+static __always_inline int __access_ok(const void __user *ptr, unsigned long size)
+{
+	unsigned long addr = (unsigned long)ptr;
+
+	if (IS_ENABLED(CONFIG_PPC64)) {
+		BUILD_BUG_ON(!is_power_of_2(TASK_SIZE_MAX));
+		BUILD_BUG_ON(TASK_SIZE_MAX > 0x0010000000000000);
+
+		if (__builtin_constant_p(size))
+			return size <= TASK_SIZE_MAX && !(addr & ~(TASK_SIZE_MAX - 1));
+		else
+			return !((size | addr) & ~(TASK_SIZE_MAX - 1));
+	} else {
+		if (__builtin_constant_p(size) && size < SZ_128K)
+			return addr < TASK_SIZE;
+		else
+			return size <= TASK_SIZE && addr <= TASK_SIZE - size);
+	}
+}
+
 /*
  * These are the main single-value transfer routines.  They automatically
  * use the right size if we just have the right pointer type.
-- 
2.49.0
Re: [PATCH] powerpc: Simplify access_ok()
Posted by David Laight 2 weeks, 1 day ago
On Tue, 17 Mar 2026 19:07:04 +0100
"Christophe Leroy (CS GROUP)" <chleroy@kernel.org> wrote:

> With the implementation of masked user access, we always have a memory
> gap between user memory space and kernel memory space, so use it to
> simplify access_ok() by relying on access fault in case of an access
> in the gap.
> 
> Most of the time the size is known at build time.
> 
> On powerpc64, the kernel space starts at 0x8000000000000000 which is
> always more than two times TASK_USER_MAX so when the size is known at
> build time and lower than TASK_USER_MAX, only the address needs to be
> verified. If not, a binary or of address and size must be lower than
> TASK_USER_MAX. As TASK_USER_MAX is a power of 2, just check that
> there is no bit set outside of TASK_USER_MAX - 1 mask.
> 
> On powerpc32, there is a garanteed gap of 128KB so when the size is
> known at build time and not greater than 128KB, just check that the
> address is below TASK_SIZE. Otherwise use the original formula.

Given that the whole thing relies on the kernel code 'obeying the rules'
is it enough to require that the accesses will be 'moderately sequential'?
Provided there are no jumps greater than 128k the length can be ignored.

I think Linus thought about doing that for x86-64.

I can't imagine that happening unless there is code that probes the end of
the user buffer before starting a transfer - and that is pretty pointless.

There are places that skip a few bytes (or just access in the wrong order)
but it is likely to be alignment padding, and code should be doing the
access_ok() check for each fragment - not on the entire buffer.

> 
> Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
> ---
>  arch/powerpc/include/asm/uaccess.h | 26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
> index 570b3d91e2e4..ec210ae62be7 100644
> --- a/arch/powerpc/include/asm/uaccess.h
> +++ b/arch/powerpc/include/asm/uaccess.h
> @@ -15,8 +15,34 @@
>  #define TASK_SIZE_MAX		TASK_SIZE_USER64
>  #endif
>  
> +#define __access_ok __access_ok
> +
>  #include <asm-generic/access_ok.h>
>  
> +/*
> + * On powerpc64, TASK_SIZE_MAX is 0x0010000000000000 then even if both ptr and size
> + * are TASK_SIZE_MAX we are still inside the memory gap. So make it simple.
> + */
> +static __always_inline int __access_ok(const void __user *ptr, unsigned long size)
> +{
> +	unsigned long addr = (unsigned long)ptr;
> +
> +	if (IS_ENABLED(CONFIG_PPC64)) {
> +		BUILD_BUG_ON(!is_power_of_2(TASK_SIZE_MAX));
> +		BUILD_BUG_ON(TASK_SIZE_MAX > 0x0010000000000000);
> +
> +		if (__builtin_constant_p(size))
> +			return size <= TASK_SIZE_MAX && !(addr & ~(TASK_SIZE_MAX - 1));
> +		else
> +			return !((size | addr) & ~(TASK_SIZE_MAX - 1));

The compiler may know an upper bound for 'size' even when it isn't a constant.
It might be 32bit or from 'size = is_compat_foo ? 16 : 24', so:
		if (statically_true(size < TASK_SIZE_MAX)
			return !(addr & ~(TASK_SIZE_MAX - 1);
		return !((size | addr) & ~(TASK_SIZE_MAX - 1));

> +	} else {
> +		if (__builtin_constant_p(size) && size < SZ_128K)

Again the compiler may know an upper bound even if the value isn't constant:
		if (statically_true(size < SZ_128K)

	David

> +			return addr < TASK_SIZE;
> +		else
> +			return size <= TASK_SIZE && addr <= TASK_SIZE - size);
> +	}
> +}
> +
>  /*
>   * These are the main single-value transfer routines.  They automatically
>   * use the right size if we just have the right pointer type.
Re: [PATCH] powerpc: Simplify access_ok()
Posted by Christophe Leroy (CS GROUP) 2 weeks, 1 day ago

Le 22/03/2026 à 12:03, David Laight a écrit :
> On Tue, 17 Mar 2026 19:07:04 +0100
> "Christophe Leroy (CS GROUP)" <chleroy@kernel.org> wrote:
> 
>> With the implementation of masked user access, we always have a memory
>> gap between user memory space and kernel memory space, so use it to
>> simplify access_ok() by relying on access fault in case of an access
>> in the gap.
>>
>> Most of the time the size is known at build time.
>>
>> On powerpc64, the kernel space starts at 0x8000000000000000 which is
>> always more than two times TASK_USER_MAX so when the size is known at
>> build time and lower than TASK_USER_MAX, only the address needs to be
>> verified. If not, a binary or of address and size must be lower than
>> TASK_USER_MAX. As TASK_USER_MAX is a power of 2, just check that
>> there is no bit set outside of TASK_USER_MAX - 1 mask.
>>
>> On powerpc32, there is a garanteed gap of 128KB so when the size is
>> known at build time and not greater than 128KB, just check that the
>> address is below TASK_SIZE. Otherwise use the original formula.
> 
> Given that the whole thing relies on the kernel code 'obeying the rules'
> is it enough to require that the accesses will be 'moderately sequential'?
> Provided there are no jumps greater than 128k the length can be ignored.
> 
> I think Linus thought about doing that for x86-64.

You mean ignoring length completely ?

Yes we can probably do that on 64 bits. I don't know about x86_64 but 
powerpc64 has TASK_USER < 0x0010000000000000 and kernel space is above 
0x8000000000000000, so oring size with address and comparing it to 
0x0010000000000000 doesn't add much cost compared to just comparing the 
address.

> 
> I can't imagine that happening unless there is code that probes the end of
> the user buffer before starting a transfer - and that is pretty pointless.
 > > There are places that skip a few bytes (or just access in the wrong 
order)
> but it is likely to be alignment padding, and code should be doing the
> access_ok() check for each fragment - not on the entire buffer.

I don't follow you. Why not for the entire buffer ? We try to minimise 
amount of stac/clac (or equivalent) and access_ok() is associated with 
stac. When we use access_begin/access_end we tend to try and regroup 
everything in a single bloc.

> 
>>
>> Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
>> ---
>>   arch/powerpc/include/asm/uaccess.h | 26 ++++++++++++++++++++++++++
>>   1 file changed, 26 insertions(+)
>>
>> diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
>> index 570b3d91e2e4..ec210ae62be7 100644
>> --- a/arch/powerpc/include/asm/uaccess.h
>> +++ b/arch/powerpc/include/asm/uaccess.h
>> @@ -15,8 +15,34 @@
>>   #define TASK_SIZE_MAX		TASK_SIZE_USER64
>>   #endif
>>   
>> +#define __access_ok __access_ok
>> +
>>   #include <asm-generic/access_ok.h>
>>   
>> +/*
>> + * On powerpc64, TASK_SIZE_MAX is 0x0010000000000000 then even if both ptr and size
>> + * are TASK_SIZE_MAX we are still inside the memory gap. So make it simple.
>> + */
>> +static __always_inline int __access_ok(const void __user *ptr, unsigned long size)
>> +{
>> +	unsigned long addr = (unsigned long)ptr;
>> +
>> +	if (IS_ENABLED(CONFIG_PPC64)) {
>> +		BUILD_BUG_ON(!is_power_of_2(TASK_SIZE_MAX));
>> +		BUILD_BUG_ON(TASK_SIZE_MAX > 0x0010000000000000);
>> +
>> +		if (__builtin_constant_p(size))
>> +			return size <= TASK_SIZE_MAX && !(addr & ~(TASK_SIZE_MAX - 1));
>> +		else
>> +			return !((size | addr) & ~(TASK_SIZE_MAX - 1));
> 
> The compiler may know an upper bound for 'size' even when it isn't a constant.
> It might be 32bit or from 'size = is_compat_foo ? 16 : 24', so:
> 		if (statically_true(size < TASK_SIZE_MAX)
> 			return !(addr & ~(TASK_SIZE_MAX - 1);
> 		return !((size | addr) & ~(TASK_SIZE_MAX - 1));

I think you are missing the case where size is constant and > TASK_SIZE_MAX.

Or maybe that case should be catched with a BUILD_BUG ?

Christophe

> 
>> +	} else {
>> +		if (__builtin_constant_p(size) && size < SZ_128K)
> 
> Again the compiler may know an upper bound even if the value isn't constant:
> 		if (statically_true(size < SZ_128K)
> 
> 	David
> 
>> +			return addr < TASK_SIZE;
>> +		else
>> +			return size <= TASK_SIZE && addr <= TASK_SIZE - size);
>> +	}
>> +}
>> +
>>   /*
>>    * These are the main single-value transfer routines.  They automatically
>>    * use the right size if we just have the right pointer type.
> 

Re: [PATCH] powerpc: Simplify access_ok()
Posted by David Laight 2 weeks ago
On Sun, 22 Mar 2026 20:02:56 +0100
"Christophe Leroy (CS GROUP)" <chleroy@kernel.org> wrote:

> Le 22/03/2026 à 12:03, David Laight a écrit :
> > On Tue, 17 Mar 2026 19:07:04 +0100
> > "Christophe Leroy (CS GROUP)" <chleroy@kernel.org> wrote:
> >   
> >> With the implementation of masked user access, we always have a memory
> >> gap between user memory space and kernel memory space, so use it to
> >> simplify access_ok() by relying on access fault in case of an access
> >> in the gap.
> >>
> >> Most of the time the size is known at build time.
> >>
> >> On powerpc64, the kernel space starts at 0x8000000000000000 which is
> >> always more than two times TASK_USER_MAX so when the size is known at
> >> build time and lower than TASK_USER_MAX, only the address needs to be
> >> verified. If not, a binary or of address and size must be lower than
> >> TASK_USER_MAX. As TASK_USER_MAX is a power of 2, just check that
> >> there is no bit set outside of TASK_USER_MAX - 1 mask.
> >>
> >> On powerpc32, there is a garanteed gap of 128KB so when the size is
> >> known at build time and not greater than 128KB, just check that the
> >> address is below TASK_SIZE. Otherwise use the original formula.  
> > 
> > Given that the whole thing relies on the kernel code 'obeying the rules'
> > is it enough to require that the accesses will be 'moderately sequential'?
> > Provided there are no jumps greater than 128k the length can be ignored.
> > 
> > I think Linus thought about doing that for x86-64.  
> 
> You mean ignoring length completely ?

That is the thought.
The 'masked_user_access' code does that and it could be always done
provided all the call sites are checked.
The length was needed historically because the 386 allowed kernel
writes to read only pages - so COW had to be tested in software.

> 
> Yes we can probably do that on 64 bits. I don't know about x86_64 but 
> powerpc64 has TASK_USER < 0x0010000000000000 and kernel space is above 
> 0x8000000000000000, so oring size with address and comparing it to 
> 0x0010000000000000 doesn't add much cost compared to just comparing the 
> address.
> 
> > 
> > I can't imagine that happening unless there is code that probes the end of
> > the user buffer before starting a transfer - and that is pretty pointless.  
>  > > There are places that skip a few bytes (or just access in the wrong   
> order)
> > but it is likely to be alignment padding, and code should be doing the
> > access_ok() check for each fragment - not on the entire buffer.  
> 
> I don't follow you. Why not for the entire buffer ? We try to minimise 
> amount of stac/clac (or equivalent) and access_ok() is associated with 
> stac. When we use access_begin/access_end we tend to try and regroup 
> everything in a single bloc.

I mean that the access_ok/stac/clac are done together at the place where
the copy is done, rather than doing the access_ok() on the system call
interface and the actual copies much later.
There are a few ioctl (probably netlink and some of the sctp sockopt)
which process multiple 'items' in one user buffer - they'll do the
access_ok/stac/clac multiple times.

The only problem would be code that processes a buffer that contains
lots of items like:
	struct item {
		unsigned int size;
		char cmd[size];
	};
and advances to the next item believing the 'size' field (checked against
the outer buffer size) but without a separate access_ok() for the later
commands.
	

> 
> >   
> >>
> >> Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
> >> ---
> >>   arch/powerpc/include/asm/uaccess.h | 26 ++++++++++++++++++++++++++
> >>   1 file changed, 26 insertions(+)
> >>
> >> diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
> >> index 570b3d91e2e4..ec210ae62be7 100644
> >> --- a/arch/powerpc/include/asm/uaccess.h
> >> +++ b/arch/powerpc/include/asm/uaccess.h
> >> @@ -15,8 +15,34 @@
> >>   #define TASK_SIZE_MAX		TASK_SIZE_USER64
> >>   #endif
> >>   
> >> +#define __access_ok __access_ok
> >> +
> >>   #include <asm-generic/access_ok.h>
> >>   
> >> +/*
> >> + * On powerpc64, TASK_SIZE_MAX is 0x0010000000000000 then even if both ptr and size
> >> + * are TASK_SIZE_MAX we are still inside the memory gap. So make it simple.
> >> + */
> >> +static __always_inline int __access_ok(const void __user *ptr, unsigned long size)
> >> +{
> >> +	unsigned long addr = (unsigned long)ptr;
> >> +
> >> +	if (IS_ENABLED(CONFIG_PPC64)) {
> >> +		BUILD_BUG_ON(!is_power_of_2(TASK_SIZE_MAX));
> >> +		BUILD_BUG_ON(TASK_SIZE_MAX > 0x0010000000000000);
> >> +
> >> +		if (__builtin_constant_p(size))
> >> +			return size <= TASK_SIZE_MAX && !(addr & ~(TASK_SIZE_MAX - 1));
> >> +		else
> >> +			return !((size | addr) & ~(TASK_SIZE_MAX - 1));  
> > 
> > The compiler may know an upper bound for 'size' even when it isn't a constant.
> > It might be 32bit or from 'size = is_compat_foo ? 16 : 24', so:
> > 		if (statically_true(size < TASK_SIZE_MAX)
> > 			return !(addr & ~(TASK_SIZE_MAX - 1);
> > 		return !((size | addr) & ~(TASK_SIZE_MAX - 1));  
> 
> I think you are missing the case where size is constant and > TASK_SIZE_MAX.

It is no different to a variable size that happens to be > TASK_SIZE_MAX.
But I suspect the compiler will optimise it to 'return 0'.

> Or maybe that case should be catched with a BUILD_BUG ?

Probably not worth slowing down the compile.

	David

> 
> Christophe
> 
> >   
> >> +	} else {
> >> +		if (__builtin_constant_p(size) && size < SZ_128K)  
> > 
> > Again the compiler may know an upper bound even if the value isn't constant:
> > 		if (statically_true(size < SZ_128K)
> > 
> > 	David
> >   
> >> +			return addr < TASK_SIZE;
> >> +		else
> >> +			return size <= TASK_SIZE && addr <= TASK_SIZE - size);
> >> +	}
> >> +}
> >> +
> >>   /*
> >>    * These are the main single-value transfer routines.  They automatically
> >>    * use the right size if we just have the right pointer type.  
> >   
> 
Re: [PATCH] powerpc: Simplify access_ok()
Posted by kernel test robot 2 weeks, 1 day ago
Hi Christophe,

kernel test robot noticed the following build errors:

[auto build test ERROR on v7.0-rc3]
[cannot apply to powerpc/next powerpc/fixes v7.0-rc4 linus/master next-20260320]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Christophe-Leroy-CS-GROUP/powerpc-Simplify-access_ok/20260318-080252
base:   v7.0-rc3
patch link:    https://lore.kernel.org/r/56dd1a892279fade2292b7eef7a52112901ae2fd.1773770778.git.chleroy%40kernel.org
patch subject: [PATCH] powerpc: Simplify access_ok()
config: powerpc-allnoconfig (https://download.01.org/0day-ci/archive/20260322/202603220648.5ZKgsQIu-lkp@intel.com/config)
compiler: powerpc-linux-gcc (GCC) 15.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260322/202603220648.5ZKgsQIu-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202603220648.5ZKgsQIu-lkp@intel.com/

All error/warnings (new ones prefixed by >>):

   In file included from include/linux/uaccess.h:13,
                    from include/linux/sched/task.h:13,
                    from include/linux/sched/signal.h:9,
                    from include/linux/rcuwait.h:6,
                    from include/linux/percpu-rwsem.h:7,
                    from include/linux/fs/super_types.h:13,
                    from include/linux/fs/super.h:5,
                    from include/linux/fs.h:5,
                    from include/linux/compat.h:17,
                    from arch/powerpc/kernel/asm-offsets.c:13:
   arch/powerpc/include/asm/uaccess.h: In function '__access_ok':
>> arch/powerpc/include/asm/uaccess.h:42:77: error: expected ';' before ')' token
      42 |                         return size <= TASK_SIZE && addr <= TASK_SIZE - size);
         |                                                                             ^
         |                                                                             ;
>> arch/powerpc/include/asm/uaccess.h:41:17: warning: this 'else' clause does not guard... [-Wmisleading-indentation]
      41 |                 else
         |                 ^~~~
   arch/powerpc/include/asm/uaccess.h:42:77: note: ...this statement, but the latter is misleadingly indented as if it were guarded by the 'else'
      42 |                         return size <= TASK_SIZE && addr <= TASK_SIZE - size);
         |                                                                             ^
>> arch/powerpc/include/asm/uaccess.h:42:77: error: expected statement before ')' token
   make[3]: *** [scripts/Makefile.build:184: arch/powerpc/kernel/asm-offsets.s] Error 1
   make[3]: Target 'prepare' not remade because of errors.
   make[2]: *** [Makefile:1333: prepare0] Error 2
   make[2]: Target 'prepare' not remade because of errors.
   make[1]: *** [Makefile:248: __sub-make] Error 2
   make[1]: Target 'prepare' not remade because of errors.
   make: *** [Makefile:248: __sub-make] Error 2
   make: Target 'prepare' not remade because of errors.


vim +42 arch/powerpc/include/asm/uaccess.h

    21	
    22	/*
    23	 * On powerpc64, TASK_SIZE_MAX is 0x0010000000000000 then even if both ptr and size
    24	 * are TASK_SIZE_MAX we are still inside the memory gap. So make it simple.
    25	 */
    26	static __always_inline int __access_ok(const void __user *ptr, unsigned long size)
    27	{
    28		unsigned long addr = (unsigned long)ptr;
    29	
    30		if (IS_ENABLED(CONFIG_PPC64)) {
    31			BUILD_BUG_ON(!is_power_of_2(TASK_SIZE_MAX));
    32			BUILD_BUG_ON(TASK_SIZE_MAX > 0x0010000000000000);
    33	
    34			if (__builtin_constant_p(size))
    35				return size <= TASK_SIZE_MAX && !(addr & ~(TASK_SIZE_MAX - 1));
    36			else
    37				return !((size | addr) & ~(TASK_SIZE_MAX - 1));
    38		} else {
    39			if (__builtin_constant_p(size) && size < SZ_128K)
    40				return addr < TASK_SIZE;
  > 41			else
  > 42				return size <= TASK_SIZE && addr <= TASK_SIZE - size);
    43		}
    44	}
    45	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki