[PATCH next 0/3] iov: Optimise user copies

David Laight posted 3 patches 10 months, 3 weeks ago
lib/iov_iter.c | 97 ++++++++++++++++++++++++++++++++------------------
1 file changed, 63 insertions(+), 34 deletions(-)
[PATCH next 0/3] iov: Optimise user copies
Posted by David Laight 10 months, 3 weeks ago
The speculation barrier in access_ok() is expensive.

The first patch removes the initial checks when reading the iovec[].
The checks are repeated before the actual copy.

The second patch uses 'user address masking' if supported.

The third removes a lot of code for single entry iovec[].

David Laight (3):
  Remove access_ok() from import_iovec()
  Use masked user accesses
  Optimise __import_iovec_ubuf()

 lib/iov_iter.c | 97 ++++++++++++++++++++++++++++++++------------------
 1 file changed, 63 insertions(+), 34 deletions(-)

-- 
2.39.5
Re: [PATCH next 0/3] iov: Optimise user copies
Posted by Jens Axboe 10 months, 3 weeks ago
On 3/21/25 4:45 PM, David Laight wrote:
> The speculation barrier in access_ok() is expensive.
> 
> The first patch removes the initial checks when reading the iovec[].
> The checks are repeated before the actual copy.
> 
> The second patch uses 'user address masking' if supported.
> 
> The third removes a lot of code for single entry iovec[].

Looks good to me:

Reviewed-by: Jens Axboe <axboe@kernel.dk>

-- 
Jens Axboe
Re: [PATCH next 0/3] iov: Optimise user copies
Posted by Linus Torvalds 10 months, 3 weeks ago
On Fri, 21 Mar 2025 at 15:46, David Laight <david.laight.linux@gmail.com> wrote:
>
> The speculation barrier in access_ok() is expensive.
>
> The first patch removes the initial checks when reading the iovec[].
> The checks are repeated before the actual copy.
>
> The second patch uses 'user address masking' if supported.
>
> The third removes a lot of code for single entry iovec[].

Ack, except I'd really like to see numbers for things that claim to
remove expensive stuff.

But yeah, the patches look sane.

          Linus
Re: [PATCH next 0/3] iov: Optimise user copies
Posted by David Laight 10 months, 2 weeks ago
On Fri, 21 Mar 2025 16:35:52 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Fri, 21 Mar 2025 at 15:46, David Laight <david.laight.linux@gmail.com> wrote:
> >
> > The speculation barrier in access_ok() is expensive.
> >
> > The first patch removes the initial checks when reading the iovec[].
> > The checks are repeated before the actual copy.
> >
> > The second patch uses 'user address masking' if supported.
> >
> > The third removes a lot of code for single entry iovec[].  
> 
> Ack, except I'd really like to see numbers for things that claim to
> remove expensive stuff.

I've finally managed to take some measurements that make sense.
(measurements on a zen5).
Values are from:
	mfence; start = rdpmc; mfence;
	// A bit of setup code and an indirect function call
	syscall(SYS_gettid/pwritev/preadv, ...)
	mfence; end = rdpmc; mfence;
	cycles = end - start;
The prints are done after all the tests.
The preadv/pwritev are for 8 buffers of 1 byte at offset 0 to /dev/zero.
gettid() is used as a base, a completely empty test is ~180 clocks.
(I've deleted the other 30 results for gettid - they are very consistent.)

I think they show an improvement, but it is similar to the change in gettid.

original
              bac  3790   396   397   397   397   397   397   397   397   397  gettid
                8  8792  1088   739   678   674   674   669   669   669   669  pwrite
                8   717   676   669   669   669   669   669   669   669   669  pwrite
                8   692   669   669   669   669   669   671   669   669   669  pwrite
                8   692   669   669   669   669   669   669   669   669   669  pwrite
                8 16744  1221   805   827   824   847   823   819   846   838  pread
                8   865   857   827   828   808   820   845   828   821   824  pread
                8   823   804   802   813   811   845   836   839   815   813  pread
                8   862   813   853   846   847   828   821   820   806   846  pread

address masking
              bdc  3218   592   391   391   391   391   391   391   391   391  gettid
                8  7701  1091  1913   676   672   665   665   665   665   665  pwrite
                8   722   669   665   665   665   665   665   665   665   665  pwrite
                8   690   665   665   665   665   665   665   665   665   665  pwrite
                8   690   666   665   665   665   665   665   665   665   665  pwrite
                8 13532  1114   802   812   798   810   824   806   825   829  pread
                8   808   805   806   838   799   829   839   831   843   796  pread
                8   812   825   835   802   803   806   809   829   827   806  pread
                8   807   801   842   804   806   828   811   824   810   808  pread

I ran an extra test with a barrier_nospec() following the access_ok() in the
'copy from user' path.
I'm not sure if its absence is an oversight or a valid decision on the assumption
that the data being read is just 'data' and not used for any control decisions.
Oddly this makes pread more expensive even though the change in is the write path.
I suspect the cache alignment of all the code changed.

fenced
              bc5  4681   394   395   395   395   395   395   395   395   395  gettid
                8  8988  1005  1114   772   688   684   684   684   684   684  pwrite
                8   742   688   684   684   684   684   684   684   684   684  pwrite
                8   709   681   684   684   684   684   684   684   684   685  pwrite
                8   690   684   684   684   684   684   684   681   684   684  pwrite
                8 13259  1025   813   834   825   833   833   833   830   815  pread
                8   821   819   837   827   837   837   837   837   837   837  pread
                8   819   834   819   833   837   834   809   833   837   837  pread
                8   816   819   833   837   837   837   837   837   834   809  pread

	David

~                                                                                                                      
> 
> But yeah, the patches look sane.
> 
>           Linus
Re: [PATCH next 0/3] iov: Optimise user copies
Posted by David Laight 10 months, 3 weeks ago
On Fri, 21 Mar 2025 16:35:52 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Fri, 21 Mar 2025 at 15:46, David Laight <david.laight.linux@gmail.com> wrote:
> >
> > The speculation barrier in access_ok() is expensive.
> >
> > The first patch removes the initial checks when reading the iovec[].
> > The checks are repeated before the actual copy.
> >
> > The second patch uses 'user address masking' if supported.
> >
> > The third removes a lot of code for single entry iovec[].  
> 
> Ack, except I'd really like to see numbers for things that claim to
> remove expensive stuff.

Except that some of the 'expensive stuff' is missing!

copy_from_user_iter() does:
	if (access_ok())
		raw_copy_from_user();
So it is missing the barrier_nospec().
The error handling is also different from _inline_copy_from_user().
(It doesn't zero-fill after a partial read.)

The observant will also notice that it is missing the massive
performance hit (and code bloat) of check_copy_size() (usercopy hardening).

Talking of performance I've dug out my clock cycle measuring code
(still full of different ipcsum functions).
I'm sure I got 12 bytes/clock on my i7-7 for the loop in the current kernel,
but it is only giving 10 today (possibly I don't have the latest version).
OTOH my new zen5 runs the adxo/adxc loop at 16 bytes/clock (i7-7 manages 12).
I'm going to try to find time for some memcpy() experiments.

	David

> 
> But yeah, the patches look sane.
> 
>           Linus
Re: [PATCH next 0/3] iov: Optimise user copies
Posted by David Laight 10 months, 3 weeks ago
On Fri, 21 Mar 2025 16:35:52 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Fri, 21 Mar 2025 at 15:46, David Laight <david.laight.linux@gmail.com> wrote:
> >
> > The speculation barrier in access_ok() is expensive.
> >
> > The first patch removes the initial checks when reading the iovec[].
> > The checks are repeated before the actual copy.
> >
> > The second patch uses 'user address masking' if supported.
> >
> > The third removes a lot of code for single entry iovec[].  
> 
> Ack, except I'd really like to see numbers for things that claim to
> remove expensive stuff.

Testing readv() of /dev/zero or writev() of /dev/null probably show
most gain.

I did do some allmodconfig builds and got no change, but I might have the
lfence compiled out of access_ok().
In any case kernel builds are pretty much user space limited.

	David

> 
> But yeah, the patches look sane.
> 
>           Linus