[RFC 00/11] optimise registered buffer/file updates

Pavel Begunkov posted 11 patches 2 years, 10 months ago
There is a newer version of this series
Posted by Pavel Begunkov 2 years, 10 months ago
Updating registered files and buffers is a very slow operation, which
makes it not feasible for workloads with medium update frequencies.
Rework the underlying rsrc infra for greater performance and lesser
memory footprint.

The improvement is ~11x for a benchmark updating files in a loop
(1040K -> 11468K updates / sec).

The set requires a couple of patches from the 6.3 branch, for that
reason it's an RFC and will be resent after merge.

https://github.com/isilence/linux.git optimise-rsrc-update

Pavel Begunkov (11):
  io_uring/rsrc: use non-pcpu refcounts for nodes
  io_uring/rsrc: keep cached refs per node
  io_uring: don't put nodes under spinlocks
  io_uring: io_free_req() via tw
  io_uring/rsrc: protect node refs with uring_lock
  io_uring/rsrc: kill rsrc_ref_lock
  io_uring/rsrc: rename rsrc_list
  io_uring/rsrc: optimise io_rsrc_put allocation
  io_uring/rsrc: don't offload node free
  io_uring/rsrc: cache struct io_rsrc_node
  io_uring/rsrc: add lockdep sanity checks

 include/linux/io_uring_types.h |   7 +-
 io_uring/io_uring.c            |  47 ++++++----
 io_uring/rsrc.c                | 152 +++++++++++----------------------
 io_uring/rsrc.h                |  50 ++++++-----
 4 files changed, 105 insertions(+), 151 deletions(-)

-- 
2.39.1
Re: [RFC 00/11] optimise registered buffer/file updates
Posted by Jens Axboe 2 years, 10 months ago
On 3/30/23 8:53 AM, Pavel Begunkov wrote:
> Updating registered files and buffers is a very slow operation, which
> makes it not feasible for workloads with medium update frequencies.
> Rework the underlying rsrc infra for greater performance and lesser
> memory footprint.
> 
> The improvement is ~11x for a benchmark updating files in a loop
> (1040K -> 11468K updates / sec).
> 
> The set requires a couple of patches from the 6.3 branch, for that
> reason it's an RFC and will be resent after merge.

Looks pretty sane to me, didn't find anything immediately wrong. I
do wonder if we should have a conditional uring_lock helper, we do
have a few of those. But not really related to this series, as it
just moves one around.
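
As an illustration of what a conditional lock helper can look like, here is a small userspace model. It is only a sketch: `struct toy_ctx`, `TOY_F_UNLOCKED`, `toy_cond_lock()` and `toy_cond_unlock()` are hypothetical names, and the "lock" is a plain counter rather than a real mutex, so it shows the calling convention and nothing about contention.

```c
/* Toy stand-in for a context with a lock; a real mutex would block. */
struct toy_ctx {
	int lock_depth;	/* 1 while "locked", 0 otherwise */
};

/* Hypothetical flag: the caller does not hold the lock yet. */
#define TOY_F_UNLOCKED 0x1u

/* Take the lock only if the caller says it is not already held. */
static void toy_cond_lock(struct toy_ctx *ctx, unsigned int flags)
{
	if (flags & TOY_F_UNLOCKED)
		ctx->lock_depth++;
}

/* Drop the lock only if toy_cond_lock() took it for the same flags. */
static void toy_cond_unlock(struct toy_ctx *ctx, unsigned int flags)
{
	if (flags & TOY_F_UNLOCKED)
		ctx->lock_depth--;
}
```

The point of such a helper is that the flag check lives in one place, so a call site that can be reached both with and without the lock held does not have to open-code the condition.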

-- 
Jens Axboe


Re: [RFC 00/11] optimise registered buffer/file updates
Posted by Gabriel Krisman Bertazi 2 years, 10 months ago
Pavel,

Pavel Begunkov <asml.silence@gmail.com> writes:
> Updating registered files and buffers is a very slow operation, which
> makes it not feasible for workloads with medium update frequencies.
> Rework the underlying rsrc infra for greater performance and lesser
> memory footprint.
>
> The improvement is ~11x for a benchmark updating files in a loop
> (1040K -> 11468K updates / sec).

Nice. That's a really impressive improvement.

I've been adding io_uring test cases for automated performance
regression testing with mmtests (open source).  I'd love to take a look
at this test case and adapt it to mmtests, so we can pick it up and run
it frequently.

Is it something you can share?

-- 
Gabriel Krisman Bertazi
Re: [RFC 00/11] optimise registered buffer/file updates
Posted by Pavel Begunkov 2 years, 10 months ago
On 3/31/23 14:35, Gabriel Krisman Bertazi wrote:
> Pavel,
> 
> Pavel Begunkov <asml.silence@gmail.com> writes:
>> Updating registered files and buffers is a very slow operation, which
>> makes it not feasible for workloads with medium update frequencies.
>> Rework the underlying rsrc infra for greater performance and lesser
>> memory footprint.
>>
>> The improvement is ~11x for a benchmark updating files in a loop
>> (1040K -> 11468K updates / sec).
> 
> Nice. That's a really impressive improvement.
> 
> I've been adding io_uring test cases for automated performance
> regression testing with mmtests (open source).  I'd love to take a look
> at this test case and adapt it to mmtests, so we can pick it up and run
> it frequently.
> 
> Is it something you can share?

I'll post it later.

The test is quite stupid, and with the patches less than 10% of CPU
cycles go to the update machinery (against 90+% without them); the rest
is spent on syscalls, submitting update requests, etc., so it almost
hits the limit.
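
A back-of-the-envelope check of the "almost hits the limit" remark, as a sketch. It assumes the non-update cost per iteration (syscalls, submission, etc.) is the same before and after the patches, which is a simplification. The function names are hypothetical.

```c
/*
 * Upper bound on any further speedup if the update path became free,
 * given the fraction of cycles it currently takes (0 <= frac < 1).
 */
static double remaining_headroom(double update_frac)
{
	return 1.0 / (1.0 - update_frac);
}

/*
 * Fraction of cycles the update path must have taken before the change.
 * With a fixed non-update cost T per iteration:
 *   old_time = T / (1 - f_old), new_time = T / (1 - f_new)
 *   speedup  = old_time / new_time = (1 - f_new) / (1 - f_old)
 * Solving for f_old gives the expression below.
 */
static double implied_old_fraction(double speedup, double new_frac)
{
	return 1.0 - (1.0 - new_frac) / speedup;
}
```

With 10% of cycles left in the update path, `remaining_headroom(0.10)` is about 1.11, so at most another ~11% could be gained from this path. Plugging in the ~11x speedup and the same 10% gives an old fraction of roughly 0.92, which is consistent with the "90+" figure.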

Another test we can do is to measure the latency between the point we
ask for a rsrc to be removed and when it actually gets destroyed/freed,
e.g. tags will help with that. It should have improved nicely as well,
since the series removes the RCU grace period and other bouncing.
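
The bookkeeping for such a latency test could look roughly like the sketch below. It assumes each resource is registered with a unique non-zero tag and that the test sees a completion carrying that tag once the resource has been released. The io_uring calls themselves are left out, timestamps are passed in by the caller, and `struct removal_tracker` and both helpers are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_TRACKED 64

/* Pending removals: tag plus the time the removal was requested. */
struct removal_tracker {
	uint64_t tag[MAX_TRACKED];
	uint64_t removed_at_ns[MAX_TRACKED];
	size_t nr;
};

/* Record a removal request; returns 0, or -1 if the table is full. */
static int tracker_mark_removed(struct removal_tracker *t, uint64_t tag,
				uint64_t now_ns)
{
	if (t->nr == MAX_TRACKED)
		return -1;
	t->tag[t->nr] = tag;
	t->removed_at_ns[t->nr] = now_ns;
	t->nr++;
	return 0;
}

/*
 * Call when the release notification for a tag arrives.
 * Returns the removal-to-release latency in ns, or -1 for an unknown tag.
 */
static int64_t tracker_on_release(struct removal_tracker *t, uint64_t tag,
				  uint64_t now_ns)
{
	size_t i;

	for (i = 0; i < t->nr; i++) {
		if (t->tag[i] == tag) {
			int64_t latency = (int64_t)(now_ns - t->removed_at_ns[i]);

			/* Drop the entry by moving the last one into its slot. */
			t->nr--;
			t->tag[i] = t->tag[t->nr];
			t->removed_at_ns[i] = t->removed_at_ns[t->nr];
			return latency;
		}
	}
	return -1;
}
```

A test would call `tracker_mark_removed()` right before submitting the update that drops a resource, and `tracker_on_release()` when it reaps the completion with the matching tag, then aggregate the returned latencies.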

-- 
Pavel Begunkov