[PATCH V4 00/27] ublk: add UBLK_F_BATCH_IO

Ming Lei posted 27 patches 1 week, 3 days ago
[PATCH V4 00/27] ublk: add UBLK_F_BATCH_IO
Posted by Ming Lei 1 week, 3 days ago
Hello,

This patchset adds the UBLK_F_BATCH_IO feature, which lets the kernel and the
ublk server communicate in batches:

- Per-queue vs. per-I/O: commands operate on queues rather than on individual I/Os

- Batch processing: multiple I/Os are handled in a single operation

- Multishot commands: use io_uring multishot to reduce submission overhead

- Flexible task assignment: any task can handle any I/O (no per-I/O daemons)

- Better load balancing: tasks can adjust their workload dynamically

- Helps future optimizations:
	- blk-mq batch tag freeing
	- io-poll support
	- per-task batching to avoid the per-io lock
	- fetch command priority

- Simplifies the command cancel process with a per-queue lock

Selftests are provided.
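
To make the per-queue batching idea concrete, here is a minimal userspace sketch, not the actual UAPI. It models a per-queue commit buffer that collects (tag, result) pairs and hands them over in one operation. The names `commit_buf`, `commit_elem`, `commit_add`, `commit_flush` and `commit_ops_for`, as well as the capacity of 32, are assumptions made for illustration only; the real definitions added by this series are in include/uapi/linux/ublk_cmd.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COMMIT_BUF_CAP 32 /* hypothetical batch size */

/* Hypothetical element: one completed I/O, identified by its tag. */
struct commit_elem {
	uint16_t tag;
	int32_t result;
};

/* Hypothetical per-queue commit buffer plus a counter of flushes. */
struct commit_buf {
	struct commit_elem elems[COMMIT_BUF_CAP];
	size_t nr;      /* elements currently buffered */
	size_t flushes; /* batched commit operations issued so far */
};

/* Stand-in for issuing one batched commit command for the whole buffer. */
static void commit_flush(struct commit_buf *b)
{
	if (b->nr == 0)
		return;
	b->flushes++;
	b->nr = 0;
}

/* Queue one completion; flush automatically when the buffer is full. */
static void commit_add(struct commit_buf *b, uint16_t tag, int32_t result)
{
	assert(b->nr < COMMIT_BUF_CAP);
	b->elems[b->nr].tag = tag;
	b->elems[b->nr].result = result;
	if (++b->nr == COMMIT_BUF_CAP)
		commit_flush(b);
}

/* Complete nr_ios I/Os and return how many commit operations were needed. */
size_t commit_ops_for(size_t nr_ios)
{
	struct commit_buf b = { .nr = 0, .flushes = 0 };

	for (size_t i = 0; i < nr_ios; i++)
		commit_add(&b, (uint16_t)i, 4096);
	commit_flush(&b); /* flush the partial tail, if any */
	return b.flushes;
}
```

Under these assumptions, commit_ops_for(64) needs two commit operations, where a per-I/O scheme would issue 64 separate commands.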


Performance test results (IOPS) on V3:

- page copy
tools/testing/selftests/ublk/kublk add -t null -q 16 [-b]

- zero copy (--auto_zc)
tools/testing/selftests/ublk/kublk add -t null -q 16 --auto_zc [-b]

- IO test
taskset -c 0-31 fio/t/io_uring -p0 -n $JOBS -r 30 /dev/ublkb0

1) 16 jobs IO
- page copy:             37.77M vs. 42.40M (BATCH_IO), +12%
- zero copy (--auto_zc): 42.83M vs. 44.43M (BATCH_IO), +3.7%

2) single job IO
- page copy:             2.54M vs. 2.6M (BATCH_IO),  +2.3%
- zero copy (--auto_zc): 3.13M vs. 3.35M (BATCH_IO), +7%
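
The percentages above follow directly from the raw IOPS figures. The small helper below spells out the arithmetic; the function name `iops_gain_pct` is hypothetical, and the inputs are just the numbers reported in this cover letter.

```c
#include <assert.h>

/* Relative IOPS change of the batched run over the baseline, in percent. */
double iops_gain_pct(double baseline, double batched)
{
	assert(baseline > 0.0);
	return (batched - baseline) / baseline * 100.0;
}
```

For example, iops_gain_pct(37.77, 42.40) is roughly 12.26 and iops_gain_pct(2.54, 2.6) is roughly 2.36, in line with the +12% and +2.3% quoted for page copy.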


V4:
	- fix handling when running out of mshot buffer: the request has to
	  be un-prepared for zero copy
	- don't expose unused tags to userspace
	- replace the fixed buffer with a plain user buffer for
	  UBLK_U_IO_PREP_IO_CMDS and UBLK_U_IO_COMMIT_IO_CMDS
	- replace the iov iterator with plain copy_from_user() in
	  ublk_walk_cmd_buf(); the code is simpler and performs better
	- don't touch sqe->len for UBLK_U_IO_PREP_IO_CMDS and
	  UBLK_U_IO_COMMIT_IO_CMDS (Caleb Sander Mateos)
	- use READ_ONCE() to access sqe->addr (Caleb Sander Mateos)
	- various patch style fixes (Caleb Sander Mateos)
	- inline __kfifo_alloc() (Caleb Sander Mateos)
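
The READ_ONCE() item reflects a general rule for memory that userspace can modify concurrently, such as an SQE: read each field exactly once into a local variable and use only that local afterwards, so that a check and the later use cannot observe different values. What follows is a userspace sketch of that pattern, not the driver code. `fake_sqe`, `READ_ONCE_U64` and `cmd_buf_addr` are hypothetical names, the alignment check is only an illustrative validation, and the volatile cast merely approximates the kernel macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Approximation of the kernel's READ_ONCE() for a 64-bit field:
 * the volatile access forces exactly one load from memory. */
#define READ_ONCE_U64(x) (*(const volatile uint64_t *)&(x))

/* Hypothetical stand-in for the part of an SQE shared with userspace. */
struct fake_sqe {
	uint64_t addr;
};

/*
 * Validate the command buffer address and return it through *out.
 * Returns 0 on success, -1 if the address is zero or not 8-byte aligned
 * (an illustrative check, not the driver's actual rule).
 */
int cmd_buf_addr(const struct fake_sqe *sqe, uint64_t *out)
{
	uint64_t addr;

	assert(sqe != NULL && out != NULL);
	addr = READ_ONCE_U64(sqe->addr); /* single snapshot */

	if (addr == 0 || (addr & 7) != 0)
		return -1;
	*out = addr; /* use the validated snapshot, never re-read sqe->addr */
	return 0;
}
```

The point of the design is that validation and use operate on the same snapshot, which a second plain read of sqe->addr would not guarantee.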


V3:
	- rebase on for-6.19/block
	- use blk_mq_end_request_batch() to free requests in batch, only for
	  page copy
	- fix one IO hang issue caused by memory barrier ordering, and add
	  comments on the memory barrier pairing
	- add NUMA-aware kfifo_alloc_node()
	- fix one build warning reported by 0-DAY CI
	- selftests improvements & fixes

V2:
	- ublk_config_io_buf() vs. __ublk_fetch() order
	- code style cleanup
	- use READ_ONCE() to cache sqe data because the sqe copy recently
	  became conditional
	- don't use sqe->len for UBLK_U_IO_PREP_IO_CMDS &
	  UBLK_U_IO_COMMIT_IO_CMDS
	- fix one build warning
	- fix build_user_data()
	- run performance analysis, which found one bug in
	  io_uring_cmd_buffer_select(); the fix is already posted

Ming Lei (27):
  kfifo: add kfifo_alloc_node() helper for NUMA awareness
  ublk: add parameter `struct io_uring_cmd *` to
    ublk_prep_auto_buf_reg()
  ublk: add `union ublk_io_buf` with improved naming
  ublk: refactor auto buffer register in ublk_dispatch_req()
  ublk: pass const pointer to ublk_queue_is_zoned()
  ublk: add helper of __ublk_fetch()
  ublk: define ublk_ch_batch_io_fops for the coming feature F_BATCH_IO
  ublk: prepare for not tracking task context for command batch
  ublk: add new batch command UBLK_U_IO_PREP_IO_CMDS &
    UBLK_U_IO_COMMIT_IO_CMDS
  ublk: handle UBLK_U_IO_PREP_IO_CMDS
  ublk: handle UBLK_U_IO_COMMIT_IO_CMDS
  ublk: add io events fifo structure
  ublk: add batch I/O dispatch infrastructure
  ublk: add UBLK_U_IO_FETCH_IO_CMDS for batch I/O processing
  ublk: abort requests filled in event kfifo
  ublk: add new feature UBLK_F_BATCH_IO
  ublk: document feature UBLK_F_BATCH_IO
  ublk: implement batch request completion via
    blk_mq_end_request_batch()
  selftests: ublk: fix user_data truncation for tgt_data >= 256
  selftests: ublk: replace assert() with ublk_assert()
  selftests: ublk: add ublk_io_buf_idx() for returning io buffer index
  selftests: ublk: add batch buffer management infrastructure
  selftests: ublk: handle UBLK_U_IO_PREP_IO_CMDS
  selftests: ublk: handle UBLK_U_IO_COMMIT_IO_CMDS
  selftests: ublk: handle UBLK_U_IO_FETCH_IO_CMDS
  selftests: ublk: add --batch/-b for enabling F_BATCH_IO
  selftests: ublk: support arbitrary threads/queues combination

 Documentation/block/ublk.rst                  |   60 +-
 drivers/block/ublk_drv.c                      | 1312 +++++++++++++++--
 include/linux/kfifo.h                         |   34 +-
 include/uapi/linux/ublk_cmd.h                 |   85 ++
 lib/kfifo.c                                   |    8 +-
 tools/testing/selftests/ublk/Makefile         |    7 +-
 tools/testing/selftests/ublk/batch.c          |  604 ++++++++
 tools/testing/selftests/ublk/common.c         |    2 +-
 tools/testing/selftests/ublk/file_backed.c    |   11 +-
 tools/testing/selftests/ublk/kublk.c          |  143 +-
 tools/testing/selftests/ublk/kublk.h          |  195 ++-
 tools/testing/selftests/ublk/null.c           |   18 +-
 tools/testing/selftests/ublk/stripe.c         |   17 +-
 .../testing/selftests/ublk/test_generic_14.sh |   32 +
 .../testing/selftests/ublk/test_generic_15.sh |   30 +
 .../testing/selftests/ublk/test_generic_16.sh |   30 +
 .../testing/selftests/ublk/test_stress_06.sh  |   45 +
 .../testing/selftests/ublk/test_stress_07.sh  |   44 +
 tools/testing/selftests/ublk/utils.h          |   64 +
 19 files changed, 2563 insertions(+), 178 deletions(-)
 create mode 100644 tools/testing/selftests/ublk/batch.c
 create mode 100755 tools/testing/selftests/ublk/test_generic_14.sh
 create mode 100755 tools/testing/selftests/ublk/test_generic_15.sh
 create mode 100755 tools/testing/selftests/ublk/test_generic_16.sh
 create mode 100755 tools/testing/selftests/ublk/test_stress_06.sh
 create mode 100755 tools/testing/selftests/ublk/test_stress_07.sh

-- 
2.47.0
Re: [PATCH V4 00/27] ublk: add UBLK_F_BATCH_IO
Posted by Ming Lei 3 days, 13 hours ago
On Fri, Nov 21, 2025 at 09:58:22AM +0800, Ming Lei wrote:
> [...]

Hi Caleb Sander Mateos and Jens,

Caleb has reviewed patches 1-8; driver patches 9-18 have not been
reviewed yet.

I'd like to hear your thoughts on how to move forward. So far, there seem to be
several options:

1) merge patches 1-6 to v6.19 first, as prep patches for BATCH_IO

2) delay the whole patchset to the v6.20 cycle

3) merge the whole patchset to v6.19

I am fine with any of them; which one do you prefer?

BTW, V4 passes all builtin function and stress tests, and there is just one small bug
fix not posted yet, which can be a follow-up. The new feature takes a standalone
code path, so the regression risk is pretty small.


Thanks,
Ming
Re: [PATCH V4 00/27] ublk: add UBLK_F_BATCH_IO
Posted by Jens Axboe 3 days, 9 hours ago
On 11/28/25 4:59 AM, Ming Lei wrote:
> On Fri, Nov 21, 2025 at 09:58:22AM +0800, Ming Lei wrote:
>> [...]
> 
> Hi Caleb Sander Mateos and Jens,
> 
> Caleb has reviewed patches 1-8; driver patches 9-18 have not been
> reviewed yet.
> 
> I'd like to hear your thoughts on how to move forward. So far, there seem to be
> several options:
> 
> 1) merge patches 1-6 to v6.19 first, as prep patches for BATCH_IO
> 
> 2) delay the whole patchset to the v6.20 cycle
> 
> 3) merge the whole patchset to v6.19
> 
> I am fine with any of them; which one do you prefer?
> 
> BTW, V4 passes all builtin function and stress tests, and there is just one small bug
> fix not posted yet, which can be a follow-up. The new feature takes a standalone
> code path, so the regression risk is pretty small.

I'm fine taking the whole thing for 6.19. Caleb let me know if you
disagree. I'll queue 1..6 for now, then can follow up later today with
the rest as needed.

-- 
Jens Axboe
Re: [PATCH V4 00/27] ublk: add UBLK_F_BATCH_IO
Posted by Caleb Sander Mateos 3 days, 6 hours ago
On Fri, Nov 28, 2025 at 8:19 AM Jens Axboe <axboe@kernel.dk> wrote:
>
> On 11/28/25 4:59 AM, Ming Lei wrote:
> > [...]
>
> I'm fine taking the whole thing for 6.19. Caleb let me know if you
> disagree. I'll queue 1..6 for now, then can follow up later today with
> the rest as needed.

Sorry I haven't gotten around to reviewing the rest of the series yet.
I will try to take a look at them all this weekend. I'm not sure the
batching feature would make sense for our ublk application use case,
but I have no objection to it as long as it doesn't regress the
non-batched ublk behavior/performance.
No problem with queueing up patches 1-6 now (though patch 1 may need
an ack from a kfifo maintainer?).

Thanks,
Caleb
Re: [PATCH V4 00/27] ublk: add UBLK_F_BATCH_IO
Posted by Ming Lei 3 days ago
On Fri, Nov 28, 2025 at 11:07:17AM -0800, Caleb Sander Mateos wrote:
> On Fri, Nov 28, 2025 at 8:19 AM Jens Axboe <axboe@kernel.dk> wrote:
> > [...]
> >
> > I'm fine taking the whole thing for 6.19. Caleb let me know if you
> > disagree. I'll queue 1..6 for now, then can follow up later today with
> > the rest as needed.
> 
> Sorry I haven't gotten around to reviewing the rest of the series yet.
> I will try to take a look at them all this weekend. I'm not sure the
> batching feature would make sense for our ublk application use case,
> but I have no objection to it as long as it doesn't regress the
> non-batched ublk behavior/performance.
> No problem with queueing up patches 1-6 now (though patch 1 may need
> an ack from a kfifo maintainer?).

BTW, the BATCH_IO feature brings several benefits:

- batch blk-mq completion: page copy IO mode has shown >12% IOPS improvement, and
  there is a chance to apply it to zero copy too in the future

- io poll becomes much easier to support: it can be used to poll an nvme char/block
  device to get better IOPS

- the io cancel code path becomes less fragile and easier to debug: in a typical
  implementation, there are only one or two per-queue FETCH (multishot)
  commands; the others are just sync one-shot commands

- more chances to improve perf: much of the generic uring_cmd code path
  cost is saved, such as security_uring_cmd()

- a `perf bug fix` for UBLK_F_PER_IO_DAEMON, together with robust load balancing
  support

	IOPS is improved by 4X-5X in `fio/t/io_uring -p0 /dev/ublkbN` between:
		./kublk add -t null  --nthreads 8 -q 4 --per_io_tasks
		and
		./kublk add -t null  --nthreads 8 -q 4 -b

- with the per-io lock, the fast io path becomes more robust; the lock can still be
  bypassed in the future in case of per-io-daemon


The cost is some complexity in the ublk server implementation for maintaining
one or two per-queue FETCH buffers and one or two per-queue COMMIT buffers.
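
As a rough model of how a FETCH buffer gets filled, the sketch below queues pending I/O tags in a power-of-two ring (in the spirit of a kfifo) and drains them in batches into a caller-supplied buffer. It is a single-threaded userspace illustration under stated assumptions: `tag_ring`, `ring_put`, `ring_get_batch` and the sizes are hypothetical, and no locking or memory barriers are shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8 /* hypothetical; must be a power of two */

/* Hypothetical ring of pending I/O tags; in/out are free-running counters. */
struct tag_ring {
	uint16_t tags[RING_SIZE];
	unsigned int in;
	unsigned int out;
};

/* Queue one tag; returns 1 on success, 0 if the ring is full. */
int ring_put(struct tag_ring *r, uint16_t tag)
{
	if (r->in - r->out == RING_SIZE)
		return 0;
	r->tags[r->in++ & (RING_SIZE - 1)] = tag;
	return 1;
}

/* Move up to max tags into buf (the "fetch buffer"); returns the count. */
size_t ring_get_batch(struct tag_ring *r, uint16_t *buf, size_t max)
{
	size_t n = 0;

	assert(buf != NULL);
	while (n < max && r->out != r->in)
		buf[n++] = r->tags[r->out++ & (RING_SIZE - 1)];
	return n;
}
```

With a four-entry fetch buffer, six queued tags are delivered as one batch of four followed by one batch of two, rather than as six separate notifications.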


Thanks,
Ming

Re: (subset) [PATCH V4 00/27] ublk: add UBLK_F_BATCH_IO
Posted by Jens Axboe 3 days, 9 hours ago
On Fri, 21 Nov 2025 09:58:22 +0800, Ming Lei wrote:
> This patchset adds UBLK_F_BATCH_IO feature for communicating between kernel and ublk
> server in batching way:
> 
> - Per-queue vs Per-I/O: Commands operate on queues rather than individual I/Os
> 
> - Batch processing: Multiple I/Os are handled in single operation
> 
> [...]

Applied, thanks!

[01/27] kfifo: add kfifo_alloc_node() helper for NUMA awareness
        commit: 9574b21e952256d4fa3c8797c94482a240992d18
[02/27] ublk: add parameter `struct io_uring_cmd *` to ublk_prep_auto_buf_reg()
        commit: 3035b9b46b0611898babc0b96ede65790d3566f7
[03/27] ublk: add `union ublk_io_buf` with improved naming
        commit: 8d61ece156bd4f2b9e7d3b2a374a26d42c7a4a06
[04/27] ublk: refactor auto buffer register in ublk_dispatch_req()
        commit: 0a9beafa7c633e6ff66b05b81eea78231b7e6520
[05/27] ublk: pass const pointer to ublk_queue_is_zoned()
        commit: 3443bab2f8e44e00adaf76ba677d4219416376f2
[06/27] ublk: add helper of __ublk_fetch()
        commit: 28d7a371f021419cb6c3a243f5cf167f88eb51b9

Best regards,
-- 
Jens Axboe