[PATCH 0/6] block: add blk_io_plug_call() API

Stefan Hajnoczi posted 6 patches 11 months, 1 week ago
Failed in applying to current master (apply log)
There is a newer version of this series
MAINTAINERS                       |   1 +
include/block/block-io.h          |   3 -
include/block/block_int-common.h  |  11 ---
include/block/raw-aio.h           |  14 ---
include/sysemu/block-backend-io.h |  13 +--
block/blkio.c                     |  40 ++++----
block/block-backend.c             |  22 -----
block/file-posix.c                |  38 -------
block/io.c                        |  37 -------
block/io_uring.c                  |  45 ++++-----
block/linux-aio.c                 |  41 +++-----
block/nvme.c                      |  44 +++------
block/plug.c                      | 159 ++++++++++++++++++++++++++++++
hw/block/dataplane/xen-block.c    |   8 +-
hw/block/virtio-blk.c             |   4 +-
hw/scsi/virtio-scsi.c             |   6 +-
block/meson.build                 |   1 +
block/trace-events                |   5 +-
18 files changed, 236 insertions(+), 256 deletions(-)
create mode 100644 block/plug.c
[PATCH 0/6] block: add blk_io_plug_call() API
Posted by Stefan Hajnoczi 11 months, 1 week ago
The existing blk_io_plug() API is not block layer multi-queue friendly because
the plug state is per-BlockDriverState.

Change blk_io_plug()'s implementation so it is thread-local. This is done by
introducing the blk_io_plug_call() function that block drivers use to batch
calls while plugged. It is relatively easy to convert block drivers from
.bdrv_co_io_plug() to blk_io_plug_call().

Random read 4KB performance with virtio-blk on a host NVMe block device:

iodepth   iops   change vs today
1        45612   -4%
2        87967   +2%
4       129872   +0%
8       171096   -3%
16      194508   -4%
32      208947   -1%
64      217647   +0%
128     229629   +0%

The results are within the noise for these benchmarks. This is to be expected
because the plugging behavior for a single thread hasn't changed in this patch
series, only that the state is thread-local now.

The following graph compares several approaches:
https://vmsplice.net/~stefan/blk_io_plug-thread-local.png
- v7.2.0: before most of the multi-queue block layer changes landed.
- with-blk_io_plug: today's post-8.0.0 QEMU.
- blk_io_plug-thread-local: this patch series.
- no-blk_io_plug: what happens when we simply remove plugging?
- call-after-dispatch: what if we integrate plugging into the event loop? I
  decided against this approach in the end because it's more likely to
  introduce performance regressions since I/O submission is deferred until the
  end of the event loop iteration.

Aside from the no-blk_io_plug case, which bottlenecks much earlier than the
others, we see that all plugging approaches are more or less equivalent in this
benchmark. It is also clear that QEMU 8.0.0 has lower performance than 7.2.0.

The Ansible playbook, fio results, and a Jupyter notebook are available here:
https://github.com/stefanha/qemu-perf/tree/remove-blk_io_plug

Stefan Hajnoczi (6):
  block: add blk_io_plug_call() API
  block/nvme: convert to blk_io_plug_call() API
  block/blkio: convert to blk_io_plug_call() API
  block/io_uring: convert to blk_io_plug_call() API
  block/linux-aio: convert to blk_io_plug_call() API
  block: remove bdrv_co_io_plug() API

 MAINTAINERS                       |   1 +
 include/block/block-io.h          |   3 -
 include/block/block_int-common.h  |  11 ---
 include/block/raw-aio.h           |  14 ---
 include/sysemu/block-backend-io.h |  13 +--
 block/blkio.c                     |  40 ++++----
 block/block-backend.c             |  22 -----
 block/file-posix.c                |  38 -------
 block/io.c                        |  37 -------
 block/io_uring.c                  |  45 ++++-----
 block/linux-aio.c                 |  41 +++-----
 block/nvme.c                      |  44 +++------
 block/plug.c                      | 159 ++++++++++++++++++++++++++++++
 hw/block/dataplane/xen-block.c    |   8 +-
 hw/block/virtio-blk.c             |   4 +-
 hw/scsi/virtio-scsi.c             |   6 +-
 block/meson.build                 |   1 +
 block/trace-events                |   5 +-
 18 files changed, 236 insertions(+), 256 deletions(-)
 create mode 100644 block/plug.c

-- 
2.40.1