These patches enable Linux io_uring flags that can improve performance.
Bernd Schubert mentioned io_uring_setup(2) flags that may improve performance:
- IORING_SETUP_SINGLE_ISSUER: optimization when only one thread uses an io_uring context
- IORING_SETUP_COOP_TASKRUN: avoids IPIs by running completion task work when the application next enters the kernel
- IORING_SETUP_TASKRUN_FLAG: makes COOP_TASKRUN work with userspace CQ ring polling
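As a rough illustration of how an application might request these flags while still running on kernels that lack them, here is a minimal sketch. It is not the code from this series. The SKETCH_* constants repeat the values from the kernel's <linux/io_uring.h> so the sketch builds without kernel headers, and sketch_setup_fn, the two fake "kernels", and all sketch_* helpers are hypothetical names standing in for a real call such as liburing's io_uring_queue_init().

```c
#include <errno.h>
#include <stdbool.h>

/* Values as defined in the kernel's <linux/io_uring.h>, repeated here so
 * the sketch is self-contained. */
#define SKETCH_SETUP_COOP_TASKRUN   (1U << 8)   /* IORING_SETUP_COOP_TASKRUN */
#define SKETCH_SETUP_TASKRUN_FLAG   (1U << 9)   /* IORING_SETUP_TASKRUN_FLAG */
#define SKETCH_SETUP_SINGLE_ISSUER  (1U << 12)  /* IORING_SETUP_SINGLE_ISSUER */
#define SKETCH_SQ_TASKRUN           (1U << 2)   /* IORING_SQ_TASKRUN */

/* Hypothetical hook standing in for the real ring setup call.
 * Returns 0 on success or a negative errno. */
typedef int (*sketch_setup_fn)(unsigned entries, unsigned flags);

/* Try the optional flags first. Kernels reject setup flags they do not
 * know with -EINVAL, so retry once without any optional flags. */
static int sketch_setup_with_fallback(sketch_setup_fn setup, unsigned entries,
                                      unsigned wanted, unsigned *used)
{
    int ret = setup(entries, wanted);

    if (ret == -EINVAL && wanted != 0) {
        wanted = 0;
        ret = setup(entries, wanted);
    }
    if (ret == 0) {
        *used = wanted;
    }
    return ret;
}

/* With TASKRUN_FLAG set, a loop that polls the CQ ring from userspace can
 * read the SQ ring flags to learn that task work is pending and that it
 * has to enter the kernel for completions to appear. */
static bool sketch_cq_poll_needs_enter(unsigned sq_ring_flags)
{
    return (sq_ring_flags & SKETCH_SQ_TASKRUN) != 0;
}

/* Fake "old kernel": accepts no optional flags at all. */
static int sketch_fake_old_kernel(unsigned entries, unsigned flags)
{
    (void)entries;
    return flags == 0 ? 0 : -EINVAL;
}

/* Fake "new kernel": accepts the flags, but TASKRUN_FLAG on its own is
 * invalid because it only makes sense together with a taskrun mode. */
static int sketch_fake_new_kernel(unsigned entries, unsigned flags)
{
    (void)entries;
    if ((flags & SKETCH_SETUP_TASKRUN_FLAG) &&
        !(flags & SKETCH_SETUP_COOP_TASKRUN)) {
        return -EINVAL;
    }
    return 0;
}
```

Calling sketch_setup_with_fallback() with all three flags keeps them on the fake new kernel and falls back to flags == 0 on the fake old one. A real implementation might prefer to drop flags one at a time rather than all at once; the all-or-nothing retry is only chosen here to keep the sketch short.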
Jens Axboe recently confirmed that SINGLE_ISSUER makes sense.
Suraj Shirvankar already started work on SINGLE_ISSUER in the past:
https://lore.kernel.org/qemu-devel/174293621917.22751.11381319865102029969-0@git.sr.ht/
Unlike Suraj's previous work, this series works around the need for the main
loop AioContext to be shared by multiple threads (vCPU threads and the
migration thread), which would otherwise conflict with SINGLE_ISSUER.
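To illustrate why the thread that creates an event-loop context matters for SINGLE_ISSUER, here is a small sketch using plain pthreads. It is not QEMU code: struct sketch_ctx and every sketch_* function are hypothetical, and the "owner" check only models the rule that a single-issuer ring is meant to be used by one thread. A context created by the parent thread is not usable from the worker under that rule, while a context created inside the worker's own run function is.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an event-loop context that owns a ring. */
struct sketch_ctx {
    pthread_t owner;    /* thread that created the context */
};

static void sketch_ctx_init(struct sketch_ctx *ctx)
{
    ctx->owner = pthread_self();
}

/* Models the single-issuer rule: only the creating thread may submit. */
static bool sketch_ctx_may_submit(const struct sketch_ctx *ctx)
{
    return pthread_equal(ctx->owner, pthread_self()) != 0;
}

struct sketch_result {
    const struct sketch_ctx *parent_ctx; /* created before the thread started */
    bool parent_ctx_usable;              /* filled in by the worker */
    bool own_ctx_usable;                 /* filled in by the worker */
};

/* Worker thread body: creates its own context and compares it with the
 * one handed over by the parent. */
static void *sketch_worker_run(void *opaque)
{
    struct sketch_result *r = opaque;
    struct sketch_ctx own;

    sketch_ctx_init(&own);  /* created in the thread that will use it */
    r->parent_ctx_usable = sketch_ctx_may_submit(r->parent_ctx);
    r->own_ctx_usable = sketch_ctx_may_submit(&own);
    return NULL;
}

/* Creates a context in the calling thread, then runs the worker and waits
 * for it. Returns 0 on success or a pthread error number. */
static int sketch_compare_creation_sites(struct sketch_result *r)
{
    struct sketch_ctx parent_ctx;
    pthread_t tid;
    int ret;

    sketch_ctx_init(&parent_ctx);
    r->parent_ctx = &parent_ctx;

    ret = pthread_create(&tid, NULL, sketch_worker_run, r);
    if (ret != 0) {
        return ret;
    }
    ret = pthread_join(tid, NULL);
    r->parent_ctx = NULL;   /* parent_ctx goes out of scope on return */
    return ret;
}
```

After sketch_compare_creation_sites() returns 0, parent_ctx_usable is false and own_ctx_usable is true, which is the motivation for moving context creation into the thread's run function.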
Here are the performance numbers for fio bs=4k in a 4 vCPU guest with 1
IOThread using a virtio-blk disk backed by a local NVMe drive:
                          IOPS               IOPS               IOPS
Benchmark                 SINGLE_ISSUER      +TASKRUN           +NO_SQARRAY
randread  iodepth=1        99108 (+0.33%)    100816 (+2.1%)     104411 (+5.7%)
randread  iodepth=64      276314 (+0.12%)    275939 (-0.012%)   275899 (-0.026%)
randwrite iodepth=1        99997 (-0.11%)    102866 (+2.8%)     105588 (+5.5%)
randwrite iodepth=64      272205 (-0.2%)     271973 (-0.29%)    273257 (+0.18%)
Detailed benchmarking results, including the fio output, fio command line,
and guest libvirt domain XML, are available here:
https://gitlab.com/stefanha/virt-playbooks/-/tree/io_uring-flags/notebook/fio-output
https://gitlab.com/stefanha/virt-playbooks/-/blob/io_uring-flags/files/fio.sh
https://gitlab.com/stefanha/virt-playbooks/-/blob/io_uring-flags/files/test.xml.j2
Stefan Hajnoczi (4):
  iothread: create AioContext in iothread_run()
  aio-posix: enable IORING_SETUP_SINGLE_ISSUER
  aio-posix: enable IORING_SETUP_COOP_TASKRUN |
    IORING_SETUP_TASKRUN_FLAG
  aio-posix: enable IORING_SETUP_NO_SQARRAY

 include/system/iothread.h |   1 -
 iothread.c                | 140 +++++++++++++++++++++-----------------
 util/fdmon-io_uring.c     |  38 ++++++++++-
 3 files changed, 113 insertions(+), 66 deletions(-)
--
2.53.0