Hello everyone,
This patch series introduces native io_uring support for FUSE storage
export to overcome the scalability limits of the /dev/fuse interface.
By using shared-memory ring buffers and per-core queues, this feature
reduces context-switch overhead and lock contention, which lets FUSE
export daemons achieve higher throughput and lower latency. In the fio
results below, bandwidth improves by roughly 44-59% and latency drops
by roughly 20-37%.
More details on FUSE-over-io_uring:
https://docs.kernel.org/filesystems/fuse/fuse-io-uring.html
Changes in this version:
- Reorganized patch structure.
- Unified naming of Uring data structures (e.g. FuseRing -> FuseUring)
- Refactored FUSE_IN/OUT_OP_STRUCT_LEGACY
- Code cleanup and logic simplification:
  - Used the io_uring flag to indicate the intention to enable
    FUSE-over-io_uring.
  - Used uring_started to track the active state.
  - Removed unnecessary #ifdef CONFIG_LINUX_IO_URING guards.
- Moved fuse_fd closing to BH in uring mode to prevent data races.
- Updated tests: now using mount to verify that the test image mount is
  fully gone.
More details in the v3 cover letter:
https://lists.nongnu.org/archive/html/qemu-block/2025-08/msg00325.html
v2 cover letter:
https://lists.nongnu.org/archive/html/qemu-block/2025-08/msg00140.html
v1 cover letter:
https://lists.nongnu.org/archive/html/qemu-block/2025-07/msg00280.html
We used fio to test a 1GB file under both legacy FUSE and
FUSE-over-io_uring modes, with a 70% read / 30% write mix. Each mode was
run with four iodepth-numjobs configurations: 1-1, 64-1, 1-4, and 64-4.
Four configurations in each of two modes give a total of 8 test cases,
each measuring both latency and throughput.
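As a rough sketch of how the test matrix above could be enumerated, the
following script builds one fio command line per test case. The mount
paths, the job names, the choice of --rw=randrw, and the libaio engine
are assumptions made for illustration; the runs reported here may have
used different settings.

```python
from itertools import product

# Hypothetical file paths on the two FUSE exports (not from this series).
TARGETS = {
    "legacy": "/mnt/fuse-legacy/test.img",
    "io_uring": "/mnt/fuse-uring/test.img",
}
NUMJOBS = (1, 4)
IODEPTHS = (1, 64)


def fio_command(mode, iodepth, numjobs):
    """Return one fio invocation as an argv list."""
    return [
        "fio",
        f"--name={mode}-qd{iodepth}-j{numjobs}",
        f"--filename={TARGETS[mode]}",
        "--size=1G",
        "--rw=randrw",        # assumption: random vs. sequential is not stated
        "--rwmixread=70",     # 70% reads, 30% writes
        f"--iodepth={iodepth}",
        f"--numjobs={numjobs}",
        "--ioengine=libaio",  # assumption: an async engine so iodepth > 1 matters
        "--group_reporting",
    ]


def build_matrix():
    """Enumerate all 2 modes x 4 configurations = 8 test cases."""
    return [
        fio_command(mode, iodepth, numjobs)
        for mode, numjobs, iodepth in product(TARGETS, NUMJOBS, IODEPTHS)
    ]


if __name__ == "__main__":
    for argv in build_matrix():
        print(" ".join(argv))
```

Each mode is listed in the order 1-1, 64-1, 1-4, 64-4 (iodepth-numjobs),
matching the configurations named above.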
Performance Results:
[Bandwidth (MiB/s)]
| Config (Job/QD)  | Read (Leg -> Uring) | Write (Leg -> Uring) |
|------------------|---------------------|----------------------|
| 1 Job,  QD=1     | 72.2 -> 104         | 30.9 -> 44.7         |
| 1 Job,  QD=64    | 114  -> 181         | 48.8 -> 77.7         |
| 4 Jobs, QD=1     | 109  -> 159         | 47.0 -> 68.5         |
| 4 Jobs, QD=64    | 106  -> 160         | 45.7 -> 68.9         |
[Latency (usec)]
| Config (Job/QD)  | Read (Leg -> Uring) | Write (Leg -> Uring) |
|------------------|---------------------|----------------------|
| 1 Job,  QD=1     | 37.0 -> 23.7        | 36.9  -> 29.5        |
| 1 Job,  QD=64    | 1537 -> 964         | 1535  -> 967         |
| 4 Jobs, QD=1     | 96.6 -> 66.4        | 114.2 -> 71.9        |
| 4 Jobs, QD=64    | 6560 -> 4234        | 6600  -> 4280        |
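To make the tables easier to compare, the relative change from legacy to
io_uring can be computed per cell. The sketch below copies the numbers
from the two tables; the helper names (pct_change, summarize) are
illustrative only and not part of this series.

```python
# (legacy, io_uring) pairs copied from the tables above.
BANDWIDTH_MIB_S = {
    "1 Job, QD=1": {"read": (72.2, 104), "write": (30.9, 44.7)},
    "1 Job, QD=64": {"read": (114, 181), "write": (48.8, 77.7)},
    "4 Jobs, QD=1": {"read": (109, 159), "write": (47.0, 68.5)},
    "4 Jobs, QD=64": {"read": (106, 160), "write": (45.7, 68.9)},
}
LATENCY_USEC = {
    "1 Job, QD=1": {"read": (37.0, 23.7), "write": (36.9, 29.5)},
    "1 Job, QD=64": {"read": (1537, 964), "write": (1535, 967)},
    "4 Jobs, QD=1": {"read": (96.6, 66.4), "write": (114.2, 71.9)},
    "4 Jobs, QD=64": {"read": (6560, 4234), "write": (6600, 4280)},
}


def pct_change(legacy, uring):
    """Relative change from legacy to io_uring, in percent."""
    return (uring - legacy) / legacy * 100.0


def summarize(table):
    """Map each config and op to its percent change, rounded to 0.1."""
    return {
        config: {op: round(pct_change(*pair), 1) for op, pair in ops.items()}
        for config, ops in table.items()
    }


if __name__ == "__main__":
    for title, table in (("Bandwidth", BANDWIDTH_MIB_S),
                         ("Latency", LATENCY_USEC)):
        print(title)
        for config, ops in summarize(table).items():
            print(f"  {config:14s} read {ops['read']:+6.1f}%  "
                  f"write {ops['write']:+6.1f}%")
```

Positive values are gains for bandwidth; negative values are reductions
for latency.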
Brian Song (7):
[Patch v4 1/7] aio-posix: enable 128-byte SQEs
[Patch v4 2/7] fuse: io_uring mode init
[Patch v4 3/7] fuse: uring support for write ops
[Patch v4 4/7] fuse: refactor FUSE request handler
[Patch v4 5/7] fuse: safe termination for io_uring
[Patch v4 6/7] fuse: add 'io-uring' option
[Patch v4 7/7] fuse: add io_uring test support
block/export/fuse.c | 958 +++++++++++++++++++++++----
docs/tools/qemu-storage-daemon.rst | 7 +-
qapi/block-export.json | 5 +-
storage-daemon/qemu-storage-daemon.c | 1 +
tests/qemu-iotests/check | 2 +
tests/qemu-iotests/common.rc | 47 +-
util/fdmon-io_uring.c | 7 +-
7 files changed, 879 insertions(+), 148 deletions(-)
--
2.43.0