Hi all,

This is a GSoC project. More details are available here:
https://wiki.qemu.org/Google_Summer_of_Code_2025#FUSE-over-io_uring_exports

This patch series includes:
- Add a round-robin mechanism to distribute the kernel-required Ring
  Queues to FUSE Queues
- Support multiple in-flight requests (multiple ring entries)
- Add tests for FUSE-over-io_uring

More detail in the v2 cover letter:
https://lists.nongnu.org/archive/html/qemu-block/2025-08/msg00140.html

And in the v1 cover letter:
https://lists.nongnu.org/archive/html/qemu-block/2025-07/msg00280.html

Brian Song (4):
  export/fuse: add opt to enable FUSE-over-io_uring
  export/fuse: process FUSE-over-io_uring requests
  export/fuse: Safe termination for FUSE-uring
  iotests: add tests for FUSE-over-io_uring

 block/export/fuse.c                  | 838 +++++++++++++++++++++------
 docs/tools/qemu-storage-daemon.rst   |  11 +-
 qapi/block-export.json               |   5 +-
 storage-daemon/qemu-storage-daemon.c |   1 +
 tests/qemu-iotests/check             |   2 +
 tests/qemu-iotests/common.rc         |  45 +-
 util/fdmon-io_uring.c                |   5 +-
 7 files changed, 717 insertions(+), 190 deletions(-)

--
2.45.2
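[For anyone who wants to try the series, a minimal sketch of enabling the
new per-export option from patch 1, based on the command lines shown later
in this thread. The image path, export id and mountpoint are placeholders;
the mountpoint must be an existing regular file.]

  # allow FUSE-over-io_uring in the kernel fuse module
  $ echo Y > /sys/module/fuse/parameters/enable_uring

  # export a qcow2 image over FUSE with io_uring enabled
  $ qemu-storage-daemon \
      --blockdev node-name=prot-node,driver=file,filename=image.qcow2 \
      --blockdev node-name=fmt-node,driver=qcow2,file=prot-node \
      --export type=fuse,id=exp0,node-name=fmt-node,mountpoint=mount-point,writable=on,io-uring=on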
We used fio to test a 1 GB file under both traditional FUSE and
FUSE-over-io_uring modes. The experiments were conducted with the
following iodepth and numjobs configurations: 1-1, 64-1, 1-4, and 64-4,
with 70% read and 30% write, resulting in a total of eight test cases,
measuring both latency and throughput.

Test results:
https://gist.github.com/hibriansong/a4849903387b297516603e83b53bbde4

On 8/29/25 10:50 PM, Brian Song wrote:
> Hi all,
>
> This is a GSoC project. More details are available here:
> https://wiki.qemu.org/Google_Summer_of_Code_2025#FUSE-over-io_uring_exports
>
> This patch series includes:
> - Add a round-robin mechanism to distribute the kernel-required Ring
>   Queues to FUSE Queues
> - Support multiple in-flight requests (multiple ring entries)
> - Add tests for FUSE-over-io_uring
>
> More detail in the v2 cover letter:
> https://lists.nongnu.org/archive/html/qemu-block/2025-08/msg00140.html
>
> And in the v1 cover letter:
> https://lists.nongnu.org/archive/html/qemu-block/2025-07/msg00280.html
>
> Brian Song (4):
>   export/fuse: add opt to enable FUSE-over-io_uring
>   export/fuse: process FUSE-over-io_uring requests
>   export/fuse: Safe termination for FUSE-uring
>   iotests: add tests for FUSE-over-io_uring
>
>  block/export/fuse.c                  | 838 +++++++++++++++++++++------
>  docs/tools/qemu-storage-daemon.rst   |  11 +-
>  qapi/block-export.json               |   5 +-
>  storage-daemon/qemu-storage-daemon.c |   1 +
>  tests/qemu-iotests/check             |   2 +
>  tests/qemu-iotests/common.rc         |  45 +-
>  util/fdmon-io_uring.c                |   5 +-
>  7 files changed, 717 insertions(+), 190 deletions(-)
>
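[For reference, one of the eight cases above (iodepth=64, numjobs=4) could
be reproduced with an fio invocation along these lines. The actual job file
behind the gist is not shown in this thread, so the file name, block size,
runtime and ioengine here are assumptions.]

  $ fio --name=fuse-70-30 --filename=/mnt/tmp/testfile --size=1G \
        --rw=randrw --rwmixread=70 --bs=4k --ioengine=io_uring --direct=1 \
        --iodepth=64 --numjobs=4 --runtime=60 --time_based --group_reporting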
On Sat, Aug 30, 2025 at 08:00:00AM -0400, Brian Song wrote:
> We used fio to test a 1 GB file under both traditional FUSE and
> FUSE-over-io_uring modes. The experiments were conducted with the
> following iodepth and numjobs configurations: 1-1, 64-1, 1-4, and 64-4,
> with 70% read and 30% write, resulting in a total of eight test cases,
> measuring both latency and throughput.
>
> Test results:
>
> https://gist.github.com/hibriansong/a4849903387b297516603e83b53bbde4

CCing Eugenio, who is looking at optimizing FUSE server performance
using virtiofs with VDUSE.

> On 8/29/25 10:50 PM, Brian Song wrote:
> > Hi all,
> >
> > This is a GSoC project. More details are available here:
> > https://wiki.qemu.org/Google_Summer_of_Code_2025#FUSE-over-io_uring_exports
> >
> > This patch series includes:
> > - Add a round-robin mechanism to distribute the kernel-required Ring
> >   Queues to FUSE Queues
> > - Support multiple in-flight requests (multiple ring entries)
> > - Add tests for FUSE-over-io_uring
> >
> > More detail in the v2 cover letter:
> > https://lists.nongnu.org/archive/html/qemu-block/2025-08/msg00140.html
> >
> > And in the v1 cover letter:
> > https://lists.nongnu.org/archive/html/qemu-block/2025-07/msg00280.html
> >
> > Brian Song (4):
> >   export/fuse: add opt to enable FUSE-over-io_uring
> >   export/fuse: process FUSE-over-io_uring requests
> >   export/fuse: Safe termination for FUSE-uring
> >   iotests: add tests for FUSE-over-io_uring
> >
> >  block/export/fuse.c                  | 838 +++++++++++++++++++++------
> >  docs/tools/qemu-storage-daemon.rst   |  11 +-
> >  qapi/block-export.json               |   5 +-
> >  storage-daemon/qemu-storage-daemon.c |   1 +
> >  tests/qemu-iotests/check             |   2 +
> >  tests/qemu-iotests/common.rc         |  45 +-
> >  util/fdmon-io_uring.c                |   5 +-
> >  7 files changed, 717 insertions(+), 190 deletions(-)
> >
On Sat, Aug 30, 2025 at 08:00:00AM -0400, Brian Song wrote:
> We used fio to test a 1 GB file under both traditional FUSE and
> FUSE-over-io_uring modes. The experiments were conducted with the
> following iodepth and numjobs configurations: 1-1, 64-1, 1-4, and 64-4,
> with 70% read and 30% write, resulting in a total of eight test cases,
> measuring both latency and throughput.
>
> Test results:
>
> https://gist.github.com/hibriansong/a4849903387b297516603e83b53bbde4

Hanna: You benchmarked the FUSE export coroutine implementation a little
while ago. What do you think about these results with
FUSE-over-io_uring?

What stands out to me is that iodepth=1 numjobs=4 already saturates the
system, so increasing iodepth to 64 does not improve the results much.

Brian: What is the qemu-storage-daemon command-line for the benchmark
and what are the details of /mnt/tmp/ (e.g. a preallocated 10 GB file
with an XFS file system mounted from the FUSE image)?

Thanks,
Stefan

> On 8/29/25 10:50 PM, Brian Song wrote:
> > Hi all,
> >
> > This is a GSoC project. More details are available here:
> > https://wiki.qemu.org/Google_Summer_of_Code_2025#FUSE-over-io_uring_exports
> >
> > This patch series includes:
> > - Add a round-robin mechanism to distribute the kernel-required Ring
> >   Queues to FUSE Queues
> > - Support multiple in-flight requests (multiple ring entries)
> > - Add tests for FUSE-over-io_uring
> >
> > More detail in the v2 cover letter:
> > https://lists.nongnu.org/archive/html/qemu-block/2025-08/msg00140.html
> >
> > And in the v1 cover letter:
> > https://lists.nongnu.org/archive/html/qemu-block/2025-07/msg00280.html
> >
> > Brian Song (4):
> >   export/fuse: add opt to enable FUSE-over-io_uring
> >   export/fuse: process FUSE-over-io_uring requests
> >   export/fuse: Safe termination for FUSE-uring
> >   iotests: add tests for FUSE-over-io_uring
> >
> >  block/export/fuse.c                  | 838 +++++++++++++++++++++------
> >  docs/tools/qemu-storage-daemon.rst   |  11 +-
> >  qapi/block-export.json               |   5 +-
> >  storage-daemon/qemu-storage-daemon.c |   1 +
> >  tests/qemu-iotests/check             |   2 +
> >  tests/qemu-iotests/common.rc         |  45 +-
> >  util/fdmon-io_uring.c                |   5 +-
> >  7 files changed, 717 insertions(+), 190 deletions(-)
> >
On 9/3/25 5:49 AM, Stefan Hajnoczi wrote:
> On Sat, Aug 30, 2025 at 08:00:00AM -0400, Brian Song wrote:
>> We used fio to test a 1 GB file under both traditional FUSE and
>> FUSE-over-io_uring modes. The experiments were conducted with the
>> following iodepth and numjobs configurations: 1-1, 64-1, 1-4, and 64-4,
>> with 70% read and 30% write, resulting in a total of eight test cases,
>> measuring both latency and throughput.
>>
>> Test results:
>>
>> https://gist.github.com/hibriansong/a4849903387b297516603e83b53bbde4
>
> Hanna: You benchmarked the FUSE export coroutine implementation a little
> while ago. What do you think about these results with
> FUSE-over-io_uring?
>
> What stands out to me is that iodepth=1 numjobs=4 already saturates the
> system, so increasing iodepth to 64 does not improve the results much.
>
> Brian: What is the qemu-storage-daemon command-line for the benchmark
> and what are the details of /mnt/tmp/ (e.g. a preallocated 10 GB file
> with an XFS file system mounted from the FUSE image)?

QMP script:
https://gist.github.com/hibriansong/399f9564a385cfb94db58669e63611f8

Or:

### NORMAL
./qemu/build/storage-daemon/qemu-storage-daemon \
    --object iothread,id=iothread1 \
    --object iothread,id=iothread2 \
    --object iothread,id=iothread3 \
    --object iothread,id=iothread4 \
    --blockdev node-name=prot-node,driver=file,filename=ubuntu.qcow2 \
    --blockdev node-name=fmt-node,driver=qcow2,file=prot-node \
    --export type=fuse,id=exp0,node-name=fmt-node,mountpoint=mount-point,writable=on,iothread.0=iothread1,iothread.1=iothread2,iothread.2=iothread3,iothread.3=iothread4

### URING
echo Y > /sys/module/fuse/parameters/enable_uring

./qemu/build/storage-daemon/qemu-storage-daemon \
    --object iothread,id=iothread1 \
    --object iothread,id=iothread2 \
    --object iothread,id=iothread3 \
    --object iothread,id=iothread4 \
    --blockdev node-name=prot-node,driver=file,filename=ubuntu.qcow2 \
    --blockdev node-name=fmt-node,driver=qcow2,file=prot-node \
    --export type=fuse,id=exp0,node-name=fmt-node,mountpoint=mount-point,writable=on,io-uring=on,iothread.0=iothread1,iothread.1=iothread2,iothread.2=iothread3,iothread.3=iothread4

ubuntu.qcow2 has been preallocated and enlarged to 100 GB with:

$ qemu-img resize ubuntu.qcow2 100G
$ virt-customize \
    --run-command '/bin/bash /bin/growpart /dev/sda 1' \
    --run-command 'resize2fs /dev/sda1' -a ubuntu.qcow2

The image file, formatted with an ext4 filesystem, was mounted on
/mnt/tmp on my PC, which is equipped with a Kingston PCIe 4.0 NVMe SSD:

$ sudo kpartx -av mount-point
$ sudo mount /dev/mapper/loop31p1 /mnt/tmp/

Unmount the partition after you are done using it:

$ sudo umount /mnt/tmp
$ sudo kpartx -dv mount-point

> Thanks,
> Stefan
>
>> On 8/29/25 10:50 PM, Brian Song wrote:
>>> Hi all,
>>>
>>> This is a GSoC project. More details are available here:
>>> https://wiki.qemu.org/Google_Summer_of_Code_2025#FUSE-over-io_uring_exports
>>>
>>> This patch series includes:
>>> - Add a round-robin mechanism to distribute the kernel-required Ring
>>>   Queues to FUSE Queues
>>> - Support multiple in-flight requests (multiple ring entries)
>>> - Add tests for FUSE-over-io_uring
>>>
>>> More detail in the v2 cover letter:
>>> https://lists.nongnu.org/archive/html/qemu-block/2025-08/msg00140.html
>>>
>>> And in the v1 cover letter:
>>> https://lists.nongnu.org/archive/html/qemu-block/2025-07/msg00280.html
>>>
>>> Brian Song (4):
>>>   export/fuse: add opt to enable FUSE-over-io_uring
>>>   export/fuse: process FUSE-over-io_uring requests
>>>   export/fuse: Safe termination for FUSE-uring
>>>   iotests: add tests for FUSE-over-io_uring
>>>
>>>  block/export/fuse.c                  | 838 +++++++++++++++++++++------
>>>  docs/tools/qemu-storage-daemon.rst   |  11 +-
>>>  qapi/block-export.json               |   5 +-
>>>  storage-daemon/qemu-storage-daemon.c |   1 +
>>>  tests/qemu-iotests/check             |   2 +
>>>  tests/qemu-iotests/common.rc         |  45 +-
>>>  util/fdmon-io_uring.c                |   5 +-
>>>  7 files changed, 717 insertions(+), 190 deletions(-)
>>>
>>
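[Before comparing the two modes it may be worth double-checking that the
kernel side is actually enabled; this reads back the same module parameter
toggled above.]

  # expected to print Y when FUSE-over-io_uring is allowed
  $ cat /sys/module/fuse/parameters/enable_uring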
On 03.09.2025 at 20:11, Brian Song wrote:
>
> On 9/3/25 5:49 AM, Stefan Hajnoczi wrote:
> > On Sat, Aug 30, 2025 at 08:00:00AM -0400, Brian Song wrote:
> > > We used fio to test a 1 GB file under both traditional FUSE and
> > > FUSE-over-io_uring modes. The experiments were conducted with the
> > > following iodepth and numjobs configurations: 1-1, 64-1, 1-4, and 64-4,
> > > with 70% read and 30% write, resulting in a total of eight test cases,
> > > measuring both latency and throughput.
> > >
> > > Test results:
> > >
> > > https://gist.github.com/hibriansong/a4849903387b297516603e83b53bbde4
> >
> > Hanna: You benchmarked the FUSE export coroutine implementation a little
> > while ago. What do you think about these results with
> > FUSE-over-io_uring?
> >
> > What stands out to me is that iodepth=1 numjobs=4 already saturates the
> > system, so increasing iodepth to 64 does not improve the results much.
> >
> > Brian: What is the qemu-storage-daemon command-line for the benchmark
> > and what are the details of /mnt/tmp/ (e.g. a preallocated 10 GB file
> > with an XFS file system mounted from the FUSE image)?
>
> QMP script:
> https://gist.github.com/hibriansong/399f9564a385cfb94db58669e63611f8
>
> Or:
>
> ### NORMAL
> ./qemu/build/storage-daemon/qemu-storage-daemon \
>     --object iothread,id=iothread1 \
>     --object iothread,id=iothread2 \
>     --object iothread,id=iothread3 \
>     --object iothread,id=iothread4 \
>     --blockdev node-name=prot-node,driver=file,filename=ubuntu.qcow2 \

This uses the default AIO mode and, most importantly, the default cache
mode, which means that the host kernel page cache is used. This makes it
hard to tell how much of the workload just accessed RAM on the host and
how much really went to the disk, so the results are difficult to
interpret correctly.

For benchmarks, it's generally best to use cache.direct=on, and I think
I'd also prefer aio=native (or aio=io_uring).

>     --blockdev node-name=fmt-node,driver=qcow2,file=prot-node \
>     --export type=fuse,id=exp0,node-name=fmt-node,mountpoint=mount-point,writable=on,iothread.0=iothread1,iothread.1=iothread2,iothread.2=iothread3,iothread.3=iothread4
>
> ### URING
> echo Y > /sys/module/fuse/parameters/enable_uring
>
> ./qemu/build/storage-daemon/qemu-storage-daemon \
>     --object iothread,id=iothread1 \
>     --object iothread,id=iothread2 \
>     --object iothread,id=iothread3 \
>     --object iothread,id=iothread4 \
>     --blockdev node-name=prot-node,driver=file,filename=ubuntu.qcow2 \
>     --blockdev node-name=fmt-node,driver=qcow2,file=prot-node \
>     --export type=fuse,id=exp0,node-name=fmt-node,mountpoint=mount-point,writable=on,io-uring=on,iothread.0=iothread1,iothread.1=iothread2,iothread.2=iothread3,iothread.3=iothread4
>
> ubuntu.qcow2 has been preallocated and enlarged to 100 GB with:
>
> $ qemu-img resize ubuntu.qcow2 100G

I think this doesn't preallocate the newly added space; you should add
--preallocation=falloc at least.

> $ virt-customize \
>     --run-command '/bin/bash /bin/growpart /dev/sda 1' \
>     --run-command 'resize2fs /dev/sda1' -a ubuntu.qcow2
>
> The image file, formatted with an ext4 filesystem, was mounted on
> /mnt/tmp on my PC, which is equipped with a Kingston PCIe 4.0 NVMe SSD:
>
> $ sudo kpartx -av mount-point
> $ sudo mount /dev/mapper/loop31p1 /mnt/tmp/
>
> Unmount the partition after you are done using it:
>
> $ sudo umount /mnt/tmp
> $ sudo kpartx -dv mount-point

What I would personally use to benchmark performance is just a clean
preallocated raw image without a guest on it.
I wouldn't even partition it or necessarily put a filesystem on it, but
just run the benchmark directly on the FUSE export's mountpoint.

The other thing I'd consider for benchmarking is the null-co block
driver, so that the FUSE overhead really dominates and isn't dwarfed by
a slow disk. (A null block device is one where you can't have a
filesystem even if you wanted to.)

Kevin

> > > On 8/29/25 10:50 PM, Brian Song wrote:
> > > > Hi all,
> > > >
> > > > This is a GSoC project. More details are available here:
> > > > https://wiki.qemu.org/Google_Summer_of_Code_2025#FUSE-over-io_uring_exports
> > > >
> > > > This patch series includes:
> > > > - Add a round-robin mechanism to distribute the kernel-required Ring
> > > >   Queues to FUSE Queues
> > > > - Support multiple in-flight requests (multiple ring entries)
> > > > - Add tests for FUSE-over-io_uring
> > > >
> > > > More detail in the v2 cover letter:
> > > > https://lists.nongnu.org/archive/html/qemu-block/2025-08/msg00140.html
> > > >
> > > > And in the v1 cover letter:
> > > > https://lists.nongnu.org/archive/html/qemu-block/2025-07/msg00280.html
> > > >
> > > > Brian Song (4):
> > > >   export/fuse: add opt to enable FUSE-over-io_uring
> > > >   export/fuse: process FUSE-over-io_uring requests
> > > >   export/fuse: Safe termination for FUSE-uring
> > > >   iotests: add tests for FUSE-over-io_uring
> > > >
> > > >  block/export/fuse.c                  | 838 +++++++++++++++++++++------
> > > >  docs/tools/qemu-storage-daemon.rst   |  11 +-
> > > >  qapi/block-export.json               |   5 +-
> > > >  storage-daemon/qemu-storage-daemon.c |   1 +
> > > >  tests/qemu-iotests/check             |   2 +
> > > >  tests/qemu-iotests/common.rc         |  45 +-
> > > >  util/fdmon-io_uring.c                |   5 +-
> > > >  7 files changed, 717 insertions(+), 190 deletions(-)
> > > >
> > > >
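[Putting those suggestions together, a rough, untested sketch of the two
setups Kevin describes. Node names, sizes and the mountpoint file are
placeholders; the --blockdev and qemu-img options are the standard ones,
and the mountpoint must be an existing regular file.]

  # Variant 1: preallocated raw image, bypassing the host page cache
  $ qemu-img create -f raw -o preallocation=falloc bench.img 10G
  $ ./qemu/build/storage-daemon/qemu-storage-daemon \
      --object iothread,id=iothread1 \
      --blockdev node-name=disk,driver=file,filename=bench.img,cache.direct=on,aio=native \
      --export type=fuse,id=exp0,node-name=disk,mountpoint=mount-point,writable=on,io-uring=on,iothread.0=iothread1

  # Variant 2: null-co block driver, so only the FUSE path is measured
  $ ./qemu/build/storage-daemon/qemu-storage-daemon \
      --object iothread,id=iothread1 \
      --blockdev node-name=null0,driver=null-co,size=10737418240 \
      --export type=fuse,id=exp0,node-name=null0,mountpoint=mount-point,writable=on,io-uring=on,iothread.0=iothread1

In both cases the fio job from earlier in the thread can then point at the
export itself (e.g. --filename=mount-point) instead of a file inside a
guest filesystem.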