[PATCH 0/4] export/fuse: Add FUSE-over-io_uring for Storage Exports
Posted by Brian Song 4 weeks, 1 day ago
Hi all,

This is a GSoC project. More details are available here:
https://wiki.qemu.org/Google_Summer_of_Code_2025#FUSE-over-io_uring_exports

This patch series includes:
- Add a round-robin mechanism to distribute the kernel-required Ring
Queues to FUSE Queues
- Support multiple in-flight requests (multiple ring entries)
- Add tests for FUSE-over-io_uring
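
Example usage (a minimal sketch; the complete command lines are in the
replies below): enable the kernel side, then pass io-uring=on when
creating the FUSE export:

# test.img and mnt are placeholder paths
$ echo Y > /sys/module/fuse/parameters/enable_uring
$ qemu-storage-daemon \
    --blockdev node-name=prot-node,driver=file,filename=test.img \
    --export type=fuse,id=exp0,node-name=prot-node,mountpoint=mnt,writable=on,io-uring=on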

More detail in the v2 cover letter:
https://lists.nongnu.org/archive/html/qemu-block/2025-08/msg00140.html

And in the v1 cover letter:
https://lists.nongnu.org/archive/html/qemu-block/2025-07/msg00280.html


Brian Song (4):
  export/fuse: add opt to enable FUSE-over-io_uring
  export/fuse: process FUSE-over-io_uring requests
  export/fuse: Safe termination for FUSE-uring
  iotests: add tests for FUSE-over-io_uring

 block/export/fuse.c                  | 838 +++++++++++++++++++++------
 docs/tools/qemu-storage-daemon.rst   |  11 +-
 qapi/block-export.json               |   5 +-
 storage-daemon/qemu-storage-daemon.c |   1 +
 tests/qemu-iotests/check             |   2 +
 tests/qemu-iotests/common.rc         |  45 +-
 util/fdmon-io_uring.c                |   5 +-
 7 files changed, 717 insertions(+), 190 deletions(-)

-- 
2.45.2
Re: [PATCH 0/4] export/fuse: Add FUSE-over-io_uring for Storage Exports
Posted by Brian Song 4 weeks, 1 day ago
We used fio to test a 1 GB file under both traditional FUSE and
FUSE-over-io_uring modes. The experiments were conducted with the
following iodepth and numjobs configurations: 1-1, 64-1, 1-4, and 64-4,
with 70% read and 30% write, resulting in a total of eight test cases,
measuring both latency and throughput.
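
For reference, each case used an fio job roughly of this form (flags
reconstructed from the description above, so the exact job may have
differed; --iodepth and --numjobs took the values 1-1, 64-1, 1-4, 64-4):

# testfile is a placeholder name; --iodepth/--numjobs varied per case
$ fio --name=rw70 --filename=/mnt/tmp/testfile --size=1G \
      --rw=randrw --rwmixread=70 --direct=1 --ioengine=io_uring \
      --iodepth=64 --numjobs=4 --runtime=60 --time_based --group_reporting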

Test results:

https://gist.github.com/hibriansong/a4849903387b297516603e83b53bbde4




On 8/29/25 10:50 PM, Brian Song wrote:
> Hi all,
>
> This is a GSoC project. More details are available here:
> https://wiki.qemu.org/Google_Summer_of_Code_2025#FUSE-over-io_uring_exports
>
> This patch series includes:
> - Add a round-robin mechanism to distribute the kernel-required Ring
> Queues to FUSE Queues
> - Support multiple in-flight requests (multiple ring entries)
> - Add tests for FUSE-over-io_uring
>
> More detail in the v2 cover letter:
> https://lists.nongnu.org/archive/html/qemu-block/2025-08/msg00140.html
>
> And in the v1 cover letter:
> https://lists.nongnu.org/archive/html/qemu-block/2025-07/msg00280.html
>
>
> Brian Song (4):
>    export/fuse: add opt to enable FUSE-over-io_uring
>    export/fuse: process FUSE-over-io_uring requests
>    export/fuse: Safe termination for FUSE-uring
>    iotests: add tests for FUSE-over-io_uring
>
>   block/export/fuse.c                  | 838 +++++++++++++++++++++------
>   docs/tools/qemu-storage-daemon.rst   |  11 +-
>   qapi/block-export.json               |   5 +-
>   storage-daemon/qemu-storage-daemon.c |   1 +
>   tests/qemu-iotests/check             |   2 +
>   tests/qemu-iotests/common.rc         |  45 +-
>   util/fdmon-io_uring.c                |   5 +-
>   7 files changed, 717 insertions(+), 190 deletions(-)
>
Re: [PATCH 0/4] export/fuse: Add FUSE-over-io_uring for Storage Exports
Posted by Stefan Hajnoczi 3 weeks, 2 days ago
On Sat, Aug 30, 2025 at 08:00:00AM -0400, Brian Song wrote:
> We used fio to test a 1 GB file under both traditional FUSE and
> FUSE-over-io_uring modes. The experiments were conducted with the
> following iodepth and numjobs configurations: 1-1, 64-1, 1-4, and 64-4,
> with 70% read and 30% write, resulting in a total of eight test cases,
> measuring both latency and throughput.
> 
> Test results:
> 
> https://gist.github.com/hibriansong/a4849903387b297516603e83b53bbde4

CCing Eugenio, who is looking at optimizing FUSE server performance
using virtiofs with VDUSE.

> 
> 
> 
> 
> On 8/29/25 10:50 PM, Brian Song wrote:
> > Hi all,
> >
> > This is a GSoC project. More details are available here:
> > https://wiki.qemu.org/Google_Summer_of_Code_2025#FUSE-over-io_uring_exports
> >
> > This patch series includes:
> > - Add a round-robin mechanism to distribute the kernel-required Ring
> > Queues to FUSE Queues
> > - Support multiple in-flight requests (multiple ring entries)
> > - Add tests for FUSE-over-io_uring
> >
> > More detail in the v2 cover letter:
> > https://lists.nongnu.org/archive/html/qemu-block/2025-08/msg00140.html
> >
> > And in the v1 cover letter:
> > https://lists.nongnu.org/archive/html/qemu-block/2025-07/msg00280.html
> >
> >
> > Brian Song (4):
> >    export/fuse: add opt to enable FUSE-over-io_uring
> >    export/fuse: process FUSE-over-io_uring requests
> >    export/fuse: Safe termination for FUSE-uring
> >    iotests: add tests for FUSE-over-io_uring
> >
> >   block/export/fuse.c                  | 838 +++++++++++++++++++++------
> >   docs/tools/qemu-storage-daemon.rst   |  11 +-
> >   qapi/block-export.json               |   5 +-
> >   storage-daemon/qemu-storage-daemon.c |   1 +
> >   tests/qemu-iotests/check             |   2 +
> >   tests/qemu-iotests/common.rc         |  45 +-
> >   util/fdmon-io_uring.c                |   5 +-
> >   7 files changed, 717 insertions(+), 190 deletions(-)
> >
> 
Re: [PATCH 0/4] export/fuse: Add FUSE-over-io_uring for Storage Exports
Posted by Stefan Hajnoczi 3 weeks, 4 days ago
On Sat, Aug 30, 2025 at 08:00:00AM -0400, Brian Song wrote:
> We used fio to test a 1 GB file under both traditional FUSE and
> FUSE-over-io_uring modes. The experiments were conducted with the
> following iodepth and numjobs configurations: 1-1, 64-1, 1-4, and 64-4,
> with 70% read and 30% write, resulting in a total of eight test cases,
> measuring both latency and throughput.
> 
> Test results:
> 
> https://gist.github.com/hibriansong/a4849903387b297516603e83b53bbde4

Hanna: You benchmarked the FUSE export coroutine implementation a little
while ago. What do you think about these results with
FUSE-over-io_uring?

What stands out to me is that iodepth=1 numjobs=4 already saturates the
system, so increasing iodepth to 64 does not improve the results much.

Brian: What is the qemu-storage-daemon command-line for the benchmark
and what are the details of /mnt/tmp/ (e.g. a preallocated 10 GB file
with an XFS file system mounted from the FUSE image)?

Thanks,
Stefan

> 
> 
> 
> 
> On 8/29/25 10:50 PM, Brian Song wrote:
> > Hi all,
> >
> > This is a GSoC project. More details are available here:
> > https://wiki.qemu.org/Google_Summer_of_Code_2025#FUSE-over-io_uring_exports
> >
> > This patch series includes:
> > - Add a round-robin mechanism to distribute the kernel-required Ring
> > Queues to FUSE Queues
> > - Support multiple in-flight requests (multiple ring entries)
> > - Add tests for FUSE-over-io_uring
> >
> > More detail in the v2 cover letter:
> > https://lists.nongnu.org/archive/html/qemu-block/2025-08/msg00140.html
> >
> > And in the v1 cover letter:
> > https://lists.nongnu.org/archive/html/qemu-block/2025-07/msg00280.html
> >
> >
> > Brian Song (4):
> >    export/fuse: add opt to enable FUSE-over-io_uring
> >    export/fuse: process FUSE-over-io_uring requests
> >    export/fuse: Safe termination for FUSE-uring
> >    iotests: add tests for FUSE-over-io_uring
> >
> >   block/export/fuse.c                  | 838 +++++++++++++++++++++------
> >   docs/tools/qemu-storage-daemon.rst   |  11 +-
> >   qapi/block-export.json               |   5 +-
> >   storage-daemon/qemu-storage-daemon.c |   1 +
> >   tests/qemu-iotests/check             |   2 +
> >   tests/qemu-iotests/common.rc         |  45 +-
> >   util/fdmon-io_uring.c                |   5 +-
> >   7 files changed, 717 insertions(+), 190 deletions(-)
> >
> 
Re: [PATCH 0/4] export/fuse: Add FUSE-over-io_uring for Storage Exports
Posted by Brian Song 3 weeks, 3 days ago

On 9/3/25 5:49 AM, Stefan Hajnoczi wrote:
> On Sat, Aug 30, 2025 at 08:00:00AM -0400, Brian Song wrote:
>> We used fio to test a 1 GB file under both traditional FUSE and
>> FUSE-over-io_uring modes. The experiments were conducted with the
>> following iodepth and numjobs configurations: 1-1, 64-1, 1-4, and 64-4,
>> with 70% read and 30% write, resulting in a total of eight test cases,
>> measuring both latency and throughput.
>>
>> Test results:
>>
>> https://gist.github.com/hibriansong/a4849903387b297516603e83b53bbde4
> 
> Hanna: You benchmarked the FUSE export coroutine implementation a little
> while ago. What do you think about these results with
> FUSE-over-io_uring?
> 
> What stands out to me is that iodepth=1 numjobs=4 already saturates the
> system, so increasing iodepth to 64 does not improve the results much.
> 
> Brian: What is the qemu-storage-daemon command-line for the benchmark
> and what are the details of /mnt/tmp/ (e.g. a preallocated 10 GB file
> with an XFS file system mounted from the FUSE image)?

QMP script:
https://gist.github.com/hibriansong/399f9564a385cfb94db58669e63611f8

Or:
### NORMAL
./qemu/build/storage-daemon/qemu-storage-daemon \
   --object iothread,id=iothread1 \
   --object iothread,id=iothread2 \
   --object iothread,id=iothread3 \
   --object iothread,id=iothread4 \
   --blockdev node-name=prot-node,driver=file,filename=ubuntu.qcow2 \
   --blockdev node-name=fmt-node,driver=qcow2,file=prot-node \
   --export type=fuse,id=exp0,node-name=fmt-node,mountpoint=mount-point,writable=on,iothread.0=iothread1,iothread.1=iothread2,iothread.2=iothread3,iothread.3=iothread4

### URING
echo Y > /sys/module/fuse/parameters/enable_uring

./qemu/build/storage-daemon/qemu-storage-daemon \
   --object iothread,id=iothread1 \
   --object iothread,id=iothread2 \
   --object iothread,id=iothread3 \
   --object iothread,id=iothread4 \
   --blockdev node-name=prot-node,driver=file,filename=ubuntu.qcow2 \
   --blockdev node-name=fmt-node,driver=qcow2,file=prot-node \
   --export type=fuse,id=exp0,node-name=fmt-node,mountpoint=mount-point,writable=on,io-uring=on,iothread.0=iothread1,iothread.1=iothread2,iothread.2=iothread3,iothread.3=iothread4

ubuntu.qcow2 was preallocated and enlarged to 100 GB with:

$ qemu-img resize ubuntu.qcow2 100G
$ virt-customize \
    --run-command '/bin/bash /bin/growpart /dev/sda 1' \
    --run-command 'resize2fs /dev/sda1' -a ubuntu.qcow2

The image file, formatted with an ext4 filesystem, was mounted on
/mnt/tmp on my PC, which has a Kingston PCIe 4.0 NVMe SSD:

$ sudo kpartx -av mount-point
$ sudo mount /dev/mapper/loop31p1 /mnt/tmp/


Unmount the partition when done using it:

$ sudo umount /mnt/tmp
$ sudo kpartx -dv mount-point

> 
> Thanks,
> Stefan
> 
>>
>>
>>
>>
>> On 8/29/25 10:50 PM, Brian Song wrote:
>>> Hi all,
>>>
>>> This is a GSoC project. More details are available here:
>>> https://wiki.qemu.org/Google_Summer_of_Code_2025#FUSE-over-io_uring_exports
>>>
>>> This patch series includes:
>>> - Add a round-robin mechanism to distribute the kernel-required Ring
>>> Queues to FUSE Queues
>>> - Support multiple in-flight requests (multiple ring entries)
>>> - Add tests for FUSE-over-io_uring
>>>
>>> More detail in the v2 cover letter:
>>> https://lists.nongnu.org/archive/html/qemu-block/2025-08/msg00140.html
>>>
>>> And in the v1 cover letter:
>>> https://lists.nongnu.org/archive/html/qemu-block/2025-07/msg00280.html
>>>
>>>
>>> Brian Song (4):
>>>     export/fuse: add opt to enable FUSE-over-io_uring
>>>     export/fuse: process FUSE-over-io_uring requests
>>>     export/fuse: Safe termination for FUSE-uring
>>>     iotests: add tests for FUSE-over-io_uring
>>>
>>>    block/export/fuse.c                  | 838 +++++++++++++++++++++------
>>>    docs/tools/qemu-storage-daemon.rst   |  11 +-
>>>    qapi/block-export.json               |   5 +-
>>>    storage-daemon/qemu-storage-daemon.c |   1 +
>>>    tests/qemu-iotests/check             |   2 +
>>>    tests/qemu-iotests/common.rc         |  45 +-
>>>    util/fdmon-io_uring.c                |   5 +-
>>>    7 files changed, 717 insertions(+), 190 deletions(-)
>>>
>>


Re: [PATCH 0/4] export/fuse: Add FUSE-over-io_uring for Storage Exports
Posted by Kevin Wolf 1 week, 5 days ago
On 03.09.2025 at 20:11, Brian Song wrote:
> 
> 
> On 9/3/25 5:49 AM, Stefan Hajnoczi wrote:
> > On Sat, Aug 30, 2025 at 08:00:00AM -0400, Brian Song wrote:
> > > We used fio to test a 1 GB file under both traditional FUSE and
> > > FUSE-over-io_uring modes. The experiments were conducted with the
> > > following iodepth and numjobs configurations: 1-1, 64-1, 1-4, and 64-4,
> > > with 70% read and 30% write, resulting in a total of eight test cases,
> > > measuring both latency and throughput.
> > > 
> > > Test results:
> > > 
> > > https://gist.github.com/hibriansong/a4849903387b297516603e83b53bbde4
> > 
> > Hanna: You benchmarked the FUSE export coroutine implementation a little
> > while ago. What do you think about these results with
> > FUSE-over-io_uring?
> > 
> > What stands out to me is that iodepth=1 numjobs=4 already saturates the
> > system, so increasing iodepth to 64 does not improve the results much.
> > 
> > Brian: What is the qemu-storage-daemon command-line for the benchmark
> > and what are the details of /mnt/tmp/ (e.g. a preallocated 10 GB file
> > with an XFS file system mounted from the FUSE image)?
> 
> QMP script:
> https://gist.github.com/hibriansong/399f9564a385cfb94db58669e63611f8
> 
> Or:
> ### NORMAL
> ./qemu/build/storage-daemon/qemu-storage-daemon \
>   --object iothread,id=iothread1 \
>   --object iothread,id=iothread2 \
>   --object iothread,id=iothread3 \
>   --object iothread,id=iothread4 \
>   --blockdev node-name=prot-node,driver=file,filename=ubuntu.qcow2 \

This uses the default AIO mode and, most importantly, the default cache
mode, which means that the host kernel page cache is used. That makes it
hard to tell how much of the workload only touched RAM on the host and
how much really went to the disk, so the results are difficult to
interpret correctly.

For benchmarks, it's generally best to use cache.direct=on and I think
I'd also prefer aio=native (or aio=io_uring).
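
For the protocol node that could look like this (untested, adapting the
command line above):

  --blockdev node-name=prot-node,driver=file,filename=ubuntu.qcow2,cache.direct=on,aio=native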

>   --blockdev node-name=fmt-node,driver=qcow2,file=prot-node \
>   --export type=fuse,id=exp0,node-name=fmt-node,mountpoint=mount-point,writable=on,iothread.0=iothread1,iothread.1=iothread2,iothread.2=iothread3,iothread.3=iothread4
> 
> ### URING
> echo Y > /sys/module/fuse/parameters/enable_uring
> 
> ./qemu/build/storage-daemon/qemu-storage-daemon \
>   --object iothread,id=iothread1 \
>   --object iothread,id=iothread2 \
>   --object iothread,id=iothread3 \
>   --object iothread,id=iothread4 \
>   --blockdev node-name=prot-node,driver=file,filename=ubuntu.qcow2 \
>   --blockdev node-name=fmt-node,driver=qcow2,file=prot-node \
>   --export type=fuse,id=exp0,node-name=fmt-node,mountpoint=mount-point,writable=on,io-uring=on,iothread.0=iothread1,iothread.1=iothread2,iothread.2=iothread3,iothread.3=iothread4
> 
> ubuntu.qcow2 was preallocated and enlarged to 100 GB with:
> 
> $ qemu-img resize ubuntu.qcow2 100G

I think this doesn't preallocate the newly added space; you should add
--preallocation=falloc at least.
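
For example:

$ qemu-img resize --preallocation=falloc ubuntu.qcow2 100G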

> $ virt-customize \
>    --run-command '/bin/bash /bin/growpart /dev/sda 1' \
>    --run-command 'resize2fs /dev/sda1' -a ubuntu.qcow2
> 
> The image file, formatted with an ext4 filesystem, was mounted on /mnt/tmp
> on my PC, which has a Kingston PCIe 4.0 NVMe SSD:
> 
> $ sudo kpartx -av mount-point
> $ sudo mount /dev/mapper/loop31p1 /mnt/tmp/
> 
> 
> Unmount the partition when done using it:
> 
> $ sudo umount /mnt/tmp
> $ sudo kpartx -dv mount-point

What I would personally use to benchmark performance is just a clean
preallocated raw image without a guest on it. I wouldn't even partition
it or necessarily put a filesystem on it, but just run the benchmark
directly on the FUSE export's mountpoint.
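
For example (a sketch; bench.img and bench-mount are made-up names, and
the FUSE mountpoint must already exist):

$ qemu-img create -f raw -o preallocation=full bench.img 10G
$ touch bench-mount
$ qemu-storage-daemon \
    --blockdev node-name=disk,driver=file,filename=bench.img,cache.direct=on,aio=native \
    --export type=fuse,id=exp0,node-name=disk,mountpoint=bench-mount,writable=on,io-uring=on

and then point fio's --filename directly at bench-mount.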

The other thing I'd consider for benchmarking is the null-co block
driver, so that the FUSE overhead really dominates and isn't dwarfed by
a slow disk. (A null block device is also a case where you can't have a
filesystem even if you wanted one.)
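
A sketch of that (null0 and bench-mount are made-up names; null-co's
size is given in bytes, 10 GiB here):

$ qemu-storage-daemon \
    --blockdev node-name=null0,driver=null-co,size=10737418240,read-zeroes=on \
    --export type=fuse,id=exp0,node-name=null0,mountpoint=bench-mount,writable=on,io-uring=on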

Kevin

> > > On 8/29/25 10:50 PM, Brian Song wrote:
> > > > Hi all,
> > > > 
> > > > This is a GSoC project. More details are available here:
> > > > https://wiki.qemu.org/Google_Summer_of_Code_2025#FUSE-over-io_uring_exports
> > > > 
> > > > This patch series includes:
> > > > - Add a round-robin mechanism to distribute the kernel-required Ring
> > > > Queues to FUSE Queues
> > > > - Support multiple in-flight requests (multiple ring entries)
> > > > - Add tests for FUSE-over-io_uring
> > > > 
> > > > More detail in the v2 cover letter:
> > > > https://lists.nongnu.org/archive/html/qemu-block/2025-08/msg00140.html
> > > > 
> > > > And in the v1 cover letter:
> > > > https://lists.nongnu.org/archive/html/qemu-block/2025-07/msg00280.html
> > > > 
> > > > 
> > > > Brian Song (4):
> > > >     export/fuse: add opt to enable FUSE-over-io_uring
> > > >     export/fuse: process FUSE-over-io_uring requests
> > > >     export/fuse: Safe termination for FUSE-uring
> > > >     iotests: add tests for FUSE-over-io_uring
> > > > 
> > > >    block/export/fuse.c                  | 838 +++++++++++++++++++++------
> > > >    docs/tools/qemu-storage-daemon.rst   |  11 +-
> > > >    qapi/block-export.json               |   5 +-
> > > >    storage-daemon/qemu-storage-daemon.c |   1 +
> > > >    tests/qemu-iotests/check             |   2 +
> > > >    tests/qemu-iotests/common.rc         |  45 +-
> > > >    util/fdmon-io_uring.c                |   5 +-
> > > >    7 files changed, 717 insertions(+), 190 deletions(-)
> > > > 
> > > 
>