From: Brian Song <hibriansong@gmail.com>

Hello,

This is a GSoC project. You can read more about it here:
https://wiki.qemu.org/Google_Summer_of_Code_2025#FUSE-over-io_uring_exports

This series:
- Merges the request processing functions for both traditional FUSE
  and FUSE-over-io_uring modes
- Implements multi-threading (Multi-IOThread)
- Improves FUSE-over-io_uring termination handling

Due to kernel limitations, when the FUSE-over-io_uring option is
enabled, you must create and assign nr_cpu IOThreads. For example:

qemu-storage-daemon \
  --object iothread,id=iothread1 \
  --object iothread,id=iothread2 \
  --blockdev node-name=prot-node,driver=file,filename=img.qcow2 \
  --blockdev node-name=fmt-node,driver=qcow2,file=prot-node \
  --export type=fuse,id=exp0,node-name=fmt-node,mountpoint=mount-point, \
    writable=on,iothread.0=iothread1,iothread.1=iothread2

More details in the v1 cover letter:
https://lists.nongnu.org/archive/html/qemu-block/2025-07/msg00280.html

Brian Song (3):
  fuse: add FUSE-over-io_uring enable opt and init
  fuse: Handle FUSE-uring requests
  fuse: Safe termination for FUSE-uring

 block/export/fuse.c                  | 632 ++++++++++++++++++++-------
 docs/tools/qemu-storage-daemon.rst   |  11 +-
 qapi/block-export.json               |   5 +-
 storage-daemon/qemu-storage-daemon.c |   1 +
 util/fdmon-io_uring.c                |   5 +-
 5 files changed, 486 insertions(+), 168 deletions(-)

-- 
2.45.2
On Thu, Aug 14, 2025 at 11:46:16PM -0400, Zhi Song wrote:
> Due to kernel limitations, when the FUSE-over-io_uring option is
> enabled, you must create and assign nr_cpu IOThreads. For example:

While it would be nice for the kernel to support a more flexible queue
mapping policy, userspace can work around this.

I think Kevin suggested creating the number of FUSE queues required by
the kernel and configuring them across the user's IOThreads. That way
the number of IOThreads can be smaller than the number of FUSE queues.

Stefan
On 8/17/25 9:45 AM, Stefan Hajnoczi wrote:
> On Thu, Aug 14, 2025 at 11:46:16PM -0400, Zhi Song wrote:
>> Due to kernel limitations, when the FUSE-over-io_uring option is
>> enabled, you must create and assign nr_cpu IOThreads. For example:
>
> While it would be nice for the kernel to support a more flexible queue
> mapping policy, userspace can work around this.
>
> I think Kevin suggested creating the number of FUSE queues required by
> the kernel and configuring them across the user's IOThreads. That way
> the number of IOThreads can be smaller than the number of FUSE queues.
>
> Stefan

If we are mapping user specified IOThreads to nr_cpu queues Q, when we
register entries, we need to think about how many entries in each Q[i]
go to different IOThreads, and bind the qid when submitting. Once a CQE
comes back, the corresponding IOThread handles it. Looks like we don't
really need a round robin for dispatching. The actual question is how
to split entries in each queue across IOThreads.

For example, if we split entries evenly:

USER: define 2 IOThreads to submit and recv ring entries
NR_CPU: 4

Q = malloc(sizeof(entry) * 32 * nr_cpu);

IOThread-1:
Q[0] Q[1] Q[2] Q[3]
 16   16   16   16

IOThread-2:
Q[0] Q[1] Q[2] Q[3]
 16   16   16   16
On Wed, Aug 20, 2025 at 09:32:44PM -0400, Brian Song wrote:
> On 8/17/25 9:45 AM, Stefan Hajnoczi wrote:
> > On Thu, Aug 14, 2025 at 11:46:16PM -0400, Zhi Song wrote:
> >> Due to kernel limitations, when the FUSE-over-io_uring option is
> >> enabled, you must create and assign nr_cpu IOThreads. For example:
> >
> > While it would be nice for the kernel to support a more flexible queue
> > mapping policy, userspace can work around this.
> >
> > I think Kevin suggested creating the number of FUSE queues required by
> > the kernel and configuring them across the user's IOThreads. That way
> > the number of IOThreads can be smaller than the number of FUSE queues.
> >
> > Stefan
>
> If we are mapping user specified IOThreads to nr_cpu queues Q, when we
> register entries, we need to think about how many entries in each Q[i]
> go to different IOThreads, and bind the qid when submitting. Once a CQE
> comes back, the corresponding IOThread handles it. Looks like we don't
> really need a round robin for dispatching. The actual question is how

Round-robin is needed for qid -> IOThread mapping, not for dispatching
individual requests. The kernel currently dispatches requests based on
a 1:1 CPU:Queue mapping.

> to split entries in each queue across IOThreads.
>
> For example, if we split entries evenly:
>
> USER: define 2 IOThreads to submit and recv ring entries
> NR_CPU: 4
>
> Q = malloc(sizeof(entry) * 32 * nr_cpu);
>
> IOThread-1:
> Q[0] Q[1] Q[2] Q[3]
>  16   16   16   16
>
> IOThread-2:
> Q[0] Q[1] Q[2] Q[3]
>  16   16   16   16

There is no need to have nr_cpus queues in each IOThread. The constraint
is that the total number of queues across all IOThreads must equal
nr_cpus.

The malloc in your example implies that each FuseQueue will have 32
entries (REGISTER uring_cmds). nr_cpu is 4, so the mapping should look
like this:

IOThread-1:
Q[0] Q[2]
 32   32

IOThread-2:
Q[1] Q[3]
 32   32

Stefan
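As a rough illustration of the mapping Stefan describes, a minimal C
sketch of the round-robin qid -> IOThread assignment could look like the
following. The FuseQueue layout, the helper name fuse_uring_map_queues(),
and the FUSE_URING_QUEUE_DEPTH constant are assumptions made for this
sketch and are not taken from the actual patch series; IOThread stands in
for QEMU's IOThread object.

/*
 * Hypothetical sketch only: distribute nr_cpu FUSE queues across a
 * smaller set of IOThreads round-robin. All names are illustrative.
 */
typedef struct IOThread IOThread;   /* QEMU IOThread, treated as opaque */

#define FUSE_URING_QUEUE_DEPTH 32   /* REGISTER uring_cmds per queue */

typedef struct FuseQueue {
    int qid;                /* the kernel requires one queue per CPU */
    IOThread *iothread;     /* IOThread that owns this queue's ring */
} FuseQueue;

/*
 * The total number of queues must equal nr_cpu; the number of IOThreads
 * may be smaller. Every queue keeps its full FUSE_URING_QUEUE_DEPTH
 * entries; only the owning IOThread varies.
 */
static void fuse_uring_map_queues(FuseQueue *queues, int nr_cpu,
                                  IOThread **iothreads, int n_iothreads)
{
    for (int qid = 0; qid < nr_cpu; qid++) {
        queues[qid].qid = qid;
        queues[qid].iothread = iothreads[qid % n_iothreads];
    }
}

With nr_cpu = 4 and two IOThreads this yields Q[0] and Q[2] on
IOThread-1 and Q[1] and Q[3] on IOThread-2, matching the diagram above.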
On 8/17/25 15:45, Stefan Hajnoczi wrote:
> On Thu, Aug 14, 2025 at 11:46:16PM -0400, Zhi Song wrote:
>> Due to kernel limitations, when the FUSE-over-io_uring option is
>> enabled, you must create and assign nr_cpu IOThreads. For example:
>
> While it would be nice for the kernel to support a more flexible queue
> mapping policy, userspace can work around this.
>
> I think Kevin suggested creating the number of FUSE queues required by
> the kernel and configuring them across the user's IOThreads. That way
> the number of IOThreads can be smaller than the number of FUSE queues.

Sorry, I had another week off last week and I'm only slowly catching up.

Regarding more flexible queues, see here:
https://lore.kernel.org/r/20250722-reduced-nr-ring-queues_3-v1-0-aa8e37ae97e6@ddn.com

And I actually forgot to mention the corresponding libfuse branch for
that:
https://github.com/bsbernd/libfuse/tree/uring-reduce-nr-queues

Bernd