FUSE allows creating multiple request queues by "cloning" /dev/fuse FDs
(via open("/dev/fuse") + ioctl(FUSE_DEV_IOC_CLONE)).
We can use this to implement multi-threading.
Note that the interface presented here differs from the multi-queue
interface of virtio-blk: The latter maps virtqueues to iothreads, which
allows processing multiple virtqueues in a single iothread. The
equivalent (processing multiple FDs in a single iothread) would not make
sense for FUSE because those FDs are used in a round-robin fashion by
the FUSE kernel driver. Putting two of them into a single iothread will
just create a bottleneck.
Therefore, all we need is an array of iothreads, and we will create one
"queue" (FD) per thread.
These are the benchmark results when using four threads (compared to a
single thread); note that fio still only uses a single job, but
performance can still be improved because of said round-robin usage for
the queues. (Not in the sync case, though, in which case I guess it
just adds overhead.)
file:
read:
seq aio: 264.8k ±0.8k (+120 %)
rand aio: 143.8k ±0.4k (+ 27 %)
seq sync: 49.9k ±0.5k (- 5 %)
rand sync: 10.3k ±0.1k (- 1 %)
write:
seq aio: 226.6k ±2.1k (+184 %)
rand aio: 225.9k ±1.8k (+186 %)
seq sync: 36.9k ±0.6k (- 11 %)
rand sync: 36.9k ±0.2k (- 11 %)
null:
read:
seq aio: 315.2k ±11.0k (+18 %)
rand aio: 300.5k ±10.8k (+14 %)
seq sync: 114.2k ± 3.6k (-16 %)
rand sync: 112.5k ± 2.8k (-16 %)
write:
seq aio: 222.6k ±6.8k (-21 %)
rand aio: 220.5k ±6.8k (-23 %)
seq sync: 117.2k ±3.7k (-18 %)
rand sync: 116.3k ±4.4k (-18 %)
(I don't know what's going on in the null-write AIO case, sorry.)
Here's results for numjobs=4:
"Before", i.e. without multithreading in QSD/FUSE (results compared to
numjobs=1):
file:
read:
seq aio: 104.7k ± 0.4k (- 13 %)
rand aio: 111.5k ± 0.4k (- 2 %)
seq sync: 71.0k ±13.8k (+ 36 %)
rand sync: 41.4k ± 0.1k (+297 %)
write:
seq aio: 79.4k ±0.1k (- 1 %)
rand aio: 78.6k ±0.1k (± 0 %)
seq sync: 83.3k ±0.1k (+101 %)
rand sync: 82.0k ±0.2k (+ 98 %)
null:
read:
seq aio: 260.5k ±1.5k (- 2 %)
rand aio: 260.1k ±1.4k (- 2 %)
seq sync: 291.8k ±1.3k (+115 %)
rand sync: 280.1k ±1.7k (+115 %)
write:
seq aio: 280.1k ±1.7k (± 0 %)
rand aio: 279.5k ±1.4k (- 3 %)
seq sync: 306.7k ±2.2k (+116 %)
rand sync: 305.9k ±1.8k (+117 %)
(As probably expected, little difference in the AIO case, but great
improvements in the sync case because it kind of gives it an artificial
iodepth of 4.)
"After", i.e. with four threads in QSD/FUSE (now results compared to the
above):
file:
read:
seq aio: 193.3k ± 1.8k (+ 85 %)
rand aio: 329.3k ± 0.3k (+195 %)
seq sync: 66.2k ±13.0k (- 7 %)
rand sync: 40.1k ± 0.0k (- 3 %)
write:
seq aio: 219.7k ±0.8k (+177 %)
rand aio: 217.2k ±1.5k (+176 %)
seq sync: 92.5k ±0.2k (+ 11 %)
rand sync: 91.9k ±0.2k (+ 12 %)
null:
read:
seq aio: 706.7k ±2.1k (+171 %)
rand aio: 714.7k ±3.2k (+175 %)
seq sync: 431.7k ±3.0k (+ 48 %)
rand sync: 435.4k ±2.8k (+ 50 %)
write:
seq aio: 746.9k ±2.8k (+167 %)
rand aio: 749.0k ±4.9k (+168 %)
seq sync: 420.7k ±3.1k (+ 37 %)
rand sync: 419.1k ±2.5k (+ 37 %)
So this helps mainly for the AIO cases, but also in the null sync cases,
because null is always CPU-bound, so more threads help.
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
---
qapi/block-export.json | 8 +-
block/export/fuse.c | 214 +++++++++++++++++++++++++++++++++--------
2 files changed, 179 insertions(+), 43 deletions(-)
diff --git a/qapi/block-export.json b/qapi/block-export.json
index c783e01a53..0bdd5992eb 100644
--- a/qapi/block-export.json
+++ b/qapi/block-export.json
@@ -179,12 +179,18 @@
# mount the export with allow_other, and if that fails, try again
# without. (since 6.1; default: auto)
#
+# @iothreads: Enables multi-threading: Handle requests in each of the
+# given iothreads (instead of the block device's iothread, or the
+# export's "main" iothread). For this, the FUSE FD is duplicated so
+# there is one FD per iothread. (since 10.1)
+#
# Since: 6.0
##
{ 'struct': 'BlockExportOptionsFuse',
'data': { 'mountpoint': 'str',
'*growable': 'bool',
- '*allow-other': 'FuseExportAllowOther' },
+ '*allow-other': 'FuseExportAllowOther',
+ '*iothreads': ['str'] },
'if': 'CONFIG_FUSE' }
##
diff --git a/block/export/fuse.c b/block/export/fuse.c
index 345e833171..0edd994392 100644
--- a/block/export/fuse.c
+++ b/block/export/fuse.c
@@ -31,11 +31,14 @@
#include "qemu/error-report.h"
#include "qemu/main-loop.h"
#include "system/block-backend.h"
+#include "system/block-backend.h"
+#include "system/iothread.h"
#include <fuse.h>
#include <fuse_lowlevel.h>
#include "standard-headers/linux/fuse.h"
+#include <sys/ioctl.h>
#if defined(CONFIG_FALLOCATE_ZERO_RANGE)
#include <linux/falloc.h>
@@ -50,12 +53,17 @@
/* Small enough to fit in the request buffer */
#define FUSE_MAX_WRITE_BYTES (4 * 1024)
-typedef struct FuseExport {
- BlockExport common;
+typedef struct FuseExport FuseExport;
- struct fuse_session *fuse_session;
- unsigned int in_flight; /* atomic */
- bool mounted, fd_handler_set_up;
+/*
+ * One FUSE "queue", representing one FUSE FD from which requests are fetched
+ * and processed. Each queue is tied to an AioContext.
+ */
+typedef struct FuseQueue {
+ FuseExport *exp;
+
+ AioContext *ctx;
+ int fuse_fd;
/*
* The request buffer must be able to hold a full write, and/or at least
@@ -66,6 +74,14 @@ typedef struct FuseExport {
FUSE_MAX_WRITE_BYTES,
FUSE_MIN_READ_BUFFER
)];
+} FuseQueue;
+
+struct FuseExport {
+ BlockExport common;
+
+ struct fuse_session *fuse_session;
+ unsigned int in_flight; /* atomic */
+ bool mounted, fd_handler_set_up;
/*
* Set when there was an unrecoverable error and no requests should be read
@@ -74,7 +90,15 @@ typedef struct FuseExport {
*/
bool halted;
- int fuse_fd;
+ int num_queues;
+ FuseQueue *queues;
+ /*
+ * True if this export should follow the generic export's AioContext.
+ * Will be false if the queues' AioContexts have been explicitly set by the
+ * user, i.e. are expected to stay in those contexts.
+ * (I.e. is always false if there is more than one queue.)
+ */
+ bool follow_aio_context;
char *mountpoint;
bool writable;
@@ -85,11 +109,11 @@ typedef struct FuseExport {
mode_t st_mode;
uid_t st_uid;
gid_t st_gid;
-} FuseExport;
+};
/* Parameters to the request processing coroutine */
typedef struct FuseRequestCoParam {
- FuseExport *exp;
+ FuseQueue *q;
int got_request;
} FuseRequestCoParam;
@@ -102,12 +126,13 @@ static void fuse_export_halt(FuseExport *exp);
static void init_exports_table(void);
static int mount_fuse_export(FuseExport *exp, Error **errp);
+static int clone_fuse_fd(int fd, Error **errp);
static bool is_regular_file(const char *path, Error **errp);
static bool poll_fuse_fd(void *opaque);
static void read_fuse_fd(void *opaque);
-static void coroutine_fn fuse_co_process_request(FuseExport *exp);
+static void coroutine_fn fuse_co_process_request(FuseQueue *q);
static void fuse_inc_in_flight(FuseExport *exp)
{
@@ -137,9 +162,11 @@ static void fuse_attach_handlers(FuseExport *exp)
return;
}
- aio_set_fd_handler(exp->common.ctx, exp->fuse_fd,
- read_fuse_fd, NULL, poll_fuse_fd,
- read_fuse_fd, exp);
+ for (int i = 0; i < exp->num_queues; i++) {
+ aio_set_fd_handler(exp->queues[i].ctx, exp->queues[i].fuse_fd,
+ read_fuse_fd, NULL, poll_fuse_fd,
+ read_fuse_fd, &exp->queues[i]);
+ }
exp->fd_handler_set_up = true;
}
@@ -148,8 +175,10 @@ static void fuse_attach_handlers(FuseExport *exp)
*/
static void fuse_detach_handlers(FuseExport *exp)
{
- aio_set_fd_handler(exp->common.ctx, exp->fuse_fd,
- NULL, NULL, NULL, NULL, NULL);
+ for (int i = 0; i < exp->num_queues; i++) {
+ aio_set_fd_handler(exp->queues[i].ctx, exp->queues[i].fuse_fd,
+ NULL, NULL, NULL, NULL, NULL);
+ }
exp->fd_handler_set_up = false;
}
@@ -164,6 +193,11 @@ static void fuse_export_drained_end(void *opaque)
/* Refresh AioContext in case it changed */
exp->common.ctx = blk_get_aio_context(exp->common.blk);
+ if (exp->follow_aio_context) {
+ assert(exp->num_queues == 1);
+ exp->queues[0].ctx = exp->common.ctx;
+ }
+
fuse_attach_handlers(exp);
}
@@ -187,10 +221,52 @@ static int fuse_export_create(BlockExport *blk_exp,
ERRP_GUARD(); /* ensure clean-up even with error_fatal */
FuseExport *exp = container_of(blk_exp, FuseExport, common);
BlockExportOptionsFuse *args = &blk_exp_args->u.fuse;
+ FuseQueue *q;
int ret;
assert(blk_exp_args->type == BLOCK_EXPORT_TYPE_FUSE);
+ if (args->iothreads) {
+ strList *e;
+
+ exp->follow_aio_context = false;
+ exp->num_queues = 0;
+ for (e = args->iothreads; e; e = e->next) {
+ exp->num_queues++;
+ }
+ if (exp->num_queues < 1) {
+ error_setg(errp, "Need at least one queue");
+ ret = -EINVAL;
+ goto fail;
+ }
+ exp->queues = g_new0(FuseQueue, exp->num_queues);
+ q = exp->queues;
+ for (e = args->iothreads; e; e = e->next) {
+ IOThread *iothread = iothread_by_id(e->value);
+
+ if (!iothread) {
+ error_setg(errp, "IOThread \"%s\" does not exist", e->value);
+ ret = -EINVAL;
+ goto fail;
+ }
+
+ *(q++) = (FuseQueue) {
+ .exp = exp,
+ .ctx = iothread_get_aio_context(iothread),
+ .fuse_fd = -1,
+ };
+ }
+ } else {
+ exp->follow_aio_context = true;
+ exp->num_queues = 1;
+ exp->queues = g_new(FuseQueue, exp->num_queues);
+ exp->queues[0] = (FuseQueue) {
+ .exp = exp,
+ .ctx = exp->common.ctx,
+ .fuse_fd = -1,
+ };
+ }
+
/* For growable and writable exports, take the RESIZE permission */
if (args->growable || blk_exp_args->writable) {
uint64_t blk_perm, blk_shared_perm;
@@ -275,14 +351,24 @@ static int fuse_export_create(BlockExport *blk_exp,
g_hash_table_insert(exports, g_strdup(exp->mountpoint), NULL);
- exp->fuse_fd = fuse_session_fd(exp->fuse_session);
- ret = fcntl(exp->fuse_fd, F_SETFL, O_NONBLOCK);
+ assert(exp->num_queues >= 1);
+ exp->queues[0].fuse_fd = fuse_session_fd(exp->fuse_session);
+ ret = fcntl(exp->queues[0].fuse_fd, F_SETFL, O_NONBLOCK);
if (ret < 0) {
ret = -errno;
error_setg_errno(errp, errno, "Failed to make FUSE FD non-blocking");
goto fail;
}
+ for (int i = 1; i < exp->num_queues; i++) {
+ int fd = clone_fuse_fd(exp->queues[0].fuse_fd, errp);
+ if (fd < 0) {
+ ret = fd;
+ goto fail;
+ }
+ exp->queues[i].fuse_fd = fd;
+ }
+
fuse_attach_handlers(exp);
return 0;
@@ -355,6 +441,39 @@ static int mount_fuse_export(FuseExport *exp, Error **errp)
return 0;
}
+/**
+ * Clone the given /dev/fuse file descriptor, yielding a second FD from which
+ * requests can be pulled for the associated filesystem. Returns an FD on
+ * success, and -errno on error.
+ */
+static int clone_fuse_fd(int fd, Error **errp)
+{
+ uint32_t src_fd = fd;
+ int new_fd;
+ int ret;
+
+ /*
+ * The name "/dev/fuse" is fixed, see libfuse's lib/fuse_loop_mt.c
+ * (fuse_clone_chan()).
+ */
+ new_fd = open("/dev/fuse", O_RDWR | O_CLOEXEC | O_NONBLOCK);
+ if (new_fd < 0) {
+ ret = -errno;
+ error_setg_errno(errp, errno, "Failed to open /dev/fuse");
+ return ret;
+ }
+
+ ret = ioctl(new_fd, FUSE_DEV_IOC_CLONE, &src_fd);
+ if (ret < 0) {
+ ret = -errno;
+ error_setg_errno(errp, errno, "Failed to clone FUSE FD");
+ close(new_fd);
+ return ret;
+ }
+
+ return new_fd;
+}
+
/**
* Try to read a single request from the FUSE FD.
* Takes a FuseRequestCoParam object pointer in `opaque`.
@@ -370,8 +489,9 @@ static int mount_fuse_export(FuseExport *exp, Error **errp)
static void coroutine_fn co_read_from_fuse_fd(void *opaque)
{
FuseRequestCoParam *co_param = opaque;
- FuseExport *exp = co_param->exp;
- int fuse_fd = exp->fuse_fd;
+ FuseQueue *q = co_param->q;
+ int fuse_fd = q->fuse_fd;
+ FuseExport *exp = q->exp;
ssize_t ret;
const struct fuse_in_header *in_hdr;
@@ -381,8 +501,7 @@ static void coroutine_fn co_read_from_fuse_fd(void *opaque)
goto no_request;
}
- ret = RETRY_ON_EINTR(read(fuse_fd, exp->request_buf,
- sizeof(exp->request_buf)));
+ ret = RETRY_ON_EINTR(read(fuse_fd, q->request_buf, sizeof(q->request_buf)));
if (ret < 0 && errno == EAGAIN) {
/* No request available */
goto no_request;
@@ -400,7 +519,7 @@ static void coroutine_fn co_read_from_fuse_fd(void *opaque)
goto no_request;
}
- in_hdr = (const struct fuse_in_header *)exp->request_buf;
+ in_hdr = (const struct fuse_in_header *)q->request_buf;
if (unlikely(ret != in_hdr->len)) {
error_report("Number of bytes read from FUSE device does not match "
"request size, expected %" PRIu32 " bytes, read %zi "
@@ -413,7 +532,7 @@ static void coroutine_fn co_read_from_fuse_fd(void *opaque)
/* Must set this before yielding */
co_param->got_request = 1;
- fuse_co_process_request(exp);
+ fuse_co_process_request(q);
fuse_dec_in_flight(exp);
return;
@@ -432,7 +551,7 @@ static bool poll_fuse_fd(void *opaque)
{
Coroutine *co;
FuseRequestCoParam co_param = {
- .exp = opaque,
+ .q = opaque,
.got_request = -EINPROGRESS,
};
@@ -451,7 +570,7 @@ static void read_fuse_fd(void *opaque)
{
Coroutine *co;
FuseRequestCoParam co_param = {
- .exp = opaque,
+ .q = opaque,
.got_request = -EINPROGRESS,
};
@@ -481,6 +600,16 @@ static void fuse_export_delete(BlockExport *blk_exp)
{
FuseExport *exp = container_of(blk_exp, FuseExport, common);
+ for (int i = 0; i < exp->num_queues; i++) {
+ FuseQueue *q = &exp->queues[i];
+
+ /* Queue 0's FD belongs to the FUSE session */
+ if (i > 0 && q->fuse_fd >= 0) {
+ close(q->fuse_fd);
+ }
+ }
+ g_free(exp->queues);
+
if (exp->fuse_session) {
if (exp->mounted) {
fuse_session_unmount(exp->fuse_session);
@@ -1137,23 +1266,23 @@ static int fuse_write_buf_response(int fd, uint32_t req_id,
/*
* For use in fuse_co_process_request():
* Returns a pointer to the parameter object for the given operation (inside of
- * exp->request_buf, which is assumed to hold a fuse_in_header first).
- * Verifies that the object is complete (exp->request_buf is large enough to
+ * q->request_buf, which is assumed to hold a fuse_in_header first).
+ * Verifies that the object is complete (q->request_buf is large enough to
* hold it in one piece, and the request length includes the whole object).
*
- * Note that exp->request_buf may be overwritten after yielding, so the returned
+ * Note that q->request_buf may be overwritten after yielding, so the returned
* pointer must not be used across a function that may yield!
*/
-#define FUSE_IN_OP_STRUCT(op_name, export) \
+#define FUSE_IN_OP_STRUCT(op_name, queue) \
({ \
const struct fuse_in_header *__in_hdr = \
- (const struct fuse_in_header *)(export)->request_buf; \
+ (const struct fuse_in_header *)(q)->request_buf; \
const struct fuse_##op_name##_in *__in = \
(const struct fuse_##op_name##_in *)(__in_hdr + 1); \
const size_t __param_len = sizeof(*__in_hdr) + sizeof(*__in); \
uint32_t __req_len; \
\
- QEMU_BUILD_BUG_ON(sizeof((export)->request_buf) < __param_len); \
+ QEMU_BUILD_BUG_ON(sizeof((q)->request_buf) < __param_len); \
\
__req_len = __in_hdr->len; \
if (__req_len < __param_len) { \
@@ -1190,11 +1319,12 @@ static int fuse_write_buf_response(int fd, uint32_t req_id,
* Process a FUSE request, incl. writing the response.
*
* Note that yielding in any request-processing function can overwrite the
- * contents of exp->request_buf. Anything that takes a buffer needs to take
+ * contents of q->request_buf. Anything that takes a buffer needs to take
* care that the content is copied before yielding.
*/
-static void coroutine_fn fuse_co_process_request(FuseExport *exp)
+static void coroutine_fn fuse_co_process_request(FuseQueue *q)
{
+ FuseExport *exp = q->exp;
uint32_t opcode;
uint64_t req_id;
/*
@@ -1217,7 +1347,7 @@ static void coroutine_fn fuse_co_process_request(FuseExport *exp)
/* Limit scope to ensure pointer is no longer used after yielding */
{
const struct fuse_in_header *in_hdr =
- (const struct fuse_in_header *)exp->request_buf;
+ (const struct fuse_in_header *)q->request_buf;
opcode = in_hdr->opcode;
req_id = in_hdr->unique;
@@ -1225,7 +1355,7 @@ static void coroutine_fn fuse_co_process_request(FuseExport *exp)
switch (opcode) {
case FUSE_INIT: {
- const struct fuse_init_in *in = FUSE_IN_OP_STRUCT(init, exp);
+ const struct fuse_init_in *in = FUSE_IN_OP_STRUCT(init, q);
ret = fuse_co_init(exp, FUSE_OUT_OP_STRUCT(init, out_buf),
in->max_readahead, in->flags);
break;
@@ -1248,23 +1378,23 @@ static void coroutine_fn fuse_co_process_request(FuseExport *exp)
break;
case FUSE_SETATTR: {
- const struct fuse_setattr_in *in = FUSE_IN_OP_STRUCT(setattr, exp);
+ const struct fuse_setattr_in *in = FUSE_IN_OP_STRUCT(setattr, q);
ret = fuse_co_setattr(exp, FUSE_OUT_OP_STRUCT(attr, out_buf),
in->valid, in->size, in->mode, in->uid, in->gid);
break;
}
case FUSE_READ: {
- const struct fuse_read_in *in = FUSE_IN_OP_STRUCT(read, exp);
+ const struct fuse_read_in *in = FUSE_IN_OP_STRUCT(read, q);
ret = fuse_co_read(exp, &out_data_buffer, in->offset, in->size);
break;
}
case FUSE_WRITE: {
- const struct fuse_write_in *in = FUSE_IN_OP_STRUCT(write, exp);
+ const struct fuse_write_in *in = FUSE_IN_OP_STRUCT(write, q);
uint32_t req_len;
- req_len = ((const struct fuse_in_header *)exp->request_buf)->len;
+ req_len = ((const struct fuse_in_header *)q->request_buf)->len;
if (unlikely(req_len < sizeof(struct fuse_in_header) + sizeof(*in) +
in->size)) {
warn_report("FUSE WRITE truncated; received %zu bytes of %" PRIu32,
@@ -1293,7 +1423,7 @@ static void coroutine_fn fuse_co_process_request(FuseExport *exp)
}
case FUSE_FALLOCATE: {
- const struct fuse_fallocate_in *in = FUSE_IN_OP_STRUCT(fallocate, exp);
+ const struct fuse_fallocate_in *in = FUSE_IN_OP_STRUCT(fallocate, q);
ret = fuse_co_fallocate(exp, in->offset, in->length, in->mode);
break;
}
@@ -1308,7 +1438,7 @@ static void coroutine_fn fuse_co_process_request(FuseExport *exp)
#ifdef CONFIG_FUSE_LSEEK
case FUSE_LSEEK: {
- const struct fuse_lseek_in *in = FUSE_IN_OP_STRUCT(lseek, exp);
+ const struct fuse_lseek_in *in = FUSE_IN_OP_STRUCT(lseek, q);
ret = fuse_co_lseek(exp, FUSE_OUT_OP_STRUCT(lseek, out_buf),
in->offset, in->whence);
break;
@@ -1322,11 +1452,11 @@ static void coroutine_fn fuse_co_process_request(FuseExport *exp)
/* Ignore errors from fuse_write*(), nothing we can do anyway */
if (out_data_buffer) {
assert(ret >= 0);
- fuse_write_buf_response(exp->fuse_fd, req_id, out_hdr,
+ fuse_write_buf_response(q->fuse_fd, req_id, out_hdr,
out_data_buffer, ret);
qemu_vfree(out_data_buffer);
} else {
- fuse_write_response(exp->fuse_fd, req_id, out_hdr,
+ fuse_write_response(q->fuse_fd, req_id, out_hdr,
ret < 0 ? ret : 0,
ret < 0 ? 0 : ret);
}
--
2.48.1
On Tue, Mar 25, 2025 at 05:06:54PM +0100, Hanna Czenczek wrote: > FUSE allows creating multiple request queues by "cloning" /dev/fuse FDs > (via open("/dev/fuse") + ioctl(FUSE_DEV_IOC_CLONE)). > > We can use this to implement multi-threading. > > Note that the interface presented here differs from the multi-queue > interface of virtio-blk: The latter maps virtqueues to iothreads, which > allows processing multiple virtqueues in a single iothread. The > equivalent (processing multiple FDs in a single iothread) would not make > sense for FUSE because those FDs are used in a round-robin fashion by > the FUSE kernel driver. Putting two of them into a single iothread will > just create a bottleneck. > > Therefore, all we need is an array of iothreads, and we will create one > "queue" (FD) per thread. > > @@ -275,14 +351,24 @@ static int fuse_export_create(BlockExport *blk_exp, > > g_hash_table_insert(exports, g_strdup(exp->mountpoint), NULL); > > - exp->fuse_fd = fuse_session_fd(exp->fuse_session); > - ret = fcntl(exp->fuse_fd, F_SETFL, O_NONBLOCK); > + assert(exp->num_queues >= 1); > + exp->queues[0].fuse_fd = fuse_session_fd(exp->fuse_session); > + ret = fcntl(exp->queues[0].fuse_fd, F_SETFL, O_NONBLOCK); As mentioned before, F_SETFL should be set by read-modify-write. -- Eric Blake, Principal Software Engineer Red Hat, Inc. Virtualization: qemu.org | libguestfs.org
On Tue, Mar 25, 2025 at 05:06:54PM +0100, Hanna Czenczek wrote: > FUSE allows creating multiple request queues by "cloning" /dev/fuse FDs > (via open("/dev/fuse") + ioctl(FUSE_DEV_IOC_CLONE)). > > We can use this to implement multi-threading. > > Note that the interface presented here differs from the multi-queue > interface of virtio-blk: The latter maps virtqueues to iothreads, which > allows processing multiple virtqueues in a single iothread. The > equivalent (processing multiple FDs in a single iothread) would not make > sense for FUSE because those FDs are used in a round-robin fashion by > the FUSE kernel driver. Putting two of them into a single iothread will > just create a bottleneck. This text might be outdated. virtio-blk's new iothread-vq-mapping parameter provides the "array of iothreads" mentioned below and a way to assign virtqueues to those IOThreads. > > Therefore, all we need is an array of iothreads, and we will create one > "queue" (FD) per thread. > > These are the benchmark results when using four threads (compared to a > single thread); note that fio still only uses a single job, but > performance can still be improved because of said round-robin usage for > the queues. (Not in the sync case, though, in which case I guess it > just adds overhead.) Interesting. FUSE-over-io_uring seems to be different from FUSE_DEV_IOC_CLONE here. It doesn't do round-robin. It uses CPU affinity instead, handing requests to the io_uring context associated with the current CPU when possible. > > file: > read: > seq aio: 264.8k ±0.8k (+120 %) > rand aio: 143.8k ±0.4k (+ 27 %) > seq sync: 49.9k ±0.5k (- 5 %) > rand sync: 10.3k ±0.1k (- 1 %) > write: > seq aio: 226.6k ±2.1k (+184 %) > rand aio: 225.9k ±1.8k (+186 %) > seq sync: 36.9k ±0.6k (- 11 %) > rand sync: 36.9k ±0.2k (- 11 %) > null: > read: > seq aio: 315.2k ±11.0k (+18 %) > rand aio: 300.5k ±10.8k (+14 %) > seq sync: 114.2k ± 3.6k (-16 %) > rand sync: 112.5k ± 2.8k (-16 %) > write: > seq aio: 222.6k ±6.8k (-21 %) > rand aio: 220.5k ±6.8k (-23 %) > seq sync: 117.2k ±3.7k (-18 %) > rand sync: 116.3k ±4.4k (-18 %) > > (I don't know what's going on in the null-write AIO case, sorry.) > > Here's results for numjobs=4: > > "Before", i.e. without multithreading in QSD/FUSE (results compared to > numjobs=1): > > file: > read: > seq aio: 104.7k ± 0.4k (- 13 %) > rand aio: 111.5k ± 0.4k (- 2 %) > seq sync: 71.0k ±13.8k (+ 36 %) > rand sync: 41.4k ± 0.1k (+297 %) > write: > seq aio: 79.4k ±0.1k (- 1 %) > rand aio: 78.6k ±0.1k (± 0 %) > seq sync: 83.3k ±0.1k (+101 %) > rand sync: 82.0k ±0.2k (+ 98 %) > null: > read: > seq aio: 260.5k ±1.5k (- 2 %) > rand aio: 260.1k ±1.4k (- 2 %) > seq sync: 291.8k ±1.3k (+115 %) > rand sync: 280.1k ±1.7k (+115 %) > write: > seq aio: 280.1k ±1.7k (± 0 %) > rand aio: 279.5k ±1.4k (- 3 %) > seq sync: 306.7k ±2.2k (+116 %) > rand sync: 305.9k ±1.8k (+117 %) > > (As probably expected, little difference in the AIO case, but great > improvements in the sync case because it kind of gives it an artificial > iodepth of 4.) > > "After", i.e. with four threads in QSD/FUSE (now results compared to the > above): > > file: > read: > seq aio: 193.3k ± 1.8k (+ 85 %) > rand aio: 329.3k ± 0.3k (+195 %) > seq sync: 66.2k ±13.0k (- 7 %) > rand sync: 40.1k ± 0.0k (- 3 %) > write: > seq aio: 219.7k ±0.8k (+177 %) > rand aio: 217.2k ±1.5k (+176 %) > seq sync: 92.5k ±0.2k (+ 11 %) > rand sync: 91.9k ±0.2k (+ 12 %) > null: > read: > seq aio: 706.7k ±2.1k (+171 %) > rand aio: 714.7k ±3.2k (+175 %) > seq sync: 431.7k ±3.0k (+ 48 %) > rand sync: 435.4k ±2.8k (+ 50 %) > write: > seq aio: 746.9k ±2.8k (+167 %) > rand aio: 749.0k ±4.9k (+168 %) > seq sync: 420.7k ±3.1k (+ 37 %) > rand sync: 419.1k ±2.5k (+ 37 %) > > So this helps mainly for the AIO cases, but also in the null sync cases, > because null is always CPU-bound, so more threads help. > > Signed-off-by: Hanna Czenczek <hreitz@redhat.com> > --- > qapi/block-export.json | 8 +- > block/export/fuse.c | 214 +++++++++++++++++++++++++++++++++-------- > 2 files changed, 179 insertions(+), 43 deletions(-) > > diff --git a/qapi/block-export.json b/qapi/block-export.json > index c783e01a53..0bdd5992eb 100644 > --- a/qapi/block-export.json > +++ b/qapi/block-export.json > @@ -179,12 +179,18 @@ > # mount the export with allow_other, and if that fails, try again > # without. (since 6.1; default: auto) > # > +# @iothreads: Enables multi-threading: Handle requests in each of the > +# given iothreads (instead of the block device's iothread, or the > +# export's "main" iothread). For this, the FUSE FD is duplicated so > +# there is one FD per iothread. (since 10.1) This option isn't FUSE-specific but FUSE is the first export type to support it. Please add it to BlockExportOptions instead and refuse export creation when the export type only supports 1 IOThread. Eric: Are you interested in implementing support for multiple IOThreads in the NBD export? I remember some time ago we talked about NBD multi-conn support, although maybe that was for the client rather than the server. > +# > # Since: 6.0 > ## > { 'struct': 'BlockExportOptionsFuse', > 'data': { 'mountpoint': 'str', > '*growable': 'bool', > - '*allow-other': 'FuseExportAllowOther' }, > + '*allow-other': 'FuseExportAllowOther', > + '*iothreads': ['str'] }, > 'if': 'CONFIG_FUSE' } > > ## > diff --git a/block/export/fuse.c b/block/export/fuse.c > index 345e833171..0edd994392 100644 > --- a/block/export/fuse.c > +++ b/block/export/fuse.c > @@ -31,11 +31,14 @@ > #include "qemu/error-report.h" > #include "qemu/main-loop.h" > #include "system/block-backend.h" > +#include "system/block-backend.h" > +#include "system/iothread.h" > > #include <fuse.h> > #include <fuse_lowlevel.h> > > #include "standard-headers/linux/fuse.h" > +#include <sys/ioctl.h> > > #if defined(CONFIG_FALLOCATE_ZERO_RANGE) > #include <linux/falloc.h> > @@ -50,12 +53,17 @@ > /* Small enough to fit in the request buffer */ > #define FUSE_MAX_WRITE_BYTES (4 * 1024) > > -typedef struct FuseExport { > - BlockExport common; > +typedef struct FuseExport FuseExport; > > - struct fuse_session *fuse_session; > - unsigned int in_flight; /* atomic */ > - bool mounted, fd_handler_set_up; > +/* > + * One FUSE "queue", representing one FUSE FD from which requests are fetched > + * and processed. Each queue is tied to an AioContext. > + */ > +typedef struct FuseQueue { > + FuseExport *exp; > + > + AioContext *ctx; > + int fuse_fd; > > /* > * The request buffer must be able to hold a full write, and/or at least > @@ -66,6 +74,14 @@ typedef struct FuseExport { > FUSE_MAX_WRITE_BYTES, > FUSE_MIN_READ_BUFFER > )]; > +} FuseQueue; > + > +struct FuseExport { > + BlockExport common; > + > + struct fuse_session *fuse_session; > + unsigned int in_flight; /* atomic */ > + bool mounted, fd_handler_set_up; > > /* > * Set when there was an unrecoverable error and no requests should be read > @@ -74,7 +90,15 @@ typedef struct FuseExport { > */ > bool halted; > > - int fuse_fd; > + int num_queues; > + FuseQueue *queues; > + /* > + * True if this export should follow the generic export's AioContext. > + * Will be false if the queues' AioContexts have been explicitly set by the > + * user, i.e. are expected to stay in those contexts. > + * (I.e. is always false if there is more than one queue.) > + */ > + bool follow_aio_context; > > char *mountpoint; > bool writable; > @@ -85,11 +109,11 @@ typedef struct FuseExport { > mode_t st_mode; > uid_t st_uid; > gid_t st_gid; > -} FuseExport; > +}; > > /* Parameters to the request processing coroutine */ > typedef struct FuseRequestCoParam { > - FuseExport *exp; > + FuseQueue *q; > int got_request; > } FuseRequestCoParam; > > @@ -102,12 +126,13 @@ static void fuse_export_halt(FuseExport *exp); > static void init_exports_table(void); > > static int mount_fuse_export(FuseExport *exp, Error **errp); > +static int clone_fuse_fd(int fd, Error **errp); > > static bool is_regular_file(const char *path, Error **errp); > > static bool poll_fuse_fd(void *opaque); > static void read_fuse_fd(void *opaque); > -static void coroutine_fn fuse_co_process_request(FuseExport *exp); > +static void coroutine_fn fuse_co_process_request(FuseQueue *q); > > static void fuse_inc_in_flight(FuseExport *exp) > { > @@ -137,9 +162,11 @@ static void fuse_attach_handlers(FuseExport *exp) > return; > } > > - aio_set_fd_handler(exp->common.ctx, exp->fuse_fd, > - read_fuse_fd, NULL, poll_fuse_fd, > - read_fuse_fd, exp); > + for (int i = 0; i < exp->num_queues; i++) { > + aio_set_fd_handler(exp->queues[i].ctx, exp->queues[i].fuse_fd, > + read_fuse_fd, NULL, poll_fuse_fd, > + read_fuse_fd, &exp->queues[i]); > + } > exp->fd_handler_set_up = true; > } > > @@ -148,8 +175,10 @@ static void fuse_attach_handlers(FuseExport *exp) > */ > static void fuse_detach_handlers(FuseExport *exp) > { > - aio_set_fd_handler(exp->common.ctx, exp->fuse_fd, > - NULL, NULL, NULL, NULL, NULL); > + for (int i = 0; i < exp->num_queues; i++) { > + aio_set_fd_handler(exp->queues[i].ctx, exp->queues[i].fuse_fd, > + NULL, NULL, NULL, NULL, NULL); > + } > exp->fd_handler_set_up = false; > } > > @@ -164,6 +193,11 @@ static void fuse_export_drained_end(void *opaque) > > /* Refresh AioContext in case it changed */ > exp->common.ctx = blk_get_aio_context(exp->common.blk); > + if (exp->follow_aio_context) { > + assert(exp->num_queues == 1); > + exp->queues[0].ctx = exp->common.ctx; > + } > + > fuse_attach_handlers(exp); > } > > @@ -187,10 +221,52 @@ static int fuse_export_create(BlockExport *blk_exp, > ERRP_GUARD(); /* ensure clean-up even with error_fatal */ > FuseExport *exp = container_of(blk_exp, FuseExport, common); > BlockExportOptionsFuse *args = &blk_exp_args->u.fuse; > + FuseQueue *q; > int ret; > > assert(blk_exp_args->type == BLOCK_EXPORT_TYPE_FUSE); > > + if (args->iothreads) { > + strList *e; > + > + exp->follow_aio_context = false; > + exp->num_queues = 0; > + for (e = args->iothreads; e; e = e->next) { > + exp->num_queues++; > + } > + if (exp->num_queues < 1) { > + error_setg(errp, "Need at least one queue"); > + ret = -EINVAL; > + goto fail; > + } > + exp->queues = g_new0(FuseQueue, exp->num_queues); > + q = exp->queues; > + for (e = args->iothreads; e; e = e->next) { > + IOThread *iothread = iothread_by_id(e->value); > + > + if (!iothread) { > + error_setg(errp, "IOThread \"%s\" does not exist", e->value); > + ret = -EINVAL; > + goto fail; > + } > + > + *(q++) = (FuseQueue) { > + .exp = exp, > + .ctx = iothread_get_aio_context(iothread), > + .fuse_fd = -1, > + }; > + } > + } else { > + exp->follow_aio_context = true; > + exp->num_queues = 1; > + exp->queues = g_new(FuseQueue, exp->num_queues); > + exp->queues[0] = (FuseQueue) { > + .exp = exp, > + .ctx = exp->common.ctx, > + .fuse_fd = -1, > + }; > + } > + > /* For growable and writable exports, take the RESIZE permission */ > if (args->growable || blk_exp_args->writable) { > uint64_t blk_perm, blk_shared_perm; > @@ -275,14 +351,24 @@ static int fuse_export_create(BlockExport *blk_exp, > > g_hash_table_insert(exports, g_strdup(exp->mountpoint), NULL); > > - exp->fuse_fd = fuse_session_fd(exp->fuse_session); > - ret = fcntl(exp->fuse_fd, F_SETFL, O_NONBLOCK); > + assert(exp->num_queues >= 1); > + exp->queues[0].fuse_fd = fuse_session_fd(exp->fuse_session); > + ret = fcntl(exp->queues[0].fuse_fd, F_SETFL, O_NONBLOCK); > if (ret < 0) { > ret = -errno; > error_setg_errno(errp, errno, "Failed to make FUSE FD non-blocking"); > goto fail; > } > > + for (int i = 1; i < exp->num_queues; i++) { > + int fd = clone_fuse_fd(exp->queues[0].fuse_fd, errp); > + if (fd < 0) { > + ret = fd; > + goto fail; > + } > + exp->queues[i].fuse_fd = fd; > + } > + > fuse_attach_handlers(exp); > return 0; > > @@ -355,6 +441,39 @@ static int mount_fuse_export(FuseExport *exp, Error **errp) > return 0; > } > > +/** > + * Clone the given /dev/fuse file descriptor, yielding a second FD from which > + * requests can be pulled for the associated filesystem. Returns an FD on > + * success, and -errno on error. > + */ > +static int clone_fuse_fd(int fd, Error **errp) > +{ > + uint32_t src_fd = fd; > + int new_fd; > + int ret; > + > + /* > + * The name "/dev/fuse" is fixed, see libfuse's lib/fuse_loop_mt.c > + * (fuse_clone_chan()). > + */ > + new_fd = open("/dev/fuse", O_RDWR | O_CLOEXEC | O_NONBLOCK); > + if (new_fd < 0) { > + ret = -errno; > + error_setg_errno(errp, errno, "Failed to open /dev/fuse"); > + return ret; > + } > + > + ret = ioctl(new_fd, FUSE_DEV_IOC_CLONE, &src_fd); > + if (ret < 0) { > + ret = -errno; > + error_setg_errno(errp, errno, "Failed to clone FUSE FD"); > + close(new_fd); > + return ret; > + } > + > + return new_fd; > +} > + > /** > * Try to read a single request from the FUSE FD. > * Takes a FuseRequestCoParam object pointer in `opaque`. > @@ -370,8 +489,9 @@ static int mount_fuse_export(FuseExport *exp, Error **errp) > static void coroutine_fn co_read_from_fuse_fd(void *opaque) > { > FuseRequestCoParam *co_param = opaque; > - FuseExport *exp = co_param->exp; > - int fuse_fd = exp->fuse_fd; > + FuseQueue *q = co_param->q; > + int fuse_fd = q->fuse_fd; > + FuseExport *exp = q->exp; > ssize_t ret; > const struct fuse_in_header *in_hdr; > > @@ -381,8 +501,7 @@ static void coroutine_fn co_read_from_fuse_fd(void *opaque) > goto no_request; > } > > - ret = RETRY_ON_EINTR(read(fuse_fd, exp->request_buf, > - sizeof(exp->request_buf))); > + ret = RETRY_ON_EINTR(read(fuse_fd, q->request_buf, sizeof(q->request_buf))); > if (ret < 0 && errno == EAGAIN) { > /* No request available */ > goto no_request; > @@ -400,7 +519,7 @@ static void coroutine_fn co_read_from_fuse_fd(void *opaque) > goto no_request; > } > > - in_hdr = (const struct fuse_in_header *)exp->request_buf; > + in_hdr = (const struct fuse_in_header *)q->request_buf; > if (unlikely(ret != in_hdr->len)) { > error_report("Number of bytes read from FUSE device does not match " > "request size, expected %" PRIu32 " bytes, read %zi " > @@ -413,7 +532,7 @@ static void coroutine_fn co_read_from_fuse_fd(void *opaque) > > /* Must set this before yielding */ > co_param->got_request = 1; > - fuse_co_process_request(exp); > + fuse_co_process_request(q); > fuse_dec_in_flight(exp); > return; > > @@ -432,7 +551,7 @@ static bool poll_fuse_fd(void *opaque) > { > Coroutine *co; > FuseRequestCoParam co_param = { > - .exp = opaque, > + .q = opaque, > .got_request = -EINPROGRESS, > }; > > @@ -451,7 +570,7 @@ static void read_fuse_fd(void *opaque) > { > Coroutine *co; > FuseRequestCoParam co_param = { > - .exp = opaque, > + .q = opaque, > .got_request = -EINPROGRESS, > }; > > @@ -481,6 +600,16 @@ static void fuse_export_delete(BlockExport *blk_exp) > { > FuseExport *exp = container_of(blk_exp, FuseExport, common); > > + for (int i = 0; i < exp->num_queues; i++) { > + FuseQueue *q = &exp->queues[i]; > + > + /* Queue 0's FD belongs to the FUSE session */ > + if (i > 0 && q->fuse_fd >= 0) { > + close(q->fuse_fd); > + } > + } > + g_free(exp->queues); > + > if (exp->fuse_session) { > if (exp->mounted) { > fuse_session_unmount(exp->fuse_session); > @@ -1137,23 +1266,23 @@ static int fuse_write_buf_response(int fd, uint32_t req_id, > /* > * For use in fuse_co_process_request(): > * Returns a pointer to the parameter object for the given operation (inside of > - * exp->request_buf, which is assumed to hold a fuse_in_header first). > - * Verifies that the object is complete (exp->request_buf is large enough to > + * q->request_buf, which is assumed to hold a fuse_in_header first). > + * Verifies that the object is complete (q->request_buf is large enough to > * hold it in one piece, and the request length includes the whole object). > * > - * Note that exp->request_buf may be overwritten after yielding, so the returned > + * Note that q->request_buf may be overwritten after yielding, so the returned > * pointer must not be used across a function that may yield! > */ > -#define FUSE_IN_OP_STRUCT(op_name, export) \ > +#define FUSE_IN_OP_STRUCT(op_name, queue) \ > ({ \ > const struct fuse_in_header *__in_hdr = \ > - (const struct fuse_in_header *)(export)->request_buf; \ > + (const struct fuse_in_header *)(q)->request_buf; \ > const struct fuse_##op_name##_in *__in = \ > (const struct fuse_##op_name##_in *)(__in_hdr + 1); \ > const size_t __param_len = sizeof(*__in_hdr) + sizeof(*__in); \ > uint32_t __req_len; \ > \ > - QEMU_BUILD_BUG_ON(sizeof((export)->request_buf) < __param_len); \ > + QEMU_BUILD_BUG_ON(sizeof((q)->request_buf) < __param_len); \ > \ > __req_len = __in_hdr->len; \ > if (__req_len < __param_len) { \ > @@ -1190,11 +1319,12 @@ static int fuse_write_buf_response(int fd, uint32_t req_id, > * Process a FUSE request, incl. writing the response. > * > * Note that yielding in any request-processing function can overwrite the > - * contents of exp->request_buf. Anything that takes a buffer needs to take > + * contents of q->request_buf. Anything that takes a buffer needs to take > * care that the content is copied before yielding. > */ > -static void coroutine_fn fuse_co_process_request(FuseExport *exp) > +static void coroutine_fn fuse_co_process_request(FuseQueue *q) > { > + FuseExport *exp = q->exp; > uint32_t opcode; > uint64_t req_id; > /* > @@ -1217,7 +1347,7 @@ static void coroutine_fn fuse_co_process_request(FuseExport *exp) > /* Limit scope to ensure pointer is no longer used after yielding */ > { > const struct fuse_in_header *in_hdr = > - (const struct fuse_in_header *)exp->request_buf; > + (const struct fuse_in_header *)q->request_buf; > > opcode = in_hdr->opcode; > req_id = in_hdr->unique; > @@ -1225,7 +1355,7 @@ static void coroutine_fn fuse_co_process_request(FuseExport *exp) > > switch (opcode) { > case FUSE_INIT: { > - const struct fuse_init_in *in = FUSE_IN_OP_STRUCT(init, exp); > + const struct fuse_init_in *in = FUSE_IN_OP_STRUCT(init, q); > ret = fuse_co_init(exp, FUSE_OUT_OP_STRUCT(init, out_buf), > in->max_readahead, in->flags); > break; > @@ -1248,23 +1378,23 @@ static void coroutine_fn fuse_co_process_request(FuseExport *exp) > break; > > case FUSE_SETATTR: { > - const struct fuse_setattr_in *in = FUSE_IN_OP_STRUCT(setattr, exp); > + const struct fuse_setattr_in *in = FUSE_IN_OP_STRUCT(setattr, q); > ret = fuse_co_setattr(exp, FUSE_OUT_OP_STRUCT(attr, out_buf), > in->valid, in->size, in->mode, in->uid, in->gid); > break; > } > > case FUSE_READ: { > - const struct fuse_read_in *in = FUSE_IN_OP_STRUCT(read, exp); > + const struct fuse_read_in *in = FUSE_IN_OP_STRUCT(read, q); > ret = fuse_co_read(exp, &out_data_buffer, in->offset, in->size); > break; > } > > case FUSE_WRITE: { > - const struct fuse_write_in *in = FUSE_IN_OP_STRUCT(write, exp); > + const struct fuse_write_in *in = FUSE_IN_OP_STRUCT(write, q); > uint32_t req_len; > > - req_len = ((const struct fuse_in_header *)exp->request_buf)->len; > + req_len = ((const struct fuse_in_header *)q->request_buf)->len; > if (unlikely(req_len < sizeof(struct fuse_in_header) + sizeof(*in) + > in->size)) { > warn_report("FUSE WRITE truncated; received %zu bytes of %" PRIu32, > @@ -1293,7 +1423,7 @@ static void coroutine_fn fuse_co_process_request(FuseExport *exp) > } > > case FUSE_FALLOCATE: { > - const struct fuse_fallocate_in *in = FUSE_IN_OP_STRUCT(fallocate, exp); > + const struct fuse_fallocate_in *in = FUSE_IN_OP_STRUCT(fallocate, q); > ret = fuse_co_fallocate(exp, in->offset, in->length, in->mode); > break; > } > @@ -1308,7 +1438,7 @@ static void coroutine_fn fuse_co_process_request(FuseExport *exp) > > #ifdef CONFIG_FUSE_LSEEK > case FUSE_LSEEK: { > - const struct fuse_lseek_in *in = FUSE_IN_OP_STRUCT(lseek, exp); > + const struct fuse_lseek_in *in = FUSE_IN_OP_STRUCT(lseek, q); > ret = fuse_co_lseek(exp, FUSE_OUT_OP_STRUCT(lseek, out_buf), > in->offset, in->whence); > break; > @@ -1322,11 +1452,11 @@ static void coroutine_fn fuse_co_process_request(FuseExport *exp) > /* Ignore errors from fuse_write*(), nothing we can do anyway */ > if (out_data_buffer) { > assert(ret >= 0); > - fuse_write_buf_response(exp->fuse_fd, req_id, out_hdr, > + fuse_write_buf_response(q->fuse_fd, req_id, out_hdr, > out_data_buffer, ret); > qemu_vfree(out_data_buffer); > } else { > - fuse_write_response(exp->fuse_fd, req_id, out_hdr, > + fuse_write_response(q->fuse_fd, req_id, out_hdr, > ret < 0 ? ret : 0, > ret < 0 ? 0 : ret); > } > -- > 2.48.1 > >
On Thu, Mar 27, 2025 at 11:55:57AM -0400, Stefan Hajnoczi wrote: > On Tue, Mar 25, 2025 at 05:06:54PM +0100, Hanna Czenczek wrote: > > FUSE allows creating multiple request queues by "cloning" /dev/fuse FDs > > (via open("/dev/fuse") + ioctl(FUSE_DEV_IOC_CLONE)). > > > > We can use this to implement multi-threading. > > > > Note that the interface presented here differs from the multi-queue > > interface of virtio-blk: The latter maps virtqueues to iothreads, which > > allows processing multiple virtqueues in a single iothread. The > > equivalent (processing multiple FDs in a single iothread) would not make > > sense for FUSE because those FDs are used in a round-robin fashion by > > the FUSE kernel driver. Putting two of them into a single iothread will > > just create a bottleneck. > > This text might be outdated. virtio-blk's new iothread-vq-mapping > parameter provides the "array of iothreads" mentioned below and a way to > assign virtqueues to those IOThreads. > > > +++ b/qapi/block-export.json > > @@ -179,12 +179,18 @@ > > # mount the export with allow_other, and if that fails, try again > > # without. (since 6.1; default: auto) > > # > > +# @iothreads: Enables multi-threading: Handle requests in each of the > > +# given iothreads (instead of the block device's iothread, or the > > +# export's "main" iothread). For this, the FUSE FD is duplicated so > > +# there is one FD per iothread. (since 10.1) > > This option isn't FUSE-specific but FUSE is the first export type to > support it. Please add it to BlockExportOptions instead and refuse > export creation when the export type only supports 1 IOThread. > > Eric: Are you interested in implementing support for multiple IOThreads > in the NBD export? I remember some time ago we talked about NBD > multi-conn support, although maybe that was for the client rather than > the server. The NBD server already supports clients that make requests through multiple TCP sockets, but right now, that server is not taking advantage of iothreads to spread the TCP load. And yes, I am in the middle of working on adding client NBD multi-conn support (reviving Rich Jones' preliminary patches on what it would take to have qemu open parallel TCP sockets to a supporting NBD server), which also will use a round-robin approach (but here, the round-robin is something we would code up in qemu, rather than the behavior handed to us by the FUSE kernel layer). Pinning specific iothreads to a specific TCP socket may or may not make sense, but I definitely want to have support for handing a pool of iothreads to an NBD client that will be using multi-conn. -- Eric Blake, Principal Software Engineer Red Hat, Inc. Virtualization: qemu.org | libguestfs.org
On Tue, Apr 01, 2025 at 03:36:40PM -0500, Eric Blake wrote: > On Thu, Mar 27, 2025 at 11:55:57AM -0400, Stefan Hajnoczi wrote: > > On Tue, Mar 25, 2025 at 05:06:54PM +0100, Hanna Czenczek wrote: > > > FUSE allows creating multiple request queues by "cloning" /dev/fuse FDs > > > (via open("/dev/fuse") + ioctl(FUSE_DEV_IOC_CLONE)). > > > > > > We can use this to implement multi-threading. > > > > > > Note that the interface presented here differs from the multi-queue > > > interface of virtio-blk: The latter maps virtqueues to iothreads, which > > > allows processing multiple virtqueues in a single iothread. The > > > equivalent (processing multiple FDs in a single iothread) would not make > > > sense for FUSE because those FDs are used in a round-robin fashion by > > > the FUSE kernel driver. Putting two of them into a single iothread will > > > just create a bottleneck. > > > > This text might be outdated. virtio-blk's new iothread-vq-mapping > > parameter provides the "array of iothreads" mentioned below and a way to > > assign virtqueues to those IOThreads. > > > > > > +++ b/qapi/block-export.json > > > @@ -179,12 +179,18 @@ > > > # mount the export with allow_other, and if that fails, try again > > > # without. (since 6.1; default: auto) > > > # > > > +# @iothreads: Enables multi-threading: Handle requests in each of the > > > +# given iothreads (instead of the block device's iothread, or the > > > +# export's "main" iothread). For this, the FUSE FD is duplicated so > > > +# there is one FD per iothread. (since 10.1) > > > > This option isn't FUSE-specific but FUSE is the first export type to > > support it. Please add it to BlockExportOptions instead and refuse > > export creation when the export type only supports 1 IOThread. > > > > Eric: Are you interested in implementing support for multiple IOThreads > > in the NBD export? I remember some time ago we talked about NBD > > multi-conn support, although maybe that was for the client rather than > > the server. > > The NBD server already supports clients that make requests through > multiple TCP sockets, but right now, that server is not taking > advantage of iothreads to spread the TCP load. > > And yes, I am in the middle of working on adding client NBD multi-conn > support (reviving Rich Jones' preliminary patches on what it would > take to have qemu open parallel TCP sockets to a supporting NBD > server), which also will use a round-robin approach (but here, the > round-robin is something we would code up in qemu, rather than the > behavior handed to us by the FUSE kernel layer). Pinning specific > iothreads to a specific TCP socket may or may not make sense, but I > definitely want to have support for handing a pool of iothreads to an > NBD client that will be using multi-conn. Here I'm asking Hanna to make the "iothreads" export parameter generic for all export types (not just for FUSE). Do you think that the NBD export will be able to use the generic parameter or would you prefer an NBD-specific export parameter? Stefan
Hanna Czenczek <hreitz@redhat.com> writes: > FUSE allows creating multiple request queues by "cloning" /dev/fuse FDs > (via open("/dev/fuse") + ioctl(FUSE_DEV_IOC_CLONE)). > > We can use this to implement multi-threading. > > Note that the interface presented here differs from the multi-queue > interface of virtio-blk: The latter maps virtqueues to iothreads, which > allows processing multiple virtqueues in a single iothread. The > equivalent (processing multiple FDs in a single iothread) would not make > sense for FUSE because those FDs are used in a round-robin fashion by > the FUSE kernel driver. Putting two of them into a single iothread will > just create a bottleneck. > > Therefore, all we need is an array of iothreads, and we will create one > "queue" (FD) per thread. [...] > Signed-off-by: Hanna Czenczek <hreitz@redhat.com> > --- > qapi/block-export.json | 8 +- > block/export/fuse.c | 214 +++++++++++++++++++++++++++++++++-------- > 2 files changed, 179 insertions(+), 43 deletions(-) > > diff --git a/qapi/block-export.json b/qapi/block-export.json > index c783e01a53..0bdd5992eb 100644 > --- a/qapi/block-export.json > +++ b/qapi/block-export.json > @@ -179,12 +179,18 @@ > # mount the export with allow_other, and if that fails, try again > # without. (since 6.1; default: auto) > # > +# @iothreads: Enables multi-threading: Handle requests in each of the > +# given iothreads (instead of the block device's iothread, or the > +# export's "main" iothread). When does "the block device's iothread" apply, and when "the export's main iothread"? Is this something the QMP user needs to know? > +# For this, the FUSE FD is duplicated so > +# there is one FD per iothread. (since 10.1) Is the file descriptor duplication something the QMP user needs to know? > +# > # Since: 6.0 > ## > { 'struct': 'BlockExportOptionsFuse', > 'data': { 'mountpoint': 'str', > '*growable': 'bool', > - '*allow-other': 'FuseExportAllowOther' }, > + '*allow-other': 'FuseExportAllowOther', > + '*iothreads': ['str'] }, > 'if': 'CONFIG_FUSE' } > > ## [...]
On 26.03.25 06:38, Markus Armbruster wrote: > Hanna Czenczek <hreitz@redhat.com> writes: > >> FUSE allows creating multiple request queues by "cloning" /dev/fuse FDs >> (via open("/dev/fuse") + ioctl(FUSE_DEV_IOC_CLONE)). >> >> We can use this to implement multi-threading. >> >> Note that the interface presented here differs from the multi-queue >> interface of virtio-blk: The latter maps virtqueues to iothreads, which >> allows processing multiple virtqueues in a single iothread. The >> equivalent (processing multiple FDs in a single iothread) would not make >> sense for FUSE because those FDs are used in a round-robin fashion by >> the FUSE kernel driver. Putting two of them into a single iothread will >> just create a bottleneck. >> >> Therefore, all we need is an array of iothreads, and we will create one >> "queue" (FD) per thread. > [...] > >> Signed-off-by: Hanna Czenczek <hreitz@redhat.com> >> --- >> qapi/block-export.json | 8 +- >> block/export/fuse.c | 214 +++++++++++++++++++++++++++++++++-------- >> 2 files changed, 179 insertions(+), 43 deletions(-) >> >> diff --git a/qapi/block-export.json b/qapi/block-export.json >> index c783e01a53..0bdd5992eb 100644 >> --- a/qapi/block-export.json >> +++ b/qapi/block-export.json >> @@ -179,12 +179,18 @@ >> # mount the export with allow_other, and if that fails, try again >> # without. (since 6.1; default: auto) >> # >> +# @iothreads: Enables multi-threading: Handle requests in each of the >> +# given iothreads (instead of the block device's iothread, or the >> +# export's "main" iothread). > When does "the block device's iothread" apply, and when "the export's > main iothread"? Depends on where you set the iothread option. > Is this something the QMP user needs to know? I think so, because e.g. if you set iothread on the device and the export, you’ll get a conflict. But if you set it there and set this option, you won’t. This option will just override the device/export option. > > >> +# For this, the FUSE FD is duplicated so >> +# there is one FD per iothread. (since 10.1) > Is the file descriptor duplication something the QMP user needs to know? I found this technical detail interesting, i.e. how multiqueue is implemented for FUSE. Compare virtio devices, for which we make it clear that virtqueues are mapped to I/O threads (not just in documentation, but actually in option naming). Is it something they must not know? Hanna > >> +# >> # Since: 6.0 >> ## >> { 'struct': 'BlockExportOptionsFuse', >> 'data': { 'mountpoint': 'str', >> '*growable': 'bool', >> - '*allow-other': 'FuseExportAllowOther' }, >> + '*allow-other': 'FuseExportAllowOther', >> + '*iothreads': ['str'] }, >> 'if': 'CONFIG_FUSE' } >> >> ## > [...] >
Hanna Czenczek <hreitz@redhat.com> writes: > On 26.03.25 06:38, Markus Armbruster wrote: >> Hanna Czenczek <hreitz@redhat.com> writes: >> >>> FUSE allows creating multiple request queues by "cloning" /dev/fuse FDs >>> (via open("/dev/fuse") + ioctl(FUSE_DEV_IOC_CLONE)). >>> >>> We can use this to implement multi-threading. >>> >>> Note that the interface presented here differs from the multi-queue >>> interface of virtio-blk: The latter maps virtqueues to iothreads, which >>> allows processing multiple virtqueues in a single iothread. The >>> equivalent (processing multiple FDs in a single iothread) would not make >>> sense for FUSE because those FDs are used in a round-robin fashion by >>> the FUSE kernel driver. Putting two of them into a single iothread will >>> just create a bottleneck. >>> >>> Therefore, all we need is an array of iothreads, and we will create one >>> "queue" (FD) per thread. >> >> [...] >> >>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com> >>> --- >>> qapi/block-export.json | 8 +- >>> block/export/fuse.c | 214 +++++++++++++++++++++++++++++++++-------- >>> 2 files changed, 179 insertions(+), 43 deletions(-) >>> >>> diff --git a/qapi/block-export.json b/qapi/block-export.json >>> index c783e01a53..0bdd5992eb 100644 >>> --- a/qapi/block-export.json >>> +++ b/qapi/block-export.json >>> @@ -179,12 +179,18 @@ >>> # mount the export with allow_other, and if that fails, try again >>> # without. (since 6.1; default: auto) >>> # >>> +# @iothreads: Enables multi-threading: Handle requests in each of the >>> +# given iothreads (instead of the block device's iothread, or the >>> +# export's "main" iothread). >> >> When does "the block device's iothread" apply, and when "the export's >> main iothread"? > > Depends on where you set the iothread option. Assuming QMP users need to know (see right below), can we trust they understand which one applies when? If not, can we provide clues? >> Is this something the QMP user needs to know? > > I think so, because e.g. if you set iothread on the device and the export, you’ll get a conflict. But if you set it there and set this option, you won’t. This option will just override the device/export option. Do we think the doc comment sufficient for QMP users to figure this out? If not, can we provide clues? In particular, do we think they can go from an export failure to the setting @iothreads here? Perhaps the error message will guide them. What is the message? >>> +# For this, the FUSE FD is duplicated so >>> +# there is one FD per iothread. (since 10.1) >> >> Is the file descriptor duplication something the QMP user needs to know? > > I found this technical detail interesting, i.e. how multiqueue is implemented for FUSE. Compare virtio devices, for which we make it clear that virtqueues are mapped to I/O threads (not just in documentation, but actually in option naming). Is it something they must not know? Interesting to whom? Users of QMP? Then explaining it in the doc comment (and thus the QEMU QMP Reference Manual) is proper. Just developers? Then the doc comment is the wrong spot. The QEMU QMP Reference Manual is for users of QMP. It's dense reading. Information the users are not expected to need / understand makes that worse. > Hanna > >> >>> +# >>> # Since: 6.0 >>> ## >>> { 'struct': 'BlockExportOptionsFuse', >>> 'data': { 'mountpoint': 'str', >>> '*growable': 'bool', >>> - '*allow-other': 'FuseExportAllowOther' }, >>> + '*allow-other': 'FuseExportAllowOther', >>> + '*iothreads': ['str'] }, >>> 'if': 'CONFIG_FUSE' } >>> ## >> [...] >>
On 26.03.25 12:41, Markus Armbruster wrote: > Hanna Czenczek <hreitz@redhat.com> writes: > >> On 26.03.25 06:38, Markus Armbruster wrote: >>> Hanna Czenczek <hreitz@redhat.com> writes: >>> >>>> FUSE allows creating multiple request queues by "cloning" /dev/fuse FDs >>>> (via open("/dev/fuse") + ioctl(FUSE_DEV_IOC_CLONE)). >>>> >>>> We can use this to implement multi-threading. >>>> >>>> Note that the interface presented here differs from the multi-queue >>>> interface of virtio-blk: The latter maps virtqueues to iothreads, which >>>> allows processing multiple virtqueues in a single iothread. The >>>> equivalent (processing multiple FDs in a single iothread) would not make >>>> sense for FUSE because those FDs are used in a round-robin fashion by >>>> the FUSE kernel driver. Putting two of them into a single iothread will >>>> just create a bottleneck. >>>> >>>> Therefore, all we need is an array of iothreads, and we will create one >>>> "queue" (FD) per thread. >>> [...] >>> >>>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com> >>>> --- >>>> qapi/block-export.json | 8 +- >>>> block/export/fuse.c | 214 +++++++++++++++++++++++++++++++++-------- >>>> 2 files changed, 179 insertions(+), 43 deletions(-) >>>> >>>> diff --git a/qapi/block-export.json b/qapi/block-export.json >>>> index c783e01a53..0bdd5992eb 100644 >>>> --- a/qapi/block-export.json >>>> +++ b/qapi/block-export.json >>>> @@ -179,12 +179,18 @@ >>>> # mount the export with allow_other, and if that fails, try again >>>> # without. (since 6.1; default: auto) >>>> # >>>> +# @iothreads: Enables multi-threading: Handle requests in each of the >>>> +# given iothreads (instead of the block device's iothread, or the >>>> +# export's "main" iothread). >>> When does "the block device's iothread" apply, and when "the export's >>> main iothread"? >> Depends on where you set the iothread option. > Assuming QMP users need to know (see right below), can we trust they > understand which one applies when? If not, can we provide clues? I don’t understand what exactly you mean, but which one applies when has nothing to do with this option, but with the @iothread (and @fixed-iothread) option(s) on BlockExportOptions, which do document this. > >>> Is this something the QMP user needs to know? >> I think so, because e.g. if you set iothread on the device and the export, you’ll get a conflict. But if you set it there and set this option, you won’t. This option will just override the device/export option. > Do we think the doc comment sufficient for QMP users to figure this out? As for conflict, BlockExportOptions.iothread and BlockExportOptions.fixed-iothread do. As for overriding, I do think so. Do you not? I’m always open to suggestions. > If not, can we provide clues? > > In particular, do we think they can go from an export failure to the > setting @iothreads here? Perhaps the error message will guide them. > What is the message? I don’t understand what failure you mean. > >>>> +# For this, the FUSE FD is duplicated so >>>> +# there is one FD per iothread. (since 10.1) >>> Is the file descriptor duplication something the QMP user needs to know? >> I found this technical detail interesting, i.e. how multiqueue is implemented for FUSE. Compare virtio devices, for which we make it clear that virtqueues are mapped to I/O threads (not just in documentation, but actually in option naming). Is it something they must not know? > Interesting to whom? > > Users of QMP? Then explaining it in the doc comment (and thus the QEMU > QMP Reference Manual) is proper. Yes, QEMU users. I find this information interesting to users because virtio explains how multiqueue works there (see IOThreadVirtQueueMapping in virtio.json), and this explains that for FUSE exports, there are no virt queues, but requests come from that FD, which explains implicitly why this doesn’t use the IOThreadVirtQueueMapping type. In fact, if anything, I would even expand on the explanation to say that requests are generally distributed in a round-robin fashion across FUSE FDs regardless of where they originate from, contrasting with virtqueues, which are generally tied to vCPUs. Hanna > > Just developers? Then the doc comment is the wrong spot. > > The QEMU QMP Reference Manual is for users of QMP. It's dense reading. > Information the users are not expected to need / understand makes that > worse. > >> Hanna >> >>>> +# >>>> # Since: 6.0 >>>> ## >>>> { 'struct': 'BlockExportOptionsFuse', >>>> 'data': { 'mountpoint': 'str', >>>> '*growable': 'bool', >>>> - '*allow-other': 'FuseExportAllowOther' }, >>>> + '*allow-other': 'FuseExportAllowOther', >>>> + '*iothreads': ['str'] }, >>>> 'if': 'CONFIG_FUSE' } >>>> ## >>> [...] >>>
Hanna Czenczek <hreitz@redhat.com> writes: > On 26.03.25 12:41, Markus Armbruster wrote: >> Hanna Czenczek <hreitz@redhat.com> writes: >> >>> On 26.03.25 06:38, Markus Armbruster wrote: >>>> Hanna Czenczek <hreitz@redhat.com> writes: >>>> >>>>> FUSE allows creating multiple request queues by "cloning" /dev/fuse FDs >>>>> (via open("/dev/fuse") + ioctl(FUSE_DEV_IOC_CLONE)). >>>>> >>>>> We can use this to implement multi-threading. >>>>> >>>>> Note that the interface presented here differs from the multi-queue >>>>> interface of virtio-blk: The latter maps virtqueues to iothreads, which >>>>> allows processing multiple virtqueues in a single iothread. The >>>>> equivalent (processing multiple FDs in a single iothread) would not make >>>>> sense for FUSE because those FDs are used in a round-robin fashion by >>>>> the FUSE kernel driver. Putting two of them into a single iothread will >>>>> just create a bottleneck. >>>>> >>>>> Therefore, all we need is an array of iothreads, and we will create one >>>>> "queue" (FD) per thread. >>>> >>>> [...] >>>> >>>>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com> >>>>> --- >>>>> qapi/block-export.json | 8 +- >>>>> block/export/fuse.c | 214 +++++++++++++++++++++++++++++++++-------- >>>>> 2 files changed, 179 insertions(+), 43 deletions(-) >>>>> >>>>> diff --git a/qapi/block-export.json b/qapi/block-export.json >>>>> index c783e01a53..0bdd5992eb 100644 >>>>> --- a/qapi/block-export.json >>>>> +++ b/qapi/block-export.json >>>>> @@ -179,12 +179,18 @@ >>>>> # mount the export with allow_other, and if that fails, try again >>>>> # without. (since 6.1; default: auto) >>>>> # >>>>> +# @iothreads: Enables multi-threading: Handle requests in each of the >>>>> +# given iothreads (instead of the block device's iothread, or the >>>>> +# export's "main" iothread). >>>> >>>> When does "the block device's iothread" apply, and when "the export's >>>> main iothread"? >>> >>> Depends on where you set the iothread option. >> >> Assuming QMP users need to know (see right below), can we trust they >> understand which one applies when? If not, can we provide clues? > > I don’t understand what exactly you mean, but which one applies when has nothing to do with this option, but with the @iothread (and @fixed-iothread) option(s) on BlockExportOptions, which do document this. Can you point me to the spot? >>>> Is this something the QMP user needs to know? >>> >>> I think so, because e.g. if you set iothread on the device and the export, you’ll get a conflict. But if you set it there and set this option, you won’t. This option will just override the device/export option. >> >> Do we think the doc comment sufficient for QMP users to figure this out? > > As for conflict, BlockExportOptions.iothread and BlockExportOptions.fixed-iothread do. > > As for overriding, I do think so. Do you not? I’m always open to suggestions. > >> If not, can we provide clues? >> >> In particular, do we think they can go from an export failure to the >> setting @iothreads here? Perhaps the error message will guide them. >> What is the message? > > I don’t understand what failure you mean. You wrote "you'll get a conflict". I assume this manifests as failure of a QMP command (let's ignore CLI to keep things simple here). Do we think ordinary users running into that failure can figure out they can avoid it by setting @iothreads? What's that failure's error message? >>>>> +# For this, the FUSE FD is duplicated so >>>>> +# there is one FD per iothread. (since 10.1) >>>> >>>> Is the file descriptor duplication something the QMP user needs to know? >>> >>> I found this technical detail interesting, i.e. how multiqueue is implemented for FUSE. Compare virtio devices, for which we make it clear that virtqueues are mapped to I/O threads (not just in documentation, but actually in option naming). Is it something they must not know? >> >> Interesting to whom? >> >> Users of QMP? Then explaining it in the doc comment (and thus the QEMU >> QMP Reference Manual) is proper. > > Yes, QEMU users. I find this information interesting to users because virtio explains how multiqueue works there (see IOThreadVirtQueueMapping in virtio.json), and this explains that for FUSE exports, there are no virt queues, but requests come from that FD, which explains implicitly why this doesn’t use the IOThreadVirtQueueMapping type. > > In fact, if anything, I would even expand on the explanation to say that requests are generally distributed in a round-robin fashion across FUSE FDs regardless of where they originate from, contrasting with virtqueues, which are generally tied to vCPUs. Up to you. I lack context to judge. Making yourself or your fellow experts understand how this works is not the same as making users understand how to use it. Making me understand is not the same either, but it might be closer. If this isn't useful to you, let me know, and I'll shut up :) > Hanna > >> >> Just developers? Then the doc comment is the wrong spot. >> >> The QEMU QMP Reference Manual is for users of QMP. It's dense reading. >> Information the users are not expected to need / understand makes that >> worse. >> >>> Hanna >>> >>>>> +# >>>>> # Since: 6.0 >>>>> ## >>>>> { 'struct': 'BlockExportOptionsFuse', >>>>> 'data': { 'mountpoint': 'str', >>>>> '*growable': 'bool', >>>>> - '*allow-other': 'FuseExportAllowOther' }, >>>>> + '*allow-other': 'FuseExportAllowOther', >>>>> + '*iothreads': ['str'] }, >>>>> 'if': 'CONFIG_FUSE' } >>>>> ## >>>> [...] >>>>
On 27.03.25 13:18, Markus Armbruster wrote: > Hanna Czenczek <hreitz@redhat.com> writes: > >> On 26.03.25 12:41, Markus Armbruster wrote: >>> Hanna Czenczek <hreitz@redhat.com> writes: >>> >>>> On 26.03.25 06:38, Markus Armbruster wrote: >>>>> Hanna Czenczek <hreitz@redhat.com> writes: >>>>> >>>>>> FUSE allows creating multiple request queues by "cloning" /dev/fuse FDs >>>>>> (via open("/dev/fuse") + ioctl(FUSE_DEV_IOC_CLONE)). >>>>>> >>>>>> We can use this to implement multi-threading. >>>>>> >>>>>> Note that the interface presented here differs from the multi-queue >>>>>> interface of virtio-blk: The latter maps virtqueues to iothreads, which >>>>>> allows processing multiple virtqueues in a single iothread. The >>>>>> equivalent (processing multiple FDs in a single iothread) would not make >>>>>> sense for FUSE because those FDs are used in a round-robin fashion by >>>>>> the FUSE kernel driver. Putting two of them into a single iothread will >>>>>> just create a bottleneck. >>>>>> >>>>>> Therefore, all we need is an array of iothreads, and we will create one >>>>>> "queue" (FD) per thread. >>>>> [...] >>>>> >>>>>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com> >>>>>> --- >>>>>> qapi/block-export.json | 8 +- >>>>>> block/export/fuse.c | 214 +++++++++++++++++++++++++++++++++-------- >>>>>> 2 files changed, 179 insertions(+), 43 deletions(-) >>>>>> >>>>>> diff --git a/qapi/block-export.json b/qapi/block-export.json >>>>>> index c783e01a53..0bdd5992eb 100644 >>>>>> --- a/qapi/block-export.json >>>>>> +++ b/qapi/block-export.json >>>>>> @@ -179,12 +179,18 @@ >>>>>> # mount the export with allow_other, and if that fails, try again >>>>>> # without. (since 6.1; default: auto) >>>>>> # >>>>>> +# @iothreads: Enables multi-threading: Handle requests in each of the >>>>>> +# given iothreads (instead of the block device's iothread, or the >>>>>> +# export's "main" iothread). >>>>> When does "the block device's iothread" apply, and when "the export's >>>>> main iothread"? >>>> Depends on where you set the iothread option. >>> Assuming QMP users need to know (see right below), can we trust they >>> understand which one applies when? If not, can we provide clues? >> I don’t understand what exactly you mean, but which one applies when has nothing to do with this option, but with the @iothread (and @fixed-iothread) option(s) on BlockExportOptions, which do document this. > Can you point me to the spot? Sure: https://www.qemu.org/docs/master/interop/qemu-qmp-ref.html#object-QMP-block-export.BlockExportOptions (iothread and fixed-iothread) > >>>>> Is this something the QMP user needs to know? >>>> I think so, because e.g. if you set iothread on the device and the export, you’ll get a conflict. But if you set it there and set this option, you won’t. This option will just override the device/export option. >>> Do we think the doc comment sufficient for QMP users to figure this out? >> As for conflict, BlockExportOptions.iothread and BlockExportOptions.fixed-iothread do. >> >> As for overriding, I do think so. Do you not? I’m always open to suggestions. >> >>> If not, can we provide clues? >>> >>> In particular, do we think they can go from an export failure to the >>> setting @iothreads here? Perhaps the error message will guide them. >>> What is the message? >> I don’t understand what failure you mean. > You wrote "you'll get a conflict". I assume this manifests as failure > of a QMP command (let's ignore CLI to keep things simple here). If you set the @iothread option on both the (guest) device and the export (and also @fixed-iothread on the export), then you’ll get an error. Nothing to do with this new @iothreads option here. > Do we think ordinary users running into that failure can figure out they > can avoid it by setting @iothreads? It shouldn’t affect the failure. Setting @iothread on both device and export (with @fixed-iothread) will always cause an error, and should. Setting this option is not supposed to “fix” that configuration error. Theoretically, setting @iothreads here could make it so that BlockExportOptions.iothread (and/or fixed-iothread) is ignored, because that thread will no longer be used for export-issued I/O; but in practice, setting that option (BlockExportOptions.iothread) moves that export and the whole BDS tree behind it to that I/O thread, so if you haven’t specified an I/O thread on the guest device, the guest device will then use that thread. So making @iothreads silently completely ignore BlockExportOptions.iothread may cause surprising behavior. Maybe we could make setting @iothreads here and the generic BlockExportOptions.iothread at the same time an error. That would save us the explanation here. > What's that failure's error message? $ echo '{"execute":"qmp_capabilities"} {"execute":"block-export-add", "arguments":{"type":"fuse", "id":"exp", "node-name":"null", "mountpoint":"/tmp/fuse-export", "iothread":"iothr1", "fixed-iothread":true}}' | build/qemu-system-x86_64 \ -object iothread,id=iothr0 \ -object iothread,id=iothr1 \ -blockdev null-co,node-name=null \ -device virtio-blk,drive=null,iothread=iothr0 \ -qmp stdio {"QMP": {"version": {"qemu": {"micro": 91, "minor": 2, "major": 9}, "package": "v10.0.0-rc1"}, "capabilities": ["oob"]}} {"return": {}} {"error": {"class": "GenericError", "desc": "Cannot change iothread of active block backend"}} > >>>>>> +# For this, the FUSE FD is duplicated so >>>>>> +# there is one FD per iothread. (since 10.1) >>>>> Is the file descriptor duplication something the QMP user needs to know? >>>> I found this technical detail interesting, i.e. how multiqueue is implemented for FUSE. Compare virtio devices, for which we make it clear that virtqueues are mapped to I/O threads (not just in documentation, but actually in option naming). Is it something they must not know? >>> Interesting to whom? >>> >>> Users of QMP? Then explaining it in the doc comment (and thus the QEMU >>> QMP Reference Manual) is proper. >> Yes, QEMU users. I find this information interesting to users because virtio explains how multiqueue works there (see IOThreadVirtQueueMapping in virtio.json), and this explains that for FUSE exports, there are no virt queues, but requests come from that FD, which explains implicitly why this doesn’t use the IOThreadVirtQueueMapping type. >> >> In fact, if anything, I would even expand on the explanation to say that requests are generally distributed in a round-robin fashion across FUSE FDs regardless of where they originate from, contrasting with virtqueues, which are generally tied to vCPUs. > Up to you. I lack context to judge. > > Making yourself or your fellow experts understand how this works is not > the same as making users understand how to use it. Making me understand > is not the same either, but it might be closer. This part of the documentation would concern itself less with “how to use it”, and more “when to use it”: This round-robin distribution of requests across FDs means that even if I/O is run in a single thread, using multiple threads for the export may improve performance (as shown in the commit message) – in contrast to virtqueue-based systems. So I think that’s important information to users. Hanna > > If this isn't useful to you, let me know, and I'll shut up :) > >> Hanna >> >>> Just developers? Then the doc comment is the wrong spot. >>> >>> The QEMU QMP Reference Manual is for users of QMP. It's dense reading. >>> Information the users are not expected to need / understand makes that >>> worse. >>> >>>> Hanna >>>> >>>>>> +# >>>>>> # Since: 6.0 >>>>>> ## >>>>>> { 'struct': 'BlockExportOptionsFuse', >>>>>> 'data': { 'mountpoint': 'str', >>>>>> '*growable': 'bool', >>>>>> - '*allow-other': 'FuseExportAllowOther' }, >>>>>> + '*allow-other': 'FuseExportAllowOther', >>>>>> + '*iothreads': ['str'] }, >>>>>> 'if': 'CONFIG_FUSE' } >>>>>> ## >>>>> [...] >>>>>
Am 27.03.2025 um 14:45 hat Hanna Czenczek geschrieben: > On 27.03.25 13:18, Markus Armbruster wrote: > > Hanna Czenczek <hreitz@redhat.com> writes: > > > > > On 26.03.25 12:41, Markus Armbruster wrote: > > > > Hanna Czenczek <hreitz@redhat.com> writes: > > > > > > > > > On 26.03.25 06:38, Markus Armbruster wrote: > > > > > > Hanna Czenczek <hreitz@redhat.com> writes: > > > > > > > > > > > > > FUSE allows creating multiple request queues by "cloning" /dev/fuse FDs > > > > > > > (via open("/dev/fuse") + ioctl(FUSE_DEV_IOC_CLONE)). > > > > > > > > > > > > > > We can use this to implement multi-threading. > > > > > > > > > > > > > > Note that the interface presented here differs from the multi-queue > > > > > > > interface of virtio-blk: The latter maps virtqueues to iothreads, which > > > > > > > allows processing multiple virtqueues in a single iothread. The > > > > > > > equivalent (processing multiple FDs in a single iothread) would not make > > > > > > > sense for FUSE because those FDs are used in a round-robin fashion by > > > > > > > the FUSE kernel driver. Putting two of them into a single iothread will > > > > > > > just create a bottleneck. > > > > > > > > > > > > > > Therefore, all we need is an array of iothreads, and we will create one > > > > > > > "queue" (FD) per thread. > > > > > > [...] > > > > > > > > > > > > > Signed-off-by: Hanna Czenczek <hreitz@redhat.com> > > > > > > > --- > > > > > > > qapi/block-export.json | 8 +- > > > > > > > block/export/fuse.c | 214 +++++++++++++++++++++++++++++++++-------- > > > > > > > 2 files changed, 179 insertions(+), 43 deletions(-) > > > > > > > > > > > > > > diff --git a/qapi/block-export.json b/qapi/block-export.json > > > > > > > index c783e01a53..0bdd5992eb 100644 > > > > > > > --- a/qapi/block-export.json > > > > > > > +++ b/qapi/block-export.json > > > > > > > @@ -179,12 +179,18 @@ > > > > > > > # mount the export with allow_other, and if that fails, try again > > > > > > > # without. (since 6.1; default: auto) > > > > > > > # > > > > > > > +# @iothreads: Enables multi-threading: Handle requests in each of the > > > > > > > +# given iothreads (instead of the block device's iothread, or the > > > > > > > +# export's "main" iothread). > > > > > > When does "the block device's iothread" apply, and when "the export's > > > > > > main iothread"? > > > > > Depends on where you set the iothread option. > > > > Assuming QMP users need to know (see right below), can we trust they > > > > understand which one applies when? If not, can we provide clues? > > > I don’t understand what exactly you mean, but which one applies when has nothing to do with this option, but with the @iothread (and @fixed-iothread) option(s) on BlockExportOptions, which do document this. > > Can you point me to the spot? > > Sure: https://www.qemu.org/docs/master/interop/qemu-qmp-ref.html#object-QMP-block-export.BlockExportOptions > (iothread and fixed-iothread) > > > > > > > > > Is this something the QMP user needs to know? > > > > > I think so, because e.g. if you set iothread on the device and the export, you’ll get a conflict. But if you set it there and set this option, you won’t. This option will just override the device/export option. > > > > Do we think the doc comment sufficient for QMP users to figure this out? > > > As for conflict, BlockExportOptions.iothread and BlockExportOptions.fixed-iothread do. > > > > > > As for overriding, I do think so. Do you not? I’m always open to suggestions. > > > > > > > If not, can we provide clues? > > > > > > > > In particular, do we think they can go from an export failure to the > > > > setting @iothreads here? Perhaps the error message will guide them. > > > > What is the message? > > > I don’t understand what failure you mean. > > You wrote "you'll get a conflict". I assume this manifests as failure > > of a QMP command (let's ignore CLI to keep things simple here). > > If you set the @iothread option on both the (guest) device and the export > (and also @fixed-iothread on the export), then you’ll get an error. Nothing > to do with this new @iothreads option here. > > > Do we think ordinary users running into that failure can figure out they > > can avoid it by setting @iothreads? > > It shouldn’t affect the failure. Setting @iothread on both device and > export (with @fixed-iothread) will always cause an error, and should. > Setting this option is not supposed to “fix” that configuration error. > > Theoretically, setting @iothreads here could make it so that > BlockExportOptions.iothread (and/or fixed-iothread) is ignored, because that > thread will no longer be used for export-issued I/O; but in practice, > setting that option (BlockExportOptions.iothread) moves that export and the > whole BDS tree behind it to that I/O thread, so if you haven’t specified an > I/O thread on the guest device, the guest device will then use that thread. > So making @iothreads silently completely ignore BlockExportOptions.iothread > may cause surprising behavior. > > Maybe we could make setting @iothreads here and the generic > BlockExportOptions.iothread at the same time an error. That would save us > the explanation here. This raises the question if the better interface wouldn't be to change the BlockExportOptions.iothread from 'str' to an alternate between 'str' and ['str'], allowing the user to specify multiple iothreads in the already existing related option without creating conflicting options. In the long run, I would be surprised if FUSE remained the only export supporting multiple iothreads. At least the virtio based ones (vhost-user-blk and VDUSE) even have precedence in the virtio-blk device itself, so while I don't know how much interest there is in actually implementing it, in theory we know it makes sense. Kevin
On Tue, Apr 01, 2025 at 02:05:40PM +0200, Kevin Wolf wrote: > > Maybe we could make setting @iothreads here and the generic > > BlockExportOptions.iothread at the same time an error. That would save us > > the explanation here. > > This raises the question if the better interface wouldn't be to change > the BlockExportOptions.iothread from 'str' to an alternate between 'str' > and ['str'], allowing the user to specify multiple iothreads in the > already existing related option without creating conflicting options. > > In the long run, I would be surprised if FUSE remained the only export > supporting multiple iothreads. At least the virtio based ones > (vhost-user-blk and VDUSE) even have precedence in the virtio-blk device > itself, so while I don't know how much interest there is in actually > implementing it, in theory we know it makes sense. And I really want my work on NBD Multi-conn support to merge well with multiple iothreads. That is yet another case where even if the I/O is single-threaded, having parallel connections to the NBD server via round-robin of requests can improve throughput. But if you can afford to assign four iothreads to manage four TCP connections to a single NBD server, you'll get even better response. -- Eric Blake, Principal Software Engineer Red Hat, Inc. Virtualization: qemu.org | libguestfs.org
© 2016 - 2025 Red Hat, Inc.