From nobody Sun Sep 28 15:30:17 2025
From: Brian Song
To: qemu-block@nongnu.org
Cc: qemu-devel@nongnu.org, armbru@redhat.com, bernd@bsbernd.com, fam@euphon.net, hibriansong@gmail.com, hreitz@redhat.com, kwolf@redhat.com, stefanha@redhat.com
Subject: [PATCH 1/4] export/fuse: add opt to enable FUSE-over-io_uring
Date: Fri, 29 Aug 2025 22:50:22 -0400
Message-ID: <20250830025025.3610-2-hibriansong@gmail.com>
X-Mailer: git-send-email 2.45.2
In-Reply-To: <20250830025025.3610-1-hibriansong@gmail.com>
References: <20250830025025.3610-1-hibriansong@gmail.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

This patch adds a new export option for qemu-storage-daemon to enable
FUSE-over-io_uring via the switch io-uring=3Don|off (disabled by default).
It also implements the protocol handshake with the Linux kernel
during the FUSE-over-io_uring initialization phase.

See: https://docs.kernel.org/filesystems/fuse-io-uring.html

The kernel documentation describes in detail how FUSE-over-io_uring
works. This patch implements the Initial SQE stage shown in the diagram:
it initializes one queue per IOThread, each currently supporting a
single submission queue entry (SQE). When the FUSE driver sends the
first FUSE request (FUSE_INIT), qemu-storage-daemon calls
fuse_uring_start() to complete initialization, ultimately submitting
the SQE with the FUSE_IO_URING_CMD_REGISTER command to confirm
successful initialization with the kernel.

We also added support for multiple IOThreads. The current Linux kernel
requires registering $(nproc) queues when setting up FUSE-over-io_uring.
To let users customize the number of FUSE Queues (i.e., IOThreads),
we first create nproc Ring Queues as required by the kernel, then
distribute them in a round-robin manner to the FUSE Queues for
registration. In addition, to support multiple in-flight requests,
we configure each Ring Queue with FUSE_DEFAULT_RING_QUEUE_DEPTH
entries/requests.
Suggested-by: Kevin Wolf Suggested-by: Stefan Hajnoczi Signed-off-by: Brian Song --- block/export/fuse.c | 310 +++++++++++++++++++++++++-- docs/tools/qemu-storage-daemon.rst | 11 +- qapi/block-export.json | 5 +- storage-daemon/qemu-storage-daemon.c | 1 + util/fdmon-io_uring.c | 5 +- 5 files changed, 309 insertions(+), 23 deletions(-) diff --git a/block/export/fuse.c b/block/export/fuse.c index c0ad4696ce..19bf9e5f74 100644 --- a/block/export/fuse.c +++ b/block/export/fuse.c @@ -48,6 +48,9 @@ #include #endif =20 +/* room needed in buffer to accommodate header */ +#define FUSE_BUFFER_HEADER_SIZE 0x1000 + /* Prevent overly long bounce buffer allocations */ #define FUSE_MAX_READ_BYTES (MIN(BDRV_REQUEST_MAX_BYTES, 1 * 1024 * 1024)) /* @@ -63,12 +66,59 @@ (FUSE_MAX_WRITE_BYTES - FUSE_IN_PLACE_WRITE_BYTES) =20 typedef struct FuseExport FuseExport; +typedef struct FuseQueue FuseQueue; + +#ifdef CONFIG_LINUX_IO_URING +#define FUSE_DEFAULT_RING_QUEUE_DEPTH 64 +#define FUSE_DEFAULT_MAX_PAGES_PER_REQ 32 + +typedef struct FuseRingQueue FuseRingQueue; +typedef struct FuseRingEnt { + /* back pointer */ + FuseRingQueue *rq; + + /* commit id of a fuse request */ + uint64_t req_commit_id; + + /* fuse request header and payload */ + struct fuse_uring_req_header req_header; + void *op_payload; + size_t req_payload_sz; + + /* The vector passed to the kernel */ + struct iovec iov[2]; + + CqeHandler fuse_cqe_handler; +} FuseRingEnt; + +struct FuseRingQueue { + int rqid; + + /* back pointer */ + FuseQueue *q; + FuseRingEnt *ent; + + /* List entry for ring_queues */ + QLIST_ENTRY(FuseRingQueue) next; +}; + +/* + * Round-robin distribution of ring queues across FUSE queues. + * This structure manages the mapping between kernel ring queues and user + * FUSE queues. 
+ */ +typedef struct FuseRingQueueManager { + FuseRingQueue *ring_queues; + int num_ring_queues; + int num_fuse_queues; +} FuseRingQueueManager; +#endif =20 /* * One FUSE "queue", representing one FUSE FD from which requests are fetc= hed * and processed. Each queue is tied to an AioContext. */ -typedef struct FuseQueue { +struct FuseQueue { FuseExport *exp; =20 AioContext *ctx; @@ -109,15 +159,11 @@ typedef struct FuseQueue { * Free this buffer with qemu_vfree(). */ void *spillover_buf; -} FuseQueue; =20 -/* - * Verify that FuseQueue.request_buf plus the spill-over buffer together - * are big enough to be accepted by the FUSE kernel driver. - */ -QEMU_BUILD_BUG_ON(sizeof(((FuseQueue *)0)->request_buf) + - FUSE_SPILLOVER_BUF_SIZE < - FUSE_MIN_READ_BUFFER); +#ifdef CONFIG_LINUX_IO_URING + QLIST_HEAD(, FuseRingQueue) ring_queue_list; +#endif +}; =20 struct FuseExport { BlockExport common; @@ -133,7 +179,7 @@ struct FuseExport { */ bool halted; =20 - int num_queues; + size_t num_queues; FuseQueue *queues; /* * True if this export should follow the generic export's AioContext. 
@@ -149,6 +195,12 @@ struct FuseExport { /* Whether allow_other was used as a mount option or not */ bool allow_other; =20 +#ifdef CONFIG_LINUX_IO_URING + bool is_uring; + size_t ring_queue_depth; + FuseRingQueueManager *ring_queue_manager; +#endif + mode_t st_mode; uid_t st_uid; gid_t st_gid; @@ -205,7 +257,7 @@ static void fuse_attach_handlers(FuseExport *exp) return; } =20 - for (int i =3D 0; i < exp->num_queues; i++) { + for (size_t i =3D 0; i < exp->num_queues; i++) { aio_set_fd_handler(exp->queues[i].ctx, exp->queues[i].fuse_fd, read_from_fuse_fd, NULL, NULL, NULL, &exp->queues[i]); @@ -257,6 +309,189 @@ static const BlockDevOps fuse_export_blk_dev_ops =3D { .drained_poll =3D fuse_export_drained_poll, }; =20 +#ifdef CONFIG_LINUX_IO_URING +static void fuse_uring_sqe_set_req_data(struct fuse_uring_cmd_req *req, + const unsigned int rqid, + const unsigned int commit_id) +{ + req->qid =3D rqid; + req->commit_id =3D commit_id; + req->flags =3D 0; +} + +static void fuse_uring_sqe_prepare(struct io_uring_sqe *sqe, FuseQueue *q, + __u32 cmd_op) +{ + sqe->opcode =3D IORING_OP_URING_CMD; + + sqe->fd =3D q->fuse_fd; + sqe->rw_flags =3D 0; + sqe->ioprio =3D 0; + sqe->off =3D 0; + + sqe->cmd_op =3D cmd_op; + sqe->__pad1 =3D 0; +} + +static void fuse_uring_prep_sqe_register(struct io_uring_sqe *sqe, void *o= paque) +{ + FuseRingEnt *ent =3D opaque; + struct fuse_uring_cmd_req *req =3D (void *)&sqe->cmd[0]; + + fuse_uring_sqe_prepare(sqe, ent->rq->q, FUSE_IO_URING_CMD_REGISTER); + + sqe->addr =3D (uint64_t)(ent->iov); + sqe->len =3D 2; + + fuse_uring_sqe_set_req_data(req, ent->rq->rqid, 0); +} + +static void fuse_uring_submit_register(void *opaque) +{ + FuseRingEnt *ent =3D opaque; + FuseExport *exp =3D ent->rq->q->exp; + + + aio_add_sqe(fuse_uring_prep_sqe_register, ent, &(ent->fuse_cqe_handler= )); +} + +/** + * Distribute ring queues across FUSE queues using round-robin algorithm. 
+ * This ensures even distribution of kernel ring queues across user-specif= ied + * FUSE queues. + */ +static +FuseRingQueueManager *fuse_ring_queue_manager_create(int num_fuse_queues, + size_t ring_queue_dept= h, + size_t bufsize) +{ + int num_ring_queues =3D get_nprocs(); + FuseRingQueueManager *manager =3D g_new(FuseRingQueueManager, 1); + + if (!manager) { + return NULL; + } + + manager->ring_queues =3D g_new(FuseRingQueue, num_ring_queues); + manager->num_ring_queues =3D num_ring_queues; + manager->num_fuse_queues =3D num_fuse_queues; + + if (!manager->ring_queues) { + g_free(manager); + return NULL; + } + + for (int i =3D 0; i < num_ring_queues; i++) { + FuseRingQueue *rq =3D &manager->ring_queues[i]; + rq->rqid =3D i; + rq->ent =3D g_new(FuseRingEnt, ring_queue_depth); + + if (!rq->ent) { + for (int j =3D 0; j < i; j++) { + g_free(manager->ring_queues[j].ent); + } + g_free(manager->ring_queues); + g_free(manager); + return NULL; + } + + for (size_t j =3D 0; j < ring_queue_depth; j++) { + FuseRingEnt *ent =3D &rq->ent[j]; + ent->rq =3D rq; + ent->req_payload_sz =3D bufsize - FUSE_BUFFER_HEADER_SIZE; + ent->op_payload =3D g_malloc0(ent->req_payload_sz); + + if (!ent->op_payload) { + for (size_t k =3D 0; k < j; k++) { + g_free(rq->ent[k].op_payload); + } + g_free(rq->ent); + for (int k =3D 0; k < i; k++) { + g_free(manager->ring_queues[k].ent); + } + g_free(manager->ring_queues); + g_free(manager); + return NULL; + } + + ent->iov[0] =3D (struct iovec) { + &(ent->req_header), + sizeof(struct fuse_uring_req_header) + }; + ent->iov[1] =3D (struct iovec) { + ent->op_payload, + ent->req_payload_sz + }; + + ent->fuse_cqe_handler.cb =3D fuse_uring_cqe_handler; + } + } + + return manager; +} + +static +void fuse_distribute_ring_queues(FuseExport *exp, FuseRingQueueManager *ma= nager) +{ + int queue_index =3D 0; + + for (int i =3D 0; i < manager->num_ring_queues; i++) { + FuseRingQueue *rq =3D &manager->ring_queues[i]; + + rq->q =3D &exp->queues[queue_index]; + 
QLIST_INSERT_HEAD(&(rq->q->ring_queue_list), rq, next); + + queue_index =3D (queue_index + 1) % manager->num_fuse_queues; + } +} + +static +void fuse_schedule_ring_queue_registrations(FuseExport *exp, + FuseRingQueueManager *manager) +{ + for (int i =3D 0; i < manager->num_fuse_queues; i++) { + FuseQueue *q =3D &exp->queues[i]; + FuseRingQueue *rq; + + QLIST_FOREACH(rq, &q->ring_queue_list, next) { + for (int j =3D 0; j < exp->ring_queue_depth; j++) { + aio_bh_schedule_oneshot(q->ctx, fuse_uring_submit_register, + &(rq->ent[j])); + } + } + } +} + +static void fuse_uring_start(FuseExport *exp, struct fuse_init_out *out) +{ + /* + * Since we didn't enable the FUSE_MAX_PAGES feature, the value of + * fc->max_pages should be FUSE_DEFAULT_MAX_PAGES_PER_REQ, which is se= t by + * the kernel by default. Also, max_write should not exceed + * FUSE_DEFAULT_MAX_PAGES_PER_REQ * PAGE_SIZE. + */ + size_t bufsize =3D out->max_write + FUSE_BUFFER_HEADER_SIZE; + + if (!(out->flags & FUSE_MAX_PAGES)) { + bufsize =3D FUSE_DEFAULT_MAX_PAGES_PER_REQ * qemu_real_host_page_s= ize() + + FUSE_BUFFER_HEADER_SIZE; + } + + exp->ring_queue_manager =3D fuse_ring_queue_manager_create( + exp->num_queues, exp->ring_queue_depth, bufsize); + + if (!exp->ring_queue_manager) { + error_report("Failed to create ring queue manager"); + return; + } + + /* Distribute ring queues across FUSE queues using round-robin */ + fuse_distribute_ring_queues(exp, exp->ring_queue_manager); + + fuse_schedule_ring_queue_registrations(exp, exp->ring_queue_manager); +} +#endif + static int fuse_export_create(BlockExport *blk_exp, BlockExportOptions *blk_exp_args, AioContext *const *multithread, @@ -270,6 +505,11 @@ static int fuse_export_create(BlockExport *blk_exp, =20 assert(blk_exp_args->type =3D=3D BLOCK_EXPORT_TYPE_FUSE); =20 +#ifdef CONFIG_LINUX_IO_URING + exp->is_uring =3D args->io_uring; + exp->ring_queue_depth =3D FUSE_DEFAULT_RING_QUEUE_DEPTH; +#endif + if (multithread) { /* Guaranteed by common export code */ 
assert(mt_count >=3D 1); @@ -283,6 +523,10 @@ static int fuse_export_create(BlockExport *blk_exp, .exp =3D exp, .ctx =3D multithread[i], .fuse_fd =3D -1, +#ifdef CONFIG_LINUX_IO_URING + .ring_queue_list =3D + QLIST_HEAD_INITIALIZER(exp->queues[i].ring_queue_list), +#endif }; } } else { @@ -296,6 +540,10 @@ static int fuse_export_create(BlockExport *blk_exp, .exp =3D exp, .ctx =3D exp->common.ctx, .fuse_fd =3D -1, +#ifdef CONFIG_LINUX_IO_URING + .ring_queue_list =3D + QLIST_HEAD_INITIALIZER(exp->queues[0].ring_queue_list), +#endif }; } =20 @@ -685,17 +933,39 @@ static bool is_regular_file(const char *path, Error *= *errp) */ static ssize_t coroutine_fn fuse_co_init(FuseExport *exp, struct fuse_init_out *out, - uint32_t max_readahead, uint32_t flags) + uint32_t max_readahead, const struct fuse_init_in *in) { - const uint32_t supported_flags =3D FUSE_ASYNC_READ | FUSE_ASYNC_DIO; + uint64_t supported_flags =3D FUSE_ASYNC_READ | FUSE_ASYNC_DIO + | FUSE_INIT_EXT; + uint64_t outargflags =3D 0; + uint64_t inargflags =3D in->flags; + + ssize_t ret =3D 0; + + if (inargflags & FUSE_INIT_EXT) { + inargflags =3D inargflags | (uint64_t) in->flags2 << 32; + } + +#ifdef CONFIG_LINUX_IO_URING + if (exp->is_uring) { + if (inargflags & FUSE_OVER_IO_URING) { + supported_flags |=3D FUSE_OVER_IO_URING; + } else { + exp->is_uring =3D false; + ret =3D -ENODEV; + } + } +#endif + + outargflags =3D inargflags & supported_flags; =20 *out =3D (struct fuse_init_out) { .major =3D FUSE_KERNEL_VERSION, .minor =3D FUSE_KERNEL_MINOR_VERSION, .max_readahead =3D max_readahead, .max_write =3D FUSE_MAX_WRITE_BYTES, - .flags =3D flags & supported_flags, - .flags2 =3D 0, + .flags =3D outargflags, + .flags2 =3D outargflags >> 32, =20 /* libfuse maximum: 2^16 - 1 */ .max_background =3D UINT16_MAX, @@ -717,7 +987,7 @@ fuse_co_init(FuseExport *exp, struct fuse_init_out *out, .map_alignment =3D 0, }; =20 - return sizeof(*out); + return ret < 0 ? 
ret : sizeof(*out); } =20 /** @@ -1506,6 +1776,14 @@ fuse_co_process_request(FuseQueue *q, void *spillove= r_buf) fuse_write_buf_response(q->fuse_fd, req_id, out_hdr, out_data_buffer, ret); qemu_vfree(out_data_buffer); +#ifdef CONFIG_LINUX_IO_URING + /* Handle FUSE-over-io_uring initialization */ + if (unlikely(opcode =3D=3D FUSE_INIT && exp->is_uring)) { + struct fuse_init_out *out =3D + (struct fuse_init_out *)FUSE_OUT_OP_STRUCT(out_buf); + fuse_uring_start(exp, out); + } +#endif } else { fuse_write_response(q->fuse_fd, req_id, out_hdr, ret < 0 ? ret : 0, diff --git a/docs/tools/qemu-storage-daemon.rst b/docs/tools/qemu-storage-d= aemon.rst index 35ab2d7807..c5076101e0 100644 --- a/docs/tools/qemu-storage-daemon.rst +++ b/docs/tools/qemu-storage-daemon.rst @@ -78,7 +78,7 @@ Standard options: .. option:: --export [type=3D]nbd,id=3D,node-name=3D[,name= =3D][,writable=3Don|off][,bitmap=3D] --export [type=3D]vhost-user-blk,id=3D,node-name=3D,addr.= type=3Dunix,addr.path=3D[,writable=3Don|off][,logical-block-si= ze=3D][,num-queues=3D] --export [type=3D]vhost-user-blk,id=3D,node-name=3D,addr.= type=3Dfd,addr.str=3D[,writable=3Don|off][,logical-block-size=3D][,num-queues=3D] - --export [type=3D]fuse,id=3D,node-name=3D,mountpoint=3D[,growable=3Don|off][,writable=3Don|off][,allow-other=3Don|off|auto] + --export [type=3D]fuse,id=3D,node-name=3D,mountpoint=3D[,growable=3Don|off][,writable=3Don|off][,allow-other=3Don|off|auto][,i= o-uring=3Don|off] --export [type=3D]vduse-blk,id=3D,node-name=3D,name=3D[,writable=3Don|off][,num-queues=3D][,queue-size=3D][,logical-block-size=3D][,serial=3D] =20 is a block export definition. ``node-name`` is the block node that shoul= d be @@ -111,10 +111,11 @@ Standard options: that enabling this option as a non-root user requires enabling the user_allow_other option in the global fuse.conf configuration file. Set= ting ``allow-other`` to auto (the default) will try enabling this option, and= on - error fall back to disabling it. 
- - The ``vduse-blk`` export type takes a ``name`` (must be unique across th= e host) - to create the VDUSE device. + error fall back to disabling it. Once ``io-uring`` is enabled (off by de= fault), + the FUSE-over-io_uring-related settings will be initialized to bypass the + traditional /dev/fuse communication mechanism and instead use io_uring to + handle FUSE operations. The ``vduse-blk`` export type takes a ``name`` + (must be unique across the host) to create the VDUSE device. ``num-queues`` sets the number of virtqueues (the default is 1). ``queue-size`` sets the virtqueue descriptor table size (the default is = 256). =20 diff --git a/qapi/block-export.json b/qapi/block-export.json index 9ae703ad01..37f2fc47e2 100644 --- a/qapi/block-export.json +++ b/qapi/block-export.json @@ -184,12 +184,15 @@ # mount the export with allow_other, and if that fails, try again # without. (since 6.1; default: auto) # +# @io-uring: Use FUSE-over-io-uring. (since 10.2; default: false) +# # Since: 6.0 ## { 'struct': 'BlockExportOptionsFuse', 'data': { 'mountpoint': 'str', '*growable': 'bool', - '*allow-other': 'FuseExportAllowOther' }, + '*allow-other': 'FuseExportAllowOther', + '*io-uring': 'bool' }, 'if': 'CONFIG_FUSE' } =20 ## diff --git a/storage-daemon/qemu-storage-daemon.c b/storage-daemon/qemu-sto= rage-daemon.c index eb72561358..0cd4cd2b58 100644 --- a/storage-daemon/qemu-storage-daemon.c +++ b/storage-daemon/qemu-storage-daemon.c @@ -107,6 +107,7 @@ static void help(void) #ifdef CONFIG_FUSE " --export [type=3D]fuse,id=3D,node-name=3D,mountpoint=3D<= file>\n" " [,growable=3Don|off][,writable=3Don|off][,allow-other=3Don|off= |auto]\n" +" [,io-uring=3Don|off]" " export the specified block node over FUSE\n" "\n" #endif /* CONFIG_FUSE */ diff --git a/util/fdmon-io_uring.c b/util/fdmon-io_uring.c index d2433d1d99..68d3fe8e01 100644 --- a/util/fdmon-io_uring.c +++ b/util/fdmon-io_uring.c @@ -452,10 +452,13 @@ static const FDMonOps fdmon_io_uring_ops =3D { void 
fdmon_io_uring_setup(AioContext *ctx, Error **errp) { int ret; + int flags; =20 ctx->io_uring_fd_tag =3D NULL; + flags =3D IORING_SETUP_SQE128; =20 - ret =3D io_uring_queue_init(FDMON_IO_URING_ENTRIES, &ctx->fdmon_io_uri= ng, 0); + ret =3D io_uring_queue_init(FDMON_IO_URING_ENTRIES, + &ctx->fdmon_io_uring, flags); if (ret !=3D 0) { error_setg_errno(errp, -ret, "Failed to initialize io_uring"); return;
--=20
2.45.2

From nobody Sun Sep 28 15:30:17 2025
From: Brian Song
To: qemu-block@nongnu.org
Cc: qemu-devel@nongnu.org, armbru@redhat.com, bernd@bsbernd.com, fam@euphon.net, hibriansong@gmail.com, hreitz@redhat.com, kwolf@redhat.com, stefanha@redhat.com
Subject: [PATCH 2/4] export/fuse: process FUSE-over-io_uring requests
Date: Fri, 29 Aug 2025 22:50:23 -0400
Message-ID: <20250830025025.3610-3-hibriansong@gmail.com>
X-Mailer: git-send-email 2.45.2
In-Reply-To: <20250830025025.3610-1-hibriansong@gmail.com>
References: <20250830025025.3610-1-hibriansong@gmail.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

https://docs.kernel.org/filesystems/fuse-io-uring.html

As described in the kernel documentation, after FUSE-over-io_uring
initialization and handshake, the FUSE export interacts with the kernel
using SQEs/CQEs to receive requests and send responses. This corresponds
to the "Sending requests with CQEs" section in the docs.

This patch implements three key parts: registering the CQE handler
(fuse_uring_cqe_handler), processing FUSE requests
(fuse_uring_co_process_request), and sending response results
(fuse_uring_send_response). It also merges the traditional /dev/fuse
request handling with the FUSE-over-io_uring handling functions.
Suggested-by: Kevin Wolf Suggested-by: Stefan Hajnoczi Signed-off-by: Brian Song --- block/export/fuse.c | 457 ++++++++++++++++++++++++++++++-------------- 1 file changed, 309 insertions(+), 148 deletions(-) diff --git a/block/export/fuse.c b/block/export/fuse.c index 19bf9e5f74..07f74fc8ec 100644 --- a/block/export/fuse.c +++ b/block/export/fuse.c @@ -310,6 +310,47 @@ static const BlockDevOps fuse_export_blk_dev_ops =3D { }; =20 #ifdef CONFIG_LINUX_IO_URING +static void coroutine_fn fuse_uring_co_process_request(FuseRingEnt *ent); + +static void coroutine_fn co_fuse_uring_queue_handle_cqes(void *opaque) +{ + FuseRingEnt *ent =3D opaque; + FuseExport *exp =3D ent->rq->q->exp; + + /* Going to process requests */ + fuse_inc_in_flight(exp); + + /* A ring entry returned */ + fuse_uring_co_process_request(ent); + + /* Finished processing requests */ + fuse_dec_in_flight(exp); +} + +static void fuse_uring_cqe_handler(CqeHandler *cqe_handler) +{ + FuseRingEnt *ent =3D container_of(cqe_handler, FuseRingEnt, fuse_cqe_h= andler); + Coroutine *co; + FuseExport *exp =3D ent->rq->q->exp; + + if (unlikely(exp->halted)) { + return; + } + + int err =3D cqe_handler->cqe.res; + + if (err !=3D 0) { + /* -ENOTCONN is ok on umount */ + if (err !=3D -EINTR && err !=3D -EAGAIN && + err !=3D -ENOTCONN) { + fuse_export_halt(exp); + } + } else { + co =3D qemu_coroutine_create(co_fuse_uring_queue_handle_cqes, ent); + qemu_coroutine_enter(co); + } +} + static void fuse_uring_sqe_set_req_data(struct fuse_uring_cmd_req *req, const unsigned int rqid, const unsigned int commit_id) @@ -1213,6 +1254,9 @@ fuse_co_read(FuseExport *exp, void **bufptr, uint64_t= offset, uint32_t size) * Data in @in_place_buf is assumed to be overwritten after yielding, so w= ill * be copied to a bounce buffer beforehand. @spillover_buf in contrast is * assumed to be exclusively owned and will be used as-is. + * In FUSE-over-io_uring mode, the actual op_payload content is stored in + * @spillover_buf. 
To ensure this buffer is used for writing, @in_place_buf + * is explicitly set to NULL. * Return the number of bytes written to *out on success, and -errno on er= ror. */ static ssize_t coroutine_fn @@ -1220,8 +1264,8 @@ fuse_co_write(FuseExport *exp, struct fuse_write_out = *out, uint64_t offset, uint32_t size, const void *in_place_buf, const void *spillover_buf) { - size_t in_place_size; - void *copied; + size_t in_place_size =3D 0; + void *copied =3D NULL; int64_t blk_len; int ret; struct iovec iov[2]; @@ -1236,10 +1280,12 @@ fuse_co_write(FuseExport *exp, struct fuse_write_ou= t *out, return -EACCES; } =20 - /* Must copy to bounce buffer before potentially yielding */ - in_place_size =3D MIN(size, FUSE_IN_PLACE_WRITE_BYTES); - copied =3D blk_blockalign(exp->common.blk, in_place_size); - memcpy(copied, in_place_buf, in_place_size); + if (in_place_buf) { + /* Must copy to bounce buffer before potentially yielding */ + in_place_size =3D MIN(size, FUSE_IN_PLACE_WRITE_BYTES); + copied =3D blk_blockalign(exp->common.blk, in_place_size); + memcpy(copied, in_place_buf, in_place_size); + } =20 /** * Clients will expect short writes at EOF, so we have to limit @@ -1263,26 +1309,38 @@ fuse_co_write(FuseExport *exp, struct fuse_write_ou= t *out, } } =20 - iov[0] =3D (struct iovec) { - .iov_base =3D copied, - .iov_len =3D in_place_size, - }; - if (size > FUSE_IN_PLACE_WRITE_BYTES) { - assert(size - FUSE_IN_PLACE_WRITE_BYTES <=3D FUSE_SPILLOVER_BUF_SI= ZE); - iov[1] =3D (struct iovec) { - .iov_base =3D (void *)spillover_buf, - .iov_len =3D size - FUSE_IN_PLACE_WRITE_BYTES, + if (in_place_buf) { + iov[0] =3D (struct iovec) { + .iov_base =3D copied, + .iov_len =3D in_place_size, }; - qemu_iovec_init_external(&qiov, iov, 2); + if (size > FUSE_IN_PLACE_WRITE_BYTES) { + assert(size - FUSE_IN_PLACE_WRITE_BYTES <=3D FUSE_SPILLOVER_BU= F_SIZE); + iov[1] =3D (struct iovec) { + .iov_base =3D (void *)spillover_buf, + .iov_len =3D size - FUSE_IN_PLACE_WRITE_BYTES, + }; + 
qemu_iovec_init_external(&qiov, iov, 2); + } else { + qemu_iovec_init_external(&qiov, iov, 1); + } } else { + /* fuse over io_uring */ + iov[0] =3D (struct iovec) { + .iov_base =3D (void *)spillover_buf, + .iov_len =3D size, + }; qemu_iovec_init_external(&qiov, iov, 1); } + ret =3D blk_co_pwritev(exp->common.blk, offset, size, &qiov, 0); if (ret < 0) { goto fail_free_buffer; } =20 - qemu_vfree(copied); + if (in_place_buf) { + qemu_vfree(copied); + } =20 *out =3D (struct fuse_write_out) { .size =3D size, @@ -1290,7 +1348,9 @@ fuse_co_write(FuseExport *exp, struct fuse_write_out = *out, return sizeof(*out); =20 fail_free_buffer: - qemu_vfree(copied); + if (in_place_buf) { + qemu_vfree(copied); + } return ret; } =20 @@ -1578,173 +1638,151 @@ static int fuse_write_buf_response(int fd, uint32= _t req_id, } } =20 -/* - * For use in fuse_co_process_request(): - * Returns a pointer to the parameter object for the given operation (insi= de of - * queue->request_buf, which is assumed to hold a fuse_in_header first). - * Verifies that the object is complete (queue->request_buf is large enoug= h to - * hold it in one piece, and the request length includes the whole object). - * - * Note that queue->request_buf may be overwritten after yielding, so the - * returned pointer must not be used across a function that may yield! 
- */ -#define FUSE_IN_OP_STRUCT(op_name, queue) \ +#define FUSE_IN_OP_STRUCT_LEGACY(in_buf) \ ({ \ - const struct fuse_in_header *__in_hdr =3D \ - (const struct fuse_in_header *)(queue)->request_buf; \ - const struct fuse_##op_name##_in *__in =3D \ - (const struct fuse_##op_name##_in *)(__in_hdr + 1); \ - const size_t __param_len =3D sizeof(*__in_hdr) + sizeof(*__in); \ - uint32_t __req_len; \ - \ - QEMU_BUILD_BUG_ON(sizeof((queue)->request_buf) < __param_len); \ - \ - __req_len =3D __in_hdr->len; \ - if (__req_len < __param_len) { \ - warn_report("FUSE request truncated (%" PRIu32 " < %zu)", \ - __req_len, __param_len); \ - ret =3D -EINVAL; \ - break; \ - } \ - __in; \ + (void *)(((struct fuse_in_header *)in_buf) + 1); \ }) =20 -/* - * For use in fuse_co_process_request(): - * Returns a pointer to the return object for the given operation (inside = of - * out_buf, which is assumed to hold a fuse_out_header first). - * Verifies that out_buf is large enough to hold the whole object. - * - * (out_buf should be a char[] array.) - */ -#define FUSE_OUT_OP_STRUCT(op_name, out_buf) \ +#define FUSE_OUT_OP_STRUCT_LEGACY(out_buf) \ ({ \ - struct fuse_out_header *__out_hdr =3D \ - (struct fuse_out_header *)(out_buf); \ - struct fuse_##op_name##_out *__out =3D \ - (struct fuse_##op_name##_out *)(__out_hdr + 1); \ - \ - QEMU_BUILD_BUG_ON(sizeof(*__out_hdr) + sizeof(*__out) > \ - sizeof(out_buf)); \ - \ - __out; \ + (void *)(((struct fuse_out_header *)out_buf) + 1); \ }) =20 -/** - * Process a FUSE request, incl. writing the response. - * - * Note that yielding in any request-processing function can overwrite the - * contents of q->request_buf. Anything that takes a buffer needs to take - * care that the content is copied before yielding. - * - * @spillover_buf can contain the tail of a write request too large to fit= into - * q->request_buf. This function takes ownership of it (i.e. 
will free it= ), - * which assumes that its contents will not be overwritten by concurrent - * requests (as opposed to q->request_buf). + +/* + * Shared helper for FUSE request processing. Handles both legacy and io_u= ring + * paths. */ -static void coroutine_fn -fuse_co_process_request(FuseQueue *q, void *spillover_buf) +static void coroutine_fn fuse_co_process_request_common( + FuseExport *exp, + uint32_t opcode, + uint64_t req_id, + void *in_buf, + void *spillover_buf, + void *out_buf, + int fd, /* -1 for uring */ + void (*send_response)(void *opaque, uint32_t req_id, ssize_t ret, + const void *buf, void *out_buf), + void *opaque /* FuseQueue* or FuseRingEnt* */) { - FuseExport *exp =3D q->exp; - uint32_t opcode; - uint64_t req_id; - /* - * Return buffer. Must be large enough to hold all return headers, bu= t does - * not include space for data returned by read requests. - * (FUSE_IN_OP_STRUCT() verifies at compile time that out_buf is indeed - * large enough.) - */ - char out_buf[sizeof(struct fuse_out_header) + - MAX_CONST(sizeof(struct fuse_init_out), - MAX_CONST(sizeof(struct fuse_open_out), - MAX_CONST(sizeof(struct fuse_attr_out), - MAX_CONST(sizeof(struct fuse_write_out), - sizeof(struct fuse_lseek_out)))))]; - struct fuse_out_header *out_hdr =3D (struct fuse_out_header *)out_buf; - /* For read requests: Data to be returned */ void *out_data_buffer =3D NULL; - ssize_t ret; + ssize_t ret =3D 0; =20 - /* Limit scope to ensure pointer is no longer used after yielding */ - { - const struct fuse_in_header *in_hdr =3D - (const struct fuse_in_header *)q->request_buf; + void *op_in_buf =3D (void *)FUSE_IN_OP_STRUCT_LEGACY(in_buf); + void *op_out_buf =3D (void *)FUSE_OUT_OP_STRUCT_LEGACY(out_buf); =20 - opcode =3D in_hdr->opcode; - req_id =3D in_hdr->unique; +#ifdef CONFIG_LINUX_IO_URING + if (opcode !=3D FUSE_INIT && exp->is_uring) { + op_in_buf =3D (void *)in_buf; + op_out_buf =3D (void *)out_buf; } +#endif =20 switch (opcode) { case FUSE_INIT: { - const struct 
fuse_init_in *in =3D FUSE_IN_OP_STRUCT(init, q); - ret =3D fuse_co_init(exp, FUSE_OUT_OP_STRUCT(init, out_buf), - in->max_readahead, in->flags); + const struct fuse_init_in *in =3D + (const struct fuse_init_in *)FUSE_IN_OP_STRUCT_LEGACY(in_buf); + + struct fuse_init_out *out =3D + (struct fuse_init_out *)FUSE_OUT_OP_STRUCT_LEGACY(out_buf); + + ret =3D fuse_co_init(exp, out, in->max_readahead, in); break; } =20 - case FUSE_OPEN: - ret =3D fuse_co_open(exp, FUSE_OUT_OP_STRUCT(open, out_buf)); + case FUSE_OPEN: { + struct fuse_open_out *out =3D + (struct fuse_open_out *)op_out_buf; + + ret =3D fuse_co_open(exp, out); break; + } =20 case FUSE_RELEASE: ret =3D 0; break; =20 case FUSE_LOOKUP: - ret =3D -ENOENT; /* There is no node but the root node */ + ret =3D -ENOENT; break; =20 - case FUSE_GETATTR: - ret =3D fuse_co_getattr(exp, FUSE_OUT_OP_STRUCT(attr, out_buf)); + case FUSE_GETATTR: { + struct fuse_attr_out *out =3D + (struct fuse_attr_out *)op_out_buf; + + ret =3D fuse_co_getattr(exp, out); break; + } =20 case FUSE_SETATTR: { - const struct fuse_setattr_in *in =3D FUSE_IN_OP_STRUCT(setattr, q); - ret =3D fuse_co_setattr(exp, FUSE_OUT_OP_STRUCT(attr, out_buf), - in->valid, in->size, in->mode, in->uid, in->= gid); + const struct fuse_setattr_in *in =3D + (const struct fuse_setattr_in *)op_in_buf; + + struct fuse_attr_out *out =3D + (struct fuse_attr_out *)op_out_buf; + + ret =3D fuse_co_setattr(exp, out, in->valid, in->size, in->mode, + in->uid, in->gid); break; } =20 case FUSE_READ: { - const struct fuse_read_in *in =3D FUSE_IN_OP_STRUCT(read, q); + const struct fuse_read_in *in =3D + (const struct fuse_read_in *)op_in_buf; + ret =3D fuse_co_read(exp, &out_data_buffer, in->offset, in->size); break; } =20 case FUSE_WRITE: { - const struct fuse_write_in *in =3D FUSE_IN_OP_STRUCT(write, q); - uint32_t req_len; - - req_len =3D ((const struct fuse_in_header *)q->request_buf)->len; - if (unlikely(req_len < sizeof(struct fuse_in_header) + sizeof(*in)= + - in->size)) { - 
warn_report("FUSE WRITE truncated; received %zu bytes of %" PR= Iu32, - req_len - sizeof(struct fuse_in_header) - sizeof(*= in), - in->size); - ret =3D -EINVAL; - break; + const struct fuse_write_in *in =3D + (const struct fuse_write_in *)op_in_buf; + + struct fuse_write_out *out =3D + (struct fuse_write_out *)op_out_buf; + +#ifdef CONFIG_LINUX_IO_URING + if (!exp->is_uring) { +#endif + uint32_t req_len =3D ((const struct fuse_in_header *)in_buf)->= len; + + if (unlikely(req_len < sizeof(struct fuse_in_header) + sizeof(= *in) + + in->size)) { + warn_report("FUSE WRITE truncated; received %zu bytes of %" + PRIu32, + req_len - sizeof(struct fuse_in_header) - sizeof(*in), + in->size); + ret =3D -EINVAL; + break; + } +#ifdef CONFIG_LINUX_IO_URING + } else { + assert(in->size <=3D + ((FuseRingEnt *)opaque)->req_header.ring_ent_in_out.payloa= d_sz); } +#endif =20 - /* - * poll_fuse_fd() has checked that in_hdr->len matches the number = of - * bytes read, which cannot exceed the max_write value we set - * (FUSE_MAX_WRITE_BYTES). So we know that FUSE_MAX_WRITE_BYTES >= =3D - * in_hdr->len >=3D in->size + X, so this assertion must hold. - */ assert(in->size <=3D FUSE_MAX_WRITE_BYTES); =20 - /* - * Passing a pointer to `in` (i.e. the request buffer) is fine bec= ause - * fuse_co_write() takes care to copy its contents before potentia= lly - * yielding. 
- */ - ret =3D fuse_co_write(exp, FUSE_OUT_OP_STRUCT(write, out_buf), - in->offset, in->size, in + 1, spillover_buf); + const void *in_place_buf =3D in + 1; + const void *spill_buf =3D spillover_buf; + +#ifdef CONFIG_LINUX_IO_URING + if (exp->is_uring) { + in_place_buf =3D NULL; + spill_buf =3D out_buf; + } +#endif + + ret =3D fuse_co_write(exp, out, in->offset, in->size, + in_place_buf, spill_buf); break; } =20 case FUSE_FALLOCATE: { - const struct fuse_fallocate_in *in =3D FUSE_IN_OP_STRUCT(fallocate= , q); + const struct fuse_fallocate_in *in =3D + (const struct fuse_fallocate_in *)op_in_buf; + ret =3D fuse_co_fallocate(exp, in->offset, in->length, in->mode); break; } @@ -1759,9 +1797,13 @@ fuse_co_process_request(FuseQueue *q, void *spillove= r_buf) =20 #ifdef CONFIG_FUSE_LSEEK case FUSE_LSEEK: { - const struct fuse_lseek_in *in =3D FUSE_IN_OP_STRUCT(lseek, q); - ret =3D fuse_co_lseek(exp, FUSE_OUT_OP_STRUCT(lseek, out_buf), - in->offset, in->whence); + const struct fuse_lseek_in *in =3D + (const struct fuse_lseek_in *)op_in_buf; + + struct fuse_lseek_out *out =3D + (struct fuse_lseek_out *)op_out_buf; + + ret =3D fuse_co_lseek(exp, out, in->offset, in->whence); break; } #endif @@ -1770,28 +1812,147 @@ fuse_co_process_request(FuseQueue *q, void *spillo= ver_buf) ret =3D -ENOSYS; } =20 - /* Ignore errors from fuse_write*(), nothing we can do anyway */ + send_response(opaque, req_id, ret, out_data_buffer, out_buf); + if (out_data_buffer) { - assert(ret >=3D 0); - fuse_write_buf_response(q->fuse_fd, req_id, out_hdr, - out_data_buffer, ret); qemu_vfree(out_data_buffer); + } + + if (fd !=3D -1) { + qemu_vfree(spillover_buf); + } + #ifdef CONFIG_LINUX_IO_URING + /* Handle FUSE initialization errors */ + if (unlikely(opcode =3D=3D FUSE_INIT && ret =3D=3D -ENODEV)) { + error_report("System doesn't support FUSE-over-io_uring"); + fuse_export_halt(exp); + return; + } + /* Handle FUSE-over-io_uring initialization */ - if (unlikely(opcode =3D=3D FUSE_INIT && 
exp->is_uring)) { + if (unlikely(opcode =3D=3D FUSE_INIT && exp->is_uring && fd !=3D -1)) { struct fuse_init_out *out =3D - (struct fuse_init_out *)FUSE_OUT_OP_STRUCT(out_buf); + (struct fuse_init_out *)FUSE_OUT_OP_STRUCT_LEGACY(out_buf); fuse_uring_start(exp, out); } #endif +} + +/* Helper to send response for legacy */ +static void send_response_legacy(void *opaque, uint32_t req_id, ssize_t re= t, + const void *buf, void *out_buf) +{ + FuseQueue *q =3D (FuseQueue *)opaque; + struct fuse_out_header *out_hdr =3D (struct fuse_out_header *)out_buf; + if (buf) { + assert(ret >=3D 0); + fuse_write_buf_response(q->fuse_fd, req_id, out_hdr, buf, ret); } else { fuse_write_response(q->fuse_fd, req_id, out_hdr, ret < 0 ? ret : 0, ret < 0 ? 0 : ret); } +} =20 - qemu_vfree(spillover_buf); +static void coroutine_fn +fuse_co_process_request(FuseQueue *q, void *spillover_buf) +{ + FuseExport *exp =3D q->exp; + uint32_t opcode; + uint64_t req_id; + + /* + * Return buffer. Must be large enough to hold all return headers, bu= t does + * not include space for data returned by read requests. 
+ */ + char out_buf[sizeof(struct fuse_out_header) + + MAX_CONST(sizeof(struct fuse_init_out), + MAX_CONST(sizeof(struct fuse_open_out), + MAX_CONST(sizeof(struct fuse_attr_out), + MAX_CONST(sizeof(struct fuse_write_out), + sizeof(struct fuse_lseek_out)))))] =3D {0}; + + /* Limit scope to ensure pointer is no longer used after yielding */ + { + const struct fuse_in_header *in_hdr =3D + (const struct fuse_in_header *)q->request_buf; + + opcode =3D in_hdr->opcode; + req_id =3D in_hdr->unique; + } + + fuse_co_process_request_common(exp, opcode, req_id, q->request_buf, + spillover_buf, out_buf, q->fuse_fd, send_response_legacy, q); +} + +#ifdef CONFIG_LINUX_IO_URING +static void fuse_uring_prep_sqe_commit(struct io_uring_sqe *sqe, void *opa= que) +{ + FuseRingEnt *ent =3D opaque; + struct fuse_uring_cmd_req *req =3D (void *)&sqe->cmd[0]; + + fuse_uring_sqe_prepare(sqe, ent->rq->q, FUSE_IO_URING_CMD_COMMIT_AND_F= ETCH); + fuse_uring_sqe_set_req_data(req, ent->rq->rqid, + ent->req_commit_id); +} + +static void +fuse_uring_send_response(FuseRingEnt *ent, uint32_t req_id, ssize_t ret, + const void *out_data_buffer) +{ + FuseExport *exp =3D ent->rq->q->exp; + + struct fuse_uring_req_header *rrh =3D &ent->req_header; + struct fuse_out_header *out_header =3D (struct fuse_out_header *)&rrh-= >in_out; + struct fuse_uring_ent_in_out *ent_in_out =3D + (struct fuse_uring_ent_in_out *)&rrh->ring_ent_in_out; + + /* FUSE_READ */ + if (out_data_buffer && ret > 0) { + memcpy(ent->op_payload, out_data_buffer, ret); + } + + out_header->error =3D ret < 0 ? ret : 0; + out_header->unique =3D req_id; + /* out_header->len =3D ret > 0 ? ret : 0; */ + ent_in_out->payload_sz =3D ret > 0 ? 
ret : 0; + aio_add_sqe(fuse_uring_prep_sqe_commit, ent, + &ent->fuse_cqe_handler); +} + +/* Helper to send response for uring */ +static void send_response_uring(void *opaque, uint32_t req_id, ssize_t ret, + const void *out_data_buffer, void *payload) +{ + FuseRingEnt *ent =3D (FuseRingEnt *)opaque; + + fuse_uring_send_response(ent, req_id, ret, out_data_buffer); +} + +static void coroutine_fn fuse_uring_co_process_request(FuseRingEnt *ent) +{ + FuseExport *exp =3D ent->rq->q->exp; + struct fuse_uring_req_header *rrh =3D &ent->req_header; + struct fuse_uring_ent_in_out *ent_in_out =3D + (struct fuse_uring_ent_in_out *)&rrh->ring_ent_in_out; + struct fuse_in_header *in_hdr =3D + (struct fuse_in_header *)&rrh->in_out; + uint32_t opcode =3D in_hdr->opcode; + uint64_t req_id =3D in_hdr->unique; + ent->req_commit_id =3D ent_in_out->commit_id; + + if (unlikely(ent->req_commit_id =3D=3D 0)) { + error_report("If this happens kernel will not find the response - " + "it will be stuck forever - better to abort immediately."); + fuse_export_halt(exp); + return; + } + + fuse_co_process_request_common(exp, opcode, req_id, &rrh->op_in, + NULL, ent->op_payload, -1, send_response_uring, ent); } +#endif =20 const BlockExportDriver blk_exp_fuse =3D { .type =3D BLOCK_EXPORT_TYPE_FUSE, --=20 2.45.2 From nobody Sun Sep 28 15:30:17 2025 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=none dis=none) header.from=gmail.com ARC-Seal: i=1; a=rsa-sha256; t=1756572131; cv=none; d=zohomail.com; s=zohoarc; b=dCcP9W9wauvhfcC+8KnO+2wQNl14tuyFMpEMbX0otBaq2BHuI3C39LXvwTT8G2OK2sMZnuAJ8fO9LfBEWjL6IFffs9612rpyrUucmAyXidYVqFvrvPAD08bAzNS6D0a12x2gtblpBnYrA4ntdLHt6R70y5Sn2unm9jx/JwkH1ZY= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1756572131; 
h=Content-Transfer-Encoding:Cc:Cc:Date:Date:From:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:Subject:To:To:Message-Id:Reply-To; bh=u0/rMuj5BJq38ST18amwRehThVkEDs8BGEyd5stjYDo=; b=dIR+CpgkSyLDQYUS+0QF9nQNNAriXfwFGPtpmewTjDY9zKduA9wTcq0+XOW1rmxJBJVlZbh2Rb4xOFlDMQFrYdh6VnHouPoNQdk9o8LgM/OEkoy7Ys3895AeyJshYmi8BEcMJHu5lq/7Oal6UlqOyZzlm519JH0QsksJkGosBhU= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1756572131502236.77367776461892; Sat, 30 Aug 2025 09:42:11 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1usNqS-0001Ll-FM; Sat, 30 Aug 2025 11:50:24 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1usBg5-0003bQ-DG; Fri, 29 Aug 2025 22:50:53 -0400 Received: from mail-qk1-x736.google.com ([2607:f8b0:4864:20::736]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1usBg3-00087l-6Y; Fri, 29 Aug 2025 22:50:53 -0400 Received: by mail-qk1-x736.google.com with SMTP id af79cd13be357-7f6d8fcd106so376235185a.2; Fri, 29 Aug 2025 19:50:50 -0700 (PDT) Received: from localhost.localdomain (wn-campus-nat-129-97-124-90.dynamic.uwaterloo.ca. 
[129.97.124.90]) by smtp.gmail.com with ESMTPSA id af79cd13be357-7fc0eacf1b4sm299457085a.21.2025.08.29.19.50.48 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 29 Aug 2025 19:50:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1756522250; x=1757127050; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=u0/rMuj5BJq38ST18amwRehThVkEDs8BGEyd5stjYDo=; b=GQyHS0CKJGLxTbZz3wBB7eZEcHdicVnkDm0BNhjEcQCZlQB4MycWbhe6a6FPRq0YZg xCFtIdy89y0zy9KAyA/+3JcTQ/y4UcMUlKQdT7ZcHWCZOffS3vdNRsWdrbL3bFRpTz6A yXtJ8YGTHLDlySLq2+ti3euV2mKS287+j0IYQJ8wLfzTFXlW8yIDjKHMaXB+BLBr7kzn Luu0/juJPw96jm7bHEm0CSB1xhqMhS4s4Y5IaaDY6nbNUbv4BQgcc0uSHFe1r7RDEwr5 IKbWJNJlcLMo2SyDQDLiLDvs3q8smKE+u6IawNkHLByug+tLK0nCSy7D849aoUIa0RLt nosA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1756522250; x=1757127050; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=u0/rMuj5BJq38ST18amwRehThVkEDs8BGEyd5stjYDo=; b=ob7tfXKGsgtjN1hsF1zL2XsMTX7eAfvVjrRTrJa6g7FNXefl24W70ptiPdlz0R8iZB hH2Y3op5/Pe4yGrVA7PJC7TNxlvvbI8xA9W1S3T19WQf/g3nMLQMZXES/jwjCo+KSjtP +uoU78TRH7ES1tq8SLMWynmoL+nWmQcAhiA0r6NfyCI3VjJPuGUTzkOhfKWmURQk+XIw Dps2hNC5Euvbn05vMj5/4rjxV83vjONbidBBjLbrqaFRdRfrGwCTxz3EYOmfi/K4rfKt B+tMco3aA4vJ65cQi9l0Kd7vJoZp+6fCoE0umCmCbY7VSXgJ+sqvw/JLW/tyIEFaap7r QIXw== X-Gm-Message-State: AOJu0YwSdB6Zj2ovPrIDLHezKTzvuKXmlDy7f8Iiy4/XW2YLoCF3ksFa 8CpkSbRnZXmG8WsA6oJ5ooZS+hAhdZiEuya8y0SkxUtu0rOg8Rq/8hcfmYbfKQ== X-Gm-Gg: ASbGnctaoztnbm+XHPUjrgyjshORmxa9E0M8SJy03LafA1azG1B0NMbtMtfpUrfE3KE QxprbiZ3+PIxHP4nW2xdVq5yCl9dTW56GbepR5z7hrFJeTVdy66qe7kUJTYdi4KAEhMGGI6JP0R ysnw2W69AUemvBUlYE3+djbsXailL45+OgrFhUe8H0elR/uZXWlY/Dutzl2DZulBPOSkEVIia+o 
jYvy2pMqe9rVMRjkegEJ3pcsZqqgLvOYa5GTSQHEiMxYo8AyqULEp7t4ubQwX2pEXxIZDVF/9eM VejVMUkoC0loX4E97RM5lp1u8Igq0FP0/qvaqITUsEORy7ILKMerYzcJFHSb1FH+5UkYQ8aKiCo hDHp2OkC90gKexagjC2/+LBOY4yAThaudIWBAp8D/xBlcJA/06Fu1OTJcT+IgTeznPrHmT2hSGK PEHs27R5SZvO+cAsilYv3KMgE4mf0= X-Google-Smtp-Source: AGHT+IH18oXSzBFKJ4Xk5tXPZGMM9o2LuvSrmp5tf/Cp7Okqf1nNmkA+1nQBGVyxlav+H4uferfSXg== X-Received: by 2002:a05:620a:4403:b0:7ea:458:e6e2 with SMTP id af79cd13be357-7ff2cbd87f0mr76635585a.77.1756522249642; Fri, 29 Aug 2025 19:50:49 -0700 (PDT) From: Brian Song To: qemu-block@nongnu.org Cc: qemu-devel@nongnu.org, armbru@redhat.com, bernd@bsbernd.com, fam@euphon.net, hibriansong@gmail.com, hreitz@redhat.com, kwolf@redhat.com, stefanha@redhat.com Subject: [PATCH 3/4] export/fuse: Safe termination for FUSE-uring Date: Fri, 29 Aug 2025 22:50:24 -0400 Message-ID: <20250830025025.3610-4-hibriansong@gmail.com> X-Mailer: git-send-email 2.45.2 In-Reply-To: <20250830025025.3610-1-hibriansong@gmail.com> References: <20250830025025.3610-1-hibriansong@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=2607:f8b0:4864:20::736; envelope-from=hibriansong@gmail.com; helo=mail-qk1-x736.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: 
qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @gmail.com) X-ZM-MESSAGEID: 1756572132847116600 Content-Type: text/plain; charset="utf-8"

When the user sends a termination signal, qemu-storage-daemon stops the
export, exits the main loop (main_loop_wait()), and begins cleaning up
the associated resources. At this point, some SQEs submitted via
FUSE_IO_URING_CMD_COMMIT_AND_FETCH may still be pending in the kernel,
waiting for incoming FUSE requests. When one of them completes, its CQE
handler is triggered in user space.

Currently, there is no way to manually cancel these pending requests in
the kernel. As a result, the related data structures may be freed during
export termination before the outstanding CQEs arrive, so a CQE handler
can be invoked after it has been freed, which may lead to a segfault.

As a workaround, we take a reference on the block export (blk_exp_ref())
whenever we submit an SQE to the kernel, which prevents the CQE handler
from being deleted during export termination. Once the corresponding CQE
is received, we drop the reference again (blk_exp_unref()).
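The reference-per-submission idea described above can be sketched in isolation as follows. This is only an illustration under stated assumptions: the Export type and the helpers export_ref(), export_unref(), submit_request(), on_completion() and shutdown_with_pending() are hypothetical names invented for this sketch, not QEMU APIs. The patch itself calls blk_exp_ref() and blk_exp_unref() on exp->common.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an export object; not QEMU's BlockExport. */
typedef struct Export {
    int refcount;      /* one reference per owner or in-flight submission */
    bool *freed_flag;  /* set to true when the object is really released */
} Export;

static Export *export_new(bool *freed_flag)
{
    Export *exp = calloc(1, sizeof(*exp));
    assert(exp);
    exp->refcount = 1;            /* the creator's reference */
    exp->freed_flag = freed_flag;
    *freed_flag = false;
    return exp;
}

static void export_ref(Export *exp)
{
    exp->refcount++;
}

static void export_unref(Export *exp)
{
    assert(exp->refcount > 0);
    if (--exp->refcount == 0) {
        *exp->freed_flag = true;
        free(exp);
    }
}

/* Submitting a request pins the export until its completion is seen. */
static void submit_request(Export *exp)
{
    export_ref(exp);
    /* ... an SQE would be queued here ... */
}

/* The completion handler drops the reference taken at submission. */
static void on_completion(Export *exp)
{
    /* ... exp may be used safely here: it cannot have been freed yet ... */
    export_unref(exp);
}

/*
 * Shut down while `pending` requests are still outstanding.
 * Returns true if the export stayed alive until the last completion
 * and was released afterwards.
 */
static bool shutdown_with_pending(int pending)
{
    bool freed;
    Export *exp = export_new(&freed);
    bool alive_until_last = true;

    for (int i = 0; i < pending; i++) {
        submit_request(exp);
    }

    export_unref(exp);            /* shutdown drops the creator's reference */

    for (int i = 0; i < pending; i++) {
        if (freed) {              /* a handler would touch freed memory */
            alive_until_last = false;
            break;
        }
        on_completion(exp);
    }
    return alive_until_last && freed;
}
```

In this sketch the object is released only by whichever caller drops the last reference, so a late completion never sees freed memory. The cost of this design is that teardown is deferred until every outstanding completion has arrived.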
Suggested-by: Kevin Wolf Suggested-by: Stefan Hajnoczi Signed-off-by: Brian Song --- block/export/fuse.c | 75 +++++++++++++++++++++++++++++++++++++++------ 1 file changed, 65 insertions(+), 10 deletions(-) diff --git a/block/export/fuse.c b/block/export/fuse.c index 07f74fc8ec..ab2eb895ad 100644 --- a/block/export/fuse.c +++ b/block/export/fuse.c @@ -39,6 +39,7 @@ =20 #include "standard-headers/linux/fuse.h" #include +#include =20 #if defined(CONFIG_FALLOCATE_ZERO_RANGE) #include @@ -321,6 +322,8 @@ static void coroutine_fn co_fuse_uring_queue_handle_cqe= s(void *opaque) fuse_inc_in_flight(exp); =20 /* A ring entry returned */ + blk_exp_unref(&exp->common); + fuse_uring_co_process_request(ent); =20 /* Finished processing requests */ @@ -345,6 +348,9 @@ static void fuse_uring_cqe_handler(CqeHandler *cqe_hand= ler) err !=3D -ENOTCONN) { fuse_export_halt(exp); } + + /* A ring entry returned */ + blk_exp_unref(&exp->common); } else { co =3D qemu_coroutine_create(co_fuse_uring_queue_handle_cqes, ent); qemu_coroutine_enter(co); @@ -392,6 +398,8 @@ static void fuse_uring_submit_register(void *opaque) FuseRingEnt *ent =3D opaque; FuseExport *exp =3D ent->rq->q->exp; =20 + /* Commit and fetch a ring entry */ + blk_exp_ref(&exp->common); =20 aio_add_sqe(fuse_uring_prep_sqe_register, ent, &(ent->fuse_cqe_handler= )); } @@ -886,6 +894,38 @@ static void read_from_fuse_fd(void *opaque) qemu_coroutine_enter(co); } =20 +#ifdef CONFIG_LINUX_IO_URING +static void fuse_ring_queue_manager_destroy(FuseRingQueueManager *manager) +{ + if (!manager) { + return; + } + + for (int i =3D 0; i < manager->num_ring_queues; i++) { + FuseRingQueue *rq =3D &manager->ring_queues[i]; + + for (int j =3D 0; j < FUSE_DEFAULT_RING_QUEUE_DEPTH; j++) { + g_free(rq->ent[j].op_payload); + } + g_free(rq->ent); + } + + g_free(manager->ring_queues); + g_free(manager); +} + +static void fuse_export_delete_uring(FuseExport *exp) +{ + exp->is_uring =3D false; + + /* Clean up ring queue manager */ + if 
(exp->ring_queue_manager) { + fuse_ring_queue_manager_destroy(exp->ring_queue_manager); + exp->ring_queue_manager =3D NULL; + } +} +#endif + static void fuse_export_shutdown(BlockExport *blk_exp) { FuseExport *exp =3D container_of(blk_exp, FuseExport, common); @@ -901,24 +941,15 @@ static void fuse_export_shutdown(BlockExport *blk_exp) */ g_hash_table_remove(exports, exp->mountpoint); } -} - -static void fuse_export_delete(BlockExport *blk_exp) -{ - FuseExport *exp =3D container_of(blk_exp, FuseExport, common); =20 - for (int i =3D 0; i < exp->num_queues; i++) { + for (size_t i =3D 0; i < exp->num_queues; i++) { FuseQueue *q =3D &exp->queues[i]; =20 /* Queue 0's FD belongs to the FUSE session */ if (i > 0 && q->fuse_fd >=3D 0) { close(q->fuse_fd); } - if (q->spillover_buf) { - qemu_vfree(q->spillover_buf); - } } - g_free(exp->queues); =20 if (exp->fuse_session) { if (exp->mounted) { @@ -927,8 +958,29 @@ static void fuse_export_delete(BlockExport *blk_exp) =20 fuse_session_destroy(exp->fuse_session); } +} + +static void fuse_export_delete(BlockExport *blk_exp) +{ + FuseExport *exp =3D container_of(blk_exp, FuseExport, common); + + for (size_t i =3D 0; i < exp->num_queues; i++) { + FuseQueue *q =3D &exp->queues[i]; + + if (q->spillover_buf) { + qemu_vfree(q->spillover_buf); + } + } =20 g_free(exp->mountpoint); + +#ifdef CONFIG_LINUX_IO_URING + if (exp->is_uring) { + fuse_export_delete_uring(exp); + } +#endif + + g_free(exp->queues); } =20 /** @@ -1917,6 +1969,9 @@ fuse_uring_send_response(FuseRingEnt *ent, uint32_t r= eq_id, ssize_t ret, out_header->unique =3D req_id; /* out_header->len =3D ret > 0 ? ret : 0; */ ent_in_out->payload_sz =3D ret > 0 ? 
ret : 0; + + /* Commit and fetch a ring entry */ + blk_exp_ref(&exp->common); aio_add_sqe(fuse_uring_prep_sqe_commit, ent, &ent->fuse_cqe_handler); } --=20 2.45.2 From nobody Sun Sep 28 15:30:17 2025 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=none dis=none) header.from=gmail.com ARC-Seal: i=1; a=rsa-sha256; t=1756572291; cv=none; d=zohomail.com; s=zohoarc; b=XjwHUzfCEskHPe3bluSBSgd5Pm+FquHMAaVWgcdqcgUF/FBDghCu4Dcw+XRmwYAFrXygwozeXntpMH/VKu2+No0JB6dXoyPp8k1yAkxOdC1VfiBbF9Jy+Bor1JXQ60h09aZiU+xSsq8duj79xoOjeSrVVNocgFU54uBN7+Pv83s= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1756572291; h=Content-Transfer-Encoding:Cc:Cc:Date:Date:From:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:Subject:To:To:Message-Id:Reply-To; bh=Qwx7Non1RKQkij6XP/CEOZXUJ1kdTiRPKXDqDi7L1NE=; b=X6k8Wc9sztxixt6tUTVTb9RWCiaCe5/iGE627cuam3sKE1Wjnl4z3OamIc5x8ZC3XUBaJNXl/Wg+f0xAnmAWC/k78IdaUqZQMYW8v5J1qN6BRXTziGAmMKpQw6+c6ssm9Gsg/o9Zm4oHLnNGZtcwo7+RLfxd6IwxcVbSoySjUPM= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1756572291511596.912565394282; Sat, 30 Aug 2025 09:44:51 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1usNqS-0001MW-QQ; Sat, 30 Aug 2025 11:50:25 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps 
(TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1usBg5-0003c9-Vj; Fri, 29 Aug 2025 22:50:54 -0400 Received: from mail-qk1-x72a.google.com ([2607:f8b0:4864:20::72a]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1usBg4-00088D-82; Fri, 29 Aug 2025 22:50:53 -0400 Received: by mail-qk1-x72a.google.com with SMTP id af79cd13be357-7f7e89e4b37so260077485a.3; Fri, 29 Aug 2025 19:50:51 -0700 (PDT) Received: from localhost.localdomain (wn-campus-nat-129-97-124-90.dynamic.uwaterloo.ca. [129.97.124.90]) by smtp.gmail.com with ESMTPSA id af79cd13be357-7fc0eacf1b4sm299457085a.21.2025.08.29.19.50.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 29 Aug 2025 19:50:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1756522251; x=1757127051; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Qwx7Non1RKQkij6XP/CEOZXUJ1kdTiRPKXDqDi7L1NE=; b=eC2kBaebvVPOntPMu/wiDcHeLqhB5umi4Dt12j45XToWzyjP8MjWltHmbekhWJHjDJ 0vOPCvD181/s0nmHWTGrL1hmcf3CnMWGKQhySLSuTxnwCkV0NvSXKMKUDOUwZMdlkLPy 50vjYSHrGdPiIBR6QsuW5Oe04BWztCxm//xccJxDMnXE5I9CmH8fGNZWR156oNJvfzd9 nbtNv8qYcyGklztInWF05OBgpfUncB0IP972NCOl5QhkEk/3JtLLM4VkdEWh5QErX6yy l59j3D2t4M4HAU7JoGysDqhhBL4YZyesyfLpCqS7FFzqd+ZKJM5Bvs3b5Z4yfHtI4NOa ZHcQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1756522251; x=1757127051; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Qwx7Non1RKQkij6XP/CEOZXUJ1kdTiRPKXDqDi7L1NE=; b=djehV/yDWN8TkMpftauWuX3X/DbEFDtVKS2Mt63QR221bmYqYNvVpPGlNcrBgyJ2R4 wdADN7+ivVX73yNcT0q99EFN0ghl8Nlmd1Xd9483WIT+7ZfSin0eqy7FnyMRoek3t0Ss kNQZOdXnN5LDex1fVled2ykXHTDszUkALw4z1MSj1ZURisLvc+e6OiyJe+/BAkX1FdHZ 
0nqjjzgY+oasXhPFnlkGzT/xn3I7j464HwY7dRPCHZMohud4fqDu/FSbTKWVeF7958om VutRJq0bgVWGMNBouPQ+hDZ/p52tWEcoykfakK3oTmX1HxKWcd+h91G+q+7cMNU6CQOk c7SA== X-Gm-Message-State: AOJu0YzGz3NjIS7x/cPmJwNQ8iMY0T+MDXQc7B0t7YNsX73cm9RjiJAx 2IC3vj5ykjBsRdI/SN9cUuqCGsOmnR0yLeQPmNo9ucsL5Qf/GrS+pseTwT/X6Q== X-Gm-Gg: ASbGncsMT41rh82lAMgxFazwtGbHe/9rQQYs+ubZydIMquVkNSp9dLjerrSDs4o468V 8LWkrsYuqoamk68VvSSYBA4Ur4VgYoku9wCrhDfaGUPOGXNwCVMfejY4dFwzUJ9BRDnHANJ3mjn dNRUrhnyMdelOYI4j3PA2FJTBkUulY8dJ9fbT07o/WAaXqBKFJ+wqfrOyeCrCbgs/nuytDYb+/z huwysQ0+kDHF1ZENFYGtOE8yjl2w5ajHcBajASOUeXc5wpaOUjiB9MbW41kaRZUlXkMA1M2yw11 ZK8cx3kaIVu6a0Oq0SGJeJZus4xPzVbtrDC6C1dh6WHoOV+CoQYg7doQwH8fk4oz49XJ2k+Fufv fxM3fydtZ4+4jClTFHT/8CeQRo64Wi3AM4y4MiyUQ/14rjPoITJdSlMOkSgbPiMwP0HEheBIefK blolxYpGAp4swrV5lbchqhyJKbW4w= X-Google-Smtp-Source: AGHT+IGXDfsIf0avkJ5sxwRhCFbRVyU9krZ2QlADQXNxoBq7/CyDcVLwJmFFJEogH/CEImWaTWSjyg== X-Received: by 2002:a05:620a:7010:b0:7e9:f81f:ce72 with SMTP id af79cd13be357-7ff2ccd09e3mr99786185a.72.1756522250697; Fri, 29 Aug 2025 19:50:50 -0700 (PDT) From: Brian Song To: qemu-block@nongnu.org Cc: qemu-devel@nongnu.org, armbru@redhat.com, bernd@bsbernd.com, fam@euphon.net, hibriansong@gmail.com, hreitz@redhat.com, kwolf@redhat.com, stefanha@redhat.com Subject: [PATCH 4/4] iotests: add tests for FUSE-over-io_uring Date: Fri, 29 Aug 2025 22:50:25 -0400 Message-ID: <20250830025025.3610-5-hibriansong@gmail.com> X-Mailer: git-send-email 2.45.2 In-Reply-To: <20250830025025.3610-1-hibriansong@gmail.com> References: <20250830025025.3610-1-hibriansong@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=2607:f8b0:4864:20::72a; envelope-from=hibriansong@gmail.com; helo=mail-qk1-x72a.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- 

To test FUSE-over-io_uring, set the environment variable
FUSE_OVER_IO_URING=1. This applies only when using the
'fuse' protocol.

$ FUSE_OVER_IO_URING=1 ./check -fuse

Suggested-by: Kevin Wolf
Suggested-by: Stefan Hajnoczi
Signed-off-by: Brian Song
---
 tests/qemu-iotests/check     |  2 ++
 tests/qemu-iotests/common.rc | 45 +++++++++++++++++++++++++++---------
 2 files changed, 36 insertions(+), 11 deletions(-)

diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
index 545f9ec7bd..c6fa0f9e3d 100755
--- a/tests/qemu-iotests/check
+++ b/tests/qemu-iotests/check
@@ -94,6 +94,8 @@ def make_argparser() -> argparse.ArgumentParser:
         mg.add_argument('-' + fmt, dest='imgfmt', action='store_const',
                         const=fmt, help=f'test {fmt}')
 
+    # To test FUSE-over-io_uring, set the environment variable
+    # FUSE_OVER_IO_URING=1. This applies only when using the 'fuse' protocol
     protocol_list = ['file', 'rbd', 'nbd', 'ssh', 'nfs', 'fuse']
     g_prt = p.add_argument_group(
         ' image protocol options',
diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index e977cb4eb6..f8b79c3810 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -539,17 +539,38 @@ _make_test_img()
         touch "$export_mp"
         rm -f "$SOCK_DIR/fuse-output"
 
-        # Usually, users would export formatted nodes. But we present fuse as a
-        # protocol-level driver here, so we have to leave the format to the
-        # client.
-        # Switch off allow-other, because in general we do not need it for
-        # iotests. The default allow-other=auto has the downside of printing a
-        # fusermount error on its first attempt if allow_other is not
-        # permissible, which we would need to filter.
-        QSD_NEED_PID=y $QSD \
-              --blockdev file,node-name=export-node,filename=$img_name,discard=unmap \
-              --export fuse,id=fuse-export,node-name=export-node,mountpoint="$export_mp",writable=on,growable=on,allow-other=off \
-              &
+        if [ -n "$FUSE_OVER_IO_URING" ]; then
+            nr_cpu=$(nproc 2>/dev/null || echo 1)
+            nr_iothreads=$((nr_cpu / 2))
+            if [ $nr_iothreads -lt 1 ]; then
+                nr_iothreads=1
+            fi
+
+            iothread_args=""
+            iothread_export_args=""
+            for ((i=0; i<$nr_iothreads; i++)); do
+                iothread_args="$iothread_args --object iothread,id=iothread$i"
+                iothread_export_args="$iothread_export_args,iothread.$i=iothread$i"
+            done
+
+            QSD_NEED_PID=y $QSD \
+                  $iothread_args \
+                  --blockdev file,node-name=export-node,filename=$img_name,discard=unmap \
+                  --export fuse,id=fuse-export,node-name=export-node,mountpoint="$export_mp",writable=on,growable=on,allow-other=off,io-uring=on$iothread_export_args \
+                  &
+        else
+            # Usually, users would export formatted nodes. But we present fuse as a
+            # protocol-level driver here, so we have to leave the format to the
+            # client.
+            # Switch off allow-other, because in general we do not need it for
+            # iotests. The default allow-other=auto has the downside of printing a
+            # fusermount error on its first attempt if allow_other is not
+            # permissible, which we would need to filter.
+            QSD_NEED_PID=y $QSD \
+                  --blockdev file,node-name=export-node,filename=$img_name,discard=unmap \
+                  --export fuse,id=fuse-export,node-name=export-node,mountpoint="$export_mp",writable=on,growable=on,allow-other=off \
+                  &
+        fi
 
         pidfile="$QEMU_TEST_DIR/qemu-storage-daemon.pid"
 
@@ -592,6 +613,8 @@ _rm_test_img()
 
         kill "${FUSE_PIDS[index]}"
 
+        sleep 1
+
         # Wait until the mount is gone
         timeout=10 # *0.5 s
         while true; do
-- 
2.45.2
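
For readers who want to see what the io_uring branch of the common.rc hunk passes to the storage daemon, here is a minimal sketch of the same argument construction, pulled out into two functions. The function names build_iothread_args and build_iothread_export_args are hypothetical and not part of the patch, and the CPU count is taken as a parameter here (an assumption made for illustration) instead of being read from nproc.

```shell
# Sketch only: mirrors the iothread setup in the common.rc hunk.
# Both function names are hypothetical helpers, not part of the patch.

# Number of iothreads for a given CPU count: half the CPUs, at least one.
iothread_count() {
    local n=$(($1 / 2))
    if [ "$n" -lt 1 ]; then
        n=1
    fi
    printf '%s\n' "$n"
}

# Print the "--object iothread,id=iothreadN" arguments, one per iothread.
build_iothread_args() {
    local count i=0 args=""
    count=$(iothread_count "$1")
    while [ "$i" -lt "$count" ]; do
        args="$args --object iothread,id=iothread$i"
        i=$((i + 1))
    done
    # Drop the single leading space left by the first append.
    printf '%s\n' "${args# }"
}

# Print the ",iothread.N=iothreadN" suffix that the patch appends to the
# fuse --export option after io-uring=on.
build_iothread_export_args() {
    local count i=0 args=""
    count=$(iothread_count "$1")
    while [ "$i" -lt "$count" ]; do
        args="$args,iothread.$i=iothread$i"
        i=$((i + 1))
    done
    printf '%s\n' "$args"
}
```

With a CPU count of 4, build_iothread_args 4 prints two --object entries (iothread0 and iothread1), and build_iothread_export_args 4 prints ",iothread.0=iothread0,iothread.1=iothread1". A single-CPU host still gets one iothread.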