From nobody Sat Nov 15 05:35:28 2025
From: Zhi Song
To: qemu-block@nongnu.org
Cc: qemu-devel@nongnu.org, armbru@redhat.com, bernd@bsbernd.com,
 fam@euphon.net, hibriansong@gmail.com, hreitz@redhat.com,
 kwolf@redhat.com, stefanha@redhat.com
Subject: [PATCH 1/3] fuse: add FUSE-over-io_uring enable opt and init
Date: Thu, 14 Aug 2025 23:46:17 -0400
Message-ID: <20250815034619.51980-2-hizhisong@gmail.com>
In-Reply-To: <20250815034619.51980-1-hizhisong@gmail.com>
References: <20250815034619.51980-1-hizhisong@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Brian Song

This patch adds a new export option for qemu-storage-daemon to enable
or disable FUSE-over-io_uring via the switch io-uring=on|off (disabled
by default). It also implements the protocol handshake with the Linux
kernel during the FUSE-over-io_uring initialization phase.

See: https://docs.kernel.org/filesystems/fuse-io-uring.html

The kernel documentation describes in detail how FUSE-over-io_uring
works. This patch implements the Initial SQE stage shown in the
diagram: it initializes one queue per IOThread, each currently
supporting a single submission queue entry (SQE). When the FUSE driver
sends the first FUSE request (FUSE_INIT), qemu-storage-daemon calls
fuse_uring_start() to complete initialization, ultimately submitting
the SQE with the FUSE_IO_URING_CMD_REGISTER command to confirm
successful initialization with the kernel.

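For illustration only (not part of this patch): a minimal, standalone
sketch of how a 64-bit FUSE_INIT feature mask can be split across the
32-bit flags and flags2 reply fields, which is the route by which the
io_uring capability bit reaches the kernel in the handshake described
above. The EX_ constants, the struct, and the ex_ helpers are
hypothetical names; the bit positions (30 for the extended-init flag,
41 for FUSE-over-io_uring) are assumed to match the kernel's uapi
fuse.h and should be checked against the headers actually in use.

```c
#include <stdint.h>

/* Assumed bit positions; verify against include/uapi/linux/fuse.h. */
#define EX_FUSE_INIT_EXT       (UINT64_C(1) << 30)
#define EX_FUSE_OVER_IO_URING  (UINT64_C(1) << 41)

/* Hypothetical stand-in for the two flag words of struct fuse_init_out. */
struct ex_init_flags {
    uint32_t flags;   /* bits 0..31 of the feature mask */
    uint32_t flags2;  /* bits 32..63 of the feature mask */
};

/*
 * Split a 64-bit feature mask into the two 32-bit reply words:
 * the low word carries bits 0..31, the high word bits 32..63.
 */
static struct ex_init_flags ex_split_init_flags(uint64_t features)
{
    struct ex_init_flags out = {
        .flags  = (uint32_t)(features & UINT32_MAX),
        .flags2 = (uint32_t)(features >> 32),
    };
    return out;
}

/* Reassemble the 64-bit mask, as the receiving side would. */
static uint64_t ex_join_init_flags(struct ex_init_flags in)
{
    return ((uint64_t)in.flags2 << 32) | in.flags;
}
```

Under these assumptions, bit 41 lands in bit 9 of flags2. The patch
itself additionally masks the low word with its supported-flags set
before replying; the sketch leaves that step out to keep the split
itself visible.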
Suggested-by: Kevin Wolf
Suggested-by: Stefan Hajnoczi
Signed-off-by: Brian Song
---
 block/export/fuse.c                  | 161 ++++++++++++++++++++++++---
 docs/tools/qemu-storage-daemon.rst   |  11 +-
 qapi/block-export.json               |   5 +-
 storage-daemon/qemu-storage-daemon.c |   1 +
 util/fdmon-io_uring.c                |   5 +-
 5 files changed, 159 insertions(+), 24 deletions(-)

diff --git a/block/export/fuse.c b/block/export/fuse.c
index c0ad4696ce..59fa79f486 100644
--- a/block/export/fuse.c
+++ b/block/export/fuse.c
@@ -48,6 +48,11 @@
 #include
 #endif
+#define FUSE_DEFAULT_MAX_PAGES_PER_REQ 32
+
+/* room needed in buffer to accommodate header */
+#define FUSE_BUFFER_HEADER_SIZE 0x1000
+
 /* Prevent overly long bounce buffer allocations */
 #define FUSE_MAX_READ_BYTES (MIN(BDRV_REQUEST_MAX_BYTES, 1 * 1024 * 1024))
 /*
@@ -63,12 +68,31 @@
     (FUSE_MAX_WRITE_BYTES - FUSE_IN_PLACE_WRITE_BYTES)
 typedef struct FuseExport FuseExport;
+typedef struct FuseQueue FuseQueue;
+
+typedef struct FuseRingEnt {
+    /* back pointer */
+    FuseQueue *q;
+
+    /* commit id of a fuse request */
+    uint64_t req_commit_id;
+
+    /* fuse request header and payload */
+    struct fuse_uring_req_header req_header;
+    void *op_payload;
+    size_t req_payload_sz;
+
+    /* The vector passed to the kernel */
+    struct iovec iov[2];
+
+    CqeHandler fuse_cqe_handler;
+} FuseRingEnt;
 /*
  * One FUSE "queue", representing one FUSE FD from which requests are fetched
  * and processed. Each queue is tied to an AioContext.
  */
-typedef struct FuseQueue {
+struct FuseQueue {
     FuseExport *exp;
     AioContext *ctx;
@@ -109,7 +133,12 @@ typedef struct FuseQueue {
      * Free this buffer with qemu_vfree().
      */
     void *spillover_buf;
-} FuseQueue;
+
+#ifdef CONFIG_LINUX_IO_URING
+    int qid;
+    FuseRingEnt ent;
+#endif
+};
 /*
  * Verify that FuseQueue.request_buf plus the spill-over buffer together
@@ -148,6 +177,7 @@ struct FuseExport {
     bool growable;
     /* Whether allow_other was used as a mount option or not */
     bool allow_other;
+    bool is_uring;
     mode_t st_mode;
     uid_t st_uid;
@@ -257,6 +287,93 @@ static const BlockDevOps fuse_export_blk_dev_ops = {
     .drained_poll = fuse_export_drained_poll,
 };
+#ifdef CONFIG_LINUX_IO_URING
+
+static void fuse_uring_sqe_set_req_data(struct fuse_uring_cmd_req *req,
+                                        const unsigned int qid,
+                                        const unsigned int commit_id)
+{
+    req->qid = qid;
+    req->commit_id = commit_id;
+    req->flags = 0;
+}
+
+static void fuse_uring_sqe_prepare(struct io_uring_sqe *sqe, FuseQueue *q,
+                                   __u32 cmd_op)
+{
+    sqe->opcode = IORING_OP_URING_CMD;
+
+    sqe->fd = q->fuse_fd;
+    sqe->rw_flags = 0;
+    sqe->ioprio = 0;
+    sqe->off = 0;
+
+    sqe->cmd_op = cmd_op;
+    sqe->__pad1 = 0;
+}
+
+static void fuse_uring_prep_sqe_register(struct io_uring_sqe *sqe, void *opaque)
+{
+    FuseQueue *q = opaque;
+    struct fuse_uring_cmd_req *req = (void *)&sqe->cmd[0];
+
+    fuse_uring_sqe_prepare(sqe, q, FUSE_IO_URING_CMD_REGISTER);
+
+    sqe->addr = (uint64_t)(q->ent.iov);
+    sqe->len = 2;
+
+    fuse_uring_sqe_set_req_data(req, q->qid, 0);
+}
+
+static void fuse_uring_submit_register(void *opaque)
+{
+    FuseQueue *q = opaque;
+    FuseExport *exp = q->exp;
+
+
+    aio_add_sqe(fuse_uring_prep_sqe_register, q, &(q->ent.fuse_cqe_handler));
+}
+
+static void fuse_uring_start(FuseExport *exp, struct fuse_init_out *out)
+{
+    /*
+     * Since we didn't enable the FUSE_MAX_PAGES feature, the value of
+     * fc->max_pages should be FUSE_DEFAULT_MAX_PAGES_PER_REQ, which is set by
+     * the kernel by default. Also, max_write should not exceed
+     * FUSE_DEFAULT_MAX_PAGES_PER_REQ * PAGE_SIZE.
+     */
+    size_t bufsize = out->max_write + FUSE_BUFFER_HEADER_SIZE;
+
+    if (!(out->flags & FUSE_MAX_PAGES)) {
+        bufsize = FUSE_DEFAULT_MAX_PAGES_PER_REQ * qemu_real_host_page_size()
+                  + FUSE_BUFFER_HEADER_SIZE;
+    }
+
+    for (int i = 0; i < exp->num_queues; i++) {
+        FuseQueue *q = &exp->queues[i];
+        FuseRingEnt *ent = &q->ent;
+
+        ent->q = q;
+
+        ent->req_payload_sz = bufsize - FUSE_BUFFER_HEADER_SIZE;
+        ent->op_payload = g_malloc0(ent->req_payload_sz);
+
+        ent->iov[0] = (struct iovec) {
+            &(ent->req_header),
+            sizeof(struct fuse_uring_req_header)
+        };
+        ent->iov[1] = (struct iovec) {
+            ent->op_payload,
+            ent->req_payload_sz
+        };
+
+        ent->fuse_cqe_handler.cb = fuse_uring_cqe_handler;
+
+        aio_bh_schedule_oneshot(q->ctx, fuse_uring_submit_register, q);
+    }
+}
+#endif
+
 static int fuse_export_create(BlockExport *blk_exp,
                               BlockExportOptions *blk_exp_args,
                               AioContext *const *multithread,
@@ -280,6 +397,9 @@ static int fuse_export_create(BlockExport *blk_exp,
         for (size_t i = 0; i < mt_count; i++) {
             exp->queues[i] = (FuseQueue) {
+#ifdef CONFIG_LINUX_IO_URING
+                .qid = i,
+#endif
                 .exp = exp,
                 .ctx = multithread[i],
                 .fuse_fd = -1,
@@ -293,6 +413,9 @@ static int fuse_export_create(BlockExport *blk_exp,
         exp->num_queues = 1;
         exp->queues = g_new(FuseQueue, 1);
         exp->queues[0] = (FuseQueue) {
+#ifdef CONFIG_LINUX_IO_URING
+            .qid = 0,
+#endif
             .exp = exp,
             .ctx = exp->common.ctx,
             .fuse_fd = -1,
@@ -312,6 +435,8 @@ static int fuse_export_create(BlockExport *blk_exp,
         }
     }
+    exp->is_uring = args->io_uring ? true : false;
+
     blk_set_dev_ops(exp->common.blk, &fuse_export_blk_dev_ops, exp);
     /*
@@ -687,15 +812,22 @@ static ssize_t coroutine_fn
 fuse_co_init(FuseExport *exp, struct fuse_init_out *out,
              uint32_t max_readahead, uint32_t flags)
 {
-    const uint32_t supported_flags = FUSE_ASYNC_READ | FUSE_ASYNC_DIO;
+    const uint32_t supported_flags = FUSE_ASYNC_READ | FUSE_ASYNC_DIO
+                                     | FUSE_INIT_EXT;
+    uint64_t outargflags = flags;
+
+#ifdef CONFIG_LINUX_IO_URING
+    if (exp->is_uring)
+        outargflags |= FUSE_OVER_IO_URING;
+#endif
     *out = (struct fuse_init_out) {
         .major = FUSE_KERNEL_VERSION,
         .minor = FUSE_KERNEL_MINOR_VERSION,
         .max_readahead = max_readahead,
         .max_write = FUSE_MAX_WRITE_BYTES,
-        .flags = flags & supported_flags,
-        .flags2 = 0,
+        .flags = outargflags & supported_flags,
+        .flags2 = outargflags >> 32,
         /* libfuse maximum: 2^16 - 1 */
         .max_background = UINT16_MAX,
@@ -1393,22 +1525,17 @@ fuse_co_process_request(FuseQueue *q, void *spillover_buf)
     struct fuse_out_header *out_hdr = (struct fuse_out_header *)out_buf;
     /* For read requests: Data to be returned */
     void *out_data_buffer = NULL;
-    ssize_t ret;
-    /* Limit scope to ensure pointer is no longer used after yielding */
-    {
-        const struct fuse_in_header *in_hdr =
-            (const struct fuse_in_header *)q->request_buf;
-
-        opcode = in_hdr->opcode;
-        req_id = in_hdr->unique;
-    }
+    bool is_uring = exp->is_uring;
     switch (opcode) {
     case FUSE_INIT: {
-        const struct fuse_init_in *in = FUSE_IN_OP_STRUCT(init, q);
-        ret = fuse_co_init(exp, FUSE_OUT_OP_STRUCT(init, out_buf),
-                           in->max_readahead, in->flags);
+#ifdef CONFIG_LINUX_IO_URING
+        /* FUSE-over-io_uring enabled && start from the tradition path */
+        if (is_uring) {
+            fuse_uring_start(exp, out);
+        }
+#endif
         break;
     }
diff --git a/docs/tools/qemu-storage-daemon.rst b/docs/tools/qemu-storage-daemon.rst
index 35ab2d7807..c5076101e0 100644
--- a/docs/tools/qemu-storage-daemon.rst
+++ b/docs/tools/qemu-storage-daemon.rst
@@ -78,7 +78,7 @@ Standard options:
 .. option:: --export [type=]nbd,id=,node-name=[,name=][,writable=on|off][,bitmap=]
   --export [type=]vhost-user-blk,id=,node-name=,addr.type=unix,addr.path=[,writable=on|off][,logical-block-size=][,num-queues=]
   --export [type=]vhost-user-blk,id=,node-name=,addr.type=fd,addr.str=[,writable=on|off][,logical-block-size=][,num-queues=]
-  --export [type=]fuse,id=,node-name=,mountpoint=[,growable=on|off][,writable=on|off][,allow-other=on|off|auto]
+  --export [type=]fuse,id=,node-name=,mountpoint=[,growable=on|off][,writable=on|off][,allow-other=on|off|auto][,io-uring=on|off]
   --export [type=]vduse-blk,id=,node-name=,name=[,writable=on|off][,num-queues=][,queue-size=][,logical-block-size=][,serial=]
   is a block export definition. ``node-name`` is the block node that should be
@@ -111,10 +111,11 @@ Standard options:
   that enabling this option as a non-root user requires enabling the
   user_allow_other option in the global fuse.conf configuration file. Setting
   ``allow-other`` to auto (the default) will try enabling this option, and on
-  error fall back to disabling it.
-
-  The ``vduse-blk`` export type takes a ``name`` (must be unique across the host)
-  to create the VDUSE device.
+  error fall back to disabling it. Once ``io-uring`` is enabled (off by default),
+  the FUSE-over-io_uring-related settings will be initialized to bypass the
+  traditional /dev/fuse communication mechanism and instead use io_uring to
+  handle FUSE operations. The ``vduse-blk`` export type takes a ``name``
+  (must be unique across the host) to create the VDUSE device.
   ``num-queues`` sets the number of virtqueues (the default is 1).
   ``queue-size`` sets the virtqueue descriptor table size (the default is 256).
diff --git a/qapi/block-export.json b/qapi/block-export.json
index 9ae703ad01..37f2fc47e2 100644
--- a/qapi/block-export.json
+++ b/qapi/block-export.json
@@ -184,12 +184,15 @@
 #     mount the export with allow_other, and if that fails, try again
 #     without. (since 6.1; default: auto)
 #
+# @io-uring: Use FUSE-over-io-uring. (since 10.2; default: false)
+#
 # Since: 6.0
 ##
 { 'struct': 'BlockExportOptionsFuse',
   'data': { 'mountpoint': 'str',
             '*growable': 'bool',
-            '*allow-other': 'FuseExportAllowOther' },
+            '*allow-other': 'FuseExportAllowOther',
+            '*io-uring': 'bool' },
   'if': 'CONFIG_FUSE' }
 ##
diff --git a/storage-daemon/qemu-storage-daemon.c b/storage-daemon/qemu-storage-daemon.c
index eb72561358..0cd4cd2b58 100644
--- a/storage-daemon/qemu-storage-daemon.c
+++ b/storage-daemon/qemu-storage-daemon.c
@@ -107,6 +107,7 @@ static void help(void)
 #ifdef CONFIG_FUSE
 " --export [type=]fuse,id=,node-name=,mountpoint=<file>\n"
 " [,growable=on|off][,writable=on|off][,allow-other=on|off|auto]\n"
+" [,io-uring=on|off]"
 " export the specified block node over FUSE\n"
 "\n"
 #endif /* CONFIG_FUSE */
diff --git a/util/fdmon-io_uring.c b/util/fdmon-io_uring.c
index d2433d1d99..68d3fe8e01 100644
--- a/util/fdmon-io_uring.c
+++ b/util/fdmon-io_uring.c
@@ -452,10 +452,13 @@ static const FDMonOps fdmon_io_uring_ops = {
 void fdmon_io_uring_setup(AioContext *ctx, Error **errp)
 {
     int ret;
+    int flags;
     ctx->io_uring_fd_tag = NULL;
+    flags = IORING_SETUP_SQE128;
-    ret = io_uring_queue_init(FDMON_IO_URING_ENTRIES, &ctx->fdmon_io_uring, 0);
+    ret = io_uring_queue_init(FDMON_IO_URING_ENTRIES,
+                              &ctx->fdmon_io_uring, flags);
     if (ret != 0) {
         error_setg_errno(errp, -ret, "Failed to initialize io_uring");
         return;
--
2.45.2

From nobody Sat Nov 15 05:35:28 2025
From: Zhi Song
To: qemu-block@nongnu.org
Cc: qemu-devel@nongnu.org, armbru@redhat.com, bernd@bsbernd.com,
 fam@euphon.net, hibriansong@gmail.com, hreitz@redhat.com,
 kwolf@redhat.com, stefanha@redhat.com
Subject: [PATCH 2/3] fuse: Handle FUSE-uring requests
Date: Thu, 14 Aug 2025 23:46:18 -0400
Message-ID: <20250815034619.51980-3-hizhisong@gmail.com>
In-Reply-To: <20250815034619.51980-1-hizhisong@gmail.com>
References: <20250815034619.51980-1-hizhisong@gmail.com>
MIME-Version: 1.0
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @gmail.com) X-ZM-MESSAGEID: 1755229725712116600 Content-Type: text/plain; charset="utf-8" From: Brian Song https://docs.kernel.org/filesystems/fuse-io-uring.html As described in the kernel documentation, after FUSE-over-io_uring initialization and handshake, FUSE interacts with the kernel using SQE/CQE to send requests and receive responses. This corresponds to the "Sending requests with CQEs" section in the docs. This patch implements three key parts: registering the CQE handler (fuse_uring_cqe_handler), processing FUSE requests (fuse_uring_co_ process_request), and sending response results (fuse_uring_send_ response). It also merges the traditional /dev/fuse request handling with the FUSE-over-io_uring handling functions. Suggested-by: Kevin Wolf Suggested-by: Stefan Hajnoczi Signed-off-by: Brian Song --- block/export/fuse.c | 425 ++++++++++++++++++++++++++++++-------------- 1 file changed, 289 insertions(+), 136 deletions(-) diff --git a/block/export/fuse.c b/block/export/fuse.c index 59fa79f486..7540f8f5a3 100644 --- a/block/export/fuse.c +++ b/block/export/fuse.c @@ -288,6 +288,46 @@ static const BlockDevOps fuse_export_blk_dev_ops =3D { }; #ifdef CONFIG_LINUX_IO_URING +static void coroutine_fn fuse_uring_co_process_request(FuseRingEnt *ent); + +static void coroutine_fn co_fuse_uring_queue_handle_cqes(void *opaque) +{ + FuseRingEnt *ent =3D opaque; + FuseExport *exp =3D ent->q->exp; + + /* Going to process requests */ + fuse_inc_in_flight(exp); + + + fuse_uring_co_process_request(ent); + + /* Finished processing requests */ + fuse_dec_in_flight(exp); +} + +static void fuse_uring_cqe_handler(CqeHandler *cqe_handler) +{ + FuseRingEnt *ent =3D container_of(cqe_handler, FuseRingEnt, fuse_cqe_h= andler); + Coroutine *co; + FuseExport *exp =3D 
ent->q->exp; + + if (unlikely(exp->halted)) { + return; + } + + int err =3D cqe_handler->cqe.res; + + if (err !=3D 0) { + /* -ENOTCONN is ok on umount */ + if (err !=3D -EINTR && err !=3D -EAGAIN && + err !=3D -ENOTCONN) { + fuse_export_halt(exp); + } + } else { + co =3D qemu_coroutine_create(co_fuse_uring_queue_handle_cqes, ent); + qemu_coroutine_enter(co); + } +} static void fuse_uring_sqe_set_req_data(struct fuse_uring_cmd_req *req, const unsigned int qid, @@ -1075,6 +1115,9 @@ fuse_co_read(FuseExport *exp, void **bufptr, uint64_t= offset, uint32_t size) * Data in @in_place_buf is assumed to be overwritten after yielding, so w= ill * be copied to a bounce buffer beforehand. @spillover_buf in contrast is * assumed to be exclusively owned and will be used as-is. + * In FUSE-over-io_uring mode, the actual op_payload content is stored in + * @spillover_buf. To ensure this buffer is used for writing, @in_place_buf + * is explicitly set to NULL. * Return the number of bytes written to *out on success, and -errno on er= ror. 
*/ static ssize_t coroutine_fn @@ -1082,8 +1125,8 @@ fuse_co_write(FuseExport *exp, struct fuse_write_out = *out, uint64_t offset, uint32_t size, const void *in_place_buf, const void *spillover_buf) { - size_t in_place_size; - void *copied; + size_t in_place_size =3D 0; + void *copied =3D NULL; int64_t blk_len; int ret; struct iovec iov[2]; @@ -1098,10 +1141,12 @@ fuse_co_write(FuseExport *exp, struct fuse_write_ou= t *out, return -EACCES; } - /* Must copy to bounce buffer before potentially yielding */ - in_place_size =3D MIN(size, FUSE_IN_PLACE_WRITE_BYTES); - copied =3D blk_blockalign(exp->common.blk, in_place_size); - memcpy(copied, in_place_buf, in_place_size); + if (in_place_buf) { + /* Must copy to bounce buffer before potentially yielding */ + in_place_size =3D MIN(size, FUSE_IN_PLACE_WRITE_BYTES); + copied =3D blk_blockalign(exp->common.blk, in_place_size); + memcpy(copied, in_place_buf, in_place_size); + } /** * Clients will expect short writes at EOF, so we have to limit @@ -1125,26 +1170,37 @@ fuse_co_write(FuseExport *exp, struct fuse_write_ou= t *out, } } - iov[0] =3D (struct iovec) { - .iov_base =3D copied, - .iov_len =3D in_place_size, - }; - if (size > FUSE_IN_PLACE_WRITE_BYTES) { - assert(size - FUSE_IN_PLACE_WRITE_BYTES <=3D FUSE_SPILLOVER_BUF_SI= ZE); - iov[1] =3D (struct iovec) { - .iov_base =3D (void *)spillover_buf, - .iov_len =3D size - FUSE_IN_PLACE_WRITE_BYTES, + if (in_place_buf) { + iov[0] =3D (struct iovec) { + .iov_base =3D copied, + .iov_len =3D in_place_size, }; - qemu_iovec_init_external(&qiov, iov, 2); + if (size > FUSE_IN_PLACE_WRITE_BYTES) { + assert(size - FUSE_IN_PLACE_WRITE_BYTES <=3D FUSE_SPILLOVER_BU= F_SIZE); + iov[1] =3D (struct iovec) { + .iov_base =3D (void *)spillover_buf, + .iov_len =3D size - FUSE_IN_PLACE_WRITE_BYTES, + }; + qemu_iovec_init_external(&qiov, iov, 2); + } else { + qemu_iovec_init_external(&qiov, iov, 1); + } } else { + /* fuse over io_uring */ + iov[0] =3D (struct iovec) { + .iov_base =3D (void 
*)spillover_buf, + .iov_len =3D size, + }; qemu_iovec_init_external(&qiov, iov, 1); } + ret =3D blk_co_pwritev(exp->common.blk, offset, size, &qiov, 0); if (ret < 0) { goto fail_free_buffer; } - qemu_vfree(copied); + if (in_place_buf) + qemu_vfree(copied); *out =3D (struct fuse_write_out) { .size =3D size, @@ -1152,7 +1208,9 @@ fuse_co_write(FuseExport *exp, struct fuse_write_out = *out, return sizeof(*out); fail_free_buffer: - qemu_vfree(copied); + if (in_place_buf) { + qemu_vfree(copied); + } return ret; } @@ -1440,168 +1498,144 @@ static int fuse_write_buf_response(int fd, uint32= _t req_id, } } -/* - * For use in fuse_co_process_request(): - * Returns a pointer to the parameter object for the given operation (insi= de of - * queue->request_buf, which is assumed to hold a fuse_in_header first). - * Verifies that the object is complete (queue->request_buf is large enoug= h to - * hold it in one piece, and the request length includes the whole object). - * - * Note that queue->request_buf may be overwritten after yielding, so the - * returned pointer must not be used across a function that may yield! 
- */ -#define FUSE_IN_OP_STRUCT(op_name, queue) \ +#define FUSE_IN_OP_STRUCT_LEGACY(in_buf) \ ({ \ - const struct fuse_in_header *__in_hdr =3D \ - (const struct fuse_in_header *)(queue)->request_buf; \ - const struct fuse_##op_name##_in *__in =3D \ - (const struct fuse_##op_name##_in *)(__in_hdr + 1); \ - const size_t __param_len =3D sizeof(*__in_hdr) + sizeof(*__in); \ - uint32_t __req_len; \ - \ - QEMU_BUILD_BUG_ON(sizeof((queue)->request_buf) < __param_len); \ - \ - __req_len =3D __in_hdr->len; \ - if (__req_len < __param_len) { \ - warn_report("FUSE request truncated (%" PRIu32 " < %zu)", \ - __req_len, __param_len); \ - ret =3D -EINVAL; \ - break; \ - } \ - __in; \ + (void *)(((struct fuse_in_header *)in_buf) + 1); \ }) -/* - * For use in fuse_co_process_request(): - * Returns a pointer to the return object for the given operation (inside = of - * out_buf, which is assumed to hold a fuse_out_header first). - * Verifies that out_buf is large enough to hold the whole object. - * - * (out_buf should be a char[] array.) - */ -#define FUSE_OUT_OP_STRUCT(op_name, out_buf) \ +#define FUSE_OUT_OP_STRUCT_LEGACY(out_buf) \ ({ \ - struct fuse_out_header *__out_hdr =3D \ - (struct fuse_out_header *)(out_buf); \ - struct fuse_##op_name##_out *__out =3D \ - (struct fuse_##op_name##_out *)(__out_hdr + 1); \ - \ - QEMU_BUILD_BUG_ON(sizeof(*__out_hdr) + sizeof(*__out) > \ - sizeof(out_buf)); \ - \ - __out; \ + (void *)(((struct fuse_out_header *)out_buf) + 1); \ }) -/** - * Process a FUSE request, incl. writing the response. - * - * Note that yielding in any request-processing function can overwrite the - * contents of q->request_buf. Anything that takes a buffer needs to take - * care that the content is copied before yielding. - * - * @spillover_buf can contain the tail of a write request too large to fit= into - * q->request_buf. This function takes ownership of it (i.e. 
will free it), - * which assumes that its contents will not be overwritten by concurrent - * requests (as opposed to q->request_buf). + +/* + * Shared helper for FUSE request processing. Handles both legacy and io_uring + * paths. */ -static void coroutine_fn -fuse_co_process_request(FuseQueue *q, void *spillover_buf) +static void coroutine_fn fuse_co_process_request_common( + FuseExport *exp, + uint32_t opcode, + uint64_t req_id, + void *in_buf, + void *spillover_buf, + void *out_buf, + int fd, /* -1 for uring */ + void (*send_response)(void *opaque, uint32_t req_id, ssize_t ret, + const void *buf, void *out_buf), + void *opaque /* FuseQueue* or FuseRingEnt* */) { - FuseExport *exp = q->exp; - uint32_t opcode; - uint64_t req_id; - /* - * Return buffer. Must be large enough to hold all return headers, but does - * not include space for data returned by read requests. - * (FUSE_IN_OP_STRUCT() verifies at compile time that out_buf is indeed - * large enough.) - */ - char out_buf[sizeof(struct fuse_out_header) + - MAX_CONST(sizeof(struct fuse_init_out), - MAX_CONST(sizeof(struct fuse_open_out), - MAX_CONST(sizeof(struct fuse_attr_out), - MAX_CONST(sizeof(struct fuse_write_out), - sizeof(struct fuse_lseek_out)))))]; - struct fuse_out_header *out_hdr = (struct fuse_out_header *)out_buf; - /* For read requests: Data to be returned */ void *out_data_buffer = NULL; + ssize_t ret = 0; bool is_uring = exp->is_uring; + void *op_in_buf = (is_uring && opcode != FUSE_INIT) ? + (void *)in_buf : (void *)FUSE_IN_OP_STRUCT_LEGACY(in_buf); + + void *op_out_buf = (is_uring && opcode != FUSE_INIT) ?
+ (void *)out_buf : (void *)FUSE_OUT_OP_STRUCT_LEGACY(out_buf); + switch (opcode) { case FUSE_INIT: { + const struct fuse_init_in *in = + (const struct fuse_init_in *)FUSE_IN_OP_STRUCT_LEGACY(in_buf); + + struct fuse_init_out *out = + (struct fuse_init_out *)FUSE_OUT_OP_STRUCT_LEGACY(out_buf); + + ret = fuse_co_init(exp, out, in->max_readahead, in->flags); #ifdef CONFIG_LINUX_IO_URING /* FUSE-over-io_uring enabled && start from the tradition path */ - if (is_uring) { + if (is_uring && fd != -1) { fuse_uring_start(exp, out); } #endif break; } - case FUSE_OPEN: - ret = fuse_co_open(exp, FUSE_OUT_OP_STRUCT(open, out_buf)); + case FUSE_OPEN: { + struct fuse_open_out *out = + (struct fuse_open_out *)op_out_buf; + + ret = fuse_co_open(exp, out); break; + } case FUSE_RELEASE: ret = 0; break; case FUSE_LOOKUP: - ret = -ENOENT; /* There is no node but the root node */ + ret = -ENOENT; break; - case FUSE_GETATTR: - ret = fuse_co_getattr(exp, FUSE_OUT_OP_STRUCT(attr, out_buf)); + case FUSE_GETATTR: { + struct fuse_attr_out *out = + (struct fuse_attr_out *)op_out_buf; + + ret = fuse_co_getattr(exp, out); break; + } case FUSE_SETATTR: { - const struct fuse_setattr_in *in = FUSE_IN_OP_STRUCT(setattr, q); - ret = fuse_co_setattr(exp, FUSE_OUT_OP_STRUCT(attr, out_buf), - in->valid, in->size, in->mode, in->uid, in->gid); + const struct fuse_setattr_in *in = + (const struct fuse_setattr_in *)op_in_buf; + + struct fuse_attr_out *out = + (struct fuse_attr_out *)op_out_buf; + + ret = fuse_co_setattr(exp, out, in->valid, in->size, in->mode, + in->uid, in->gid); break; } case FUSE_READ: { - const struct fuse_read_in *in = FUSE_IN_OP_STRUCT(read, q); + const struct fuse_read_in *in = + (const struct fuse_read_in *)op_in_buf; + ret = fuse_co_read(exp, &out_data_buffer, in->offset, in->size); break; } case FUSE_WRITE: { - const struct fuse_write_in *in = FUSE_IN_OP_STRUCT(write, q); - uint32_t req_len; - - req_len = ((const struct
fuse_in_header *)q->request_buf)->len; - if (unlikely(req_len < sizeof(struct fuse_in_header) + sizeof(*in) + - in->size)) { - warn_report("FUSE WRITE truncated; received %zu bytes of %" PRIu32, - req_len - sizeof(struct fuse_in_header) - sizeof(*in), - in->size); - ret = -EINVAL; - break; + const struct fuse_write_in *in = + (const struct fuse_write_in *)op_in_buf; + + struct fuse_write_out *out = + (struct fuse_write_out *)op_out_buf; + + if (!is_uring) { + uint32_t req_len = ((const struct fuse_in_header *)in_buf)->len; + + if (unlikely(req_len < sizeof(struct fuse_in_header) + sizeof(*in) + + in->size)) { + warn_report("FUSE WRITE truncated; received %zu bytes of %" + PRIu32, + req_len - sizeof(struct fuse_in_header) - sizeof(*in), + in->size); + ret = -EINVAL; + break; + } + } else { + assert(in->size <= + ((FuseRingEnt *)opaque)->req_header.ring_ent_in_out.payload_sz); } - /* - * poll_fuse_fd() has checked that in_hdr->len matches the number of - * bytes read, which cannot exceed the max_write value we set - * (FUSE_MAX_WRITE_BYTES). So we know that FUSE_MAX_WRITE_BYTES >= - * in_hdr->len >= in->size + X, so this assertion must hold. - */ assert(in->size <= FUSE_MAX_WRITE_BYTES); - /* - * Passing a pointer to `in` (i.e. the request buffer) is fine because - * fuse_co_write() takes care to copy its contents before potentially - * yielding. - */ - ret = fuse_co_write(exp, FUSE_OUT_OP_STRUCT(write, out_buf), - in->offset, in->size, in + 1, spillover_buf); + const void *in_place_buf = is_uring ? NULL : (in + 1); + const void *spill_buf = is_uring ?
out_buf : spillover_buf; + + ret = fuse_co_write(exp, out, in->offset, in->size, + in_place_buf, spill_buf); break; } case FUSE_FALLOCATE: { - const struct fuse_fallocate_in *in = FUSE_IN_OP_STRUCT(fallocate, q); + const struct fuse_fallocate_in *in = + (const struct fuse_fallocate_in *)op_in_buf; + ret = fuse_co_fallocate(exp, in->offset, in->length, in->mode); break; } @@ -1616,9 +1650,13 @@ fuse_co_process_request(FuseQueue *q, void *spillover_buf) #ifdef CONFIG_FUSE_LSEEK case FUSE_LSEEK: { - const struct fuse_lseek_in *in = FUSE_IN_OP_STRUCT(lseek, q); - ret = fuse_co_lseek(exp, FUSE_OUT_OP_STRUCT(lseek, out_buf), - in->offset, in->whence); + const struct fuse_lseek_in *in = + (const struct fuse_lseek_in *)op_in_buf; + + struct fuse_lseek_out *out = + (struct fuse_lseek_out *)op_out_buf; + + ret = fuse_co_lseek(exp, out, in->offset, in->whence); break; } #endif @@ -1627,20 +1665,135 @@ fuse_co_process_request(FuseQueue *q, void *spillover_buf) ret = -ENOSYS; } - /* Ignore errors from fuse_write*(), nothing we can do anyway */ + send_response(opaque, req_id, ret, out_data_buffer, out_buf); + if (out_data_buffer) { - assert(ret >= 0); - fuse_write_buf_response(q->fuse_fd, req_id, out_hdr, - out_data_buffer, ret); qemu_vfree(out_data_buffer); + } + + if (fd != -1) { + qemu_vfree(spillover_buf); + } +} + +/* Helper to send response for legacy */ +static void send_response_legacy(void *opaque, uint32_t req_id, ssize_t ret, + const void *buf, void *out_buf) +{ + FuseQueue *q = (FuseQueue *)opaque; + struct fuse_out_header *out_hdr = (struct fuse_out_header *)out_buf; + if (buf) { + assert(ret >= 0); + fuse_write_buf_response(q->fuse_fd, req_id, out_hdr, buf, ret); } else { fuse_write_response(q->fuse_fd, req_id, out_hdr, ret < 0 ? ret : 0, ret < 0 ?
0 : ret); } +} + +static void coroutine_fn +fuse_co_process_request(FuseQueue *q, void *spillover_buf) +{ + FuseExport *exp = q->exp; + uint32_t opcode; + uint64_t req_id; + + /* + * Return buffer. Must be large enough to hold all return headers, but does + * not include space for data returned by read requests. + */ + char out_buf[sizeof(struct fuse_out_header) + + MAX_CONST(sizeof(struct fuse_init_out), + MAX_CONST(sizeof(struct fuse_open_out), + MAX_CONST(sizeof(struct fuse_attr_out), + MAX_CONST(sizeof(struct fuse_write_out), + sizeof(struct fuse_lseek_out)))))] = {0}; + + /* Limit scope to ensure pointer is no longer used after yielding */ + { + const struct fuse_in_header *in_hdr = + (const struct fuse_in_header *)q->request_buf; + + opcode = in_hdr->opcode; + req_id = in_hdr->unique; + } + + fuse_co_process_request_common(exp, opcode, req_id, q->request_buf, + spillover_buf, out_buf, q->fuse_fd, send_response_legacy, q); +} + +#ifdef CONFIG_LINUX_IO_URING +static void fuse_uring_prep_sqe_commit(struct io_uring_sqe *sqe, void *opaque) +{ + FuseRingEnt *ent = opaque; + struct fuse_uring_cmd_req *req = (void *)&sqe->cmd[0]; + + fuse_uring_sqe_prepare(sqe, ent->q, FUSE_IO_URING_CMD_COMMIT_AND_FETCH); + fuse_uring_sqe_set_req_data(req, ent->q->qid, + ent->req_commit_id); +} + +static void +fuse_uring_send_response(FuseRingEnt *ent, uint32_t req_id, ssize_t ret, + const void *out_data_buffer) +{ + FuseExport *exp = ent->q->exp; + + struct fuse_uring_req_header *rrh = &ent->req_header; + struct fuse_out_header *out_header = (struct fuse_out_header *)&rrh->in_out; + struct fuse_uring_ent_in_out *ent_in_out = + (struct fuse_uring_ent_in_out *)&rrh->ring_ent_in_out; + + /* FUSE_READ */ + if (out_data_buffer && ret > 0) { + memcpy(ent->op_payload, out_data_buffer, ret); + } + + out_header->error = ret < 0 ? ret : 0; + out_header->unique = req_id; + /* out_header->len = ret > 0 ? ret : 0; */ + ent_in_out->payload_sz = ret > 0 ?
ret : 0; + qemu_vfree(spillover_buf); + aio_add_sqe(fuse_uring_prep_sqe_commit, ent, + &ent->fuse_cqe_handler); +} + +/* Helper to send response for uring */ +static void send_response_uring(void *opaque, uint32_t req_id, ssize_t ret, + const void *out_data_buffer, void *payload) +{ + FuseRingEnt *ent = (FuseRingEnt *)opaque; + + fuse_uring_send_response(ent, req_id, ret, out_data_buffer); +} + +static void coroutine_fn fuse_uring_co_process_request(FuseRingEnt *ent) +{ + FuseQueue *q = ent->q; + FuseExport *exp = q->exp; + struct fuse_uring_req_header *rrh = &ent->req_header; + struct fuse_uring_ent_in_out *ent_in_out = + (struct fuse_uring_ent_in_out *)&rrh->ring_ent_in_out; + struct fuse_in_header *in_hdr = + (struct fuse_in_header *)&rrh->in_out; + uint32_t opcode = in_hdr->opcode; + uint64_t req_id = in_hdr->unique; + ent->req_commit_id = ent_in_out->commit_id; + + if (unlikely(ent->req_commit_id == 0)) { + error_report("If this happens kernel will not find the response - " + "it will be stuck forever - better to abort immediately."); + fuse_export_halt(exp); + return; + } + + fuse_co_process_request_common(exp, opcode, req_id, &rrh->op_in, + NULL, ent->op_payload, -1, send_response_uring, ent); } +#endif const BlockExportDriver blk_exp_fuse = { .type = BLOCK_EXPORT_TYPE_FUSE,
--
2.45.2

From nobody Sat Nov 15 05:35:28 2025
From: Zhi Song
To: qemu-block@nongnu.org
Cc: qemu-devel@nongnu.org, armbru@redhat.com, bernd@bsbernd.com, fam@euphon.net, hibriansong@gmail.com, hreitz@redhat.com, kwolf@redhat.com, stefanha@redhat.com
Subject: [PATCH 3/3] fuse: Safe termination for FUSE-uring
Date: Thu, 14 Aug 2025 23:46:19 -0400
Message-ID: <20250815034619.51980-4-hizhisong@gmail.com>
In-Reply-To: <20250815034619.51980-1-hizhisong@gmail.com>
References: <20250815034619.51980-1-hizhisong@gmail.com>

From: Brian Song

When the user sends a termination signal, qemu-storage-daemon stops the
export, exits the main loop (main_loop_wait), and begins cleaning up the
associated resources. At this point, some SQEs submitted via
FUSE_IO_URING_CMD_COMMIT_AND_FETCH may still be pending in the kernel,
waiting for incoming FUSE requests. Their completions can still trigger
CQE handlers in user space.

Currently, there is no way to manually cancel these pending SQEs in the
kernel. As a result, after export termination, the related data
structures might be freed before the corresponding CQEs arrive, so a CQE
handler can be invoked after its data has been freed, which may lead to
a segfault.

As a workaround, when submitting an SQE to the kernel, we increment the
block export reference count (blk_exp_ref) so that the export and its
CQE handler are not freed during export termination. Once the CQE is
received, we decrement the reference count (blk_exp_unref).
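The reference-counting workaround described above can be illustrated in
isolation. The following is only a minimal sketch under stated
assumptions: Export, export_new(), export_ref(), export_unref(),
submit_entry() and on_completion() are hypothetical stand-ins invented
for this illustration, not QEMU's BlockExport, blk_exp_ref() or
blk_exp_unref(). It only shows how one reference per pending submission
keeps the object alive until the last completion has been handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reference-counted export (not QEMU's BlockExport). */
typedef struct Export {
    int refcount;      /* owner's reference plus one per pending entry */
    bool *freed_flag;  /* set to true when the export is really freed */
} Export;

static Export *export_new(bool *freed_flag)
{
    Export *exp = calloc(1, sizeof(*exp));

    assert(exp);
    exp->refcount = 1;            /* the owner's reference */
    exp->freed_flag = freed_flag;
    *freed_flag = false;
    return exp;
}

static void export_ref(Export *exp)
{
    exp->refcount++;
}

static void export_unref(Export *exp)
{
    assert(exp->refcount > 0);
    if (--exp->refcount == 0) {
        *exp->freed_flag = true;  /* record the free for the caller */
        free(exp);
    }
}

/* Submitting an entry pins the export until its completion arrives. */
static void submit_entry(Export *exp)
{
    export_ref(exp);
}

/* The completion handler drops the reference taken at submission. */
static void on_completion(Export *exp)
{
    export_unref(exp);
}
```

With two entries pending, dropping the owner's reference does not free
the object; it is freed only once both completions have been handled.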
Suggested-by: Kevin Wolf
Suggested-by: Stefan Hajnoczi
Signed-off-by: Brian Song
---
block/export/fuse.c | 52 +++++++++++++++++++++++++++++++++++----------
1 file changed, 41 insertions(+), 11 deletions(-)

diff --git a/block/export/fuse.c b/block/export/fuse.c index 7540f8f5a3..ddd83c50e2 100644 --- a/block/export/fuse.c +++ b/block/export/fuse.c @@ -298,6 +298,8 @@ static void coroutine_fn co_fuse_uring_queue_handle_cqes(void *opaque) /* Going to process requests */ fuse_inc_in_flight(exp); + /* A ring entry returned */ + blk_exp_unref(&exp->common); fuse_uring_co_process_request(ent); @@ -323,6 +325,9 @@ static void fuse_uring_cqe_handler(CqeHandler *cqe_handler) err != -ENOTCONN) { fuse_export_halt(exp); } + + /* A ring entry returned */ + blk_exp_unref(&exp->common); } else { co = qemu_coroutine_create(co_fuse_uring_queue_handle_cqes, ent); qemu_coroutine_enter(co); @@ -370,6 +375,8 @@ static void fuse_uring_submit_register(void *opaque) FuseQueue *q = opaque; FuseExport *exp = q->exp; + /* Commit and fetch a ring entry */ + blk_exp_ref(&exp->common); aio_add_sqe(fuse_uring_prep_sqe_register, q, &(q->ent.fuse_cqe_handler)); } @@ -762,6 +769,17 @@ static void read_from_fuse_fd(void *opaque) qemu_coroutine_enter(co); } +#ifdef CONFIG_LINUX_IO_URING +static void fuse_export_delete_uring(FuseExport *exp) +{ + exp->is_uring = false; + + for (size_t qid = 0; qid < exp->num_queues; qid++) { + g_free(exp->queues[qid].ent.op_payload); + } +} +#endif + static void fuse_export_shutdown(BlockExport *blk_exp) { FuseExport *exp = container_of(blk_exp, FuseExport, common); @@ -777,11 +795,6 @@ static void fuse_export_shutdown(BlockExport *blk_exp) */ g_hash_table_remove(exports, exp->mountpoint); } -} - -static void fuse_export_delete(BlockExport *blk_exp) -{ - FuseExport *exp = container_of(blk_exp, FuseExport, common); for (int i = 0; i < exp->num_queues; i++) { FuseQueue *q = &exp->queues[i]; @@ -790,11 +803,7 @@ static void
fuse_export_delete(BlockExport *blk_exp) if (i > 0 && q->fuse_fd >= 0) { close(q->fuse_fd); } - if (q->spillover_buf) { - qemu_vfree(q->spillover_buf); - } } - g_free(exp->queues); if (exp->fuse_session) { if (exp->mounted) { @@ -803,8 +812,29 @@ static void fuse_export_delete(BlockExport *blk_exp) fuse_session_destroy(exp->fuse_session); } +} + +static void fuse_export_delete(BlockExport *blk_exp) +{ + FuseExport *exp = container_of(blk_exp, FuseExport, common); + + for (int i = 0; i < exp->num_queues; i++) { + FuseQueue *q = &exp->queues[i]; + + if (q->spillover_buf) { + qemu_vfree(q->spillover_buf); + } + } g_free(exp->mountpoint); + +#ifdef CONFIG_LINUX_IO_URING + if (exp->is_uring) { + fuse_export_delete_uring(exp); + } +#endif + + g_free(exp->queues); } /** @@ -1755,8 +1785,8 @@ fuse_uring_send_response(FuseRingEnt *ent, uint32_t req_id, ssize_t ret, /* out_header->len = ret > 0 ? ret : 0; */ ent_in_out->payload_sz = ret > 0 ? ret : 0; - - qemu_vfree(spillover_buf); + /* Commit and fetch a ring entry */ + blk_exp_ref(&exp->common); aio_add_sqe(fuse_uring_prep_sqe_commit, ent, &ent->fuse_cqe_handler); }
--
2.45.2