From: Stefan Hajnoczi
To: qemu-devel@nongnu.org
Cc: Stefan Hajnoczi, Kevin Wolf, eblake@redhat.com, Hanna Czenczek, qemu-block@nongnu.org, Paolo Bonzini, Fam Zheng, hibriansong@gmail.com
Subject: [PATCH v6 01/15] aio-posix: fix race between io_uring CQE and AioHandler deletion
Date: Mon, 3 Nov 2025 21:29:19 -0500
Message-ID: <20251104022933.618123-2-stefanha@redhat.com>
In-Reply-To: <20251104022933.618123-1-stefanha@redhat.com>
References: <20251104022933.618123-1-stefanha@redhat.com>

When an AioHandler is enqueued on ctx->submit_list for removal,
fill_sq_ring() submits an io_uring POLL_REMOVE operation to cancel the
in-flight POLL_ADD operation.

There is a race when another thread enqueues an AioHandler for deletion
on ctx->submit_list after the POLL_ADD CQE has already appeared. In that
case POLL_REMOVE is unnecessary.
The code already handled this, but forgot that the AioHandler itself is
still on ctx->submit_list when the POLL_ADD CQE is being processed. It is
unsafe to delete the AioHandler at that point (use-after-free).

Solve this problem by keeping the AioHandler alive and setting a flag so
that fill_sq_ring() deletes it the next time it runs.

Signed-off-by: Stefan Hajnoczi
Reviewed-by: Eric Blake
Reviewed-by: Kevin Wolf
---
 util/fdmon-io_uring.c | 33 ++++++++++++++++++++++++++-------
 1 file changed, 26 insertions(+), 7 deletions(-)

diff --git a/util/fdmon-io_uring.c b/util/fdmon-io_uring.c
index b0d68bdc44..ad89160f31 100644
--- a/util/fdmon-io_uring.c
+++ b/util/fdmon-io_uring.c
@@ -52,9 +52,10 @@ enum {
     FDMON_IO_URING_ENTRIES  = 128, /* sq/cq ring size */

     /* AioHandler::flags */
-    FDMON_IO_URING_PENDING  = (1 << 0),
-    FDMON_IO_URING_ADD      = (1 << 1),
-    FDMON_IO_URING_REMOVE   = (1 << 2),
+    FDMON_IO_URING_PENDING            = (1 << 0),
+    FDMON_IO_URING_ADD                = (1 << 1),
+    FDMON_IO_URING_REMOVE             = (1 << 2),
+    FDMON_IO_URING_DELETE_AIO_HANDLER = (1 << 3),
 };

 static inline int poll_events_from_pfd(int pfd_events)
@@ -218,6 +219,16 @@ static void fill_sq_ring(AioContext *ctx)
         if (flags & FDMON_IO_URING_REMOVE) {
             add_poll_remove_sqe(ctx, node);
         }
+        if (flags & FDMON_IO_URING_DELETE_AIO_HANDLER) {
+            /*
+             * process_cqe() sets this flag after ADD and REMOVE have been
+             * cleared. They cannot be set again, so they must be clear.
+             */
+            assert(!(flags & FDMON_IO_URING_ADD));
+            assert(!(flags & FDMON_IO_URING_REMOVE));
+
+            QLIST_INSERT_HEAD_RCU(&ctx->deleted_aio_handlers, node, node_deleted);
+        }
     }
 }

@@ -241,7 +252,12 @@ static bool process_cqe(AioContext *ctx,
      */
     flags = qatomic_fetch_and(&node->flags, ~FDMON_IO_URING_REMOVE);
     if (flags & FDMON_IO_URING_REMOVE) {
-        QLIST_INSERT_HEAD_RCU(&ctx->deleted_aio_handlers, node, node_deleted);
+        if (flags & FDMON_IO_URING_PENDING) {
+            /* Still on ctx->submit_list, defer deletion until fill_sq_ring() */
+            qatomic_or(&node->flags, FDMON_IO_URING_DELETE_AIO_HANDLER);
+        } else {
+            QLIST_INSERT_HEAD_RCU(&ctx->deleted_aio_handlers, node, node_deleted);
+        }
         return false;
     }

@@ -347,10 +363,13 @@ void fdmon_io_uring_destroy(AioContext *ctx)
             unsigned flags = qatomic_fetch_and(&node->flags,
                     ~(FDMON_IO_URING_PENDING |
                       FDMON_IO_URING_ADD |
-                      FDMON_IO_URING_REMOVE));
+                      FDMON_IO_URING_REMOVE |
+                      FDMON_IO_URING_DELETE_AIO_HANDLER));

-            if (flags & FDMON_IO_URING_REMOVE) {
-                QLIST_INSERT_HEAD_RCU(&ctx->deleted_aio_handlers, node, node_deleted);
+            if ((flags & FDMON_IO_URING_REMOVE) ||
+                (flags & FDMON_IO_URING_DELETE_AIO_HANDLER)) {
+                QLIST_INSERT_HEAD_RCU(&ctx->deleted_aio_handlers,
+                                      node, node_deleted);
             }

             QSLIST_REMOVE_HEAD_RCU(&ctx->submit_list, node_submitted);
-- 
2.51.1
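The flag protocol introduced by the patch above can be exercised in isolation. The following is a minimal single-threaded sketch, not QEMU code: C11 <stdatomic.h> operations stand in for qatomic_fetch_and()/qatomic_or(), and the Handler type, its `deleted` field (standing in for insertion on ctx->deleted_aio_handlers) and the model_* functions are hypothetical. Only the flag values come from the diff; SQE submission and the real list manipulation are omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Flag values as in the patch's AioHandler::flags enum */
enum {
    PENDING            = 1 << 0, /* handler is on the submit list */
    ADD                = 1 << 1,
    REMOVE             = 1 << 2,
    DELETE_AIO_HANDLER = 1 << 3,
};

/* Hypothetical stand-in for AioHandler */
typedef struct {
    atomic_uint flags;
    bool deleted; /* models insertion on ctx->deleted_aio_handlers */
} Handler;

/* Models process_cqe() receiving the POLL_ADD CQE of a handler being removed */
static bool model_process_cqe(Handler *h)
{
    unsigned flags = atomic_fetch_and(&h->flags, ~(unsigned)REMOVE);

    if (flags & REMOVE) {
        if (flags & PENDING) {
            /* Still on the submit list: defer deletion to fill_sq_ring() */
            atomic_fetch_or(&h->flags, (unsigned)DELETE_AIO_HANDLER);
        } else {
            h->deleted = true; /* not on the submit list, safe to delete now */
        }
        return false;
    }
    return true; /* ordinary CQE for a live handler */
}

/* Models fill_sq_ring() taking the handler off the submit list
 * (submission of ADD/REMOVE SQEs is not modeled) */
static void model_fill_sq_ring(Handler *h)
{
    unsigned flags = atomic_fetch_and(&h->flags, ~(unsigned)PENDING);

    if (flags & DELETE_AIO_HANDLER) {
        /* The CQE path only sets this after ADD and REMOVE were cleared */
        assert(!(flags & (ADD | REMOVE)));
        h->deleted = true; /* no longer reachable via the submit list */
    }
}
```

In the racy case the POLL_ADD completion arrives while PENDING is still set, so the handler is only marked; the later fill_sq_ring() pass sees DELETE_AIO_HANDLER, skips the now-unnecessary POLL_REMOVE because REMOVE is already clear, and deletes the handler once it is off the submit list.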

From: Stefan Hajnoczi
To: qemu-devel@nongnu.org
Cc: Stefan Hajnoczi, Kevin Wolf, eblake@redhat.com, Hanna Czenczek, qemu-block@nongnu.org, Paolo Bonzini, Fam Zheng, hibriansong@gmail.com
Subject: [PATCH v6 02/15] aio-posix: fix fdmon-io_uring.c timeout stack variable lifetime
Date: Mon, 3 Nov 2025 21:29:20 -0500
Message-ID: <20251104022933.618123-3-stefanha@redhat.com>
In-Reply-To: <20251104022933.618123-1-stefanha@redhat.com>
References: <20251104022933.618123-1-stefanha@redhat.com>

io_uring_prep_timeout() stashes a pointer to the timespec struct rather
than copying its fields. That means the struct must live until after the
SQE has been submitted by io_uring_enter(2). add_timeout_sqe() violates
this constraint because the SQE is not submitted within the function.

Inline add_timeout_sqe() into fdmon_io_uring_wait() so that the struct
lives at least as long as io_uring_enter(2).

This fixes random hangs (bogus timeout values) that occur when the kernel
loads undefined timespec values from userspace after the original struct
on the stack has been destroyed.
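The lifetime rule can be illustrated without io_uring. In this sketch, prep_timeout() is a hypothetical stand-in that, like io_uring_prep_timeout(), stores the address of the timespec rather than its fields, and submit() plays the role of io_uring_enter(2) reading it later; struct ts_model and struct sqe_model are likewise hypothetical. The fixed pattern keeps `ts` in the same frame that calls submit(). Moving `ts` into a helper that returns before submit(), as add_timeout_sqe() did, would leave `sqe.ts` dangling, which is undefined behavior and is therefore not shown.

```c
#include <assert.h>
#include <stdint.h>

#define NANOSECONDS_PER_SECOND 1000000000LL

/* Hypothetical stand-ins for struct __kernel_timespec and an SQE */
struct ts_model {
    int64_t tv_sec;
    long long tv_nsec;
};

struct sqe_model {
    const struct ts_model *ts; /* a pointer, not a copy */
};

/* Like io_uring_prep_timeout(): remembers only the address of ts */
static void prep_timeout(struct sqe_model *sqe, const struct ts_model *ts)
{
    sqe->ts = ts;
}

/* Plays the role of io_uring_enter(2): dereferences ts at submission time */
static int64_t submit(const struct sqe_model *sqe)
{
    return sqe->ts->tv_sec * NANOSECONDS_PER_SECOND + sqe->ts->tv_nsec;
}

/* Fixed pattern: ts is declared in the function that also submits */
static int64_t wait_with_timeout(int64_t timeout)
{
    struct ts_model ts;
    struct sqe_model sqe;

    ts = (struct ts_model){
        .tv_sec = timeout / NANOSECONDS_PER_SECOND,
        .tv_nsec = timeout % NANOSECONDS_PER_SECOND,
    };
    prep_timeout(&sqe, &ts);
    return submit(&sqe); /* ts is still alive here */
}
```

The same reasoning explains why the patch declares `ts` at the top of fdmon_io_uring_wait() instead of inside the `else if (timeout > 0)` block: the value must survive until io_uring_submit_and_wait() runs later in the function.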
Reported-by: Kevin Wolf
Signed-off-by: Stefan Hajnoczi
---
 util/fdmon-io_uring.c | 27 ++++++++++++---------------
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/util/fdmon-io_uring.c b/util/fdmon-io_uring.c
index ad89160f31..b64ce42513 100644
--- a/util/fdmon-io_uring.c
+++ b/util/fdmon-io_uring.c
@@ -188,20 +188,6 @@ static void add_poll_remove_sqe(AioContext *ctx, AioHandler *node)
     io_uring_sqe_set_data(sqe, NULL);
 }

-/* Add a timeout that self-cancels when another cqe becomes ready */
-static void add_timeout_sqe(AioContext *ctx, int64_t ns)
-{
-    struct io_uring_sqe *sqe;
-    struct __kernel_timespec ts = {
-        .tv_sec = ns / NANOSECONDS_PER_SECOND,
-        .tv_nsec = ns % NANOSECONDS_PER_SECOND,
-    };
-
-    sqe = get_sqe(ctx);
-    io_uring_prep_timeout(sqe, &ts, 1, 0);
-    io_uring_sqe_set_data(sqe, NULL);
-}
-
 /* Add sqes from ctx->submit_list for submission */
 static void fill_sq_ring(AioContext *ctx)
 {
@@ -291,13 +277,24 @@ static int process_cq_ring(AioContext *ctx, AioHandlerList *ready_list)
 static int fdmon_io_uring_wait(AioContext *ctx, AioHandlerList *ready_list,
                                int64_t timeout)
 {
+    struct __kernel_timespec ts;
     unsigned wait_nr = 1; /* block until at least one cqe is ready */
     int ret;

     if (timeout == 0) {
         wait_nr = 0; /* non-blocking */
     } else if (timeout > 0) {
-        add_timeout_sqe(ctx, timeout);
+        /* Add a timeout that self-cancels when another cqe becomes ready */
+        struct io_uring_sqe *sqe;
+
+        ts = (struct __kernel_timespec){
+            .tv_sec = timeout / NANOSECONDS_PER_SECOND,
+            .tv_nsec = timeout % NANOSECONDS_PER_SECOND,
+        };
+
+        sqe = get_sqe(ctx);
+        io_uring_prep_timeout(sqe, &ts, 1, 0);
+        io_uring_sqe_set_data(sqe, NULL);
     }

     fill_sq_ring(ctx);
-- 
2.51.1

From: Stefan Hajnoczi
To: qemu-devel@nongnu.org
Cc: Stefan Hajnoczi, Kevin Wolf, eblake@redhat.com, Hanna Czenczek, qemu-block@nongnu.org, Paolo Bonzini, Fam Zheng, hibriansong@gmail.com
Subject: [PATCH v6 03/15] aio-posix: fix spurious return from ->wait() due to signals
Date: Mon, 3 Nov 2025 21:29:21 -0500
Message-ID: <20251104022933.618123-4-stefanha@redhat.com>
In-Reply-To: <20251104022933.618123-1-stefanha@redhat.com>
References: <20251104022933.618123-1-stefanha@redhat.com>

io_uring_enter(2) returns -EINTR in only some of the cases where it is
interrupted by a signal. The while loop in fdmon_io_uring_wait() is
therefore incomplete and can lead to a spurious early return.

Also handle the case where a signal interrupts io_uring_enter(2) but the
syscall returns the number of SQEs submitted (which takes priority over
-EINTR).

This patch probably makes little difference for QEMU itself, but the test
suite relies on the exact pattern of aio_poll() return values, so it is
best to hide this quirk of the io_uring syscall interface.
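To see why the loop condition needs the extra io_uring_cq_ready() check, the sketch below drives the same do/while condition with a scripted, hypothetical replacement for io_uring_submit_and_wait() and io_uring_cq_ready() (FakeRing and the fake_* and wait_loop() functions are illustrative only). The script loosely mimics the strace output in this message: a call interrupted after submitting one SQE (returns 1, no CQE ready), two -EINTR returns, and finally a call that returns 0 with a CQE ready, the last part being an assumption of the model.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical scripted ring: each call returns the next scripted value */
typedef struct {
    const int *results;    /* return value of each submit-and-wait call */
    const unsigned *ready; /* CQEs ready after each call */
    int calls;             /* number of calls made so far */
} FakeRing;

/* Stand-in for io_uring_submit_and_wait() */
static int fake_submit_and_wait(FakeRing *r, unsigned wait_nr)
{
    (void)wait_nr; /* the script already encodes the outcome */
    return r->results[r->calls++];
}

/* Stand-in for io_uring_cq_ready() */
static unsigned fake_cq_ready(const FakeRing *r)
{
    return r->ready[r->calls - 1];
}

/* Same loop condition as the patched fdmon_io_uring_wait() */
static int wait_loop(FakeRing *r, unsigned wait_nr)
{
    int ret;

    do {
        ret = fake_submit_and_wait(r, wait_nr);
    } while (ret == -EINTR ||
             (ret >= 0 && wait_nr > fake_cq_ready(r)));
    return ret;
}
```

With only the old `ret == -EINTR` condition, the first scripted call (which returns 1) would end the loop before any CQE is ready, which is the spurious early return this patch addresses. In the non-blocking case (wait_nr == 0) the new condition never forces a retry for a non-negative return.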
Here is the strace of test-aio receiving 3 SIGCONT signals after this fix
has been applied. Notice how the io_uring_enter(2) return value is 1 the
first time because an SQE was submitted, but -EINTR the other times:

  eventfd2(0, EFD_CLOEXEC|EFD_NONBLOCK) = 9
  io_uring_enter(7, 1, 0, 0, NULL, 8) = 1
  clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffe38a46240) = 0
  io_uring_enter(7, 1, 1, IORING_ENTER_GETEVENTS, NULL, 8) = 1
  --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=596096, si_uid=1000} ---
  io_uring_enter(7, 0, 1, IORING_ENTER_GETEVENTS, NULL, 8) = -1 EINTR (Interrupted system call)
  --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=596096, si_uid=1000} ---
  io_uring_enter(7, 0, 1, IORING_ENTER_GETEVENTS, NULL, 8 <... io_uring_enter resumed>) = -1 EINTR (Interrupted system call)
  --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=596096, si_uid=1000} ---
  io_uring_enter(7, 0, 1, IORING_ENTER_GETEVENTS, NULL, 8 <... io_uring_enter resumed>) = 0

Reported-by: Kevin Wolf
Signed-off-by: Stefan Hajnoczi
---
 util/fdmon-io_uring.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/util/fdmon-io_uring.c b/util/fdmon-io_uring.c
index b64ce42513..3d8638b0e5 100644
--- a/util/fdmon-io_uring.c
+++ b/util/fdmon-io_uring.c
@@ -299,9 +299,16 @@ static int fdmon_io_uring_wait(AioContext *ctx, AioHandlerList *ready_list,

     fill_sq_ring(ctx);

+    /*
+     * Loop to handle signals in both cases:
+     * 1. If no SQEs were submitted, then -EINTR is returned.
+     * 2. If SQEs were submitted then the number of SQEs submitted is returned
+     *    rather than -EINTR.
+     */
     do {
         ret = io_uring_submit_and_wait(&ctx->fdmon_io_uring, wait_nr);
-    } while (ret == -EINTR);
+    } while (ret == -EINTR ||
+             (ret >= 0 && wait_nr > io_uring_cq_ready(&ctx->fdmon_io_uring)));

     assert(ret >= 0);

-- 
2.51.1

From: Stefan Hajnoczi
To: qemu-devel@nongnu.org
Cc: Stefan Hajnoczi, Kevin Wolf, eblake@redhat.com, Hanna Czenczek, qemu-block@nongnu.org, Paolo Bonzini, Fam Zheng, hibriansong@gmail.com, Chao Gao
Subject: [PATCH v6 04/15] aio-posix: keep polling enabled with fdmon-io_uring.c
Date: Mon, 3 Nov 2025 21:29:22 -0500
Message-ID: <20251104022933.618123-5-stefanha@redhat.com>
In-Reply-To: <20251104022933.618123-1-stefanha@redhat.com>
References: <20251104022933.618123-1-stefanha@redhat.com>

Commit 816a430c517e ("util/aio: Defer disabling poll mode as long as
possible") kept polling enabled when the event loop timeout is 0.
Since there is no timeout, the event loop continues immediately and the
overhead of disabling and re-enabling polling can be avoided.

fdmon-io_uring.c is unable to take advantage of this optimization
because its ->need_wait() function returns true whenever there are new
io_uring SQEs to submit:

  if (timeout || ctx->fdmon_ops->need_wait(ctx)) {
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Polling will therefore be disabled even when timeout == 0.

Extend the optimization to handle the case when need_wait() returns true
and timeout == 0.

Cc: Chao Gao
Signed-off-by: Stefan Hajnoczi
Reviewed-by: Eric Blake
Reviewed-by: Kevin Wolf
---
 util/aio-posix.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/util/aio-posix.c b/util/aio-posix.c
index 2e0a5dadc4..824fdc34cc 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -559,7 +559,14 @@ static bool run_poll_handlers(AioContext *ctx, AioHandlerList *ready_list,
         elapsed_time = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - start_time;
         max_ns = qemu_soonest_timeout(*timeout, max_ns);
         assert(!(max_ns && progress));
-    } while (elapsed_time < max_ns && !ctx->fdmon_ops->need_wait(ctx));
+
+        if (ctx->fdmon_ops->need_wait(ctx)) {
+            if (fdmon_supports_polling(ctx)) {
+                *timeout = 0; /* stay in polling mode */
+            }
+            break;
+        }
+    } while (elapsed_time < max_ns);

     if (remove_idle_poll_handlers(ctx, ready_list,
                                   start_time + elapsed_time)) {
@@ -722,7 +729,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
      * up IO threads when some work becomes pending. It is essential to
      * avoid hangs or unnecessary latency.
      */
-    if (poll_set_started(ctx, &ready_list, false)) {
+    if (timeout && poll_set_started(ctx, &ready_list, false)) {
         timeout = 0;
         progress = true;
     }
-- 
2.51.1
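The new exit path of the run_poll_handlers() loop in the patch above can be modeled on its own. This is a simplified sketch, not QEMU code: PollModel and poll_loop() are hypothetical, the need_wait and supports_polling fields stand in for ctx->fdmon_ops->need_wait(ctx) and fdmon_supports_polling(ctx), elapsed time is simulated rather than read from QEMU_CLOCK_REALTIME, and qemu_soonest_timeout() and the poll handlers themselves are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the state run_poll_handlers() consults */
typedef struct {
    bool need_wait;           /* ctx->fdmon_ops->need_wait(ctx) */
    bool supports_polling;    /* fdmon_supports_polling(ctx) */
    int64_t ns_per_iteration; /* simulated time spent per polling pass */
} PollModel;

/*
 * Simplified loop with the patch's exit path. Returns the number of
 * polling passes; may set *timeout to 0 to stay in polling mode.
 */
static int poll_loop(const PollModel *m, int64_t max_ns, int64_t *timeout)
{
    int64_t elapsed_time = 0;
    int passes = 0;

    do {
        passes++; /* the real loop runs the poll handlers here */
        elapsed_time += m->ns_per_iteration;

        if (m->need_wait) {
            if (m->supports_polling) {
                *timeout = 0; /* stay in polling mode */
            }
            break;
        }
    } while (elapsed_time < max_ns);
    return passes;
}
```

Setting *timeout to 0 matters because of the second hunk: aio_poll() now calls poll_set_started(ctx, &ready_list, false) only when timeout is non-zero, so polling is not disabled before a non-blocking wait.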
(TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1vG6o5-00070e-FW for qemu-devel@nongnu.org; Mon, 03 Nov 2025 21:30:01 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1vG6o3-0003wQ-Lv for qemu-devel@nongnu.org; Mon, 03 Nov 2025 21:30:01 -0500 Received: from mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-456-IV-GfXgaMlGef0pb1SQURw-1; Mon, 03 Nov 2025 21:29:57 -0500 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id A86721800654; Tue, 4 Nov 2025 02:29:50 +0000 (UTC) Received: from localhost (unknown [10.2.16.6]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 0A9CC1800451; Tue, 4 Nov 2025 02:29:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1762223399; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=BFgAfbyRpaC5Mdwyn5aXksu3BTrexte5lv41Z9g7slo=; b=e2ILoVLzG157XmhWJajnjjKkzAjNk1fwDAg0fTGGPJJ2ukkTd0QZb/+R1yBGVzBmLb9uLZ +WNuvM2Iwniflbue0d6Gqn33ZzLrjENH44CepJ+oQBMFj2CsrvSTTbfOQ2NoNkgOD8C0v/ uRGuXfE34BUeujLXwQEC4hFeNUjXN1w= X-MC-Unique: IV-GfXgaMlGef0pb1SQURw-1 X-Mimecast-MFC-AGG-ID: IV-GfXgaMlGef0pb1SQURw_1762223390 From: Stefan Hajnoczi To: 
qemu-devel@nongnu.org Cc: Stefan Hajnoczi , Kevin Wolf , eblake@redhat.com, Hanna Czenczek , qemu-block@nongnu.org, Paolo Bonzini , Fam Zheng , hibriansong@gmail.com Subject: [PATCH v6 05/15] tests/unit: skip test-nested-aio-poll with io_uring Date: Mon, 3 Nov 2025 21:29:23 -0500 Message-ID: <20251104022933.618123-6-stefanha@redhat.com> In-Reply-To: <20251104022933.618123-1-stefanha@redhat.com> References: <20251104022933.618123-1-stefanha@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=170.10.133.124; envelope-from=stefanha@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, RCVD_IN_VALIDITY_CERTIFIED_BLOCKED=0.001, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @redhat.com) X-ZM-MESSAGEID: 1762223589891154100 Content-Type: text/plain; charset="utf-8" test-nested-aio-poll relies on internal details of how fdmon-poll.c handles AioContext polling. Skip it when other fdmon implementations are in use. 
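To see why need_wait() matters here, the polling loop can be reduced to a small self-contained model. This is a sketch rather than QEMU code: ModelFdmon, model_poll_loop() and the helper predicates are hypothetical, a fixed iteration budget stands in for the clock-based max_ns limit, and the two fields stand in for ctx->fdmon_ops->need_wait() and fdmon_supports_polling() as used in the run_poll_handlers() change earlier in this series.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the run_poll_handlers() loop. A fixed budget of
 * passes replaces QEMU's clock-based max_ns limit.
 */
typedef struct {
    bool (*need_wait)(void *opaque); /* stands in for fdmon_ops->need_wait() */
    bool supports_polling;           /* stands in for fdmon_supports_polling() */
    void *opaque;
} ModelFdmon;

/* Returns how many passes over the poll handlers were made. */
static int model_poll_loop(const ModelFdmon *fdmon, int budget, long *timeout)
{
    int passes = 0;

    do {
        passes++; /* one pass over the registered poll handlers */

        if (fdmon->need_wait(fdmon->opaque)) {
            if (fdmon->supports_polling) {
                *timeout = 0; /* stay in polling mode */
            }
            break; /* stop busy-polling so the fdmon can enter the kernel */
        }
    } while (passes < budget);

    return passes;
}

/* poll(2)-style predicate: nothing forces an early wait */
static bool model_never_need_wait(void *opaque)
{
    (void)opaque;
    return false;
}

/* io_uring-style predicate: true while SQEs are waiting to be submitted */
static bool model_sqes_pending(void *opaque)
{
    return *(const int *)opaque > 0;
}
```

With the poll(2)-style predicate the loop spends its whole budget polling. With pending SQEs, need_wait() returns true on the first pass, so the loop stops immediately and only *timeout is adjusted, which mirrors the point that AioContext polling is cut short whenever ->need_wait() returns true.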
The reason why fdmon-io_uring.c behaves differently from fdmon-poll.c is that its fdmon_ops->need_wait() function returns true when io_uring_enter(2) must be called (e.g. to submit pending SQEs). AioContext polling is skipped when ->need_wait() returns true, so the test case will never enter AioContext polling mode with fdmon-io_uring.c.

Restrict this test to fdmon-poll.c and drop the aio_get_g_source() call, which was only there for its aio_context_use_g_source() side effect and is no longer necessary.

Note that this test is only built on POSIX systems, so it is safe to include "util/aio-posix.h".

Signed-off-by: Stefan Hajnoczi
Reviewed-by: Eric Blake
---
v5:
- Explain how fdmon-io_uring.c differs from other fdmon implementations in commit message [Kevin]
- Move test-nested-aio-poll aio_get_g_source() removal into commit that touches test case [Kevin]
---
 tests/unit/test-nested-aio-poll.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/tests/unit/test-nested-aio-poll.c b/tests/unit/test-nested-aio-poll.c
index d8fd92c43b..d13ecccd8c 100644
--- a/tests/unit/test-nested-aio-poll.c
+++ b/tests/unit/test-nested-aio-poll.c
@@ -15,6 +15,7 @@
 #include "qemu/osdep.h"
 #include "block/aio.h"
 #include "qapi/error.h"
+#include "util/aio-posix.h"

 typedef struct {
     AioContext *ctx;
@@ -71,17 +72,17 @@ static void test(void)
         .ctx = aio_context_new(&error_abort),
     };

+    if (td.ctx->fdmon_ops != &fdmon_poll_ops) {
+        /* This test is tied to fdmon-poll.c */
+        g_test_skip("fdmon_poll_ops not in use");
+        return;
+    }
+
     qemu_set_current_aio_context(td.ctx);

     /* Enable polling */
     aio_context_set_poll_params(td.ctx, 1000000, 2, 2, &error_abort);

-    /*
-     * The GSource is unused but this has the side-effect of changing the fdmon
-     * that AioContext uses.
-     */
-    aio_get_g_source(td.ctx);
-
     /* Make the event notifier active (set) right away */
     event_notifier_init(&td.poll_notifier, 1);
     aio_set_event_notifier(td.ctx, &td.poll_notifier,
-- 
2.51.1

From nobody Fri Nov 14 16:55:24 2025
From: Stefan Hajnoczi
To: qemu-devel@nongnu.org
Cc: Stefan Hajnoczi, Kevin Wolf, eblake@redhat.com, Hanna Czenczek, qemu-block@nongnu.org, Paolo Bonzini, Fam Zheng, hibriansong@gmail.com
Subject: [PATCH v6 06/15] aio-posix: integrate fdmon into glib event loop
Date: Mon, 3 Nov 2025 21:29:24 -0500
Message-ID: <20251104022933.618123-7-stefanha@redhat.com>
In-Reply-To: <20251104022933.618123-1-stefanha@redhat.com>
References: <20251104022933.618123-1-stefanha@redhat.com>

AioContext's glib integration only supports ppoll(2) file descriptor monitoring. epoll(7) and io_uring(7) disable themselves and switch back to ppoll(2) when the glib event loop is used.
The main loop thread cannot use epoll(7) or io_uring(7) because it always uses the glib event loop.

Future QEMU features may require io_uring(7). One example is uring_cmd support in FUSE exports. Each feature could create its own io_uring(7) context and integrate it into the event loop, but this is inefficient due to extra syscalls. It would be more efficient to reuse the AioContext's existing fdmon-io_uring.c io_uring(7) context because fdmon-io_uring.c will already be active on systems where Linux io_uring is available.

In order to keep fdmon-io_uring.c's AioContext operational even when the glib event loop is used, extend FDMonOps with an API similar to GSourceFuncs so that file descriptor monitoring can integrate into the glib event loop.

A quick summary of the GSourceFuncs API:
- prepare() is called each event loop iteration before waiting for file descriptors and timers.
- check() is called after waiting to determine whether events are ready to be dispatched.
- dispatch() is called to process events.

More details here: https://docs.gtk.org/glib/struct.SourceFuncs.html

Move the ppoll(2)-specific code from aio-posix.c into fdmon-poll.c and also implement epoll(7)- and io_uring(7)-specific file descriptor monitoring code for glib event loops.

Note that it's still faster to use aio_poll() rather than the glib event loop, since glib waits for file descriptor activity with ppoll(2) and does not support adaptive polling. But at least epoll(7) and io_uring(7) now work in glib event loops.

Splitting this into multiple commits without temporarily breaking AioContext proved difficult, so this commit makes all the changes. The next commit will remove the aio_context_use_g_source() API because it is no longer needed.
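The prepare/check/dispatch contract can be illustrated without glib. The sketch below is a hypothetical miniature, not QEMU or glib code: MiniSource and MiniSourceOps are invented names, mini_loop_iteration() plays the role of one glib main loop iteration (prepare, wait with poll(2), check, then dispatch only if check succeeded), and a pipe stands in for the epoll or io_uring file descriptor that the patch registers with g_source_add_unix_fd().

```c
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

typedef struct MiniSource MiniSource;

/* Hypothetical counterpart of the gsource_prepare/check/dispatch hooks */
typedef struct {
    void (*prepare)(MiniSource *src);
    bool (*check)(MiniSource *src);
    void (*dispatch)(MiniSource *src);
} MiniSourceOps;

struct MiniSource {
    const MiniSourceOps *ops;
    struct pollfd pfd;    /* the single fd this source exposes to the loop */
    int bytes_dispatched; /* how much data dispatch() has consumed */
};

/* One loop iteration: prepare, wait, check, dispatch. */
static bool mini_loop_iteration(MiniSource *src, int timeout_ms)
{
    src->ops->prepare(src);

    src->pfd.revents = 0;
    if (poll(&src->pfd, 1, timeout_ms) < 0) {
        return false;
    }

    if (!src->ops->check(src)) {
        return false; /* nothing ready, dispatch() is not called */
    }
    src->ops->dispatch(src);
    return true;
}

static void pipe_prepare(MiniSource *src)
{
    /* Nothing to submit for a pipe; fdmon-io_uring.c submits SQEs here */
    (void)src;
}

static bool pipe_check(MiniSource *src)
{
    /* Plays the role of g_source_query_unix_fd(...) & G_IO_IN */
    return (src->pfd.revents & POLLIN) != 0;
}

static void pipe_dispatch(MiniSource *src)
{
    char buf[64];
    ssize_t n = read(src->pfd.fd, buf, sizeof(buf));

    if (n > 0) {
        src->bytes_dispatched += (int)n;
    }
}

static const MiniSourceOps pipe_source_ops = {
    .prepare = pipe_prepare,
    .check = pipe_check,
    .dispatch = pipe_dispatch,
};
```

In the patch the roles split the same way: fdmon-io_uring.c submits SQEs in gsource_prepare(), gsource_check() asks glib whether the ring fd became readable, and gsource_dispatch() processes the CQ ring into the ready list.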
Signed-off-by: Stefan Hajnoczi
Reviewed-by: Eric Blake
---
v5:
- Avoid g_source_add_poll() use-after-free in fdmon_poll_update() [Kevin]
- Avoid duplication in fdmon_epoll_gsource_dispatch(), use fdmon_epoll_wait() [Kevin]
- Drop unnecessary revents checks in fdmon_poll_gsource_dispatch() [Kevin]
---
 include/block/aio.h   | 36 ++++++++++++++++++
 util/aio-posix.h      |  5 +++
 tests/unit/test-aio.c |  7 +++-
 util/aio-posix.c      | 69 ++++++++---------------------------
 util/fdmon-epoll.c    | 34 ++++++++++++++---
 util/fdmon-io_uring.c | 44 +++++++++++++++++++++-
 util/fdmon-poll.c     | 85 ++++++++++++++++++++++++++++++++++++++++++-
 7 files changed, 218 insertions(+), 62 deletions(-)

diff --git a/include/block/aio.h b/include/block/aio.h
index 99ff48420b..39ed86d14d 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -106,6 +106,38 @@ typedef struct {
      * Returns: true if ->wait() should be called, false otherwise.
      */
     bool (*need_wait)(AioContext *ctx);
+
+    /*
+     * gsource_prepare:
+     * @ctx: the AioContext
+     *
+     * Prepare for the glib event loop to wait for events instead of the usual
+     * ->wait() call. See glib's GSourceFuncs->prepare().
+     */
+    void (*gsource_prepare)(AioContext *ctx);
+
+    /*
+     * gsource_check:
+     * @ctx: the AioContext
+     *
+     * Called by the glib event loop from glib's GSourceFuncs->check() after
+     * waiting for events.
+     *
+     * Returns: true when ready to be dispatched.
+     */
+    bool (*gsource_check)(AioContext *ctx);
+
+    /*
+     * gsource_dispatch:
+     * @ctx: the AioContext
+     * @ready_list: list for handlers that become ready
+     *
+     * Place ready AioHandlers on ready_list. Called as part of the glib event
+     * loop from glib's GSourceFuncs->dispatch().
+     *
+     * Called with list_lock incremented.
+     */
+    void (*gsource_dispatch)(AioContext *ctx, AioHandlerList *ready_list);
 } FDMonOps;

 /*
@@ -222,6 +254,7 @@ struct AioContext {
     /* State for file descriptor monitoring using Linux io_uring */
     struct io_uring fdmon_io_uring;
     AioHandlerSList submit_list;
+    gpointer io_uring_fd_tag;
 #endif

     /* TimerLists for calling timers - one per clock type. Has its own
@@ -254,6 +287,9 @@ struct AioContext {
     /* epoll(7) state used when built with CONFIG_EPOLL */
     int epollfd;

+    /* The GSource unix fd tag for epollfd */
+    gpointer epollfd_tag;
+
     const FDMonOps *fdmon_ops;
 };

diff --git a/util/aio-posix.h b/util/aio-posix.h
index 82a0201ea4..f9994ed79e 100644
--- a/util/aio-posix.h
+++ b/util/aio-posix.h
@@ -47,9 +47,14 @@ void aio_add_ready_handler(AioHandlerList *ready_list, AioHandler *node,

 extern const FDMonOps fdmon_poll_ops;

+/* Switch back to poll(2). list_lock must be held. */
+void fdmon_poll_downgrade(AioContext *ctx);
+
 #ifdef CONFIG_EPOLL_CREATE1
 bool fdmon_epoll_try_upgrade(AioContext *ctx, unsigned npfd);
 void fdmon_epoll_setup(AioContext *ctx);
+
+/* list_lock must be held */
 void fdmon_epoll_disable(AioContext *ctx);
 #else
 static inline bool fdmon_epoll_try_upgrade(AioContext *ctx, unsigned npfd)
diff --git a/tests/unit/test-aio.c b/tests/unit/test-aio.c
index e77d86be87..010d65b79a 100644
--- a/tests/unit/test-aio.c
+++ b/tests/unit/test-aio.c
@@ -527,7 +527,12 @@ static void test_source_bh_delete_from_cb(void)
     g_assert_cmpint(data1.n, ==, data1.max);
     g_assert(data1.bh == NULL);

-    assert(g_main_context_iteration(NULL, false));
+    /*
+     * There may be up to one more iteration due to the aio_notify
+     * EventNotifier.
+     */
+    g_main_context_iteration(NULL, false);
+
     assert(!g_main_context_iteration(NULL, false));
 }

diff --git a/util/aio-posix.c b/util/aio-posix.c
index 824fdc34cc..9de05ee7e8 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -70,15 +70,6 @@ static AioHandler *find_aio_handler(AioContext *ctx, int fd)

 static bool aio_remove_fd_handler(AioContext *ctx, AioHandler *node)
 {
-    /* If the GSource is in the process of being destroyed then
-     * g_source_remove_poll() causes an assertion failure. Skip
-     * removal in that case, because glib cleans up its state during
-     * destruction anyway.
-     */
-    if (!g_source_is_destroyed(&ctx->source)) {
-        g_source_remove_poll(&ctx->source, &node->pfd);
-    }
-
     node->pfd.revents = 0;
     node->poll_ready = false;

@@ -153,7 +144,6 @@ void aio_set_fd_handler(AioContext *ctx,
         } else {
             new_node->pfd = node->pfd;
         }
-        g_source_add_poll(&ctx->source, &new_node->pfd);

         new_node->pfd.events = (io_read ? G_IO_IN | G_IO_HUP | G_IO_ERR : 0);
         new_node->pfd.events |= (io_write ? G_IO_OUT | G_IO_ERR : 0);
@@ -267,37 +257,13 @@ bool aio_prepare(AioContext *ctx)
     poll_set_started(ctx, &ready_list, false);
     /* TODO what to do with this list? */

+    ctx->fdmon_ops->gsource_prepare(ctx);
     return false;
 }

 bool aio_pending(AioContext *ctx)
 {
-    AioHandler *node;
-    bool result = false;
-
-    /*
-     * We have to walk very carefully in case aio_set_fd_handler is
-     * called while we're walking.
-     */
-    qemu_lockcnt_inc(&ctx->list_lock);
-
-    QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
-        int revents;
-
-        /* TODO should this check poll ready? */
-        revents = node->pfd.revents & node->pfd.events;
-        if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR) && node->io_read) {
-            result = true;
-            break;
-        }
-        if (revents & (G_IO_OUT | G_IO_ERR) && node->io_write) {
-            result = true;
-            break;
-        }
-    }
-    qemu_lockcnt_dec(&ctx->list_lock);
-
-    return result;
+    return ctx->fdmon_ops->gsource_check(ctx);
 }

 static void aio_free_deleted_handlers(AioContext *ctx)
@@ -390,10 +356,6 @@ static bool aio_dispatch_handler(AioContext *ctx, AioHandler *node)
     return progress;
 }

-/*
- * If we have a list of ready handlers then this is more efficient than
- * scanning all handlers with aio_dispatch_handlers().
- */
 static bool aio_dispatch_ready_handlers(AioContext *ctx,
                                         AioHandlerList *ready_list,
                                         int64_t block_ns)
@@ -417,24 +379,18 @@ static bool aio_dispatch_ready_handlers(AioContext *ctx,
     return progress;
 }

-/* Slower than aio_dispatch_ready_handlers() but only used via glib */
-static bool aio_dispatch_handlers(AioContext *ctx)
-{
-    AioHandler *node, *tmp;
-    bool progress = false;
-
-    QLIST_FOREACH_SAFE_RCU(node, &ctx->aio_handlers, node, tmp) {
-        progress = aio_dispatch_handler(ctx, node) || progress;
-    }
-
-    return progress;
-}
-
 void aio_dispatch(AioContext *ctx)
 {
+    AioHandlerList ready_list = QLIST_HEAD_INITIALIZER(ready_list);
+
     qemu_lockcnt_inc(&ctx->list_lock);
     aio_bh_poll(ctx);
-    aio_dispatch_handlers(ctx);
+
+    ctx->fdmon_ops->gsource_dispatch(ctx, &ready_list);
+
+    /* block_ns is 0 because polling is disabled in the glib event loop */
+    aio_dispatch_ready_handlers(ctx, &ready_list, 0);
+
     aio_free_deleted_handlers(ctx);
     qemu_lockcnt_dec(&ctx->list_lock);

@@ -766,6 +722,7 @@ void aio_context_setup(AioContext *ctx)
 {
     ctx->fdmon_ops = &fdmon_poll_ops;
     ctx->epollfd = -1;
+    ctx->epollfd_tag = NULL;

     /* Use the fastest fd monitoring implementation if available */
     if (fdmon_io_uring_setup(ctx)) {
@@ -778,7 +735,11 @@ void aio_context_destroy(AioContext *ctx)
 {
     fdmon_io_uring_destroy(ctx);
+
+    qemu_lockcnt_lock(&ctx->list_lock);
     fdmon_epoll_disable(ctx);
+    qemu_lockcnt_unlock(&ctx->list_lock);
+
     aio_free_deleted_handlers(ctx);
 }

diff --git a/util/fdmon-epoll.c b/util/fdmon-epoll.c
index 9fb8800dde..61118e1ee6 100644
--- a/util/fdmon-epoll.c
+++ b/util/fdmon-epoll.c
@@ -19,8 +19,12 @@ void fdmon_epoll_disable(AioContext *ctx)
         ctx->epollfd = -1;
     }

-    /* Switch back */
-    ctx->fdmon_ops = &fdmon_poll_ops;
+    if (ctx->epollfd_tag) {
+        g_source_remove_unix_fd(&ctx->source, ctx->epollfd_tag);
+        ctx->epollfd_tag = NULL;
+    }
+
+    fdmon_poll_downgrade(ctx);
 }

 static inline int epoll_events_from_pfd(int pfd_events)
@@ -93,10 +97,29 @@ out:
     return ret;
 }

+static void fdmon_epoll_gsource_prepare(AioContext *ctx)
+{
+    /* Do nothing */
+}
+
+static bool fdmon_epoll_gsource_check(AioContext *ctx)
+{
+    return g_source_query_unix_fd(&ctx->source, ctx->epollfd_tag) & G_IO_IN;
+}
+
+static void fdmon_epoll_gsource_dispatch(AioContext *ctx,
+                                         AioHandlerList *ready_list)
+{
+    fdmon_epoll_wait(ctx, ready_list, 0);
+}
+
 static const FDMonOps fdmon_epoll_ops = {
     .update = fdmon_epoll_update,
     .wait = fdmon_epoll_wait,
     .need_wait = aio_poll_disabled,
+    .gsource_prepare = fdmon_epoll_gsource_prepare,
+    .gsource_check = fdmon_epoll_gsource_check,
+    .gsource_dispatch = fdmon_epoll_gsource_dispatch,
 };

 static bool fdmon_epoll_try_enable(AioContext *ctx)
@@ -118,6 +141,8 @@ static bool fdmon_epoll_try_enable(AioContext *ctx)
     }

     ctx->fdmon_ops = &fdmon_epoll_ops;
+    ctx->epollfd_tag = g_source_add_unix_fd(&ctx->source, ctx->epollfd,
+                                            G_IO_IN);
     return true;
 }

@@ -139,12 +164,11 @@ bool fdmon_epoll_try_upgrade(AioContext *ctx, unsigned npfd)
     }

     ok = fdmon_epoll_try_enable(ctx);
-
-    qemu_lockcnt_inc_and_unlock(&ctx->list_lock);
-
     if (!ok) {
         fdmon_epoll_disable(ctx);
     }
+
+    qemu_lockcnt_inc_and_unlock(&ctx->list_lock);
     return ok;
 }

diff --git a/util/fdmon-io_uring.c b/util/fdmon-io_uring.c
index 3d8638b0e5..0a5ec5ead6 100644
--- a/util/fdmon-io_uring.c
+++ b/util/fdmon-io_uring.c
@@ -262,6 +262,11 @@ static int process_cq_ring(AioContext *ctx, AioHandlerList *ready_list)
     unsigned num_ready = 0;
     unsigned head;

+    /* If the CQ overflowed then fetch CQEs with a syscall */
+    if (io_uring_cq_has_overflow(ring)) {
+        io_uring_get_events(ring);
+    }
+
     io_uring_for_each_cqe(ring, head, cqe) {
         if (process_cqe(ctx, ready_list, cqe)) {
             num_ready++;
@@ -274,6 +279,30 @@ static int process_cq_ring(AioContext *ctx, AioHandlerList *ready_list)
     return num_ready;
 }

+/* This is where SQEs are submitted in the glib event loop */
+static void fdmon_io_uring_gsource_prepare(AioContext *ctx)
+{
+    fill_sq_ring(ctx);
+    if (io_uring_sq_ready(&ctx->fdmon_io_uring)) {
+        while (io_uring_submit(&ctx->fdmon_io_uring) == -EINTR) {
+            /* Keep trying if syscall was interrupted */
+        }
+    }
+}
+
+static bool fdmon_io_uring_gsource_check(AioContext *ctx)
+{
+    gpointer tag = ctx->io_uring_fd_tag;
+    return g_source_query_unix_fd(&ctx->source, tag) & G_IO_IN;
+}
+
+/* This is where CQEs are processed in the glib event loop */
+static void fdmon_io_uring_gsource_dispatch(AioContext *ctx,
+                                            AioHandlerList *ready_list)
+{
+    process_cq_ring(ctx, ready_list);
+}
+
 static int fdmon_io_uring_wait(AioContext *ctx, AioHandlerList *ready_list,
                                int64_t timeout)
 {
@@ -339,12 +368,17 @@ static const FDMonOps fdmon_io_uring_ops = {
     .update = fdmon_io_uring_update,
     .wait = fdmon_io_uring_wait,
     .need_wait = fdmon_io_uring_need_wait,
+    .gsource_prepare = fdmon_io_uring_gsource_prepare,
+    .gsource_check = fdmon_io_uring_gsource_check,
+    .gsource_dispatch = fdmon_io_uring_gsource_dispatch,
 };

 bool fdmon_io_uring_setup(AioContext *ctx)
 {
     int ret;

+    ctx->io_uring_fd_tag = NULL;
+
     ret = io_uring_queue_init(FDMON_IO_URING_ENTRIES, &ctx->fdmon_io_uring, 0);
     if (ret != 0) {
         return false;
@@ -352,6 +386,9 @@ bool fdmon_io_uring_setup(AioContext *ctx)

     QSLIST_INIT(&ctx->submit_list);
     ctx->fdmon_ops = &fdmon_io_uring_ops;
+    ctx->io_uring_fd_tag = g_source_add_unix_fd(&ctx->source,
+            ctx->fdmon_io_uring.ring_fd, G_IO_IN);
+
     return true;
 }

@@ -379,6 +416,11 @@ void fdmon_io_uring_destroy(AioContext *ctx)
             QSLIST_REMOVE_HEAD_RCU(&ctx->submit_list, node_submitted);
         }

-        ctx->fdmon_ops = &fdmon_poll_ops;
+        g_source_remove_unix_fd(&ctx->source, ctx->io_uring_fd_tag);
+        ctx->io_uring_fd_tag = NULL;
+
+        qemu_lockcnt_lock(&ctx->list_lock);
+        fdmon_poll_downgrade(ctx);
+        qemu_lockcnt_unlock(&ctx->list_lock);
     }
 }
diff --git a/util/fdmon-poll.c b/util/fdmon-poll.c
index 17df917cf9..0ae755cc13 100644
--- a/util/fdmon-poll.c
+++ b/util/fdmon-poll.c
@@ -72,6 +72,11 @@ static int fdmon_poll_wait(AioContext *ctx, AioHandlerList *ready_list,

     /* epoll(7) is faster above a certain number of fds */
     if (fdmon_epoll_try_upgrade(ctx, npfd)) {
+        QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
+            if (!QLIST_IS_INSERTED(node, node_deleted) && node->pfd.events) {
+                g_source_remove_poll(&ctx->source, &node->pfd);
+            }
+        }
         npfd = 0; /* we won't need pollfds[], reset npfd */
         return ctx->fdmon_ops->wait(ctx, ready_list, timeout);
     }
@@ -97,11 +102,89 @@ static void fdmon_poll_update(AioContext *ctx,
                               AioHandler *old_node,
                               AioHandler *new_node)
 {
-    /* Do nothing, AioHandler already contains the state we'll need */
+    if (old_node) {
+        /*
+         * If the GSource is in the process of being destroyed then
+         * g_source_remove_poll() causes an assertion failure. Skip removal in
+         * that case, because glib cleans up its state during destruction
+         * anyway.
+         */
+        if (!g_source_is_destroyed(&ctx->source)) {
+            g_source_remove_poll(&ctx->source, &old_node->pfd);
+        }
+    }
+
+    if (new_node) {
+        g_source_add_poll(&ctx->source, &new_node->pfd);
+    }
+}
+
+static void fdmon_poll_gsource_prepare(AioContext *ctx)
+{
+    /* Do nothing */
+}
+
+static bool fdmon_poll_gsource_check(AioContext *ctx)
+{
+    AioHandler *node;
+    bool result = false;
+
+    /*
+     * We have to walk very carefully in case aio_set_fd_handler is
+     * called while we're walking.
+     */
+    qemu_lockcnt_inc(&ctx->list_lock);
+
+    QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
+        int revents = node->pfd.revents & node->pfd.events;
+
+        if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR) && node->io_read) {
+            result = true;
+            break;
+        }
+        if (revents & (G_IO_OUT | G_IO_ERR) && node->io_write) {
+            result = true;
+            break;
+        }
+    }
+
+    qemu_lockcnt_dec(&ctx->list_lock);
+
+    return result;
+}
+
+static void fdmon_poll_gsource_dispatch(AioContext *ctx,
+                                        AioHandlerList *ready_list)
+{
+    AioHandler *node;
+
+    QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
+        int revents = node->pfd.revents;
+
+        if (revents) {
+            aio_add_ready_handler(ready_list, node, revents);
+        }
+    }
 }

 const FDMonOps fdmon_poll_ops = {
     .update = fdmon_poll_update,
     .wait = fdmon_poll_wait,
     .need_wait = aio_poll_disabled,
+    .gsource_prepare = fdmon_poll_gsource_prepare,
+    .gsource_check = fdmon_poll_gsource_check,
+    .gsource_dispatch = fdmon_poll_gsource_dispatch,
 };
+
+void fdmon_poll_downgrade(AioContext *ctx)
+{
+    AioHandler *node;
+
+    ctx->fdmon_ops = &fdmon_poll_ops;
+
+    QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
+        if (!QLIST_IS_INSERTED(node, node_deleted) && node->pfd.events) {
+            g_source_add_poll(&ctx->source, &node->pfd);
+        }
+    }
+}
-- 
2.51.1

From nobody Fri Nov 14 16:55:24 2025
From: Stefan Hajnoczi
To: qemu-devel@nongnu.org
Cc: Stefan Hajnoczi, Kevin Wolf, eblake@redhat.com, Hanna Czenczek, qemu-block@nongnu.org, Paolo Bonzini, Fam Zheng, hibriansong@gmail.com
Subject: [PATCH v6 07/15] aio: remove aio_context_use_g_source()
Date: Mon, 3 Nov 2025 21:29:25 -0500
Message-ID: <20251104022933.618123-8-stefanha@redhat.com>
In-Reply-To: <20251104022933.618123-1-stefanha@redhat.com>
References: <20251104022933.618123-1-stefanha@redhat.com>

There is no need for aio_context_use_g_source() now that epoll(7) and io_uring(7) file descriptor monitoring works with the glib event loop. AioContext doesn't need to be notified that GSource is being used.

On hosts with io_uring support this now enables fdmon-io_uring.c by default, replacing fdmon-poll.c and fdmon-epoll.c. In other words, the event loop will use io_uring!
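The before/after effect on the main loop can be summarized as a small decision over three inputs. This is a hypothetical model for illustration only, not QEMU code: the enum, model_fdmon_kind() and its parameters are invented names, and the epoll(7) upgrade that fdmon-poll.c may perform is deliberately left out.

```c
#include <stdbool.h>

typedef enum {
    MODEL_FDMON_POLL,     /* fdmon-poll.c */
    MODEL_FDMON_IO_URING, /* fdmon-io_uring.c */
} ModelFdmonKind;

/*
 * io_uring_ok:      io_uring setup succeeded on this host
 * gsource_used:     aio_get_g_source() was called, as for the main loop's
 *                   AioContext, which always runs under the glib event loop
 * old_g_source_api: model the tree before this patch, where
 *                   aio_get_g_source() called aio_context_use_g_source()
 */
static ModelFdmonKind model_fdmon_kind(bool io_uring_ok, bool gsource_used,
                                       bool old_g_source_api)
{
    /* Use the fastest implementation available at setup time */
    ModelFdmonKind kind = io_uring_ok ? MODEL_FDMON_IO_URING : MODEL_FDMON_POLL;

    /* Before: handing out the GSource tore down fdmon-io_uring.c */
    if (old_g_source_api && gsource_used) {
        kind = MODEL_FDMON_POLL;
    }
    return kind;
}
```

The only input that changes with this patch is old_g_source_api: once it is false, an AioContext driven through its GSource keeps whatever implementation setup chose.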
Signed-off-by: Stefan Hajnoczi
Reviewed-by: Eric Blake
Reviewed-by: Kevin Wolf
---
v5:
- Mention in commit message that fdmon-io_uring.c is the new default [Kevin]
---
 include/block/aio.h |  3 ---
 util/aio-posix.c    | 12 ------------
 util/aio-win32.c    |  4 ----
 util/async.c        |  1 -
 4 files changed, 20 deletions(-)

diff --git a/include/block/aio.h b/include/block/aio.h
index 39ed86d14d..1657740a0e 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -728,9 +728,6 @@ void aio_context_setup(AioContext *ctx);
  */
 void aio_context_destroy(AioContext *ctx);
 
-/* Used internally, do not call outside AioContext code */
-void aio_context_use_g_source(AioContext *ctx);
-
 /**
  * aio_context_set_poll_params:
  * @ctx: the aio context
diff --git a/util/aio-posix.c b/util/aio-posix.c
index 9de05ee7e8..bebd9ce3a2 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -743,18 +743,6 @@ void aio_context_destroy(AioContext *ctx)
     aio_free_deleted_handlers(ctx);
 }
 
-void aio_context_use_g_source(AioContext *ctx)
-{
-    /*
-     * Disable io_uring when the glib main loop is used because it doesn't
-     * support mixed glib/aio_poll() usage. It relies on aio_poll() being
-     * called regularly so that changes to the monitored file descriptors are
-     * submitted, otherwise a list of pending fd handlers builds up.
-     */
-    fdmon_io_uring_destroy(ctx);
-    aio_free_deleted_handlers(ctx);
-}
-
 void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
                                  int64_t grow, int64_t shrink, Error **errp)
 {
diff --git a/util/aio-win32.c b/util/aio-win32.c
index c6fbce64c2..18cc9fb7a9 100644
--- a/util/aio-win32.c
+++ b/util/aio-win32.c
@@ -427,10 +427,6 @@ void aio_context_destroy(AioContext *ctx)
 {
 }
 
-void aio_context_use_g_source(AioContext *ctx)
-{
-}
-
 void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
                                  int64_t grow, int64_t shrink, Error **errp)
 {
diff --git a/util/async.c b/util/async.c
index a736d2cd0d..cb72ad3777 100644
--- a/util/async.c
+++ b/util/async.c
@@ -433,7 +433,6 @@ static GSourceFuncs aio_source_funcs = {
 
 GSource *aio_get_g_source(AioContext *ctx)
 {
-    aio_context_use_g_source(ctx);
     g_source_ref(&ctx->source);
     return &ctx->source;
 }
-- 
2.51.1

From nobody Fri Nov 14 16:55:24 2025
From: Stefan Hajnoczi
To: qemu-devel@nongnu.org
Cc: Stefan Hajnoczi, Kevin Wolf, eblake@redhat.com, Hanna Czenczek, qemu-block@nongnu.org, Paolo Bonzini, Fam Zheng, hibriansong@gmail.com
Subject: [PATCH v6 08/15] aio: free AioContext when aio_context_new() fails
Date: Mon, 3 Nov 2025 21:29:26 -0500
Message-ID: <20251104022933.618123-9-stefanha@redhat.com>
In-Reply-To: <20251104022933.618123-1-stefanha@redhat.com>
References: <20251104022933.618123-1-stefanha@redhat.com>

g_source_destroy() only removes the GSource from the GMainContext it's
attached to, if any. It does not free it. Use g_source_unref() instead
so that the AioContext (which embeds a GSource) is freed.

There is no need to call g_source_destroy() in aio_context_new()
because the GSource isn't attached to a GMainContext yet.

aio_ctx_finalize() expects everything to be set up already, so
introduce the new ctx->initialized boolean and do nothing when called
with !initialized. This also requires moving aio_context_setup() down
after event_notifier_init() since aio_ctx_finalize() won't release any
resources that aio_context_setup() acquired.

Signed-off-by: Stefan Hajnoczi
Reviewed-by: Eric Blake
---
v5:
- Added comments explaining how to clean up resources in error paths [Kevin]
v2:
- Fix spacing in aio_ctx_finalize() argument list [Eric]
---
 include/block/aio.h |  3 +++
 util/async.c        | 31 ++++++++++++++++++++++++++++---
 2 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/include/block/aio.h b/include/block/aio.h
index 1657740a0e..2760f308f5 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -291,6 +291,9 @@ struct AioContext {
     gpointer epollfd_tag;
 
     const FDMonOps *fdmon_ops;
+
+    /* Was aio_context_new() successful? */
+    bool initialized;
 };
 
 /**
diff --git a/util/async.c b/util/async.c
index cb72ad3777..7d06ff98f3 100644
--- a/util/async.c
+++ b/util/async.c
@@ -366,12 +366,16 @@ aio_ctx_dispatch(GSource *source,
 }
 
 static void
-aio_ctx_finalize(GSource     *source)
+aio_ctx_finalize(GSource *source)
 {
     AioContext *ctx = (AioContext *) source;
     QEMUBH *bh;
     unsigned flags;
 
+    if (!ctx->initialized) {
+        return;
+    }
+
     thread_pool_free_aio(ctx->thread_pool);
 
 #ifdef CONFIG_LINUX_AIO
@@ -579,16 +583,35 @@ AioContext *aio_context_new(Error **errp)
     int ret;
     AioContext *ctx;
 
+    /*
+     * ctx is freed by g_source_unref() (e.g. aio_context_unref()). ctx's
+     * resources are freed as follows:
+     *
+     * 1. By aio_ctx_finalize() after aio_context_new() has returned and set
+     *    ->initialized = true.
+     *
+     * 2. By manual cleanup code in this function's error paths before goto
+     *    fail.
+     *
+     * Be careful to free resources in both cases!
+     */
     ctx = (AioContext *) g_source_new(&aio_source_funcs, sizeof(AioContext));
     QSLIST_INIT(&ctx->bh_list);
     QSIMPLEQ_INIT(&ctx->bh_slice_list);
-    aio_context_setup(ctx);
 
     ret = event_notifier_init(&ctx->notifier, false);
     if (ret < 0) {
         error_setg_errno(errp, -ret, "Failed to initialize event notifier");
         goto fail;
     }
+
+    /*
+     * Resources cannot easily be freed manually after aio_context_setup(). If
+     * you add any new resources to AioContext, it's probably best to acquire
+     * them before aio_context_setup().
+     */
+    aio_context_setup(ctx);
+
     g_source_set_can_recurse(&ctx->source, true);
     qemu_lockcnt_init(&ctx->list_lock);
 
@@ -622,9 +645,11 @@ AioContext *aio_context_new(Error **errp)
 
     register_aiocontext(ctx);
 
+    ctx->initialized = true;
+
     return ctx;
 fail:
-    g_source_destroy(&ctx->source);
+    g_source_unref(&ctx->source);
     return NULL;
 }
 
-- 
2.51.1

From nobody Fri Nov 14 16:55:24 2025
From: Stefan Hajnoczi
To: qemu-devel@nongnu.org
Cc: Stefan Hajnoczi, Kevin Wolf, eblake@redhat.com, Hanna Czenczek, qemu-block@nongnu.org, Paolo Bonzini, Fam Zheng, hibriansong@gmail.com
Subject: [PATCH v6 09/15] aio: add errp argument to aio_context_setup()
Date: Mon, 3 Nov 2025 21:29:27 -0500
Message-ID: <20251104022933.618123-10-stefanha@redhat.com>
In-Reply-To: <20251104022933.618123-1-stefanha@redhat.com>
References: <20251104022933.618123-1-stefanha@redhat.com>

When aio_context_new() -> aio_context_setup() fails at startup it
doesn't really matter whether errors are returned to the caller or the
process terminates immediately. However, it is not acceptable to
terminate when hotplugging --object iothread at runtime.

Refactor aio_context_setup() so that errors can be propagated. The next
commit will set errp when fdmon_io_uring_setup() fails.

Suggested-by: Kevin Wolf
Signed-off-by: Stefan Hajnoczi
Reviewed-by: Eric Blake
Reviewed-by: Kevin Wolf
---
v5:
- Indicate error in return value from function with Error *errp arg [Kevin]
---
 include/block/aio.h | 5 ++++-
 util/aio-posix.c    | 5 +++--
 util/aio-win32.c    | 3 ++-
 util/async.c        | 6 +++++-
 4 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/include/block/aio.h b/include/block/aio.h
index 2760f308f5..9562733fa7 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -718,10 +718,13 @@ void qemu_set_current_aio_context(AioContext *ctx);
 /**
  * aio_context_setup:
  * @ctx: the aio context
+ * @errp: error pointer
  *
  * Initialize the aio context.
+ *
+ * Returns: true on success, false otherwise
  */
-void aio_context_setup(AioContext *ctx);
+bool aio_context_setup(AioContext *ctx, Error **errp);
 
 /**
  * aio_context_destroy:
diff --git a/util/aio-posix.c b/util/aio-posix.c
index bebd9ce3a2..9806a75c12 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -718,7 +718,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
     return progress;
 }
 
-void aio_context_setup(AioContext *ctx)
+bool aio_context_setup(AioContext *ctx, Error **errp)
 {
     ctx->fdmon_ops = &fdmon_poll_ops;
     ctx->epollfd = -1;
@@ -726,10 +726,11 @@ void aio_context_setup(AioContext *ctx)
 
     /* Use the fastest fd monitoring implementation if available */
     if (fdmon_io_uring_setup(ctx)) {
-        return;
+        return true;
     }
 
     fdmon_epoll_setup(ctx);
+    return true;
 }
 
 void aio_context_destroy(AioContext *ctx)
diff --git a/util/aio-win32.c b/util/aio-win32.c
index 18cc9fb7a9..6e6f699e4b 100644
--- a/util/aio-win32.c
+++ b/util/aio-win32.c
@@ -419,8 +419,9 @@ bool aio_poll(AioContext *ctx, bool blocking)
     return progress;
 }
 
-void aio_context_setup(AioContext *ctx)
+bool aio_context_setup(AioContext *ctx, Error **errp)
 {
+    return true;
 }
 
 void aio_context_destroy(AioContext *ctx)
diff --git a/util/async.c b/util/async.c
index 7d06ff98f3..00e46b99f9 100644
--- a/util/async.c
+++ b/util/async.c
@@ -580,6 +580,7 @@ static void co_schedule_bh_cb(void *opaque)
 
 AioContext *aio_context_new(Error **errp)
 {
+    ERRP_GUARD();
     int ret;
     AioContext *ctx;
 
@@ -610,7 +611,10 @@ AioContext *aio_context_new(Error **errp)
      * you add any new resources to AioContext, it's probably best to acquire
      * them before aio_context_setup().
      */
-    aio_context_setup(ctx);
+    if (!aio_context_setup(ctx, errp)) {
+        event_notifier_cleanup(&ctx->notifier);
+        goto fail;
+    }
 
     g_source_set_can_recurse(&ctx->source, true);
     qemu_lockcnt_init(&ctx->list_lock);
-- 
2.51.1

From nobody Fri Nov 14 16:55:24 2025
From: Stefan Hajnoczi
To: qemu-devel@nongnu.org
Cc: Stefan Hajnoczi, Kevin Wolf, eblake@redhat.com, Hanna Czenczek, qemu-block@nongnu.org, Paolo Bonzini, Fam Zheng, hibriansong@gmail.com
Subject: [PATCH v6 10/15] aio-posix: gracefully handle io_uring_queue_init() failure
Date: Mon, 3 Nov 2025 21:29:28 -0500
Message-ID: <20251104022933.618123-11-stefanha@redhat.com>
In-Reply-To: <20251104022933.618123-1-stefanha@redhat.com>
References: <20251104022933.618123-1-stefanha@redhat.com>

io_uring may not be available at runtime due to system policies (e.g.
the io_uring_disabled sysctl) or creation could fail due to file
descriptor resource limits.
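You can read the io_uring_disabled sysctl directly to see which policy applies on a given host. A minimal sketch in plain C follows. The helper name read_io_uring_disabled() is hypothetical, and the value meanings follow the kernel's sysctl documentation for Linux 6.6 and later. A value of 0 does not guarantee that io_uring_queue_init() will succeed, because resource limits or seccomp filtering can still make it fail. Attempting setup and handling the error therefore remains the reliable test.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: read the io_uring_disabled sysctl.
 *   0 - any process may create io_uring instances (the default)
 *   1 - creation fails with EPERM for unprivileged processes that are not
 *       members of the group configured by the io_uring_group sysctl
 *   2 - creation fails with EPERM for all processes
 * Returns -1 when the file is missing (older kernels) or unreadable.
 */
static int read_io_uring_disabled(void)
{
    FILE *f = fopen("/proc/sys/kernel/io_uring_disabled", "r");
    int value = -1;

    if (!f) {
        return -1;
    }
    if (fscanf(f, "%d", &value) != 1) {
        value = -1;
    }
    fclose(f);
    return value;
}
```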
Handle failure scenarios as follows:

If another AioContext already has io_uring, then fail AioContext
creation so that the aio_add_sqe() API is available uniformly from all
QEMU threads. Otherwise fall back to epoll(7) if io_uring is
unavailable.

Notes:
- Update the comment about selecting the fastest fdmon implementation.
  At this point it's not about speed anymore, it's about aio_add_sqe()
  API availability.
- Uppercase the error message when converting from error_report() to
  error_setg_errno() for consistency (but there are instances of
  lowercase in the codebase).
- It's easier to move the #ifdefs from aio-posix.h to aio-posix.c.

Signed-off-by: Stefan Hajnoczi
Reviewed-by: Eric Blake
Reviewed-by: Kevin Wolf
---
v5:
- Indicate error in return value from function with Error *errp arg [Kevin]
---
 util/aio-posix.h      | 12 ++----------
 util/aio-posix.c      | 28 +++++++++++++++++++++++++---
 util/fdmon-io_uring.c |  5 +++--
 3 files changed, 30 insertions(+), 15 deletions(-)

diff --git a/util/aio-posix.h b/util/aio-posix.h
index f9994ed79e..dfa1a51c0b 100644
--- a/util/aio-posix.h
+++ b/util/aio-posix.h
@@ -18,6 +18,7 @@
 #define AIO_POSIX_H
 
 #include "block/aio.h"
+#include "qapi/error.h"
 
 struct AioHandler {
     GPollFD pfd;
@@ -72,17 +73,8 @@ static inline void fdmon_epoll_disable(AioContext *ctx)
 #endif /* !CONFIG_EPOLL_CREATE1 */
 
 #ifdef CONFIG_LINUX_IO_URING
-bool fdmon_io_uring_setup(AioContext *ctx);
+bool fdmon_io_uring_setup(AioContext *ctx, Error **errp);
 void fdmon_io_uring_destroy(AioContext *ctx);
-#else
-static inline bool fdmon_io_uring_setup(AioContext *ctx)
-{
-    return false;
-}
-
-static inline void fdmon_io_uring_destroy(AioContext *ctx)
-{
-}
 #endif /* !CONFIG_LINUX_IO_URING */
 
 #endif /* AIO_POSIX_H */
diff --git a/util/aio-posix.c b/util/aio-posix.c
index 9806a75c12..c0285a26a3 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -16,6 +16,7 @@
 #include "qemu/osdep.h"
 #include "block/block.h"
 #include "block/thread-pool.h"
+#include "qapi/error.h"
 #include "qemu/main-loop.h"
 #include "qemu/lockcnt.h"
 #include "qemu/rcu.h"
@@ -724,10 +725,29 @@ bool aio_context_setup(AioContext *ctx, Error **errp)
     ctx->epollfd = -1;
     ctx->epollfd_tag = NULL;
 
-    /* Use the fastest fd monitoring implementation if available */
-    if (fdmon_io_uring_setup(ctx)) {
-        return true;
+#ifdef CONFIG_LINUX_IO_URING
+    {
+        static bool need_io_uring;
+        Error *local_err = NULL; /* ERRP_GUARD() doesn't handle error_abort */
+
+        /* io_uring takes precedence because it provides aio_add_sqe() support */
+        if (fdmon_io_uring_setup(ctx, &local_err)) {
+            /*
+             * If one AioContext gets io_uring, then all AioContexts need io_uring
+             * so that aio_add_sqe() support is available across all threads.
+             */
+            need_io_uring = true;
+            return true;
+        }
+        if (need_io_uring) {
+            error_propagate(errp, local_err);
+            return false;
+        }
+
+        /* Silently fall back on systems where io_uring is unavailable */
+        error_free(local_err);
     }
+#endif /* CONFIG_LINUX_IO_URING */
 
     fdmon_epoll_setup(ctx);
     return true;
@@ -735,7 +755,9 @@ bool aio_context_setup(AioContext *ctx, Error **errp)
 
 void aio_context_destroy(AioContext *ctx)
 {
+#ifdef CONFIG_LINUX_IO_URING
     fdmon_io_uring_destroy(ctx);
+#endif
 
     qemu_lockcnt_lock(&ctx->list_lock);
     fdmon_epoll_disable(ctx);
diff --git a/util/fdmon-io_uring.c b/util/fdmon-io_uring.c
index 0a5ec5ead6..9f25d6d6db 100644
--- a/util/fdmon-io_uring.c
+++ b/util/fdmon-io_uring.c
@@ -45,6 +45,7 @@
 
 #include "qemu/osdep.h"
 #include
+#include "qapi/error.h"
 #include "qemu/rcu_queue.h"
 #include "aio-posix.h"
 
@@ -373,7 +374,7 @@ static const FDMonOps fdmon_io_uring_ops = {
     .gsource_dispatch = fdmon_io_uring_gsource_dispatch,
 };
 
-bool fdmon_io_uring_setup(AioContext *ctx)
+bool fdmon_io_uring_setup(AioContext *ctx, Error **errp)
 {
     int ret;
 
@@ -381,6 +382,7 @@ bool fdmon_io_uring_setup(AioContext *ctx)
 
     ret = io_uring_queue_init(FDMON_IO_URING_ENTRIES, &ctx->fdmon_io_uring, 0);
     if (ret != 0) {
+        error_setg_errno(errp, -ret, "Failed to initialize io_uring");
         return false;
     }
 
@@ -388,7 +390,6 @@ bool fdmon_io_uring_setup(AioContext *ctx)
 
     ctx->fdmon_ops = &fdmon_io_uring_ops;
     ctx->io_uring_fd_tag = g_source_add_unix_fd(&ctx->source,
             ctx->fdmon_io_uring.ring_fd, G_IO_IN);
-
     return true;
 }
 
-- 
2.51.1

From nobody Fri Nov 14 16:55:24 2025
From: Stefan Hajnoczi
To: qemu-devel@nongnu.org
Cc: Stefan Hajnoczi, Kevin Wolf, eblake@redhat.com, Hanna Czenczek, qemu-block@nongnu.org, Paolo Bonzini, Fam Zheng, hibriansong@gmail.com
Subject: [PATCH v6 11/15] aio-posix: unindent fdmon_io_uring_destroy()
Date: Mon, 3 Nov 2025 21:29:29 -0500
Message-ID: <20251104022933.618123-12-stefanha@redhat.com>
In-Reply-To: <20251104022933.618123-1-stefanha@redhat.com>
References: <20251104022933.618123-1-stefanha@redhat.com>

Reduce the level of indentation to make further code changes easier to
read.
Signed-off-by: Stefan Hajnoczi
---
v5:
- Add patch to unindent fdmon_io_uring_destroy() [Kevin]
---
 util/fdmon-io_uring.c | 46 ++++++++++++++++++++++++----------------------
 1 file changed, 24 insertions(+), 22 deletions(-)

diff --git a/util/fdmon-io_uring.c b/util/fdmon-io_uring.c
index 9f25d6d6db..a06bbe2715 100644
--- a/util/fdmon-io_uring.c
+++ b/util/fdmon-io_uring.c
@@ -395,33 +395,35 @@ bool fdmon_io_uring_setup(AioContext *ctx, Error **errp)
 
 void fdmon_io_uring_destroy(AioContext *ctx)
 {
-    if (ctx->fdmon_ops == &fdmon_io_uring_ops) {
-        AioHandler *node;
+    AioHandler *node;
 
-        io_uring_queue_exit(&ctx->fdmon_io_uring);
+    if (ctx->fdmon_ops != &fdmon_io_uring_ops) {
+        return;
+    }
 
-        /* Move handlers due to be removed onto the deleted list */
-        while ((node = QSLIST_FIRST_RCU(&ctx->submit_list))) {
-            unsigned flags = qatomic_fetch_and(&node->flags,
-                    ~(FDMON_IO_URING_PENDING |
-                      FDMON_IO_URING_ADD |
-                      FDMON_IO_URING_REMOVE |
-                      FDMON_IO_URING_DELETE_AIO_HANDLER));
+    io_uring_queue_exit(&ctx->fdmon_io_uring);
 
-            if ((flags & FDMON_IO_URING_REMOVE) ||
-                (flags & FDMON_IO_URING_DELETE_AIO_HANDLER)) {
-                QLIST_INSERT_HEAD_RCU(&ctx->deleted_aio_handlers,
-                                      node, node_deleted);
-            }
+    /* Move handlers due to be removed onto the deleted list */
+    while ((node = QSLIST_FIRST_RCU(&ctx->submit_list))) {
+        unsigned flags = qatomic_fetch_and(&node->flags,
+                ~(FDMON_IO_URING_PENDING |
+                  FDMON_IO_URING_ADD |
+                  FDMON_IO_URING_REMOVE |
+                  FDMON_IO_URING_DELETE_AIO_HANDLER));
 
-            QSLIST_REMOVE_HEAD_RCU(&ctx->submit_list, node_submitted);
+        if ((flags & FDMON_IO_URING_REMOVE) ||
+            (flags & FDMON_IO_URING_DELETE_AIO_HANDLER)) {
+            QLIST_INSERT_HEAD_RCU(&ctx->deleted_aio_handlers,
+                                  node, node_deleted);
         }
 
-        g_source_remove_unix_fd(&ctx->source, ctx->io_uring_fd_tag);
-        ctx->io_uring_fd_tag = NULL;
-
-        qemu_lockcnt_lock(&ctx->list_lock);
-        fdmon_poll_downgrade(ctx);
-        qemu_lockcnt_unlock(&ctx->list_lock);
+        QSLIST_REMOVE_HEAD_RCU(&ctx->submit_list, node_submitted);
     }
+
+    g_source_remove_unix_fd(&ctx->source, ctx->io_uring_fd_tag);
+    ctx->io_uring_fd_tag = NULL;
+
+    qemu_lockcnt_lock(&ctx->list_lock);
+    fdmon_poll_downgrade(ctx);
+    qemu_lockcnt_unlock(&ctx->list_lock);
 }
-- 
2.51.1

From nobody Fri Nov 14 16:55:24 2025
From: Stefan Hajnoczi
To: qemu-devel@nongnu.org
Cc: Stefan Hajnoczi, Kevin Wolf, eblake@redhat.com, Hanna Czenczek, qemu-block@nongnu.org, Paolo Bonzini, Fam Zheng, hibriansong@gmail.com
Subject: [PATCH v6 12/15] aio-posix: add fdmon_ops->dispatch()
Date: Mon, 3 Nov 2025 21:29:30 -0500
Message-ID: <20251104022933.618123-13-stefanha@redhat.com>
In-Reply-To: <20251104022933.618123-1-stefanha@redhat.com>
References: <20251104022933.618123-1-stefanha@redhat.com>

The ppoll and epoll file descriptor monitoring implementations rely on
the event loop's generic file descriptor, timer, and BH dispatch code to
invoke user callbacks.
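The following standalone C model gives a rough picture of this split. It is a hypothetical sketch, not QEMU code. `Ctx`, `Ops`, `generic_dispatch()`, `custom_dispatch()` and `loop_iteration()` are invented names that stand in for AioContext, FDMonOps and one event loop iteration. The model shows two properties: the backend-specific hook may be NULL (as it is for ppoll and epoll), and the hook's return value is OR-ed into the iteration's progress flag. That is how aio_poll() treats `->dispatch()` in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for AioContext and FDMonOps (not QEMU's types). */
typedef struct Ctx Ctx;

typedef struct {
    const char *name;
    /* Optional backend-specific dispatch; NULL means generic code suffices */
    bool (*dispatch)(Ctx *ctx);
} Ops;

struct Ctx {
    const Ops *ops;
    int pending_custom;  /* work only the backend knows how to complete */
    int pending_generic; /* work the generic fd/BH/timer code handles */
};

/* Generic dispatch shared by every backend */
static bool generic_dispatch(Ctx *ctx)
{
    bool progress = ctx->pending_generic > 0;
    ctx->pending_generic = 0;
    return progress;
}

/* Extra dispatch for a backend with additional features (think: CQE callbacks) */
static bool custom_dispatch(Ctx *ctx)
{
    bool progress = ctx->pending_custom > 0;
    ctx->pending_custom = 0;
    return progress;
}

static const Ops poll_like_ops = { "poll-like", NULL };
static const Ops uring_like_ops = { "uring-like", custom_dispatch };

/* One iteration: call the optional hook, then the generic code */
static bool loop_iteration(Ctx *ctx)
{
    bool progress = false;

    assert(ctx->ops != NULL);
    if (ctx->ops->dispatch) {
        progress |= ctx->ops->dispatch(ctx);
    }
    progress |= generic_dispatch(ctx);
    return progress;
}
```

When the hook is NULL, only the generic path runs. Existing backends therefore need no changes, which is the reason for making `->dispatch()` optional.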
The io_uring file descriptor monitoring implementation will need
io_uring-specific dispatch logic for CQE handlers for custom SQEs.

Introduce a new FDMonOps ->dispatch() callback that allows file
descriptor monitoring implementations to invoke user callbacks. The next
patch will use this new callback.

Signed-off-by: Stefan Hajnoczi
---
v5:
- Add patch to introduce FDMonOps->dispatch() callback [Kevin]
---
 include/block/aio.h | 19 +++++++++++++++++++
 util/aio-posix.c    |  9 +++++++++
 2 files changed, 28 insertions(+)

diff --git a/include/block/aio.h b/include/block/aio.h
index 9562733fa7..b266daa58f 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -107,6 +107,25 @@ typedef struct {
      */
     bool (*need_wait)(AioContext *ctx);
 
+    /*
+     * dispatch:
+     * @ctx: the AioContext
+     *
+     * Dispatch any work that is specific to this file descriptor monitoring
+     * implementation. Usually the event loop's generic file descriptor
+     * monitoring, BH, and timer dispatching code is sufficient, but file
+     * descriptor monitoring implementations offering additional functionality
+     * may need to implement this function for custom behavior. Called at a
+     * point in the event loop when it is safe to invoke user-defined
+     * callbacks.
+     *
+     * This function is optional and may be NULL.
+     *
+     * Returns: true if progress was made (see aio_poll()'s return value),
+     * false otherwise.
+     */
+    bool (*dispatch)(AioContext *ctx);
+
     /*
      * gsource_prepare:
      * @ctx: the AioContext
diff --git a/util/aio-posix.c b/util/aio-posix.c
index c0285a26a3..6ff36b6e51 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -385,10 +385,15 @@ void aio_dispatch(AioContext *ctx)
     AioHandlerList ready_list = QLIST_HEAD_INITIALIZER(ready_list);
 
     qemu_lockcnt_inc(&ctx->list_lock);
+
     aio_bh_poll(ctx);
 
     ctx->fdmon_ops->gsource_dispatch(ctx, &ready_list);
 
+    if (ctx->fdmon_ops->dispatch) {
+        ctx->fdmon_ops->dispatch(ctx);
+    }
+
     /* block_ns is 0 because polling is disabled in the glib event loop */
     aio_dispatch_ready_handlers(ctx, &ready_list, 0);
 
@@ -707,6 +712,10 @@ bool aio_poll(AioContext *ctx, bool blocking)
         block_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - start;
     }
 
+    if (ctx->fdmon_ops->dispatch) {
+        progress |= ctx->fdmon_ops->dispatch(ctx);
+    }
+
     progress |= aio_bh_poll(ctx);
     progress |= aio_dispatch_ready_handlers(ctx, &ready_list, block_ns);
 
-- 
2.51.1

From nobody Fri Nov 14 16:55:24 2025
From: Stefan Hajnoczi
To: qemu-devel@nongnu.org
Cc: Stefan Hajnoczi, Kevin Wolf, eblake@redhat.com, Hanna Czenczek, qemu-block@nongnu.org, Paolo Bonzini, Fam Zheng, hibriansong@gmail.com
Subject: [PATCH v6 13/15] aio-posix: add aio_add_sqe() API for user-defined io_uring requests
Date: Mon, 3 Nov 2025 21:29:31 -0500
Message-ID: <20251104022933.618123-14-stefanha@redhat.com>
In-Reply-To: <20251104022933.618123-1-stefanha@redhat.com>
References: <20251104022933.618123-1-stefanha@redhat.com>

Introduce the aio_add_sqe() API for submitting io_uring requests in the
current AioContext. This allows other components in QEMU, like the block
layer, to take advantage of io_uring features without creating their own
io_uring context.

This API supports nested event loops just like file descriptor
monitoring and BHs do. This comes at a complexity cost: CQE callbacks
must be placed on a list so that nested event loops can invoke pending
CQE callbacks from parent event loops. If you're wondering why
CqeHandler exists instead of just a callback function pointer, this is
why.
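The CqeHandler design can be modelled in a few lines of standalone C. This is a hypothetical sketch, not QEMU code. `Handler`, `ReadyQueue`, `ready_push()`, `ready_dispatch()`, `ready_destroy()` and `MyRequest` are invented names, and a hand-rolled FIFO stands in for QSIMPLEQ. The sketch shows why a struct is needed rather than a bare function pointer. First, the handler carries an intrusive link, so a completion can wait on a ready list until a safe dispatch point. Second, the caller embeds the handler in its own request object and recovers that object with `container_of()`.

```c
#include <assert.h>
#include <stddef.h>

/* Recover the enclosing object from a pointer to one of its members */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical model of a completion handler (same shape as CqeHandler) */
typedef struct Handler Handler;
struct Handler {
    void (*cb)(Handler *h); /* invoked once the request has completed */
    Handler *next;          /* intrusive link for the ready queue */
    int res;                /* completion result, filled in before cb() */
};

/* FIFO of completed handlers (stand-in for QSIMPLEQ) */
typedef struct {
    Handler *head;
    Handler **tailp;
} ReadyQueue;

static void ready_init(ReadyQueue *q)
{
    q->head = NULL;
    q->tailp = &q->head;
}

/* Completion processing: record the result and defer the callback */
static void ready_push(ReadyQueue *q, Handler *h, int res)
{
    h->res = res;
    h->next = NULL;
    *q->tailp = h;
    q->tailp = &h->next;
}

/*
 * Dispatch point: unlink each handler before calling it, so a callback that
 * re-enters ready_dispatch() only sees handlers that are still pending.
 * Returns the number of callbacks invoked.
 */
static int ready_dispatch(ReadyQueue *q)
{
    int n = 0;

    while (q->head) {
        Handler *h = q->head;

        q->head = h->next;
        if (!q->head) {
            q->tailp = &q->head;
        }
        h->cb(h);
        n++;
    }
    return n;
}

/* Teardown check: nothing may be left pending */
static void ready_destroy(ReadyQueue *q)
{
    assert(q->head == NULL);
}

/* A caller's request with the handler embedded in it */
typedef struct {
    int id;
    int result;
    Handler handler;
} MyRequest;

static void my_request_done(Handler *h)
{
    MyRequest *req = container_of(h, MyRequest, handler);
    req->result = h->res;
}
```

Callbacks run only from `ready_dispatch()`, never while completions are being collected. A nested loop iteration can therefore pick up handlers queued by its parent, which is the nesting requirement described above.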
Signed-off-by: Stefan Hajnoczi
Reviewed-by: Eric Blake
---
v5:
- Replace cqe_handler_bh with FDMonOps->dispatch() [Kevin]
- Rename AioHandler->cqe_handler field to ->internal_cqe_handler [Kevin]
- Consolidate fdmon-io_uring.c trace-events changes into this commit
v2:
- Fix pre_sqe -> prep_sqe typo [Eric]
- Add #endif terminator comment [Eric]
---
 include/block/aio.h   |  83 +++++++++++++++++++++++++++++++-
 util/aio-posix.h      |   1 +
 util/aio-posix.c      |   9 ++++
 util/fdmon-io_uring.c | 109 ++++++++++++++++++++++++++++++++++++------
 util/trace-events     |   4 ++
 5 files changed, 190 insertions(+), 16 deletions(-)

diff --git a/include/block/aio.h b/include/block/aio.h
index b266daa58f..05d1bf4036 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -61,6 +61,27 @@ typedef struct LuringState LuringState;
 /* Is polling disabled? */
 bool aio_poll_disabled(AioContext *ctx);
 
+#ifdef CONFIG_LINUX_IO_URING
+/*
+ * Each io_uring request must have a unique CqeHandler that processes the cqe.
+ * The lifetime of a CqeHandler must be at least from aio_add_sqe() until
+ * ->cb() invocation.
+ */
+typedef struct CqeHandler CqeHandler;
+struct CqeHandler {
+    /* Called by the AioContext when the request has completed */
+    void (*cb)(CqeHandler *handler);
+
+    /* Used internally, do not access this */
+    QSIMPLEQ_ENTRY(CqeHandler) next;
+
+    /* This field is filled in before ->cb() is called */
+    struct io_uring_cqe cqe;
+};
+
+typedef QSIMPLEQ_HEAD(, CqeHandler) CqeHandlerSimpleQ;
+#endif /* CONFIG_LINUX_IO_URING */
+
 /* Callbacks for file descriptor monitoring implementations */
 typedef struct {
     /*
@@ -157,6 +178,27 @@ typedef struct {
      * Called with list_lock incremented.
      */
     void (*gsource_dispatch)(AioContext *ctx, AioHandlerList *ready_list);
+
+#ifdef CONFIG_LINUX_IO_URING
+    /**
+     * add_sqe: Add an io_uring sqe for submission.
+     * @prep_sqe: invoked with an sqe that should be prepared for submission
+     * @opaque: user-defined argument to @prep_sqe()
+     * @cqe_handler: the unique cqe handler associated with this request
+     *
+     * The caller's @prep_sqe() function is invoked to fill in the details of
+     * the sqe. Do not call io_uring_sqe_set_data() on this sqe.
+     *
+     * The kernel may see the sqe as soon as @prep_sqe() returns or it may take
+     * until the next event loop iteration.
+     *
+     * This function is called from the current AioContext and is not
+     * thread-safe.
+     */
+    void (*add_sqe)(AioContext *ctx,
+                    void (*prep_sqe)(struct io_uring_sqe *sqe, void *opaque),
+                    void *opaque, CqeHandler *cqe_handler);
+#endif /* CONFIG_LINUX_IO_URING */
 } FDMonOps;
 
 /*
@@ -274,7 +316,10 @@ struct AioContext {
     struct io_uring fdmon_io_uring;
     AioHandlerSList submit_list;
     gpointer io_uring_fd_tag;
-#endif
+
+    /* Pending callback state for cqe handlers */
+    CqeHandlerSimpleQ cqe_handler_ready_list;
+#endif /* CONFIG_LINUX_IO_URING */
 
     /* TimerLists for calling timers - one per clock type. Has its own
      * locking.
@@ -782,4 +827,40 @@ void aio_context_set_aio_params(AioContext *ctx, int64_t max_batch);
  */
 void aio_context_set_thread_pool_params(AioContext *ctx, int64_t min,
                                         int64_t max, Error **errp);
+
+#ifdef CONFIG_LINUX_IO_URING
+/**
+ * aio_has_io_uring: Return whether io_uring is available.
+ *
+ * io_uring is either available in all AioContexts or in none, so this only
+ * needs to be called once from within any thread's AioContext.
+ */
+static inline bool aio_has_io_uring(void)
+{
+    AioContext *ctx = qemu_get_current_aio_context();
+    return ctx->fdmon_ops->add_sqe;
+}
+
+/**
+ * aio_add_sqe: Add an io_uring sqe for submission.
+ * @prep_sqe: invoked with an sqe that should be prepared for submission
+ * @opaque: user-defined argument to @prep_sqe()
+ * @cqe_handler: the unique cqe handler associated with this request
+ *
+ * The caller's @prep_sqe() function is invoked to fill in the details of the
+ * sqe. Do not call io_uring_sqe_set_data() on this sqe.
+ *
+ * The sqe is submitted by the current AioContext. The kernel may see the sqe
+ * as soon as @prep_sqe() returns or it may take until the next event loop
+ * iteration.
+ *
+ * When the AioContext is destroyed, pending sqes are ignored and their
+ * CqeHandlers are not invoked.
+ *
+ * This function must be called only when aio_has_io_uring() returns true.
+ */
+void aio_add_sqe(void (*prep_sqe)(struct io_uring_sqe *sqe, void *opaque),
+                 void *opaque, CqeHandler *cqe_handler);
+#endif /* CONFIG_LINUX_IO_URING */
+
 #endif
diff --git a/util/aio-posix.h b/util/aio-posix.h
index dfa1a51c0b..babbfa8314 100644
--- a/util/aio-posix.h
+++ b/util/aio-posix.h
@@ -36,6 +36,7 @@ struct AioHandler {
 #ifdef CONFIG_LINUX_IO_URING
     QSLIST_ENTRY(AioHandler) node_submitted;
     unsigned flags; /* see fdmon-io_uring.c */
+    CqeHandler internal_cqe_handler; /* used for POLL_ADD/POLL_REMOVE */
 #endif
     int64_t poll_idle_timeout; /* when to stop userspace polling */
     bool poll_ready; /* has polling detected an event? */
diff --git a/util/aio-posix.c b/util/aio-posix.c
index 6ff36b6e51..e24b955fd9 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -806,3 +806,12 @@ void aio_context_set_aio_params(AioContext *ctx, int64_t max_batch)
 
     aio_notify(ctx);
 }
+
+#ifdef CONFIG_LINUX_IO_URING
+void aio_add_sqe(void (*prep_sqe)(struct io_uring_sqe *sqe, void *opaque),
+                 void *opaque, CqeHandler *cqe_handler)
+{
+    AioContext *ctx = qemu_get_current_aio_context();
+    ctx->fdmon_ops->add_sqe(ctx, prep_sqe, opaque, cqe_handler);
+}
+#endif /* CONFIG_LINUX_IO_URING */
diff --git a/util/fdmon-io_uring.c b/util/fdmon-io_uring.c
index a06bbe2715..4230bf33e3 100644
--- a/util/fdmon-io_uring.c
+++ b/util/fdmon-io_uring.c
@@ -46,8 +46,10 @@
 #include "qemu/osdep.h"
 #include
 #include "qapi/error.h"
+#include "qemu/defer-call.h"
 #include "qemu/rcu_queue.h"
 #include "aio-posix.h"
+#include "trace.h"
 
 enum {
     FDMON_IO_URING_ENTRIES = 128, /* sq/cq ring size */
@@ -76,8 +78,8 @@ static inline int pfd_events_from_poll(int poll_events)
 }
 
 /*
- * Returns an sqe for submitting a request. Only be called within
- * fdmon_io_uring_wait().
+ * Returns an sqe for submitting a request. Only called from the AioContext
+ * thread.
  */
 static struct io_uring_sqe *get_sqe(AioContext *ctx)
 {
@@ -168,23 +170,46 @@ static void fdmon_io_uring_update(AioContext *ctx,
     }
 }
 
+static void fdmon_io_uring_add_sqe(AioContext *ctx,
+        void (*prep_sqe)(struct io_uring_sqe *sqe, void *opaque),
+        void *opaque, CqeHandler *cqe_handler)
+{
+    struct io_uring_sqe *sqe = get_sqe(ctx);
+
+    prep_sqe(sqe, opaque);
+    io_uring_sqe_set_data(sqe, cqe_handler);
+
+    trace_fdmon_io_uring_add_sqe(ctx, opaque, sqe->opcode, sqe->fd, sqe->off,
+                                 cqe_handler);
+}
+
+static void fdmon_special_cqe_handler(CqeHandler *cqe_handler)
+{
+    /*
+     * This is an empty function that is never called. It is used as a function
+     * pointer to distinguish it from ordinary cqe handlers.
+ */ +} + static void add_poll_add_sqe(AioContext *ctx, AioHandler *node) { struct io_uring_sqe *sqe =3D get_sqe(ctx); int events =3D poll_events_from_pfd(node->pfd.events); =20 io_uring_prep_poll_add(sqe, node->pfd.fd, events); - io_uring_sqe_set_data(sqe, node); + node->internal_cqe_handler.cb =3D fdmon_special_cqe_handler; + io_uring_sqe_set_data(sqe, &node->internal_cqe_handler); } =20 static void add_poll_remove_sqe(AioContext *ctx, AioHandler *node) { struct io_uring_sqe *sqe =3D get_sqe(ctx); + CqeHandler *cqe_handler =3D &node->internal_cqe_handler; =20 #ifdef LIBURING_HAVE_DATA64 - io_uring_prep_poll_remove(sqe, (uintptr_t)node); + io_uring_prep_poll_remove(sqe, (uintptr_t)cqe_handler); #else - io_uring_prep_poll_remove(sqe, node); + io_uring_prep_poll_remove(sqe, cqe_handler); #endif io_uring_sqe_set_data(sqe, NULL); } @@ -219,19 +244,13 @@ static void fill_sq_ring(AioContext *ctx) } } =20 -/* Returns true if a handler became ready */ -static bool process_cqe(AioContext *ctx, - AioHandlerList *ready_list, - struct io_uring_cqe *cqe) +static bool process_cqe_aio_handler(AioContext *ctx, + AioHandlerList *ready_list, + AioHandler *node, + struct io_uring_cqe *cqe) { - AioHandler *node =3D io_uring_cqe_get_data(cqe); unsigned flags; =20 - /* poll_timeout and poll_remove have a zero user_data field */ - if (!node) { - return false; - } - /* * Deletion can only happen when IORING_OP_POLL_ADD completes. If we = race * with enqueue() here then we can safely clear the FDMON_IO_URING_REM= OVE @@ -255,6 +274,35 @@ static bool process_cqe(AioContext *ctx, return true; } =20 +/* Returns true if a handler became ready */ +static bool process_cqe(AioContext *ctx, + AioHandlerList *ready_list, + struct io_uring_cqe *cqe) +{ + CqeHandler *cqe_handler =3D io_uring_cqe_get_data(cqe); + + /* poll_timeout and poll_remove have a zero user_data field */ + if (!cqe_handler) { + return false; + } + + /* + * Special handling for AioHandler cqes. 
They need ready_list and have= a + * return value. + */ + if (cqe_handler->cb =3D=3D fdmon_special_cqe_handler) { + AioHandler *node =3D container_of(cqe_handler, AioHandler, + internal_cqe_handler); + return process_cqe_aio_handler(ctx, ready_list, node, cqe); + } + + cqe_handler->cqe =3D *cqe; + + /* Handlers are invoked later by fdmon_io_uring_dispatch() */ + QSIMPLEQ_INSERT_TAIL(&ctx->cqe_handler_ready_list, cqe_handler, next); + return false; +} + static int process_cq_ring(AioContext *ctx, AioHandlerList *ready_list) { struct io_uring *ring =3D &ctx->fdmon_io_uring; @@ -297,6 +345,32 @@ static bool fdmon_io_uring_gsource_check(AioContext *c= tx) return g_source_query_unix_fd(&ctx->source, tag) & G_IO_IN; } =20 +/* Dispatch CQE handlers that are ready */ +static bool fdmon_io_uring_dispatch(AioContext *ctx) +{ + CqeHandlerSimpleQ *ready_list =3D &ctx->cqe_handler_ready_list; + bool progress =3D false; + + /* Handlers may use defer_call() to coalesce frequent operations */ + defer_call_begin(); + + while (!QSIMPLEQ_EMPTY(ready_list)) { + CqeHandler *cqe_handler =3D QSIMPLEQ_FIRST(ready_list); + + QSIMPLEQ_REMOVE_HEAD(ready_list, next); + + trace_fdmon_io_uring_cqe_handler(ctx, cqe_handler, + cqe_handler->cqe.res); + cqe_handler->cb(cqe_handler); + progress =3D true; + } + + defer_call_end(); + + return progress; +} + + /* This is where CQEs are processed in the glib event loop */ static void fdmon_io_uring_gsource_dispatch(AioContext *ctx, AioHandlerList *ready_list) @@ -369,9 +443,11 @@ static const FDMonOps fdmon_io_uring_ops =3D { .update =3D fdmon_io_uring_update, .wait =3D fdmon_io_uring_wait, .need_wait =3D fdmon_io_uring_need_wait, + .dispatch =3D fdmon_io_uring_dispatch, .gsource_prepare =3D fdmon_io_uring_gsource_prepare, .gsource_check =3D fdmon_io_uring_gsource_check, .gsource_dispatch =3D fdmon_io_uring_gsource_dispatch, + .add_sqe =3D fdmon_io_uring_add_sqe, }; =20 bool fdmon_io_uring_setup(AioContext *ctx, Error **errp) @@ -387,6 +463,7 @@ bool 
fdmon_io_uring_setup(AioContext *ctx, Error **errp) } =20 QSLIST_INIT(&ctx->submit_list); + QSIMPLEQ_INIT(&ctx->cqe_handler_ready_list); ctx->fdmon_ops =3D &fdmon_io_uring_ops; ctx->io_uring_fd_tag =3D g_source_add_unix_fd(&ctx->source, ctx->fdmon_io_uring.ring_fd, G_IO_IN); @@ -423,6 +500,8 @@ void fdmon_io_uring_destroy(AioContext *ctx) g_source_remove_unix_fd(&ctx->source, ctx->io_uring_fd_tag); ctx->io_uring_fd_tag =3D NULL; =20 + assert(QSIMPLEQ_EMPTY(&ctx->cqe_handler_ready_list)); + qemu_lockcnt_lock(&ctx->list_lock); fdmon_poll_downgrade(ctx); qemu_lockcnt_unlock(&ctx->list_lock); diff --git a/util/trace-events b/util/trace-events index bd8f25fb59..540d662507 100644 --- a/util/trace-events +++ b/util/trace-events @@ -24,6 +24,10 @@ buffer_move_empty(const char *buf, size_t len, const cha= r *from) "%s: %zd bytes buffer_move(const char *buf, size_t len, const char *from) "%s: %zd bytes = from %s" buffer_free(const char *buf, size_t len) "%s: capacity %zd" =20 +# fdmon-io_uring.c +fdmon_io_uring_add_sqe(void *ctx, void *opaque, int opcode, int fd, uint64= _t off, void *cqe_handler) "ctx %p opaque %p opcode %d fd %d off %"PRId64" = cqe_handler %p" +fdmon_io_uring_cqe_handler(void *ctx, void *cqe_handler, int cqe_res) "ctx= %p cqe_handler %p cqe_res %d" + # filemonitor-inotify.c qemu_file_monitor_add_watch(void *mon, const char *dirpath, const char *fi= lename, void *cb, void *opaque, int64_t id) "File monitor %p add watch dir= =3D'%s' file=3D'%s' cb=3D%p opaque=3D%p id=3D%" PRId64 qemu_file_monitor_remove_watch(void *mon, const char *dirpath, int64_t id)= "File monitor %p remove watch dir=3D'%s' id=3D%" PRId64 --=20 2.51.1 From nobody Fri Nov 14 16:55:24 2025 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=quarantine dis=none) header.from=redhat.com 
From: Stefan Hajnoczi
To: qemu-devel@nongnu.org
Cc: Stefan Hajnoczi, Kevin Wolf, eblake@redhat.com, Hanna Czenczek, qemu-block@nongnu.org, Paolo Bonzini, Fam Zheng, hibriansong@gmail.com
Subject: [PATCH v6 14/15] block/io_uring: use aio_add_sqe()
Date: Mon, 3 Nov 2025 21:29:32 -0500
Message-ID: <20251104022933.618123-15-stefanha@redhat.com>
In-Reply-To: <20251104022933.618123-1-stefanha@redhat.com>
References: <20251104022933.618123-1-stefanha@redhat.com>

AioContext has its own io_uring instance for file descriptor monitoring. The disk I/O io_uring code was developed separately. Originally I thought the characteristics of file descriptor monitoring and disk I/O were too different, requiring separate io_uring instances.

Now it has become clear to me that it's feasible to share a single io_uring instance for file descriptor monitoring and disk I/O. We're not using io_uring's IOPOLL feature or anything else that would require a separate instance.

Unify block/io_uring.c and util/fdmon-io_uring.c using the new aio_add_sqe() API that allows user-defined io_uring sqe submission.
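As a rough illustration of the split that aio_add_sqe() introduces, here is a self-contained C simulation. It uses neither liburing nor QEMU. The names fake_sqe, fake_add_sqe(), fake_loop_iteration_end(), DemoRequest and demo_submit_batch() are hypothetical stand-ins, and the simplified CqeHandler below stores the result in `res` instead of the real `cqe.res`. The sketch shows three things:

- A prep callback fills in the sqe.
- Each request embeds a CqeHandler that the completion callback maps back to the request with container_of(), as luring_cqe_handler() does.
- Queued sqes are submitted together once per simulated event loop iteration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct io_uring_sqe: only the fields used here. */
struct fake_sqe {
    int fd;
    unsigned long long off;
    void *user_data;
};

/* Simplified, hypothetical CqeHandler: a callback plus the completion result. */
typedef struct CqeHandler CqeHandler;
struct CqeHandler {
    void (*cb)(CqeHandler *handler);
    int res;
};

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

typedef void PrepSqeFn(struct fake_sqe *sqe, void *opaque);

#define RING_SIZE 8
static struct fake_sqe sq_ring[RING_SIZE];
static int sq_pending;  /* sqes queued but not yet "submitted" */
static int enter_calls; /* simulated io_uring_enter(2) syscalls */

/* Loosely modelled on aio_add_sqe(): the caller only fills in the sqe. */
static void fake_add_sqe(PrepSqeFn *prep, void *opaque, CqeHandler *handler)
{
    assert(sq_pending < RING_SIZE);
    struct fake_sqe *sqe = &sq_ring[sq_pending++];
    prep(sqe, opaque);
    sqe->user_data = handler;
}

/* End of one event loop iteration: a single "syscall" submits every queued
 * sqe, then each request completes with res = 512 and its handler runs. */
static void fake_loop_iteration_end(void)
{
    int n = sq_pending;
    struct fake_sqe done[RING_SIZE];

    if (n == 0) {
        return;
    }
    for (int i = 0; i < n; i++) {
        done[i] = sq_ring[i];
    }
    sq_pending = 0;
    enter_calls++;
    for (int i = 0; i < n; i++) {
        CqeHandler *handler = done[i].user_data;
        handler->res = 512;
        handler->cb(handler);
    }
}

/* A request embeds its CqeHandler, as LuringRequest does in this patch. */
typedef struct {
    int fd;
    unsigned long long offset;
    int ret;
    CqeHandler cqe_handler;
} DemoRequest;

static void demo_prep_sqe(struct fake_sqe *sqe, void *opaque)
{
    DemoRequest *req = opaque;
    sqe->fd = req->fd;
    sqe->off = req->offset;
}

static void demo_cqe_handler(CqeHandler *handler)
{
    /* Recover the enclosing request from the embedded handler. */
    DemoRequest *req = container_of(handler, DemoRequest, cqe_handler);
    req->ret = handler->res;
}

/* Queue n requests (n <= RING_SIZE), end one loop iteration, and return
 * how many simulated syscalls that took. */
static int demo_submit_batch(DemoRequest *reqs, int n)
{
    int before = enter_calls;
    for (int i = 0; i < n; i++) {
        reqs[i].cqe_handler.cb = demo_cqe_handler;
        fake_add_sqe(demo_prep_sqe, &reqs[i], &reqs[i].cqe_handler);
    }
    fake_loop_iteration_end();
    return enter_calls - before;
}
```

In this sketch, queuing three requests with demo_submit_batch() costs one simulated io_uring_enter(2) call for the whole batch, and each request's ret is set by its own handler. In the patch, submission and completion happen inside fdmon-io_uring.c instead of a single helper.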
Now block/io_uring.c just needs to submit readv/writev/fsync and most of the io_uring-specific logic is handled by fdmon-io_uring.c.

There are two immediate advantages:
1. Fewer system calls. There is no need to monitor the disk I/O io_uring ring fd from the file descriptor monitoring io_uring instance. Disk I/O completions are now picked up directly. Also, sqes are accumulated in the sq ring until the end of the event loop iteration and there are fewer io_uring_enter(2) syscalls.
2. Less code duplication.

Note that error_setg() messages are not supposed to end with punctuation, so I removed a '.' for the non-io_uring build error message.

Signed-off-by: Stefan Hajnoczi
Reviewed-by: Eric Blake
---
include/block/aio.h | 7 -
include/block/raw-aio.h | 5 -
block/file-posix.c | 40 ++--
block/io_uring.c | 489 ++++++++++------------------------------
stubs/io_uring.c | 32 ---
util/async.c | 35 ---
block/trace-events | 12 +-
stubs/meson.build | 3 -
8 files changed, 130 insertions(+), 493 deletions(-)
delete mode 100644 stubs/io_uring.c

diff --git a/include/block/aio.h b/include/block/aio.h index 05d1bf4036..540bbc5d60 100644 --- a/include/block/aio.h +++ b/include/block/aio.h @@ -310,8 +310,6 @@ struct AioContext { struct LinuxAioState *linux_aio; #endif #ifdef CONFIG_LINUX_IO_URING - LuringState *linux_io_uring; - /* State for file descriptor monitoring using Linux io_uring */ struct io_uring fdmon_io_uring; AioHandlerSList submit_list; @@ -615,11 +613,6 @@ struct LinuxAioState *aio_setup_linux_aio(AioContext *ctx, Error **errp); /* Return the LinuxAioState bound to this AioContext */ struct LinuxAioState *aio_get_linux_aio(AioContext *ctx); -/* Setup the LuringState bound to this AioContext */ -LuringState *aio_setup_linux_io_uring(AioContext *ctx, Error **errp); - -/* Return the LuringState bound to this AioContext */ -LuringState *aio_get_linux_io_uring(AioContext *ctx); /** * aio_timer_new_with_attrs: * @ctx: the aio context
diff --git a/include/block/raw-aio.h
b/include/block/raw-aio.h index 6570244496..30e5fc9a9f 100644 --- a/include/block/raw-aio.h +++ b/include/block/raw-aio.h @@ -74,15 +74,10 @@ static inline bool laio_has_fua(void) #endif /* io_uring.c - Linux io_uring implementation */ #ifdef CONFIG_LINUX_IO_URING -LuringState *luring_init(Error **errp); -void luring_cleanup(LuringState *s); - /* luring_co_submit: submit I/O requests in the thread's current AioContext. */ int coroutine_fn luring_co_submit(BlockDriverState *bs, int fd, uint64_t offset, QEMUIOVector *qiov, int type, BdrvRequestFlags flags); -void luring_detach_aio_context(LuringState *s, AioContext *old_context); -void luring_attach_aio_context(LuringState *s, AioContext *new_context); bool luring_has_fua(void); #else static inline bool luring_has_fua(void)
diff --git a/block/file-posix.c b/block/file-posix.c index 8c738674ce..8b7c02d19a 100644 --- a/block/file-posix.c +++ b/block/file-posix.c @@ -755,14 +755,23 @@ static int raw_open_common(BlockDriverState *bs, QDict *options, } #endif /* !defined(CONFIG_LINUX_AIO) */ -#ifndef CONFIG_LINUX_IO_URING if (s->use_linux_io_uring) { +#ifdef CONFIG_LINUX_IO_URING + if (!aio_has_io_uring()) { + error_setg(errp, "aio=io_uring was specified, but is not " + "available (disabled via io_uring_disabled " + "sysctl or blocked by container runtime " + "seccomp policy?)"); + ret = -EINVAL; + goto fail; + } +#else error_setg(errp, "aio=io_uring was specified, but is not supported " - "in this build."); + "in this build"); ret = -EINVAL; goto fail; - } #endif /* !defined(CONFIG_LINUX_IO_URING) */ + } s->has_discard = true; s->has_write_zeroes = true; @@ -2522,27 +2531,6 @@ static bool bdrv_qiov_is_aligned(BlockDriverState *bs, QEMUIOVector *qiov) return true; } -#ifdef CONFIG_LINUX_IO_URING -static inline bool raw_check_linux_io_uring(BDRVRawState *s) -{ - Error *local_err = NULL; - AioContext *ctx; - - if (!s->use_linux_io_uring) { - return false; - } - - ctx =
qemu_get_current_aio_context(); - if (unlikely(!aio_setup_linux_io_uring(ctx, &local_err))) { - error_reportf_err(local_err, "Unable to use linux io_uring, " - "falling back to thread pool: "); - s->use_linux_io_uring = false; - return false; - } - return true; -} -#endif - #ifdef CONFIG_LINUX_AIO static inline bool raw_check_linux_aio(BDRVRawState *s) { @@ -2595,7 +2583,7 @@ raw_co_prw(BlockDriverState *bs, int64_t *offset_ptr, uint64_t bytes, if (s->needs_alignment && !bdrv_qiov_is_aligned(bs, qiov)) { type |= QEMU_AIO_MISALIGNED; #ifdef CONFIG_LINUX_IO_URING - } else if (raw_check_linux_io_uring(s)) { + } else if (s->use_linux_io_uring) { assert(qiov->size == bytes); ret = luring_co_submit(bs, s->fd, offset, qiov, type, flags); goto out; @@ -2692,7 +2680,7 @@ static int coroutine_fn raw_co_flush_to_disk(BlockDriverState *bs) }; #ifdef CONFIG_LINUX_IO_URING - if (raw_check_linux_io_uring(s)) { + if (s->use_linux_io_uring) { return luring_co_submit(bs, s->fd, 0, NULL, QEMU_AIO_FLUSH, 0); } #endif
diff --git a/block/io_uring.c b/block/io_uring.c index dd4f304910..dd930ee57e 100644 --- a/block/io_uring.c +++ b/block/io_uring.c @@ -11,28 +11,20 @@ #include "qemu/osdep.h" #include #include "block/aio.h" -#include "qemu/queue.h" #include "block/block.h" #include "block/raw-aio.h" #include "qemu/coroutine.h" -#include "qemu/defer-call.h" -#include "qapi/error.h" #include "system/block-backend.h" #include "trace.h" -/* Only used for assertions.
*/ -#include "qemu/coroutine_int.h" - -/* io_uring ring size */ -#define MAX_ENTRIES 128 - -typedef struct LuringAIOCB { +typedef struct { Coroutine *co; - struct io_uring_sqe sqeq; - ssize_t ret; QEMUIOVector *qiov; - bool is_read; - QSIMPLEQ_ENTRY(LuringAIOCB) next; + uint64_t offset; + ssize_t ret; + int type; + int fd; + BdrvRequestFlags flags; /* * Buffered reads may require resubmission, see @@ -40,36 +32,51 @@ typedef struct LuringAIOCB { */ int total_read; QEMUIOVector resubmit_qiov; -} LuringAIOCB; -typedef struct LuringQueue { - unsigned int in_queue; - unsigned int in_flight; - bool blocked; - QSIMPLEQ_HEAD(, LuringAIOCB) submit_queue; -} LuringQueue; + CqeHandler cqe_handler; +} LuringRequest; -struct LuringState { - AioContext *aio_context; - - struct io_uring ring; - - /* No locking required, only accessed from AioContext home thread */ - LuringQueue io_q; - - QEMUBH *completion_bh; -}; - -/** - * luring_resubmit: - * - * Resubmit a request by appending it to submit_queue. The caller must ensure - * that ioq_submit() is called later so that submit_queue requests are started. - */ -static void luring_resubmit(LuringState *s, LuringAIOCB *luringcb) +static void luring_prep_sqe(struct io_uring_sqe *sqe, void *opaque) { - QSIMPLEQ_INSERT_TAIL(&s->io_q.submit_queue, luringcb, next); - s->io_q.in_queue++; + LuringRequest *req = opaque; + QEMUIOVector *qiov = req->qiov; + uint64_t offset = req->offset; + int fd = req->fd; + BdrvRequestFlags flags = req->flags; + + switch (req->type) { + case QEMU_AIO_WRITE: +#ifdef HAVE_IO_URING_PREP_WRITEV2 + { + int luring_flags = (flags & BDRV_REQ_FUA) ? 
RWF_DSYNC : 0; + io_uring_prep_writev2(sqe, fd, qiov->iov, + qiov->niov, offset, luring_flags); + } +#else + assert(flags == 0); + io_uring_prep_writev(sqe, fd, qiov->iov, qiov->niov, offset); +#endif + break; + case QEMU_AIO_ZONE_APPEND: + io_uring_prep_writev(sqe, fd, qiov->iov, qiov->niov, offset); + break; + case QEMU_AIO_READ: + { + if (req->resubmit_qiov.iov != NULL) { + qiov = &req->resubmit_qiov; + } + io_uring_prep_readv(sqe, fd, qiov->iov, qiov->niov, + offset + req->total_read); + break; + } + case QEMU_AIO_FLUSH: + io_uring_prep_fsync(sqe, fd, IORING_FSYNC_DATASYNC); + break; + default: + fprintf(stderr, "%s: invalid AIO request type, aborting 0x%x.\n", + __func__, req->type); + abort(); + } } /** @@ -78,385 +85,115 @@ static void luring_resubmit(LuringState *s, LuringAIOCB *luringcb) * Short reads are rare but may occur. The remaining read request needs to be * resubmitted. */ -static void luring_resubmit_short_read(LuringState *s, LuringAIOCB *luringcb, - int nread) +static void luring_resubmit_short_read(LuringRequest *req, int nread) { QEMUIOVector *resubmit_qiov; size_t remaining; - trace_luring_resubmit_short_read(s, luringcb, nread); + trace_luring_resubmit_short_read(req, nread); /* Update read position */ - luringcb->total_read += nread; - remaining = luringcb->qiov->size - luringcb->total_read; + req->total_read += nread; + remaining = req->qiov->size - req->total_read; /* Shorten qiov */ - resubmit_qiov = &luringcb->resubmit_qiov; + resubmit_qiov = &req->resubmit_qiov; if (resubmit_qiov->iov == NULL) { - qemu_iovec_init(resubmit_qiov, luringcb->qiov->niov); + qemu_iovec_init(resubmit_qiov, req->qiov->niov); } else { qemu_iovec_reset(resubmit_qiov); } - qemu_iovec_concat(resubmit_qiov, luringcb->qiov, luringcb->total_read, - remaining); + qemu_iovec_concat(resubmit_qiov, req->qiov, req->total_read, remaining); - /* Update sqe */ - luringcb->sqeq.off += nread; - luringcb->sqeq.addr =
(uintptr_t)luringcb->resubmit_qiov.iov; - luringcb->sqeq.len = luringcb->resubmit_qiov.niov; - - luring_resubmit(s, luringcb); + aio_add_sqe(luring_prep_sqe, req, &req->cqe_handler); } -/** - * luring_process_completions: - * @s: AIO state - * - * Fetches completed I/O requests, consumes cqes and invokes their callbacks - * The function is somewhat tricky because it supports nested event loops, for - * example when a request callback invokes aio_poll(). - * - * Function schedules BH completion so it can be called again in a nested - * event loop. When there are no events left to complete the BH is being - * canceled. - * - */ -static void luring_process_completions(LuringState *s) +static void luring_cqe_handler(CqeHandler *cqe_handler) { - struct io_uring_cqe *cqes; - int total_bytes; + LuringRequest *req = container_of(cqe_handler, LuringRequest, cqe_handler); + int ret = cqe_handler->cqe.res; - defer_call_begin(); + trace_luring_cqe_handler(req, ret); - /* - * Request completion callbacks can run the nested event loop. - * Schedule ourselves so the nested event loop will "see" remaining - * completed requests and process them. Without this, completion - * callbacks that wait for other requests using a nested event loop - * would hang forever. - * - * This workaround is needed because io_uring uses poll_wait, which - * is woken up when new events are added to the uring, thus polling on - * the same uring fd will block unless more events are received. - * - * Other leaf block drivers (drivers that access the data themselves) - * are networking based, so they poll sockets for data and run the - * correct coroutine. - */ - qemu_bh_schedule(s->completion_bh); - - while (io_uring_peek_cqe(&s->ring, &cqes) == 0) { - LuringAIOCB *luringcb; - int ret; - - if (!cqes) { - break; + if (ret < 0) { + /* + * Only writev/readv/fsync requests on regular files or host block + * devices are submitted.
Therefore -EAGAIN is not expected but it's + * known to happen sometimes with Linux SCSI. Submit again and hope + * the request completes successfully. + * + * For more information, see: + * https://lore.kernel.org/io-uring/20210727165811.284510-3-axboe@kernel.dk/T/#u + * + * If the code is changed to submit other types of requests in the + * future, then this workaround may need to be extended to deal with + * genuine -EAGAIN results that should not be resubmitted + * immediately. + */ + if (ret == -EINTR || ret == -EAGAIN) { + aio_add_sqe(luring_prep_sqe, req, &req->cqe_handler); + return; } - - luringcb = io_uring_cqe_get_data(cqes); - ret = cqes->res; - io_uring_cqe_seen(&s->ring, cqes); - cqes = NULL; - - /* Change counters one-by-one because we can be nested. */ - s->io_q.in_flight--; - trace_luring_process_completion(s, luringcb, ret); - + } else if (req->qiov) { /* total_read is non-zero only for resubmitted read requests */ - total_bytes = ret + luringcb->total_read; + int total_bytes = ret + req->total_read; - if (ret < 0) { - /* - * Only writev/readv/fsync requests on regular files or host block - * devices are submitted. Therefore -EAGAIN is not expected but it's - * known to happen sometimes with Linux SCSI. Submit again and hope - * the request completes successfully. - * - * For more information, see: - * https://lore.kernel.org/io-uring/20210727165811.284510-3-axboe@kernel.dk/T/#u - * - * If the code is changed to submit other types of requests in the - * future, then this workaround may need to be extended to deal with - * genuine -EAGAIN results that should not be resubmitted - * immediately.
- */ - if (ret == -EINTR || ret == -EAGAIN) { - luring_resubmit(s, luringcb); - continue; - } - } else if (!luringcb->qiov) { - goto end; - } else if (total_bytes == luringcb->qiov->size) { + if (total_bytes == req->qiov->size) { ret = 0; - /* Only read/write */ } else { /* Short Read/Write */ - if (luringcb->is_read) { + if (req->type == QEMU_AIO_READ) { if (ret > 0) { - luring_resubmit_short_read(s, luringcb, ret); - continue; - } else { - /* Pad with zeroes */ - qemu_iovec_memset(luringcb->qiov, total_bytes, 0, - luringcb->qiov->size - total_bytes); - ret = 0; + luring_resubmit_short_read(req, ret); + return; } + + /* Pad with zeroes */ + qemu_iovec_memset(req->qiov, total_bytes, 0, + req->qiov->size - total_bytes); + ret = 0; } else { ret = -ENOSPC; } } -end: - luringcb->ret = ret; - qemu_iovec_destroy(&luringcb->resubmit_qiov); - - /* - * If the coroutine is already entered it must be in ioq_submit() - * and will notice luringcb->ret has been filled in when it - * eventually runs later. Coroutines cannot be entered recursively - * so avoid doing that!
- */ - assert(luringcb->co->ctx == s->aio_context); - if (!qemu_coroutine_entered(luringcb->co)) { - aio_co_wake(luringcb->co); - } } - qemu_bh_cancel(s->completion_bh); + req->ret = ret; + qemu_iovec_destroy(&req->resubmit_qiov); - defer_call_end(); -} - -static int ioq_submit(LuringState *s) -{ - int ret = 0; - LuringAIOCB *luringcb, *luringcb_next; - - while (s->io_q.in_queue > 0) { - /* - * Try to fetch sqes from the ring for requests waiting in - * the overflow queue - */ - QSIMPLEQ_FOREACH_SAFE(luringcb, &s->io_q.submit_queue, next, - luringcb_next) { - struct io_uring_sqe *sqes = io_uring_get_sqe(&s->ring); - if (!sqes) { - break; - } - /* Prep sqe for submission */ - *sqes = luringcb->sqeq; - QSIMPLEQ_REMOVE_HEAD(&s->io_q.submit_queue, next); - } - ret = io_uring_submit(&s->ring); - trace_luring_io_uring_submit(s, ret); - /* Prevent infinite loop if submission is refused */ - if (ret <= 0) { - if (ret == -EAGAIN || ret == -EINTR) { - continue; - } - break; - } - s->io_q.in_flight += ret; - s->io_q.in_queue -= ret; - } - s->io_q.blocked = (s->io_q.in_queue > 0); - - if (s->io_q.in_flight) { - /* - * We can try to complete something just right away if there are - * still requests in-flight. - */ - luring_process_completions(s); - } - return ret; -} - -static void luring_process_completions_and_submit(LuringState *s) -{ - luring_process_completions(s); - - if (s->io_q.in_queue > 0) { - ioq_submit(s); + /* + * If the coroutine is already entered it must be in luring_co_submit() and + * will notice req->ret has been filled in when it eventually runs later. + * Coroutines cannot be entered recursively so avoid doing that!
+ */ + if (!qemu_coroutine_entered(req->co)) { + aio_co_wake(req->co); } } -static void qemu_luring_completion_bh(void *opaque) +int coroutine_fn luring_co_submit(BlockDriverState *bs, int fd, + uint64_t offset, QEMUIOVector *qiov, + int type, BdrvRequestFlags flags) { - LuringState *s = opaque; - luring_process_completions_and_submit(s); -} - -static void qemu_luring_completion_cb(void *opaque) -{ - LuringState *s = opaque; - luring_process_completions_and_submit(s); -} - -static bool qemu_luring_poll_cb(void *opaque) -{ - LuringState *s = opaque; - - return io_uring_cq_ready(&s->ring); -} - -static void qemu_luring_poll_ready(void *opaque) -{ - LuringState *s = opaque; - - luring_process_completions_and_submit(s); -} - -static void ioq_init(LuringQueue *io_q) -{ - QSIMPLEQ_INIT(&io_q->submit_queue); - io_q->in_queue = 0; - io_q->in_flight = 0; - io_q->blocked = false; -} - -static void luring_deferred_fn(void *opaque) -{ - LuringState *s = opaque; - trace_luring_unplug_fn(s, s->io_q.blocked, s->io_q.in_queue, - s->io_q.in_flight); - if (!s->io_q.blocked && s->io_q.in_queue > 0) { - ioq_submit(s); - } -} - -/** - * luring_do_submit: - * @fd: file descriptor for I/O - * @luringcb: AIO control block - * @s: AIO state - * @offset: offset for request - * @type: type of request - * - * Fetches sqes from ring, adds to pending queue and preps them - * - */ -static int luring_do_submit(int fd, LuringAIOCB *luringcb, LuringState *s, - uint64_t offset, int type, BdrvRequestFlags flags) -{ - int ret; - struct io_uring_sqe *sqes = &luringcb->sqeq; - - switch (type) { - case QEMU_AIO_WRITE: -#ifdef HAVE_IO_URING_PREP_WRITEV2 - { - int luring_flags = (flags & BDRV_REQ_FUA) ? 
RWF_DSYNC : 0; - io_uring_prep_writev2(sqes, fd, luringcb->qiov->iov, - luringcb->qiov->niov, offset, luring_flags); - } -#else - assert(flags == 0); - io_uring_prep_writev(sqes, fd, luringcb->qiov->iov, - luringcb->qiov->niov, offset); -#endif - break; - case QEMU_AIO_ZONE_APPEND: - io_uring_prep_writev(sqes, fd, luringcb->qiov->iov, - luringcb->qiov->niov, offset); - break; - case QEMU_AIO_READ: - io_uring_prep_readv(sqes, fd, luringcb->qiov->iov, - luringcb->qiov->niov, offset); - break; - case QEMU_AIO_FLUSH: - io_uring_prep_fsync(sqes, fd, IORING_FSYNC_DATASYNC); - break; - default: - fprintf(stderr, "%s: invalid AIO request type, aborting 0x%x.\n", - __func__, type); - abort(); - } - io_uring_sqe_set_data(sqes, luringcb); - - QSIMPLEQ_INSERT_TAIL(&s->io_q.submit_queue, luringcb, next); - s->io_q.in_queue++; - trace_luring_do_submit(s, s->io_q.blocked, s->io_q.in_queue, - s->io_q.in_flight); - if (!s->io_q.blocked) { - if (s->io_q.in_flight + s->io_q.in_queue >= MAX_ENTRIES) { - ret = ioq_submit(s); - trace_luring_do_submit_done(s, ret); - return ret; - } - - defer_call(luring_deferred_fn, s); - } - return 0; -} - -int coroutine_fn luring_co_submit(BlockDriverState *bs, int fd, uint64_t offset, - QEMUIOVector *qiov, int type, - BdrvRequestFlags flags) -{ - int ret; - AioContext *ctx = qemu_get_current_aio_context(); - LuringState *s = aio_get_linux_io_uring(ctx); - LuringAIOCB luringcb = { + LuringRequest req = { .co = qemu_coroutine_self(), - .ret = -EINPROGRESS, .qiov = qiov, - .is_read = (type == QEMU_AIO_READ), + .ret = -EINPROGRESS, + .type = type, + .fd = fd, + .offset = offset, + .flags = flags, }; - trace_luring_co_submit(bs, s, &luringcb, fd, offset, qiov ? 
qiov->size : 0, - type); - ret = luring_do_submit(fd, &luringcb, s, offset, type, flags); - if (ret < 0) { - return ret; - } + req.cqe_handler.cb = luring_cqe_handler; - if (luringcb.ret == -EINPROGRESS) { + trace_luring_co_submit(bs, &req, fd, offset, qiov ? qiov->size : 0, type); + aio_add_sqe(luring_prep_sqe, &req, &req.cqe_handler); + + if (req.ret == -EINPROGRESS) { qemu_coroutine_yield(); } - return luringcb.ret; -} - -void luring_detach_aio_context(LuringState *s, AioContext *old_context) -{ - aio_set_fd_handler(old_context, s->ring.ring_fd, - NULL, NULL, NULL, NULL, s); - qemu_bh_delete(s->completion_bh); - s->aio_context = NULL; -} - -void luring_attach_aio_context(LuringState *s, AioContext *new_context) -{ - s->aio_context = new_context; - s->completion_bh = aio_bh_new(new_context, qemu_luring_completion_bh, s); - aio_set_fd_handler(s->aio_context, s->ring.ring_fd, - qemu_luring_completion_cb, NULL, - qemu_luring_poll_cb, qemu_luring_poll_ready, s); -} - -LuringState *luring_init(Error **errp) -{ - int rc; - LuringState *s = g_new0(LuringState, 1); - struct io_uring *ring = &s->ring; - - trace_luring_init_state(s, sizeof(*s)); - - rc = io_uring_queue_init(MAX_ENTRIES, ring, 0); - if (rc < 0) { - error_setg_errno(errp, -rc, "failed to init linux io_uring ring"); - g_free(s); - return NULL; - } - - ioq_init(&s->io_q); - return s; - -} - -void luring_cleanup(LuringState *s) -{ - io_uring_queue_exit(&s->ring); - trace_luring_cleanup_state(s); - g_free(s); + return req.ret; } bool luring_has_fua(void)
diff --git a/stubs/io_uring.c b/stubs/io_uring.c deleted file mode 100644 index 622d1e4648..0000000000 --- a/stubs/io_uring.c +++ /dev/null @@ -1,32 +0,0 @@ -/* - * Linux io_uring support. - * - * Copyright (C) 2009 IBM, Corp. - * Copyright (C) 2009 Red Hat, Inc. - * - * This work is licensed under the terms of the GNU GPL, version 2 or later. - * See the COPYING file in the top-level directory.
- */ -#include "qemu/osdep.h" -#include "block/aio.h" -#include "block/raw-aio.h" - -void luring_detach_aio_context(LuringState *s, AioContext *old_context) -{ - abort(); -} - -void luring_attach_aio_context(LuringState *s, AioContext *new_context) -{ - abort(); -} - -LuringState *luring_init(Error **errp) -{ - abort(); -} - -void luring_cleanup(LuringState *s) -{ - abort(); -}
diff --git a/util/async.c b/util/async.c index 00e46b99f9..a216cf8695 100644 --- a/util/async.c +++ b/util/async.c @@ -386,14 +386,6 @@ aio_ctx_finalize(GSource *source) } #endif -#ifdef CONFIG_LINUX_IO_URING - if (ctx->linux_io_uring) { - luring_detach_aio_context(ctx->linux_io_uring, ctx); - luring_cleanup(ctx->linux_io_uring); - ctx->linux_io_uring = NULL; - } -#endif - assert(QSLIST_EMPTY(&ctx->scheduled_coroutines)); qemu_bh_delete(ctx->co_schedule_bh); @@ -468,29 +460,6 @@ LinuxAioState *aio_get_linux_aio(AioContext *ctx) } #endif -#ifdef CONFIG_LINUX_IO_URING -LuringState *aio_setup_linux_io_uring(AioContext *ctx, Error **errp) -{ - if (ctx->linux_io_uring) { - return ctx->linux_io_uring; - } - - ctx->linux_io_uring = luring_init(errp); - if (!ctx->linux_io_uring) { - return NULL; - } - - luring_attach_aio_context(ctx->linux_io_uring, ctx); - return ctx->linux_io_uring; -} - -LuringState *aio_get_linux_io_uring(AioContext *ctx) -{ - assert(ctx->linux_io_uring); - return ctx->linux_io_uring; -} -#endif - void aio_notify(AioContext *ctx) { /* @@ -630,10 +599,6 @@ AioContext *aio_context_new(Error **errp) ctx->linux_aio = NULL; #endif -#ifdef CONFIG_LINUX_IO_URING - ctx->linux_io_uring = NULL; -#endif - ctx->thread_pool = NULL; qemu_rec_mutex_init(&ctx->lock); timerlistgroup_init(&ctx->tlg, aio_timerlist_notify, ctx);
diff --git a/block/trace-events b/block/trace-events index 8e789e1f12..c9b4736ff8 100644 --- a/block/trace-events +++ b/block/trace-events @@ -62,15 +62,9 @@ qmp_block_stream(void *bs) "bs %p" file_paio_submit(void *acb, void *opaque, int64_t
offset, int count, int t= ype) "acb %p opaque %p offset %"PRId64" count %d type %d" =20 # io_uring.c -luring_init_state(void *s, size_t size) "s %p size %zu" -luring_cleanup_state(void *s) "%p freed" -luring_unplug_fn(void *s, int blocked, int queued, int inflight) "LuringSt= ate %p blocked %d queued %d inflight %d" -luring_do_submit(void *s, int blocked, int queued, int inflight) "LuringSt= ate %p blocked %d queued %d inflight %d" -luring_do_submit_done(void *s, int ret) "LuringState %p submitted to kerne= l %d" -luring_co_submit(void *bs, void *s, void *luringcb, int fd, uint64_t offse= t, size_t nbytes, int type) "bs %p s %p luringcb %p fd %d offset %" PRId64 = " nbytes %zd type %d" -luring_process_completion(void *s, void *aiocb, int ret) "LuringState %p l= uringcb %p ret %d" -luring_io_uring_submit(void *s, int ret) "LuringState %p ret %d" -luring_resubmit_short_read(void *s, void *luringcb, int nread) "LuringStat= e %p luringcb %p nread %d" +luring_cqe_handler(void *req, int ret) "req %p ret %d" +luring_co_submit(void *bs, void *req, int fd, uint64_t offset, size_t nbyt= es, int type) "bs %p req %p fd %d offset %" PRId64 " nbytes %zd type %d" +luring_resubmit_short_read(void *req, int nread) "req %p nread %d" =20 # qcow2.c qcow2_add_task(void *co, void *bs, void *pool, const char *action, int clu= ster_type, uint64_t host_offset, uint64_t offset, uint64_t bytes, void *qio= v, size_t qiov_offset) "co %p bs %p pool %p: %s: cluster_type %d file_clust= er_offset %" PRIu64 " offset %" PRIu64 " bytes %" PRIu64 " qiov %p qiov_off= set %zu" diff --git a/stubs/meson.build b/stubs/meson.build index 27be2dec9f..0b2778c568 100644 --- a/stubs/meson.build +++ b/stubs/meson.build @@ -32,9 +32,6 @@ if have_block or have_ga stub_ss.add(files('cpus-virtual-clock.c')) stub_ss.add(files('icount.c')) stub_ss.add(files('graph-lock.c')) - if linux_io_uring.found() - stub_ss.add(files('io_uring.c')) - endif if libaio.found() stub_ss.add(files('linux-aio.c')) endif --=20 2.51.1 From 
nobody Fri Nov 14 16:55:24 2025 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=quarantine dis=none) header.from=redhat.com ARC-Seal: i=1; a=rsa-sha256; t=1762223609; cv=none; d=zohomail.com; s=zohoarc; b=Asyh64iTK0Rn8lgDsouMOKGBlXzQz6wptGPLrn2Y16QvsGRc0B5eWj3dRT/4rYJXMvUS5WlPPVP63W5DTn9fewPN2feZe+XG1TcUumQ/nUUlR/tH1APGV17wZcHzNJrgak/nliqt1D48NLUEvnuusuLtBtVXmzWykcHzpt9EMgU= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1762223609; h=Content-Transfer-Encoding:Cc:Cc:Date:Date:From:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:Subject:To:To:Message-Id:Reply-To; bh=nt3miwY9wKH/L4WGKkfJvXc/TwSfSHQI01mWZWbxNnQ=; b=kkYcMMMPornbICpDbNxSziQNOhRIMEow9+bRoTE7Z2OrlnUoQifnWOGwV75gm9g7cNZb9a50UYIrZ1CAp/3Qd7oTAd1YFYvfpehPDA60QfD1rb4HdiiOFCdKmW91DaYNi14cgaWp8pRlWS7gb2klMrJ3bd8IeTwckaT13O6hK0E= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=quarantine dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1762223609711876.7040718624303; Mon, 3 Nov 2025 18:33:29 -0800 (PST) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1vG6pj-0008ON-7m; Mon, 03 Nov 2025 21:31:44 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1vG6oV-0007BJ-V1 for qemu-devel@nongnu.org; Mon, 03 Nov 2025 21:30:31 -0500 Received: from 
us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1vG6oO-0004DY-5P for qemu-devel@nongnu.org; Mon, 03 Nov 2025 21:30:24 -0500 Received: from mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-251-u2J3ZNdEOD2LG4Wf2i4SXw-1; Mon, 03 Nov 2025 21:30:15 -0500 Received: from mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.12]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id BA578195608A; Tue, 4 Nov 2025 02:30:14 +0000 (UTC) Received: from localhost (unknown [10.2.16.6]) by mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 1D12619560B2; Tue, 4 Nov 2025 02:30:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1762223419; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=nt3miwY9wKH/L4WGKkfJvXc/TwSfSHQI01mWZWbxNnQ=; b=EjHhblwbUnXfRYQX9KVnLHqzIIN1/ssIs8ZT4cf4qnyBriPvi3pQNZnBgYb01uA8hvc/ZS Djm6IVg7B4sqayBQSsln/GxC6rorVSfm75V9zbLM59A6cpPXAHj5AzEtkL+67f4tCHoQTl mzn0kFcfO4lWbec4ZwAXh1JJI4Cgs5c= X-MC-Unique: u2J3ZNdEOD2LG4Wf2i4SXw-1 X-Mimecast-MFC-AGG-ID: u2J3ZNdEOD2LG4Wf2i4SXw_1762223414 From: Stefan Hajnoczi To: qemu-devel@nongnu.org Cc: Stefan Hajnoczi , Kevin Wolf , eblake@redhat.com, Hanna Czenczek , qemu-block@nongnu.org, Paolo Bonzini , Fam Zheng , hibriansong@gmail.com Subject: 
[PATCH v6 15/15] block/io_uring: use non-vectored read/write when possible Date: Mon, 3 Nov 2025 21:29:33 -0500 Message-ID: <20251104022933.618123-16-stefanha@redhat.com> In-Reply-To: <20251104022933.618123-1-stefanha@redhat.com> References: <20251104022933.618123-1-stefanha@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.12 Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=170.10.129.124; envelope-from=stefanha@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, RCVD_IN_VALIDITY_CERTIFIED_BLOCKED=0.001, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=unavailable autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @redhat.com) X-ZM-MESSAGEID: 1762223611966154100 Content-Type: text/plain; charset="utf-8"

The io_uring_prep_readv2/writev2() man pages recommend using the non-vectored read/write operations for performance reasons when only a single buffer is involved. I didn't measure a significant difference, but it doesn't hurt to have this optimization in place.
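The selection rule this patch adds to luring_prep_sqe() can be modelled on its own. What follows is a hypothetical sketch, not QEMU code: the enum prep_op, choose_read_op(), and choose_write_op() names are invented for illustration, rw_flags stands in for luring_flags (RWF_DSYNC when FUA is requested), and have_writev2 stands in for the compile-time HAVE_IO_URING_PREP_WRITEV2 check. The sketch only decides which liburing prep helper a request would use. It builds no SQEs and submits no I/O.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical labels for the liburing prep helper a request would be
 * routed to. Nothing here touches liburing; it only models the choice.
 */
enum prep_op {
    PREP_READ,    /* io_uring_prep_read(): single buffer             */
    PREP_READV,   /* io_uring_prep_readv(): iovec array              */
    PREP_WRITE,   /* io_uring_prep_write(): single buffer, no flags  */
    PREP_WRITEV,  /* io_uring_prep_writev(): iovec array, no flags   */
    PREP_WRITEV2, /* io_uring_prep_writev2(): iovec array + RWF_* flags */
};

/* Reads: take the non-vectored path unless there is more than one iovec. */
enum prep_op choose_read_op(int niov)
{
    return niov > 1 ? PREP_READV : PREP_READ;
}

/*
 * Writes: among these helpers only io_uring_prep_writev2() takes an RWF_*
 * flags argument, so the non-vectored path is used only for a single
 * iovec with no flags (i.e. no FUA).
 */
enum prep_op choose_write_op(int niov, int rw_flags, bool have_writev2)
{
    if (rw_flags != 0 || niov > 1) {
        if (have_writev2) {
            return PREP_WRITEV2;
        }
        /* Without writev2, flags cannot be passed; mirror the patch's assert. */
        assert(rw_flags == 0);
        return PREP_WRITEV;
    }
    return PREP_WRITE;
}
```

For example, a single-buffer write without FUA maps to PREP_WRITE, while the same single-buffer write with FUA still goes through writev2 so that the flag is preserved.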
Suggested-by: Eric Blake
Signed-off-by: Stefan Hajnoczi
---
v5:
- Reduce #ifdef HAVE_IO_URING_PREP_WRITEV2 code duplication [Kevin]
---
 block/io_uring.c | 34 ++++++++++++++++++++++++++--------
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/block/io_uring.c b/block/io_uring.c
index dd930ee57e..f1514cf024 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -46,17 +46,28 @@ static void luring_prep_sqe(struct io_uring_sqe *sqe, void *opaque)
 
     switch (req->type) {
     case QEMU_AIO_WRITE:
-#ifdef HAVE_IO_URING_PREP_WRITEV2
     {
         int luring_flags = (flags & BDRV_REQ_FUA) ? RWF_DSYNC : 0;
-        io_uring_prep_writev2(sqe, fd, qiov->iov,
-                              qiov->niov, offset, luring_flags);
-    }
+        if (luring_flags != 0 || qiov->niov > 1) {
+#ifdef HAVE_IO_URING_PREP_WRITEV2
+            io_uring_prep_writev2(sqe, fd, qiov->iov,
+                                  qiov->niov, offset, luring_flags);
 #else
-        assert(flags == 0);
-        io_uring_prep_writev(sqe, fd, qiov->iov, qiov->niov, offset);
+            /*
+             * FUA should only be enabled with HAVE_IO_URING_PREP_WRITEV2, see
+             * luring_has_fua().
+             */
+            assert(luring_flags == 0);
+
+            io_uring_prep_writev(sqe, fd, qiov->iov, qiov->niov, offset);
 #endif
+        } else {
+            /* The man page says non-vectored is faster than vectored */
+            struct iovec *iov = qiov->iov;
+            io_uring_prep_write(sqe, fd, iov->iov_base, iov->iov_len, offset);
+        }
         break;
+    }
     case QEMU_AIO_ZONE_APPEND:
         io_uring_prep_writev(sqe, fd, qiov->iov, qiov->niov, offset);
         break;
@@ -65,8 +76,15 @@ static void luring_prep_sqe(struct io_uring_sqe *sqe, void *opaque)
         if (req->resubmit_qiov.iov != NULL) {
             qiov = &req->resubmit_qiov;
         }
-        io_uring_prep_readv(sqe, fd, qiov->iov, qiov->niov,
-                            offset + req->total_read);
+        if (qiov->niov > 1) {
+            io_uring_prep_readv(sqe, fd, qiov->iov, qiov->niov,
+                                offset + req->total_read);
+        } else {
+            /* The man page says non-vectored is faster than vectored */
+            struct iovec *iov = qiov->iov;
+            io_uring_prep_read(sqe, fd, iov->iov_base, iov->iov_len,
+                               offset + req->total_read);
+        }
         break;
     }
     case QEMU_AIO_FLUSH:
-- 
2.51.1