From: Michael Tokarev
To: qemu-devel@nongnu.org
Cc: qemu-stable@nongnu.org, Fiona Ebner, Stefan Hajnoczi, Michael Tokarev
Subject: [Stable-10.1.3 90/96] block/io_uring: avoid potentially getting stuck after resubmit at the end of ioq_submit()
Date: Wed, 3 Dec 2025 12:35:23 +0300
Message-ID: <20251203093612.2370716-14-mjt@tls.msk.ru>

From: Fiona Ebner

Note that this issue seems to be already fixed as a consequence of the
large io_uring rework with 047dabef97 ("block/io_uring: use
aio_add_sqe()") in current master, so this is purely for QEMU stable
branches.

At the end of ioq_submit(), there is an opportunistic call to
luring_process_completions(). This is the single caller of
luring_process_completions() that doesn't use the
luring_process_completions_and_submit() wrapper.

Other callers use the wrapper, because luring_process_completions()
might require a subsequent call to ioq_submit() after resubmitting a
request. As noted for luring_resubmit():

> Resubmit a request by appending it to submit_queue.  The caller must ensure
> that ioq_submit() is called later so that submit_queue requests are started.

So the caller at the end of ioq_submit() violates the contract, which
is in fact problematic if no other requests come in later. In such a
case, the request intended to be resubmitted will never actually be
submitted via io_uring_submit().

A reproducer exposing this issue is [0], which is based on user
reports from [1]. Another reproducer is iotest 109 with '-i io_uring'.

I had the most success triggering the issue with [0] when using BTRFS
RAID 1 storage.
With tmpfs, it can take quite a few iterations, but the issue also
triggers eventually on my machine. With iotest 109 with '-i io_uring',
the issue triggers reliably on my ext4 file system.

Have ioq_submit() submit any resubmitted requests after calling
luring_process_completions(). The return value from io_uring_submit()
is checked to be non-negative before the opportunistic processing of
completions and before entering the new resubmit logic, to ensure that
a failure of io_uring_submit() is not missed. Also note that the return
value already was not necessarily the total number of submissions,
since the loop might have been iterated more than once even before the
current change.

Only trigger the resubmission logic if it is actually required, to
avoid changing behavior more than necessary. For example, iotest 109
would produce more 'mirror ready' events if always resubmitting after
luring_process_completions() at the end of ioq_submit().

Note that iotest 109 still does not pass as is when run with
'-i io_uring', because two offset values for BLOCK_JOB_COMPLETED events
are zero instead of non-zero as in the expected output. The two
affected test cases are expected failures and still fail, so they just
fail "faster". The test cases do not actually trigger the resubmit
logic, so the reason seems to be a different ordering of requests and
completions in the current aio=io_uring implementation versus
aio=threads.

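For illustration only, the stuck-request scenario described above can
be modeled with a small stand-alone C sketch. This is not QEMU code:
ToyState, toy_submit(), toy_process_completions() and toy_ioq_submit()
are hypothetical names made up for this sketch, and the model assumes
that every in-flight request completes immediately and that submission
never fails.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for LuringState; every name here is hypothetical. */
typedef struct {
    int in_queue;     /* requests waiting to be submitted */
    int in_flight;    /* requests handed to the simulated kernel */
    int short_reads;  /* upcoming completions that come back short */
    int completed;    /* requests that finished for good */
    int submit_calls; /* number of simulated submit calls */
} ToyState;

/* Simulated submit: moves every queued request to in-flight. */
static int toy_submit(ToyState *s)
{
    int n = s->in_queue;

    s->in_flight += n;
    s->in_queue = 0;
    s->submit_calls++;
    return n;
}

/*
 * Simulated completion processing: reaps all in-flight requests.
 * A short read is appended to the queue again instead of completing.
 * Returns true if at least one request was requeued.
 */
static bool toy_process_completions(ToyState *s)
{
    bool resubmit = false;

    assert(s->in_flight >= 0);
    while (s->in_flight > 0) {
        s->in_flight--;
        if (s->short_reads > 0) {
            s->short_reads--;
            s->in_queue++;
            resubmit = true;
        } else {
            s->completed++;
        }
    }
    return resubmit;
}

/*
 * handle_resubmit == false models the old behavior: the result of the
 * opportunistic completion processing is ignored.
 * handle_resubmit == true models the fix: requeued requests are
 * submitted again before returning.
 */
static int toy_ioq_submit(ToyState *s, bool handle_resubmit)
{
    int ret = 0;

    for (;;) {
        if (s->in_queue > 0) {
            ret = toy_submit(s);
        }
        if (ret < 0 || s->in_flight == 0) {
            break;
        }
        if (!toy_process_completions(s) || !handle_resubmit) {
            break;
        }
    }
    return ret;
}
```

With one queued request that hits one short read (similar in spirit to
reproducer [0], which reads 512 bytes from a 256-byte file), calling
toy_ioq_submit(&s, false) leaves the request sitting in the queue with
nobody left to submit it, while toy_ioq_submit(&s, true) submits it a
second time and completes it.
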
[0]:

> #!/bin/bash -e
> #file=/mnt/btrfs/disk.raw
> file=/tmp/disk.raw
> filesize=256
> readsize=512
> rm -f $file
> truncate -s $filesize $file
> ./qemu-system-x86_64 --trace '*uring*' --qmp stdio \
> --blockdev raw,node-name=node0,file.driver=file,file.cache.direct=off,file.filename=$file,file.aio=io_uring \
> <<EOF
> {"execute": "qmp_capabilities"}
> {"execute": "human-monitor-command", "arguments": { "command-line": "qemu-io node0 \"read 0 $readsize \"" }}
> {"execute": "quit"}
> EOF

[1]: https://forum.proxmox.com/threads/170045/

Cc: qemu-stable@nongnu.org
Signed-off-by: Fiona Ebner
Reviewed-by: Stefan Hajnoczi
Signed-off-by: Michael Tokarev

diff --git a/block/io_uring.c b/block/io_uring.c
index dd4f304910..5dbafc8f7b 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -120,11 +120,14 @@ static void luring_resubmit_short_read(LuringState *s, LuringAIOCB *luringcb,
  * event loop. When there are no events left to complete the BH is being
  * canceled.
  *
+ * Returns whether ioq_submit() must be called again afterwards since requests
+ * were resubmitted via luring_resubmit().
  */
-static void luring_process_completions(LuringState *s)
+static bool luring_process_completions(LuringState *s)
 {
     struct io_uring_cqe *cqes;
     int total_bytes;
+    bool resubmit = false;
 
     defer_call_begin();
 
@@ -182,6 +185,7 @@ static void luring_process_completions(LuringState *s)
              */
             if (ret == -EINTR || ret == -EAGAIN) {
                 luring_resubmit(s, luringcb);
+                resubmit = true;
                 continue;
             }
         } else if (!luringcb->qiov) {
@@ -194,6 +198,7 @@ static void luring_process_completions(LuringState *s)
             if (luringcb->is_read) {
                 if (ret > 0) {
                     luring_resubmit_short_read(s, luringcb, ret);
+                    resubmit = true;
                     continue;
                 } else {
                     /* Pad with zeroes */
@@ -224,6 +229,8 @@ end:
     qemu_bh_cancel(s->completion_bh);
 
     defer_call_end();
+
+    return resubmit;
 }
 
 static int ioq_submit(LuringState *s)
@@ -231,6 +238,7 @@ static int ioq_submit(LuringState *s)
     int ret = 0;
     LuringAIOCB *luringcb, *luringcb_next;
 
+resubmit:
     while (s->io_q.in_queue > 0) {
         /*
          * Try to fetch sqes from the ring for requests waiting in
@@ -260,12 +268,14 @@ static int ioq_submit(LuringState *s)
     }
     s->io_q.blocked = (s->io_q.in_queue > 0);
 
-    if (s->io_q.in_flight) {
+    if (ret >= 0 && s->io_q.in_flight) {
         /*
          * We can try to complete something just right away if there are
          * still requests in-flight.
          */
-        luring_process_completions(s);
+        if (luring_process_completions(s)) {
+            goto resubmit;
+        }
     }
     return ret;
 }
-- 
2.47.3