From nobody Tue Feb 10 20:07:31 2026 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1763780467370960.9622152697928; Fri, 21 Nov 2025 19:01:07 -0800 (PST) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1vMd8l-0006LU-0W; Fri, 21 Nov 2025 21:14:20 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1vMcwz-0002RW-EG; Fri, 21 Nov 2025 21:02:09 -0500 Received: from isrv.corpit.ru ([212.248.84.144]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1vMcvA-0000D3-K0; Fri, 21 Nov 2025 21:02:06 -0500 Received: from tsrv.corpit.ru (tsrv.tls.msk.ru [192.168.177.2]) by isrv.corpit.ru (Postfix) with ESMTP id 0BCA016CA7F; Fri, 21 Nov 2025 21:44:29 +0300 (MSK) Received: from think4mjt.tls.msk.ru (mjtthink.wg.tls.msk.ru [192.168.177.146]) by tsrv.corpit.ru (Postfix) with ESMTP id BA473321CBB; Fri, 21 Nov 2025 21:44:37 +0300 (MSK) From: Michael Tokarev To: qemu-devel@nongnu.org Cc: qemu-stable@nongnu.org, Hanna Czenczek , Junyao Zhao , Kevin Wolf , Michael Tokarev Subject: [Stable-10.0.7 68/81] =?UTF-8?q?rbd:=20Run=20co=20BH=20CB=20in=20?= =?UTF-8?q?the=20coroutine=E2=80=99s=20AioContext?= Date: Fri, 21 Nov 2025 21:44:07 +0300 Message-ID: <20251121184424.1137669-68-mjt@tls.msk.ru> X-Mailer: git-send-email 2.47.3 In-Reply-To: References: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=212.248.84.144; envelope-from=mjt@tls.msk.ru; helo=isrv.corpit.ru X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, T_SPF_HELO_TEMPERROR=0.01, T_SPF_TEMPERROR=0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZM-MESSAGEID: 1763780467620018900 From: Hanna Czenczek qemu_rbd_completion_cb() schedules the request completion code (qemu_rbd_finish_bh()) to run in the BDS=E2=80=99s AioContext, assuming that this is the same thread in which qemu_rbd_start_co() runs. To explain, this is how both latter functions interact: In qemu_rbd_start_co(): while (!task.complete) qemu_coroutine_yield(); In qemu_rbd_finish_bh(): task->complete =3D true; aio_co_wake(task->co); // task->co is qemu_rbd_start_co() For this interaction to work reliably, both must run in the same thread so that qemu_rbd_finish_bh() can only run once the coroutine yields. Otherwise, finish_bh() may run before start_co() checks task.complete, which will result in the latter seeing .complete as true immediately and skipping the yield altogether, even though finish_bh() still wakes it. With multiqueue, the BDS=E2=80=99s AioContext is not necessarily the thread start_co() runs in, and so finish_bh() may be scheduled to run in a different thread than start_co(). With the right timing, this will cause the problems described above; waking a non-yielding coroutine is not good, as can be reproduced by putting e.g. a usleep(100000) above the while loop in start_co() (and using multiqueue), giving finish_bh() a much better chance at exiting before start_co() can yield. So instead of scheduling finish_bh() in the BDS=E2=80=99s AioContext, sched= ule finish_bh() in task->co=E2=80=99s AioContext. In addition, we can get rid of task.complete altogether because we will get woken exactly once, when the task is indeed complete, no need to check. (We could go further and drop the BH, running aio_co_wake() directly in qemu_rbd_completion_cb() because we are allowed to do that even if the coroutine isn=E2=80=99t yet yielding and we=E2=80=99re in a different threa= d =E2=80=93 but the doc comment on qemu_rbd_completion_cb() says to be careful, so I decided not to go so far here.) Buglink: https://issues.redhat.com/browse/RHEL-67115 Reported-by: Junyao Zhao Cc: qemu-stable@nongnu.org Signed-off-by: Hanna Czenczek Message-ID: <20251110154854.151484-3-hreitz@redhat.com> Reviewed-by: Kevin Wolf Signed-off-by: Kevin Wolf (cherry picked from commit 89d22536d1a1715083ef8118fe7e6e9239f900c1) Signed-off-by: Michael Tokarev diff --git a/block/rbd.c b/block/rbd.c index 446e90d34c..e2a11a6bc1 100644 --- a/block/rbd.c +++ b/block/rbd.c @@ -110,9 +110,7 @@ typedef struct BDRVRBDState { } BDRVRBDState; =20 typedef struct RBDTask { - BlockDriverState *bs; Coroutine *co; - bool complete; int64_t ret; } RBDTask; =20 @@ -1310,7 +1308,6 @@ static int qemu_rbd_resize(BlockDriverState *bs, uint= 64_t size) static void qemu_rbd_finish_bh(void *opaque) { RBDTask *task =3D opaque; - task->complete =3D true; aio_co_wake(task->co); } =20 @@ -1327,7 +1324,7 @@ static void qemu_rbd_completion_cb(rbd_completion_t c= , RBDTask *task) { task->ret =3D rbd_aio_get_return_value(c); rbd_aio_release(c); - aio_bh_schedule_oneshot(bdrv_get_aio_context(task->bs), + aio_bh_schedule_oneshot(qemu_coroutine_get_aio_context(task->co), qemu_rbd_finish_bh, task); } =20 @@ -1339,7 +1336,7 @@ static int coroutine_fn qemu_rbd_start_co(BlockDriver= State *bs, RBDAIOCmd cmd) { BDRVRBDState *s =3D bs->opaque; - RBDTask task =3D { .bs =3D bs, .co =3D qemu_coroutine_self() }; + RBDTask task =3D { .co =3D qemu_coroutine_self() }; rbd_completion_t c; int r; =20 @@ -1402,9 +1399,8 @@ static int coroutine_fn qemu_rbd_start_co(BlockDriver= State *bs, return r; } =20 - while (!task.complete) { - qemu_coroutine_yield(); - } + /* Expect exactly a single wake from qemu_rbd_finish_bh() */ + qemu_coroutine_yield(); =20 if (task.ret < 0) { error_report("rbd request failed: cmd %d offset %" PRIu64 " bytes = %" --=20 2.47.3