From: Michael Roth
To: qemu-devel@nongnu.org
Date: Mon, 1 Apr 2019 15:58:56 -0500
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20190401210011.16009-1-mdroth@linux.vnet.ibm.com>
References: <20190401210011.16009-1-mdroth@linux.vnet.ibm.com>
Message-Id: <20190401210011.16009-23-mdroth@linux.vnet.ibm.com>
Subject: [Qemu-devel] [PATCH 22/97] aio: Do aio_notify_accept only during blocking aio_poll
Cc: Fam Zheng, qemu-stable@nongnu.org

From: Fam Zheng

An aio_notify() pairs with an aio_notify_accept(). The former should
happen in the main thread or a vCPU thread, and the latter should be
done in the IOThread.

In one rare case, the main thread or a vCPU thread may "steal" the
aio_notify() event that it has just raised itself, in
bdrv_set_aio_context() [1]. The sequence is like this:

    main thread                         IO Thread
    ===============================================================
    bdrv_drained_begin()
      aio_disable_external(ctx)
                                        aio_poll(ctx, true)
                                          ctx->notify_me += 2
    ...
    bdrv_drained_end()
      ...
        aio_notify()
    ...
    bdrv_set_aio_context()
      aio_poll(ctx, false)
[1]     aio_notify_accept(ctx)
                                          ppoll() /* Hang! */

[1] is problematic. It will clear the ctx->notifier event so that
the blocked ppoll() will not return.

(For the curious, this bug was noticed when booting a number of VMs
simultaneously in RHV. One or two of the VMs will hit this race
condition, making the VIRTIO device unresponsive to I/O commands.
When it hangs, SeaBIOS is busy waiting for a read request to complete
(read MBR), right after initializing the virtio-blk-pci device, using
100% guest CPU. See also
https://bugzilla.redhat.com/show_bug.cgi?id=1562750 for the original
bug analysis.)

aio_notify() only injects an event when ctx->notify_me is set;
correspondingly, aio_notify_accept() is only useful when
ctx->notify_me _was_ set. Move the call to it into the "blocking"
branch. This will effectively skip [1] and fix the hang.

Furthermore, blocking aio_poll() is only allowed on the home thread
(in_aio_context_home_thread), because otherwise two blocking
aio_poll() calls can steal each other's ctx->notifier event and cause
a hang just like the one described above.

Cc: qemu-stable@nongnu.org
Suggested-by: Paolo Bonzini
Signed-off-by: Fam Zheng
Message-Id: <20180809132259.18402-3-famz@redhat.com>
Signed-off-by: Fam Zheng
(cherry picked from commit b37548fcd1b8ac2e88e185a395bef851f3fc4e65)
Signed-off-by: Michael Roth
---
 util/aio-posix.c | 4 ++--
 util/aio-win32.c | 3 ++-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/util/aio-posix.c b/util/aio-posix.c
index b5c7f463aa..b5c609b68b 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -591,6 +591,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
      * so disable the optimization now.
      */
     if (blocking) {
+        assert(in_aio_context_home_thread(ctx));
         atomic_add(&ctx->notify_me, 2);
     }
 
@@ -633,6 +634,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
 
     if (blocking) {
         atomic_sub(&ctx->notify_me, 2);
+        aio_notify_accept(ctx);
     }
 
     /* Adjust polling time */
@@ -676,8 +678,6 @@ bool aio_poll(AioContext *ctx, bool blocking)
         }
     }
 
-    aio_notify_accept(ctx);
-
     /* if we have any readable fds, dispatch event */
     if (ret > 0) {
         for (i = 0; i < npfd; i++) {
diff --git a/util/aio-win32.c b/util/aio-win32.c
index e676a8d9b2..c58957cc4b 100644
--- a/util/aio-win32.c
+++ b/util/aio-win32.c
@@ -373,11 +373,12 @@ bool aio_poll(AioContext *ctx, bool blocking)
         ret = WaitForMultipleObjects(count, events, FALSE, timeout);
         if (blocking) {
             assert(first);
+            assert(in_aio_context_home_thread(ctx));
             atomic_sub(&ctx->notify_me, 2);
+            aio_notify_accept(ctx);
         }
 
         if (first) {
-            aio_notify_accept(ctx);
             progress |= aio_bh_poll(ctx);
             first = false;
         }
-- 
2.17.1
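
For readers who want to replay the race from the commit message, here is
a minimal single-threaded sketch. It is not QEMU code: ToyCtx, the toy_*
helpers and iothread_gets_woken() are hypothetical names invented for
this illustration, and the ctx->notifier event is reduced to a plain
boolean. It only models the ordering of steps, assuming the non-blocking
aio_poll() runs after aio_notify() and before the IOThread's ppoll()
sees the event.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for AioContext: only the two fields the race involves. */
typedef struct {
    int notify_me;      /* nonzero while a blocking poller wants a wakeup */
    bool event_pending; /* models the ctx->notifier event */
} ToyCtx;

/* Models aio_notify(): inject an event only if notify_me is set. */
static void toy_notify(ToyCtx *ctx)
{
    if (ctx->notify_me) {
        ctx->event_pending = true;
    }
}

/* Models aio_notify_accept(): consume (clear) the event. */
static void toy_notify_accept(ToyCtx *ctx)
{
    ctx->event_pending = false;
}

/* Models the relevant part of aio_poll(ctx, false) on a foreign thread. */
static void toy_poll_nonblocking(ToyCtx *ctx, bool accept_unconditionally)
{
    if (accept_unconditionally) {
        /* Pre-patch behavior: accept on every aio_poll() call. */
        toy_notify_accept(ctx);
    }
    /* Post-patch behavior: accept only in the blocking branch,
     * so a non-blocking call leaves the event alone. */
}

/* Replays the sequence; returns true if the IOThread would be woken. */
static bool iothread_gets_woken(bool accept_unconditionally)
{
    ToyCtx ctx = { 0, false };

    ctx.notify_me += 2;     /* IOThread: aio_poll(ctx, true), about to ppoll() */
    assert(ctx.notify_me == 2);
    toy_notify(&ctx);       /* main thread: aio_notify() */
    toy_poll_nonblocking(&ctx, accept_unconditionally); /* main thread: [1] */

    return ctx.event_pending; /* would the blocked ppoll() return? */
}
```

In this model iothread_gets_woken(true) returns false, which corresponds
to the stolen event and the hang at [1], while iothread_gets_woken(false)
returns true because the non-blocking call no longer clears the event.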