From nobody Wed May 1 00:40:43 2024
From: Eric Blake
To: qemu-devel@nongnu.org
Date: Tue, 10 Apr 2018 08:21:58 -0500
Message-Id: <20180410132200.187832-2-eblake@redhat.com>
In-Reply-To: <20180410132200.187832-1-eblake@redhat.com>
References: <20180410132200.187832-1-eblake@redhat.com>
Subject: [Qemu-devel] [PULL 1/3] iotests: fix wait_until_completed()
Cc: Kevin Wolf, "open list:Block layer core", Peter Xu, Max Reitz

From: Peter Xu

If there is more than one event, wait_until_completed() might return
the second event even if the first event is BLOCK_JOB_COMPLETED,
because the for loop keeps running after completed is set to True.
This never happened before, but it can be triggered when OOB is
enabled, due to the RESUME startup message. Fix that up.
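
To illustrate the failure mode described above, here is a minimal,
self-contained sketch. It is a toy model, not the iotests code: the
helper names wait_old() and wait_new() and the hand-built event batch
are assumptions made for illustration, and the QMP events are reduced
to plain dicts.

```python
# Toy model of the loop in wait_until_completed(). The function names and
# the hand-built event batch are illustrative assumptions, not iotests APIs.

def wait_old(batches):
    """Pre-patch logic: the flag is set, but the inner for loop keeps going."""
    batches = iter(batches)
    completed = False
    while not completed:
        for event in next(batches):
            if event['event'] == 'BLOCK_JOB_COMPLETED':
                completed = True
    # 'event' is whatever the for loop saw last, not necessarily the match.
    return event


def wait_new(batches):
    """Post-patch logic: return as soon as the completion event is seen."""
    batches = iter(batches)
    while True:
        for event in next(batches):
            if event['event'] == 'BLOCK_JOB_COMPLETED':
                return event


# One batch in which the completion event is followed by another event.
batch = [{'event': 'BLOCK_JOB_COMPLETED'}, {'event': 'RESUME'}]
print(wait_old([batch])['event'])  # RESUME
print(wait_new([batch])['event'])  # BLOCK_JOB_COMPLETED
```

With the old loop shape the caller gets back the trailing event; with
the new one it gets the completion event itself.
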
Signed-off-by: Peter Xu
Message-Id: <20180408030542.17855-1-peterx@redhat.com>
Reviewed-by: Fam Zheng
Reviewed-by: Stefan Hajnoczi
Signed-off-by: Eric Blake
---
 tests/qemu-iotests/iotests.py | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index b5d7945af88..119c8e270a5 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -470,18 +470,15 @@ class QMPTestCase(unittest.TestCase):
 
     def wait_until_completed(self, drive='drive0', check_offset=True):
         '''Wait for a block job to finish, returning the event'''
-        completed = False
-        while not completed:
+        while True:
             for event in self.vm.get_qmp_events(wait=True):
                 if event['event'] == 'BLOCK_JOB_COMPLETED':
                     self.assert_qmp(event, 'data/device', drive)
                     self.assert_qmp_absent(event, 'data/error')
                     if check_offset:
                         self.assert_qmp(event, 'data/offset', event['data']['len'])
-                    completed = True
-
-        self.assert_no_active_block_jobs()
-        return event
+                    self.assert_no_active_block_jobs()
+                    return event
 
     def wait_ready(self, drive='drive0'):
         '''Wait until a block job BLOCK_JOB_READY event'''
-- 
2.14.3

From nobody Wed May 1 00:40:43 2024
From: Eric Blake
To: qemu-devel@nongnu.org
Date: Tue, 10 Apr 2018 08:21:59 -0500
Message-Id: <20180410132200.187832-3-eblake@redhat.com>
In-Reply-To: <20180410132200.187832-1-eblake@redhat.com>
References: <20180410132200.187832-1-eblake@redhat.com>
Subject: [Qemu-devel] [PULL 2/3] iothread: workaround glib bug which hangs qmp-test
Cc: Peter Xu

From: Peter Xu

Free the AIO context earlier than the GMainContext (if we have one) to
work around a glib2 bug where the GSource context pointer is not
cleared even though the context has already been destroyed (it should
be). The patch only changes the order in which the objects are
destroyed; there is no functional change.

Without this workaround, qmp-test can hang with OOB (and possibly in
any other use case where an iothread is used with GMainContexts):

  #0  0x00007f35ffe45334 in __lll_lock_wait () from /lib64/libpthread.so.0
  #1  0x00007f35ffe405d8 in _L_lock_854 () from /lib64/libpthread.so.0
  #2  0x00007f35ffe404a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
  #3  0x00007f35fc5b9c9d in g_source_unref_internal (source=0x24f0600, context=0x7f35f0000960, have_lock=0) at gmain.c:1685
  #4  0x0000000000aa6672 in aio_context_unref (ctx=0x24f0600) at /root/qemu/util/async.c:497
  #5  0x000000000065851c in iothread_instance_finalize (obj=0x24f0380) at /root/qemu/iothread.c:129
  #6  0x0000000000962d79 in object_deinit (obj=0x24f0380, type=0x242e960) at /root/qemu/qom/object.c:462
  #7  0x0000000000962e0d in object_finalize (data=0x24f0380) at /root/qemu/qom/object.c:476
  #8  0x0000000000964146 in object_unref (obj=0x24f0380) at /root/qemu/qom/object.c:924
  #9  0x0000000000965880 in object_finalize_child_property (obj=0x24ec640, name=0x24efca0 "mon_iothread", opaque=0x24f0380) at /root/qemu/qom/object.c:1436
  #10 0x0000000000962c33 in object_property_del_child (obj=0x24ec640, child=0x24f0380, errp=0x0) at /root/qemu/qom/object.c:436
  #11 0x0000000000962d26 in object_unparent (obj=0x24f0380) at /root/qemu/qom/object.c:455
  #12 0x0000000000658f00 in iothread_destroy (iothread=0x24f0380) at /root/qemu/iothread.c:365
  #13 0x00000000004c67a8 in monitor_cleanup () at /root/qemu/monitor.c:4663
  #14 0x0000000000669e27 in main (argc=16, argv=0x7ffc8b1ae2f8, envp=0x7ffc8b1ae380) at /root/qemu/vl.c:4749

The glib2 bug is fixed in commit 26056558b ("gmain: allow
g_source_get_context() on destroyed sources", 2012-07-30), so the
first good version is glib2 2.33.10. But we still support building
with glib as old as 2.28, so we need the workaround.

Let's make sure we destroy the GSources before their owner context
until we drop support for glib older than 2.33.10.

Signed-off-by: Peter Xu
Message-Id: <20180409083956.1780-1-peterx@redhat.com>
Reviewed-by: Eric Blake
Reviewed-by: Stefan Hajnoczi
Signed-off-by: Eric Blake
---
 iothread.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/iothread.c b/iothread.c
index e675c384422..aff12812576 100644
--- a/iothread.c
+++ b/iothread.c
@@ -117,16 +117,26 @@ static void iothread_instance_finalize(Object *obj)
     IOThread *iothread = IOTHREAD(obj);
 
     iothread_stop(iothread);
+    /*
+     * Before glib2 2.33.10, there is a glib2 bug that GSource context
+     * pointer may not be cleared even if the context has already been
+     * destroyed (while it should). Here let's free the AIO context
+     * earlier to bypass that glib bug.
+     *
+     * We can remove this comment after the minimum supported glib2
+     * version boosts to 2.33.10. Before that, let's free the
+     * GSources first before destroying any GMainContext.
+     */
+    if (iothread->ctx) {
+        aio_context_unref(iothread->ctx);
+        iothread->ctx = NULL;
+    }
     if (iothread->worker_context) {
         g_main_context_unref(iothread->worker_context);
         iothread->worker_context = NULL;
     }
     qemu_cond_destroy(&iothread->init_done_cond);
     qemu_mutex_destroy(&iothread->init_done_lock);
-    if (!iothread->ctx) {
-        return;
-    }
-    aio_context_unref(iothread->ctx);
 }
 
 static void iothread_complete(UserCreatable *obj, Error **errp)
-- 
2.14.3

From nobody Wed May 1 00:40:43 2024
From: Eric Blake
To: qemu-devel@nongnu.org
Date: Tue, 10 Apr 2018 08:22:00 -0500
Message-Id: <20180410132200.187832-4-eblake@redhat.com>
In-Reply-To: <20180410132200.187832-1-eblake@redhat.com>
References: <20180410132200.187832-1-eblake@redhat.com>
Subject: [Qemu-devel] [PULL 3/3] monitor: bind dispatch bh to iohandler context
Cc: Fam Zheng, Stefan Hajnoczi, Markus Armbruster, Peter Xu, "Dr. David Alan Gilbert"

From: Peter Xu

Eric Auger reported a few days ago that OOB broke ARM when running
with libvirt:

  http://lists.gnu.org/archive/html/qemu-devel/2018-03/msg06231.html

The problem was that the monitor dispatcher bottom half was now bound
to qemu_aio_context, which can be polled unexpectedly in block code.
We should keep the dispatchers running in iohandler_ctx, just as we
did before the Out-Of-Band series (chardev uses qio, and qio binds
everything with iohandler_ctx).

Without this change, the QMP dispatcher might run in the block I/O
path even before the main loop is reached, for example in a stack like
this one (the ARM case, where the "cont" command handler runs during
the machine init phase):

  #0  qmp_cont ()
  #1  0x00000000006bd210 in qmp_marshal_cont ()
  #2  0x0000000000ac05c4 in do_qmp_dispatch ()
  #3  0x0000000000ac07a0 in qmp_dispatch ()
  #4  0x0000000000472d60 in monitor_qmp_dispatch_one ()
  #5  0x000000000047302c in monitor_qmp_bh_dispatcher ()
  #6  0x0000000000acf374 in aio_bh_call ()
  #7  0x0000000000acf428 in aio_bh_poll ()
  #8  0x0000000000ad5110 in aio_poll ()
  #9  0x0000000000a08ab8 in blk_prw ()
  #10 0x0000000000a091c4 in blk_pread ()
  #11 0x0000000000734f94 in pflash_cfi01_realize ()
  #12 0x000000000075a3a4 in device_set_realized ()
  #13 0x00000000009a26cc in property_set_bool ()
  #14 0x00000000009a0a40 in object_property_set ()
  #15 0x00000000009a3a08 in object_property_set_qobject ()
  #16 0x00000000009a0c8c in object_property_set_bool ()
  #17 0x0000000000758f94 in qdev_init_nofail ()
  #18 0x000000000058e190 in create_one_flash ()
  #19 0x000000000058e2f4 in create_flash ()
  #20 0x00000000005902f0 in machvirt_init ()
  #21 0x00000000007635cc in machine_run_board_init ()
  #22 0x00000000006b135c in main ()

Actually the problem is more severe than that.
After we switched to qemu_aio_context, the monitor dispatcher code can
even be called from a nested aio_poll(), that is, an explicit
aio_poll() inside another main loop aio_poll(), which could be racy
too, breaking code like TPM and 9p that use nested event loops.

Switch the monitor dispatchers to iohandler_ctx.

My sincere thanks to Eric Auger, who offered great help with both
debugging and verifying the problem.

The ARM test was carried out by applying this patch on top of QEMU
2.12.0-rc0, and the problem is gone with the patch applied.

A quick test of mine shows that with this patch applied, all raw
iotests pass even with OOB on by default.

CC: Eric Blake
CC: Markus Armbruster
CC: Stefan Hajnoczi
CC: Fam Zheng
Reported-by: Eric Auger
Tested-by: Eric Auger
Signed-off-by: Peter Xu
Message-Id: <20180410044942.17059-1-peterx@redhat.com>
Reviewed-by: Eric Blake
Reviewed-by: Stefan Hajnoczi
Signed-off-by: Eric Blake
---
 monitor.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/monitor.c b/monitor.c
index 51f4cf480f8..39f8ee17ba7 100644
--- a/monitor.c
+++ b/monitor.c
@@ -4467,7 +4467,7 @@ static void monitor_iothread_init(void)
      * have assumption to be run on main loop thread. It would be
      * nice that one day we can remove this assumption in the future.
      */
-    mon_global.qmp_dispatcher_bh = aio_bh_new(qemu_get_aio_context(),
+    mon_global.qmp_dispatcher_bh = aio_bh_new(iohandler_get_aio_context(),
                                               monitor_qmp_bh_dispatcher,
                                               NULL);
 
-- 
2.14.3
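
As a rough illustration of the reasoning in the commit message for the
monitor change, the sketch below models two event contexts as simple
callback queues. Everything in it is hypothetical: ToyContext, run()
and the log strings are invented for this example and are not QEMU
APIs. It only assumes what the commit message states, namely that
block code polls qemu_aio_context during machine init while
iohandler_ctx is left for the main loop.

```python
# Toy model: each "context" is a queue of scheduled callbacks (bottom halves).
# ToyContext and run() are hypothetical; they are not QEMU APIs.

class ToyContext:
    def __init__(self, name):
        self.name = name
        self.pending = []

    def schedule(self, callback):
        self.pending.append(callback)

    def poll(self):
        """Run everything currently scheduled on this context."""
        callbacks, self.pending = self.pending, []
        for callback in callbacks:
            callback()


def run(dispatcher_ctx_name):
    log = []
    contexts = {'qemu_aio': ToyContext('qemu_aio'),
                'iohandler': ToyContext('iohandler')}

    # A QMP command arrives early; its dispatcher is queued on the chosen
    # context (the analogue of the aio_bh_new() argument in the patch).
    contexts[dispatcher_ctx_name].schedule(lambda: log.append('qmp dispatch'))

    # Machine init: synchronous block I/O polls only the qemu_aio context.
    log.append('machine init: blk_pread')
    contexts['qemu_aio'].poll()
    log.append('machine init done')

    # Main loop: both contexts are serviced.
    for ctx in contexts.values():
        ctx.poll()
    return log


print(run('qemu_aio'))   # dispatcher runs in the middle of machine init
print(run('iohandler'))  # dispatcher waits for the main loop
```

In this model the dispatcher bound to the 'qemu_aio' queue fires
between the two machine-init log entries, while the one bound to
'iohandler' only fires once the main loop runs.
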