From: Rao Lei
To: eblake@redhat.com, vsementsov@virtuozzo.com, kwolf@redhat.com, hreitz@redhat.com, chen.zhang@intel.com
Cc: Rao Lei, qemu-devel@nongnu.org, qemu-block@nongnu.org
Subject: [PATCH] block/nbd.c: Fixed IO request coroutine not being wakeup when kill NBD server.
Date: Thu, 3 Mar 2022 10:21:45 +0800
Message-Id: <20220303022145.328112-1-lei.rao@intel.com>

During stress testing, the IO request coroutine is occasionally never
woken up when the NBD server is killed.
The GDB stack is as follows:

(gdb) bt
0 0x00007f2ff990cbf6 in __ppoll (fds=0x55575de85000, nfds=1, timeout=<optimized out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
1 0x000055575c302e7c in qemu_poll_ns (fds=0x55575de85000, nfds=1, timeout=599999603140) at ../util/qemu-timer.c:348
2 0x000055575c2d3c34 in fdmon_poll_wait (ctx=0x55575dc480f0, ready_list=0x7ffd9dd1dae0, timeout=599999603140) at ../util/fdmon-poll.c:80
3 0x000055575c2d350d in aio_poll (ctx=0x55575dc480f0, blocking=true) at ../util/aio-posix.c:655
4 0x000055575c16eabd in bdrv_do_drained_begin (bs=0x55575dee7fe0, recursive=false, parent=0x0, ignore_bds_parents=false, poll=true) at ../block/io.c:474
5 0x000055575c16eba6 in bdrv_drained_begin (bs=0x55575dee7fe0) at ../block/io.c:480
6 0x000055575c1aff33 in quorum_del_child (bs=0x55575dee7fe0, child=0x55575dcea690, errp=0x7ffd9dd1dd08) at ../block/quorum.c:1130
7 0x000055575c14239b in bdrv_del_child (parent_bs=0x55575dee7fe0, child=0x55575dcea690, errp=0x7ffd9dd1dd08) at ../block.c:7705
8 0x000055575c12da28 in qmp_x_blockdev_change (parent=0x55575df404c0 "colo-disk0", has_child=true, child=0x55575de867f0 "children.1", has_node=false, node=0x0, errp=0x7ffd9dd1dd08) at ../blockdev.c:3676
9 0x000055575c258435 in qmp_marshal_x_blockdev_change (args=0x7f2fec008190, ret=0x7f2ff7b0bd98, errp=0x7f2ff7b0bd90) at qapi/qapi-commands-block-core.c:1675
10 0x000055575c2c6201 in do_qmp_dispatch_bh (opaque=0x7f2ff7b0be30) at ../qapi/qmp-dispatch.c:129
11 0x000055575c2ebb1c in aio_bh_call (bh=0x55575dc429c0) at ../util/async.c:141
12 0x000055575c2ebc2a in aio_bh_poll (ctx=0x55575dc480f0) at ../util/async.c:169
13 0x000055575c2d2d96 in aio_dispatch (ctx=0x55575dc480f0) at ../util/aio-posix.c:415
14 0x000055575c2ec07f in aio_ctx_dispatch (source=0x55575dc480f0, callback=0x0, user_data=0x0) at ../util/async.c:311
15 0x00007f2ff9e7cfbd in g_main_context_dispatch () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
16 0x000055575c2fd581 in glib_pollfds_poll () at ../util/main-loop.c:232
17 0x000055575c2fd5ff in os_host_main_loop_wait (timeout=0) at ../util/main-loop.c:255
18 0x000055575c2fd710 in main_loop_wait (nonblocking=0) at ../util/main-loop.c:531
19 0x000055575bfa7588 in qemu_main_loop () at ../softmmu/runstate.c:726
20 0x000055575bbee57a in main (argc=60, argv=0x7ffd9dd1e0e8, envp=0x7ffd9dd1e2d0) at ../softmmu/main.c:50

(gdb) qemu coroutine 0x55575e16aac0
0 0x000055575c2ee7dc in qemu_coroutine_switch (from_=0x55575e16aac0, to_=0x7f2ff830fba0, action=COROUTINE_YIELD) at ../util/coroutine-ucontext.c:302
1 0x000055575c2fe2a9 in qemu_coroutine_yield () at ../util/qemu-coroutine.c:195
2 0x000055575c2fe93c in qemu_co_queue_wait_impl (queue=0x55575dc46170, lock=0x7f2b32ad9850) at ../util/qemu-coroutine-lock.c:56
3 0x000055575c17ddfb in nbd_co_send_request (bs=0x55575ebfaf20, request=0x7f2b32ad9920, qiov=0x55575dfc15d8) at ../block/nbd.c:478
4 0x000055575c17f931 in nbd_co_request (bs=0x55575ebfaf20, request=0x7f2b32ad9920, write_qiov=0x55575dfc15d8) at ../block/nbd.c:1182
5 0x000055575c17fe14 in nbd_client_co_pwritev (bs=0x55575ebfaf20, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, flags=0) at ../block/nbd.c:1284
6 0x000055575c170d25 in bdrv_driver_pwritev (bs=0x55575ebfaf20, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, qiov_offset=0, flags=0) at ../block/io.c:1264
7 0x000055575c1733b4 in bdrv_aligned_pwritev (child=0x55575dff6890, req=0x7f2b32ad9ad0, offset=403487858688, bytes=4538368, align=1, qiov=0x55575dfc15d8, qiov_offset=0, flags=0) at ../block/io.c:2126
8 0x000055575c173c67 in bdrv_co_pwritev_part (child=0x55575dff6890, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, qiov_offset=0, flags=0) at ../block/io.c:2314
9 0x000055575c17391b in bdrv_co_pwritev (child=0x55575dff6890, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, flags=0) at ../block/io.c:2233
10 0x000055575c1ee506 in replication_co_writev (bs=0x55575e9824f0, sector_num=788062224, remaining_sectors=8864, qiov=0x55575dfc15d8, flags=0) at ../block/replication.c:270
11 0x000055575c170eed in bdrv_driver_pwritev (bs=0x55575e9824f0, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, qiov_offset=0, flags=0) at ../block/io.c:1297
12 0x000055575c1733b4 in bdrv_aligned_pwritev (child=0x55575dcea690, req=0x7f2b32ad9e00, offset=403487858688, bytes=4538368, align=512, qiov=0x55575dfc15d8, qiov_offset=0, flags=0) at ../block/io.c:2126
13 0x000055575c173c67 in bdrv_co_pwritev_part (child=0x55575dcea690, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, qiov_offset=0, flags=0) at ../block/io.c:2314
14 0x000055575c17391b in bdrv_co_pwritev (child=0x55575dcea690, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, flags=0) at ../block/io.c:2233
15 0x000055575c1aeffa in write_quorum_entry (opaque=0x7f2fddaf8c50) at ../block/quorum.c:699
16 0x000055575c2ee4db in coroutine_trampoline (i0=1578543808, i1=21847) at ../util/coroutine-ucontext.c:173
17 0x00007f2ff9855660 in __start_context () at ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91

When we do failover in COLO mode, QEMU hangs while it is waiting for the
in-flight IO to complete. From the call trace, we can see that the IO
request coroutine, which is waiting for send_mutex, has yielded in
nbd_co_send_request(). When the NBD server is killed, nothing ever wakes
this coroutine up, so the drain in the main thread never finishes.
Therefore, it is necessary to wake up the coroutine in nbd_channel_error().

Signed-off-by: Rao Lei
---
 block/nbd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/nbd.c b/block/nbd.c
index 5853d85d60..cf9dda537c 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -167,6 +167,7 @@ static void nbd_channel_error(BDRVNBDState *s, int ret)
         s->state = NBD_CLIENT_QUIT;
     }
 
+    qemu_co_queue_restart_all(&s->free_sema);
     nbd_recv_coroutines_wake(s, true);
 }
 
-- 
2.32.0