Date: Tue, 17 Dec 2019 01:34:00 -0000
From: Fred Kimmy
To: qemu-devel@nongnu.org
X-Launchpad-Notification-Type: bug
X-Launchpad-Bug: product=kunpeng920; status=Confirmed; importance=Undecided; assignee=None;
X-Launchpad-Bug: product=qemu; status=In Progress; importance=Undecided; assignee=rafaeldtinoco@kernelpath.com;
X-Launchpad-Bug: distribution=ubuntu; sourcepackage=qemu; component=main; status=Confirmed; importance=Medium; assignee=rafaeldtinoco@kernelpath.com;
X-Launchpad-Bug: distribution=ubuntu; distroseries=bionic; sourcepackage=qemu; component=main; status=Confirmed; importance=Medium; assignee=None;
X-Launchpad-Bug: distribution=ubuntu; distroseries=disco; sourcepackage=qemu; component=main; status=Confirmed; importance=Medium; assignee=None;
X-Launchpad-Bug: distribution=ubuntu; distroseries=eoan; sourcepackage=qemu; component=main; status=In Progress; importance=Medium; assignee=rafaeldtinoco@kernelpath.com;
X-Launchpad-Bug: distribution=ubuntu; distroseries=focal; sourcepackage=qemu; component=main; status=Confirmed; importance=Medium; assignee=None;
X-Launchpad-Bug-Tags: qemu-img
X-Launchpad-Bug-Information-Type: Public
X-Launchpad-Bug-Private: no
X-Launchpad-Bug-Security-Vulnerability: no
X-Launchpad-Bug-Commenters: dannf jan-glauber-i jnsnow kongzizaixian lizhengui rafaeldtinoco
X-Launchpad-Bug-Reporter: dann frazier (dannf)
X-Launchpad-Bug-Modifier: Fred Kimmy (kongzizaixian)
References: <154327283728.15443.11625169757714443608.malonedeb@soybean.canonical.com>
Message-Id: <157654644083.14692.7869547192402289701.malone@wampee.canonical.com>
Subject: [Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images
Reply-To: Bug 1805256 <1805256@bugs.launchpad.net>

 include/block/aio.h | 3 +++
 qemu-img.c          | 4 ++++
 util/async.c        | 5 +----
 3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/include/block/aio.h b/include/block/aio.h
index e9bc04c..9153d87 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -89,6 +89,9 @@ struct AioContext {
      */
     uint32_t notify_me;
 
+    /* force to notify for qemu-img convert */
+    bool notify_for_convert;
+
     /* lock to protect between bh's adders and deleter */
     QemuMutex bh_lock;
 
diff --git a/qemu-img.c b/qemu-img.c
index 60a2be3..cf037aa 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2411,6 +2411,10 @@ static int img_convert(int argc, char **argv)
         .wr_in_order = wr_in_order,
         .num_coroutines = num_coroutines,
     };
+
+    AioContext *ctx = qemu_get_aio_context();
+    ctx->notify_for_convert = 1;
+
     ret = convert_do_copy(&state);
 
 out:
diff --git a/util/async.c b/util/async.c
index 042bf8a..af235fc 100644
--- a/util/async.c
+++ b/util/async.c
@@ -336,12 +336,9 @@ void aio_notify(AioContext *ctx)
      * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
      */
     smp_mb();
-    if (ctx->notify_me) {
+    if (ctx->notify_me || ctx->notify_for_convert) {
         event_notifier_set(&ctx->notifier);
         atomic_mb_set(&ctx->notified, true);
-#if defined(__aarch64__)
-        kill(getpid(), SIGIO);
-#endif
     }
 }

Can you try the patch above to see whether it solves the problem?

-- 
You received this bug notification because you are a member of
qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on rcu_call_ready_event logic in Aarch64 when
  converting images

Status in kunpeng920:
  Confirmed
Status in QEMU:
  In Progress
Status in qemu package in Ubuntu:
  Confirmed
Status in qemu source package in Bionic:
  Confirmed
Status in qemu source package in Disco:
  Confirmed
Status in qemu source package in Eoan:
  In Progress
Status in qemu source package in Focal:
  Confirmed

Bug description:
  Command:

  qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Hangs indefinitely in approximately 30% of runs.

  ----

  Workaround:

  qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Run "qemu-img convert" with "a single coroutine" to avoid this issue.

  ----

  (gdb) thread 1
  ...
  (gdb) bt
  #0 0x0000ffffbf1ad81c in __GI_ppoll
  #1 0x0000aaaaaabcf73c in ppoll
  #2 qemu_poll_ns
  #3 0x0000aaaaaabd0764 in os_host_main_loop_wait
  #4 main_loop_wait
  ...

  (gdb) thread 2
  ...
  (gdb) bt
  #0 syscall ()
  #1 0x0000aaaaaabd41cc in qemu_futex_wait
  #2 qemu_event_wait (ev=ev@entry=0xaaaaaac86ce8 )
  #3 0x0000aaaaaabed05c in call_rcu_thread
  #4 0x0000aaaaaabd34c8 in qemu_thread_start
  #5 0x0000ffffbf25c880 in start_thread
  #6 0x0000ffffbf1b6b9c in thread_start ()

  (gdb) thread 3
  ...
  (gdb) bt
  #0 0x0000ffffbf11aa20 in __GI___sigtimedwait
  #1 0x0000ffffbf2671b4 in __sigwait
  #2 0x0000aaaaaabd1ddc in sigwait_compat
  #3 0x0000aaaaaabd34c8 in qemu_thread_start
  #4 0x0000ffffbf25c880 in start_thread
  #5 0x0000ffffbf1b6b9c in thread_start

  ----

  (gdb) run
  Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2 ./disk01.ext4.qcow2 ./output.qcow2

  [New Thread 0xffffbec5ad90 (LWP 72839)]
  [New Thread 0xffffbe459d90 (LWP 72840)]
  [New Thread 0xffffbdb57d90 (LWP 72841)]
  [New Thread 0xffffacac9d90 (LWP 72859)]
  [New Thread 0xffffa7ffed90 (LWP 72860)]
  [New Thread 0xffffa77fdd90 (LWP 72861)]
  [New Thread 0xffffa6ffcd90 (LWP 72862)]
  [New Thread 0xffffa67fbd90 (LWP 72863)]
  [New Thread 0xffffa5ffad90 (LWP 72864)]

  [Thread 0xffffa5ffad90 (LWP 72864) exited]
  [Thread 0xffffa6ffcd90 (LWP 72862) exited]
  [Thread 0xffffa77fdd90 (LWP 72861) exited]
  [Thread 0xffffbdb57d90 (LWP 72841) exited]
  [Thread 0xffffa67fbd90 (LWP 72863) exited]
  [Thread 0xffffacac9d90 (LWP 72859) exited]
  [Thread 0xffffa7ffed90 (LWP 72860) exited]

  """

  All the remaining tasks are blocked in a system call, so no task is
  left to call qemu_futex_wake() to unblock thread #2 (in futex()),
  which would in turn unblock thread #1 (doing poll() on a pipe shared
  with thread #2).

  Those 7 threads exit before disk conversion is complete (sometimes
  at the beginning, sometimes at the end).

  ----

  [ Original Description ]

  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every
  time it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0 0x0000ffffae4f8154 in __GI_ppoll (fds=0xaaaae8a67dc0, nfds=187650274213760,
      timeout=, timeout@entry=0x0, sigmask=0xffffc123b950)
      at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1 0x0000aaaabbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=,
      __fds=) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
  #2 qemu_poll_ns (fds=, nfds=,
      timeout=timeout@entry=-1) at util/qemu-timer.c:322
  #3 0x0000aaaabbefbf80 in os_host_main_loop_wait (timeout=-1)
      at util/main-loop.c:233
  #4 main_loop_wait (nonblocking=) at util/main-loop.c:497
  #5 0x0000aaaabbe2aa30 in convert_do_copy (s=0xffffc123bb58) at qemu-img.c:1980
  #6 img_convert (argc=, argv=) at qemu-img.c:2456
  #7 0x0000aaaabbe2333c in main (argc=7, argv=) at qemu-img.c:4975

  Reproduced w/ latest QEMU git (@ 53744e0a182)

To manage notifications about this bug go to:
https://bugs.launchpad.net/kunpeng920/+bug/1805256/+subscriptions
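
For anyone reading the patch without the QEMU tree at hand, here is a rough, single-threaded sketch of the one condition it changes in aio_notify(). It is not QEMU code: ToyAioContext, toy_ctx_init() and toy_aio_notify() are made-up names, the wakeups counter stands in for event_notifier_set(), and the C11 fence stands in for smp_mb(). It assumes, as the diff suggests, that a wakeup is skipped whenever notify_me reads as zero and the new flag is not set.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the AioContext fields the patch touches. */
typedef struct {
    atomic_uint notify_me;    /* non-zero while the event loop is about to block */
    bool notify_for_convert;  /* flag added by the patch */
    atomic_bool notified;     /* set once a wakeup has been sent */
    unsigned wakeups;         /* counts simulated event_notifier_set() calls */
} ToyAioContext;

static void toy_ctx_init(ToyAioContext *ctx)
{
    atomic_init(&ctx->notify_me, 0);
    ctx->notify_for_convert = false;
    atomic_init(&ctx->notified, false);
    ctx->wakeups = 0;
}

/*
 * Models the patched condition. Returns true if a wakeup was sent.
 * If the notifier reads notify_me as zero while the event loop is
 * already committed to blocking, the wakeup is skipped and the loop
 * can sleep forever; the extra flag makes the check always pass.
 */
static bool toy_aio_notify(ToyAioContext *ctx)
{
    /* Assumed equivalent of smp_mb(): a full sequentially consistent fence. */
    atomic_thread_fence(memory_order_seq_cst);

    if (atomic_load(&ctx->notify_me) || ctx->notify_for_convert) {
        ctx->wakeups++;
        atomic_store(&ctx->notified, true);
        return true;
    }
    return false;
}
```

With notify_for_convert set, every call sends a wakeup regardless of what notify_me reads as. That would mask a missed notify_me update rather than explain it, so the patch reads more like a test to narrow the problem down than a final fix.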