From: Sergio Lopez
To: kwolf@redhat.com, mreitz@redhat.com,
 qemu-block@nongnu.org
Date: Wed, 5 Sep 2018 13:23:34 +0200
Message-Id: <20180905112334.26911-1-slp@redhat.com>
Subject: [Qemu-devel] [PATCH] block/linux-aio: acquire AioContext before qemu_laio_process_completions
Cc: qemu-devel@nongnu.org, Sergio Lopez

In qemu_laio_process_completions_and_submit, the AioContext is acquired
before the ioq_submit iteration and after qemu_laio_process_completions,
but the latter is not thread safe either.

This change avoids a number of random crashes when the Main Thread and
an IO Thread collide processing completions for the same AioContext.
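The ordering issue described above can be sketched outside QEMU with plain pthreads. This is only an illustration under stated assumptions: ToyAioState, the toy_* functions and TOY_ITERATIONS are hypothetical names, a pthread mutex stands in for the AioContext lock, and "completion" is reduced to moving counters around. It shows the patched ordering, where the lock is taken before completions are processed, so two threads can run the same path without corrupting the shared state.

```c
#include <assert.h>
#include <pthread.h>

#define TOY_ITERATIONS 100000

/* Hypothetical stand-in for LinuxAioState; not QEMU's real structure. */
typedef struct {
    pthread_mutex_t lock;        /* plays the role of the AioContext lock */
    unsigned int pending;        /* queued, not yet submitted */
    unsigned int in_flight;      /* submitted, not yet completed */
    unsigned long completed;     /* total requests reaped so far */
} ToyAioState;

/* Not thread safe on its own: the caller must hold s->lock. */
static void toy_process_completions(ToyAioState *s)
{
    s->completed += s->in_flight;   /* pretend every in-flight request finished */
    s->in_flight = 0;
}

/* Same ordering as the patched function: lock first, then reap, then submit. */
static void toy_process_completions_and_submit(ToyAioState *s)
{
    pthread_mutex_lock(&s->lock);
    toy_process_completions(s);
    if (s->pending > 0) {
        s->in_flight += s->pending; /* stand-in for ioq_submit() */
        s->pending = 0;
    }
    pthread_mutex_unlock(&s->lock);
}

/* Each thread queues one request, then reaps and submits, many times over. */
static void *toy_worker(void *opaque)
{
    ToyAioState *s = opaque;
    for (int i = 0; i < TOY_ITERATIONS; i++) {
        pthread_mutex_lock(&s->lock);
        s->pending++;
        pthread_mutex_unlock(&s->lock);
        toy_process_completions_and_submit(s);
    }
    return NULL;
}

/* Runs two competing threads and returns how many requests were reaped. */
static unsigned long toy_run(void)
{
    ToyAioState s = { .pending = 0, .in_flight = 0, .completed = 0 };
    pthread_t a, b;

    pthread_mutex_init(&s.lock, NULL);
    pthread_create(&a, NULL, toy_worker, &s);
    pthread_create(&b, NULL, toy_worker, &s);
    pthread_join(a, NULL);
    pthread_join(b, NULL);

    /* Two final passes drain whatever is still pending or in flight. */
    toy_process_completions_and_submit(&s);
    toy_process_completions_and_submit(&s);
    assert(s.pending == 0 && s.in_flight == 0);

    pthread_mutex_destroy(&s.lock);
    return s.completed;
}
```

Calling toy_run() from a main() built with -pthread should account for every queued request. Moving the pthread_mutex_lock() call below toy_process_completions() would reproduce the unpatched ordering, in which the counters are updated without the lock held.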
This is an example of such a crash:

 - The IO Thread is trying to acquire the AioContext at aio_co_enter,
   which shows that it had not locked it beforehand:

Thread 3 (Thread 0x7fdfd8bd8700 (LWP 36743)):
#0  0x00007fdfe0dd542d in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007fdfe0dd0de6 in _L_lock_870 () at /lib64/libpthread.so.0
#2  0x00007fdfe0dd0cdf in __GI___pthread_mutex_lock (mutex=mutex@entry=0x5631fde0e6c0) at ../nptl/pthread_mutex_lock.c:114
#3  0x00005631fc0603a7 in qemu_mutex_lock_impl (mutex=0x5631fde0e6c0, file=0x5631fc23520f "util/async.c", line=511) at util/qemu-thread-posix.c:66
#4  0x00005631fc05b558 in aio_co_enter (ctx=0x5631fde0e660, co=0x7fdfcc0c2b40) at util/async.c:493
#5  0x00005631fc05b5ac in aio_co_wake (co=) at util/async.c:478
#6  0x00005631fbfc51ad in qemu_laio_process_completion (laiocb=) at block/linux-aio.c:104
#7  0x00005631fbfc523c in qemu_laio_process_completions (s=s@entry=0x7fdfc0297670) at block/linux-aio.c:222
#8  0x00005631fbfc5499 in qemu_laio_process_completions_and_submit (s=0x7fdfc0297670) at block/linux-aio.c:237
#9  0x00005631fc05d978 in aio_dispatch_handlers (ctx=ctx@entry=0x5631fde0e660) at util/aio-posix.c:406
#10 0x00005631fc05e3ea in aio_poll (ctx=0x5631fde0e660, blocking=blocking@entry=true) at util/aio-posix.c:693
#11 0x00005631fbd7ad96 in iothread_run (opaque=0x5631fde0e1c0) at iothread.c:64
#12 0x00007fdfe0dcee25 in start_thread (arg=0x7fdfd8bd8700) at pthread_create.c:308
#13 0x00007fdfe0afc34d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

 - The Main Thread is also processing completions from the same
   AioContext, and crashes due to a failed assertion at util/iov.c:78:

Thread 1 (Thread 0x7fdfeb5eac80 (LWP 36740)):
#0  0x00007fdfe0a391f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007fdfe0a3a8e8 in __GI_abort () at abort.c:90
#2  0x00007fdfe0a32266 in __assert_fail_base (fmt=0x7fdfe0b84e68 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x5631fc238ccb "offset == 0", file=file@entry=0x5631fc23698e "util/iov.c", line=line@entry=78, function=function@entry=0x5631fc236adc <__PRETTY_FUNCTION__.15220> "iov_memset") at assert.c:92
#3  0x00007fdfe0a32312 in __GI___assert_fail (assertion=assertion@entry=0x5631fc238ccb "offset == 0", file=file@entry=0x5631fc23698e "util/iov.c", line=line@entry=78, function=function@entry=0x5631fc236adc <__PRETTY_FUNCTION__.15220> "iov_memset") at assert.c:101
#4  0x00005631fc065287 in iov_memset (iov=, iov_cnt=, offset=, offset@entry=65536, fillc=fillc@entry=0, bytes=15515191315812405248) at util/iov.c:78
#5  0x00005631fc065a63 in qemu_iovec_memset (qiov=, offset=offset@entry=65536, fillc=fillc@entry=0, bytes=) at util/iov.c:410
#6  0x00005631fbfc5178 in qemu_laio_process_completion (laiocb=0x7fdd920df630) at block/linux-aio.c:88
#7  0x00005631fbfc523c in qemu_laio_process_completions (s=s@entry=0x7fdfc0297670) at block/linux-aio.c:222
#8  0x00005631fbfc5499 in qemu_laio_process_completions_and_submit (s=0x7fdfc0297670) at block/linux-aio.c:237
#9  0x00005631fbfc54ed in qemu_laio_poll_cb (opaque=) at block/linux-aio.c:272
#10 0x00005631fc05d85e in run_poll_handlers_once (ctx=ctx@entry=0x5631fde0e660) at util/aio-posix.c:497
#11 0x00005631fc05e2ca in aio_poll (blocking=false, ctx=0x5631fde0e660) at util/aio-posix.c:574
#12 0x00005631fc05e2ca in aio_poll (ctx=0x5631fde0e660, blocking=blocking@entry=false) at util/aio-posix.c:604
#13 0x00005631fbfcb8a3 in bdrv_do_drained_begin (ignore_parent=, recursive=, bs=) at block/io.c:273
#14 0x00005631fbfcb8a3 in bdrv_do_drained_begin (bs=0x5631fe8b6200, recursive=, parent=0x0, ignore_bds_parents=, poll=) at block/io.c:390
#15 0x00005631fbfbcd2e in blk_drain (blk=0x5631fe83ac80) at block/block-backend.c:1590
#16 0x00005631fbfbe138 in blk_remove_bs (blk=blk@entry=0x5631fe83ac80) at block/block-backend.c:774
#17 0x00005631fbfbe3d6 in blk_unref (blk=0x5631fe83ac80) at block/block-backend.c:401
#18 0x00005631fbfbe3d6 in blk_unref (blk=0x5631fe83ac80) at block/block-backend.c:449
#19 0x00005631fbfc9a69 in commit_complete (job=0x5631fe8b94b0, opaque=0x7fdfcc1bb080) at block/commit.c:92
#20 0x00005631fbf7d662 in job_defer_to_main_loop_bh (opaque=0x7fdfcc1b4560) at job.c:973
#21 0x00005631fc05ad41 in aio_bh_poll (bh=0x7fdfcc01ad90) at util/async.c:90
#22 0x00005631fc05ad41 in aio_bh_poll (ctx=ctx@entry=0x5631fddffdb0) at util/async.c:118
#23 0x00005631fc05e210 in aio_dispatch (ctx=0x5631fddffdb0) at util/aio-posix.c:436
#24 0x00005631fc05ac1e in aio_ctx_dispatch (source=, callback=, user_data=) at util/async.c:261
#25 0x00007fdfeaae44c9 in g_main_context_dispatch (context=0x5631fde00140) at gmain.c:3201
#26 0x00007fdfeaae44c9 in g_main_context_dispatch (context=context@entry=0x5631fde00140) at gmain.c:3854
#27 0x00005631fc05d503 in main_loop_wait () at util/main-loop.c:215
#28 0x00005631fc05d503 in main_loop_wait (timeout=) at util/main-loop.c:238
#29 0x00005631fc05d503 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:497
#30 0x00005631fbd81412 in main_loop () at vl.c:1866
#31 0x00005631fbc18ff3 in main (argc=, argv=, envp=) at vl.c:4647

 - A closer examination shows that s->io_q.in_flight appears to have
   gone backwards:

(gdb) frame 7
#7  0x00005631fbfc523c in qemu_laio_process_completions (s=s@entry=0x7fdfc0297670) at block/linux-aio.c:222
222	            qemu_laio_process_completion(laiocb);
(gdb) p s
$2 = (LinuxAioState *) 0x7fdfc0297670
(gdb) p *s
$3 = {aio_context = 0x5631fde0e660, ctx = 0x7fdfeb43b000, e = {rfd = 33, wfd = 33}, io_q = {plugged = 0,
    in_queue = 0, in_flight = 4294967280, blocked = false, pending = {sqh_first = 0x0,
      sqh_last = 0x7fdfc0297698}}, completion_bh = 0x7fdfc0280ef0, event_idx = 21, event_max = 241}
(gdb) p/x s->io_q.in_flight
$4 = 0xfffffff0

Signed-off-by: Sergio Lopez
Reviewed-by: Paolo Bonzini
---
 block/linux-aio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index 19eb922fdd..217ce60138 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -234,9 +234,9 @@ static void qemu_laio_process_completions(LinuxAioState *s)
 
 static void qemu_laio_process_completions_and_submit(LinuxAioState *s)
 {
+    aio_context_acquire(s->aio_context);
     qemu_laio_process_completions(s);
 
-    aio_context_acquire(s->aio_context);
     if (!s->io_q.plugged && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
         ioq_submit(s);
     }
--
2.17.0
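
A note on reading the in_flight value from the gdb session: 4294967280 is 0xfffffff0, which equals 2^32 - 16. Assuming in_flight is a 32-bit unsigned counter (the printed value suggests this, but its declaration is not shown in this message), that is the value it would hold after being decremented 16 times more than it was incremented. The arithmetic can be checked with the small sketch below; both helper names are hypothetical and are not part of QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* Value an unsigned 32-bit counter holds after `decrements` steps down from `start`. */
static uint32_t toy_counter_after(uint32_t start, uint32_t decrements)
{
    return start - decrements;      /* wraps modulo 2^32 on underflow */
}

/* How far below zero a wrapped counter value is, i.e. the number of excess decrements. */
static uint32_t toy_excess_decrements(uint32_t observed)
{
    return UINT32_C(0) - observed;  /* negation modulo 2^32 */
}
```

With these helpers, toy_counter_after(0, 16) gives 4294967280 and toy_excess_decrements(0xfffffff0) gives 16. This would be consistent with the same completions being accounted for more than once, although the trace alone does not prove that this is how the counter reached that value.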