From nobody Fri May 3 23:22:01 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (208.118.235.17 [208.118.235.17]) by mx.zohomail.com with SMTPS id 1510169042695553.6925643705061; Wed, 8 Nov 2017 11:24:02 -0800 (PST) Received: from localhost ([::1]:33602 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eCVww-00014a-Mt for importer@patchew.org; Wed, 08 Nov 2017 14:23:46 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:40956) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eCVu9-0007jr-Su for qemu-devel@nongnu.org; Wed, 08 Nov 2017 14:20:54 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eCVu8-0001Ac-SH for qemu-devel@nongnu.org; Wed, 08 Nov 2017 14:20:53 -0500 Received: from mx1.redhat.com ([209.132.183.28]:38490) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1eCVu8-00019v-LU for qemu-devel@nongnu.org; Wed, 08 Nov 2017 14:20:52 -0500 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id D20AFC04AC56; Wed, 8 Nov 2017 19:20:51 +0000 (UTC) Received: from localhost (ovpn-116-145.ams2.redhat.com [10.36.116.145]) by smtp.corp.redhat.com (Postfix) with ESMTP id E5E1E60BE3; Wed, 8 Nov 2017 19:20:48 +0000 (UTC) From: Stefan Hajnoczi To: Date: Wed, 8 Nov 2017 19:20:42 +0000 Message-Id: <20171108192043.10682-2-stefanha@redhat.com> In-Reply-To: <20171108192043.10682-1-stefanha@redhat.com> References: <20171108192043.10682-1-stefanha@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.31]); Wed, 08 Nov 2017 19:20:51 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PULL for-2.11-rc1 v2 1/2] tests-aio-multithread: fix /aio/multi/schedule race condition X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Peter Maydell , Stefan Hajnoczi Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" test_multi_co_schedule_entry() set to_schedule[id] in the final loop iteration before terminating the coroutine. There is a race condition where the main thread attempts to enter the terminating or terminated coroutine when signalling coroutines to stop: atomic_mb_set(&now_stopping, true); for (i =3D 0; i < NUM_CONTEXTS; i++) { ctx_run(i, finish_cb, NULL); <--- enters dead coroutine! to_schedule[i] =3D NULL; } Make sure only to set to_schedule[id] if this coroutine really needs to be scheduled! Reported-by: "R.Nageswara Sastry" Signed-off-by: Stefan Hajnoczi Reviewed-by: Paolo Bonzini Message-id: 20171106190233.1175-1-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi --- tests/test-aio-multithread.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/tests/test-aio-multithread.c b/tests/test-aio-multithread.c index 549d784915..d396185972 100644 --- a/tests/test-aio-multithread.c +++ b/tests/test-aio-multithread.c @@ -144,17 +144,16 @@ static void finish_cb(void *opaque) static coroutine_fn void test_multi_co_schedule_entry(void *opaque) { g_assert(to_schedule[id] =3D=3D NULL); - atomic_mb_set(&to_schedule[id], qemu_coroutine_self()); =20 while (!atomic_mb_read(&now_stopping)) { int n; =20 n =3D g_test_rand_int_range(0, NUM_CONTEXTS); schedule_next(n); + + atomic_mb_set(&to_schedule[id], qemu_coroutine_self()); qemu_coroutine_yield(); - g_assert(to_schedule[id] =3D=3D NULL); - atomic_mb_set(&to_schedule[id], qemu_coroutine_self()); } } =20 --=20 2.13.6 From nobody Fri May 3 23:22:01 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (208.118.235.17 [208.118.235.17]) by mx.zohomail.com with SMTPS id 1510169112791383.3200545277185; Wed, 8 Nov 2017 11:25:12 -0800 (PST) Received: from localhost ([::1]:33604 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eCVy4-0001dM-Qi for importer@patchew.org; Wed, 08 Nov 2017 14:24:56 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:41002) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eCVuF-0007nJ-G5 for qemu-devel@nongnu.org; Wed, 08 Nov 2017 14:21:04 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eCVuE-0001DM-Ht for qemu-devel@nongnu.org; Wed, 08 Nov 2017 14:20:59 -0500 Received: from mx1.redhat.com ([209.132.183.28]:40064) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1eCVuE-0001DD-BX for qemu-devel@nongnu.org; Wed, 08 Nov 2017 14:20:58 -0500 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 907942C9DC7; Wed, 8 Nov 2017 19:20:57 +0000 (UTC) Received: from localhost (ovpn-116-145.ams2.redhat.com [10.36.116.145]) by smtp.corp.redhat.com (Postfix) with ESMTP id 3B5756BF86; Wed, 8 Nov 2017 19:20:53 +0000 (UTC) From: Stefan Hajnoczi To: Date: Wed, 8 Nov 2017 19:20:43 +0000 Message-Id: <20171108192043.10682-3-stefanha@redhat.com> In-Reply-To: <20171108192043.10682-1-stefanha@redhat.com> References: <20171108192043.10682-1-stefanha@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.29]); Wed, 08 Nov 2017 19:20:57 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PULL for-2.11-rc1 v2 2/2] util/async: use atomic_mb_set in qemu_bh_cancel X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Peter Maydell , Stefan Hajnoczi , Sergio Lopez Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" From: Sergio Lopez Commit b7a745d added a qemu_bh_cancel call to the completion function as an optimization to prevent it from unnecessarily rescheduling itself. This completion function is scheduled from worker_thread, after setting the state of a ThreadPoolElement to THREAD_DONE. This was considered to be safe, as the completion function restarts the loop just after the call to qemu_bh_cancel. But, as this loop lacks a HW memory barrier, the read of req->state may actually happen _before_ the call, seeing it still as THREAD_QUEUED, and ending the completion function without having processed a pending TPE linked at pool->head: worker thread | I/O thread ------------------------------------------------------------------------ | speculatively read req->state req->state =3D THREAD_DONE; | qemu_bh_schedule(p->completion_bh) | bh->scheduled =3D 1; | | qemu_bh_cancel(p->completion_bh) | bh->scheduled =3D 0; | if (req->state =3D=3D THREAD_DONE) | // sees THREAD_QUEUED The source of the misunderstanding was that qemu_bh_cancel is now being used by the _consumer_ rather than the producer, and therefore now needs to have acquire semantics just like e.g. aio_bh_poll. In some situations, if there are no other independent requests in the same aio context that could eventually trigger the scheduling of the completion function, the omitted TPE and all operations pending on it will get stuck forever. [Added Sergio's updated wording about the HW memory barrier. --Stefan] Signed-off-by: Sergio Lopez Message-id: 20171108063447.2842-1-slp@redhat.com Signed-off-by: Stefan Hajnoczi --- util/async.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/util/async.c b/util/async.c index 355af73ee7..0e1bd8780a 100644 --- a/util/async.c +++ b/util/async.c @@ -174,7 +174,7 @@ void qemu_bh_schedule(QEMUBH *bh) */ void qemu_bh_cancel(QEMUBH *bh) { - bh->scheduled =3D 0; + atomic_mb_set(&bh->scheduled, 0); } =20 /* This func is async.The bottom half will do the delete action at the fin= ial --=20 2.13.6