From nobody Sat Apr 27 16:39:32 2024 Delivered-To: importer@patchew.org Received-SPF: temperror (zoho.com: Error in retrieving data from DNS) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=temperror (zoho.com: Error in retrieving data from DNS) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (208.118.235.17 [208.118.235.17]) by mx.zohomail.com with SMTPS id 151012297730841.279147298241355; Tue, 7 Nov 2017 22:36:17 -0800 (PST) Received: from localhost ([::1]:57776 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eCJxx-0005BP-4x for importer@patchew.org; Wed, 08 Nov 2017 01:36:01 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:50561) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eCJx1-0004m4-6G for qemu-devel@nongnu.org; Wed, 08 Nov 2017 01:35:04 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eCJx0-0003wN-BH for qemu-devel@nongnu.org; Wed, 08 Nov 2017 01:35:03 -0500 Received: from mx1.redhat.com ([209.132.183.28]:56926) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1eCJwv-0003uv-Sv; Wed, 08 Nov 2017 01:34:58 -0500 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id CAB7D7EA8E; Wed, 8 Nov 2017 06:34:56 +0000 (UTC) Received: from kthompson.redhat.com (ovpn-116-141.ams2.redhat.com [10.36.116.141]) by smtp.corp.redhat.com (Postfix) with ESMTP id 8737C5E1DE; Wed, 8 Nov 2017 06:34:49 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com CAB7D7EA8E Authentication-Results: ext-mx04.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx04.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=slp@redhat.com From: Sergio Lopez To: qemu-devel@nongnu.org Date: Wed, 8 Nov 2017 07:34:47 +0100 Message-Id: <20171108063447.2842-1-slp@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.28]); Wed, 08 Nov 2017 06:34:56 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PATCH v3] util/async: use atomic_mb_set in qemu_bh_cancel X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Sergio Lopez , pbonzini@redhat.com, famz@redhat.com, stefanha@redhat.com, qemu-block@nongnu.org Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RSF_6 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Commit b7a745d added a qemu_bh_cancel call to the completion function as an optimization to prevent it from unnecessarily rescheduling itself. This completion function is scheduled from worker_thread, after setting the state of a ThreadPoolElement to THREAD_DONE. This was considered to be safe, as the completion function restarts the loop just after the call to qemu_bh_cancel. But, under certain access patterns and scheduling conditions, the loop may wrongly use a pre-fetched elem->state value, reading it as THREAD_QUEUED, and ending the completion function without having processed a pending TPE linked at pool->head: worker thread | I/O thread ------------------------------------------------------------------------ | speculatively read req->state req->state =3D THREAD_DONE; | qemu_bh_schedule(p->completion_bh) | bh->scheduled =3D 1; | | qemu_bh_cancel(p->completion_bh) | bh->scheduled =3D 0; | if (req->state =3D=3D THREAD_DONE) | // sees THREAD_QUEUED The source of the misunderstanding was that qemu_bh_cancel is now being used by the _consumer_ rather than the producer, and therefore now needs to have acquire semantics just like e.g. aio_bh_poll. In some situations, if there are no other independent requests in the same aio context that could eventually trigger the scheduling of the completion function, the omitted TPE and all operations pending on it will get stuck forever. Signed-off-by: Sergio Lopez --- util/async.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/util/async.c b/util/async.c index 355af73ee7..0e1bd8780a 100644 --- a/util/async.c +++ b/util/async.c @@ -174,7 +174,7 @@ void qemu_bh_schedule(QEMUBH *bh) */ void qemu_bh_cancel(QEMUBH *bh) { - bh->scheduled =3D 0; + atomic_mb_set(&bh->scheduled, 0); } =20 /* This func is async.The bottom half will do the delete action at the fin= ial --=20 2.13.6