From: Stefan Hajnoczi <stefanha@redhat.com>
To: qemu-devel@nongnu.org
Cc: Peter Maydell, Stefan Hajnoczi, Paolo Bonzini
Date: Mon, 20 Feb 2017 09:33:00 +0000
Message-Id: <20170220093304.20515-21-stefanha@redhat.com>
In-Reply-To: <20170220093304.20515-1-stefanha@redhat.com>
References: <20170220093304.20515-1-stefanha@redhat.com>
Subject: [Qemu-devel] [PULL 20/24] coroutine-lock: add limited spinning to CoMutex

From: Paolo Bonzini <pbonzini@redhat.com>

Running a very small critical section on pthread_mutex_t and CoMutex
shows that pthread_mutex_t is much faster because it doesn't actually
go to sleep.  What happens is that the critical section is shorter
than the latency of entering the kernel, so FUTEX_WAIT always fails
and the lock is taken in userspace anyway.  With CoMutex there is no
such latency, but you still want to avoid the wait and wakeup, so
introduce the delay artificially as a bounded spin before sleeping.

This only works with one waiter; because CoMutex is fair, it will
always have more waits and wakeups than a pthread_mutex_t.
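As a standalone illustration of the pattern (a minimal C11 sketch, not
QEMU code: toy_mutex, SPIN_LIMIT and the retry loop in the slow path
are invented stand-ins for CoMutex, its 1000-iteration bound and the
CoWaitRecord machinery):

    #include <stdatomic.h>
    #include <sched.h>

    #define SPIN_LIMIT 1000          /* same bound the patch uses */

    typedef struct {
        /* 0 = free, 1 = held, >1 = held with sleeping waiters */
        atomic_uint locked;
    } toy_mutex;

    static void toy_mutex_lock(toy_mutex *m)
    {
        unsigned free_val;
        int i;

        for (i = 0; i < SPIN_LIMIT; i++) {
            free_val = 0;
            /* Fast path: take the lock without a wait/wakeup pair. */
            if (atomic_compare_exchange_strong(&m->locked, &free_val, 1)) {
                return;
            }
            if (free_val > 1) {
                /* Another waiter already sleeps; a fair mutex hands the
                 * lock to it first, so further spinning cannot succeed.
                 */
                break;
            }
            sched_yield();           /* stand-in for cpu_relax() */
        }

        /* Slow path.  The patch queues a CoWaitRecord and yields the
         * coroutine; this sketch merely keeps retrying.
         */
        for (;;) {
            free_val = 0;
            if (atomic_compare_exchange_strong(&m->locked, &free_val, 1)) {
                return;
            }
            sched_yield();
        }
    }

The patch itself adds a refinement the sketch omits: it records the
holder's AioContext in the mutex and stops spinning when the holder
runs on the spinner's own AioContext, since the holder cannot release
the lock while the spinner occupies that event loop's thread.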
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170213181244.16297-3-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/qemu/coroutine.h   |  5 +++++
 util/qemu-coroutine-lock.c | 51 ++++++++++++++++++++++++++++++++++++++++------
 util/qemu-coroutine.c      |  2 +-
 3 files changed, 51 insertions(+), 7 deletions(-)

diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index fce228f..12ce8e1 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -167,6 +167,11 @@ typedef struct CoMutex {
      */
     unsigned locked;
 
+    /* Context that is holding the lock.  Useful to avoid spinning
+     * when two coroutines on the same AioContext try to get the lock. :)
+     */
+    AioContext *ctx;
+
     /* A queue of waiters.  Elements are added atomically in front of
      * from_push.  to_pop is only populated, and popped from, by whoever
      * is in charge of the next wakeup.  This can be an unlocker or,
diff --git a/util/qemu-coroutine-lock.c b/util/qemu-coroutine-lock.c
index 25da9fa..73fe77c 100644
--- a/util/qemu-coroutine-lock.c
+++ b/util/qemu-coroutine-lock.c
@@ -30,6 +30,7 @@
 #include "qemu-common.h"
 #include "qemu/coroutine.h"
 #include "qemu/coroutine_int.h"
+#include "qemu/processor.h"
 #include "qemu/queue.h"
 #include "block/aio.h"
 #include "trace.h"
@@ -181,7 +182,18 @@ void qemu_co_mutex_init(CoMutex *mutex)
     memset(mutex, 0, sizeof(*mutex));
 }
 
-static void coroutine_fn qemu_co_mutex_lock_slowpath(CoMutex *mutex)
+static void coroutine_fn qemu_co_mutex_wake(CoMutex *mutex, Coroutine *co)
+{
+    /* Read co before co->ctx; pairs with smp_wmb() in
+     * qemu_coroutine_enter().
+     */
+    smp_read_barrier_depends();
+    mutex->ctx = co->ctx;
+    aio_co_wake(co);
+}
+
+static void coroutine_fn qemu_co_mutex_lock_slowpath(AioContext *ctx,
+                                                     CoMutex *mutex)
 {
     Coroutine *self = qemu_coroutine_self();
     CoWaitRecord w;
@@ -206,10 +218,11 @@ static void coroutine_fn qemu_co_mutex_lock_slowpath(CoMutex *mutex)
         if (co == self) {
             /* We got the lock ourselves! */
             assert(to_wake == &w);
+            mutex->ctx = ctx;
             return;
         }
 
-        aio_co_wake(co);
+        qemu_co_mutex_wake(mutex, co);
     }
 
     qemu_coroutine_yield();
@@ -218,13 +231,39 @@ static void coroutine_fn qemu_co_mutex_lock_slowpath(CoMutex *mutex)
 
 void coroutine_fn qemu_co_mutex_lock(CoMutex *mutex)
 {
+    AioContext *ctx = qemu_get_current_aio_context();
     Coroutine *self = qemu_coroutine_self();
+    int waiters, i;
 
-    if (atomic_fetch_inc(&mutex->locked) == 0) {
+    /* Running a very small critical section on pthread_mutex_t and CoMutex
+     * shows that pthread_mutex_t is much faster because it doesn't actually
+     * go to sleep.  What happens is that the critical section is shorter
+     * than the latency of entering the kernel and thus FUTEX_WAIT always
+     * fails.  With CoMutex there is no such latency but you still want to
+     * avoid wait and wakeup.  So introduce it artificially.
+     */
+    i = 0;
+retry_fast_path:
+    waiters = atomic_cmpxchg(&mutex->locked, 0, 1);
+    if (waiters != 0) {
+        while (waiters == 1 && ++i < 1000) {
+            if (atomic_read(&mutex->ctx) == ctx) {
+                break;
+            }
+            if (atomic_read(&mutex->locked) == 0) {
+                goto retry_fast_path;
+            }
+            cpu_relax();
+        }
+        waiters = atomic_fetch_inc(&mutex->locked);
+    }
+
+    if (waiters == 0) {
         /* Uncontended.  */
         trace_qemu_co_mutex_lock_uncontended(mutex, self);
+        mutex->ctx = ctx;
     } else {
-        qemu_co_mutex_lock_slowpath(mutex);
+        qemu_co_mutex_lock_slowpath(ctx, mutex);
     }
     mutex->holder = self;
     self->locks_held++;
@@ -240,6 +279,7 @@ void coroutine_fn qemu_co_mutex_unlock(CoMutex *mutex)
     assert(mutex->holder == self);
     assert(qemu_in_coroutine());
 
+    mutex->ctx = NULL;
     mutex->holder = NULL;
     self->locks_held--;
     if (atomic_fetch_dec(&mutex->locked) == 1) {
@@ -252,8 +292,7 @@ void coroutine_fn qemu_co_mutex_unlock(CoMutex *mutex)
         unsigned our_handoff;
 
         if (to_wake) {
-            Coroutine *co = to_wake->co;
-            aio_co_wake(co);
+            qemu_co_mutex_wake(mutex, to_wake->co);
             break;
         }
 
diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
index 415600d..72412e5 100644
--- a/util/qemu-coroutine.c
+++ b/util/qemu-coroutine.c
@@ -118,7 +118,7 @@ void qemu_coroutine_enter(Coroutine *co)
     co->ctx = qemu_get_current_aio_context();
 
     /* Store co->ctx before anything that stores co.  Matches
-     * barrier in aio_co_wake.
+     * barrier in aio_co_wake and qemu_co_mutex_wake.
      */
     smp_wmb();
 
-- 
2.9.3
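
The behaviour described in the commit message's first paragraph can be
reproduced with a rough harness along the following lines; the bench_*
names, thread count and iteration count are illustrative and not taken
from the patch:

    #include <pthread.h>
    #include <stdio.h>
    #include <time.h>

    #define ITERATIONS 1000000
    #define NTHREADS   2

    static pthread_mutex_t bench_lock = PTHREAD_MUTEX_INITIALIZER;
    static volatile long bench_counter;

    static void *bench_thread(void *arg)
    {
        for (long i = 0; i < ITERATIONS; i++) {
            pthread_mutex_lock(&bench_lock);
            bench_counter++;          /* very small critical section */
            pthread_mutex_unlock(&bench_lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t threads[NTHREADS];
        struct timespec start, end;

        clock_gettime(CLOCK_MONOTONIC, &start);
        for (int i = 0; i < NTHREADS; i++) {
            pthread_create(&threads[i], NULL, bench_thread, NULL);
        }
        for (int i = 0; i < NTHREADS; i++) {
            pthread_join(threads[i], NULL);
        }
        clock_gettime(CLOCK_MONOTONIC, &end);

        double ns = (end.tv_sec - start.tv_sec) * 1e9 +
                    (end.tv_nsec - start.tv_nsec);
        printf("%.1f ns per lock/unlock pair\n",
               ns / ((double)NTHREADS * ITERATIONS));
        return 0;
    }

Comparing the reported figure against the cost of a single syscall
shows why a critical section this small never profits from sleeping in
FUTEX_WAIT, which is exactly the gap the bounded spin closes for
CoMutex.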