From: "Emilio G. Cota"
Cota" To: qemu-devel@nongnu.org Date: Tue, 2 Oct 2018 17:29:20 -0400 Message-Id: <20181002212921.30982-3-cota@braap.org> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20181002212921.30982-1-cota@braap.org> References: <20181002212921.30982-1-cota@braap.org> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 66.111.4.27 Subject: [Qemu-devel] [PATCH 2/3] cputlb: serialize tlb updates with env->tlb_lock X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Paolo Bonzini , =?UTF-8?q?Alex=20Benn=C3=A9e?= , Richard Henderson Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Currently we rely on atomic operations for cross-CPU invalidations. There are two cases that these atomics miss: cross-CPU invalidations can race with either (1) vCPU threads flushing their TLB, which happens via memset, or (2) vCPUs calling tlb_reset_dirty on their TLB, which updates .addr_write with a regular store. This results in undefined behaviour, since we're mixing regular and atomic ops on concurrent accesses. Fix it by using tlb_lock, a per-vCPU lock. All updaters of tlb_table and the corresponding victim cache now hold the lock. The readers that do not hold tlb_lock must use atomic reads when reading .addr_write, since this field can be updated by other threads; the conversion to atomic reads is done in the next patch. Note that an alternative fix would be to expand the use of atomic ops. However, in the case of TLB flushes this would have a huge performance impact, since (1) TLB flushes can happen very frequently and (2) we currently use a full memory barrier to flush each TLB entry, and a TLB has many entries. Instead, acquiring the lock is barely slower than a full memory barrier since it is uncontended, and with a single lock acquisition we can flush the entire TLB. Signed-off-by: Emilio G. Cota --- include/exec/cpu-defs.h | 2 + accel/tcg/cputlb.c | 108 ++++++++++++++++++---------------------- 2 files changed, 50 insertions(+), 60 deletions(-) diff --git a/include/exec/cpu-defs.h b/include/exec/cpu-defs.h index a171ffc1a4..bcc40c8ef5 100644 --- a/include/exec/cpu-defs.h +++ b/include/exec/cpu-defs.h @@ -142,6 +142,8 @@ typedef struct CPUIOTLBEntry { =20 #define CPU_COMMON_TLB \ /* The meaning of the MMU modes is defined in the target code. 
+    /* tlb_lock serializes updates to tlb_table and tlb_v_table */     \
+    QemuMutex tlb_lock;                                                 \
     CPUTLBEntry tlb_table[NB_MMU_MODES][CPU_TLB_SIZE];                  \
     CPUTLBEntry tlb_v_table[NB_MMU_MODES][CPU_VTLB_SIZE];               \
     CPUIOTLBEntry iotlb[NB_MMU_MODES][CPU_TLB_SIZE];                    \
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 502eea2850..01b33f9d28 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -75,6 +75,9 @@ QEMU_BUILD_BUG_ON(NB_MMU_MODES > 16);
 
 void tlb_init(CPUState *cpu)
 {
+    CPUArchState *env = cpu->env_ptr;
+
+    qemu_mutex_init(&env->tlb_lock);
 }
 
 /* flush_all_helper: run fn across all cpus
@@ -129,8 +132,11 @@ static void tlb_flush_nocheck(CPUState *cpu)
     atomic_set(&env->tlb_flush_count, env->tlb_flush_count + 1);
     tlb_debug("(count: %zu)\n", tlb_flush_count());
 
+    qemu_mutex_lock(&env->tlb_lock);
     memset(env->tlb_table, -1, sizeof(env->tlb_table));
     memset(env->tlb_v_table, -1, sizeof(env->tlb_v_table));
+    qemu_mutex_unlock(&env->tlb_lock);
+
     cpu_tb_jmp_cache_clear(cpu);
 
     env->vtlb_index = 0;
@@ -454,72 +460,48 @@ void tlb_unprotect_code(ram_addr_t ram_addr)
  * most usual is detecting writes to code regions which may invalidate
  * generated code.
  *
- * Because we want other vCPUs to respond to changes straight away we
- * update the te->addr_write field atomically. If the TLB entry has
- * been changed by the vCPU in the mean time we skip the update.
+ * Other vCPUs might be reading their TLBs during guest execution, so we update
+ * te->addr_write with atomic_set. We don't need to worry about this for
+ * oversized guests as MTTCG is disabled for them.
  *
- * As this function uses atomic accesses we also need to ensure
- * updates to tlb_entries follow the same access rules. We don't need
- * to worry about this for oversized guests as MTTCG is disabled for
- * them.
+ * Called with tlb_lock held.
  */
-
-static void tlb_reset_dirty_range(CPUTLBEntry *tlb_entry, uintptr_t start,
-                                  uintptr_t length)
+static void tlb_reset_dirty_range_locked(CPUTLBEntry *tlb_entry,
+                                         uintptr_t start, uintptr_t length)
 {
-#if TCG_OVERSIZED_GUEST
     uintptr_t addr = tlb_entry->addr_write;
 
     if ((addr & (TLB_INVALID_MASK | TLB_MMIO | TLB_NOTDIRTY)) == 0) {
         addr &= TARGET_PAGE_MASK;
         addr += tlb_entry->addend;
         if ((addr - start) < length) {
+#if TCG_OVERSIZED_GUEST
             tlb_entry->addr_write |= TLB_NOTDIRTY;
-        }
-    }
 #else
-    /* paired with atomic_mb_set in tlb_set_page_with_attrs */
-    uintptr_t orig_addr = atomic_mb_read(&tlb_entry->addr_write);
-    uintptr_t addr = orig_addr;
-
-    if ((addr & (TLB_INVALID_MASK | TLB_MMIO | TLB_NOTDIRTY)) == 0) {
-        addr &= TARGET_PAGE_MASK;
-        addr += atomic_read(&tlb_entry->addend);
-        if ((addr - start) < length) {
-            uintptr_t notdirty_addr = orig_addr | TLB_NOTDIRTY;
-            atomic_cmpxchg(&tlb_entry->addr_write, orig_addr, notdirty_addr);
+            atomic_set(&tlb_entry->addr_write,
+                       tlb_entry->addr_write | TLB_NOTDIRTY);
+#endif
         }
     }
-#endif
 }
 
-/* For atomic correctness when running MTTCG we need to use the right
- * primitives when copying entries */
-static inline void copy_tlb_helper(CPUTLBEntry *d, CPUTLBEntry *s,
-                                   bool atomic_set)
+/* Called with tlb_lock held */
+static void copy_tlb_helper_locked(CPUTLBEntry *d, const CPUTLBEntry *s)
 {
-#if TCG_OVERSIZED_GUEST
     *d = *s;
-#else
-    if (atomic_set) {
-        d->addr_read = s->addr_read;
-        d->addr_code = s->addr_code;
-        atomic_set(&d->addend, atomic_read(&s->addend));
-        /* Pairs with flag setting in tlb_reset_dirty_range */
-        atomic_mb_set(&d->addr_write, atomic_read(&s->addr_write));
-    } else {
-        d->addr_read = s->addr_read;
-        d->addr_write = atomic_read(&s->addr_write);
-        d->addr_code = s->addr_code;
-        d->addend = atomic_read(&s->addend);
-    }
-#endif
+}
+
+static void copy_tlb_helper(CPUArchState *env, CPUTLBEntry *d, CPUTLBEntry *s)
+{
+    qemu_mutex_lock(&env->tlb_lock);
+    copy_tlb_helper_locked(d, s);
+    qemu_mutex_unlock(&env->tlb_lock);
 }
 
 /* This is a cross vCPU call (i.e. another vCPU resetting the flags of
- * the target vCPU). As such care needs to be taken that we don't
- * dangerously race with another vCPU update. The only thing actually
- * updated is the target TLB entry ->addr_write flags.
+ * the target vCPU).
+ * We must take tlb_lock to avoid racing with another vCPU update. The only
+ * thing actually updated is the target TLB entry ->addr_write flags.
  */
 void tlb_reset_dirty(CPUState *cpu, ram_addr_t start1, ram_addr_t length)
 {
@@ -528,22 +510,26 @@ void tlb_reset_dirty(CPUState *cpu, ram_addr_t start1, ram_addr_t length)
     int mmu_idx;
 
     env = cpu->env_ptr;
+    qemu_mutex_lock(&env->tlb_lock);
     for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
         unsigned int i;
 
         for (i = 0; i < CPU_TLB_SIZE; i++) {
-            tlb_reset_dirty_range(&env->tlb_table[mmu_idx][i],
-                                  start1, length);
+            tlb_reset_dirty_range_locked(&env->tlb_table[mmu_idx][i], start1,
+                                         length);
         }
 
        for (i = 0; i < CPU_VTLB_SIZE; i++) {
-            tlb_reset_dirty_range(&env->tlb_v_table[mmu_idx][i],
-                                  start1, length);
+            tlb_reset_dirty_range_locked(&env->tlb_v_table[mmu_idx][i], start1,
+                                         length);
         }
     }
+    qemu_mutex_unlock(&env->tlb_lock);
 }
 
-static inline void tlb_set_dirty1(CPUTLBEntry *tlb_entry, target_ulong vaddr)
+/* Called with tlb_lock held */
+static inline void tlb_set_dirty1_locked(CPUTLBEntry *tlb_entry,
+                                         target_ulong vaddr)
 {
     if (tlb_entry->addr_write == (vaddr | TLB_NOTDIRTY)) {
         tlb_entry->addr_write = vaddr;
@@ -562,16 +548,18 @@ void tlb_set_dirty(CPUState *cpu, target_ulong vaddr)
 
     vaddr &= TARGET_PAGE_MASK;
     i = (vaddr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
+    qemu_mutex_lock(&env->tlb_lock);
     for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
-        tlb_set_dirty1(&env->tlb_table[mmu_idx][i], vaddr);
+        tlb_set_dirty1_locked(&env->tlb_table[mmu_idx][i], vaddr);
     }
 
     for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
         int k;
         for (k = 0; k < CPU_VTLB_SIZE; k++) {
-            tlb_set_dirty1(&env->tlb_v_table[mmu_idx][k], vaddr);
+            tlb_set_dirty1_locked(&env->tlb_v_table[mmu_idx][k], vaddr);
         }
     }
+    qemu_mutex_unlock(&env->tlb_lock);
 }
 
 /* Our TLB does not support large pages, so remember the area covered by
@@ -677,7 +665,7 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
         CPUTLBEntry *tv = &env->tlb_v_table[mmu_idx][vidx];
 
         /* Evict the old entry into the victim tlb. */
-        copy_tlb_helper(tv, te, true);
+        copy_tlb_helper(env, tv, te);
         env->iotlb_v[mmu_idx][vidx] = env->iotlb[mmu_idx][index];
     }
 
@@ -729,9 +717,7 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
         }
     }
 
-    /* Pairs with flag setting in tlb_reset_dirty_range */
-    copy_tlb_helper(te, &tn, true);
-    /* atomic_mb_set(&te->addr_write, write_address); */
+    copy_tlb_helper(env, te, &tn);
 }
 
 /* Add a new TLB entry, but without specifying the memory
@@ -903,9 +889,11 @@ static bool victim_tlb_hit(CPUArchState *env, size_t mmu_idx, size_t index,
             /* Found entry in victim tlb, swap tlb and iotlb. */
             CPUTLBEntry tmptlb, *tlb = &env->tlb_table[mmu_idx][index];
 
-            copy_tlb_helper(&tmptlb, tlb, false);
-            copy_tlb_helper(tlb, vtlb, true);
-            copy_tlb_helper(vtlb, &tmptlb, true);
+            qemu_mutex_lock(&env->tlb_lock);
+            copy_tlb_helper_locked(&tmptlb, tlb);
+            copy_tlb_helper_locked(tlb, vtlb);
+            copy_tlb_helper_locked(vtlb, &tmptlb);
+            qemu_mutex_unlock(&env->tlb_lock);
 
             CPUIOTLBEntry tmpio, *io = &env->iotlb[mmu_idx][index];
             CPUIOTLBEntry *vio = &env->iotlb_v[mmu_idx][vidx];
-- 
2.17.1
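
[Editor's note] For readers following the locking scheme rather than the diff
itself, the standalone C11 sketch below models the discipline the commit
message describes: every updater of the table holds a per-table mutex, so
bulk flushes and dirty-flag updates cannot race with each other, while a
lock-free reader only ever loads the published field with an atomic read
(the reader-side conversion that the commit message defers to the next
patch for .addr_write). It is an illustration only, not part of the patch;
entry_t, table_lock and the other names are made up, and QEMU itself uses
QemuMutex plus its atomic_set()/atomic_read() macros on plain fields rather
than C11 _Atomic types.

/*
 * Minimal model of "serialize updaters with a lock, keep readers
 * lock-free but atomic". Build with: cc -std=c11 -pthread sketch.c
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define TABLE_SIZE 8
#define FLAG_NOTDIRTY ((uintptr_t)1)

typedef struct {
    _Atomic uintptr_t addr_write;   /* read lock-free, written under lock */
    uintptr_t addend;               /* only accessed while holding the lock */
} entry_t;

static entry_t table[TABLE_SIZE];
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Bulk flush: like tlb_flush_nocheck, wipe every entry under the lock. */
static void flush_table(void)
{
    pthread_mutex_lock(&table_lock);
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        atomic_store_explicit(&table[i].addr_write, (uintptr_t)-1,
                              memory_order_relaxed);
        table[i].addend = 0;
    }
    pthread_mutex_unlock(&table_lock);
}

/* Targeted update: like tlb_reset_dirty, set a flag bit under the lock. */
static void set_notdirty(size_t i)
{
    pthread_mutex_lock(&table_lock);
    uintptr_t old = atomic_load_explicit(&table[i].addr_write,
                                         memory_order_relaxed);
    atomic_store_explicit(&table[i].addr_write, old | FLAG_NOTDIRTY,
                          memory_order_relaxed);
    pthread_mutex_unlock(&table_lock);
}

/*
 * Fast-path reader: never takes the lock, so it must use an atomic load
 * instead of a plain read of addr_write.
 */
static uintptr_t read_addr_write(size_t i)
{
    return atomic_load_explicit(&table[i].addr_write, memory_order_relaxed);
}

int main(void)
{
    flush_table();
    set_notdirty(3);
    printf("entry 3: %#lx\n", (unsigned long)read_addr_write(3));
    return 0;
}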