From: "Emilio G. Cota"
To: qemu-devel@nongnu.org
Cc: Richard Henderson
Date: Sun, 16 Jul 2017 16:04:26 -0400
Message-Id: <1500235468-15341-44-git-send-email-cota@braap.org>
In-Reply-To: <1500235468-15341-1-git-send-email-cota@braap.org>
References: <1500235468-15341-1-git-send-email-cota@braap.org>
Subject: [Qemu-devel] [PATCH v2 43/45] tcg: introduce regions to split code_gen_buffer
X-Mailer: git-send-email 2.7.4

This is groundwork for supporting multiple TCG contexts.

The naive solution here is to split code_gen_buffer statically among the
TCG threads; this however results in poor utilization if translation
needs are different across TCG threads.

What we do here is to add an extra layer of indirection, assigning
regions that act just like pages do in virtual memory allocation.
(BTW if you are wondering about the chosen naming, I did not want to
use blocks or pages because those are already heavily used in QEMU).

We use a global lock to serialize allocations as well as statistics
reporting (we now export the size of the used code_gen_buffer with
tcg_code_size()). Note that for the allocator we could just use a
counter and atomic_inc; however, that would complicate the gathering
of tcg_code_size()-like stats. So given that the region operations are
not a fast path, a lock seems the most reasonable choice.

The effectiveness of this approach is clear after seeing some numbers.
I used the bootup+shutdown of debian-arm with '-tb-size 80' as a
benchmark. Note that I'm evaluating this after enabling per-thread TCG
(which is done by a subsequent commit).

* -smp 1, 1 region (entire buffer):
    qemu: flush code_size=83885014 nb_tbs=154739 avg_tb_size=357
    qemu: flush code_size=83884902 nb_tbs=153136 avg_tb_size=363
    qemu: flush code_size=83885014 nb_tbs=152777 avg_tb_size=364
    qemu: flush code_size=83884950 nb_tbs=150057 avg_tb_size=373
    qemu: flush code_size=83884998 nb_tbs=150234 avg_tb_size=373
    qemu: flush code_size=83885014 nb_tbs=154009 avg_tb_size=360
    qemu: flush code_size=83885014 nb_tbs=151007 avg_tb_size=370
    qemu: flush code_size=83885014 nb_tbs=151816 avg_tb_size=367

That is, 8 flushes.

* -smp 8, 32 regions (80/32 MB per region) [i.e. this patch]:
    qemu: flush code_size=76328008 nb_tbs=141040 avg_tb_size=356
    qemu: flush code_size=75366534 nb_tbs=138000 avg_tb_size=361
    qemu: flush code_size=76864546 nb_tbs=140653 avg_tb_size=361
    qemu: flush code_size=76309084 nb_tbs=135945 avg_tb_size=375
    qemu: flush code_size=74581856 nb_tbs=132909 avg_tb_size=375
    qemu: flush code_size=73927256 nb_tbs=135616 avg_tb_size=360
    qemu: flush code_size=78629426 nb_tbs=142896 avg_tb_size=365
    qemu: flush code_size=76667052 nb_tbs=138508 avg_tb_size=368

Again, 8 flushes. Note how buffer utilization is not 100%, but it is
close. Smaller region sizes would yield higher utilization, but we want
region allocation to be rare (it acquires a lock), so we do not want to
go too small.
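
(A rough check of that utilization claim, taking the ~83.9 MB code_size
reported by the single-region runs above as the effective capacity:

    per-region size:  80 MB / 32 regions     ~ 2.5 MB
    utilization:      76328008 / 83885014    ~ 91% on the first flush,
                      ranging from roughly 88% to 94% across the 8 flushes.)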

* -smp 8, static partitioning of 8 regions (10 MB per region):
    qemu: flush code_size=21936504 nb_tbs=40570 avg_tb_size=354
    qemu: flush code_size=11472174 nb_tbs=20633 avg_tb_size=370
    qemu: flush code_size=11603976 nb_tbs=21059 avg_tb_size=365
    qemu: flush code_size=23254872 nb_tbs=41243 avg_tb_size=377
    qemu: flush code_size=28289496 nb_tbs=52057 avg_tb_size=358
    qemu: flush code_size=43605160 nb_tbs=78896 avg_tb_size=367
    qemu: flush code_size=45166552 nb_tbs=82158 avg_tb_size=364
    qemu: flush code_size=63289640 nb_tbs=116494 avg_tb_size=358
    qemu: flush code_size=51389960 nb_tbs=93937 avg_tb_size=362
    qemu: flush code_size=59665928 nb_tbs=107063 avg_tb_size=372
    qemu: flush code_size=38380824 nb_tbs=68597 avg_tb_size=374
    qemu: flush code_size=44884568 nb_tbs=79901 avg_tb_size=376
    qemu: flush code_size=50782632 nb_tbs=90681 avg_tb_size=374
    qemu: flush code_size=39848888 nb_tbs=71433 avg_tb_size=372
    qemu: flush code_size=64708840 nb_tbs=119052 avg_tb_size=359
    qemu: flush code_size=49830008 nb_tbs=90992 avg_tb_size=362
    qemu: flush code_size=68372408 nb_tbs=123442 avg_tb_size=368
    qemu: flush code_size=33555560 nb_tbs=59514 avg_tb_size=378
    qemu: flush code_size=44748344 nb_tbs=80974 avg_tb_size=367
    qemu: flush code_size=37104248 nb_tbs=67609 avg_tb_size=364

That is, 20 flushes. Note how a static partitioning approach uses the
code buffer poorly, leading to many unnecessary flushes.

Signed-off-by: Emilio G. Cota
---
 include/exec/exec-all.h   |   3 +
 tcg/tcg.h                 |   6 ++
 accel/tcg/translate-all.c |  56 +++++++++----
 bsd-user/main.c           |   1 +
 cpus.c                    |  12 +++
 linux-user/main.c         |   1 +
 tcg/tcg.c                 | 197 +++++++++++++++++++++++++++++++++++++++++++++
 7 files changed, 260 insertions(+), 16 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 37487d7..69a2a21 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -49,6 +49,9 @@ void gen_intermediate_code(CPUArchState *env, struct TranslationBlock *tb);
 void restore_state_to_opc(CPUArchState *env, struct TranslationBlock *tb,
                           target_ulong *data);
 
+#ifdef CONFIG_SOFTMMU
+void softmmu_tcg_region_init(void);
+#endif
 void cpu_gen_init(void);
 bool cpu_restore_state(CPUState *cpu, uintptr_t searched_pc);
 
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 9d17584..6f6720b 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -772,6 +772,12 @@ void *tcg_malloc_internal(TCGContext *s, int size);
 void tcg_pool_reset(TCGContext *s);
 TranslationBlock *tcg_tb_alloc(TCGContext *s);
 
+void tcg_region_init(size_t n_regions);
+void tcg_region_reset_all(void);
+
+size_t tcg_code_size(void);
+size_t tcg_code_capacity(void);
+
 /* Called with tb_lock held.  */
 static inline void *tcg_malloc(int size)
 {
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 913b1c5..c30d400 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -59,6 +59,7 @@
 #include "qemu/main-loop.h"
 #include "exec/log.h"
 #include "sysemu/cpus.h"
+#include "sysemu/sysemu.h"
 
 /* #define DEBUG_TB_INVALIDATE */
 /* #define DEBUG_TB_FLUSH */
@@ -797,6 +798,39 @@ static inline void code_gen_alloc(size_t tb_size)
     qemu_mutex_init(&tb_ctx.tb_lock);
 }
 
+#ifdef CONFIG_SOFTMMU
+/*
+ * It is likely that some vCPUs will translate more code than others, so we
+ * first try to set more regions than smp_cpus, with those regions being
+ * larger than the minimum code_gen_buffer size. If that's not possible we
+ * make do by evenly dividing the code_gen_buffer among the vCPUs.
+ */
+void softmmu_tcg_region_init(void)
+{
+    size_t i;
+
+    /* Use a single region if all we have is one vCPU thread */
+    if (smp_cpus == 1 || !qemu_tcg_mttcg_enabled()) {
+        tcg_region_init(0);
+        return;
+    }
+
+    for (i = 8; i > 0; i--) {
+        size_t regions_per_thread = i;
+        size_t region_size;
+
+        region_size = tcg_init_ctx.code_gen_buffer_size;
+        region_size /= smp_cpus * regions_per_thread;
+
+        if (region_size >= 2 * MIN_CODE_GEN_BUFFER_SIZE) {
+            tcg_region_init(smp_cpus * regions_per_thread);
+            return;
+        }
+    }
+    tcg_region_init(smp_cpus);
+}
+#endif
+
 static void tb_htable_init(void)
 {
     unsigned int mode = QHT_MODE_AUTO_RESIZE;
@@ -916,13 +950,8 @@ static void do_tb_flush(CPUState *cpu, run_on_cpu_data tb_flush_count)
         size_t host_size = 0;
 
         g_tree_foreach(tb_ctx.tb_tree, tb_host_size_iter, &host_size);
-        printf("qemu: flush code_size=%td nb_tbs=%zu avg_tb_size=%zu\n",
-               tcg_ctx->code_gen_ptr - tcg_ctx->code_gen_buffer, nb_tbs,
-               nb_tbs > 0 ? host_size / nb_tbs : 0);
-    }
-    if ((unsigned long)(tcg_ctx->code_gen_ptr - tcg_ctx->code_gen_buffer)
-        > tcg_ctx->code_gen_buffer_size) {
-        cpu_abort(cpu, "Internal error: code buffer overflow\n");
+        printf("qemu: flush code_size=%zu nb_tbs=%zu avg_tb_size=%zu\n",
+               tcg_code_size(), nb_tbs, nb_tbs > 0 ? host_size / nb_tbs : 0);
     }
 
     CPU_FOREACH(cpu) {
@@ -936,7 +965,7 @@ static void do_tb_flush(CPUState *cpu, run_on_cpu_data tb_flush_count)
     qht_reset_size(&tb_ctx.htable, CODE_GEN_HTABLE_SIZE);
     page_flush_tb();
 
-    tcg_ctx->code_gen_ptr = tcg_ctx->code_gen_buffer;
+    tcg_region_reset_all();
     /* XXX: flush processor icache at this point if cache flush is
        expensive */
     atomic_mb_set(&tb_ctx.tb_flush_count, tb_ctx.tb_flush_count + 1);
@@ -1281,9 +1310,9 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
         cflags |= CF_USE_ICOUNT;
     }
 
+ buffer_overflow:
     tb = tb_alloc(pc);
     if (unlikely(!tb)) {
- buffer_overflow:
         /* flush must be done */
         tb_flush(cpu);
         mmap_unlock();
@@ -1367,9 +1396,9 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     }
 #endif
 
-    tcg_ctx->code_gen_ptr = (void *)
+    atomic_set(&tcg_ctx->code_gen_ptr, (void *)
         ROUND_UP((uintptr_t)gen_code_buf + gen_code_size + search_size,
-                 CODE_GEN_ALIGN);
+                 CODE_GEN_ALIGN));
 
     /* init jump list */
     assert(((uintptr_t)tb & 3) == 0);
@@ -1921,9 +1950,8 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
      * otherwise users might think "-tb-size" is not honoured.
      * For avg host size we use the precise numbers from tb_tree_stats though.
      */
-    cpu_fprintf(f, "gen code size %td/%zd\n",
-                tcg_ctx->code_gen_ptr - tcg_ctx->code_gen_buffer,
-                tcg_ctx->code_gen_highwater - tcg_ctx->code_gen_buffer);
+    cpu_fprintf(f, "gen code size %zu/%zu\n",
+                tcg_code_size(), tcg_code_capacity());
     cpu_fprintf(f, "TB count %zu\n", nb_tbs);
     cpu_fprintf(f, "TB avg target size %zu max=%zu bytes\n",
                 nb_tbs ? tst.target_size / nb_tbs : 0,
diff --git a/bsd-user/main.c b/bsd-user/main.c
index 7a8b29e..bc06c1c 100644
--- a/bsd-user/main.c
+++ b/bsd-user/main.c
@@ -979,6 +979,7 @@ int main(int argc, char **argv)
        generating the prologue until now so that the prologue can take
        the real value of GUEST_BASE into account.  */
     tcg_prologue_init(tcg_ctx);
+    tcg_region_init(0);
 
     /* build Task State */
     memset(ts, 0, sizeof(TaskState));
diff --git a/cpus.c b/cpus.c
index 14bb8d5..5455819 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1664,6 +1664,18 @@ static void qemu_tcg_init_vcpu(CPUState *cpu)
     char thread_name[VCPU_THREAD_NAME_SIZE];
     static QemuCond *single_tcg_halt_cond;
     static QemuThread *single_tcg_cpu_thread;
+    static int tcg_region_inited;
+
+    /*
+     * Initialize TCG regions--once, of course. Now is a good time, because:
+     * (1) TCG's init context, prologue and target globals have been set up.
+     * (2) qemu_tcg_mttcg_enabled() works now (TCG init code runs before the
+     *     -accel flag is processed, so the check doesn't work then).
+     */
+    if (!tcg_region_inited) {
+        softmmu_tcg_region_init();
+        tcg_region_inited = 1;
+    }
 
     if (qemu_tcg_mttcg_enabled() || !single_tcg_cpu_thread) {
         cpu->thread = g_malloc0(sizeof(QemuThread));
diff --git a/linux-user/main.c b/linux-user/main.c
index ad4c6f5..0500628 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -4457,6 +4457,7 @@ int main(int argc, char **argv, char **envp)
        generating the prologue until now so that the prologue can take
        the real value of GUEST_BASE into account.  */
     tcg_prologue_init(tcg_ctx);
+    tcg_region_init(0);
 
 #if defined(TARGET_I386)
     env->cr[0] = CR0_PG_MASK | CR0_WP_MASK | CR0_PE_MASK;
diff --git a/tcg/tcg.c b/tcg/tcg.c
index e8aae1f..daec7d1 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -33,6 +33,7 @@
 #include "qemu/cutils.h"
 #include "qemu/host-utils.h"
 #include "qemu/timer.h"
+#include "qemu/osdep.h"
 
 /* Note: the long term plan is to reduce the dependencies on the QEMU
    CPU definitions. Currently they are used for qemu_ld/st
@@ -120,6 +121,23 @@ static bool tcg_out_tb_finalize(TCGContext *s);
 static TCGContext **tcg_ctxs;
 static unsigned int n_tcg_ctxs;
 
+/*
+ * We divide code_gen_buffer into equally-sized "regions" that TCG threads
+ * dynamically allocate from as demand dictates. Given appropriate region
+ * sizing, this minimizes flushes even when some TCG threads generate a lot
+ * more code than others.
+ */
+struct tcg_region_state {
+    QemuMutex lock;
+    void *buf;      /* set at init time */
+    size_t n;       /* set at init time */
+    size_t size;    /* size of one region; set at init time */
+    size_t current; /* protected by the lock */
+    size_t n_full;  /* protected by the lock */
+};
+
+static struct tcg_region_state region;
+
 static TCGRegSet tcg_target_available_regs[2];
 static TCGRegSet tcg_target_call_clobber_regs;
 
@@ -257,6 +275,177 @@ TCGLabel *gen_new_label(void)
 
 #include "tcg-target.inc.c"
 
+static void tcg_region_assign(TCGContext *s, size_t curr_region)
+{
+    size_t guard_size = qemu_real_host_page_size;
+    void *buf = region.buf + curr_region * (region.size + guard_size);
+
+    s->code_gen_buffer = buf;
+    s->code_gen_ptr = buf;
+    s->code_gen_buffer_size = region.size;
+    s->code_gen_highwater = buf + region.size - TCG_HIGHWATER;
+}
+
+static bool tcg_region_alloc__locked(TCGContext *s)
+{
+    if (region.current == region.n) {
+        return true;
+    }
+    tcg_region_assign(s, region.current);
+    region.current++;
+    return false;
+}
+
+/*
+ * Request a new region once the one in use has filled up.
+ * Returns true on error.
+ */
+static bool tcg_region_alloc(TCGContext *s)
+{
+    bool err;
+
+    qemu_mutex_lock(&region.lock);
+    err = tcg_region_alloc__locked(s);
+    if (!err) {
+        region.n_full++;
+    }
+    qemu_mutex_unlock(&region.lock);
+    return err;
+}
+
+/*
+ * Perform a context's first region allocation.
+ * This function does _not_ increment region.n_full.
+ */
+static inline bool tcg_region_initial_alloc__locked(TCGContext *s)
+{
+    return tcg_region_alloc__locked(s);
+}
+
+/* Call from a safe-work context */
+void tcg_region_reset_all(void)
+{
+    unsigned int i;
+
+    qemu_mutex_lock(&region.lock);
+    region.current = 0;
+    region.n_full = 0;
+
+    for (i = 0; i < n_tcg_ctxs; i++) {
+        if (unlikely(tcg_region_initial_alloc__locked(tcg_ctxs[i]))) {
+            tcg_abort();
+        }
+    }
+    qemu_mutex_unlock(&region.lock);
+}
+
+static void tcg_region_set_guard_pages(void)
+{
+    size_t guard_size = qemu_real_host_page_size;
+    size_t i;
+
+    for (i = 0; i < region.n; i++) {
+        void *guard = region.buf + region.size + i * (region.size + guard_size);
+
+        if (qemu_mprotect_none(guard, qemu_real_host_page_size)) {
+            tcg_abort();
+        }
+    }
+}
+
+/*
+ * Initializes region partitioning, setting the number of regions via
+ * @n_regions.
+ * Set @n_regions to 0 or 1 to use a single region that uses all of
+ * code_gen_buffer.
+ *
+ * Called at init time from the parent thread (i.e. the one calling
+ * tcg_context_init), after the target's TCG globals have been set.
+ *
+ * Region partitioning works by splitting code_gen_buffer into separate regions,
+ * and then assigning regions to TCG threads so that the threads can translate
+ * code in parallel without synchronization.
+ */
+void tcg_region_init(size_t n_regions)
+{
+    void *buf = tcg_init_ctx.code_gen_buffer;
+    size_t size = tcg_init_ctx.code_gen_buffer_size;
+
+    if (!n_regions) {
+        n_regions = 1;
+    }
+
+    /* start on a page-aligned address */
+    buf = QEMU_ALIGN_PTR_UP(buf, qemu_real_host_page_size);
+    if (unlikely(buf > tcg_init_ctx.code_gen_buffer + size)) {
+        tcg_abort();
+    }
+    /* discard that initial portion */
+    size -= buf - tcg_init_ctx.code_gen_buffer;
+
+    /* make region.size a multiple of page_size */
+    region.size = size / n_regions;
+    region.size &= qemu_real_host_page_mask;
+
+    /* A region must have at least 2 pages; one code, one guard */
+    if (unlikely(region.size < 2 * qemu_real_host_page_size)) {
+        tcg_abort();
+    }
+
+    /* do not count the guard page in region.size */
+    region.size -= qemu_real_host_page_size;
+    region.n = n_regions;
+    region.buf = buf;
+    tcg_region_set_guard_pages();
+    qemu_mutex_init(&region.lock);
+    /*
+     * We do not yet support multiple TCG contexts, so do the initial
+     * allocation now.
+     */
+    if (unlikely(tcg_region_initial_alloc__locked(tcg_ctx))) {
+        tcg_abort();
+    }
+}
+
+/*
+ * Returns the size (in bytes) of all translated code (i.e. from all regions)
+ * currently in the cache.
+ * See also: tcg_code_capacity()
+ * Do not confuse with tcg_current_code_size(); that one applies to a single
+ * TCG context.
+ */
+size_t tcg_code_size(void)
+{
+    unsigned int i;
+    size_t total;
+
+    qemu_mutex_lock(&region.lock);
+    total = region.n_full * (region.size - TCG_HIGHWATER);
+    for (i = 0; i < n_tcg_ctxs; i++) {
+        const TCGContext *s = tcg_ctxs[i];
+        size_t size;
+
+        size = atomic_read(&s->code_gen_ptr) - s->code_gen_buffer;
+        if (unlikely(size > s->code_gen_buffer_size)) {
+            tcg_abort();
+        }
+        total += size;
+    }
+    qemu_mutex_unlock(&region.lock);
+    return total;
+}
+
+/*
+ * Returns the code capacity (in bytes) of the entire cache, i.e. including all
+ * regions.
+ * See also: tcg_code_size()
+ */
+size_t tcg_code_capacity(void)
+{
+    /* no need for synchronization; these variables are set at init time */
+    return region.n * (region.size - TCG_HIGHWATER);
+}
+
 /* pool based memory allocation */
 void *tcg_malloc_internal(TCGContext *s, int size)
 {
@@ -406,13 +595,17 @@ TranslationBlock *tcg_tb_alloc(TCGContext *s)
     TranslationBlock *tb;
     void *next;
 
+ retry:
     tb = (void *)ROUND_UP((uintptr_t)s->code_gen_ptr, align);
     next = (void *)ROUND_UP((uintptr_t)(tb + 1), align);
 
     if (unlikely(next > s->code_gen_highwater)) {
-        return NULL;
+        if (tcg_region_alloc(s)) {
+            return NULL;
+        }
+        goto retry;
     }
-    s->code_gen_ptr = next;
+    atomic_set(&s->code_gen_ptr, next);
     return tb;
 }
 
-- 
2.7.4