Series comparison

-[PULL 0/8] tcg patch queue
+[PULL v2.5 00/27] tcg patch queue
-The following changes since commit e0175b71638cf4398903c0d25f93fe62e0606389:
+v2: Fix target/loongarch printf formats for vaddr
     Include two more reviewed patches.
-  Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20200228' into staging (2020-02-28 16:39:27 +0000)
+This time with actual pull urls.  :-/
 r~
 The following changes since commit db7aa99ef894e88fc5eedf02ca2579b8c344b2ec:
   Merge tag 'hw-misc-20250216' of https://github.com/philmd/qemu into staging (2025-02-16 20:48:06 -0500)
 are available in the Git repository at:
-  https://github.com/rth7680/qemu.git tags/pull-tcg-20200228
+  https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20250215-2
-for you to fetch changes up to 600e17b261555c56a048781b8dd5ba3985650013:
+for you to fetch changes up to a39bdd0f4ba96fcbb6b5bcb6e89591d2b24f52eb:
-  accel/tcg: increase default code gen buffer size for 64 bit (2020-02-28 17:43:31 -0800)
+  tcg: Remove TCG_TARGET_HAS_{br,set}cond2 from riscv and loongarch64 (2025-02-17 09:52:07 -0800)
 ----------------------------------------------------------------
-Fix race in cpu_exec_step_atomic.
+tcg: Remove last traces of TCG_TARGET_NEED_POOL_LABELS
-Work around compile failure with -fno-inline.
+tcg: Cleanups after disallowing 64-on-32
-Expand tcg/arm epilogue inline.
+tcg: Introduce constraint for zero register
-Adjustments to the default code gen buffer size.
+tcg: Remove TCG_TARGET_HAS_{br,set}cond2 from riscv and loongarch64
 tcg/i386: Use tcg_{high,unsigned}_cond in tcg_out_brcond2
 linux-user: Move TARGET_SA_RESTORER out of generic/signal.h
 linux-user: Fix alignment when unmapping excess reservation
 target/sparc: Fix register selection for all F*TOx and FxTO* instructions
 target/sparc: Fix gdbstub incorrectly handling registers f32-f62
 target/sparc: fake UltraSPARC T1 PCR and PIC registers
 ----------------------------------------------------------------
-Alex Bennée (5):
+Andreas Schwab (1):
-      accel/tcg: fix race in cpu_exec_step_atomic (bug 1863025)
+      linux-user: Move TARGET_SA_RESTORER out of generic/signal.h
       accel/tcg: use units.h for defining code gen buffer sizes
       accel/tcg: remove link between guest ram and TCG cache size
       accel/tcg: only USE_STATIC_CODE_GEN_BUFFER on 32 bit hosts
       accel/tcg: increase default code gen buffer size for 64 bit
-Richard Henderson (2):
+Artyom Tarasenko (1):
-      tcg/arm: Split out tcg_out_epilogue
+      target/sparc: fake UltraSPARC T1 PCR and PIC registers
       tcg/arm: Expand epilogue inline
-Zenghui Yu (1):
+Fabiano Rosas (1):
-      compiler.h: Don't use compile-time assert when __NO_INLINE__ is defined
+      elfload: Fix alignment when unmapping excess reservation
- include/qemu/compiler.h   |  2 +-
+Mikael Szreder (2):
- accel/tcg/cpu-exec.c      | 21 ++++++++--------
+      target/sparc: Fix register selection for all F*TOx and FxTO* instructions
- accel/tcg/translate-all.c | 61 ++++++++++++++++++++++++++++-------------------
+      target/sparc: Fix gdbstub incorrectly handling registers f32-f62
  tcg/arm/tcg-target.inc.c  | 29 ++++++++++------------
 files changed, 60 insertions(+), 53 deletions(-)
+Richard Henderson (22):
+      tcg: Remove last traces of TCG_TARGET_NEED_POOL_LABELS
+      tcg: Remove TCG_OVERSIZED_GUEST
+      tcg: Drop support for two address registers in gen_ldst
+      tcg: Merge INDEX_op_qemu_*_{a32,a64}_*
+      tcg/arm: Drop addrhi from prepare_host_addr
+      tcg/i386: Drop addrhi from prepare_host_addr
+      tcg/mips: Drop addrhi from prepare_host_addr
+      tcg/ppc: Drop addrhi from prepare_host_addr
+      tcg: Replace addr{lo,hi}_reg with addr_reg in TCGLabelQemuLdst
+      plugins: Fix qemu_plugin_read_memory_vaddr parameters
+      accel/tcg: Fix tlb_set_page_with_attrs, tlb_set_page
+      target/loongarch: Use VADDR_PRIx for logging pc_next
+      include/exec: Change vaddr to uintptr_t
+      include/exec: Use uintptr_t in CPUTLBEntry
+      tcg: Introduce the 'z' constraint for a hardware zero register
+      tcg/aarch64: Use 'z' constraint
+      tcg/loongarch64: Use 'z' constraint
+      tcg/mips: Use 'z' constraint
+      tcg/riscv: Use 'z' constraint
+      tcg/sparc64: Use 'z' constraint
+      tcg/i386: Use tcg_{high,unsigned}_cond in tcg_out_brcond2
+      tcg: Remove TCG_TARGET_HAS_{br,set}cond2 from riscv and loongarch64
+ include/exec/tlb-common.h                          |  10 +-
+ include/exec/vaddr.h                               |  16 +-
+ include/qemu/atomic.h                              |  18 +-
+ include/tcg/oversized-guest.h                      |  23 ---
+ include/tcg/tcg-opc.h                              |  28 +--
+ include/tcg/tcg.h                                  |   3 +-
+ linux-user/aarch64/target_signal.h                 |   2 +
+ linux-user/arm/target_signal.h                     |   2 +
+ linux-user/generic/signal.h                        |   1 -
+ linux-user/i386/target_signal.h                    |   2 +
+ linux-user/m68k/target_signal.h                    |   1 +
+ linux-user/microblaze/target_signal.h              |   2 +
+ linux-user/ppc/target_signal.h                     |   2 +
+ linux-user/s390x/target_signal.h                   |   2 +
+ linux-user/sh4/target_signal.h                     |   2 +
+ linux-user/x86_64/target_signal.h                  |   2 +
+ linux-user/xtensa/target_signal.h                  |   2 +
+ tcg/aarch64/tcg-target-con-set.h                   |  12 +-
+ tcg/aarch64/tcg-target.h                           |   2 +
+ tcg/loongarch64/tcg-target-con-set.h               |  15 +-
+ tcg/loongarch64/tcg-target-con-str.h               |   1 -
+ tcg/loongarch64/tcg-target-has.h                   |   2 -
+ tcg/loongarch64/tcg-target.h                       |   2 +
+ tcg/mips/tcg-target-con-set.h                      |  26 +--
+ tcg/mips/tcg-target-con-str.h                      |   1 -
+ tcg/mips/tcg-target.h                              |   2 +
+ tcg/riscv/tcg-target-con-set.h                     |  10 +-
+ tcg/riscv/tcg-target-con-str.h                     |   1 -
+ tcg/riscv/tcg-target-has.h                         |   2 -
+ tcg/riscv/tcg-target.h                             |   2 +
+ tcg/sparc64/tcg-target-con-set.h                   |  12 +-
+ tcg/sparc64/tcg-target-con-str.h                   |   1 -
+ tcg/sparc64/tcg-target.h                           |   3 +-
+ tcg/tci/tcg-target.h                               |   1 -
+ accel/tcg/cputlb.c                                 |  32 +---
+ accel/tcg/tcg-all.c                                |   9 +-
+ linux-user/elfload.c                               |   4 +-
+ plugins/api.c                                      |   2 +-
+ target/arm/ptw.c                                   |  34 ----
+ target/loongarch/tcg/translate.c                   |   2 +-
+ target/riscv/cpu_helper.c                          |  13 +-
+ target/sparc/gdbstub.c                             |  18 +-
+ target/sparc/translate.c                           |  19 +++
+ tcg/optimize.c                                     |  21 +--
+ tcg/tcg-op-ldst.c                                  | 103 +++--------
+ tcg/tcg.c                                          |  97 +++++------
+ tcg/tci.c                                          | 119 +++----------
+ docs/devel/multi-thread-tcg.rst                    |   1 -
+ docs/devel/tcg-ops.rst                             |   4 +-
+ target/loongarch/tcg/insn_trans/trans_atomic.c.inc |   2 +-
+ target/sparc/insns.decode                          |  19 ++-
+ tcg/aarch64/tcg-target.c.inc                       |  86 ++++------
+ tcg/arm/tcg-target.c.inc                           | 114 ++++---------
+ tcg/i386/tcg-target.c.inc                          | 190 +++++----------------
+ tcg/loongarch64/tcg-target.c.inc                   |  72 +++-----
+ tcg/mips/tcg-target.c.inc                          | 169 ++++++------------
+ tcg/ppc/tcg-target.c.inc                           | 164 +++++-------------
+ tcg/riscv/tcg-target.c.inc                         |  56 +++---
+ tcg/s390x/tcg-target.c.inc                         |  40 ++---
+ tcg/sparc64/tcg-target.c.inc                       |  45 ++---
+ tcg/tci/tcg-target.c.inc                           |  60 ++-----
+files changed, 548 insertions(+), 1160 deletions(-)
+ delete mode 100644 include/tcg/oversized-guest.h

-[PULL 1/8] accel/tcg: fix race in cpu_exec_step_atomic (bug 1863025)
+Deleted patch
-From: Alex Bennée <alex.bennee@linaro.org>
-The bug describes a race whereby cpu_exec_step_atomic can acquire a TB
-which is invalidated by a tb_flush before we execute it. This doesn't
-affect the other cpu_exec modes as a tb_flush by its nature can only
-occur on a quiescent system. The race was described as:
-  B2. tcg_cpu_exec => cpu_exec => tb_find => tb_gen_code
-  B3. tcg_tb_alloc obtains a new TB
-      C3. TB obtained with tb_lookup__cpu_state or tb_gen_code
-          (same TB as B2)
-          A3. start_exclusive critical section entered
-          A4. do_tb_flush is called, TB memory freed/re-allocated
-          A5. end_exclusive exits critical section
-  B2. tcg_cpu_exec => cpu_exec => tb_find => tb_gen_code
-  B3. tcg_tb_alloc reallocates TB from B2
-      C4. start_exclusive critical section entered
-      C5. cpu_tb_exec executes the TB code that was free in A4
-The simplest fix is to widen the exclusive period to include the TB
-lookup. As a result we can drop the complication of checking we are in
-the exclusive region before we end it.
-Cc: Yifan <me@yifanlu.com>
-Buglink: https://bugs.launchpad.net/qemu/+bug/1863025
-Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
-Message-Id: <20200214144952.15502-1-alex.bennee@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
----
- accel/tcg/cpu-exec.c | 21 +++++++++++----------
-file changed, 11 insertions(+), 10 deletions(-)
-diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
-index XXXXXXX..XXXXXXX 100644
---- a/accel/tcg/cpu-exec.c
-+++ b/accel/tcg/cpu-exec.c
-@@ -XXX,XX +XXX,XX @@ void cpu_exec_step_atomic(CPUState *cpu)
-     uint32_t cf_mask = cflags & CF_HASH_MASK;
-     if (sigsetjmp(cpu->jmp_env, 0) == 0) {
-+        start_exclusive();
-+
-         tb = tb_lookup__cpu_state(cpu, &pc, &cs_base, &flags, cf_mask);
-         if (tb == NULL) {
-             mmap_lock();
-@@ -XXX,XX +XXX,XX @@ void cpu_exec_step_atomic(CPUState *cpu)
-             mmap_unlock();
-         }
--        start_exclusive();
--
-         /* Since we got here, we know that parallel_cpus must be true.  */
-         parallel_cpus = false;
-         cc->cpu_exec_enter(cpu);
-@@ -XXX,XX +XXX,XX @@ void cpu_exec_step_atomic(CPUState *cpu)
-         qemu_plugin_disable_mem_helpers(cpu);
-     }
--    if (cpu_in_exclusive_context(cpu)) {
--        /* We might longjump out of either the codegen or the
--         * execution, so must make sure we only end the exclusive
--         * region if we started it.
--         */
--        parallel_cpus = true;
--        end_exclusive();
--    }
-+
-+    /*
-+     * As we start the exclusive region before codegen we must still
-+     * be in the region if we longjump out of either the codegen or
-+     * the execution.
-+     */
-+    g_assert(cpu_in_exclusive_context(cpu));
-+    parallel_cpus = true;
-+    end_exclusive();
- }
- struct tb_desc {
---
-.20.1

-[PULL 2/8] compiler.h: Don't use compile-time assert when __NO_INLINE__ is defined
+Deleted patch
-From: Zenghui Yu <yuzenghui@huawei.com>
-Our robot reported the following compile-time error while compiling
-QEMU with -fno-inline cflags:
-In function 'load_memop',
-    inlined from 'load_helper' at /qemu/accel/tcg/cputlb.c:1578:20,
-    inlined from 'full_ldub_mmu' at /qemu/accel/tcg/cputlb.c:1624:12:
-/qemu/accel/tcg/cputlb.c:1502:9: error: call to 'qemu_build_not_reached' declared with attribute error: code path is reachable
-         qemu_build_not_reached();
-         ^~~~~~~~~~~~~~~~~~~~~~~~
-    [...]
-It looks like a false-positive because only (MO_UB ^ MO_BSWAP) will
-hit the default case in load_memop() while need_swap (size > 1) has
-already ensured that MO_UB is not involved.
-With -fno-inline, the compiler cannot evaluate memop_size(op) at
-compile time, so qemu_build_not_reached() is wrongly triggered by
-(MO_UB ^ MO_BSWAP).  Avoid the compile-time assert when no functions
-will be inlined into their callers.
-Reported-by: Euler Robot <euler.robot@huawei.com>
-Suggested-by: Richard Henderson <richard.henderson@linaro.org>
-Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
-Message-Id: <20200205141545.180-1-yuzenghui@huawei.com>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
----
- include/qemu/compiler.h | 2 +-
-file changed, 1 insertion(+), 1 deletion(-)
-diff --git a/include/qemu/compiler.h b/include/qemu/compiler.h
-index XXXXXXX..XXXXXXX 100644
---- a/include/qemu/compiler.h
-+++ b/include/qemu/compiler.h
-@@ -XXX,XX +XXX,XX @@
-  * supports QEMU_ERROR, this will be reported at compile time; otherwise
-  * this will be reported at link time due to the missing symbol.
-  */
--#ifdef __OPTIMIZE__
-+#if defined(__OPTIMIZE__) && !defined(__NO_INLINE__)
- extern void QEMU_NORETURN QEMU_ERROR("code path is reachable")
-     qemu_build_not_reached(void);
- #else
---
-.20.1

-[PULL 3/8] tcg/arm: Split out tcg_out_epilogue
+Deleted patch
-From: Richard Henderson <rth@twiddle.net>
-We will shortly use this function from tcg_out_op as well.
-Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
-Signed-off-by: Richard Henderson <rth@twiddle.net>
----
- tcg/arm/tcg-target.inc.c | 19 +++++++++++--------
-file changed, 11 insertions(+), 8 deletions(-)
-diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
-index XXXXXXX..XXXXXXX 100644
---- a/tcg/arm/tcg-target.inc.c
-+++ b/tcg/arm/tcg-target.inc.c
-@@ -XXX,XX +XXX,XX @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
- }
- static tcg_insn_unit *tb_ret_addr;
-+static void tcg_out_epilogue(TCGContext *s);
- static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
-                 const TCGArg *args, const int *const_args)
-@@ -XXX,XX +XXX,XX @@ static void tcg_out_nop_fill(tcg_insn_unit *p, int count)
-       + TCG_TARGET_STACK_ALIGN - 1) \
-      & -TCG_TARGET_STACK_ALIGN)
-+#define STACK_ADDEND  (FRAME_SIZE - PUSH_SIZE)
-+
- static void tcg_target_qemu_prologue(TCGContext *s)
- {
--    int stack_addend;
--
-     /* Calling convention requires us to save r4-r11 and lr.  */
-     /* stmdb sp!, { r4 - r11, lr } */
-     tcg_out32(s, (COND_AL << 28) | 0x092d4ff0);
-     /* Reserve callee argument and tcg temp space.  */
--    stack_addend = FRAME_SIZE - PUSH_SIZE;
--
-     tcg_out_dat_rI(s, COND_AL, ARITH_SUB, TCG_REG_CALL_STACK,
--                   TCG_REG_CALL_STACK, stack_addend, 1);
-+                   TCG_REG_CALL_STACK, STACK_ADDEND, 1);
-     tcg_set_frame(s, TCG_REG_CALL_STACK, TCG_STATIC_CALL_ARGS_SIZE,
-                   CPU_TEMP_BUF_NLONGS * sizeof(long));
-@@ -XXX,XX +XXX,XX @@ static void tcg_target_qemu_prologue(TCGContext *s)
-      */
-     s->code_gen_epilogue = s->code_ptr;
-     tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R0, 0);
--
--    /* TB epilogue */
-     tb_ret_addr = s->code_ptr;
-+    tcg_out_epilogue(s);
-+}
-+
-+static void tcg_out_epilogue(TCGContext *s)
-+{
-+    /* Release local stack frame.  */
-     tcg_out_dat_rI(s, COND_AL, ARITH_ADD, TCG_REG_CALL_STACK,
--                   TCG_REG_CALL_STACK, stack_addend, 1);
-+                   TCG_REG_CALL_STACK, STACK_ADDEND, 1);
-     /* ldmia sp!, { r4 - r11, pc } */
-     tcg_out32(s, (COND_AL << 28) | 0x08bd8ff0);
---
-.20.1

-[PULL 4/8] tcg/arm: Expand epilogue inline
+Deleted patch
-From: Richard Henderson <rth@twiddle.net>
-It is, after all, just two instructions.
-Profiling on a cortex-a15, using -d nochain to increase the number
-of exit_tb that are executed, shows a minor improvement of 0.5%.
-Signed-off-by: Richard Henderson <rth@twiddle.net>
----
- tcg/arm/tcg-target.inc.c | 12 ++----------
-file changed, 2 insertions(+), 10 deletions(-)
-diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
-index XXXXXXX..XXXXXXX 100644
---- a/tcg/arm/tcg-target.inc.c
-+++ b/tcg/arm/tcg-target.inc.c
-@@ -XXX,XX +XXX,XX @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
- #endif
- }
--static tcg_insn_unit *tb_ret_addr;
- static void tcg_out_epilogue(TCGContext *s);
- static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
-@@ -XXX,XX +XXX,XX @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
-     switch (opc) {
-     case INDEX_op_exit_tb:
--        /* Reuse the zeroing that exists for goto_ptr.  */
--        a0 = args[0];
--        if (a0 == 0) {
--            tcg_out_goto(s, COND_AL, s->code_gen_epilogue);
--        } else {
--            tcg_out_movi32(s, COND_AL, TCG_REG_R0, args[0]);
--            tcg_out_goto(s, COND_AL, tb_ret_addr);
--        }
-+        tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R0, args[0]);
-+        tcg_out_epilogue(s);
-         break;
-     case INDEX_op_goto_tb:
-         {
-@@ -XXX,XX +XXX,XX @@ static void tcg_target_qemu_prologue(TCGContext *s)
-      */
-     s->code_gen_epilogue = s->code_ptr;
-     tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R0, 0);
--    tb_ret_addr = s->code_ptr;
-     tcg_out_epilogue(s);
- }
---
-.20.1

-[PULL 5/8] accel/tcg: use units.h for defining code gen buffer sizes
+Deleted patch
-From: Alex Bennée <alex.bennee@linaro.org>
-It's easier to read.
-Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
-Reviewed-by: Niek Linnenbank <nieklinnenbank@gmail.com>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
-Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
-Message-Id: <20200228192415.19867-2-alex.bennee@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
----
- accel/tcg/translate-all.c | 19 ++++++++++---------
-file changed, 10 insertions(+), 9 deletions(-)
-diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
-index XXXXXXX..XXXXXXX 100644
---- a/accel/tcg/translate-all.c
-+++ b/accel/tcg/translate-all.c
-@@ -XXX,XX +XXX,XX @@
-  */
- #include "qemu/osdep.h"
-+#include "qemu/units.h"
- #include "qemu-common.h"
- #define NO_CPU_IO_DEFS
-@@ -XXX,XX +XXX,XX @@ static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
- /* Minimum size of the code gen buffer.  This number is randomly chosen,
-    but not so small that we can't have a fair number of TB's live.  */
--#define MIN_CODE_GEN_BUFFER_SIZE     (1024u * 1024)
-+#define MIN_CODE_GEN_BUFFER_SIZE     (1 * MiB)
- /* Maximum size of the code gen buffer we'd like to use.  Unless otherwise
-    indicated, this is constrained by the range of direct branches on the
-    host cpu, as used by the TCG implementation of goto_tb.  */
- #if defined(__x86_64__)
--# define MAX_CODE_GEN_BUFFER_SIZE  (2ul * 1024 * 1024 * 1024)
-+# define MAX_CODE_GEN_BUFFER_SIZE  (2 * GiB)
- #elif defined(__sparc__)
--# define MAX_CODE_GEN_BUFFER_SIZE  (2ul * 1024 * 1024 * 1024)
-+# define MAX_CODE_GEN_BUFFER_SIZE  (2 * GiB)
- #elif defined(__powerpc64__)
--# define MAX_CODE_GEN_BUFFER_SIZE  (2ul * 1024 * 1024 * 1024)
-+# define MAX_CODE_GEN_BUFFER_SIZE  (2 * GiB)
- #elif defined(__powerpc__)
--# define MAX_CODE_GEN_BUFFER_SIZE  (32u * 1024 * 1024)
-+# define MAX_CODE_GEN_BUFFER_SIZE  (32 * MiB)
- #elif defined(__aarch64__)
--# define MAX_CODE_GEN_BUFFER_SIZE  (2ul * 1024 * 1024 * 1024)
-+# define MAX_CODE_GEN_BUFFER_SIZE  (2 * GiB)
- #elif defined(__s390x__)
-   /* We have a +- 4GB range on the branches; leave some slop.  */
--# define MAX_CODE_GEN_BUFFER_SIZE  (3ul * 1024 * 1024 * 1024)
-+# define MAX_CODE_GEN_BUFFER_SIZE  (3 * GiB)
- #elif defined(__mips__)
-   /* We have a 256MB branch region, but leave room to make sure the
-      main executable is also within that region.  */
--# define MAX_CODE_GEN_BUFFER_SIZE  (128ul * 1024 * 1024)
-+# define MAX_CODE_GEN_BUFFER_SIZE  (128 * MiB)
- #else
- # define MAX_CODE_GEN_BUFFER_SIZE  ((size_t)-1)
- #endif
--#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (32u * 1024 * 1024)
-+#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (32 * MiB)
- #define DEFAULT_CODE_GEN_BUFFER_SIZE \
-   (DEFAULT_CODE_GEN_BUFFER_SIZE_1 < MAX_CODE_GEN_BUFFER_SIZE \
---
-.20.1

-[PULL 6/8] accel/tcg: remove link between guest ram and TCG cache size
+Deleted patch
-From: Alex Bennée <alex.bennee@linaro.org>
-Basing the TB cache size on the ram_size was always a little heuristic
-and was broken by a1b18df9a4 which caused ram_size not to be fully
-realised at the time we initialise the TCG translation cache.
-The current DEFAULT_CODE_GEN_BUFFER_SIZE may still be a little small
-but follow-up patches will address that.
-Fixes: a1b18df9a4
-Cc: Igor Mammedov <imammedo@redhat.com>
-Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
-Reviewed-by: Niek Linnenbank <nieklinnenbank@gmail.com>
-Message-Id: <20200228192415.19867-3-alex.bennee@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
----
- accel/tcg/translate-all.c | 8 --------
-file changed, 8 deletions(-)
-diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
-index XXXXXXX..XXXXXXX 100644
---- a/accel/tcg/translate-all.c
-+++ b/accel/tcg/translate-all.c
-@@ -XXX,XX +XXX,XX @@ static inline size_t size_code_gen_buffer(size_t tb_size)
- {
-     /* Size the buffer.  */
-     if (tb_size == 0) {
--#ifdef USE_STATIC_CODE_GEN_BUFFER
-         tb_size = DEFAULT_CODE_GEN_BUFFER_SIZE;
--#else
--        /* ??? Needs adjustments.  */
--        /* ??? If we relax the requirement that CONFIG_USER_ONLY use the
--           static buffer, we could size this on RESERVED_VA, on the text
--           segment size of the executable, or continue to use the default.  */
--        tb_size = (unsigned long)(ram_size / 4);
--#endif
-     }
-     if (tb_size < MIN_CODE_GEN_BUFFER_SIZE) {
-         tb_size = MIN_CODE_GEN_BUFFER_SIZE;
---
-.20.1

-[PULL 7/8] accel/tcg: only USE_STATIC_CODE_GEN_BUFFER on 32 bit hosts
+Deleted patch
-From: Alex Bennée <alex.bennee@linaro.org>
-There is no particular reason to use a static codegen buffer on 64 bit
-hosts as we have address space to burn. Allow the common CONFIG_USER
-case to use the mmap'ed buffers like SoftMMU.
-Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
-Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
-Reviewed-by: Niek Linnenbank <nieklinnenbank@gmail.com>
-Message-Id: <20200228192415.19867-4-alex.bennee@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
----
- accel/tcg/translate-all.c | 11 ++++++-----
-file changed, 6 insertions(+), 5 deletions(-)
-diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
-index XXXXXXX..XXXXXXX 100644
---- a/accel/tcg/translate-all.c
-+++ b/accel/tcg/translate-all.c
-@@ -XXX,XX +XXX,XX @@ static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
-     }
- }
--#if defined(CONFIG_USER_ONLY)
--/* Currently it is not recommended to allocate big chunks of data in
--   user mode. It will change when a dedicated libc will be used.  */
--/* ??? 64-bit hosts ought to have no problem mmaping data outside the
--   region in which the guest needs to run.  Revisit this.  */
-+#if defined(CONFIG_USER_ONLY) && TCG_TARGET_REG_BITS == 32
-+/*
-+ * For user mode on smaller 32 bit systems we may run into trouble
-+ * allocating big chunks of data in the right place. On these systems
-+ * we utilise a static code generation buffer directly in the binary.
-+ */
- #define USE_STATIC_CODE_GEN_BUFFER
- #endif
---
-.20.1

-[PULL 8/8] accel/tcg: increase default code gen buffer size for 64 bit
+Deleted patch
-From: Alex Bennée <alex.bennee@linaro.org>
-While 32MB is certainly usable, a full system boot ends up flushing the
-codegen buffer nearly 100 times. Increase the default on 64 bit hosts
-to take advantage of all that spare memory. After this change I can
-boot my test system without any TB flushes.
-As we usually run more CONFIG_USER binaries at a time in typical usage
-we aren't quite as profligate for user-mode code generation usage. We
-also bring the static code gen defines to the same place to keep all
-the reasoning in the comments together.
-Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
-Tested-by: Niek Linnenbank <nieklinnenbank@gmail.com>
-Reviewed-by: Niek Linnenbank <nieklinnenbank@gmail.com>
-Message-Id: <20200228192415.19867-5-alex.bennee@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
----
- accel/tcg/translate-all.c | 35 ++++++++++++++++++++++++++---------
-file changed, 26 insertions(+), 9 deletions(-)
-diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
-index XXXXXXX..XXXXXXX 100644
---- a/accel/tcg/translate-all.c
-+++ b/accel/tcg/translate-all.c
-@@ -XXX,XX +XXX,XX @@ static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
-     }
- }
--#if defined(CONFIG_USER_ONLY) && TCG_TARGET_REG_BITS == 32
--/*
-- * For user mode on smaller 32 bit systems we may run into trouble
-- * allocating big chunks of data in the right place. On these systems
-- * we utilise a static code generation buffer directly in the binary.
-- */
--#define USE_STATIC_CODE_GEN_BUFFER
--#endif
--
- /* Minimum size of the code gen buffer.  This number is randomly chosen,
-    but not so small that we can't have a fair number of TB's live.  */
- #define MIN_CODE_GEN_BUFFER_SIZE     (1 * MiB)
-@@ -XXX,XX +XXX,XX @@ static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
- # define MAX_CODE_GEN_BUFFER_SIZE  ((size_t)-1)
- #endif
-+#if TCG_TARGET_REG_BITS == 32
- #define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (32 * MiB)
-+#ifdef CONFIG_USER_ONLY
-+/*
-+ * For user mode on smaller 32 bit systems we may run into trouble
-+ * allocating big chunks of data in the right place. On these systems
-+ * we utilise a static code generation buffer directly in the binary.
-+ */
-+#define USE_STATIC_CODE_GEN_BUFFER
-+#endif
-+#else /* TCG_TARGET_REG_BITS == 64 */
-+#ifdef CONFIG_USER_ONLY
-+/*
-+ * As user-mode emulation typically means running multiple instances
-+ * of the translator don't go too nuts with our default code gen
-+ * buffer lest we make things too hard for the OS.
-+ */
-+#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (128 * MiB)
-+#else
-+/*
-+ * We expect most system emulation to run one or two guests per host.
-+ * Users running large scale system emulation may want to tweak their
-+ * runtime setup via the tb-size control on the command line.
-+ */
-+#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (1 * GiB)
-+#endif
-+#endif
- #define DEFAULT_CODE_GEN_BUFFER_SIZE \
-   (DEFAULT_CODE_GEN_BUFFER_SIZE_1 < MAX_CODE_GEN_BUFFER_SIZE \
---
-.20.1
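
The buffer-sizing rules touched by patches 5/8 through 8/8 can be sketched as a small standalone function. This is only an illustration: `toy_limits`, `toy_size_code_gen_buffer`, `TOY_MiB` and `TOY_GiB` are hypothetical names, and both the tail of the `DEFAULT_CODE_GEN_BUFFER_SIZE` macro and the upper clamp are assumptions, since the quoted hunks show only the default and minimum handling.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical unit helpers standing in for the ones in qemu/units.h. */
#define TOY_MiB ((uint64_t)1 << 20)
#define TOY_GiB ((uint64_t)1 << 30)

/* Hypothetical bundle of the per-host constants from the patches. */
struct toy_limits {
    uint64_t min_size;      /* MIN_CODE_GEN_BUFFER_SIZE */
    uint64_t max_size;      /* MAX_CODE_GEN_BUFFER_SIZE (branch range) */
    uint64_t default_size;  /* DEFAULT_CODE_GEN_BUFFER_SIZE_1 */
};

/*
 * Pick a buffer size: a request of 0 means "use the default", and the
 * default itself is capped at the host maximum (assumed here).  The
 * result is then clamped to [min_size, max_size]; the upper clamp is
 * an assumption, it is not visible in the quoted hunks.
 */
static uint64_t toy_size_code_gen_buffer(uint64_t requested,
                                         struct toy_limits lim)
{
    uint64_t size = requested;

    assert(lim.min_size <= lim.max_size);

    if (size == 0) {
        size = lim.default_size < lim.max_size
             ? lim.default_size : lim.max_size;
    }
    if (size < lim.min_size) {
        size = lim.min_size;
    }
    if (size > lim.max_size) {
        size = lim.max_size;
    }
    return size;
}
```

With the x86_64 system-emulation values from the patches (1 MiB minimum, 2 GiB maximum, 1 GiB default), a request of 0 would yield 1 GiB, while a host limited to 128 MiB of branch range would be capped at 128 MiB.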

The following changes since commit e0175b71638cf4398903c0d25f93fe62e0606389:

Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20200228' into staging (2020-02-28 16:39:27 +0000)

are available in the Git repository at:

https://github.com/rth7680/qemu.git tags/pull-tcg-20200228

for you to fetch changes up to 600e17b261555c56a048781b8dd5ba3985650013:

accel/tcg: increase default code gen buffer size for 64 bit (2020-02-28 17:43:31 -0800)

----------------------------------------------------------------
Fix race in cpu_exec_step_atomic.
Work around compile failure with -fno-inline.
Expand tcg/arm epilogue inline.
Adjustments to the default code gen buffer size.

----------------------------------------------------------------
Alex Bennée (5):
      accel/tcg: fix race in cpu_exec_step_atomic (bug 1863025)
      accel/tcg: use units.h for defining code gen buffer sizes
      accel/tcg: remove link between guest ram and TCG cache size
      accel/tcg: only USE_STATIC_CODE_GEN_BUFFER on 32 bit hosts
      accel/tcg: increase default code gen buffer size for 64 bit

Richard Henderson (2):
      tcg/arm: Split out tcg_out_epilogue
      tcg/arm: Expand epilogue inline

Zenghui Yu (1):
      compiler.h: Don't use compile-time assert when __NO_INLINE__ is defined

 include/qemu/compiler.h   |  2 +-
 accel/tcg/cpu-exec.c      | 21 ++++++++--------
 accel/tcg/translate-all.c | 61 ++++++++++++++++++++++++++++-------------------
 tcg/arm/tcg-target.inc.c  | 29 ++++++++++------------
 4 files changed, 60 insertions(+), 53 deletions(-)

From: Alex Bennée <alex.bennee@linaro.org>

The bug describes a race whereby cpu_exec_step_atomic can acquire a TB
which is invalidated by a tb_flush before we execute it. This doesn't
affect the other cpu_exec modes as a tb_flush by its nature can only
occur on a quiescent system. The race was described as:

  B2. tcg_cpu_exec => cpu_exec => tb_find => tb_gen_code
  B3. tcg_tb_alloc obtains a new TB

      C3. TB obtained with tb_lookup__cpu_state or tb_gen_code
          (same TB as B2)

          A3. start_exclusive critical section entered
          A4. do_tb_flush is called, TB memory freed/re-allocated
          A5. end_exclusive exits critical section

  B2. tcg_cpu_exec => cpu_exec => tb_find => tb_gen_code
  B3. tcg_tb_alloc reallocates TB from B2

      C4. start_exclusive critical section entered
      C5. cpu_tb_exec executes the TB code that was free in A4

The simplest fix is to widen the exclusive period to include the TB
lookup. As a result we can drop the complication of checking we are in
the exclusive region before we end it.

Cc: Yifan <me@yifanlu.com>
Buglink: https://bugs.launchpad.net/qemu/+bug/1863025
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20200214144952.15502-1-alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/cpu-exec.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -XXX,XX +XXX,XX @@ void cpu_exec_step_atomic(CPUState *cpu)
     uint32_t cf_mask = cflags & CF_HASH_MASK;
 
     if (sigsetjmp(cpu->jmp_env, 0) == 0) {
+        start_exclusive();
+
         tb = tb_lookup__cpu_state(cpu, &pc, &cs_base, &flags, cf_mask);
         if (tb == NULL) {
             mmap_lock();
@@ -XXX,XX +XXX,XX @@ void cpu_exec_step_atomic(CPUState *cpu)
             mmap_unlock();
         }
 
-        start_exclusive();
-
         /* Since we got here, we know that parallel_cpus must be true.  */
         parallel_cpus = false;
         cc->cpu_exec_enter(cpu);
@@ -XXX,XX +XXX,XX @@ void cpu_exec_step_atomic(CPUState *cpu)
         qemu_plugin_disable_mem_helpers(cpu);
     }
 
-    if (cpu_in_exclusive_context(cpu)) {
-        /* We might longjump out of either the codegen or the
-         * execution, so must make sure we only end the exclusive
-         * region if we started it.
-         */
-        parallel_cpus = true;
-        end_exclusive();
-    }
+
+    /*
+     * As we start the exclusive region before codegen we must still
+     * be in the region if we longjump out of either the codegen or
+     * the execution.
+     */
+    g_assert(cpu_in_exclusive_context(cpu));
+    parallel_cpus = true;
+    end_exclusive();
 }
 
 struct tb_desc {
-- 
2.20.1
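
The ordering change in the patch above can be illustrated with a deterministic, single-threaded toy model. This is a sketch under stated assumptions rather than QEMU code: every `toy_*` name is hypothetical, a "TB" is reduced to the cache generation seen at lookup time, and a flush is modelled as something that cannot run while the exclusive region is held.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names are QEMU APIs. */
struct toy_cache {
    unsigned generation;  /* bumped by every flush */
    bool exclusive;       /* true while the exclusive region is held */
};

struct toy_tb {
    unsigned generation;  /* cache generation at lookup time */
};

static struct toy_tb toy_lookup(const struct toy_cache *c)
{
    struct toy_tb tb = { c->generation };
    return tb;
}

/* A flush requested by another vCPU; refused while the region is held. */
static bool toy_try_flush(struct toy_cache *c)
{
    if (c->exclusive) {
        return false;
    }
    c->generation++;
    return true;
}

static bool toy_tb_is_stale(const struct toy_cache *c, struct toy_tb tb)
{
    return tb.generation != c->generation;
}

/* Old ordering: look up first, enter the region afterwards. */
static bool toy_step_lookup_first(struct toy_cache *c)
{
    struct toy_tb tb = toy_lookup(c);
    bool stale;

    toy_try_flush(c);          /* the flush lands in the race window */
    c->exclusive = true;
    stale = toy_tb_is_stale(c, tb);
    c->exclusive = false;
    return stale;              /* true: a freed TB would be executed */
}

/* New ordering: enter the region first, so the lookup is covered. */
static bool toy_step_exclusive_first(struct toy_cache *c)
{
    struct toy_tb tb;
    bool flushed, stale;

    c->exclusive = true;
    tb = toy_lookup(c);
    flushed = toy_try_flush(c); /* refused: the region is already held */
    assert(!flushed);
    stale = toy_tb_is_stale(c, tb);
    c->exclusive = false;
    return stale;
}
```

In this model the lookup-first step reports a stale TB whenever a flush slips in between lookup and execution, while the exclusive-first step cannot, which mirrors why the patch can replace the conditional `end_exclusive()` with an assertion.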

From: Zenghui Yu <yuzenghui@huawei.com>

Our robot reported the following compile-time error while compiling
QEMU with -fno-inline cflags:

In function 'load_memop',
    inlined from 'load_helper' at /qemu/accel/tcg/cputlb.c:1578:20,
    inlined from 'full_ldub_mmu' at /qemu/accel/tcg/cputlb.c:1624:12:
/qemu/accel/tcg/cputlb.c:1502:9: error: call to 'qemu_build_not_reached' declared with attribute error: code path is reachable
         qemu_build_not_reached();
         ^~~~~~~~~~~~~~~~~~~~~~~~
    [...]

It looks like a false-positive because only (MO_UB ^ MO_BSWAP) will
hit the default case in load_memop() while need_swap (size > 1) has
already ensured that MO_UB is not involved.

With -fno-inline the compiler cannot evaluate memop_size(op) at
compile time, so qemu_build_not_reached() is wrongly triggered by
(MO_UB ^ MO_BSWAP).  Avoid the compile-time assert when no functions
will be inlined into their callers.
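
As a rough illustration of the guard, here is a minimal sketch of the
pattern.  Every name prefixed with sketch_ is hypothetical and not part
of QEMU, and the plain undefined extern stands in for the QEMU_ERROR
attribute, so a surviving call would fail at link time rather than at
compile time.  It assumes a GCC or Clang style compiler.

```c
#include <stdlib.h>

/*
 * Hypothetical stand-in for qemu_build_not_reached().  The check is
 * only enabled when the compiler both optimises and inlines, which is
 * the condition the patch introduces.
 */
#if defined(__OPTIMIZE__) && !defined(__NO_INLINE__)
/* Declared but never defined: a call that survives dead-code
 * elimination breaks the build when linking. */
extern void sketch_build_not_reached(void);
#else
/* Without inlining, dead calls may survive, so fall back to a
 * run-time check instead of a build-time one. */
#define sketch_build_not_reached()  abort()
#endif

/* Access size in bytes for an illustrative log2 size code 0..3. */
static inline __attribute__((always_inline))
unsigned sketch_access_size(unsigned size_log2)
{
    switch (size_log2) {
    case 0:
        return 1;
    case 1:
        return 2;
    case 2:
        return 4;
    case 3:
        return 8;
    default:
        /* Unreachable for the constant arguments used by callers. */
        sketch_build_not_reached();
        return 0;
    }
}
```

When the argument is a compile-time constant and the helper is inlined,
the default case is removed and the undefined symbol is never
referenced.  If inlining is disabled, that guarantee is lost, which is
why the fallback branch is selected instead.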

Reported-by: Euler Robot <euler.robot@huawei.com>
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Message-Id: <20200205141545.180-1-yuzenghui@huawei.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/qemu/compiler.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/qemu/compiler.h b/include/qemu/compiler.h
index XXXXXXX..XXXXXXX 100644
--- a/include/qemu/compiler.h
+++ b/include/qemu/compiler.h
@@ -XXX,XX +XXX,XX @@
  * supports QEMU_ERROR, this will be reported at compile time; otherwise
  * this will be reported at link time due to the missing symbol.
  */
-#ifdef __OPTIMIZE__
+#if defined(__OPTIMIZE__) && !defined(__NO_INLINE__)
 extern void QEMU_NORETURN QEMU_ERROR("code path is reachable")
     qemu_build_not_reached(void);
 #else
-- 
2.20.1

From: Richard Henderson <rth@twiddle.net>

We will shortly use this function from tcg_out_op as well.

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/arm/tcg-target.inc.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -XXX,XX +XXX,XX @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
 }
 
 static tcg_insn_unit *tb_ret_addr;
+static void tcg_out_epilogue(TCGContext *s);
 
 static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
                 const TCGArg *args, const int *const_args)
@@ -XXX,XX +XXX,XX @@ static void tcg_out_nop_fill(tcg_insn_unit *p, int count)
       + TCG_TARGET_STACK_ALIGN - 1) \
      & -TCG_TARGET_STACK_ALIGN)
 
+#define STACK_ADDEND  (FRAME_SIZE - PUSH_SIZE)
+
 static void tcg_target_qemu_prologue(TCGContext *s)
 {
-    int stack_addend;
-
     /* Calling convention requires us to save r4-r11 and lr.  */
     /* stmdb sp!, { r4 - r11, lr } */
     tcg_out32(s, (COND_AL << 28) | 0x092d4ff0);
 
     /* Reserve callee argument and tcg temp space.  */
-    stack_addend = FRAME_SIZE - PUSH_SIZE;
-
     tcg_out_dat_rI(s, COND_AL, ARITH_SUB, TCG_REG_CALL_STACK,
-                   TCG_REG_CALL_STACK, stack_addend, 1);
+                   TCG_REG_CALL_STACK, STACK_ADDEND, 1);
     tcg_set_frame(s, TCG_REG_CALL_STACK, TCG_STATIC_CALL_ARGS_SIZE,
                   CPU_TEMP_BUF_NLONGS * sizeof(long));
 
@@ -XXX,XX +XXX,XX @@ static void tcg_target_qemu_prologue(TCGContext *s)
      */
     s->code_gen_epilogue = s->code_ptr;
     tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R0, 0);
-
-    /* TB epilogue */
     tb_ret_addr = s->code_ptr;
+    tcg_out_epilogue(s);
+}
+
+static void tcg_out_epilogue(TCGContext *s)
+{
+    /* Release local stack frame.  */
     tcg_out_dat_rI(s, COND_AL, ARITH_ADD, TCG_REG_CALL_STACK,
-                   TCG_REG_CALL_STACK, stack_addend, 1);
+                   TCG_REG_CALL_STACK, STACK_ADDEND, 1);
 
     /* ldmia sp!, { r4 - r11, pc } */
     tcg_out32(s, (COND_AL << 28) | 0x08bd8ff0);
-- 
2.20.1

From: Richard Henderson <rth@twiddle.net>

It is, after all, just two instructions.

Profiling on a cortex-a15, using -d nochain to increase the number
of exit_tb that are executed, shows a minor improvement of 0.5%.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/arm/tcg-target.inc.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

From: Alex Bennée <alex.bennee@linaro.org>

It's easier to read.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200228192415.19867-2-alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/translate-all.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -XXX,XX +XXX,XX @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/units.h"
 #include "qemu-common.h"
 
 #define NO_CPU_IO_DEFS
@@ -XXX,XX +XXX,XX @@ static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
 
 /* Minimum size of the code gen buffer.  This number is randomly chosen,
    but not so small that we can't have a fair number of TB's live.  */
-#define MIN_CODE_GEN_BUFFER_SIZE     (1024u * 1024)
+#define MIN_CODE_GEN_BUFFER_SIZE     (1 * MiB)
 
 /* Maximum size of the code gen buffer we'd like to use.  Unless otherwise
    indicated, this is constrained by the range of direct branches on the
    host cpu, as used by the TCG implementation of goto_tb.  */
 #if defined(__x86_64__)
-# define MAX_CODE_GEN_BUFFER_SIZE  (2ul * 1024 * 1024 * 1024)
+# define MAX_CODE_GEN_BUFFER_SIZE  (2 * GiB)
 #elif defined(__sparc__)
-# define MAX_CODE_GEN_BUFFER_SIZE  (2ul * 1024 * 1024 * 1024)
+# define MAX_CODE_GEN_BUFFER_SIZE  (2 * GiB)
 #elif defined(__powerpc64__)
-# define MAX_CODE_GEN_BUFFER_SIZE  (2ul * 1024 * 1024 * 1024)
+# define MAX_CODE_GEN_BUFFER_SIZE  (2 * GiB)
 #elif defined(__powerpc__)
-# define MAX_CODE_GEN_BUFFER_SIZE  (32u * 1024 * 1024)
+# define MAX_CODE_GEN_BUFFER_SIZE  (32 * MiB)
 #elif defined(__aarch64__)
-# define MAX_CODE_GEN_BUFFER_SIZE  (2ul * 1024 * 1024 * 1024)
+# define MAX_CODE_GEN_BUFFER_SIZE  (2 * GiB)
 #elif defined(__s390x__)
   /* We have a +- 4GB range on the branches; leave some slop.  */
-# define MAX_CODE_GEN_BUFFER_SIZE  (3ul * 1024 * 1024 * 1024)
+# define MAX_CODE_GEN_BUFFER_SIZE  (3 * GiB)
 #elif defined(__mips__)
   /* We have a 256MB branch region, but leave room to make sure the
      main executable is also within that region.  */
-# define MAX_CODE_GEN_BUFFER_SIZE  (128ul * 1024 * 1024)
+# define MAX_CODE_GEN_BUFFER_SIZE  (128 * MiB)
 #else
 # define MAX_CODE_GEN_BUFFER_SIZE  ((size_t)-1)
 #endif
 
-#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (32u * 1024 * 1024)
+#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (32 * MiB)
 
 #define DEFAULT_CODE_GEN_BUFFER_SIZE \
   (DEFAULT_CODE_GEN_BUFFER_SIZE_1 < MAX_CODE_GEN_BUFFER_SIZE \
-- 
2.20.1

From: Alex Bennée <alex.bennee@linaro.org>

Basing the TB cache size on ram_size was always a rough heuristic, and
it was broken by a1b18df9a4, which caused ram_size not to be fully
realised at the time we initialise the TCG translation cache.

The current DEFAULT_CODE_GEN_BUFFER_SIZE may still be a little small
but follow-up patches will address that.
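
For reference, a minimal sketch of the sizing logic once the ram_size
heuristic is gone.  The function and macro names are hypothetical; the
1 MiB minimum matches MIN_CODE_GEN_BUFFER_SIZE in the series, while the
upper clamp is an assumption, since it lies outside the hunk shown in
this patch.

```c
#include <stdint.h>

#define SKETCH_MiB  (UINT64_C(1) << 20)

/* Mirrors MIN_CODE_GEN_BUFFER_SIZE from the series (1 MiB). */
#define SKETCH_MIN_CODE_GEN_BUFFER_SIZE  (1 * SKETCH_MiB)

/*
 * Hypothetical model of size_code_gen_buffer(): a request of zero
 * selects the default, and the result is then clamped.  The clamp to
 * max_size is assumed, not taken from the hunk.
 */
static uint64_t sketch_size_code_gen_buffer(uint64_t tb_size,
                                            uint64_t default_size,
                                            uint64_t max_size)
{
    if (tb_size == 0) {
        tb_size = default_size;
    }
    if (tb_size < SKETCH_MIN_CODE_GEN_BUFFER_SIZE) {
        tb_size = SKETCH_MIN_CODE_GEN_BUFFER_SIZE;
    }
    if (tb_size > max_size) {
        tb_size = max_size;
    }
    return tb_size;
}
```

With a 32 MiB default and a 2 GiB maximum, a request of zero yields
32 MiB regardless of how much guest RAM is configured.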

Fixes: a1b18df9a4
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Message-Id: <20200228192415.19867-3-alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/translate-all.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -XXX,XX +XXX,XX @@ static inline size_t size_code_gen_buffer(size_t tb_size)
 {
     /* Size the buffer.  */
     if (tb_size == 0) {
-#ifdef USE_STATIC_CODE_GEN_BUFFER
         tb_size = DEFAULT_CODE_GEN_BUFFER_SIZE;
-#else
-        /* ??? Needs adjustments.  */
-        /* ??? If we relax the requirement that CONFIG_USER_ONLY use the
-           static buffer, we could size this on RESERVED_VA, on the text
-           segment size of the executable, or continue to use the default.  */
-        tb_size = (unsigned long)(ram_size / 4);
-#endif
     }
     if (tb_size < MIN_CODE_GEN_BUFFER_SIZE) {
         tb_size = MIN_CODE_GEN_BUFFER_SIZE;
-- 
2.20.1

From: Alex Bennée <alex.bennee@linaro.org>

There is no particular reason to use a static codegen buffer on 64 bit
hosts as we have address space to burn. Allow the common CONFIG_USER
case to use the mmap'ed buffers like SoftMMU.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Message-Id: <20200228192415.19867-4-alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/translate-all.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -XXX,XX +XXX,XX @@ static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
     }
 }
 
-#if defined(CONFIG_USER_ONLY)
-/* Currently it is not recommended to allocate big chunks of data in
-   user mode. It will change when a dedicated libc will be used.  */
-/* ??? 64-bit hosts ought to have no problem mmaping data outside the
-   region in which the guest needs to run.  Revisit this.  */
+#if defined(CONFIG_USER_ONLY) && TCG_TARGET_REG_BITS == 32
+/*
+ * For user mode on smaller 32 bit systems we may run into trouble
+ * allocating big chunks of data in the right place. On these systems
+ * we utilise a static code generation buffer directly in the binary.
+ */
 #define USE_STATIC_CODE_GEN_BUFFER
 #endif
 
-- 
2.20.1

From: Alex Bennée <alex.bennee@linaro.org>

While 32MB is certainly usable, a full system boot ends up flushing the
codegen buffer nearly 100 times. Increase the default on 64 bit hosts
to take advantage of all that spare memory. After this change I can
boot my test system without any TB flushes.

As we usually run more CONFIG_USER binaries at a time in typical usage
we aren't quite as profligate for user-mode code generation usage. We
also bring the static code gen defines to the same place to keep all
the reasoning in the comments together.
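
The resulting defaults can be summarised with a small sketch.  The
function name and its parameters are hypothetical; the three values are
the ones this patch assigns to DEFAULT_CODE_GEN_BUFFER_SIZE_1, before
any cap from MAX_CODE_GEN_BUFFER_SIZE is applied.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MiB  (UINT64_C(1) << 20)
#define SKETCH_GiB  (UINT64_C(1) << 30)

/*
 * Hypothetical run-time model of the compile-time selection:
 * host_reg_bits plays the role of TCG_TARGET_REG_BITS and user_only
 * the role of CONFIG_USER_ONLY.
 */
static uint64_t sketch_default_code_gen_size(int host_reg_bits,
                                             bool user_only)
{
    if (host_reg_bits == 32) {
        /* 32 bit hosts keep the old 32 MiB default. */
        return 32 * SKETCH_MiB;
    }
    /* 64 bit hosts: modest for user mode, generous for system mode. */
    return user_only ? 128 * SKETCH_MiB : 1 * SKETCH_GiB;
}
```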

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Reviewed-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Message-Id: <20200228192415.19867-5-alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/translate-all.c | 35 ++++++++++++++++++++++++++---------
 1 file changed, 26 insertions(+), 9 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -XXX,XX +XXX,XX @@ static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
     }
 }
 
-#if defined(CONFIG_USER_ONLY) && TCG_TARGET_REG_BITS == 32
-/*
- * For user mode on smaller 32 bit systems we may run into trouble
- * allocating big chunks of data in the right place. On these systems
- * we utilise a static code generation buffer directly in the binary.
- */
-#define USE_STATIC_CODE_GEN_BUFFER
-#endif
-
 /* Minimum size of the code gen buffer.  This number is randomly chosen,
    but not so small that we can't have a fair number of TB's live.  */
 #define MIN_CODE_GEN_BUFFER_SIZE     (1 * MiB)
@@ -XXX,XX +XXX,XX @@ static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
 # define MAX_CODE_GEN_BUFFER_SIZE  ((size_t)-1)
 #endif
 
+#if TCG_TARGET_REG_BITS == 32
 #define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (32 * MiB)
+#ifdef CONFIG_USER_ONLY
+/*
+ * For user mode on smaller 32 bit systems we may run into trouble
+ * allocating big chunks of data in the right place. On these systems
+ * we utilise a static code generation buffer directly in the binary.
+ */
+#define USE_STATIC_CODE_GEN_BUFFER
+#endif
+#else /* TCG_TARGET_REG_BITS == 64 */
+#ifdef CONFIG_USER_ONLY
+/*
+ * As user-mode emulation typically means running multiple instances
+ * of the translator don't go too nuts with our default code gen
+ * buffer lest we make things too hard for the OS.
+ */
+#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (128 * MiB)
+#else
+/*
+ * We expect most system emulation to run one or two guests per host.
+ * Users running large scale system emulation may want to tweak their
+ * runtime setup via the tb-size control on the command line.
+ */
+#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (1 * GiB)
+#endif
+#endif
 
 #define DEFAULT_CODE_GEN_BUFFER_SIZE \
   (DEFAULT_CODE_GEN_BUFFER_SIZE_1 < MAX_CODE_GEN_BUFFER_SIZE \
-- 
2.20.1

v2: Fix target/loongarch printf formats for vaddr
    Include two more reviewed patches.

This time with actual pull urls.  :-/

The following changes since commit db7aa99ef894e88fc5eedf02ca2579b8c344b2ec:

Merge tag 'hw-misc-20250216' of https://github.com/philmd/qemu into staging (2025-02-16 20:48:06 -0500)

are available in the Git repository at:

https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20250215-2

for you to fetch changes up to a39bdd0f4ba96fcbb6b5bcb6e89591d2b24f52eb:

tcg: Remove TCG_TARGET_HAS_{br,set}cond2 from riscv and loongarch64 (2025-02-17 09:52:07 -0800)

----------------------------------------------------------------
tcg: Remove last traces of TCG_TARGET_NEED_POOL_LABELS
tcg: Cleanups after disallowing 64-on-32
tcg: Introduce constraint for zero register
tcg: Remove TCG_TARGET_HAS_{br,set}cond2 from riscv and loongarch64
tcg/i386: Use tcg_{high,unsigned}_cond in tcg_out_brcond2
linux-user: Move TARGET_SA_RESTORER out of generic/signal.h
linux-user: Fix alignment when unmapping excess reservation
target/sparc: Fix register selection for all F*TOx and FxTO* instructions
target/sparc: Fix gdbstub incorrectly handling registers f32-f62
target/sparc: fake UltraSPARC T1 PCR and PIC registers

----------------------------------------------------------------
Andreas Schwab (1):
      linux-user: Move TARGET_SA_RESTORER out of generic/signal.h

Artyom Tarasenko (1):
      target/sparc: fake UltraSPARC T1 PCR and PIC registers

Fabiano Rosas (1):
      elfload: Fix alignment when unmapping excess reservation

Mikael Szreder (2):
      target/sparc: Fix register selection for all F*TOx and FxTO* instructions
      target/sparc: Fix gdbstub incorrectly handling registers f32-f62

Richard Henderson (22):
      tcg: Remove last traces of TCG_TARGET_NEED_POOL_LABELS
      tcg: Remove TCG_OVERSIZED_GUEST
      tcg: Drop support for two address registers in gen_ldst
      tcg: Merge INDEX_op_qemu_*_{a32,a64}_*
      tcg/arm: Drop addrhi from prepare_host_addr
      tcg/i386: Drop addrhi from prepare_host_addr
      tcg/mips: Drop addrhi from prepare_host_addr
      tcg/ppc: Drop addrhi from prepare_host_addr
      tcg: Replace addr{lo,hi}_reg with addr_reg in TCGLabelQemuLdst
      plugins: Fix qemu_plugin_read_memory_vaddr parameters
      accel/tcg: Fix tlb_set_page_with_attrs, tlb_set_page
      target/loongarch: Use VADDR_PRIx for logging pc_next
      include/exec: Change vaddr to uintptr_t
      include/exec: Use uintptr_t in CPUTLBEntry
      tcg: Introduce the 'z' constraint for a hardware zero register
      tcg/aarch64: Use 'z' constraint
      tcg/loongarch64: Use 'z' constraint
      tcg/mips: Use 'z' constraint
      tcg/riscv: Use 'z' constraint
      tcg/sparc64: Use 'z' constraint
      tcg/i386: Use tcg_{high,unsigned}_cond in tcg_out_brcond2
      tcg: Remove TCG_TARGET_HAS_{br,set}cond2 from riscv and loongarch64