Series comparison

-[PULL 00/56] tcg patch queue
+[PULL 00/20] tcg patch queue
-The following changes since commit c52d69e7dbaaed0ffdef8125e79218672c30161d:
+The following changes since commit 9c6c079bc6723da8061ccfb44361d67b1dd785dd:
-  Merge remote-tracking branch 'remotes/cschoenebeck/tags/pull-9p-20211027' into staging (2021-10-27 11:45:18 -0700)
+  Merge tag 'pull-target-arm-20240430' of https://git.linaro.org/people/pmaydell/qemu-arm into staging (2024-04-30 09:58:54 -0700)
 are available in the Git repository at:
-  https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20211027
+  https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20240501
-for you to fetch changes up to 820c025f0dcacf2f3c12735b1f162893fbfa7bc6:
+for you to fetch changes up to 917d7f8d948d706e275c9f33169b9dd0149ded1e:
-  tcg/optimize: Propagate sign info for shifting (2021-10-27 17:11:23 -0700)
+  plugins: Update the documentation block for plugin-gen.c (2024-04-30 16:12:05 -0700)
 ----------------------------------------------------------------
-Improvements to qemu/int128
+plugins: Rewrite plugin tcg expansion
 Fixes for 128/64 division.
 Cleanup tcg/optimize.c
 Optimize redundant sign extensions
 ----------------------------------------------------------------
-Frédéric Pétrot (1):
+Richard Henderson (20):
-      qemu/int128: Add int128_{not,xor}
+      tcg: Make tcg/helper-info.h self-contained
       tcg: Pass function pointer to tcg_gen_call*
       plugins: Zero new qemu_plugin_dyn_cb entries
       plugins: Move function pointer in qemu_plugin_dyn_cb
       plugins: Create TCGHelperInfo for all out-of-line callbacks
       plugins: Use emit_before_op for PLUGIN_GEN_AFTER_INSN
       plugins: Use emit_before_op for PLUGIN_GEN_FROM_TB
       plugins: Add PLUGIN_GEN_AFTER_TB
       plugins: Use emit_before_op for PLUGIN_GEN_FROM_INSN
       plugins: Use emit_before_op for PLUGIN_GEN_FROM_MEM
       plugins: Remove plugin helpers
       tcg: Remove TCG_CALL_PLUGIN
       tcg: Remove INDEX_op_plugin_cb_{start,end}
       plugins: Simplify callback queues
       plugins: Introduce PLUGIN_CB_MEM_REGULAR
       plugins: Replace pr_ops with a proper debug dump flag
       plugins: Split out common cb expanders
       plugins: Merge qemu_plugin_tb_insn_get to plugin-gen.c
       plugins: Inline plugin_gen_empty_callback
       plugins: Update the documentation block for plugin-gen.c
-Luis Pires (4):
+ accel/tcg/plugin-helpers.h         |    5 -
-      host-utils: move checks out of divu128/divs128
+ include/exec/helper-gen-common.h   |    4 -
-      host-utils: move udiv_qrnnd() to host-utils
+ include/exec/helper-proto-common.h |    4 -
-      host-utils: add 128-bit quotient support to divu128/divs128
+ include/exec/plugin-gen.h          |    4 -
-      host-utils: add unit tests for divu128/divs128
+ include/qemu/log.h                 |    1 +
+ include/qemu/plugin.h              |   67 +--
-Richard Henderson (51):
+ include/tcg/helper-info.h          |    3 +
-      tcg/optimize: Rename "mask" to "z_mask"
+ include/tcg/tcg-op-common.h        |    4 +-
-      tcg/optimize: Split out OptContext
+ include/tcg/tcg-opc.h              |    4 +-
-      tcg/optimize: Remove do_default label
+ include/tcg/tcg.h                  |   26 +-
-      tcg/optimize: Change tcg_opt_gen_{mov,movi} interface
+ include/exec/helper-gen.h.inc      |   24 +-
-      tcg/optimize: Move prev_mb into OptContext
+ accel/tcg/plugin-gen.c             | 1007 +++++++++---------------------------
-      tcg/optimize: Split out init_arguments
+ plugins/api.c                      |   26 +-
-      tcg/optimize: Split out copy_propagate
+ plugins/core.c                     |   61 ++-
-      tcg/optimize: Split out fold_call
+ tcg/tcg-op-ldst.c                  |    6 +-
-      tcg/optimize: Drop nb_oargs, nb_iargs locals
+ tcg/tcg-op.c                       |    8 +-
-      tcg/optimize: Change fail return for do_constant_folding_cond*
+ tcg/tcg.c                          |   78 ++-
-      tcg/optimize: Return true from tcg_opt_gen_{mov,movi}
+ tcg/tci.c                          |    1 +
-      tcg/optimize: Split out finish_folding
+ util/log.c                         |    4 +
-      tcg/optimize: Use a boolean to avoid a mass of continues
+files changed, 399 insertions(+), 938 deletions(-)
-      tcg/optimize: Split out fold_mb, fold_qemu_{ld,st}
+ delete mode 100644 accel/tcg/plugin-helpers.h
       tcg/optimize: Split out fold_const{1,2}
       tcg/optimize: Split out fold_setcond2
       tcg/optimize: Split out fold_brcond2
       tcg/optimize: Split out fold_brcond
       tcg/optimize: Split out fold_setcond
       tcg/optimize: Split out fold_mulu2_i32
       tcg/optimize: Split out fold_addsub2_i32
       tcg/optimize: Split out fold_movcond
       tcg/optimize: Split out fold_extract2
       tcg/optimize: Split out fold_extract, fold_sextract
       tcg/optimize: Split out fold_deposit
       tcg/optimize: Split out fold_count_zeros
       tcg/optimize: Split out fold_bswap
       tcg/optimize: Split out fold_dup, fold_dup2
       tcg/optimize: Split out fold_mov
       tcg/optimize: Split out fold_xx_to_i
       tcg/optimize: Split out fold_xx_to_x
       tcg/optimize: Split out fold_xi_to_i
       tcg/optimize: Add type to OptContext
       tcg/optimize: Split out fold_to_not
       tcg/optimize: Split out fold_sub_to_neg
       tcg/optimize: Split out fold_xi_to_x
       tcg/optimize: Split out fold_ix_to_i
       tcg/optimize: Split out fold_masks
       tcg/optimize: Expand fold_mulu2_i32 to all 4-arg multiplies
       tcg/optimize: Expand fold_addsub2_i32 to 64-bit ops
       tcg/optimize: Sink commutative operand swapping into fold functions
       tcg/optimize: Stop forcing z_mask to "garbage" for 32-bit values
       tcg/optimize: Use fold_xx_to_i for orc
       tcg/optimize: Use fold_xi_to_x for mul
       tcg/optimize: Use fold_xi_to_x for div
       tcg/optimize: Use fold_xx_to_i for rem
       tcg/optimize: Optimize sign extensions
       tcg/optimize: Propagate sign info for logical operations
       tcg/optimize: Propagate sign info for setcond
       tcg/optimize: Propagate sign info for bit counting
       tcg/optimize: Propagate sign info for shifting
  include/fpu/softfloat-macros.h |   82 --
  include/hw/clock.h             |    5 +-
  include/qemu/host-utils.h      |  121 +-
  include/qemu/int128.h          |   20 +
  target/ppc/int_helper.c        |   23 +-
  tcg/optimize.c                 | 2644 ++++++++++++++++++++++++----------------
  tests/unit/test-div128.c       |  197 +++
  util/host-utils.c              |  147 ++-
  tests/unit/meson.build         |    1 +
 files changed, 2053 insertions(+), 1187 deletions(-)
  create mode 100644 tests/unit/test-div128.c

-[PULL 01/56] qemu/int128: Add int128_{not,xor}
+Deleted patch
-From: Frédéric Pétrot <frederic.petrot@univ-grenoble-alpes.fr>
-Addition of not and xor on 128-bit integers.
-Signed-off-by: Frédéric Pétrot <frederic.petrot@univ-grenoble-alpes.fr>
-Co-authored-by: Fabien Portas <fabien.portas@grenoble-inp.org>
-Message-Id: <20211025122818.168890-3-frederic.petrot@univ-grenoble-alpes.fr>
-[rth: Split out logical operations.]
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
----
- include/qemu/int128.h | 20 ++++++++++++++++++++
-file changed, 20 insertions(+)
-diff --git a/include/qemu/int128.h b/include/qemu/int128.h
-index XXXXXXX..XXXXXXX 100644
---- a/include/qemu/int128.h
-+++ b/include/qemu/int128.h
-@@ -XXX,XX +XXX,XX @@ static inline Int128 int128_exts64(int64_t a)
-     return a;
- }
-+static inline Int128 int128_not(Int128 a)
-+{
-+    return ~a;
-+}
-+
- static inline Int128 int128_and(Int128 a, Int128 b)
- {
-     return a & b;
-@@ -XXX,XX +XXX,XX @@ static inline Int128 int128_or(Int128 a, Int128 b)
-     return a | b;
- }
-+static inline Int128 int128_xor(Int128 a, Int128 b)
-+{
-+    return a ^ b;
-+}
-+
- static inline Int128 int128_rshift(Int128 a, int n)
- {
-     return a >> n;
-@@ -XXX,XX +XXX,XX @@ static inline Int128 int128_exts64(int64_t a)
-     return int128_make128(a, (a < 0) ? -1 : 0);
- }
-+static inline Int128 int128_not(Int128 a)
-+{
-+    return int128_make128(~a.lo, ~a.hi);
-+}
-+
- static inline Int128 int128_and(Int128 a, Int128 b)
- {
-     return int128_make128(a.lo & b.lo, a.hi & b.hi);
-@@ -XXX,XX +XXX,XX @@ static inline Int128 int128_or(Int128 a, Int128 b)
-     return int128_make128(a.lo | b.lo, a.hi | b.hi);
- }
-+static inline Int128 int128_xor(Int128 a, Int128 b)
-+{
-+    return int128_make128(a.lo ^ b.lo, a.hi ^ b.hi);
-+}
-+
- static inline Int128 int128_rshift(Int128 a, int n)
- {
-     int64_t h;
---
-.25.1
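
The deleted patch above adds the not and xor operations twice: once for hosts with a native 128-bit integer type and once for the two-halves struct fallback. A minimal sketch of the fallback form follows. The names `I128`, `i128_make`, `i128_not`, `i128_xor` and `i128_selfcheck` are hypothetical stand-ins for illustration, not the identifiers from include/qemu/int128.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a struct-based 128-bit integer. */
typedef struct {
    uint64_t lo;   /* low 64 bits */
    int64_t hi;    /* high 64 bits, carries the sign */
} I128;

static inline I128 i128_make(uint64_t lo, int64_t hi)
{
    return (I128){ .lo = lo, .hi = hi };
}

/* Bitwise NOT: complement each half independently. */
static inline I128 i128_not(I128 a)
{
    return i128_make(~a.lo, ~a.hi);
}

/* Bitwise XOR: no carries cross the halves, so each half is independent. */
static inline I128 i128_xor(I128 a, I128 b)
{
    return i128_make(a.lo ^ b.lo, a.hi ^ b.hi);
}

/* Self-check: NOT must agree with XOR against an all-ones value. */
static void i128_selfcheck(void)
{
    I128 x = i128_make(0x00000000ffffffffull, 0x1234);
    I128 ones = i128_make(~0ull, -1);
    I128 n = i128_not(x);
    I128 y = i128_xor(x, ones);

    assert(n.lo == y.lo && n.hi == y.hi);
}
```

Because neither operation propagates carries, the struct version needs no coupling between the halves, unlike addition or shifts.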

-[PULL 56/56] tcg/optimize: Propagate sign info for shifting
+[PULL 01/20] tcg: Make tcg/helper-info.h self-contained
-For constant shifts, we can simply shift the s_mask.
+Move MAX_CALL_IARGS from tcg.h and include tcg-target-reg-bits.h
+for the define of TCG_TARGET_REG_BITS.
 For variable shifts, we know that sar does not reduce
 the s_mask, which helps for sequences like
     ext32s_i64  t, in
     sar_i64     t, t, v
     ext32s_i64  out, t
 allowing the final extend to be eliminated.
 Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
-Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- tcg/optimize.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++---
+ include/tcg/helper-info.h | 3 +++
-file changed, 47 insertions(+), 3 deletions(-)
+ include/tcg/tcg.h         | 2 --
  tcg/tci.c                 | 1 +
 files changed, 4 insertions(+), 2 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
+diff --git a/include/tcg/helper-info.h b/include/tcg/helper-info.h
 index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
+--- a/include/tcg/helper-info.h
-+++ b/tcg/optimize.c
++++ b/include/tcg/helper-info.h
-@@ -XXX,XX +XXX,XX @@ static uint64_t smask_from_zmask(uint64_t zmask)
+@@ -XXX,XX +XXX,XX @@
-     return ~(~0ull >> rep);
+ #ifdef CONFIG_TCG_INTERPRETER
- }
+ #include <ffi.h>
+ #endif
-+/*
++#include "tcg-target-reg-bits.h"
 + * Recreate a properly left-aligned smask after manipulation.
 + * Some bit-shuffling, particularly shifts and rotates, may
 + * retain sign bits on the left, but may scatter disconnected
 + * sign bits on the right.  Retain only what remains to the left.
 + */
 +static uint64_t smask_from_smask(int64_t smask)
 +{
 +    /* Only the 1 bits are significant for smask */
 +    return smask_from_zmask(~smask);
 +}
 +
- static inline TempOptInfo *ts_info(TCGTemp *ts)
++#define MAX_CALL_IARGS  7
- {
-     return ts->state_ptr;
+ /*
-@@ -XXX,XX +XXX,XX @@ static bool fold_sextract(OptContext *ctx, TCGOp *op)
+  * Describe the calling convention of a given argument type.
+diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
- static bool fold_shift(OptContext *ctx, TCGOp *op)
+index XXXXXXX..XXXXXXX 100644
- {
+--- a/include/tcg/tcg.h
-+    uint64_t s_mask, z_mask, sign;
++++ b/include/tcg/tcg.h
-+
+@@ -XXX,XX +XXX,XX @@
-     if (fold_const2(ctx, op) ||
+ /* XXX: make safe guess about sizes */
-         fold_ix_to_i(ctx, op, 0) ||
+ #define MAX_OP_PER_INSTR 266
-         fold_xi_to_x(ctx, op, 0)) {
-         return true;
+-#define MAX_CALL_IARGS  7
-     }
+-
+ #define CPU_TEMP_BUF_NLONGS 128
-+    s_mask = arg_info(op->args[1])->s_mask;
+ #define TCG_STATIC_FRAME_SIZE  (CPU_TEMP_BUF_NLONGS * sizeof(long))
-+    z_mask = arg_info(op->args[1])->z_mask;
-+
+diff --git a/tcg/tci.c b/tcg/tci.c
-     if (arg_is_const(op->args[2])) {
+index XXXXXXX..XXXXXXX 100644
--        ctx->z_mask = do_constant_folding(op->opc, ctx->type,
+--- a/tcg/tci.c
--                                          arg_info(op->args[1])->z_mask,
++++ b/tcg/tci.c
--                                          arg_info(op->args[2])->val);
+@@ -XXX,XX +XXX,XX @@
-+        int sh = arg_info(op->args[2])->val;
-+
+ #include "qemu/osdep.h"
-+        ctx->z_mask = do_constant_folding(op->opc, ctx->type, z_mask, sh);
+ #include "tcg/tcg.h"
-+
++#include "tcg/helper-info.h"
-+        s_mask = do_constant_folding(op->opc, ctx->type, s_mask, sh);
+ #include "tcg/tcg-ldst.h"
-+        ctx->s_mask = smask_from_smask(s_mask);
+ #include <ffi.h>
 +
          return fold_masks(ctx, op);
      }
 +
 +    switch (op->opc) {
 +    CASE_OP_32_64(sar):
 +        /*
 +         * Arithmetic right shift will not reduce the number of
 +         * input sign repetitions.
 +         */
 +        ctx->s_mask = s_mask;
 +        break;
 +    CASE_OP_32_64(shr):
 +        /*
 +         * If the sign bit is known zero, then logical right shift
 +         * will not reduce the number of input sign repetitions.
 +         */
 +        sign = (s_mask & -s_mask) >> 1;
 +        if (!(z_mask & sign)) {
 +            ctx->s_mask = s_mask;
 +        }
 +        break;
 +    default:
 +        break;
 +    }
 +
      return false;
  }
 --
-.25.1
+.34.1
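
The sign-mask reasoning in the removed 56/56 patch can be tried out in isolation. The sketch below is a standalone approximation, not the code from tcg/optimize.c. It assumes that s_mask is a left-aligned run of bits known to repeat the sign bit and that z_mask has a 0 for every bit known to be zero. All names ending in `_sketch`, the `ShiftKind` enum and both `smask_after_*` helpers are hypothetical. "No information" is represented as 0.

```c
#include <assert.h>
#include <stdint.h>

enum ShiftKind { SHIFT_SHL, SHIFT_SHR, SHIFT_SAR };

/* Count leading zeros; returns 64 for 0. Portable stand-in for a clz helper. */
static int clz64_sketch(uint64_t v)
{
    int n = 0;

    if (v == 0) {
        return 64;
    }
    while (!(v >> 63)) {
        v <<= 1;
        n++;
    }
    return n;
}

/* Only the 0 bits of zmask are significant; the msb must be 0 to learn anything. */
static uint64_t smask_from_zmask_sketch(uint64_t zmask)
{
    int rep = clz64_sketch(zmask);

    if (rep == 0) {
        return 0;
    }
    rep -= 1;
    return ~(~0ull >> rep);
}

/* Keep only the contiguous run of 1 bits that is still left-aligned. */
static uint64_t smask_from_smask_sketch(uint64_t smask)
{
    return smask_from_zmask_sketch(~smask);
}

/* Shift a 64-bit value by a constant; SAR is written without signed shifts. */
static uint64_t shift_const_sketch(enum ShiftKind kind, uint64_t x, int sh)
{
    uint64_t r;

    assert(sh >= 0 && sh < 64);
    switch (kind) {
    case SHIFT_SHL:
        return x << sh;
    case SHIFT_SHR:
        return x >> sh;
    default:
        r = x >> sh;
        if (x >> 63) {
            r |= ~(~0ull >> sh);   /* replicate the sign bit */
        }
        return r;
    }
}

/* Constant shift: shift the mask itself, then re-align it. */
static uint64_t smask_after_const_shift(enum ShiftKind kind,
                                        uint64_t s_mask, int sh)
{
    return smask_from_smask_sketch(shift_const_sketch(kind, s_mask, sh));
}

/* Variable shift: sar never loses sign copies; shr only if sign is known 0. */
static uint64_t smask_after_var_shift(enum ShiftKind kind,
                                      uint64_t s_mask, uint64_t z_mask)
{
    uint64_t sign = (s_mask & (0 - s_mask)) >> 1;   /* sign bit position */

    if (kind == SHIFT_SAR) {
        return s_mask;
    }
    if (kind == SHIFT_SHR && !(z_mask & sign)) {
        return s_mask;
    }
    return 0;
}
```

With the mask left by a 32-to-64-bit sign extension, 0xffffffff00000000, a variable sar returns the mask unchanged, which is what lets the trailing ext32s in the commit message's sequence be dropped. A constant sar by 8 yields 0xfffffffffe000000 in this sketch: the re-alignment step is conservative and gives up one bit.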

-[PULL 09/56] tcg/optimize: Change tcg_opt_gen_{mov,movi} interface
+[PULL 02/20] tcg: Pass function pointer to tcg_gen_call*
-Adjust the interface to take the OptContext parameter instead
+For normal helpers, read the function pointer from the
-of TCGContext or both.
+structure earlier.  For plugins, this will allow the
 function pointer to come from elsewhere.
 Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
-Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- tcg/optimize.c | 67 +++++++++++++++++++++++++-------------------------
+ include/tcg/tcg.h             | 21 +++++++++-------
-file changed, 34 insertions(+), 33 deletions(-)
+ include/exec/helper-gen.h.inc | 24 ++++++++++++-------
+ tcg/tcg.c                     | 45 +++++++++++++++++++----------------
-diff --git a/tcg/optimize.c b/tcg/optimize.c
+files changed, 52 insertions(+), 38 deletions(-)
 diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
 index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
+--- a/include/tcg/tcg.h
-+++ b/tcg/optimize.c
++++ b/include/tcg/tcg.h
-@@ -XXX,XX +XXX,XX @@ typedef struct TempOptInfo {
+@@ -XXX,XX +XXX,XX @@ typedef struct TCGTargetOpDef {
- } TempOptInfo;
+ bool tcg_op_supported(TCGOpcode op);
- typedef struct OptContext {
-+    TCGContext *tcg;
+-void tcg_gen_call0(TCGHelperInfo *, TCGTemp *ret);
-     TCGTempSet temps_used;
+-void tcg_gen_call1(TCGHelperInfo *, TCGTemp *ret, TCGTemp *);
- } OptContext;
+-void tcg_gen_call2(TCGHelperInfo *, TCGTemp *ret, TCGTemp *, TCGTemp *);
+-void tcg_gen_call3(TCGHelperInfo *, TCGTemp *ret, TCGTemp *,
-@@ -XXX,XX +XXX,XX @@ static bool args_are_copies(TCGArg arg1, TCGArg arg2)
++void tcg_gen_call0(void *func, TCGHelperInfo *, TCGTemp *ret);
-     return ts_are_copies(arg_temp(arg1), arg_temp(arg2));
++void tcg_gen_call1(void *func, TCGHelperInfo *, TCGTemp *ret, TCGTemp *);
- }
++void tcg_gen_call2(void *func, TCGHelperInfo *, TCGTemp *ret,
+                    TCGTemp *, TCGTemp *);
--static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg src)
+-void tcg_gen_call4(TCGHelperInfo *, TCGTemp *ret, TCGTemp *, TCGTemp *,
-+static void tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, TCGArg src)
+-                   TCGTemp *, TCGTemp *);
- {
+-void tcg_gen_call5(TCGHelperInfo *, TCGTemp *ret, TCGTemp *, TCGTemp *,
-     TCGTemp *dst_ts = arg_temp(dst);
++void tcg_gen_call3(void *func, TCGHelperInfo *, TCGTemp *ret,
-     TCGTemp *src_ts = arg_temp(src);
+                    TCGTemp *, TCGTemp *, TCGTemp *);
-@@ -XXX,XX +XXX,XX @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg src)
+-void tcg_gen_call6(TCGHelperInfo *, TCGTemp *ret, TCGTemp *, TCGTemp *,
-     TCGOpcode new_op;
++void tcg_gen_call4(void *func, TCGHelperInfo *, TCGTemp *ret,
+                    TCGTemp *, TCGTemp *, TCGTemp *, TCGTemp *);
-     if (ts_are_copies(dst_ts, src_ts)) {
+-void tcg_gen_call7(TCGHelperInfo *, TCGTemp *ret, TCGTemp *, TCGTemp *,
--        tcg_op_remove(s, op);
++void tcg_gen_call5(void *func, TCGHelperInfo *, TCGTemp *ret,
-+        tcg_op_remove(ctx->tcg, op);
+                    TCGTemp *, TCGTemp *, TCGTemp *, TCGTemp *, TCGTemp *);
-         return;
++void tcg_gen_call6(void *func, TCGHelperInfo *, TCGTemp *ret,
 +                   TCGTemp *, TCGTemp *, TCGTemp *, TCGTemp *,
 +                   TCGTemp *, TCGTemp *);
 +void tcg_gen_call7(void *func, TCGHelperInfo *, TCGTemp *ret,
 +                   TCGTemp *, TCGTemp *, TCGTemp *, TCGTemp *,
 +                   TCGTemp *, TCGTemp *, TCGTemp *);
  TCGOp *tcg_emit_op(TCGOpcode opc, unsigned nargs);
  void tcg_op_remove(TCGContext *s, TCGOp *op);
 diff --git a/include/exec/helper-gen.h.inc b/include/exec/helper-gen.h.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/include/exec/helper-gen.h.inc
 +++ b/include/exec/helper-gen.h.inc
@@ -XXX,XX +XXX,XX @@
  extern TCGHelperInfo glue(helper_info_, name);                          \
  static inline void glue(gen_helper_, name)(dh_retvar_decl0(ret))        \
  {                                                                       \
 -    tcg_gen_call0(&glue(helper_info_, name), dh_retvar(ret));           \
 +    tcg_gen_call0(glue(helper_info_,name).func,                         \
 +                  &glue(helper_info_,name), dh_retvar(ret));            \
  }
  #define DEF_HELPER_FLAGS_1(name, flags, ret, t1)                        \
@@ -XXX,XX +XXX,XX @@ extern TCGHelperInfo glue(helper_info_, name);                          \
  static inline void glue(gen_helper_, name)(dh_retvar_decl(ret)          \
      dh_arg_decl(t1, 1))                                                 \
  {                                                                       \
 -    tcg_gen_call1(&glue(helper_info_, name), dh_retvar(ret),            \
 +    tcg_gen_call1(glue(helper_info_,name).func,                         \
 +                  &glue(helper_info_,name), dh_retvar(ret),             \
                    dh_arg(t1, 1));                                       \
  }
@@ -XXX,XX +XXX,XX @@ extern TCGHelperInfo glue(helper_info_, name);                          \
  static inline void glue(gen_helper_, name)(dh_retvar_decl(ret)          \
      dh_arg_decl(t1, 1), dh_arg_decl(t2, 2))                             \
  {                                                                       \
 -    tcg_gen_call2(&glue(helper_info_, name), dh_retvar(ret),            \
 +    tcg_gen_call2(glue(helper_info_,name).func,                         \
 +                  &glue(helper_info_,name), dh_retvar(ret),             \
                    dh_arg(t1, 1), dh_arg(t2, 2));                        \
  }
@@ -XXX,XX +XXX,XX @@ extern TCGHelperInfo glue(helper_info_, name);                          \
  static inline void glue(gen_helper_, name)(dh_retvar_decl(ret)          \
      dh_arg_decl(t1, 1), dh_arg_decl(t2, 2), dh_arg_decl(t3, 3))         \
  {                                                                       \
 -    tcg_gen_call3(&glue(helper_info_, name), dh_retvar(ret),            \
 +    tcg_gen_call3(glue(helper_info_,name).func,                         \
 +                  &glue(helper_info_,name), dh_retvar(ret),             \
                    dh_arg(t1, 1), dh_arg(t2, 2), dh_arg(t3, 3));         \
  }
@@ -XXX,XX +XXX,XX @@ static inline void glue(gen_helper_, name)(dh_retvar_decl(ret)          \
      dh_arg_decl(t1, 1), dh_arg_decl(t2, 2),                             \
      dh_arg_decl(t3, 3), dh_arg_decl(t4, 4))                             \
  {                                                                       \
 -    tcg_gen_call4(&glue(helper_info_, name), dh_retvar(ret),            \
 +    tcg_gen_call4(glue(helper_info_,name).func,                         \
 +                  &glue(helper_info_,name), dh_retvar(ret),             \
                    dh_arg(t1, 1), dh_arg(t2, 2),                         \
                    dh_arg(t3, 3), dh_arg(t4, 4));                        \
  }
@@ -XXX,XX +XXX,XX @@ static inline void glue(gen_helper_, name)(dh_retvar_decl(ret)          \
      dh_arg_decl(t1, 1), dh_arg_decl(t2, 2), dh_arg_decl(t3, 3),         \
      dh_arg_decl(t4, 4), dh_arg_decl(t5, 5))                             \
  {                                                                       \
 -    tcg_gen_call5(&glue(helper_info_, name), dh_retvar(ret),            \
 +    tcg_gen_call5(glue(helper_info_,name).func,                         \
 +                  &glue(helper_info_,name), dh_retvar(ret),             \
                    dh_arg(t1, 1), dh_arg(t2, 2), dh_arg(t3, 3),          \
                    dh_arg(t4, 4), dh_arg(t5, 5));                        \
  }
@@ -XXX,XX +XXX,XX @@ static inline void glue(gen_helper_, name)(dh_retvar_decl(ret)          \
      dh_arg_decl(t1, 1), dh_arg_decl(t2, 2), dh_arg_decl(t3, 3),         \
      dh_arg_decl(t4, 4), dh_arg_decl(t5, 5), dh_arg_decl(t6, 6))         \
  {                                                                       \
 -    tcg_gen_call6(&glue(helper_info_, name), dh_retvar(ret),            \
 +    tcg_gen_call6(glue(helper_info_,name).func,                         \
 +                  &glue(helper_info_,name), dh_retvar(ret),             \
                    dh_arg(t1, 1), dh_arg(t2, 2), dh_arg(t3, 3),          \
                    dh_arg(t4, 4), dh_arg(t5, 5), dh_arg(t6, 6));         \
  }
@@ -XXX,XX +XXX,XX @@ static inline void glue(gen_helper_, name)(dh_retvar_decl(ret)          \
      dh_arg_decl(t4, 4), dh_arg_decl(t5, 5), dh_arg_decl(t6, 6),         \
      dh_arg_decl(t7, 7))                                                 \
  {                                                                       \
 -    tcg_gen_call7(&glue(helper_info_, name), dh_retvar(ret),            \
 +    tcg_gen_call7(glue(helper_info_,name).func,                         \
 +                  &glue(helper_info_,name), dh_retvar(ret),             \
                    dh_arg(t1, 1), dh_arg(t2, 2), dh_arg(t3, 3),          \
                    dh_arg(t4, 4), dh_arg(t5, 5), dh_arg(t6, 6),          \
                    dh_arg(t7, 7));                                       \
 diff --git a/tcg/tcg.c b/tcg/tcg.c
 index XXXXXXX..XXXXXXX 100644
 --- a/tcg/tcg.c
 +++ b/tcg/tcg.c
@@ -XXX,XX +XXX,XX @@ bool tcg_op_supported(TCGOpcode op)
  static TCGOp *tcg_op_alloc(TCGOpcode opc, unsigned nargs);
 -static void tcg_gen_callN(TCGHelperInfo *info, TCGTemp *ret, TCGTemp **args)
 +static void tcg_gen_callN(void *func, TCGHelperInfo *info,
 +                          TCGTemp *ret, TCGTemp **args)
  {
      TCGv_i64 extend_free[MAX_CALL_IARGS];
      int n_extend = 0;
@@ -XXX,XX +XXX,XX @@ static void tcg_gen_callN(TCGHelperInfo *info, TCGTemp *ret, TCGTemp **args)
              g_assert_not_reached();
          }
      }
+-    op->args[pi++] = (uintptr_t)info->func;
-@@ -XXX,XX +XXX,XX @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg src)
++    op->args[pi++] = (uintptr_t)func;
      op->args[pi++] = (uintptr_t)info;
      tcg_debug_assert(pi == total_args);
@@ -XXX,XX +XXX,XX @@ static void tcg_gen_callN(TCGHelperInfo *info, TCGTemp *ret, TCGTemp **args)
      }
  }
--static void tcg_opt_gen_movi(TCGContext *s, OptContext *ctx,
+-void tcg_gen_call0(TCGHelperInfo *info, TCGTemp *ret)
--                             TCGOp *op, TCGArg dst, uint64_t val)
++void tcg_gen_call0(void *func, TCGHelperInfo *info, TCGTemp *ret)
-+static void tcg_opt_gen_movi(OptContext *ctx, TCGOp *op,
+ {
-+                             TCGArg dst, uint64_t val)
+-    tcg_gen_callN(info, ret, NULL);
- {
++    tcg_gen_callN(func, info, ret, NULL);
-     const TCGOpDef *def = &tcg_op_defs[op->opc];
+ }
-     TCGType type;
-@@ -XXX,XX +XXX,XX @@ static void tcg_opt_gen_movi(TCGContext *s, OptContext *ctx,
+-void tcg_gen_call1(TCGHelperInfo *info, TCGTemp *ret, TCGTemp *t1)
-     /* Convert movi to mov with constant temp. */
++void tcg_gen_call1(void *func, TCGHelperInfo *info, TCGTemp *ret, TCGTemp *t1)
-     tv = tcg_constant_internal(type, val);
+ {
-     init_ts_info(ctx, tv);
+-    tcg_gen_callN(info, ret, &t1);
--    tcg_opt_gen_mov(s, op, dst, temp_arg(tv));
++    tcg_gen_callN(func, info, ret, &t1);
-+    tcg_opt_gen_mov(ctx, op, dst, temp_arg(tv));
+ }
- }
+-void tcg_gen_call2(TCGHelperInfo *info, TCGTemp *ret, TCGTemp *t1, TCGTemp *t2)
- static uint64_t do_constant_folding_2(TCGOpcode op, uint64_t x, uint64_t y)
++void tcg_gen_call2(void *func, TCGHelperInfo *info, TCGTemp *ret,
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
++                   TCGTemp *t1, TCGTemp *t2)
  {
-     int nb_temps, nb_globals, i;
+     TCGTemp *args[2] = { t1, t2 };
-     TCGOp *op, *op_next, *prev_mb = NULL;
+-    tcg_gen_callN(info, ret, args);
--    OptContext ctx = {};
++    tcg_gen_callN(func, info, ret, args);
-+    OptContext ctx = { .tcg = s };
+ }
-     /* Array VALS has an element for each temp.
+-void tcg_gen_call3(TCGHelperInfo *info, TCGTemp *ret, TCGTemp *t1,
-        If this temp holds a constant then its value is kept in VALS' element.
+-                   TCGTemp *t2, TCGTemp *t3)
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
++void tcg_gen_call3(void *func, TCGHelperInfo *info, TCGTemp *ret,
-         CASE_OP_32_64(rotr):
++                   TCGTemp *t1, TCGTemp *t2, TCGTemp *t3)
-             if (arg_is_const(op->args[1])
+ {
-                 && arg_info(op->args[1])->val == 0) {
+     TCGTemp *args[3] = { t1, t2, t3 };
--                tcg_opt_gen_movi(s, &ctx, op, op->args[0], 0);
+-    tcg_gen_callN(info, ret, args);
-+                tcg_opt_gen_movi(&ctx, op, op->args[0], 0);
++    tcg_gen_callN(func, info, ret, args);
-                 continue;
+ }
-             }
-             break;
+-void tcg_gen_call4(TCGHelperInfo *info, TCGTemp *ret, TCGTemp *t1,
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
+-                   TCGTemp *t2, TCGTemp *t3, TCGTemp *t4)
-             if (!arg_is_const(op->args[1])
++void tcg_gen_call4(void *func, TCGHelperInfo *info, TCGTemp *ret,
-                 && arg_is_const(op->args[2])
++                   TCGTemp *t1, TCGTemp *t2, TCGTemp *t3, TCGTemp *t4)
-                 && arg_info(op->args[2])->val == 0) {
+ {
--                tcg_opt_gen_mov(s, op, op->args[0], op->args[1]);
+     TCGTemp *args[4] = { t1, t2, t3, t4 };
-+                tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[1]);
+-    tcg_gen_callN(info, ret, args);
-                 continue;
++    tcg_gen_callN(func, info, ret, args);
-             }
+ }
-             break;
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
+-void tcg_gen_call5(TCGHelperInfo *info, TCGTemp *ret, TCGTemp *t1,
-             if (!arg_is_const(op->args[1])
++void tcg_gen_call5(void *func, TCGHelperInfo *info, TCGTemp *ret, TCGTemp *t1,
-                 && arg_is_const(op->args[2])
+                    TCGTemp *t2, TCGTemp *t3, TCGTemp *t4, TCGTemp *t5)
-                 && arg_info(op->args[2])->val == -1) {
+ {
--                tcg_opt_gen_mov(s, op, op->args[0], op->args[1]);
+     TCGTemp *args[5] = { t1, t2, t3, t4, t5 };
-+                tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[1]);
+-    tcg_gen_callN(info, ret, args);
-                 continue;
++    tcg_gen_callN(func, info, ret, args);
-             }
+ }
-             break;
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
+-void tcg_gen_call6(TCGHelperInfo *info, TCGTemp *ret, TCGTemp *t1, TCGTemp *t2,
+-                   TCGTemp *t3, TCGTemp *t4, TCGTemp *t5, TCGTemp *t6)
-         if (partmask == 0) {
++void tcg_gen_call6(void *func, TCGHelperInfo *info, TCGTemp *ret,
-             tcg_debug_assert(nb_oargs == 1);
++                   TCGTemp *t1, TCGTemp *t2, TCGTemp *t3,
--            tcg_opt_gen_movi(s, &ctx, op, op->args[0], 0);
++                   TCGTemp *t4, TCGTemp *t5, TCGTemp *t6)
-+            tcg_opt_gen_movi(&ctx, op, op->args[0], 0);
+ {
-             continue;
+     TCGTemp *args[6] = { t1, t2, t3, t4, t5, t6 };
-         }
+-    tcg_gen_callN(info, ret, args);
-         if (affected == 0) {
++    tcg_gen_callN(func, info, ret, args);
-             tcg_debug_assert(nb_oargs == 1);
+ }
--            tcg_opt_gen_mov(s, op, op->args[0], op->args[1]);
-+            tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[1]);
+-void tcg_gen_call7(TCGHelperInfo *info, TCGTemp *ret, TCGTemp *t1,
-             continue;
++void tcg_gen_call7(void *func, TCGHelperInfo *info, TCGTemp *ret, TCGTemp *t1,
-         }
+                    TCGTemp *t2, TCGTemp *t3, TCGTemp *t4,
+                    TCGTemp *t5, TCGTemp *t6, TCGTemp *t7)
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
+ {
-         CASE_OP_32_64(mulsh):
+     TCGTemp *args[7] = { t1, t2, t3, t4, t5, t6, t7 };
-             if (arg_is_const(op->args[2])
+-    tcg_gen_callN(info, ret, args);
-                 && arg_info(op->args[2])->val == 0) {
++    tcg_gen_callN(func, info, ret, args);
--                tcg_opt_gen_movi(s, &ctx, op, op->args[0], 0);
+ }
-+                tcg_opt_gen_movi(&ctx, op, op->args[0], 0);
-                 continue;
+ static void tcg_reg_alloc_start(TCGContext *s)
              }
              break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
          CASE_OP_32_64_VEC(or):
          CASE_OP_32_64_VEC(and):
              if (args_are_copies(op->args[1], op->args[2])) {
 -                tcg_opt_gen_mov(s, op, op->args[0], op->args[1]);
 +                tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[1]);
                  continue;
              }
              break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
          CASE_OP_32_64_VEC(sub):
          CASE_OP_32_64_VEC(xor):
              if (args_are_copies(op->args[1], op->args[2])) {
 -                tcg_opt_gen_movi(s, &ctx, op, op->args[0], 0);
 +                tcg_opt_gen_movi(&ctx, op, op->args[0], 0);
                  continue;
              }
              break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             allocator where needed and possible.  Also detect copies. */
          switch (opc) {
          CASE_OP_32_64_VEC(mov):
 -            tcg_opt_gen_mov(s, op, op->args[0], op->args[1]);
 +            tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[1]);
              continue;
          case INDEX_op_dup_vec:
              if (arg_is_const(op->args[1])) {
                  tmp = arg_info(op->args[1])->val;
                  tmp = dup_const(TCGOP_VECE(op), tmp);
 -                tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
 +                tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
                  continue;
              }
              break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
          case INDEX_op_dup2_vec:
              assert(TCG_TARGET_REG_BITS == 32);
              if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
 -                tcg_opt_gen_movi(s, &ctx, op, op->args[0],
 +                tcg_opt_gen_movi(&ctx, op, op->args[0],
                                   deposit64(arg_info(op->args[1])->val, 32, 32,
                                             arg_info(op->args[2])->val));
                  continue;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
          case INDEX_op_extrh_i64_i32:
              if (arg_is_const(op->args[1])) {
                  tmp = do_constant_folding(opc, arg_info(op->args[1])->val, 0);
 -                tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
 +                tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
                  continue;
              }
              break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
              if (arg_is_const(op->args[1])) {
                  tmp = do_constant_folding(opc, arg_info(op->args[1])->val,
                                            op->args[2]);
 -                tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
 +                tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
                  continue;
              }
              break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
              if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
                  tmp = do_constant_folding(opc, arg_info(op->args[1])->val,
                                            arg_info(op->args[2])->val);
 -                tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
 +                tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
                  continue;
              }
              break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                  TCGArg v = arg_info(op->args[1])->val;
                  if (v != 0) {
                      tmp = do_constant_folding(opc, v, 0);
 -                    tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
 +                    tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
                  } else {
 -                    tcg_opt_gen_mov(s, op, op->args[0], op->args[2]);
 +                    tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[2]);
                  }
                  continue;
              }
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                  tmp = deposit64(arg_info(op->args[1])->val,
                                  op->args[3], op->args[4],
                                  arg_info(op->args[2])->val);
 -                tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
 +                tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
                  continue;
              }
              break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
              if (arg_is_const(op->args[1])) {
                  tmp = extract64(arg_info(op->args[1])->val,
                                  op->args[2], op->args[3]);
 -                tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
 +                tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
                  continue;
              }
              break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
              if (arg_is_const(op->args[1])) {
                  tmp = sextract64(arg_info(op->args[1])->val,
                                   op->args[2], op->args[3]);
 -                tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
 +                tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
                  continue;
              }
              break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                      tmp = (int32_t)(((uint32_t)v1 >> shr) |
                                      ((uint32_t)v2 << (32 - shr)));
                  }
 -                tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
 +                tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
                  continue;
              }
              break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
              tmp = do_constant_folding_cond(opc, op->args[1],
                                             op->args[2], op->args[3]);
              if (tmp != 2) {
 -                tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
 +                tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
                  continue;
              }
              break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
              tmp = do_constant_folding_cond(opc, op->args[1],
                                             op->args[2], op->args[5]);
              if (tmp != 2) {
 -                tcg_opt_gen_mov(s, op, op->args[0], op->args[4-tmp]);
 +                tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[4-tmp]);
                  continue;
              }
              if (arg_is_const(op->args[3]) && arg_is_const(op->args[4])) {
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                  rl = op->args[0];
                  rh = op->args[1];
 -                tcg_opt_gen_movi(s, &ctx, op, rl, (int32_t)a);
 -                tcg_opt_gen_movi(s, &ctx, op2, rh, (int32_t)(a >> 32));
 +                tcg_opt_gen_movi(&ctx, op, rl, (int32_t)a);
 +                tcg_opt_gen_movi(&ctx, op2, rh, (int32_t)(a >> 32));
                  continue;
              }
              break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                  rl = op->args[0];
                  rh = op->args[1];
 -                tcg_opt_gen_movi(s, &ctx, op, rl, (int32_t)r);
 -                tcg_opt_gen_movi(s, &ctx, op2, rh, (int32_t)(r >> 32));
 +                tcg_opt_gen_movi(&ctx, op, rl, (int32_t)r);
 +                tcg_opt_gen_movi(&ctx, op2, rh, (int32_t)(r >> 32));
                  continue;
              }
              break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                                              op->args[5]);
              if (tmp != 2) {
              do_setcond_const:
 -                tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
 +                tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
                  continue;
              }
              if ((op->args[5] == TCG_COND_LT || op->args[5] == TCG_COND_GE)
 --
-.25.1
+.34.1

-[PULL 18/56] tcg/optimize: Use a boolean to avoid a mass of continues
+[PULL 03/20] plugins: Zero new qemu_plugin_dyn_cb entries
 Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
-Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- tcg/optimize.c | 9 ++++++---
+ plugins/core.c | 2 +-
-file changed, 6 insertions(+), 3 deletions(-)
+file changed, 1 insertion(+), 1 deletion(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
+diff --git a/plugins/core.c b/plugins/core.c
 index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
+--- a/plugins/core.c
-+++ b/tcg/optimize.c
++++ b/plugins/core.c
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
+@@ -XXX,XX +XXX,XX @@ static struct qemu_plugin_dyn_cb *plugin_get_dyn_cb(GArray **arr)
-         uint64_t z_mask, partmask, affected, tmp;
+     GArray *cbs = *arr;
-         TCGOpcode opc = op->opc;
-         const TCGOpDef *def;
+     if (!cbs) {
-+        bool done = false;
+-        cbs = g_array_sized_new(false, false,
++        cbs = g_array_sized_new(false, true,
-         /* Calls are special. */
+                                 sizeof(struct qemu_plugin_dyn_cb), 1);
-         if (opc == INDEX_op_call) {
+         *arr = cbs;
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
+     }
             allocator where needed and possible.  Also detect copies. */
          switch (opc) {
          CASE_OP_32_64_VEC(mov):
 -            tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[1]);
 -            continue;
 +            done = tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[1]);
 +            break;
          case INDEX_op_dup_vec:
              if (arg_is_const(op->args[1])) {
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
              break;
          }
 -        finish_folding(&ctx, op);
 +        if (!done) {
 +            finish_folding(&ctx, op);
 +        }
          /* Eliminate duplicate and redundant fence instructions.  */
          if (ctx.prev_mb) {
 --
-.25.1
+.34.1
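
The older-series patch compared above replaces per-case `continue` statements with a `done` flag that gates the common `finish_folding` tail. The following is a minimal sketch of that control-flow pattern only, not the TCG optimizer itself; every `toy_*` name is hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opcode set; these are not the TCG opcodes. */
enum toy_opcode { TOY_MOV, TOY_ADD, TOY_NOP };

struct toy_op {
    enum toy_opcode opc;
};

struct toy_ctx {
    int folded;    /* ops fully handled by a fold routine */
    int finished;  /* ops that went through the common tail */
};

/* Fold routine: returns true when the op needs no further processing. */
static bool toy_fold_mov(struct toy_ctx *ctx, const struct toy_op *op)
{
    (void)op;
    ctx->folded++;
    return true;
}

/* Common tail, run only for ops that no fold routine completed. */
static void toy_finish_folding(struct toy_ctx *ctx, const struct toy_op *op)
{
    (void)op;
    ctx->finished++;
}

static void toy_optimize(struct toy_ctx *ctx, const struct toy_op *ops,
                         size_t n)
{
    assert(ctx != NULL && (ops != NULL || n == 0));

    for (size_t i = 0; i < n; i++) {
        bool done = false;

        switch (ops[i].opc) {
        case TOY_MOV:
            /* Instead of "fold; continue;", record the result and break. */
            done = toy_fold_mov(ctx, &ops[i]);
            break;
        default:
            break;
        }

        /* A single exit path replaces the scattered continues. */
        if (!done) {
            toy_finish_folding(ctx, &ops[i]);
        }
    }
}
```

With one exit path per iteration, any code that has to run after the switch for every op has a single place to live, which is presumably why the flag is preferred over a `continue` in each case.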

-[PULL 16/56] tcg/optimize: Return true from tcg_opt_gen_{mov,movi}
+[PULL 04/20] plugins: Move function pointer in qemu_plugin_dyn_cb
-This will allow callers to tail call to these functions
+The out-of-line function pointer is mutually exclusive
-and return true indicating processing complete.
+with inline expansion, so move it into the union.
 Wrap the pointer in a structure named 'regular' to match
 PLUGIN_CB_REGULAR.
 Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
-Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- tcg/optimize.c | 9 +++++----
+ include/qemu/plugin.h  | 4 +++-
-file changed, 5 insertions(+), 4 deletions(-)
+ accel/tcg/plugin-gen.c | 4 ++--
  plugins/core.c         | 8 ++++----
 files changed, 9 insertions(+), 7 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
+diff --git a/include/qemu/plugin.h b/include/qemu/plugin.h
 index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
+--- a/include/qemu/plugin.h
-+++ b/tcg/optimize.c
++++ b/include/qemu/plugin.h
-@@ -XXX,XX +XXX,XX @@ static bool args_are_copies(TCGArg arg1, TCGArg arg2)
+@@ -XXX,XX +XXX,XX @@ enum plugin_dyn_cb_subtype {
-     return ts_are_copies(arg_temp(arg1), arg_temp(arg2));
+  * instance of a callback to be called upon the execution of a particular TB.
   */
  struct qemu_plugin_dyn_cb {
 -    union qemu_plugin_cb_sig f;
      void *userp;
      enum plugin_dyn_cb_subtype type;
      /* @rw applies to mem callbacks only (both regular and inline) */
      enum qemu_plugin_mem_rw rw;
      /* fields specific to each dyn_cb type go here */
      union {
 +        struct {
 +            union qemu_plugin_cb_sig f;
 +        } regular;
          struct {
              qemu_plugin_u64 entry;
              enum qemu_plugin_op op;
 diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
 index XXXXXXX..XXXXXXX 100644
 --- a/accel/tcg/plugin-gen.c
 +++ b/accel/tcg/plugin-gen.c
@@ -XXX,XX +XXX,XX @@ static TCGOp *append_udata_cb(const struct qemu_plugin_dyn_cb *cb,
      }
      /* call */
 -    op = copy_call(&begin_op, op, cb->f.vcpu_udata, cb_idx);
 +    op = copy_call(&begin_op, op, cb->regular.f.vcpu_udata, cb_idx);
      return op;
  }
+@@ -XXX,XX +XXX,XX @@ static TCGOp *append_mem_cb(const struct qemu_plugin_dyn_cb *cb,
--static void tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, TCGArg src)
-+static bool tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, TCGArg src)
+     if (type == PLUGIN_GEN_CB_MEM) {
- {
+         /* call */
-     TCGTemp *dst_ts = arg_temp(dst);
+-        op = copy_call(&begin_op, op, cb->f.vcpu_udata, cb_idx);
-     TCGTemp *src_ts = arg_temp(src);
++        op = copy_call(&begin_op, op, cb->regular.f.vcpu_udata, cb_idx);
@@ -XXX,XX +XXX,XX @@ static void tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, TCGArg src)
      if (ts_are_copies(dst_ts, src_ts)) {
          tcg_op_remove(ctx->tcg, op);
 -        return;
 +        return true;
      }
-     reset_ts(dst_ts);
+     return op;
-@@ -XXX,XX +XXX,XX @@ static void tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, TCGArg src)
+diff --git a/plugins/core.c b/plugins/core.c
-         di->is_const = si->is_const;
+index XXXXXXX..XXXXXXX 100644
-         di->val = si->val;
+--- a/plugins/core.c
-     }
++++ b/plugins/core.c
-+    return true;
+@@ -XXX,XX +XXX,XX @@ void plugin_register_dyn_cb__udata(GArray **arr,
      dyn_cb->userp = udata;
      /* Note flags are discarded as unused. */
 -    dyn_cb->f.vcpu_udata = cb;
 +    dyn_cb->regular.f.vcpu_udata = cb;
      dyn_cb->type = PLUGIN_CB_REGULAR;
  }
--static void tcg_opt_gen_movi(OptContext *ctx, TCGOp *op,
+@@ -XXX,XX +XXX,XX @@ void plugin_register_vcpu_mem_cb(GArray **arr,
-+static bool tcg_opt_gen_movi(OptContext *ctx, TCGOp *op,
+     /* Note flags are discarded as unused. */
-                              TCGArg dst, uint64_t val)
+     dyn_cb->type = PLUGIN_CB_REGULAR;
- {
+     dyn_cb->rw = rw;
-     const TCGOpDef *def = &tcg_op_defs[op->opc];
+-    dyn_cb->f.generic = cb;
-@@ -XXX,XX +XXX,XX @@ static void tcg_opt_gen_movi(OptContext *ctx, TCGOp *op,
++    dyn_cb->regular.f.vcpu_mem = cb;
      /* Convert movi to mov with constant temp. */
      tv = tcg_constant_internal(type, val);
      init_ts_info(ctx, tv);
 -    tcg_opt_gen_mov(ctx, op, dst, temp_arg(tv));
 +    return tcg_opt_gen_mov(ctx, op, dst, temp_arg(tv));
  }
- static uint64_t do_constant_folding_2(TCGOpcode op, uint64_t x, uint64_t y)
+ /*
@@ -XXX,XX +XXX,XX @@ void qemu_plugin_vcpu_mem_cb(CPUState *cpu, uint64_t vaddr,
          }
          switch (cb->type) {
          case PLUGIN_CB_REGULAR:
 -            cb->f.vcpu_mem(cpu->cpu_index, make_plugin_meminfo(oi, rw),
 -                           vaddr, cb->userp);
 +            cb->regular.f.vcpu_mem(cpu->cpu_index, make_plugin_meminfo(oi, rw),
 +                                   vaddr, cb->userp);
              break;
          case PLUGIN_CB_INLINE:
              exec_inline_op(cb, cpu->cpu_index);
 --
-.25.1
+.34.1
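
The newer-series patch compared above moves the out-of-line function pointer into the per-type union, wrapped in a member named `regular`. The following is a minimal sketch of that layout with hypothetical `toy_*` types; the field set is simplified and is an assumption, not the real `struct qemu_plugin_dyn_cb`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical callback kinds, loosely mirroring REGULAR vs. INLINE. */
enum toy_cb_type { TOY_CB_REGULAR, TOY_CB_INLINE };

typedef void (*toy_udata_fn)(unsigned cpu_index, void *userp);

struct toy_dyn_cb {
    void *userp;
    enum toy_cb_type type;
    /* Only the member selected by 'type' is valid. */
    union {
        struct {
            toy_udata_fn f;      /* out-of-line call target */
        } regular;
        struct {
            uint64_t *counter;   /* inline operation: *counter += imm */
            uint64_t imm;
        } inline_insn;
    };
};

/* Sample out-of-line callback: counts invocations through userp. */
static void toy_count_call(unsigned cpu_index, void *userp)
{
    (void)cpu_index;
    *(unsigned *)userp += 1;
}

/* Dispatch on the tag, touching only the matching union member. */
static void toy_run_cb(const struct toy_dyn_cb *cb, unsigned cpu_index)
{
    switch (cb->type) {
    case TOY_CB_REGULAR:
        cb->regular.f(cpu_index, cb->userp);
        break;
    case TOY_CB_INLINE:
        *cb->inline_insn.counter += cb->inline_insn.imm;
        break;
    default:
        assert(!"unknown callback type");
    }
}
```

Because a callback is either out-of-line or inline, sharing storage costs nothing, and the `cb->regular.f` spelling makes it visible at each use site which variant the code expects.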

-[PULL 55/56] tcg/optimize: Propagate sign info for bit counting
+[PULL 05/20] plugins: Create TCGHelperInfo for all out-of-line callbacks
-The results are generally 6 bit unsigned values, though
+TCGHelperInfo includes the ABI for every function call.
 the count leading and trailing bits may produce any value
 for a zero input.
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
+Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
 Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- tcg/optimize.c | 3 ++-
+ include/qemu/plugin.h |  1 +
-file changed, 2 insertions(+), 1 deletion(-)
+ plugins/core.c        | 51 ++++++++++++++++++++++++++++++++++++++-----
 files changed, 46 insertions(+), 6 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
+diff --git a/include/qemu/plugin.h b/include/qemu/plugin.h
 index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
+--- a/include/qemu/plugin.h
-+++ b/tcg/optimize.c
++++ b/include/qemu/plugin.h
-@@ -XXX,XX +XXX,XX @@ static bool fold_count_zeros(OptContext *ctx, TCGOp *op)
+@@ -XXX,XX +XXX,XX @@ struct qemu_plugin_dyn_cb {
-         g_assert_not_reached();
+     union {
-     }
+         struct {
-     ctx->z_mask = arg_info(op->args[2])->z_mask | z_mask;
+             union qemu_plugin_cb_sig f;
--
++            TCGHelperInfo *info;
-+    ctx->s_mask = smask_from_zmask(ctx->z_mask);
+         } regular;
-     return false;
+         struct {
              qemu_plugin_u64 entry;
 diff --git a/plugins/core.c b/plugins/core.c
 index XXXXXXX..XXXXXXX 100644
 --- a/plugins/core.c
 +++ b/plugins/core.c
@@ -XXX,XX +XXX,XX @@ void plugin_register_dyn_cb__udata(GArray **arr,
                                     enum qemu_plugin_cb_flags flags,
                                     void *udata)
  {
 -    struct qemu_plugin_dyn_cb *dyn_cb = plugin_get_dyn_cb(arr);
 +    static TCGHelperInfo info[3] = {
 +        [QEMU_PLUGIN_CB_NO_REGS].flags = TCG_CALL_NO_RWG | TCG_CALL_PLUGIN,
 +        [QEMU_PLUGIN_CB_R_REGS].flags = TCG_CALL_NO_WG | TCG_CALL_PLUGIN,
 +        [QEMU_PLUGIN_CB_RW_REGS].flags = TCG_CALL_PLUGIN,
 +        /*
 +         * Match qemu_plugin_vcpu_udata_cb_t:
 +         *   void (*)(uint32_t, void *)
 +         */
 +        [0 ... 2].typemask = (dh_typemask(void, 0) |
 +                              dh_typemask(i32, 1) |
 +                              dh_typemask(ptr, 2))
 +    };
 +    struct qemu_plugin_dyn_cb *dyn_cb = plugin_get_dyn_cb(arr);
      dyn_cb->userp = udata;
 -    /* Note flags are discarded as unused. */
 -    dyn_cb->regular.f.vcpu_udata = cb;
      dyn_cb->type = PLUGIN_CB_REGULAR;
 +    dyn_cb->regular.f.vcpu_udata = cb;
 +
 +    assert((unsigned)flags < ARRAY_SIZE(info));
 +    dyn_cb->regular.info = &info[flags];
  }
-@@ -XXX,XX +XXX,XX @@ static bool fold_ctpop(OptContext *ctx, TCGOp *op)
+ void plugin_register_vcpu_mem_cb(GArray **arr,
-     default:
+@@ -XXX,XX +XXX,XX @@ void plugin_register_vcpu_mem_cb(GArray **arr,
-         g_assert_not_reached();
+                                  enum qemu_plugin_mem_rw rw,
-     }
+                                  void *udata)
-+    ctx->s_mask = smask_from_zmask(ctx->z_mask);
+ {
-     return false;
+-    struct qemu_plugin_dyn_cb *dyn_cb;
 +    /*
 +     * Expect that the underlying type for enum qemu_plugin_meminfo_t
 +     * is either int32_t or uint32_t, aka int or unsigned int.
 +     */
 +    QEMU_BUILD_BUG_ON(
 +        !__builtin_types_compatible_p(qemu_plugin_meminfo_t, uint32_t) &&
 +        !__builtin_types_compatible_p(qemu_plugin_meminfo_t, int32_t));
 -    dyn_cb = plugin_get_dyn_cb(arr);
 +    static TCGHelperInfo info[3] = {
 +        [QEMU_PLUGIN_CB_NO_REGS].flags = TCG_CALL_NO_RWG | TCG_CALL_PLUGIN,
 +        [QEMU_PLUGIN_CB_R_REGS].flags = TCG_CALL_NO_WG | TCG_CALL_PLUGIN,
 +        [QEMU_PLUGIN_CB_RW_REGS].flags = TCG_CALL_PLUGIN,
 +        /*
 +         * Match qemu_plugin_vcpu_mem_cb_t:
 +         *   void (*)(uint32_t, qemu_plugin_meminfo_t, uint64_t, void *)
 +         */
 +        [0 ... 2].typemask =
 +            (dh_typemask(void, 0) |
 +             dh_typemask(i32, 1) |
 +             (__builtin_types_compatible_p(qemu_plugin_meminfo_t, uint32_t)
 +              ? dh_typemask(i32, 2) : dh_typemask(s32, 2)) |
 +             dh_typemask(i64, 3) |
 +             dh_typemask(ptr, 4))
 +    };
 +
 +    struct qemu_plugin_dyn_cb *dyn_cb = plugin_get_dyn_cb(arr);
      dyn_cb->userp = udata;
 -    /* Note flags are discarded as unused. */
      dyn_cb->type = PLUGIN_CB_REGULAR;
      dyn_cb->rw = rw;
      dyn_cb->regular.f.vcpu_mem = cb;
 +
 +    assert((unsigned)flags < ARRAY_SIZE(info));
 +    dyn_cb->regular.info = &info[flags];
  }
+ /*
 --
-.25.1
+.34.1
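
The newer-series patch compared above selects a static descriptor by the registration flags and stores a pointer to it in the callback entry. The following is a rough sketch of that table-lookup idea under stated assumptions: the `toy_*` enum, the flag bits, and the signature constant are all hypothetical and do not reproduce the `TCGHelperInfo` or `dh_typemask` encodings.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical register-access flags passed at registration time. */
enum toy_cb_flags { TOY_CB_NO_REGS, TOY_CB_R_REGS, TOY_CB_RW_REGS };

/* Hypothetical call-flag bits; the values are arbitrary. */
#define TOY_CALL_NO_READ_GLOBALS   0x1u
#define TOY_CALL_NO_WRITE_GLOBALS  0x2u

/* Hypothetical encoding of "void (*)(uint32_t, void *)". */
#define TOY_SIG_VOID_U32_PTR       0x1234u

struct toy_helper_info {
    unsigned flags;     /* what the callee may read or write */
    unsigned typemask;  /* encoded call signature */
};

/* Return the shared, immutable descriptor for a given flags value. */
static const struct toy_helper_info *toy_lookup_info(enum toy_cb_flags flags)
{
    static const struct toy_helper_info info[3] = {
        [TOY_CB_NO_REGS] = {
            .flags = TOY_CALL_NO_READ_GLOBALS | TOY_CALL_NO_WRITE_GLOBALS,
            .typemask = TOY_SIG_VOID_U32_PTR,
        },
        [TOY_CB_R_REGS] = {
            .flags = TOY_CALL_NO_WRITE_GLOBALS,
            .typemask = TOY_SIG_VOID_U32_PTR,
        },
        [TOY_CB_RW_REGS] = {
            .flags = 0,
            .typemask = TOY_SIG_VOID_U32_PTR,
        },
    };

    /* Reject out-of-range values before indexing, as the patch does. */
    assert((unsigned)flags < sizeof(info) / sizeof(info[0]));
    return &info[flags];
}
```

The sketch spells out each entry instead of using the GNU range designator (`[0 ... 2]`) seen in the patch, so that it stays within standard C.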

-[PULL 02/56] host-utils: move checks out of divu128/divs128
+[PULL 06/20] plugins: Use emit_before_op for PLUGIN_GEN_AFTER_INSN
-From: Luis Pires <luis.pires@eldorado.org.br>
+Introduce a new plugin_cb op and migrate one operation.
 By using emit_before_op, we do not need to emit opcodes
 early and modify them later -- we can simply emit the
 final set of opcodes once.
-In preparation for changing the divu128/divs128 implementations
+Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
 to allow for quotients larger than 64 bits, move the div-by-zero
 and overflow checks to the callers.
 Signed-off-by: Luis Pires <luis.pires@eldorado.org.br>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-Id: <20211025191154.350831-2-luis.pires@eldorado.org.br>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- include/hw/clock.h        |  5 +++--
+ include/tcg/tcg-op-common.h |  1 +
- include/qemu/host-utils.h | 34 ++++++++++++---------------------
+ include/tcg/tcg-opc.h       |  1 +
- target/ppc/int_helper.c   | 14 +++++++++-----
+ accel/tcg/plugin-gen.c      | 74 +++++++++++++++++++++----------------
- util/host-utils.c         | 40 ++++++++++++++++++---------------------
+ tcg/tcg-op.c                |  5 +++
-files changed, 42 insertions(+), 51 deletions(-)
+files changed, 50 insertions(+), 31 deletions(-)
-diff --git a/include/hw/clock.h b/include/hw/clock.h
+diff --git a/include/tcg/tcg-op-common.h b/include/tcg/tcg-op-common.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/clock.h
+--- a/include/tcg/tcg-op-common.h
-+++ b/include/hw/clock.h
++++ b/include/tcg/tcg-op-common.h
-@@ -XXX,XX +XXX,XX @@ static inline uint64_t clock_ns_to_ticks(const Clock *clk, uint64_t ns)
+@@ -XXX,XX +XXX,XX @@ void tcg_gen_goto_tb(unsigned idx);
-         return 0;
+  */
-     }
+ void tcg_gen_lookup_and_goto_ptr(void);
-     /*
--     * Ignore divu128() return value as we've caught div-by-zero and don't
++void tcg_gen_plugin_cb(unsigned from);
--     * need different behaviour for overflow.
+ void tcg_gen_plugin_cb_start(unsigned from, unsigned type, unsigned wr);
-+     * BUG: when CONFIG_INT128 is not defined, the current implementation of
+ void tcg_gen_plugin_cb_end(void);
-+     * divu128 does not return a valid truncated quotient, so the result will
-+     * be wrong.
+diff --git a/include/tcg/tcg-opc.h b/include/tcg/tcg-opc.h
       */
      divu128(&lo, &hi, clk->period);
      return lo;
 diff --git a/include/qemu/host-utils.h b/include/qemu/host-utils.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/qemu/host-utils.h
+--- a/include/tcg/tcg-opc.h
-+++ b/include/qemu/host-utils.h
++++ b/include/tcg/tcg-opc.h
-@@ -XXX,XX +XXX,XX @@ static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
+@@ -XXX,XX +XXX,XX @@ DEF(exit_tb, 0, 0, 1, TCG_OPF_BB_EXIT | TCG_OPF_BB_END)
-     return (__int128_t)a * b / c;
+ DEF(goto_tb, 0, 0, 1, TCG_OPF_BB_EXIT | TCG_OPF_BB_END)
  DEF(goto_ptr, 0, 1, 0, TCG_OPF_BB_EXIT | TCG_OPF_BB_END)
 +DEF(plugin_cb, 0, 0, 1, TCG_OPF_NOT_PRESENT)
  DEF(plugin_cb_start, 0, 0, 3, TCG_OPF_NOT_PRESENT)
  DEF(plugin_cb_end, 0, 0, 0, TCG_OPF_NOT_PRESENT)
 diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
 index XXXXXXX..XXXXXXX 100644
 --- a/accel/tcg/plugin-gen.c
 +++ b/accel/tcg/plugin-gen.c
@@ -XXX,XX +XXX,XX @@ static void plugin_gen_empty_callback(enum plugin_gen_from from)
  {
      switch (from) {
      case PLUGIN_GEN_AFTER_INSN:
 -        gen_wrapped(from, PLUGIN_GEN_DISABLE_MEM_HELPER,
 -                    gen_empty_mem_helper);
 +        tcg_gen_plugin_cb(from);
          break;
      case PLUGIN_GEN_FROM_INSN:
          /*
@@ -XXX,XX +XXX,XX @@ static void inject_mem_enable_helper(struct qemu_plugin_tb *ptb,
      inject_mem_helper(begin_op, arr);
  }
--static inline int divu128(uint64_t *plow, uint64_t *phigh, uint64_t divisor)
+-static void inject_mem_disable_helper(struct qemu_plugin_insn *plugin_insn,
-+static inline void divu128(uint64_t *plow, uint64_t *phigh, uint64_t divisor)
+-                                      TCGOp *begin_op)
 -{
 -    if (likely(!plugin_insn->mem_helper)) {
 -        rm_ops(begin_op);
 -        return;
 -    }
 -    inject_mem_helper(begin_op, NULL);
 -}
 -
  /* called before finishing a TB with exit_tb, goto_tb or goto_ptr */
  void plugin_gen_disable_mem_helpers(void)
  {
--    if (divisor == 0) {
+@@ -XXX,XX +XXX,XX @@ static void plugin_gen_enable_mem_helper(struct qemu_plugin_tb *ptb,
--        return 1;
+     inject_mem_enable_helper(ptb, insn, begin_op);
 -    } else {
 -        __uint128_t dividend = ((__uint128_t)*phigh << 64) | *plow;
 -        __uint128_t result = dividend / divisor;
 -        *plow = result;
 -        *phigh = dividend % divisor;
 -        return result > UINT64_MAX;
 -    }
 +    __uint128_t dividend = ((__uint128_t)*phigh << 64) | *plow;
 +    __uint128_t result = dividend / divisor;
 +    *plow = result;
 +    *phigh = dividend % divisor;
  }
--static inline int divs128(int64_t *plow, int64_t *phigh, int64_t divisor)
+-static void plugin_gen_disable_mem_helper(struct qemu_plugin_tb *ptb,
-+static inline void divs128(int64_t *plow, int64_t *phigh, int64_t divisor)
+-                                          TCGOp *begin_op, int insn_idx)
 +static void gen_disable_mem_helper(struct qemu_plugin_tb *ptb,
 +                                   struct qemu_plugin_insn *insn)
  {
--    if (divisor == 0) {
+-    struct qemu_plugin_insn *insn = g_ptr_array_index(ptb->insns, insn_idx);
--        return 1;
+-    inject_mem_disable_helper(insn, begin_op);
--    } else {
++    if (insn->mem_helper) {
--        __int128_t dividend = ((__int128_t)*phigh << 64) | (uint64_t)*plow;
++        tcg_gen_st_ptr(tcg_constant_ptr(0), tcg_env,
--        __int128_t result = dividend / divisor;
++                       offsetof(CPUState, plugin_mem_cbs) -
--        *plow = result;
++                       offsetof(ArchCPU, env));
--        *phigh = dividend % divisor;
++    }
 -        return result != *plow;
 -    }
 +    __int128_t dividend = ((__int128_t)*phigh << 64) | (uint64_t)*plow;
 +    __int128_t result = dividend / divisor;
 +    *plow = result;
 +    *phigh = dividend % divisor;
  }
- #else
- void muls64(uint64_t *plow, uint64_t *phigh, int64_t a, int64_t b);
+ /* #define DEBUG_PLUGIN_GEN_OPS */
- void mulu64(uint64_t *plow, uint64_t *phigh, uint64_t a, uint64_t b);
+@@ -XXX,XX +XXX,XX @@ static void pr_ops(void)
--int divu128(uint64_t *plow, uint64_t *phigh, uint64_t divisor);
--int divs128(int64_t *plow, int64_t *phigh, int64_t divisor);
+ static void plugin_gen_inject(struct qemu_plugin_tb *plugin_tb)
 +void divu128(uint64_t *plow, uint64_t *phigh, uint64_t divisor);
 +void divs128(int64_t *plow, int64_t *phigh, int64_t divisor);
  static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
  {
-diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
+-    TCGOp *op;
 +    TCGOp *op, *next;
      int insn_idx = -1;
      pr_ops();
 -    QTAILQ_FOREACH(op, &tcg_ctx->ops, link) {
 +    /*
 +     * While injecting code, we cannot afford to reuse any ebb temps
 +     * that might be live within the existing opcode stream.
 +     * The simplest solution is to release them all and create new.
 +     */
 +    memset(tcg_ctx->free_temps, 0, sizeof(tcg_ctx->free_temps));
 +
 +    QTAILQ_FOREACH_SAFE(op, &tcg_ctx->ops, link, next) {
          switch (op->opc) {
          case INDEX_op_insn_start:
              insn_idx++;
              break;
 +
 +        case INDEX_op_plugin_cb:
 +        {
 +            enum plugin_gen_from from = op->args[0];
 +            struct qemu_plugin_insn *insn = NULL;
 +
 +            if (insn_idx >= 0) {
 +                insn = g_ptr_array_index(plugin_tb->insns, insn_idx);
 +            }
 +
 +            tcg_ctx->emit_before_op = op;
 +
 +            switch (from) {
 +            case PLUGIN_GEN_AFTER_INSN:
 +                assert(insn != NULL);
 +                gen_disable_mem_helper(plugin_tb, insn);
 +                break;
 +            default:
 +                g_assert_not_reached();
 +            }
 +
 +            tcg_ctx->emit_before_op = NULL;
 +            tcg_op_remove(tcg_ctx, op);
 +            break;
 +        }
 +
          case INDEX_op_plugin_cb_start:
          {
              enum plugin_gen_from from = op->args[0];
@@ -XXX,XX +XXX,XX @@ static void plugin_gen_inject(struct qemu_plugin_tb *plugin_tb)
                  break;
              }
 -            case PLUGIN_GEN_AFTER_INSN:
 -            {
 -                g_assert(insn_idx >= 0);
 -
 -                switch (type) {
 -                case PLUGIN_GEN_DISABLE_MEM_HELPER:
 -                    plugin_gen_disable_mem_helper(plugin_tb, op, insn_idx);
 -                    break;
 -                default:
 -                    g_assert_not_reached();
 -                }
 -                break;
 -            }
              default:
                  g_assert_not_reached();
              }
 diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/ppc/int_helper.c
+--- a/tcg/tcg-op.c
-+++ b/target/ppc/int_helper.c
++++ b/tcg/tcg-op.c
-@@ -XXX,XX +XXX,XX @@ uint64_t helper_divdeu(CPUPPCState *env, uint64_t ra, uint64_t rb, uint32_t oe)
+@@ -XXX,XX +XXX,XX @@ void tcg_gen_mb(TCGBar mb_type)
      uint64_t rt = 0;
      int overflow = 0;
 -    overflow = divu128(&rt, &ra, rb);
 -
 -    if (unlikely(overflow)) {
 +    if (unlikely(rb == 0 || ra >= rb)) {
 +        overflow = 1;
          rt = 0; /* Undefined */
 +    } else {
 +        divu128(&rt, &ra, rb);
      }
      if (oe) {
@@ -XXX,XX +XXX,XX @@ uint64_t helper_divde(CPUPPCState *env, uint64_t rau, uint64_t rbu, uint32_t oe)
      int64_t rt = 0;
      int64_t ra = (int64_t)rau;
      int64_t rb = (int64_t)rbu;
 -    int overflow = divs128(&rt, &ra, rb);
 +    int overflow = 0;
 -    if (unlikely(overflow)) {
 +    if (unlikely(rb == 0 || uabs64(ra) >= uabs64(rb))) {
 +        overflow = 1;
          rt = 0; /* Undefined */
 +    } else {
 +        divs128(&rt, &ra, rb);
      }
      if (oe) {
 diff --git a/util/host-utils.c b/util/host-utils.c
 index XXXXXXX..XXXXXXX 100644
 --- a/util/host-utils.c
 +++ b/util/host-utils.c
@@ -XXX,XX +XXX,XX @@ void muls64 (uint64_t *plow, uint64_t *phigh, int64_t a, int64_t b)
      *phigh = rh;
  }
 -/* Unsigned 128x64 division.  Returns 1 if overflow (divide by zero or */
 -/* quotient exceeds 64 bits).  Otherwise returns quotient via plow and */
 -/* remainder via phigh. */
 -int divu128(uint64_t *plow, uint64_t *phigh, uint64_t divisor)
 +/*
 + * Unsigned 128-by-64 division. Returns quotient via plow and
 + * remainder via phigh.
 + * The result must fit in 64 bits (plow) - otherwise, the result
 + * is undefined.
 + * This function will cause a division by zero if passed a zero divisor.
 + */
 +void divu128(uint64_t *plow, uint64_t *phigh, uint64_t divisor)
  {
      uint64_t dhi = *phigh;
      uint64_t dlo = *plow;
      unsigned i;
      uint64_t carry = 0;
 -    if (divisor == 0) {
 -        return 1;
 -    } else if (dhi == 0) {
 +    if (divisor == 0 || dhi == 0) {
          *plow  = dlo / divisor;
          *phigh = dlo % divisor;
 -        return 0;
 -    } else if (dhi >= divisor) {
 -        return 1;
      } else {
          for (i = 0; i < 64; i++) {
@@ -XXX,XX +XXX,XX @@ int divu128(uint64_t *plow, uint64_t *phigh, uint64_t divisor)
          *plow = dlo;
          *phigh = dhi;
 -        return 0;
      }
  }
--int divs128(int64_t *plow, int64_t *phigh, int64_t divisor)
++void tcg_gen_plugin_cb(unsigned from)
-+/*
++{
-+ * Signed 128-by-64 division. Returns quotient via plow and
++    tcg_gen_op1(INDEX_op_plugin_cb, from);
-+ * remainder via phigh.
++}
-+ * The result must fit in 64 bits (plow) - otherwise, the result
++
-+ * is undefined.
+ void tcg_gen_plugin_cb_start(unsigned from, unsigned type, unsigned wr)
 + * This function will cause a division by zero if passed a zero divisor.
 + */
 +void divs128(int64_t *plow, int64_t *phigh, int64_t divisor)
  {
-     int sgn_dvdnd = *phigh < 0;
+     tcg_gen_op3(INDEX_op_plugin_cb_start, from, type, wr);
      int sgn_divsr = divisor < 0;
 -    int overflow = 0;
      if (sgn_dvdnd) {
          *plow = ~(*plow);
@@ -XXX,XX +XXX,XX @@ int divs128(int64_t *plow, int64_t *phigh, int64_t divisor)
          divisor = 0 - divisor;
      }
 -    overflow = divu128((uint64_t *)plow, (uint64_t *)phigh, (uint64_t)divisor);
 +    divu128((uint64_t *)plow, (uint64_t *)phigh, (uint64_t)divisor);
      if (sgn_dvdnd  ^ sgn_divsr) {
          *plow = 0 - *plow;
      }
 -
 -    if (!overflow) {
 -        if ((*plow < 0) ^ (sgn_dvdnd ^ sgn_divsr)) {
 -            overflow = 1;
 -        }
 -    }
 -
 -    return overflow;
  }
  #endif
 --
-.25.1
+.34.1
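
The older-series patch compared above moves the divide-by-zero and overflow checks out of `divu128` and into its callers. The following is a minimal sketch of that division of responsibility, assuming a compiler that provides `unsigned __int128` (a GCC/Clang extension on 64-bit targets); `toy_divu128` and `toy_checked_div` are hypothetical names, not the QEMU functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Unchecked 128-by-64 division: quotient via plow, remainder via phigh.
 * The caller must guarantee divisor != 0 and a quotient that fits in
 * 64 bits; otherwise the low 64 bits of the quotient are returned.
 */
static void toy_divu128(uint64_t *plow, uint64_t *phigh, uint64_t divisor)
{
    assert(divisor != 0);

    unsigned __int128 dividend = ((unsigned __int128)*phigh << 64) | *plow;

    *plow = (uint64_t)(dividend / divisor);
    *phigh = (uint64_t)(dividend % divisor);
}

/*
 * Caller-side checks. For a dividend hi:lo, the quotient fits in
 * 64 bits exactly when hi < divisor, so a single comparison covers
 * the overflow case.
 */
static bool toy_checked_div(uint64_t hi, uint64_t lo, uint64_t divisor,
                            uint64_t *quot, uint64_t *rem)
{
    if (divisor == 0 || hi >= divisor) {
        return false;   /* division by zero or quotient overflow */
    }
    toy_divu128(&lo, &hi, divisor);
    *quot = lo;
    *rem = hi;
    return true;
}
```

The `hi >= divisor` test corresponds to the `ra >= rb` check added to `helper_divdeu` in the patch, where the dividend is `ra` shifted into the high half.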

-[PULL 19/56] tcg/optimize: Split out fold_mb, fold_qemu_{ld,st}
+[PULL 07/20] plugins: Use emit_before_op for PLUGIN_GEN_FROM_TB
-This puts the separate mb optimization into the same framework
+By having the qemu_plugin_cb_flags be recorded in the TCGHelperInfo,
-as the others.  While fold_qemu_{ld,st} are currently identical,
+we no longer need to distinguish PLUGIN_CB_REGULAR from
-that won't last as more code gets moved.
+PLUGIN_CB_REGULAR_R, so place all TB callbacks in the same queue.
-Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
+Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- tcg/optimize.c | 89 +++++++++++++++++++++++++++++---------------------
+ accel/tcg/plugin-gen.c | 96 +++++++++++++++++++++++++-----------------
-file changed, 51 insertions(+), 38 deletions(-)
+ plugins/api.c          |  6 +--
 files changed, 58 insertions(+), 44 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
+diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
 index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
+--- a/accel/tcg/plugin-gen.c
-+++ b/tcg/optimize.c
++++ b/accel/tcg/plugin-gen.c
-@@ -XXX,XX +XXX,XX @@ static bool fold_call(OptContext *ctx, TCGOp *op)
+@@ -XXX,XX +XXX,XX @@ static void plugin_gen_empty_callback(enum plugin_gen_from from)
-     return true;
+ {
      switch (from) {
      case PLUGIN_GEN_AFTER_INSN:
 +    case PLUGIN_GEN_FROM_TB:
          tcg_gen_plugin_cb(from);
          break;
      case PLUGIN_GEN_FROM_INSN:
@@ -XXX,XX +XXX,XX @@ static void plugin_gen_empty_callback(enum plugin_gen_from from)
           */
          gen_wrapped(from, PLUGIN_GEN_ENABLE_MEM_HELPER,
                      gen_empty_mem_helper);
 -        /* fall through */
 -    case PLUGIN_GEN_FROM_TB:
          gen_wrapped(from, PLUGIN_GEN_CB_UDATA, gen_empty_udata_cb_no_rwg);
          gen_wrapped(from, PLUGIN_GEN_CB_UDATA_R, gen_empty_udata_cb_no_wg);
          gen_wrapped(from, PLUGIN_GEN_CB_INLINE, gen_empty_inline_cb);
@@ -XXX,XX +XXX,XX @@ void plugin_gen_disable_mem_helpers(void)
                     offsetof(CPUState, plugin_mem_cbs) - offsetof(ArchCPU, env));
  }
-+static bool fold_mb(OptContext *ctx, TCGOp *op)
+-static void plugin_gen_tb_udata(const struct qemu_plugin_tb *ptb,
 -                                TCGOp *begin_op)
 -{
 -    inject_udata_cb(ptb->cbs[PLUGIN_CB_REGULAR], begin_op);
 -}
 -
 -static void plugin_gen_tb_udata_r(const struct qemu_plugin_tb *ptb,
 -                                  TCGOp *begin_op)
 -{
 -    inject_udata_cb(ptb->cbs[PLUGIN_CB_REGULAR_R], begin_op);
 -}
 -
 -static void plugin_gen_tb_inline(const struct qemu_plugin_tb *ptb,
 -                                 TCGOp *begin_op)
 -{
 -    inject_inline_cb(ptb->cbs[PLUGIN_CB_INLINE], begin_op, op_ok);
 -}
 -
  static void plugin_gen_insn_udata(const struct qemu_plugin_tb *ptb,
                                    TCGOp *begin_op, int insn_idx)
  {
@@ -XXX,XX +XXX,XX @@ static void gen_disable_mem_helper(struct qemu_plugin_tb *ptb,
      }
  }
 +static void gen_udata_cb(struct qemu_plugin_dyn_cb *cb)
 +{
-+    /* Eliminate duplicate and redundant fence instructions.  */
++    TCGv_i32 cpu_index = tcg_temp_ebb_new_i32();
-+    if (ctx->prev_mb) {
++
-+        /*
++    tcg_gen_ld_i32(cpu_index, tcg_env,
-+         * Merge two barriers of the same type into one,
++                   -offsetof(ArchCPU, env) + offsetof(CPUState, cpu_index));
-+         * or a weaker barrier into a stronger one,
++    tcg_gen_call2(cb->regular.f.vcpu_udata, cb->regular.info, NULL,
-+         * or two weaker barriers into a stronger one.
++                  tcgv_i32_temp(cpu_index),
-+         *   mb X; mb Y => mb X|Y
++                  tcgv_ptr_temp(tcg_constant_ptr(cb->userp)));
-+         *   mb; strl => mb; st
++    tcg_temp_free_i32(cpu_index);
 +         *   ldaq; mb => ld; mb
 +         *   ldaq; strl => ld; mb; st
 +         * Other combinations are also merged into a strong
 +         * barrier.  This is stricter than specified but for
 +         * the purposes of TCG is better than not optimizing.
 +         */
 +        ctx->prev_mb->args[0] |= op->args[0];
 +        tcg_op_remove(ctx->tcg, op);
 +    } else {
 +        ctx->prev_mb = op;
 +    }
 +    return true;
 +}
 +
-+static bool fold_qemu_ld(OptContext *ctx, TCGOp *op)
++static void gen_inline_cb(struct qemu_plugin_dyn_cb *cb)
 +{
-+    /* Opcodes that touch guest memory stop the mb optimization.  */
++    GArray *arr = cb->inline_insn.entry.score->data;
-+    ctx->prev_mb = NULL;
++    size_t offset = cb->inline_insn.entry.offset;
-+    return false;
++    TCGv_i32 cpu_index = tcg_temp_ebb_new_i32();
 +    TCGv_i64 val = tcg_temp_ebb_new_i64();
 +    TCGv_ptr ptr = tcg_temp_ebb_new_ptr();
 +
 +    tcg_gen_ld_i32(cpu_index, tcg_env,
 +                   -offsetof(ArchCPU, env) + offsetof(CPUState, cpu_index));
 +    tcg_gen_muli_i32(cpu_index, cpu_index, g_array_get_element_size(arr));
 +    tcg_gen_ext_i32_ptr(ptr, cpu_index);
 +    tcg_temp_free_i32(cpu_index);
 +
 +    tcg_gen_addi_ptr(ptr, ptr, (intptr_t)arr->data);
 +    tcg_gen_ld_i64(val, ptr, offset);
 +    tcg_gen_addi_i64(val, val, cb->inline_insn.imm);
 +    tcg_gen_st_i64(val, ptr, offset);
 +
 +    tcg_temp_free_i64(val);
 +    tcg_temp_free_ptr(ptr);
 +}
 +
-+static bool fold_qemu_st(OptContext *ctx, TCGOp *op)
+ /* #define DEBUG_PLUGIN_GEN_OPS */
-+{
+ static void pr_ops(void)
-+    /* Opcodes that touch guest memory stop the mb optimization.  */
+ {
-+    ctx->prev_mb = NULL;
+@@ -XXX,XX +XXX,XX @@ static void plugin_gen_inject(struct qemu_plugin_tb *plugin_tb)
-+    return false;
+         {
-+}
+             enum plugin_gen_from from = op->args[0];
              struct qemu_plugin_insn *insn = NULL;
 +            const GArray *cbs;
 +            int i, n;
              if (insn_idx >= 0) {
                  insn = g_ptr_array_index(plugin_tb->insns, insn_idx);
@@ -XXX,XX +XXX,XX @@ static void plugin_gen_inject(struct qemu_plugin_tb *plugin_tb)
                  assert(insn != NULL);
                  gen_disable_mem_helper(plugin_tb, insn);
                  break;
 +
- /* Propagate constants and copies, fold constant expressions. */
++            case PLUGIN_GEN_FROM_TB:
- void tcg_optimize(TCGContext *s)
++                assert(insn == NULL);
- {
++
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
++                cbs = plugin_tb->cbs[PLUGIN_CB_REGULAR];
 +                for (i = 0, n = (cbs ? cbs->len : 0); i < n; i++) {
 +                    struct qemu_plugin_dyn_cb *cb =
 +                        &g_array_index(cbs, struct qemu_plugin_dyn_cb, i);
 +                    gen_udata_cb(cb);
 +                }
 +
 +                cbs = plugin_tb->cbs[PLUGIN_CB_INLINE];
 +                for (i = 0, n = (cbs ? cbs->len : 0); i < n; i++) {
 +                    struct qemu_plugin_dyn_cb *cb =
 +                        &g_array_index(cbs, struct qemu_plugin_dyn_cb, i);
 +                    gen_inline_cb(cb);
 +                }
 +                break;
 +
              default:
                  g_assert_not_reached();
              }
-             break;
+@@ -XXX,XX +XXX,XX @@ static void plugin_gen_inject(struct qemu_plugin_tb *plugin_tb)
+             enum plugin_gen_cb type = op->args[1];
-+        case INDEX_op_mb:
-+            done = fold_mb(&ctx, op);
+             switch (from) {
-+            break;
+-            case PLUGIN_GEN_FROM_TB:
-+        case INDEX_op_qemu_ld_i32:
+-            {
-+        case INDEX_op_qemu_ld_i64:
+-                g_assert(insn_idx == -1);
 +            done = fold_qemu_ld(&ctx, op);
 +            break;
 +        case INDEX_op_qemu_st_i32:
 +        case INDEX_op_qemu_st8_i32:
 +        case INDEX_op_qemu_st_i64:
 +            done = fold_qemu_st(&ctx, op);
 +            break;
 +
          default:
              break;
          }
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
          if (!done) {
              finish_folding(&ctx, op);
          }
 -
--        /* Eliminate duplicate and redundant fence instructions.  */
+-                switch (type) {
--        if (ctx.prev_mb) {
+-                case PLUGIN_GEN_CB_UDATA:
--            switch (opc) {
+-                    plugin_gen_tb_udata(plugin_tb, op);
 -            case INDEX_op_mb:
 -                /* Merge two barriers of the same type into one,
 -                 * or a weaker barrier into a stronger one,
 -                 * or two weaker barriers into a stronger one.
 -                 *   mb X; mb Y => mb X|Y
 -                 *   mb; strl => mb; st
 -                 *   ldaq; mb => ld; mb
 -                 *   ldaq; strl => ld; mb; st
 -                 * Other combinations are also merged into a strong
 -                 * barrier.  This is stricter than specified but for
 -                 * the purposes of TCG is better than not optimizing.
 -                 */
 -                ctx.prev_mb->args[0] |= op->args[0];
 -                tcg_op_remove(s, op);
 -                break;
 -
 -            default:
 -                /* Opcodes that end the block stop the optimization.  */
 -                if ((def->flags & TCG_OPF_BB_END) == 0) {
 -                    break;
+-                case PLUGIN_GEN_CB_UDATA_R:
+-                    plugin_gen_tb_udata_r(plugin_tb, op);
+-                    break;
+-                case PLUGIN_GEN_CB_INLINE:
+-                    plugin_gen_tb_inline(plugin_tb, op);
+-                    break;
+-                default:
+-                    g_assert_not_reached();
 -                }
--                /* fallthru */
--            case INDEX_op_qemu_ld_i32:
--            case INDEX_op_qemu_ld_i64:
--            case INDEX_op_qemu_st_i32:
--            case INDEX_op_qemu_st8_i32:
--            case INDEX_op_qemu_st_i64:
--                /* Opcodes that touch guest memory stop the optimization.  */
--                ctx.prev_mb = NULL;
 -                break;
 -            }
--        } else if (opc == INDEX_op_mb) {
+             case PLUGIN_GEN_FROM_INSN:
--            ctx.prev_mb = op;
+             {
--        }
+                 g_assert(insn_idx >= 0);
 diff --git a/plugins/api.c b/plugins/api.c
 index XXXXXXX..XXXXXXX 100644
 --- a/plugins/api.c
 +++ b/plugins/api.c
@@ -XXX,XX +XXX,XX @@ void qemu_plugin_register_vcpu_tb_exec_cb(struct qemu_plugin_tb *tb,
                                            void *udata)
  {
      if (!tb->mem_only) {
 -        int index = flags == QEMU_PLUGIN_CB_R_REGS ||
 -                    flags == QEMU_PLUGIN_CB_RW_REGS ?
 -                    PLUGIN_CB_REGULAR_R : PLUGIN_CB_REGULAR;
 -
 -        plugin_register_dyn_cb__udata(&tb->cbs[index],
 +        plugin_register_dyn_cb__udata(&tb->cbs[PLUGIN_CB_REGULAR],
                                        cb, flags, udata);
      }
  }
 --
-.25.1
+.34.1
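
Note on gen_inline_cb() in 07/20: the ops it emits load cpu_index, scale
it by the scoreboard element size, add the array base, and then do a
64-bit load/add/store at the entry offset. A plain C model of that
arithmetic could look like the sketch below. The names scoreboard_model
and inline_add_model are hypothetical and stand in for the GArray-backed
scoreboard; this runs on the host as an illustration and is not the code
TCG generates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the GArray behind a plugin scoreboard. */
struct scoreboard_model {
    unsigned char *data;   /* base of the per-vCPU array (arr->data) */
    size_t elem_size;      /* what g_array_get_element_size(arr) returns */
    size_t n_cpus;         /* number of per-vCPU elements in data */
};

/*
 * Model of the sequence emitted by gen_inline_cb():
 *   ptr = arr->data + cpu_index * elem_size
 *   val = load64(ptr + offset); val += imm; store64(ptr + offset, val)
 * Returns the value that was stored.
 */
static uint64_t inline_add_model(struct scoreboard_model *sb,
                                 unsigned cpu_index, size_t offset,
                                 uint64_t imm)
{
    unsigned char *ptr;
    uint64_t val;

    assert(cpu_index < sb->n_cpus);
    assert(offset + sizeof(val) <= sb->elem_size);

    ptr = sb->data + (size_t)cpu_index * sb->elem_size;
    memcpy(&val, ptr + offset, sizeof(val));   /* tcg_gen_ld_i64 */
    val += imm;                                /* tcg_gen_addi_i64 */
    memcpy(ptr + offset, &val, sizeof(val));   /* tcg_gen_st_i64 */
    return val;
}
```

Each vCPU updates only its own element, so two vCPUs running the same TB
do not touch the same counter slot in this model.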

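Note on fold_mb / fold_qemu_{ld,st} in the old 19/56: the rule in the
comment ("mb X; mb Y => mb X|Y", with guest memory accesses ending the
run) can be modelled over a flat op array as below. The op_model type and
merge_barriers() are hypothetical names for this sketch. The code removed
from tcg_optimize() also stopped at ops that end a basic block; the sketch
leaves that case out.

```c
#include <assert.h>
#include <stddef.h>

enum op_kind { OP_MB, OP_MEM, OP_OTHER };

/* Hypothetical, simplified stand-in for a TCGOp. */
struct op_model {
    enum op_kind kind;
    unsigned mb_bits;   /* barrier type bits, meaningful for OP_MB only */
    int removed;        /* set to 1 when the op was merged away */
};

/* Merge each barrier into the previous one; returns the number removed. */
static size_t merge_barriers(struct op_model *ops, size_t n)
{
    struct op_model *prev_mb = NULL;
    size_t removed = 0;

    assert(ops != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        struct op_model *op = &ops[i];

        switch (op->kind) {
        case OP_MB:
            if (prev_mb) {
                /* mb X; mb Y => mb X|Y */
                prev_mb->mb_bits |= op->mb_bits;
                op->removed = 1;
                removed++;
            } else {
                prev_mb = op;
            }
            break;
        case OP_MEM:
            /* Ops that touch guest memory stop the merge. */
            prev_mb = NULL;
            break;
        default:
            break;
        }
    }
    return removed;
}
```

The merged barrier is at least as strong as either input, which matches
the "stricter than specified" remark in the original comment.
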
-[PULL 06/56] tcg/optimize: Rename "mask" to "z_mask"
+[PULL 08/20] plugins: Add PLUGIN_GEN_AFTER_TB
-Prepare for tracking different masks by renaming this one.
+Delay test of plugin_tb->mem_helper until the inject pass.
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
+Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
 Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- tcg/optimize.c | 142 +++++++++++++++++++++++++------------------------
+ accel/tcg/plugin-gen.c | 37 ++++++++++++++++---------------------
-file changed, 72 insertions(+), 70 deletions(-)
+file changed, 16 insertions(+), 21 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
+diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
 index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
+--- a/accel/tcg/plugin-gen.c
-+++ b/tcg/optimize.c
++++ b/accel/tcg/plugin-gen.c
-@@ -XXX,XX +XXX,XX @@ typedef struct TempOptInfo {
+@@ -XXX,XX +XXX,XX @@ enum plugin_gen_from {
-     TCGTemp *prev_copy;
+     PLUGIN_GEN_FROM_INSN,
-     TCGTemp *next_copy;
+     PLUGIN_GEN_FROM_MEM,
-     uint64_t val;
+     PLUGIN_GEN_AFTER_INSN,
--    uint64_t mask;
++    PLUGIN_GEN_AFTER_TB,
-+    uint64_t z_mask;  /* mask bit is 0 if and only if value bit is 0 */
+     PLUGIN_GEN_N_FROMS,
- } TempOptInfo;
+ };
- static inline TempOptInfo *ts_info(TCGTemp *ts)
+@@ -XXX,XX +XXX,XX @@ static void inject_mem_enable_helper(struct qemu_plugin_tb *ptb,
-@@ -XXX,XX +XXX,XX @@ static void reset_ts(TCGTemp *ts)
+ /* called before finishing a TB with exit_tb, goto_tb or goto_ptr */
-     ti->next_copy = ts;
+ void plugin_gen_disable_mem_helpers(void)
-     ti->prev_copy = ts;
+ {
-     ti->is_const = false;
+-    /*
--    ti->mask = -1;
+-     * We could emit the clearing unconditionally and be done. However, this can
-+    ti->z_mask = -1;
+-     * be wasteful if for instance plugins don't track memory accesses, or if
 -     * most TBs don't use helpers. Instead, emit the clearing iff the TB calls
 -     * helpers that might access guest memory.
 -     *
 -     * Note: we do not reset plugin_tb->mem_helper here; a TB might have several
 -     * exit points, and we want to emit the clearing from all of them.
 -     */
 -    if (!tcg_ctx->plugin_tb->mem_helper) {
 -        return;
 +    if (tcg_ctx->plugin_insn) {
 +        tcg_gen_plugin_cb(PLUGIN_GEN_AFTER_TB);
      }
 -    tcg_gen_st_ptr(tcg_constant_ptr(NULL), tcg_env,
 -                   offsetof(CPUState, plugin_mem_cbs) - offsetof(ArchCPU, env));
  }
- static void reset_temp(TCGArg arg)
+ static void plugin_gen_insn_udata(const struct qemu_plugin_tb *ptb,
-@@ -XXX,XX +XXX,XX @@ static void init_ts_info(TCGTempSet *temps_used, TCGTemp *ts)
+@@ -XXX,XX +XXX,XX @@ static void plugin_gen_enable_mem_helper(struct qemu_plugin_tb *ptb,
-     if (ts->kind == TEMP_CONST) {
+     inject_mem_enable_helper(ptb, insn, begin_op);
          ti->is_const = true;
          ti->val = ts->val;
 -        ti->mask = ts->val;
 +        ti->z_mask = ts->val;
          if (TCG_TARGET_REG_BITS > 32 && ts->type == TCG_TYPE_I32) {
              /* High bits of a 32-bit quantity are garbage.  */
 -            ti->mask |= ~0xffffffffull;
 +            ti->z_mask |= ~0xffffffffull;
          }
      } else {
          ti->is_const = false;
 -        ti->mask = -1;
 +        ti->z_mask = -1;
      }
  }
-@@ -XXX,XX +XXX,XX @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg src)
+-static void gen_disable_mem_helper(struct qemu_plugin_tb *ptb,
-     const TCGOpDef *def;
+-                                   struct qemu_plugin_insn *insn)
-     TempOptInfo *di;
++static void gen_disable_mem_helper(void)
-     TempOptInfo *si;
+ {
--    uint64_t mask;
+-    if (insn->mem_helper) {
-+    uint64_t z_mask;
+-        tcg_gen_st_ptr(tcg_constant_ptr(0), tcg_env,
-     TCGOpcode new_op;
+-                       offsetof(CPUState, plugin_mem_cbs) -
+-                       offsetof(ArchCPU, env));
-     if (ts_are_copies(dst_ts, src_ts)) {
+-    }
-@@ -XXX,XX +XXX,XX @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg src)
++    tcg_gen_st_ptr(tcg_constant_ptr(0), tcg_env,
-     op->args[0] = dst;
++                   offsetof(CPUState, plugin_mem_cbs) -
-     op->args[1] = src;
++                   offsetof(ArchCPU, env));
+ }
--    mask = si->mask;
-+    z_mask = si->z_mask;
+ static void gen_udata_cb(struct qemu_plugin_dyn_cb *cb)
-     if (TCG_TARGET_REG_BITS > 32 && new_op == INDEX_op_mov_i32) {
+@@ -XXX,XX +XXX,XX @@ static void plugin_gen_inject(struct qemu_plugin_tb *plugin_tb)
-         /* High bits of the destination are now garbage.  */
+             tcg_ctx->emit_before_op = op;
--        mask |= ~0xffffffffull;
-+        z_mask |= ~0xffffffffull;
+             switch (from) {
-     }
++            case PLUGIN_GEN_AFTER_TB:
--    di->mask = mask;
++                if (plugin_tb->mem_helper) {
-+    di->z_mask = z_mask;
++                    gen_disable_mem_helper();
++                }
-     if (src_ts->type == dst_ts->type) {
++                break;
-         TempOptInfo *ni = ts_info(si->next_copy);
++
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
+             case PLUGIN_GEN_AFTER_INSN:
-     }
+                 assert(insn != NULL);
+-                gen_disable_mem_helper(plugin_tb, insn);
-     QTAILQ_FOREACH_SAFE(op, &s->ops, link, op_next) {
++                if (insn->mem_helper) {
--        uint64_t mask, partmask, affected, tmp;
++                    gen_disable_mem_helper();
-+        uint64_t z_mask, partmask, affected, tmp;
++                }
          int nb_oargs, nb_iargs;
          TCGOpcode opc = op->opc;
          const TCGOpDef *def = &tcg_op_defs[opc];
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
          /* Simplify using known-zero bits. Currently only ops with a single
             output argument is supported. */
 -        mask = -1;
 +        z_mask = -1;
          affected = -1;
          switch (opc) {
          CASE_OP_32_64(ext8s):
 -            if ((arg_info(op->args[1])->mask & 0x80) != 0) {
 +            if ((arg_info(op->args[1])->z_mask & 0x80) != 0) {
                  break;
-             }
-             QEMU_FALLTHROUGH;
+             case PLUGIN_GEN_FROM_TB:
          CASE_OP_32_64(ext8u):
 -            mask = 0xff;
 +            z_mask = 0xff;
              goto and_const;
          CASE_OP_32_64(ext16s):
 -            if ((arg_info(op->args[1])->mask & 0x8000) != 0) {
 +            if ((arg_info(op->args[1])->z_mask & 0x8000) != 0) {
                  break;
              }
              QEMU_FALLTHROUGH;
          CASE_OP_32_64(ext16u):
 -            mask = 0xffff;
 +            z_mask = 0xffff;
              goto and_const;
          case INDEX_op_ext32s_i64:
 -            if ((arg_info(op->args[1])->mask & 0x80000000) != 0) {
 +            if ((arg_info(op->args[1])->z_mask & 0x80000000) != 0) {
                  break;
              }
              QEMU_FALLTHROUGH;
          case INDEX_op_ext32u_i64:
 -            mask = 0xffffffffU;
 +            z_mask = 0xffffffffU;
              goto and_const;
          CASE_OP_32_64(and):
 -            mask = arg_info(op->args[2])->mask;
 +            z_mask = arg_info(op->args[2])->z_mask;
              if (arg_is_const(op->args[2])) {
          and_const:
 -                affected = arg_info(op->args[1])->mask & ~mask;
 +                affected = arg_info(op->args[1])->z_mask & ~z_mask;
              }
 -            mask = arg_info(op->args[1])->mask & mask;
 +            z_mask = arg_info(op->args[1])->z_mask & z_mask;
              break;
          case INDEX_op_ext_i32_i64:
 -            if ((arg_info(op->args[1])->mask & 0x80000000) != 0) {
 +            if ((arg_info(op->args[1])->z_mask & 0x80000000) != 0) {
                  break;
              }
              QEMU_FALLTHROUGH;
          case INDEX_op_extu_i32_i64:
              /* We do not compute affected as it is a size changing op.  */
 -            mask = (uint32_t)arg_info(op->args[1])->mask;
 +            z_mask = (uint32_t)arg_info(op->args[1])->z_mask;
              break;
          CASE_OP_32_64(andc):
              /* Known-zeros does not imply known-ones.  Therefore unless
                 op->args[2] is constant, we can't infer anything from it.  */
              if (arg_is_const(op->args[2])) {
 -                mask = ~arg_info(op->args[2])->mask;
 +                z_mask = ~arg_info(op->args[2])->z_mask;
                  goto and_const;
              }
              /* But we certainly know nothing outside args[1] may be set. */
 -            mask = arg_info(op->args[1])->mask;
 +            z_mask = arg_info(op->args[1])->z_mask;
              break;
          case INDEX_op_sar_i32:
              if (arg_is_const(op->args[2])) {
                  tmp = arg_info(op->args[2])->val & 31;
 -                mask = (int32_t)arg_info(op->args[1])->mask >> tmp;
 +                z_mask = (int32_t)arg_info(op->args[1])->z_mask >> tmp;
              }
              break;
          case INDEX_op_sar_i64:
              if (arg_is_const(op->args[2])) {
                  tmp = arg_info(op->args[2])->val & 63;
 -                mask = (int64_t)arg_info(op->args[1])->mask >> tmp;
 +                z_mask = (int64_t)arg_info(op->args[1])->z_mask >> tmp;
              }
              break;
          case INDEX_op_shr_i32:
              if (arg_is_const(op->args[2])) {
                  tmp = arg_info(op->args[2])->val & 31;
 -                mask = (uint32_t)arg_info(op->args[1])->mask >> tmp;
 +                z_mask = (uint32_t)arg_info(op->args[1])->z_mask >> tmp;
              }
              break;
          case INDEX_op_shr_i64:
              if (arg_is_const(op->args[2])) {
                  tmp = arg_info(op->args[2])->val & 63;
 -                mask = (uint64_t)arg_info(op->args[1])->mask >> tmp;
 +                z_mask = (uint64_t)arg_info(op->args[1])->z_mask >> tmp;
              }
              break;
          case INDEX_op_extrl_i64_i32:
 -            mask = (uint32_t)arg_info(op->args[1])->mask;
 +            z_mask = (uint32_t)arg_info(op->args[1])->z_mask;
              break;
          case INDEX_op_extrh_i64_i32:
 -            mask = (uint64_t)arg_info(op->args[1])->mask >> 32;
 +            z_mask = (uint64_t)arg_info(op->args[1])->z_mask >> 32;
              break;
          CASE_OP_32_64(shl):
              if (arg_is_const(op->args[2])) {
                  tmp = arg_info(op->args[2])->val & (TCG_TARGET_REG_BITS - 1);
 -                mask = arg_info(op->args[1])->mask << tmp;
 +                z_mask = arg_info(op->args[1])->z_mask << tmp;
              }
              break;
          CASE_OP_32_64(neg):
              /* Set to 1 all bits to the left of the rightmost.  */
 -            mask = -(arg_info(op->args[1])->mask
 -                     & -arg_info(op->args[1])->mask);
 +            z_mask = -(arg_info(op->args[1])->z_mask
 +                       & -arg_info(op->args[1])->z_mask);
              break;
          CASE_OP_32_64(deposit):
 -            mask = deposit64(arg_info(op->args[1])->mask,
 -                             op->args[3], op->args[4],
 -                             arg_info(op->args[2])->mask);
 +            z_mask = deposit64(arg_info(op->args[1])->z_mask,
 +                               op->args[3], op->args[4],
 +                               arg_info(op->args[2])->z_mask);
              break;
          CASE_OP_32_64(extract):
 -            mask = extract64(arg_info(op->args[1])->mask,
 -                             op->args[2], op->args[3]);
 +            z_mask = extract64(arg_info(op->args[1])->z_mask,
 +                               op->args[2], op->args[3]);
              if (op->args[2] == 0) {
 -                affected = arg_info(op->args[1])->mask & ~mask;
 +                affected = arg_info(op->args[1])->z_mask & ~z_mask;
              }
              break;
          CASE_OP_32_64(sextract):
 -            mask = sextract64(arg_info(op->args[1])->mask,
 -                              op->args[2], op->args[3]);
 -            if (op->args[2] == 0 && (tcg_target_long)mask >= 0) {
 -                affected = arg_info(op->args[1])->mask & ~mask;
 +            z_mask = sextract64(arg_info(op->args[1])->z_mask,
 +                                op->args[2], op->args[3]);
 +            if (op->args[2] == 0 && (tcg_target_long)z_mask >= 0) {
 +                affected = arg_info(op->args[1])->z_mask & ~z_mask;
              }
              break;
          CASE_OP_32_64(or):
          CASE_OP_32_64(xor):
 -            mask = arg_info(op->args[1])->mask | arg_info(op->args[2])->mask;
 +            z_mask = arg_info(op->args[1])->z_mask
 +                   | arg_info(op->args[2])->z_mask;
              break;
          case INDEX_op_clz_i32:
          case INDEX_op_ctz_i32:
 -            mask = arg_info(op->args[2])->mask | 31;
 +            z_mask = arg_info(op->args[2])->z_mask | 31;
              break;
          case INDEX_op_clz_i64:
          case INDEX_op_ctz_i64:
 -            mask = arg_info(op->args[2])->mask | 63;
 +            z_mask = arg_info(op->args[2])->z_mask | 63;
              break;
          case INDEX_op_ctpop_i32:
 -            mask = 32 | 31;
 +            z_mask = 32 | 31;
              break;
          case INDEX_op_ctpop_i64:
 -            mask = 64 | 63;
 +            z_mask = 64 | 63;
              break;
          CASE_OP_32_64(setcond):
          case INDEX_op_setcond2_i32:
 -            mask = 1;
 +            z_mask = 1;
              break;
          CASE_OP_32_64(movcond):
 -            mask = arg_info(op->args[3])->mask | arg_info(op->args[4])->mask;
 +            z_mask = arg_info(op->args[3])->z_mask
 +                   | arg_info(op->args[4])->z_mask;
              break;
          CASE_OP_32_64(ld8u):
 -            mask = 0xff;
 +            z_mask = 0xff;
              break;
          CASE_OP_32_64(ld16u):
 -            mask = 0xffff;
 +            z_mask = 0xffff;
              break;
          case INDEX_op_ld32u_i64:
 -            mask = 0xffffffffu;
 +            z_mask = 0xffffffffu;
              break;
          CASE_OP_32_64(qemu_ld):
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                  MemOpIdx oi = op->args[nb_oargs + nb_iargs];
                  MemOp mop = get_memop(oi);
                  if (!(mop & MO_SIGN)) {
 -                    mask = (2ULL << ((8 << (mop & MO_SIZE)) - 1)) - 1;
 +                    z_mask = (2ULL << ((8 << (mop & MO_SIZE)) - 1)) - 1;
                  }
              }
              break;
          CASE_OP_32_64(bswap16):
 -            mask = arg_info(op->args[1])->mask;
 -            if (mask <= 0xffff) {
 +            z_mask = arg_info(op->args[1])->z_mask;
 +            if (z_mask <= 0xffff) {
                  op->args[2] |= TCG_BSWAP_IZ;
              }
 -            mask = bswap16(mask);
 +            z_mask = bswap16(z_mask);
              switch (op->args[2] & (TCG_BSWAP_OZ | TCG_BSWAP_OS)) {
              case TCG_BSWAP_OZ:
                  break;
              case TCG_BSWAP_OS:
 -                mask = (int16_t)mask;
 +                z_mask = (int16_t)z_mask;
                  break;
              default: /* undefined high bits */
 -                mask |= MAKE_64BIT_MASK(16, 48);
 +                z_mask |= MAKE_64BIT_MASK(16, 48);
                  break;
              }
              break;
          case INDEX_op_bswap32_i64:
 -            mask = arg_info(op->args[1])->mask;
 -            if (mask <= 0xffffffffu) {
 +            z_mask = arg_info(op->args[1])->z_mask;
 +            if (z_mask <= 0xffffffffu) {
                  op->args[2] |= TCG_BSWAP_IZ;
              }
 -            mask = bswap32(mask);
 +            z_mask = bswap32(z_mask);
              switch (op->args[2] & (TCG_BSWAP_OZ | TCG_BSWAP_OS)) {
              case TCG_BSWAP_OZ:
                  break;
              case TCG_BSWAP_OS:
 -                mask = (int32_t)mask;
 +                z_mask = (int32_t)z_mask;
                  break;
              default: /* undefined high bits */
 -                mask |= MAKE_64BIT_MASK(32, 32);
 +                z_mask |= MAKE_64BIT_MASK(32, 32);
                  break;
              }
              break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
          /* 32-bit ops generate 32-bit results.  For the result is zero test
             below, we can ignore high bits, but for further optimizations we
             need to record that the high bits contain garbage.  */
 -        partmask = mask;
 +        partmask = z_mask;
          if (!(def->flags & TCG_OPF_64BIT)) {
 -            mask |= ~(tcg_target_ulong)0xffffffffu;
 +            z_mask |= ~(tcg_target_ulong)0xffffffffu;
              partmask &= 0xffffffffu;
              affected &= 0xffffffffu;
          }
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                     vs the high word of the input.  */
              do_setcond_high:
                  reset_temp(op->args[0]);
 -                arg_info(op->args[0])->mask = 1;
 +                arg_info(op->args[0])->z_mask = 1;
                  op->opc = INDEX_op_setcond_i32;
                  op->args[1] = op->args[2];
                  op->args[2] = op->args[4];
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                  }
              do_setcond_low:
                  reset_temp(op->args[0]);
 -                arg_info(op->args[0])->mask = 1;
 +                arg_info(op->args[0])->z_mask = 1;
                  op->opc = INDEX_op_setcond_i32;
                  op->args[2] = op->args[3];
                  op->args[3] = op->args[5];
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
              /* Default case: we know nothing about operation (or were unable
                 to compute the operation result) so no propagation is done.
                 We trash everything if the operation is the end of a basic
 -               block, otherwise we only trash the output args.  "mask" is
 +               block, otherwise we only trash the output args.  "z_mask" is
                 the non-zero bits mask for the first output arg.  */
              if (def->flags & TCG_OPF_BB_END) {
                  memset(&temps_used, 0, sizeof(temps_used));
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                      /* Save the corresponding known-zero bits mask for the
                         first output argument (only one supported so far). */
                      if (i == 0) {
 -                        arg_info(op->args[i])->mask = mask;
 +                        arg_info(op->args[i])->z_mask = z_mask;
                      }
                  }
              }
 --
-.25.1
+.34.1
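
Note on z_mask in the old 06/56: a bit that is clear in z_mask is known
to be zero in the value, while a set bit only means "may be nonzero".
A few of the propagation rules from the switch above, restated as small
C helpers, might look as follows. The helper names are made up for this
sketch and do not exist in tcg/optimize.c.

```c
#include <stdint.h>

/* A constant's z_mask is the constant itself. */
static uint64_t zmask_const(uint64_t val) { return val; }

/* and: a result bit can be set only if it can be set in both inputs. */
static uint64_t zmask_and(uint64_t z1, uint64_t z2) { return z1 & z2; }

/* or, xor: a result bit can be set if it can be set in either input. */
static uint64_t zmask_or(uint64_t z1, uint64_t z2) { return z1 | z2; }

/* 64-bit logical shift right by a constant. */
static uint64_t zmask_shr(uint64_t z, unsigned n) { return z >> (n & 63); }

/* Possibly-set bits of the first operand that AND with constant c clears. */
static uint64_t and_affected(uint64_t z1, uint64_t c) { return z1 & ~c; }

/* True when value v is consistent with mask z (no bit set outside z). */
static int zmask_covers(uint64_t z, uint64_t v) { return (v & ~z) == 0; }
```

When and_affected() returns 0, every bit the first operand may have set
lies inside the constant, so the AND cannot change its input.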

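Note on PLUGIN_GEN_AFTER_TB in 08/20: the exit path now emits a marker
without looking at mem_helper, and plugin_gen_inject() decides later
whether the marker becomes a store to plugin_mem_cbs. The shape of that
"mark now, decide in the inject pass" split is sketched below. The
tb_model type and both functions are hypothetical, and the sketch omits
the tcg_ctx->plugin_insn test that guards the real marker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_MARKERS 8

/* Hypothetical, simplified stand-in for struct qemu_plugin_tb. */
struct tb_model {
    bool mem_helper;         /* TB calls helpers that may access memory */
    size_t n_exit_markers;   /* AFTER_TB markers recorded so far */
};

/* Translation time: record the exit point, do not test mem_helper yet. */
static void emit_exit_marker(struct tb_model *tb)
{
    assert(tb->n_exit_markers < MAX_MARKERS);
    tb->n_exit_markers++;
}

/* Inject pass: number of markers that expand into a clearing store. */
static size_t inject_exit_stores(const struct tb_model *tb)
{
    return tb->mem_helper ? tb->n_exit_markers : 0;
}
```

In this model every exit point gets the same treatment, because the flag
is read once, after all markers have been recorded.
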
-[PULL 17/56] tcg/optimize: Split out finish_folding
+[PULL 09/20] plugins: Use emit_before_op for PLUGIN_GEN_FROM_INSN
-Copy z_mask into OptContext, for writeback to the
+Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
 first output within the new function.
 Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
 Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- tcg/optimize.c | 49 +++++++++++++++++++++++++++++++++----------------
+ include/qemu/plugin.h  |   1 -
-file changed, 33 insertions(+), 16 deletions(-)
+ accel/tcg/plugin-gen.c | 286 ++++++++++-------------------------------
  plugins/api.c          |   8 +-
 files changed, 67 insertions(+), 228 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
+diff --git a/include/qemu/plugin.h b/include/qemu/plugin.h
 index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
+--- a/include/qemu/plugin.h
-+++ b/tcg/optimize.c
++++ b/include/qemu/plugin.h
-@@ -XXX,XX +XXX,XX @@ typedef struct OptContext {
+@@ -XXX,XX +XXX,XX @@ enum plugin_dyn_cb_type {
-     TCGContext *tcg;
-     TCGOp *prev_mb;
+ enum plugin_dyn_cb_subtype {
-     TCGTempSet temps_used;
+     PLUGIN_CB_REGULAR,
-+
+-    PLUGIN_CB_REGULAR_R,
-+    /* In flight values from optimization. */
+     PLUGIN_CB_INLINE,
-+    uint64_t z_mask;
+     PLUGIN_N_CB_SUBTYPES,
- } OptContext;
+ };
+diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
- static inline TempOptInfo *ts_info(TCGTemp *ts)
+index XXXXXXX..XXXXXXX 100644
-@@ -XXX,XX +XXX,XX @@ static void copy_propagate(OptContext *ctx, TCGOp *op,
+--- a/accel/tcg/plugin-gen.c
 +++ b/accel/tcg/plugin-gen.c
@@ -XXX,XX +XXX,XX @@ void HELPER(plugin_vcpu_mem_cb)(unsigned int vcpu_index,
                                  void *userdata)
  { }
 -static void gen_empty_udata_cb(void (*gen_helper)(TCGv_i32, TCGv_ptr))
 -{
 -    TCGv_i32 cpu_index = tcg_temp_ebb_new_i32();
 -    TCGv_ptr udata = tcg_temp_ebb_new_ptr();
 -
 -    tcg_gen_movi_ptr(udata, 0);
 -    tcg_gen_ld_i32(cpu_index, tcg_env,
 -                   -offsetof(ArchCPU, env) + offsetof(CPUState, cpu_index));
 -    gen_helper(cpu_index, udata);
 -
 -    tcg_temp_free_ptr(udata);
 -    tcg_temp_free_i32(cpu_index);
 -}
 -
 -static void gen_empty_udata_cb_no_wg(void)
 -{
 -    gen_empty_udata_cb(gen_helper_plugin_vcpu_udata_cb_no_wg);
 -}
 -
 -static void gen_empty_udata_cb_no_rwg(void)
 -{
 -    gen_empty_udata_cb(gen_helper_plugin_vcpu_udata_cb_no_rwg);
 -}
 -
  /*
   * For now we only support addi_i64.
   * When we support more ops, we can generate one empty inline cb for each.
@@ -XXX,XX +XXX,XX @@ static void gen_empty_mem_cb(TCGv_i64 addr, uint32_t info)
      tcg_temp_free_i32(cpu_index);
  }
 -/*
 - * Share the same function for enable/disable. When enabling, the NULL
 - * pointer will be overwritten later.
 - */
 -static void gen_empty_mem_helper(void)
 -{
 -    TCGv_ptr ptr = tcg_temp_ebb_new_ptr();
 -
 -    tcg_gen_movi_ptr(ptr, 0);
 -    tcg_gen_st_ptr(ptr, tcg_env, offsetof(CPUState, plugin_mem_cbs) -
 -                                 offsetof(ArchCPU, env));
 -    tcg_temp_free_ptr(ptr);
 -}
 -
  static void gen_plugin_cb_start(enum plugin_gen_from from,
                                  enum plugin_gen_cb type, unsigned wr)
  {
      tcg_gen_plugin_cb_start(from, type, wr);
  }
 -static void gen_wrapped(enum plugin_gen_from from,
 -                        enum plugin_gen_cb type, void (*func)(void))
 -{
 -    gen_plugin_cb_start(from, type, 0);
 -    func();
 -    tcg_gen_plugin_cb_end();
 -}
 -
  static void plugin_gen_empty_callback(enum plugin_gen_from from)
  {
      switch (from) {
      case PLUGIN_GEN_AFTER_INSN:
      case PLUGIN_GEN_FROM_TB:
 -        tcg_gen_plugin_cb(from);
 -        break;
      case PLUGIN_GEN_FROM_INSN:
 -        /*
 -         * Note: plugin_gen_inject() relies on ENABLE_MEM_HELPER being
 -         * the first callback of an instruction
 -         */
 -        gen_wrapped(from, PLUGIN_GEN_ENABLE_MEM_HELPER,
 -                    gen_empty_mem_helper);
 -        gen_wrapped(from, PLUGIN_GEN_CB_UDATA, gen_empty_udata_cb_no_rwg);
 -        gen_wrapped(from, PLUGIN_GEN_CB_UDATA_R, gen_empty_udata_cb_no_wg);
 -        gen_wrapped(from, PLUGIN_GEN_CB_INLINE, gen_empty_inline_cb);
 +        tcg_gen_plugin_cb(from);
          break;
      default:
          g_assert_not_reached();
@@ -XXX,XX +XXX,XX @@ static TCGOp *copy_mul_i32(TCGOp **begin_op, TCGOp *op, uint32_t v)
      return op;
  }
 -static TCGOp *copy_st_ptr(TCGOp **begin_op, TCGOp *op)
 -{
 -    if (UINTPTR_MAX == UINT32_MAX) {
 -        /* st_i32 */
 -        op = copy_op(begin_op, op, INDEX_op_st_i32);
 -    } else {
 -        /* st_i64 */
 -        op = copy_st_i64(begin_op, op);
 -    }
 -    return op;
 -}
 -
  static TCGOp *copy_call(TCGOp **begin_op, TCGOp *op, void *func, int *cb_idx)
  {
      TCGOp *old_op;
@@ -XXX,XX +XXX,XX @@ static TCGOp *copy_call(TCGOp **begin_op, TCGOp *op, void *func, int *cb_idx)
      return op;
  }
 -/*
 - * When we append/replace ops here we are sensitive to changing patterns of
 - * TCGOps generated by the tcg_gen_FOO calls when we generated the
 - * empty callbacks. This will assert very quickly in a debug build as
 - * we assert the ops we are replacing are the correct ones.
 - */
 -static TCGOp *append_udata_cb(const struct qemu_plugin_dyn_cb *cb,
 -                              TCGOp *begin_op, TCGOp *op, int *cb_idx)
 -{
 -    /* const_ptr */
 -    op = copy_const_ptr(&begin_op, op, cb->userp);
 -
 -    /* copy the ld_i32, but note that we only have to copy it once */
 -    if (*cb_idx == -1) {
 -        op = copy_op(&begin_op, op, INDEX_op_ld_i32);
 -    } else {
 -        begin_op = QTAILQ_NEXT(begin_op, link);
 -        tcg_debug_assert(begin_op && begin_op->opc == INDEX_op_ld_i32);
 -    }
 -
 -    /* call */
 -    op = copy_call(&begin_op, op, cb->regular.f.vcpu_udata, cb_idx);
 -
 -    return op;
 -}
 -
  static TCGOp *append_inline_cb(const struct qemu_plugin_dyn_cb *cb,
                                 TCGOp *begin_op, TCGOp *op,
                                 int *unused)
@@ -XXX,XX +XXX,XX @@ typedef TCGOp *(*inject_fn)(const struct qemu_plugin_dyn_cb *cb,
                              TCGOp *begin_op, TCGOp *op, int *intp);
  typedef bool (*op_ok_fn)(const TCGOp *op, const struct qemu_plugin_dyn_cb *cb);
 -static bool op_ok(const TCGOp *op, const struct qemu_plugin_dyn_cb *cb)
 -{
 -    return true;
 -}
 -
  static bool op_rw(const TCGOp *op, const struct qemu_plugin_dyn_cb *cb)
  {
      int w;
@@ -XXX,XX +XXX,XX @@ static void inject_cb_type(const GArray *cbs, TCGOp *begin_op,
      rm_ops_range(begin_op, end_op);
  }
 -static void
 -inject_udata_cb(const GArray *cbs, TCGOp *begin_op)
 -{
 -    inject_cb_type(cbs, begin_op, append_udata_cb, op_ok);
 -}
 -
  static void
  inject_inline_cb(const GArray *cbs, TCGOp *begin_op, op_ok_fn ok)
  {
@@ -XXX,XX +XXX,XX @@ inject_mem_cb(const GArray *cbs, TCGOp *begin_op)
      inject_cb_type(cbs, begin_op, append_mem_cb, op_rw);
  }
 -/* we could change the ops in place, but we can reuse more code by copying */
 -static void inject_mem_helper(TCGOp *begin_op, GArray *arr)
 -{
 -    TCGOp *orig_op = begin_op;
 -    TCGOp *end_op;
 -    TCGOp *op;
 -
 -    end_op = find_op(begin_op, INDEX_op_plugin_cb_end);
 -    tcg_debug_assert(end_op);
 -
 -    /* const ptr */
 -    op = copy_const_ptr(&begin_op, end_op, arr);
 -
 -    /* st_ptr */
 -    op = copy_st_ptr(&begin_op, op);
 -
 -    rm_ops_range(orig_op, end_op);
 -}
 -
 -/*
 - * Tracking memory accesses performed from helpers requires extra work.
 - * If an instruction is emulated with helpers, we do two things:
 - * (1) copy the CB descriptors, and keep track of it so that they can be
 - * freed later on, and (2) point CPUState.plugin_mem_cbs to the descriptors, so
 - * that we can read them at run-time (i.e. when the helper executes).
 - * This run-time access is performed from qemu_plugin_vcpu_mem_cb.
 - *
 - * Note that plugin_gen_disable_mem_helpers undoes (2). Since it
 - * is possible that the code we generate after the instruction is
 - * dead, we also add checks before generating tb_exit etc.
 - */
 -static void inject_mem_enable_helper(struct qemu_plugin_tb *ptb,
 -                                     struct qemu_plugin_insn *plugin_insn,
 -                                     TCGOp *begin_op)
 -{
 -    GArray *cbs[2];
 -    GArray *arr;
 -    size_t n_cbs, i;
 -
 -    cbs[0] = plugin_insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR];
 -    cbs[1] = plugin_insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_INLINE];
 -
 -    n_cbs = 0;
 -    for (i = 0; i < ARRAY_SIZE(cbs); i++) {
 -        n_cbs += cbs[i]->len;
 -    }
 -
 -    plugin_insn->mem_helper = plugin_insn->calls_helpers && n_cbs;
 -    if (likely(!plugin_insn->mem_helper)) {
 -        rm_ops(begin_op);
 -        return;
 -    }
 -    ptb->mem_helper = true;
 -
 -    arr = g_array_sized_new(false, false,
 -                            sizeof(struct qemu_plugin_dyn_cb), n_cbs);
 -
 -    for (i = 0; i < ARRAY_SIZE(cbs); i++) {
 -        g_array_append_vals(arr, cbs[i]->data, cbs[i]->len);
 -    }
 -
 -    qemu_plugin_add_dyn_cb_arr(arr);
 -    inject_mem_helper(begin_op, arr);
 -}
 -
  /* called before finishing a TB with exit_tb, goto_tb or goto_ptr */
  void plugin_gen_disable_mem_helpers(void)
  {
@@ -XXX,XX +XXX,XX @@ void plugin_gen_disable_mem_helpers(void)
      }
  }
-+static void finish_folding(OptContext *ctx, TCGOp *op)
+-static void plugin_gen_insn_udata(const struct qemu_plugin_tb *ptb,
-+{
+-                                  TCGOp *begin_op, int insn_idx)
-+    const TCGOpDef *def = &tcg_op_defs[op->opc];
+-{
-+    int i, nb_oargs;
+-    struct qemu_plugin_insn *insn = g_ptr_array_index(ptb->insns, insn_idx);
 -
 -    inject_udata_cb(insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_REGULAR], begin_op);
 -}
 -
 -static void plugin_gen_insn_udata_r(const struct qemu_plugin_tb *ptb,
 -                                    TCGOp *begin_op, int insn_idx)
 -{
 -    struct qemu_plugin_insn *insn = g_ptr_array_index(ptb->insns, insn_idx);
 -
 -    inject_udata_cb(insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_REGULAR_R], begin_op);
 -}
 -
 -static void plugin_gen_insn_inline(const struct qemu_plugin_tb *ptb,
 -                                   TCGOp *begin_op, int insn_idx)
 -{
 -    struct qemu_plugin_insn *insn = g_ptr_array_index(ptb->insns, insn_idx);
 -    inject_inline_cb(insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_INLINE],
 -                     begin_op, op_ok);
 -}
 -
  static void plugin_gen_mem_regular(const struct qemu_plugin_tb *ptb,
                                     TCGOp *begin_op, int insn_idx)
  {
@@ -XXX,XX +XXX,XX @@ static void plugin_gen_mem_inline(const struct qemu_plugin_tb *ptb,
      inject_inline_cb(cbs, begin_op, op_rw);
  }
 -static void plugin_gen_enable_mem_helper(struct qemu_plugin_tb *ptb,
 -                                         TCGOp *begin_op, int insn_idx)
 +static void gen_enable_mem_helper(struct qemu_plugin_tb *ptb,
 +                                  struct qemu_plugin_insn *insn)
  {
 -    struct qemu_plugin_insn *insn = g_ptr_array_index(ptb->insns, insn_idx);
 -    inject_mem_enable_helper(ptb, insn, begin_op);
 +    GArray *cbs[2];
 +    GArray *arr;
 +    size_t n_cbs;
 +
 +    /*
-+     * For an opcode that ends a BB, reset all temp data.
++     * Tracking memory accesses performed from helpers requires extra work.
-+     * We do no cross-BB optimization.
++     * If an instruction is emulated with helpers, we do two things:
 +     * (1) copy the CB descriptors, and keep track of it so that they can be
 +     * freed later on, and (2) point CPUState.plugin_mem_cbs to the
 +     * descriptors, so that we can read them at run-time
 +     * (i.e. when the helper executes).
 +     * This run-time access is performed from qemu_plugin_vcpu_mem_cb.
 +     *
 +     * Note that plugin_gen_disable_mem_helpers undoes (2). Since it
 +     * is possible that the code we generate after the instruction is
 +     * dead, we also add checks before generating tb_exit etc.
 +     */
-+    if (def->flags & TCG_OPF_BB_END) {
++    if (!insn->calls_helpers) {
 +        memset(&ctx->temps_used, 0, sizeof(ctx->temps_used));
 +        ctx->prev_mb = NULL;
 +        return;
 +    }
 +
-+    nb_oargs = def->nb_oargs;
++    cbs[0] = insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR];
-+    for (i = 0; i < nb_oargs; i++) {
++    cbs[1] = insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_INLINE];
-+        reset_temp(op->args[i]);
++    n_cbs = cbs[0]->len + cbs[1]->len;
-+        /*
++
-+         * Save the corresponding known-zero bits mask for the
++    if (n_cbs == 0) {
-+         * first output argument (only one supported so far).
++        insn->mem_helper = false;
-+         */
++        return;
 +        if (i == 0) {
 +            arg_info(op->args[i])->z_mask = ctx->z_mask;
 +        }
 +    }
-+}
++    insn->mem_helper = true;
-+
++    ptb->mem_helper = true;
- static bool fold_call(OptContext *ctx, TCGOp *op)
++
- {
++    arr = g_array_sized_new(false, false,
-     TCGContext *s = ctx->tcg;
++                            sizeof(struct qemu_plugin_dyn_cb), n_cbs);
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
++    g_array_append_vals(arr, cbs[0]->data, cbs[0]->len);
-             partmask &= 0xffffffffu;
++    g_array_append_vals(arr, cbs[1]->data, cbs[1]->len);
-             affected &= 0xffffffffu;
++
-         }
++    qemu_plugin_add_dyn_cb_arr(arr);
-+        ctx.z_mask = z_mask;
++
++    tcg_gen_st_ptr(tcg_constant_ptr((intptr_t)arr), tcg_env,
-         if (partmask == 0) {
++                   offsetof(CPUState, plugin_mem_cbs) -
-             tcg_opt_gen_movi(&ctx, op, op->args[0], 0);
++                   offsetof(ArchCPU, env));
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
+ }
-             break;
-         }
+ static void gen_disable_mem_helper(void)
+@@ -XXX,XX +XXX,XX @@ static void plugin_gen_inject(struct qemu_plugin_tb *plugin_tb)
--        /* Some of the folding above can change opc. */
+                 }
--        opc = op->opc;
+                 break;
--        def = &tcg_op_defs[opc];
--        if (def->flags & TCG_OPF_BB_END) {
++            case PLUGIN_GEN_FROM_INSN:
--            memset(&ctx.temps_used, 0, sizeof(ctx.temps_used));
++                assert(insn != NULL);
--        } else {
++
--            int nb_oargs = def->nb_oargs;
++                gen_enable_mem_helper(plugin_tb, insn);
--            for (i = 0; i < nb_oargs; i++) {
++
--                reset_temp(op->args[i]);
++                cbs = insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_REGULAR];
--                /* Save the corresponding known-zero bits mask for the
++                for (i = 0, n = (cbs ? cbs->len : 0); i < n; i++) {
--                   first output argument (only one supported so far). */
++                    struct qemu_plugin_dyn_cb *cb =
--                if (i == 0) {
++                        &g_array_index(cbs, struct qemu_plugin_dyn_cb, i);
--                    arg_info(op->args[i])->z_mask = z_mask;
++                    gen_udata_cb(cb);
 +                }
 +
 +                cbs = insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_INLINE];
 +                for (i = 0, n = (cbs ? cbs->len : 0); i < n; i++) {
 +                    struct qemu_plugin_dyn_cb *cb =
 +                        &g_array_index(cbs, struct qemu_plugin_dyn_cb, i);
 +                    gen_inline_cb(cb);
 +                }
 +                break;
 +
              default:
                  g_assert_not_reached();
              }
@@ -XXX,XX +XXX,XX @@ static void plugin_gen_inject(struct qemu_plugin_tb *plugin_tb)
              enum plugin_gen_cb type = op->args[1];
              switch (from) {
 -            case PLUGIN_GEN_FROM_INSN:
 -            {
 -                g_assert(insn_idx >= 0);
 -
 -                switch (type) {
 -                case PLUGIN_GEN_CB_UDATA:
 -                    plugin_gen_insn_udata(plugin_tb, op, insn_idx);
 -                    break;
 -                case PLUGIN_GEN_CB_UDATA_R:
 -                    plugin_gen_insn_udata_r(plugin_tb, op, insn_idx);
 -                    break;
 -                case PLUGIN_GEN_CB_INLINE:
 -                    plugin_gen_insn_inline(plugin_tb, op, insn_idx);
 -                    break;
 -                case PLUGIN_GEN_ENABLE_MEM_HELPER:
 -                    plugin_gen_enable_mem_helper(plugin_tb, op, insn_idx);
 -                    break;
 -                default:
 -                    g_assert_not_reached();
 -                }
+-                break;
 -            }
--        }
+             case PLUGIN_GEN_FROM_MEM:
-+        finish_folding(&ctx, op);
+             {
+                 g_assert(insn_idx >= 0);
-         /* Eliminate duplicate and redundant fence instructions.  */
+diff --git a/plugins/api.c b/plugins/api.c
-         if (ctx.prev_mb) {
+index XXXXXXX..XXXXXXX 100644
 --- a/plugins/api.c
 +++ b/plugins/api.c
@@ -XXX,XX +XXX,XX @@ void qemu_plugin_register_vcpu_insn_exec_cb(struct qemu_plugin_insn *insn,
                                              void *udata)
  {
      if (!insn->mem_only) {
 -        int index = flags == QEMU_PLUGIN_CB_R_REGS ||
 -                    flags == QEMU_PLUGIN_CB_RW_REGS ?
 -                    PLUGIN_CB_REGULAR_R : PLUGIN_CB_REGULAR;
 -
 -        plugin_register_dyn_cb__udata(&insn->cbs[PLUGIN_CB_INSN][index],
 -                                      cb, flags, udata);
 +        plugin_register_dyn_cb__udata(
 +            &insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_REGULAR], cb, flags, udata);
      }
  }
 --
-2.25.1
+2.34.1

-[PULL 40/56] tcg/optimize: Split out fold_sub_to_neg
+[PULL 10/20] plugins: Use emit_before_op for PLUGIN_GEN_FROM_MEM
-Even though there is only one user, place this more complex
+Introduce a new plugin_mem_cb op to hold the address temp
-conversion into its own helper.
+and meminfo computed by tcg-op-ldst.c.  Because this now
 has its own opcode, we no longer need PLUGIN_GEN_FROM_MEM.
-Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
+Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- tcg/optimize.c | 89 ++++++++++++++++++++++++++------------------------
+ include/exec/plugin-gen.h   |   4 -
- 1 file changed, 47 insertions(+), 42 deletions(-)
+ include/tcg/tcg-op-common.h |   1 +
  include/tcg/tcg-opc.h       |   1 +
  accel/tcg/plugin-gen.c      | 408 ++++--------------------------------
  tcg/tcg-op-ldst.c           |   6 +-
  tcg/tcg-op.c                |   5 +
  6 files changed, 54 insertions(+), 371 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
+diff --git a/include/exec/plugin-gen.h b/include/exec/plugin-gen.h
 index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
+--- a/include/exec/plugin-gen.h
-+++ b/tcg/optimize.c
++++ b/include/exec/plugin-gen.h
-@@ -XXX,XX +XXX,XX @@ static bool fold_nand(OptContext *ctx, TCGOp *op)
+@@ -XXX,XX +XXX,XX @@ void plugin_gen_insn_start(CPUState *cpu, const struct DisasContextBase *db);
+ void plugin_gen_insn_end(void);
- static bool fold_neg(OptContext *ctx, TCGOp *op)
  void plugin_gen_disable_mem_helpers(void);
 -void plugin_gen_empty_mem_callback(TCGv_i64 addr, uint32_t info);
  #else /* !CONFIG_PLUGIN */
@@ -XXX,XX +XXX,XX @@ static inline void plugin_gen_tb_end(CPUState *cpu, size_t num_insns)
  static inline void plugin_gen_disable_mem_helpers(void)
  { }
 -static inline void plugin_gen_empty_mem_callback(TCGv_i64 addr, uint32_t info)
 -{ }
 -
  #endif /* CONFIG_PLUGIN */
  #endif /* QEMU_PLUGIN_GEN_H */
 diff --git a/include/tcg/tcg-op-common.h b/include/tcg/tcg-op-common.h
 index XXXXXXX..XXXXXXX 100644
 --- a/include/tcg/tcg-op-common.h
 +++ b/include/tcg/tcg-op-common.h
@@ -XXX,XX +XXX,XX @@ void tcg_gen_goto_tb(unsigned idx);
  void tcg_gen_lookup_and_goto_ptr(void);
  void tcg_gen_plugin_cb(unsigned from);
 +void tcg_gen_plugin_mem_cb(TCGv_i64 addr, unsigned meminfo);
  void tcg_gen_plugin_cb_start(unsigned from, unsigned type, unsigned wr);
  void tcg_gen_plugin_cb_end(void);
 diff --git a/include/tcg/tcg-opc.h b/include/tcg/tcg-opc.h
 index XXXXXXX..XXXXXXX 100644
 --- a/include/tcg/tcg-opc.h
 +++ b/include/tcg/tcg-opc.h
@@ -XXX,XX +XXX,XX @@ DEF(goto_tb, 0, 0, 1, TCG_OPF_BB_EXIT | TCG_OPF_BB_END)
  DEF(goto_ptr, 0, 1, 0, TCG_OPF_BB_EXIT | TCG_OPF_BB_END)
  DEF(plugin_cb, 0, 0, 1, TCG_OPF_NOT_PRESENT)
 +DEF(plugin_mem_cb, 0, 1, 1, TCG_OPF_NOT_PRESENT)
  DEF(plugin_cb_start, 0, 0, 3, TCG_OPF_NOT_PRESENT)
  DEF(plugin_cb_end, 0, 0, 0, TCG_OPF_NOT_PRESENT)
 diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
 index XXXXXXX..XXXXXXX 100644
 --- a/accel/tcg/plugin-gen.c
 +++ b/accel/tcg/plugin-gen.c
@@ -XXX,XX +XXX,XX @@
  enum plugin_gen_from {
      PLUGIN_GEN_FROM_TB,
      PLUGIN_GEN_FROM_INSN,
 -    PLUGIN_GEN_FROM_MEM,
      PLUGIN_GEN_AFTER_INSN,
      PLUGIN_GEN_AFTER_TB,
      PLUGIN_GEN_N_FROMS,
@@ -XXX,XX +XXX,XX @@ void HELPER(plugin_vcpu_mem_cb)(unsigned int vcpu_index,
                                  void *userdata)
  { }
 -/*
 - * For now we only support addi_i64.
 - * When we support more ops, we can generate one empty inline cb for each.
 - */
 -static void gen_empty_inline_cb(void)
 -{
 -    TCGv_i32 cpu_index = tcg_temp_ebb_new_i32();
 -    TCGv_ptr cpu_index_as_ptr = tcg_temp_ebb_new_ptr();
 -    TCGv_i64 val = tcg_temp_ebb_new_i64();
 -    TCGv_ptr ptr = tcg_temp_ebb_new_ptr();
 -
 -    tcg_gen_ld_i32(cpu_index, tcg_env,
 -                   -offsetof(ArchCPU, env) + offsetof(CPUState, cpu_index));
 -    /* second operand will be replaced by immediate value */
 -    tcg_gen_mul_i32(cpu_index, cpu_index, cpu_index);
 -    tcg_gen_ext_i32_ptr(cpu_index_as_ptr, cpu_index);
 -
 -    tcg_gen_movi_ptr(ptr, 0);
 -    tcg_gen_add_ptr(ptr, ptr, cpu_index_as_ptr);
 -    tcg_gen_ld_i64(val, ptr, 0);
 -    /* second operand will be replaced by immediate value */
 -    tcg_gen_add_i64(val, val, val);
 -
 -    tcg_gen_st_i64(val, ptr, 0);
 -    tcg_temp_free_ptr(ptr);
 -    tcg_temp_free_i64(val);
 -    tcg_temp_free_ptr(cpu_index_as_ptr);
 -    tcg_temp_free_i32(cpu_index);
 -}
 -
 -static void gen_empty_mem_cb(TCGv_i64 addr, uint32_t info)
 -{
 -    TCGv_i32 cpu_index = tcg_temp_ebb_new_i32();
 -    TCGv_i32 meminfo = tcg_temp_ebb_new_i32();
 -    TCGv_ptr udata = tcg_temp_ebb_new_ptr();
 -
 -    tcg_gen_movi_i32(meminfo, info);
 -    tcg_gen_movi_ptr(udata, 0);
 -    tcg_gen_ld_i32(cpu_index, tcg_env,
 -                   -offsetof(ArchCPU, env) + offsetof(CPUState, cpu_index));
 -
 -    gen_helper_plugin_vcpu_mem_cb(cpu_index, meminfo, addr, udata);
 -
 -    tcg_temp_free_ptr(udata);
 -    tcg_temp_free_i32(meminfo);
 -    tcg_temp_free_i32(cpu_index);
 -}
 -
 -static void gen_plugin_cb_start(enum plugin_gen_from from,
 -                                enum plugin_gen_cb type, unsigned wr)
 -{
 -    tcg_gen_plugin_cb_start(from, type, wr);
 -}
 -
  static void plugin_gen_empty_callback(enum plugin_gen_from from)
  {
--    return fold_const1(ctx, op);
+     switch (from) {
-+    if (fold_const1(ctx, op)) {
+@@ -XXX,XX +XXX,XX @@ static void plugin_gen_empty_callback(enum plugin_gen_from from)
-+        return true;
+     }
 +    }
 +    /*
 +     * Because of fold_sub_to_neg, we want to always return true,
 +     * via finish_folding.
 +     */
 +    finish_folding(ctx, op);
 +    return true;
  }
- static bool fold_nor(OptContext *ctx, TCGOp *op)
+-void plugin_gen_empty_mem_callback(TCGv_i64 addr, uint32_t info)
-@@ -XXX,XX +XXX,XX @@ static bool fold_shift(OptContext *ctx, TCGOp *op)
+-{
-     return fold_const2(ctx, op);
+-    enum qemu_plugin_mem_rw rw = get_plugin_meminfo_rw(info);
 -
 -    gen_plugin_cb_start(PLUGIN_GEN_FROM_MEM, PLUGIN_GEN_CB_MEM, rw);
 -    gen_empty_mem_cb(addr, info);
 -    tcg_gen_plugin_cb_end();
 -
 -    gen_plugin_cb_start(PLUGIN_GEN_FROM_MEM, PLUGIN_GEN_CB_INLINE, rw);
 -    gen_empty_inline_cb();
 -    tcg_gen_plugin_cb_end();
 -}
 -
 -static TCGOp *find_op(TCGOp *op, TCGOpcode opc)
 -{
 -    while (op) {
 -        if (op->opc == opc) {
 -            return op;
 -        }
 -        op = QTAILQ_NEXT(op, link);
 -    }
 -    return NULL;
 -}
 -
 -static TCGOp *rm_ops_range(TCGOp *begin, TCGOp *end)
 -{
 -    TCGOp *ret = QTAILQ_NEXT(end, link);
 -
 -    QTAILQ_REMOVE_SEVERAL(&tcg_ctx->ops, begin, end, link);
 -    return ret;
 -}
 -
 -/* remove all ops until (and including) plugin_cb_end */
 -static TCGOp *rm_ops(TCGOp *op)
 -{
 -    TCGOp *end_op = find_op(op, INDEX_op_plugin_cb_end);
 -
 -    tcg_debug_assert(end_op);
 -    return rm_ops_range(op, end_op);
 -}
 -
 -static TCGOp *copy_op_nocheck(TCGOp **begin_op, TCGOp *op)
 -{
 -    TCGOp *old_op = QTAILQ_NEXT(*begin_op, link);
 -    unsigned nargs = old_op->nargs;
 -
 -    *begin_op = old_op;
 -    op = tcg_op_insert_after(tcg_ctx, op, old_op->opc, nargs);
 -    memcpy(op->args, old_op->args, sizeof(op->args[0]) * nargs);
 -
 -    return op;
 -}
 -
 -static TCGOp *copy_op(TCGOp **begin_op, TCGOp *op, TCGOpcode opc)
 -{
 -    op = copy_op_nocheck(begin_op, op);
 -    tcg_debug_assert((*begin_op)->opc == opc);
 -    return op;
 -}
 -
 -static TCGOp *copy_const_ptr(TCGOp **begin_op, TCGOp *op, void *ptr)
 -{
 -    if (UINTPTR_MAX == UINT32_MAX) {
 -        /* mov_i32 */
 -        op = copy_op(begin_op, op, INDEX_op_mov_i32);
 -        op->args[1] = tcgv_i32_arg(tcg_constant_i32((uintptr_t)ptr));
 -    } else {
 -        /* mov_i64 */
 -        op = copy_op(begin_op, op, INDEX_op_mov_i64);
 -        op->args[1] = tcgv_i64_arg(tcg_constant_i64((uintptr_t)ptr));
 -    }
 -    return op;
 -}
 -
 -static TCGOp *copy_ld_i32(TCGOp **begin_op, TCGOp *op)
 -{
 -    return copy_op(begin_op, op, INDEX_op_ld_i32);
 -}
 -
 -static TCGOp *copy_ext_i32_ptr(TCGOp **begin_op, TCGOp *op)
 -{
 -    if (UINTPTR_MAX == UINT32_MAX) {
 -        op = copy_op(begin_op, op, INDEX_op_mov_i32);
 -    } else {
 -        op = copy_op(begin_op, op, INDEX_op_ext_i32_i64);
 -    }
 -    return op;
 -}
 -
 -static TCGOp *copy_add_ptr(TCGOp **begin_op, TCGOp *op)
 -{
 -    if (UINTPTR_MAX == UINT32_MAX) {
 -        op = copy_op(begin_op, op, INDEX_op_add_i32);
 -    } else {
 -        op = copy_op(begin_op, op, INDEX_op_add_i64);
 -    }
 -    return op;
 -}
 -
 -static TCGOp *copy_ld_i64(TCGOp **begin_op, TCGOp *op)
 -{
 -    if (TCG_TARGET_REG_BITS == 32) {
 -        /* 2x ld_i32 */
 -        op = copy_ld_i32(begin_op, op);
 -        op = copy_ld_i32(begin_op, op);
 -    } else {
 -        /* ld_i64 */
 -        op = copy_op(begin_op, op, INDEX_op_ld_i64);
 -    }
 -    return op;
 -}
 -
 -static TCGOp *copy_st_i64(TCGOp **begin_op, TCGOp *op)
 -{
 -    if (TCG_TARGET_REG_BITS == 32) {
 -        /* 2x st_i32 */
 -        op = copy_op(begin_op, op, INDEX_op_st_i32);
 -        op = copy_op(begin_op, op, INDEX_op_st_i32);
 -    } else {
 -        /* st_i64 */
 -        op = copy_op(begin_op, op, INDEX_op_st_i64);
 -    }
 -    return op;
 -}
 -
 -static TCGOp *copy_add_i64(TCGOp **begin_op, TCGOp *op, uint64_t v)
 -{
 -    if (TCG_TARGET_REG_BITS == 32) {
 -        /* all 32-bit backends must implement add2_i32 */
 -        g_assert(TCG_TARGET_HAS_add2_i32);
 -        op = copy_op(begin_op, op, INDEX_op_add2_i32);
 -        op->args[4] = tcgv_i32_arg(tcg_constant_i32(v));
 -        op->args[5] = tcgv_i32_arg(tcg_constant_i32(v >> 32));
 -    } else {
 -        op = copy_op(begin_op, op, INDEX_op_add_i64);
 -        op->args[2] = tcgv_i64_arg(tcg_constant_i64(v));
 -    }
 -    return op;
 -}
 -
 -static TCGOp *copy_mul_i32(TCGOp **begin_op, TCGOp *op, uint32_t v)
 -{
 -    op = copy_op(begin_op, op, INDEX_op_mul_i32);
 -    op->args[2] = tcgv_i32_arg(tcg_constant_i32(v));
 -    return op;
 -}
 -
 -static TCGOp *copy_call(TCGOp **begin_op, TCGOp *op, void *func, int *cb_idx)
 -{
 -    TCGOp *old_op;
 -    int func_idx;
 -
 -    /* copy all ops until the call */
 -    do {
 -        op = copy_op_nocheck(begin_op, op);
 -    } while (op->opc != INDEX_op_call);
 -
 -    /* fill in the op call */
 -    old_op = *begin_op;
 -    TCGOP_CALLI(op) = TCGOP_CALLI(old_op);
 -    TCGOP_CALLO(op) = TCGOP_CALLO(old_op);
 -    tcg_debug_assert(op->life == 0);
 -
 -    func_idx = TCGOP_CALLO(op) + TCGOP_CALLI(op);
 -    *cb_idx = func_idx;
 -    op->args[func_idx] = (uintptr_t)func;
 -
 -    return op;
 -}
 -
 -static TCGOp *append_inline_cb(const struct qemu_plugin_dyn_cb *cb,
 -                               TCGOp *begin_op, TCGOp *op,
 -                               int *unused)
 -{
 -    char *ptr = cb->inline_insn.entry.score->data->data;
 -    size_t elem_size = g_array_get_element_size(
 -        cb->inline_insn.entry.score->data);
 -    size_t offset = cb->inline_insn.entry.offset;
 -
 -    op = copy_ld_i32(&begin_op, op);
 -    op = copy_mul_i32(&begin_op, op, elem_size);
 -    op = copy_ext_i32_ptr(&begin_op, op);
 -    op = copy_const_ptr(&begin_op, op, ptr + offset);
 -    op = copy_add_ptr(&begin_op, op);
 -    op = copy_ld_i64(&begin_op, op);
 -    op = copy_add_i64(&begin_op, op, cb->inline_insn.imm);
 -    op = copy_st_i64(&begin_op, op);
 -    return op;
 -}
 -
 -static TCGOp *append_mem_cb(const struct qemu_plugin_dyn_cb *cb,
 -                            TCGOp *begin_op, TCGOp *op, int *cb_idx)
 -{
 -    enum plugin_gen_cb type = begin_op->args[1];
 -
 -    tcg_debug_assert(type == PLUGIN_GEN_CB_MEM);
 -
 -    /* const_i32 == mov_i32 ("info", so it remains as is) */
 -    op = copy_op(&begin_op, op, INDEX_op_mov_i32);
 -
 -    /* const_ptr */
 -    op = copy_const_ptr(&begin_op, op, cb->userp);
 -
 -    /* copy the ld_i32, but note that we only have to copy it once */
 -    if (*cb_idx == -1) {
 -        op = copy_op(&begin_op, op, INDEX_op_ld_i32);
 -    } else {
 -        begin_op = QTAILQ_NEXT(begin_op, link);
 -        tcg_debug_assert(begin_op && begin_op->opc == INDEX_op_ld_i32);
 -    }
 -
 -    if (type == PLUGIN_GEN_CB_MEM) {
 -        /* call */
 -        op = copy_call(&begin_op, op, cb->regular.f.vcpu_udata, cb_idx);
 -    }
 -
 -    return op;
 -}
 -
 -typedef TCGOp *(*inject_fn)(const struct qemu_plugin_dyn_cb *cb,
 -                            TCGOp *begin_op, TCGOp *op, int *intp);
 -typedef bool (*op_ok_fn)(const TCGOp *op, const struct qemu_plugin_dyn_cb *cb);
 -
 -static bool op_rw(const TCGOp *op, const struct qemu_plugin_dyn_cb *cb)
 -{
 -    int w;
 -
 -    w = op->args[2];
 -    return !!(cb->rw & (w + 1));
 -}
 -
 -static void inject_cb_type(const GArray *cbs, TCGOp *begin_op,
 -                           inject_fn inject, op_ok_fn ok)
 -{
 -    TCGOp *end_op;
 -    TCGOp *op;
 -    int cb_idx = -1;
 -    int i;
 -
 -    if (!cbs || cbs->len == 0) {
 -        rm_ops(begin_op);
 -        return;
 -    }
 -
 -    end_op = find_op(begin_op, INDEX_op_plugin_cb_end);
 -    tcg_debug_assert(end_op);
 -
 -    op = end_op;
 -    for (i = 0; i < cbs->len; i++) {
 -        struct qemu_plugin_dyn_cb *cb =
 -            &g_array_index(cbs, struct qemu_plugin_dyn_cb, i);
 -
 -        if (!ok(begin_op, cb)) {
 -            continue;
 -        }
 -        op = inject(cb, begin_op, op, &cb_idx);
 -    }
 -    rm_ops_range(begin_op, end_op);
 -}
 -
 -static void
 -inject_inline_cb(const GArray *cbs, TCGOp *begin_op, op_ok_fn ok)
 -{
 -    inject_cb_type(cbs, begin_op, append_inline_cb, ok);
 -}
 -
 -static void
 -inject_mem_cb(const GArray *cbs, TCGOp *begin_op)
 -{
 -    inject_cb_type(cbs, begin_op, append_mem_cb, op_rw);
 -}
 -
  /* called before finishing a TB with exit_tb, goto_tb or goto_ptr */
  void plugin_gen_disable_mem_helpers(void)
  {
@@ -XXX,XX +XXX,XX @@ void plugin_gen_disable_mem_helpers(void)
      }
  }
-+static bool fold_sub_to_neg(OptContext *ctx, TCGOp *op)
+-static void plugin_gen_mem_regular(const struct qemu_plugin_tb *ptb,
 -                                   TCGOp *begin_op, int insn_idx)
 -{
 -    struct qemu_plugin_insn *insn = g_ptr_array_index(ptb->insns, insn_idx);
 -    inject_mem_cb(insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR], begin_op);
 -}
 -
 -static void plugin_gen_mem_inline(const struct qemu_plugin_tb *ptb,
 -                                  TCGOp *begin_op, int insn_idx)
 -{
 -    const GArray *cbs;
 -    struct qemu_plugin_insn *insn = g_ptr_array_index(ptb->insns, insn_idx);
 -
 -    cbs = insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_INLINE];
 -    inject_inline_cb(cbs, begin_op, op_rw);
 -}
 -
  static void gen_enable_mem_helper(struct qemu_plugin_tb *ptb,
                                    struct qemu_plugin_insn *insn)
  {
@@ -XXX,XX +XXX,XX @@ static void gen_inline_cb(struct qemu_plugin_dyn_cb *cb)
      tcg_temp_free_ptr(ptr);
  }
 +static void gen_mem_cb(struct qemu_plugin_dyn_cb *cb,
 +                       qemu_plugin_meminfo_t meminfo, TCGv_i64 addr)
 +{
-+    TCGOpcode neg_op;
++    TCGv_i32 cpu_index = tcg_temp_ebb_new_i32();
 +    bool have_neg;
 +
-+    if (!arg_is_const(op->args[1]) || arg_info(op->args[1])->val != 0) {
++    tcg_gen_ld_i32(cpu_index, tcg_env,
-+        return false;
++                   -offsetof(ArchCPU, env) + offsetof(CPUState, cpu_index));
-+    }
++    tcg_gen_call4(cb->regular.f.vcpu_mem, cb->regular.info, NULL,
-+
++                  tcgv_i32_temp(cpu_index),
-+    switch (ctx->type) {
++                  tcgv_i32_temp(tcg_constant_i32(meminfo)),
-+    case TCG_TYPE_I32:
++                  tcgv_i64_temp(addr),
-+        neg_op = INDEX_op_neg_i32;
++                  tcgv_ptr_temp(tcg_constant_ptr(cb->userp)));
-+        have_neg = TCG_TARGET_HAS_neg_i32;
++    tcg_temp_free_i32(cpu_index);
 +        break;
 +    case TCG_TYPE_I64:
 +        neg_op = INDEX_op_neg_i64;
 +        have_neg = TCG_TARGET_HAS_neg_i64;
 +        break;
 +    case TCG_TYPE_V64:
 +    case TCG_TYPE_V128:
 +    case TCG_TYPE_V256:
 +        neg_op = INDEX_op_neg_vec;
 +        have_neg = (TCG_TARGET_HAS_neg_vec &&
 +                    tcg_can_emit_vec_op(neg_op, ctx->type, TCGOP_VECE(op)) > 0);
 +        break;
 +    default:
 +        g_assert_not_reached();
 +    }
 +    if (have_neg) {
 +        op->opc = neg_op;
 +        op->args[1] = op->args[2];
 +        return fold_neg(ctx, op);
 +    }
 +    return false;
 +}
 +
- static bool fold_sub(OptContext *ctx, TCGOp *op)
+ /* #define DEBUG_PLUGIN_GEN_OPS */
  static void pr_ops(void)
  {
-     if (fold_const2(ctx, op) ||
+@@ -XXX,XX +XXX,XX @@ static void plugin_gen_inject(struct qemu_plugin_tb *plugin_tb)
 -        fold_xx_to_i(ctx, op, 0)) {
 +        fold_xx_to_i(ctx, op, 0) ||
 +        fold_sub_to_neg(ctx, op)) {
          return true;
      }
      return false;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                  continue;
              }
              break;
--        CASE_OP_32_64_VEC(sub):
+         }
 -        case INDEX_op_plugin_cb_start:
 +        case INDEX_op_plugin_mem_cb:
          {
 -            enum plugin_gen_from from = op->args[0];
 -            enum plugin_gen_cb type = op->args[1];
 +            TCGv_i64 addr = temp_tcgv_i64(arg_temp(op->args[0]));
 +            qemu_plugin_meminfo_t meminfo = op->args[1];
 +            struct qemu_plugin_insn *insn;
 +            const GArray *cbs;
 +            int i, n, rw;
 -            switch (from) {
 -            case PLUGIN_GEN_FROM_MEM:
 -            {
--                TCGOpcode neg_op;
+-                g_assert(insn_idx >= 0);
--                bool have_neg;
++            assert(insn_idx >= 0);
--
++            insn = g_ptr_array_index(plugin_tb->insns, insn_idx);
--                if (arg_is_const(op->args[2])) {
++            rw = qemu_plugin_mem_is_store(meminfo) ? 2 : 1;
--                    /* Proceed with possible constant folding. */
 -                switch (type) {
 -                case PLUGIN_GEN_CB_MEM:
 -                    plugin_gen_mem_regular(plugin_tb, op, insn_idx);
 -                    break;
--                }
+-                case PLUGIN_GEN_CB_INLINE:
--                switch (ctx.type) {
+-                    plugin_gen_mem_inline(plugin_tb, op, insn_idx);
 -                case TCG_TYPE_I32:
 -                    neg_op = INDEX_op_neg_i32;
 -                    have_neg = TCG_TARGET_HAS_neg_i32;
 -                    break;
 -                case TCG_TYPE_I64:
 -                    neg_op = INDEX_op_neg_i64;
 -                    have_neg = TCG_TARGET_HAS_neg_i64;
 -                    break;
 -                case TCG_TYPE_V64:
 -                case TCG_TYPE_V128:
 -                case TCG_TYPE_V256:
 -                    neg_op = INDEX_op_neg_vec;
 -                    have_neg = tcg_can_emit_vec_op(neg_op, ctx.type,
 -                                                   TCGOP_VECE(op)) > 0;
 -                    break;
 -                default:
 -                    g_assert_not_reached();
--                }
++            tcg_ctx->emit_before_op = op;
--                if (!have_neg) {
++
--                    break;
++            cbs = insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR];
--                }
++            for (i = 0, n = (cbs ? cbs->len : 0); i < n; i++) {
--                if (arg_is_const(op->args[1])
++                struct qemu_plugin_dyn_cb *cb =
--                    && arg_info(op->args[1])->val == 0) {
++                    &g_array_index(cbs, struct qemu_plugin_dyn_cb, i);
--                    op->opc = neg_op;
++                if (cb->rw & rw) {
--                    reset_temp(op->args[0]);
++                    gen_mem_cb(cb, meminfo, addr);
--                    op->args[1] = op->args[2];
+                 }
--                    continue;
++            }
--                }
 -                break;
 -            }
--            break;
+-            default:
-         default:
+-                g_assert_not_reached();
 +            cbs = insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_INLINE];
 +            for (i = 0, n = (cbs ? cbs->len : 0); i < n; i++) {
 +                struct qemu_plugin_dyn_cb *cb =
 +                    &g_array_index(cbs, struct qemu_plugin_dyn_cb, i);
 +                if (cb->rw & rw) {
 +                    gen_inline_cb(cb);
 +                }
              }
 +
 +            tcg_ctx->emit_before_op = NULL;
 +            tcg_op_remove(tcg_ctx, op);
              break;
          }
++
+         default:
+             /* plugins don't care about any other ops */
+             break;
+diff --git a/tcg/tcg-op-ldst.c b/tcg/tcg-op-ldst.c
+index XXXXXXX..XXXXXXX 100644
+--- a/tcg/tcg-op-ldst.c
++++ b/tcg/tcg-op-ldst.c
+@@ -XXX,XX +XXX,XX @@ plugin_gen_mem_callbacks(TCGv_i64 copy_addr, TCGTemp *orig_addr, MemOpIdx oi,
+                 copy_addr = tcg_temp_ebb_new_i64();
+                 tcg_gen_extu_i32_i64(copy_addr, temp_tcgv_i32(orig_addr));
+             }
+-            plugin_gen_empty_mem_callback(copy_addr, info);
++            tcg_gen_plugin_mem_cb(copy_addr, info);
+             tcg_temp_free_i64(copy_addr);
+         } else {
+             if (copy_addr) {
+-                plugin_gen_empty_mem_callback(copy_addr, info);
++                tcg_gen_plugin_mem_cb(copy_addr, info);
+                 tcg_temp_free_i64(copy_addr);
+             } else {
+-                plugin_gen_empty_mem_callback(temp_tcgv_i64(orig_addr), info);
++                tcg_gen_plugin_mem_cb(temp_tcgv_i64(orig_addr), info);
+             }
+         }
+     }
+diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
+index XXXXXXX..XXXXXXX 100644
+--- a/tcg/tcg-op.c
++++ b/tcg/tcg-op.c
+@@ -XXX,XX +XXX,XX @@ void tcg_gen_plugin_cb(unsigned from)
+     tcg_gen_op1(INDEX_op_plugin_cb, from);
+ }
++void tcg_gen_plugin_mem_cb(TCGv_i64 addr, unsigned meminfo)
++{
++    tcg_gen_op2(INDEX_op_plugin_mem_cb, tcgv_i64_arg(addr), meminfo);
++}
++
+ void tcg_gen_plugin_cb_start(unsigned from, unsigned type, unsigned wr)
+ {
+     tcg_gen_op3(INDEX_op_plugin_cb_start, from, type, wr);
 --
-.25.1
+.34.1
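
Note on the plugin_mem_cb case above: the new code computes rw as 2 for a store and 1 for a load, then runs only the callbacks whose rw mask overlaps it. The following is a minimal sketch of that filter, not the QEMU code itself. The names sketch_mem_rw, sketch_cb and sketch_count_matching are hypothetical stand-ins, and the enum values (read = 1, write = 2, both = 3) are an assumption chosen to match the "? 2 : 1" expression in the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the plugin rw mask; values are an assumption. */
enum sketch_mem_rw {
    SKETCH_MEM_R = 1,
    SKETCH_MEM_W = 2,
    SKETCH_MEM_RW = 3,
};

/* Hypothetical stand-in for struct qemu_plugin_dyn_cb, reduced to one field. */
struct sketch_cb {
    enum sketch_mem_rw rw;
};

/*
 * Count the callbacks that would run for one memory access.
 * A store selects bit 2, a load selects bit 1, as in the patch.
 */
static size_t sketch_count_matching(const struct sketch_cb *cbs, size_t n,
                                    bool is_store)
{
    int rw = is_store ? 2 : 1;
    size_t hits = 0;

    for (size_t i = 0; i < n; i++) {
        if (cbs[i].rw & rw) {
            hits++;
        }
    }
    return hits;
}
```

A callback registered for both reads and writes matches either kind of access, while a read-only callback is skipped for stores.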

-[PULL 54/56] tcg/optimize: Propagate sign info for setcond
+[PULL 11/20] plugins: Remove plugin helpers
-The result is either 0 or 1, which means that we have
+These placeholder helpers are no longer required.
 a 2 bit signed result, and thus 62 bits of sign.
 For clarity, use the smask_from_zmask function.
 Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
-Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- tcg/optimize.c | 2 ++
+ accel/tcg/plugin-helpers.h         |  5 -----
-file changed, 2 insertions(+)
+ include/exec/helper-gen-common.h   |  4 ----
  include/exec/helper-proto-common.h |  4 ----
  accel/tcg/plugin-gen.c             | 20 --------------------
 files changed, 33 deletions(-)
  delete mode 100644 accel/tcg/plugin-helpers.h
-diff --git a/tcg/optimize.c b/tcg/optimize.c
+diff --git a/accel/tcg/plugin-helpers.h b/accel/tcg/plugin-helpers.h
 deleted file mode 100644
 index XXXXXXX..XXXXXXX
 --- a/accel/tcg/plugin-helpers.h
 +++ /dev/null
@@ -XXX,XX +XXX,XX @@
 -#ifdef CONFIG_PLUGIN
 -DEF_HELPER_FLAGS_2(plugin_vcpu_udata_cb_no_wg, TCG_CALL_NO_WG | TCG_CALL_PLUGIN, void, i32, ptr)
 -DEF_HELPER_FLAGS_2(plugin_vcpu_udata_cb_no_rwg, TCG_CALL_NO_RWG | TCG_CALL_PLUGIN, void, i32, ptr)
 -DEF_HELPER_FLAGS_4(plugin_vcpu_mem_cb, TCG_CALL_NO_RWG | TCG_CALL_PLUGIN, void, i32, i32, i64, ptr)
 -#endif
 diff --git a/include/exec/helper-gen-common.h b/include/exec/helper-gen-common.h
 index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
+--- a/include/exec/helper-gen-common.h
-+++ b/tcg/optimize.c
++++ b/include/exec/helper-gen-common.h
-@@ -XXX,XX +XXX,XX @@ static bool fold_setcond(OptContext *ctx, TCGOp *op)
+@@ -XXX,XX +XXX,XX @@
-     }
+ #include "exec/helper-gen.h.inc"
+ #undef  HELPER_H
-     ctx->z_mask = 1;
-+    ctx->s_mask = smask_from_zmask(1);
+-#define HELPER_H "accel/tcg/plugin-helpers.h"
-     return false;
+-#include "exec/helper-gen.h.inc"
- }
+-#undef  HELPER_H
+-
-@@ -XXX,XX +XXX,XX @@ static bool fold_setcond2(OptContext *ctx, TCGOp *op)
+ #endif /* HELPER_GEN_COMMON_H */
-     }
+diff --git a/include/exec/helper-proto-common.h b/include/exec/helper-proto-common.h
+index XXXXXXX..XXXXXXX 100644
-     ctx->z_mask = 1;
+--- a/include/exec/helper-proto-common.h
-+    ctx->s_mask = smask_from_zmask(1);
++++ b/include/exec/helper-proto-common.h
-     return false;
+@@ -XXX,XX +XXX,XX @@
+ #include "exec/helper-proto.h.inc"
-  do_setcond_const:
+ #undef  HELPER_H
 -#define HELPER_H "accel/tcg/plugin-helpers.h"
 -#include "exec/helper-proto.h.inc"
 -#undef  HELPER_H
 -
  #endif /* HELPER_PROTO_COMMON_H */
 diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
 index XXXXXXX..XXXXXXX 100644
 --- a/accel/tcg/plugin-gen.c
 +++ b/accel/tcg/plugin-gen.c
@@ -XXX,XX +XXX,XX @@
  #include "exec/exec-all.h"
  #include "exec/plugin-gen.h"
  #include "exec/translator.h"
 -#include "exec/helper-proto-common.h"
 -
 -#define HELPER_H  "accel/tcg/plugin-helpers.h"
 -#include "exec/helper-info.c.inc"
 -#undef  HELPER_H
  /*
   * plugin_cb_start TCG op args[]:
@@ -XXX,XX +XXX,XX @@ enum plugin_gen_cb {
      PLUGIN_GEN_N_CBS,
  };
 -/*
 - * These helpers are stubs that get dynamically switched out for calls
 - * direct to the plugin if they are subscribed to.
 - */
 -void HELPER(plugin_vcpu_udata_cb_no_wg)(uint32_t cpu_index, void *udata)
 -{ }
 -
 -void HELPER(plugin_vcpu_udata_cb_no_rwg)(uint32_t cpu_index, void *udata)
 -{ }
 -
 -void HELPER(plugin_vcpu_mem_cb)(unsigned int vcpu_index,
 -                                qemu_plugin_meminfo_t info, uint64_t vaddr,
 -                                void *userdata)
 -{ }
 -
  static void plugin_gen_empty_callback(enum plugin_gen_from from)
  {
      switch (from) {
 --
-.25.1
+.34.1
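
Note on the setcond commit message above: it states that a 0-or-1 result leaves 62 bits of sign. The following is a rough sketch of how a sign mask can be derived from a zero mask to arrive at that figure. It is an illustration written for this note, not a copy of the helper in tcg/optimize.c. The name sketch_smask_from_zmask is hypothetical, and __builtin_clzll is a GCC/Clang builtin assumed to be available.

```c
#include <stdint.h>

/*
 * Given a mask of bits that may be nonzero, return a mask of the high bits
 * that are known to repeat the sign bit (not counting the sign bit itself).
 */
static uint64_t sketch_smask_from_zmask(uint64_t zmask)
{
    /* Number of leading zeros; the builtin is undefined for 0, so guard it. */
    int rep = zmask ? __builtin_clzll(zmask) : 64;

    if (rep == 0) {
        /* The msb may be set, so there is no sign information. */
        return 0;
    }
    rep -= 1;                  /* one leading zero is the sign bit itself */
    return ~(~0ull >> rep);    /* set the top 'rep' bits */
}
```

For a zero mask of 1 there are 63 leading zeros, one of which is the sign bit, so the returned mask has its top 62 bits set.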

-[PULL 04/56] host-utils: add 128-bit quotient support to divu128/divs128
+[PULL 12/20] tcg: Remove TCG_CALL_PLUGIN
-From: Luis Pires <luis.pires@eldorado.org.br>
+Since we no longer emit plugin helpers during the initial code
 translation phase, we don't need to specially mark plugin helpers.
-These will be used to implement new decimal floating point
+Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
 instructions from Power ISA 3.1.
 The remainder is now returned directly by divu128/divs128,
 freeing up phigh to receive the high 64 bits of the quotient.
 Signed-off-by: Luis Pires <luis.pires@eldorado.org.br>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-Id: <20211025191154.350831-4-luis.pires@eldorado.org.br>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- include/hw/clock.h        |   6 +-
+ include/tcg/tcg.h |  2 --
- include/qemu/host-utils.h |  20 ++++--
+ plugins/core.c    | 10 ++++------
- target/ppc/int_helper.c   |   9 +--
+ tcg/tcg.c         |  4 +---
- util/host-utils.c         | 133 +++++++++++++++++++++++++-------------
+files changed, 5 insertions(+), 11 deletions(-)
 files changed, 108 insertions(+), 60 deletions(-)
-diff --git a/include/hw/clock.h b/include/hw/clock.h
+diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/clock.h
+--- a/include/tcg/tcg.h
-+++ b/include/hw/clock.h
++++ b/include/tcg/tcg.h
-@@ -XXX,XX +XXX,XX @@ static inline uint64_t clock_ns_to_ticks(const Clock *clk, uint64_t ns)
+@@ -XXX,XX +XXX,XX @@ typedef TCGv_ptr TCGv_env;
-     if (clk->period == 0) {
+ #define TCG_CALL_NO_SIDE_EFFECTS    0x0004
-         return 0;
+ /* Helper is G_NORETURN.  */
  #define TCG_CALL_NO_RETURN          0x0008
 -/* Helper is part of Plugins.  */
 -#define TCG_CALL_PLUGIN             0x0010
  /* convenience version of most used call flags */
  #define TCG_CALL_NO_RWG         TCG_CALL_NO_READ_GLOBALS
 diff --git a/plugins/core.c b/plugins/core.c
 index XXXXXXX..XXXXXXX 100644
 --- a/plugins/core.c
 +++ b/plugins/core.c
@@ -XXX,XX +XXX,XX @@ void plugin_register_dyn_cb__udata(GArray **arr,
                                     void *udata)
  {
      static TCGHelperInfo info[3] = {
 -        [QEMU_PLUGIN_CB_NO_REGS].flags = TCG_CALL_NO_RWG | TCG_CALL_PLUGIN,
 -        [QEMU_PLUGIN_CB_R_REGS].flags = TCG_CALL_NO_WG | TCG_CALL_PLUGIN,
 -        [QEMU_PLUGIN_CB_RW_REGS].flags = TCG_CALL_PLUGIN,
 +        [QEMU_PLUGIN_CB_NO_REGS].flags = TCG_CALL_NO_RWG,
 +        [QEMU_PLUGIN_CB_R_REGS].flags = TCG_CALL_NO_WG,
          /*
           * Match qemu_plugin_vcpu_udata_cb_t:
           *   void (*)(uint32_t, void *)
@@ -XXX,XX +XXX,XX @@ void plugin_register_vcpu_mem_cb(GArray **arr,
          !__builtin_types_compatible_p(qemu_plugin_meminfo_t, int32_t));
      static TCGHelperInfo info[3] = {
 -        [QEMU_PLUGIN_CB_NO_REGS].flags = TCG_CALL_NO_RWG | TCG_CALL_PLUGIN,
 -        [QEMU_PLUGIN_CB_R_REGS].flags = TCG_CALL_NO_WG | TCG_CALL_PLUGIN,
 -        [QEMU_PLUGIN_CB_RW_REGS].flags = TCG_CALL_PLUGIN,
 +        [QEMU_PLUGIN_CB_NO_REGS].flags = TCG_CALL_NO_RWG,
 +        [QEMU_PLUGIN_CB_R_REGS].flags = TCG_CALL_NO_WG,
          /*
           * Match qemu_plugin_vcpu_mem_cb_t:
           *   void (*)(uint32_t, qemu_plugin_meminfo_t, uint64_t, void *)
 diff --git a/tcg/tcg.c b/tcg/tcg.c
 index XXXXXXX..XXXXXXX 100644
 --- a/tcg/tcg.c
 +++ b/tcg/tcg.c
@@ -XXX,XX +XXX,XX @@ static void tcg_gen_callN(void *func, TCGHelperInfo *info,
  #ifdef CONFIG_PLUGIN
      /* Flag helpers that may affect guest state */
 -    if (tcg_ctx->plugin_insn &&
 -        !(info->flags & TCG_CALL_PLUGIN) &&
 -        !(info->flags & TCG_CALL_NO_SIDE_EFFECTS)) {
 +    if (tcg_ctx->plugin_insn && !(info->flags & TCG_CALL_NO_SIDE_EFFECTS)) {
          tcg_ctx->plugin_insn->calls_helpers = true;
      }
--    /*
--     * BUG: when CONFIG_INT128 is not defined, the current implementation of
--     * divu128 does not return a valid truncated quotient, so the result will
--     * be wrong.
--     */
-+
-     divu128(&lo, &hi, clk->period);
-     return lo;
- }
-diff --git a/include/qemu/host-utils.h b/include/qemu/host-utils.h
-index XXXXXXX..XXXXXXX 100644
---- a/include/qemu/host-utils.h
-+++ b/include/qemu/host-utils.h
-@@ -XXX,XX +XXX,XX @@ static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
-     return (__int128_t)a * b / c;
- }
--static inline void divu128(uint64_t *plow, uint64_t *phigh, uint64_t divisor)
-+static inline uint64_t divu128(uint64_t *plow, uint64_t *phigh,
-+                               uint64_t divisor)
- {
-     __uint128_t dividend = ((__uint128_t)*phigh << 64) | *plow;
-     __uint128_t result = dividend / divisor;
-+
-     *plow = result;
--    *phigh = dividend % divisor;
-+    *phigh = result >> 64;
-+    return dividend % divisor;
- }
--static inline void divs128(int64_t *plow, int64_t *phigh, int64_t divisor)
-+static inline int64_t divs128(uint64_t *plow, int64_t *phigh,
-+                              int64_t divisor)
- {
--    __int128_t dividend = ((__int128_t)*phigh << 64) | (uint64_t)*plow;
-+    __int128_t dividend = ((__int128_t)*phigh << 64) | *plow;
-     __int128_t result = dividend / divisor;
-+
-     *plow = result;
--    *phigh = dividend % divisor;
-+    *phigh = result >> 64;
-+    return dividend % divisor;
- }
- #else
- void muls64(uint64_t *plow, uint64_t *phigh, int64_t a, int64_t b);
- void mulu64(uint64_t *plow, uint64_t *phigh, uint64_t a, uint64_t b);
--void divu128(uint64_t *plow, uint64_t *phigh, uint64_t divisor);
--void divs128(int64_t *plow, int64_t *phigh, int64_t divisor);
-+uint64_t divu128(uint64_t *plow, uint64_t *phigh, uint64_t divisor);
-+int64_t divs128(uint64_t *plow, int64_t *phigh, int64_t divisor);
- static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
- {
-diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/ppc/int_helper.c
-+++ b/target/ppc/int_helper.c
-@@ -XXX,XX +XXX,XX @@ uint64_t helper_divdeu(CPUPPCState *env, uint64_t ra, uint64_t rb, uint32_t oe)
- uint64_t helper_divde(CPUPPCState *env, uint64_t rau, uint64_t rbu, uint32_t oe)
- {
--    int64_t rt = 0;
-+    uint64_t rt = 0;
-     int64_t ra = (int64_t)rau;
-     int64_t rb = (int64_t)rbu;
-     int overflow = 0;
-@@ -XXX,XX +XXX,XX @@ uint32_t helper_bcdcfsq(ppc_avr_t *r, ppc_avr_t *b, uint32_t ps)
-     int cr;
-     uint64_t lo_value;
-     uint64_t hi_value;
-+    uint64_t rem;
-     ppc_avr_t ret = { .u64 = { 0, 0 } };
-     if (b->VsrSD(0) < 0) {
-@@ -XXX,XX +XXX,XX @@ uint32_t helper_bcdcfsq(ppc_avr_t *r, ppc_avr_t *b, uint32_t ps)
-          * In that case, we leave r unchanged.
-          */
-     } else {
--        divu128(&lo_value, &hi_value, 1000000000000000ULL);
-+        rem = divu128(&lo_value, &hi_value, 1000000000000000ULL);
--        for (i = 1; i < 16; hi_value /= 10, i++) {
--            bcd_put_digit(&ret, hi_value % 10, i);
-+        for (i = 1; i < 16; rem /= 10, i++) {
-+            bcd_put_digit(&ret, rem % 10, i);
-         }
-         for (; i < 32; lo_value /= 10, i++) {
-diff --git a/util/host-utils.c b/util/host-utils.c
-index XXXXXXX..XXXXXXX 100644
---- a/util/host-utils.c
-+++ b/util/host-utils.c
-@@ -XXX,XX +XXX,XX @@ void muls64 (uint64_t *plow, uint64_t *phigh, int64_t a, int64_t b)
- }
- /*
-- * Unsigned 128-by-64 division. Returns quotient via plow and
-- * remainder via phigh.
-- * The result must fit in 64 bits (plow) - otherwise, the result
-- * is undefined.
-- * This function will cause a division by zero if passed a zero divisor.
-+ * Unsigned 128-by-64 division.
-+ * Returns the remainder.
-+ * Returns quotient via plow and phigh.
-+ * Also returns the remainder via the function return value.
-  */
--void divu128(uint64_t *plow, uint64_t *phigh, uint64_t divisor)
-+uint64_t divu128(uint64_t *plow, uint64_t *phigh, uint64_t divisor)
- {
-     uint64_t dhi = *phigh;
-     uint64_t dlo = *plow;
--    unsigned i;
--    uint64_t carry = 0;
-+    uint64_t rem, dhighest;
-+    int sh;
-     if (divisor == 0 || dhi == 0) {
-         *plow  = dlo / divisor;
--        *phigh = dlo % divisor;
-+        *phigh = 0;
-+        return dlo % divisor;
-     } else {
-+        sh = clz64(divisor);
--        for (i = 0; i < 64; i++) {
--            carry = dhi >> 63;
--            dhi = (dhi << 1) | (dlo >> 63);
--            if (carry || (dhi >= divisor)) {
--                dhi -= divisor;
--                carry = 1;
--            } else {
--                carry = 0;
-+        if (dhi < divisor) {
-+            if (sh != 0) {
-+                /* normalize the divisor, shifting the dividend accordingly */
-+                divisor <<= sh;
-+                dhi = (dhi << sh) | (dlo >> (64 - sh));
-+                dlo <<= sh;
-             }
--            dlo = (dlo << 1) | carry;
-+
-+            *phigh = 0;
-+            *plow = udiv_qrnnd(&rem, dhi, dlo, divisor);
-+        } else {
-+            if (sh != 0) {
-+                /* normalize the divisor, shifting the dividend accordingly */
-+                divisor <<= sh;
-+                dhighest = dhi >> (64 - sh);
-+                dhi = (dhi << sh) | (dlo >> (64 - sh));
-+                dlo <<= sh;
-+
-+                *phigh = udiv_qrnnd(&dhi, dhighest, dhi, divisor);
-+            } else {
-+                /**
-+                 * dhi >= divisor
-+                 * Since the MSB of divisor is set (sh == 0),
-+                 * (dhi - divisor) < divisor
-+                 *
-+                 * Thus, the high part of the quotient is 1, and we can
-+                 * calculate the low part with a single call to udiv_qrnnd
-+                 * after subtracting divisor from dhi
-+                 */
-+                dhi -= divisor;
-+                *phigh = 1;
-+            }
-+
-+            *plow = udiv_qrnnd(&rem, dhi, dlo, divisor);
-         }
--        *plow = dlo;
--        *phigh = dhi;
-+        /*
-+         * since the dividend/divisor might have been normalized,
-+         * the remainder might also have to be shifted back
-+         */
-+        return rem >> sh;
-     }
- }
- /*
-- * Signed 128-by-64 division. Returns quotient via plow and
-- * remainder via phigh.
-- * The result must fit in 64 bits (plow) - otherwise, the result
-- * is undefined.
-- * This function will cause a division by zero if passed a zero divisor.
-+ * Signed 128-by-64 division.
-+ * Returns quotient via plow and phigh.
-+ * Also returns the remainder via the function return value.
-  */
--void divs128(int64_t *plow, int64_t *phigh, int64_t divisor)
-+int64_t divs128(uint64_t *plow, int64_t *phigh, int64_t divisor)
- {
--    int sgn_dvdnd = *phigh < 0;
--    int sgn_divsr = divisor < 0;
-+    bool neg_quotient = false, neg_remainder = false;
-+    uint64_t unsig_hi = *phigh, unsig_lo = *plow;
-+    uint64_t rem;
--    if (sgn_dvdnd) {
--        *plow = ~(*plow);
--        *phigh = ~(*phigh);
--        if (*plow == (int64_t)-1) {
-+    if (*phigh < 0) {
-+        neg_quotient = !neg_quotient;
-+        neg_remainder = !neg_remainder;
-+
-+        if (unsig_lo == 0) {
-+            unsig_hi = -unsig_hi;
-+        } else {
-+            unsig_hi = ~unsig_hi;
-+            unsig_lo = -unsig_lo;
-+        }
-+    }
-+
-+    if (divisor < 0) {
-+        neg_quotient = !neg_quotient;
-+
-+        divisor = -divisor;
-+    }
-+
-+    rem = divu128(&unsig_lo, &unsig_hi, (uint64_t)divisor);
-+
-+    if (neg_quotient) {
-+        if (unsig_lo == 0) {
-+            *phigh = -unsig_hi;
-             *plow = 0;
--            (*phigh)++;
--         } else {
--            (*plow)++;
--         }
-+        } else {
-+            *phigh = ~unsig_hi;
-+            *plow = -unsig_lo;
-+        }
-+    } else {
-+        *phigh = unsig_hi;
-+        *plow = unsig_lo;
-     }
--    if (sgn_divsr) {
--        divisor = 0 - divisor;
--    }
--
--    divu128((uint64_t *)plow, (uint64_t *)phigh, (uint64_t)divisor);
--
--    if (sgn_dvdnd  ^ sgn_divsr) {
--        *plow = 0 - *plow;
-+    if (neg_remainder) {
-+        return -rem;
-+    } else {
-+        return rem;
-     }
- }
  #endif
 --
-.25.1
+.34.1
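
Note on the divu128 change above: in the older series the function returns the remainder directly, and phigh receives the high 64 bits of the quotient. The following is a small sketch of that calling convention, modelled on the CONFIG_INT128 variant in the diff. The name sketch_divu128 is hypothetical, the unsigned __int128 type is a GCC/Clang extension assumed to be available, and the assert on a zero divisor is an addition for this sketch only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Unsigned 128-by-64 division.
 * On entry, *phigh:*plow hold the 128-bit dividend.
 * On return, *phigh:*plow hold the 128-bit quotient, and the
 * remainder is the function's return value.
 */
static uint64_t sketch_divu128(uint64_t *plow, uint64_t *phigh,
                               uint64_t divisor)
{
    assert(divisor != 0);

    unsigned __int128 dividend = ((unsigned __int128)*phigh << 64) | *plow;
    unsigned __int128 quotient = dividend / divisor;

    *plow = (uint64_t)quotient;
    *phigh = (uint64_t)(quotient >> 64);
    return (uint64_t)(dividend % divisor);
}
```

A caller that previously read the remainder from phigh would instead take it from the return value, as the helper_bcdcfsq hunk does with its rem variable.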

-[PULL 11/56] tcg/optimize: Split out init_arguments
+[PULL 13/20] tcg: Remove INDEX_op_plugin_cb_{start,end}
-There was no real reason for calls to have separate code here.
+These opcodes are no longer used.
 Unify init for calls vs non-calls using the call path, which
 handles TCG_CALL_DUMMY_ARG.
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
+Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
 Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- tcg/optimize.c | 25 +++++++++++--------------
+ include/tcg/tcg-op-common.h |  2 --
-file changed, 11 insertions(+), 14 deletions(-)
+ include/tcg/tcg-opc.h       |  2 --
  accel/tcg/plugin-gen.c      | 18 ------------------
  tcg/tcg-op.c                | 10 ----------
 files changed, 32 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
+diff --git a/include/tcg/tcg-op-common.h b/include/tcg/tcg-op-common.h
 index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
+--- a/include/tcg/tcg-op-common.h
-+++ b/tcg/optimize.c
++++ b/include/tcg/tcg-op-common.h
-@@ -XXX,XX +XXX,XX @@ static void init_ts_info(OptContext *ctx, TCGTemp *ts)
+@@ -XXX,XX +XXX,XX @@ void tcg_gen_lookup_and_goto_ptr(void);
-     }
  void tcg_gen_plugin_cb(unsigned from);
  void tcg_gen_plugin_mem_cb(TCGv_i64 addr, unsigned meminfo);
 -void tcg_gen_plugin_cb_start(unsigned from, unsigned type, unsigned wr);
 -void tcg_gen_plugin_cb_end(void);
  /* 32 bit ops */
 diff --git a/include/tcg/tcg-opc.h b/include/tcg/tcg-opc.h
 index XXXXXXX..XXXXXXX 100644
 --- a/include/tcg/tcg-opc.h
 +++ b/include/tcg/tcg-opc.h
@@ -XXX,XX +XXX,XX @@ DEF(goto_ptr, 0, 1, 0, TCG_OPF_BB_EXIT | TCG_OPF_BB_END)
  DEF(plugin_cb, 0, 0, 1, TCG_OPF_NOT_PRESENT)
  DEF(plugin_mem_cb, 0, 1, 1, TCG_OPF_NOT_PRESENT)
 -DEF(plugin_cb_start, 0, 0, 3, TCG_OPF_NOT_PRESENT)
 -DEF(plugin_cb_end, 0, 0, 0, TCG_OPF_NOT_PRESENT)
  /* Replicate ld/st ops for 32 and 64-bit guest addresses. */
  DEF(qemu_ld_a32_i32, 1, 1, 1,
 diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
 index XXXXXXX..XXXXXXX 100644
 --- a/accel/tcg/plugin-gen.c
 +++ b/accel/tcg/plugin-gen.c
@@ -XXX,XX +XXX,XX @@
  #include "exec/plugin-gen.h"
  #include "exec/translator.h"
 -/*
 - * plugin_cb_start TCG op args[]:
 - * 0: enum plugin_gen_from
 - * 1: enum plugin_gen_cb
 - * 2: set to 1 for mem callback that is a write, 0 otherwise.
 - */
 -
  enum plugin_gen_from {
      PLUGIN_GEN_FROM_TB,
      PLUGIN_GEN_FROM_INSN,
      PLUGIN_GEN_AFTER_INSN,
      PLUGIN_GEN_AFTER_TB,
 -    PLUGIN_GEN_N_FROMS,
 -};
 -
 -enum plugin_gen_cb {
 -    PLUGIN_GEN_CB_UDATA,
 -    PLUGIN_GEN_CB_UDATA_R,
 -    PLUGIN_GEN_CB_INLINE,
 -    PLUGIN_GEN_CB_MEM,
 -    PLUGIN_GEN_ENABLE_MEM_HELPER,
 -    PLUGIN_GEN_DISABLE_MEM_HELPER,
 -    PLUGIN_GEN_N_CBS,
  };
  static void plugin_gen_empty_callback(enum plugin_gen_from from)
 diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
 index XXXXXXX..XXXXXXX 100644
 --- a/tcg/tcg-op.c
 +++ b/tcg/tcg-op.c
@@ -XXX,XX +XXX,XX @@ void tcg_gen_plugin_mem_cb(TCGv_i64 addr, unsigned meminfo)
      tcg_gen_op2(INDEX_op_plugin_mem_cb, tcgv_i64_arg(addr), meminfo);
  }
--static void init_arg_info(OptContext *ctx, TCGArg arg)
+-void tcg_gen_plugin_cb_start(unsigned from, unsigned type, unsigned wr)
 -{
--    init_ts_info(ctx, arg_temp(arg));
+-    tcg_gen_op3(INDEX_op_plugin_cb_start, from, type, wr);
 -}
 -
- static TCGTemp *find_better_copy(TCGContext *s, TCGTemp *ts)
+-void tcg_gen_plugin_cb_end(void)
- {
+-{
-     TCGTemp *i, *g, *l;
+-    tcg_emit_op(INDEX_op_plugin_cb_end, 0);
-@@ -XXX,XX +XXX,XX @@ static bool swap_commutative2(TCGArg *p1, TCGArg *p2)
+-}
-     return false;
+-
- }
+ /* 32 bit ops */
-+static void init_arguments(OptContext *ctx, TCGOp *op, int nb_args)
+ void tcg_gen_discard_i32(TCGv_i32 arg)
 +{
 +    for (int i = 0; i < nb_args; i++) {
 +        TCGTemp *ts = arg_temp(op->args[i]);
 +        if (ts) {
 +            init_ts_info(ctx, ts);
 +        }
 +    }
 +}
 +
  /* Propagate constants and copies, fold constant expressions. */
  void tcg_optimize(TCGContext *s)
  {
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
          if (opc == INDEX_op_call) {
              nb_oargs = TCGOP_CALLO(op);
              nb_iargs = TCGOP_CALLI(op);
 -            for (i = 0; i < nb_oargs + nb_iargs; i++) {
 -                TCGTemp *ts = arg_temp(op->args[i]);
 -                if (ts) {
 -                    init_ts_info(&ctx, ts);
 -                }
 -            }
          } else {
              nb_oargs = def->nb_oargs;
              nb_iargs = def->nb_iargs;
 -            for (i = 0; i < nb_oargs + nb_iargs; i++) {
 -                init_arg_info(&ctx, op->args[i]);
 -            }
          }
 +        init_arguments(&ctx, op, nb_oargs + nb_iargs);
          /* Do copy propagation */
          for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
 --
-.25.1
+.34.1

-[PULL 07/56] tcg/optimize: Split out OptContext
+[PULL 14/20] plugins: Simplify callback queues
-Provide what will become a larger context for splitting
+We have qemu_plugin_dyn_cb.type to differentiate the various
-the very large tcg_optimize function.
+callback types, so we do not need to keep them in separate queues.
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
+Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
 Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- tcg/optimize.c | 77 ++++++++++++++++++++++++++------------------------
+ include/qemu/plugin.h  | 35 ++++++----------
-file changed, 40 insertions(+), 37 deletions(-)
+ accel/tcg/plugin-gen.c | 90 ++++++++++++++++++++++--------------------
  plugins/api.c          | 18 +++------
 files changed, 65 insertions(+), 78 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
+diff --git a/include/qemu/plugin.h b/include/qemu/plugin.h
 index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
+--- a/include/qemu/plugin.h
-+++ b/tcg/optimize.c
++++ b/include/qemu/plugin.h
-@@ -XXX,XX +XXX,XX @@ typedef struct TempOptInfo {
+@@ -XXX,XX +XXX,XX @@ union qemu_plugin_cb_sig {
-     uint64_t z_mask;  /* mask bit is 0 if and only if value bit is 0 */
+ };
- } TempOptInfo;
+ enum plugin_dyn_cb_type {
-+typedef struct OptContext {
+-    PLUGIN_CB_INSN,
-+    TCGTempSet temps_used;
+-    PLUGIN_CB_MEM,
-+} OptContext;
+-    PLUGIN_N_CB_TYPES,
 -};
 -
 -enum plugin_dyn_cb_subtype {
      PLUGIN_CB_REGULAR,
      PLUGIN_CB_INLINE,
 -    PLUGIN_N_CB_SUBTYPES,
  };
  /*
@@ -XXX,XX +XXX,XX @@ enum plugin_dyn_cb_subtype {
   */
  struct qemu_plugin_dyn_cb {
      void *userp;
 -    enum plugin_dyn_cb_subtype type;
 +    enum plugin_dyn_cb_type type;
      /* @rw applies to mem callbacks only (both regular and inline) */
      enum qemu_plugin_mem_rw rw;
      /* fields specific to each dyn_cb type go here */
@@ -XXX,XX +XXX,XX @@ struct qemu_plugin_insn {
      GByteArray *data;
      uint64_t vaddr;
      void *haddr;
 -    GArray *cbs[PLUGIN_N_CB_TYPES][PLUGIN_N_CB_SUBTYPES];
 +    GArray *insn_cbs;
 +    GArray *mem_cbs;
      bool calls_helpers;
      /* if set, the instruction calls helpers that might access guest memory */
@@ -XXX,XX +XXX,XX @@ static inline void qemu_plugin_insn_cleanup_fn(gpointer data)
  static inline struct qemu_plugin_insn *qemu_plugin_insn_alloc(void)
  {
 -    int i, j;
      struct qemu_plugin_insn *insn = g_new0(struct qemu_plugin_insn, 1);
 -    insn->data = g_byte_array_sized_new(4);
 -    for (i = 0; i < PLUGIN_N_CB_TYPES; i++) {
 -        for (j = 0; j < PLUGIN_N_CB_SUBTYPES; j++) {
 -            insn->cbs[i][j] = g_array_new(false, false,
 -                                          sizeof(struct qemu_plugin_dyn_cb));
 -        }
 -    }
 +    insn->data = g_byte_array_sized_new(4);
      return insn;
  }
@@ -XXX,XX +XXX,XX @@ struct qemu_plugin_tb {
      /* if set, the TB calls helpers that might access guest memory */
      bool mem_helper;
 -    GArray *cbs[PLUGIN_N_CB_SUBTYPES];
 +    GArray *cbs;
  };
  /**
@@ -XXX,XX +XXX,XX @@ struct qemu_plugin_insn *qemu_plugin_tb_insn_get(struct qemu_plugin_tb *tb,
                                                   uint64_t pc)
  {
      struct qemu_plugin_insn *insn;
 -    int i, j;
      if (unlikely(tb->n == tb->insns->len)) {
          struct qemu_plugin_insn *new_insn = qemu_plugin_insn_alloc();
          g_ptr_array_add(tb->insns, new_insn);
      }
 +
- static inline TempOptInfo *ts_info(TCGTemp *ts)
+     insn = g_ptr_array_index(tb->insns, tb->n++);
- {
+     g_byte_array_set_size(insn->data, 0);
-     return ts->state_ptr;
+     insn->calls_helpers = false;
-@@ -XXX,XX +XXX,XX @@ static void reset_temp(TCGArg arg)
+     insn->mem_helper = false;
- }
+     insn->vaddr = pc;
+-
- /* Initialize and activate a temporary.  */
+-    for (i = 0; i < PLUGIN_N_CB_TYPES; i++) {
--static void init_ts_info(TCGTempSet *temps_used, TCGTemp *ts)
+-        for (j = 0; j < PLUGIN_N_CB_SUBTYPES; j++) {
-+static void init_ts_info(OptContext *ctx, TCGTemp *ts)
+-            g_array_set_size(insn->cbs[i][j], 0);
- {
+-        }
-     size_t idx = temp_idx(ts);
++    if (insn->insn_cbs) {
-     TempOptInfo *ti;
++        g_array_set_size(insn->insn_cbs, 0);
++    }
--    if (test_bit(idx, temps_used->l)) {
++    if (insn->mem_cbs) {
-+    if (test_bit(idx, ctx->temps_used.l)) {
++        g_array_set_size(insn->mem_cbs, 0);
      }
      return insn;
 diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
 index XXXXXXX..XXXXXXX 100644
 --- a/accel/tcg/plugin-gen.c
 +++ b/accel/tcg/plugin-gen.c
@@ -XXX,XX +XXX,XX @@ void plugin_gen_disable_mem_helpers(void)
  static void gen_enable_mem_helper(struct qemu_plugin_tb *ptb,
                                    struct qemu_plugin_insn *insn)
  {
 -    GArray *cbs[2];
      GArray *arr;
 -    size_t n_cbs;
 +    size_t len;
      /*
       * Tracking memory accesses performed from helpers requires extra work.
@@ -XXX,XX +XXX,XX @@ static void gen_enable_mem_helper(struct qemu_plugin_tb *ptb,
          return;
      }
--    set_bit(idx, temps_used->l);
-+    set_bit(idx, ctx->temps_used.l);
+-    cbs[0] = insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR];
+-    cbs[1] = insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_INLINE];
-     ti = ts->state_ptr;
+-    n_cbs = cbs[0]->len + cbs[1]->len;
-     if (ti == NULL) {
+-
-@@ -XXX,XX +XXX,XX @@ static void init_ts_info(TCGTempSet *temps_used, TCGTemp *ts)
+-    if (n_cbs == 0) {
-     }
++    if (!insn->mem_cbs || !insn->mem_cbs->len) {
- }
+         insn->mem_helper = false;
+         return;
--static void init_arg_info(TCGTempSet *temps_used, TCGArg arg)
+     }
-+static void init_arg_info(OptContext *ctx, TCGArg arg)
+     insn->mem_helper = true;
- {
+     ptb->mem_helper = true;
--    init_ts_info(temps_used, arg_temp(arg));
-+    init_ts_info(ctx, arg_temp(arg));
++    /*
- }
++     * TODO: It seems like we should be able to use ref/unref
++     * to avoid needing to actually copy this array.
- static TCGTemp *find_better_copy(TCGContext *s, TCGTemp *ts)
++     * Alternately, perhaps we could allocate new memory adjacent
-@@ -XXX,XX +XXX,XX @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg src)
++     * to the TranslationBlock itself, so that we do not have to
-     }
++     * actively manage the lifetime after this.
- }
++     */
++    len = insn->mem_cbs->len;
--static void tcg_opt_gen_movi(TCGContext *s, TCGTempSet *temps_used,
+     arr = g_array_sized_new(false, false,
-+static void tcg_opt_gen_movi(TCGContext *s, OptContext *ctx,
+-                            sizeof(struct qemu_plugin_dyn_cb), n_cbs);
-                              TCGOp *op, TCGArg dst, uint64_t val)
+-    g_array_append_vals(arr, cbs[0]->data, cbs[0]->len);
- {
+-    g_array_append_vals(arr, cbs[1]->data, cbs[1]->len);
-     const TCGOpDef *def = &tcg_op_defs[op->opc];
+-
-@@ -XXX,XX +XXX,XX @@ static void tcg_opt_gen_movi(TCGContext *s, TCGTempSet *temps_used,
++                            sizeof(struct qemu_plugin_dyn_cb), len);
++    memcpy(arr->data, insn->mem_cbs->data,
-     /* Convert movi to mov with constant temp. */
++           len * sizeof(struct qemu_plugin_dyn_cb));
-     tv = tcg_constant_internal(type, val);
+     qemu_plugin_add_dyn_cb_arr(arr);
--    init_ts_info(temps_used, tv);
-+    init_ts_info(ctx, tv);
+     tcg_gen_st_ptr(tcg_constant_ptr((intptr_t)arr), tcg_env,
-     tcg_opt_gen_mov(s, op, dst, temp_arg(tv));
+@@ -XXX,XX +XXX,XX @@ static void plugin_gen_inject(struct qemu_plugin_tb *plugin_tb)
- }
+             case PLUGIN_GEN_FROM_TB:
+                 assert(insn == NULL);
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
- {
+-                cbs = plugin_tb->cbs[PLUGIN_CB_REGULAR];
-     int nb_temps, nb_globals, i;
++                cbs = plugin_tb->cbs;
-     TCGOp *op, *op_next, *prev_mb = NULL;
+                 for (i = 0, n = (cbs ? cbs->len : 0); i < n; i++) {
--    TCGTempSet temps_used;
+                     struct qemu_plugin_dyn_cb *cb =
-+    OptContext ctx = {};
+                         &g_array_index(cbs, struct qemu_plugin_dyn_cb, i);
+-                    gen_udata_cb(cb);
-     /* Array VALS has an element for each temp.
+-                }
-        If this temp holds a constant then its value is kept in VALS' element.
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
+-                cbs = plugin_tb->cbs[PLUGIN_CB_INLINE];
-     nb_temps = s->nb_temps;
+-                for (i = 0, n = (cbs ? cbs->len : 0); i < n; i++) {
-     nb_globals = s->nb_globals;
+-                    struct qemu_plugin_dyn_cb *cb =
+-                        &g_array_index(cbs, struct qemu_plugin_dyn_cb, i);
--    memset(&temps_used, 0, sizeof(temps_used));
+-                    gen_inline_cb(cb);
-     for (i = 0; i < nb_temps; ++i) {
++                    switch (cb->type) {
-         s->temps[i].state_ptr = NULL;
++                    case PLUGIN_CB_REGULAR:
-     }
++                        gen_udata_cb(cb);
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
++                        break;
-             for (i = 0; i < nb_oargs + nb_iargs; i++) {
++                    case PLUGIN_CB_INLINE:
-                 TCGTemp *ts = arg_temp(op->args[i]);
++                        gen_inline_cb(cb);
-                 if (ts) {
++                        break;
--                    init_ts_info(&temps_used, ts);
++                    default:
-+                    init_ts_info(&ctx, ts);
++                        g_assert_not_reached();
 +                    }
                  }
                  break;
@@ -XXX,XX +XXX,XX @@ static void plugin_gen_inject(struct qemu_plugin_tb *plugin_tb)
                  gen_enable_mem_helper(plugin_tb, insn);
 -                cbs = insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_REGULAR];
 +                cbs = insn->insn_cbs;
                  for (i = 0, n = (cbs ? cbs->len : 0); i < n; i++) {
                      struct qemu_plugin_dyn_cb *cb =
                          &g_array_index(cbs, struct qemu_plugin_dyn_cb, i);
 -                    gen_udata_cb(cb);
 -                }
 -                cbs = insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_INLINE];
 -                for (i = 0, n = (cbs ? cbs->len : 0); i < n; i++) {
 -                    struct qemu_plugin_dyn_cb *cb =
 -                        &g_array_index(cbs, struct qemu_plugin_dyn_cb, i);
 -                    gen_inline_cb(cb);
 +                    switch (cb->type) {
 +                    case PLUGIN_CB_REGULAR:
 +                        gen_udata_cb(cb);
 +                        break;
 +                    case PLUGIN_CB_INLINE:
 +                        gen_inline_cb(cb);
 +                        break;
 +                    default:
 +                        g_assert_not_reached();
 +                    }
                  }
                  break;
@@ -XXX,XX +XXX,XX @@ static void plugin_gen_inject(struct qemu_plugin_tb *plugin_tb)
              tcg_ctx->emit_before_op = op;
 -            cbs = insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR];
 +            cbs = insn->mem_cbs;
              for (i = 0, n = (cbs ? cbs->len : 0); i < n; i++) {
                  struct qemu_plugin_dyn_cb *cb =
                      &g_array_index(cbs, struct qemu_plugin_dyn_cb, i);
 -                if (cb->rw & rw) {
 -                    gen_mem_cb(cb, meminfo, addr);
 -                }
 -            }
 -            cbs = insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_INLINE];
 -            for (i = 0, n = (cbs ? cbs->len : 0); i < n; i++) {
 -                struct qemu_plugin_dyn_cb *cb =
 -                    &g_array_index(cbs, struct qemu_plugin_dyn_cb, i);
                  if (cb->rw & rw) {
 -                    gen_inline_cb(cb);
 +                    switch (cb->type) {
 +                    case PLUGIN_CB_REGULAR:
 +                        gen_mem_cb(cb, meminfo, addr);
 +                        break;
 +                    case PLUGIN_CB_INLINE:
 +                        gen_inline_cb(cb);
 +                        break;
 +                    default:
 +                        g_assert_not_reached();
 +                    }
                  }
              }
-         } else {
-             nb_oargs = def->nb_oargs;
+@@ -XXX,XX +XXX,XX @@ bool plugin_gen_tb_start(CPUState *cpu, const DisasContextBase *db,
-             nb_iargs = def->nb_iargs;
-             for (i = 0; i < nb_oargs + nb_iargs; i++) {
+     if (test_bit(QEMU_PLUGIN_EV_VCPU_TB_TRANS, cpu->plugin_state->event_mask)) {
--                init_arg_info(&temps_used, op->args[i]);
+         struct qemu_plugin_tb *ptb = tcg_ctx->plugin_tb;
-+                init_arg_info(&ctx, op->args[i]);
+-        int i;
-             }
          /* reset callbacks */
 -        for (i = 0; i < PLUGIN_N_CB_SUBTYPES; i++) {
 -            if (ptb->cbs[i]) {
 -                g_array_set_size(ptb->cbs[i], 0);
 -            }
 +        if (ptb->cbs) {
 +            g_array_set_size(ptb->cbs, 0);
          }
+         ptb->n = 0;
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-         CASE_OP_32_64(rotr):
+diff --git a/plugins/api.c b/plugins/api.c
-             if (arg_is_const(op->args[1])
+index XXXXXXX..XXXXXXX 100644
-                 && arg_info(op->args[1])->val == 0) {
+--- a/plugins/api.c
--                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], 0);
++++ b/plugins/api.c
-+                tcg_opt_gen_movi(s, &ctx, op, op->args[0], 0);
+@@ -XXX,XX +XXX,XX @@ void qemu_plugin_register_vcpu_tb_exec_cb(struct qemu_plugin_tb *tb,
-                 continue;
+                                           void *udata)
-             }
+ {
-             break;
+     if (!tb->mem_only) {
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
+-        plugin_register_dyn_cb__udata(&tb->cbs[PLUGIN_CB_REGULAR],
+-                                      cb, flags, udata);
-         if (partmask == 0) {
++        plugin_register_dyn_cb__udata(&tb->cbs, cb, flags, udata);
-             tcg_debug_assert(nb_oargs == 1);
+     }
--            tcg_opt_gen_movi(s, &temps_used, op, op->args[0], 0);
+ }
-+            tcg_opt_gen_movi(s, &ctx, op, op->args[0], 0);
-             continue;
+@@ -XXX,XX +XXX,XX @@ void qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
-         }
+     uint64_t imm)
-         if (affected == 0) {
+ {
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
+     if (!tb->mem_only) {
-         CASE_OP_32_64(mulsh):
+-        plugin_register_inline_op_on_entry(
-             if (arg_is_const(op->args[2])
+-            &tb->cbs[PLUGIN_CB_INLINE], 0, op, entry, imm);
-                 && arg_info(op->args[2])->val == 0) {
++        plugin_register_inline_op_on_entry(&tb->cbs, 0, op, entry, imm);
--                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], 0);
+     }
-+                tcg_opt_gen_movi(s, &ctx, op, op->args[0], 0);
+ }
-                 continue;
-             }
+@@ -XXX,XX +XXX,XX @@ void qemu_plugin_register_vcpu_insn_exec_cb(struct qemu_plugin_insn *insn,
-             break;
+                                             void *udata)
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
+ {
-         CASE_OP_32_64_VEC(sub):
+     if (!insn->mem_only) {
-         CASE_OP_32_64_VEC(xor):
+-        plugin_register_dyn_cb__udata(
-             if (args_are_copies(op->args[1], op->args[2])) {
+-            &insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_REGULAR], cb, flags, udata);
--                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], 0);
++        plugin_register_dyn_cb__udata(&insn->insn_cbs, cb, flags, udata);
-+                tcg_opt_gen_movi(s, &ctx, op, op->args[0], 0);
+     }
-                 continue;
+ }
-             }
-             break;
+@@ -XXX,XX +XXX,XX @@ void qemu_plugin_register_vcpu_insn_exec_inline_per_vcpu(
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
+     uint64_t imm)
-             if (arg_is_const(op->args[1])) {
+ {
-                 tmp = arg_info(op->args[1])->val;
+     if (!insn->mem_only) {
-                 tmp = dup_const(TCGOP_VECE(op), tmp);
+-        plugin_register_inline_op_on_entry(
--                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
+-            &insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_INLINE], 0, op, entry, imm);
-+                tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
++        plugin_register_inline_op_on_entry(&insn->insn_cbs, 0, op, entry, imm);
-                 break;
+     }
-             }
+ }
-             goto do_default;
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
+@@ -XXX,XX +XXX,XX @@ void qemu_plugin_register_vcpu_mem_cb(struct qemu_plugin_insn *insn,
-         case INDEX_op_dup2_vec:
+                                       enum qemu_plugin_mem_rw rw,
-             assert(TCG_TARGET_REG_BITS == 32);
+                                       void *udata)
-             if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
+ {
--                tcg_opt_gen_movi(s, &temps_used, op, op->args[0],
+-    plugin_register_vcpu_mem_cb(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR],
-+                tcg_opt_gen_movi(s, &ctx, op, op->args[0],
+-                                cb, flags, rw, udata);
-                                  deposit64(arg_info(op->args[1])->val, 32, 32,
++    plugin_register_vcpu_mem_cb(&insn->mem_cbs, cb, flags, rw, udata);
-                                            arg_info(op->args[2])->val));
+ }
-                 break;
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
+ void qemu_plugin_register_vcpu_mem_inline_per_vcpu(
-         case INDEX_op_extrh_i64_i32:
+@@ -XXX,XX +XXX,XX @@ void qemu_plugin_register_vcpu_mem_inline_per_vcpu(
-             if (arg_is_const(op->args[1])) {
+     qemu_plugin_u64 entry,
-                 tmp = do_constant_folding(opc, arg_info(op->args[1])->val, 0);
+     uint64_t imm)
--                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
+ {
-+                tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
+-    plugin_register_inline_op_on_entry(
-                 break;
+-        &insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_INLINE], rw, op, entry, imm);
-             }
++    plugin_register_inline_op_on_entry(&insn->mem_cbs, rw, op, entry, imm);
-             goto do_default;
+ }
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-             if (arg_is_const(op->args[1])) {
+ void qemu_plugin_register_vcpu_tb_trans_cb(qemu_plugin_id_t id,
                  tmp = do_constant_folding(opc, arg_info(op->args[1])->val,
                                            op->args[2]);
 -                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
 +                tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
                  break;
              }
              goto do_default;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
              if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
                  tmp = do_constant_folding(opc, arg_info(op->args[1])->val,
                                            arg_info(op->args[2])->val);
 -                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
 +                tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
                  break;
              }
              goto do_default;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                  TCGArg v = arg_info(op->args[1])->val;
                  if (v != 0) {
                      tmp = do_constant_folding(opc, v, 0);
 -                    tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
 +                    tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
                  } else {
                      tcg_opt_gen_mov(s, op, op->args[0], op->args[2]);
                  }
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                  tmp = deposit64(arg_info(op->args[1])->val,
                                  op->args[3], op->args[4],
                                  arg_info(op->args[2])->val);
 -                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
 +                tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
                  break;
              }
              goto do_default;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
              if (arg_is_const(op->args[1])) {
                  tmp = extract64(arg_info(op->args[1])->val,
                                  op->args[2], op->args[3]);
 -                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
 +                tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
                  break;
              }
              goto do_default;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
              if (arg_is_const(op->args[1])) {
                  tmp = sextract64(arg_info(op->args[1])->val,
                                   op->args[2], op->args[3]);
 -                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
 +                tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
                  break;
              }
              goto do_default;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                      tmp = (int32_t)(((uint32_t)v1 >> shr) |
                                      ((uint32_t)v2 << (32 - shr)));
                  }
 -                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
 +                tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
                  break;
              }
              goto do_default;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
              tmp = do_constant_folding_cond(opc, op->args[1],
                                             op->args[2], op->args[3]);
              if (tmp != 2) {
 -                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
 +                tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
                  break;
              }
              goto do_default;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                                             op->args[1], op->args[2]);
              if (tmp != 2) {
                  if (tmp) {
 -                    memset(&temps_used, 0, sizeof(temps_used));
 +                    memset(&ctx.temps_used, 0, sizeof(ctx.temps_used));
                      op->opc = INDEX_op_br;
                      op->args[0] = op->args[3];
                  } else {
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                  rl = op->args[0];
                  rh = op->args[1];
 -                tcg_opt_gen_movi(s, &temps_used, op, rl, (int32_t)a);
 -                tcg_opt_gen_movi(s, &temps_used, op2, rh, (int32_t)(a >> 32));
 +                tcg_opt_gen_movi(s, &ctx, op, rl, (int32_t)a);
 +                tcg_opt_gen_movi(s, &ctx, op2, rh, (int32_t)(a >> 32));
                  break;
              }
              goto do_default;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                  rl = op->args[0];
                  rh = op->args[1];
 -                tcg_opt_gen_movi(s, &temps_used, op, rl, (int32_t)r);
 -                tcg_opt_gen_movi(s, &temps_used, op2, rh, (int32_t)(r >> 32));
 +                tcg_opt_gen_movi(s, &ctx, op, rl, (int32_t)r);
 +                tcg_opt_gen_movi(s, &ctx, op2, rh, (int32_t)(r >> 32));
                  break;
              }
              goto do_default;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
              if (tmp != 2) {
                  if (tmp) {
              do_brcond_true:
 -                    memset(&temps_used, 0, sizeof(temps_used));
 +                    memset(&ctx.temps_used, 0, sizeof(ctx.temps_used));
                      op->opc = INDEX_op_br;
                      op->args[0] = op->args[5];
                  } else {
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                  /* Simplify LT/GE comparisons vs zero to a single compare
                     vs the high word of the input.  */
              do_brcond_high:
 -                memset(&temps_used, 0, sizeof(temps_used));
 +                memset(&ctx.temps_used, 0, sizeof(ctx.temps_used));
                  op->opc = INDEX_op_brcond_i32;
                  op->args[0] = op->args[1];
                  op->args[1] = op->args[3];
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                      goto do_default;
                  }
              do_brcond_low:
 -                memset(&temps_used, 0, sizeof(temps_used));
 +                memset(&ctx.temps_used, 0, sizeof(ctx.temps_used));
                  op->opc = INDEX_op_brcond_i32;
                  op->args[1] = op->args[2];
                  op->args[2] = op->args[4];
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                                              op->args[5]);
              if (tmp != 2) {
              do_setcond_const:
 -                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
 +                tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
              } else if ((op->args[5] == TCG_COND_LT
                          || op->args[5] == TCG_COND_GE)
                         && arg_is_const(op->args[3])
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
              if (!(tcg_call_flags(op)
                    & (TCG_CALL_NO_READ_GLOBALS | TCG_CALL_NO_WRITE_GLOBALS))) {
                  for (i = 0; i < nb_globals; i++) {
 -                    if (test_bit(i, temps_used.l)) {
 +                    if (test_bit(i, ctx.temps_used.l)) {
                          reset_ts(&s->temps[i]);
                      }
                  }
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                 block, otherwise we only trash the output args.  "z_mask" is
                 the non-zero bits mask for the first output arg.  */
              if (def->flags & TCG_OPF_BB_END) {
 -                memset(&temps_used, 0, sizeof(temps_used));
 +                memset(&ctx.temps_used, 0, sizeof(ctx.temps_used));
              } else {
          do_reset_output:
                  for (i = 0; i < nb_oargs; i++) {
 --
-.25.1
+.34.1
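
Note on the plugin-gen.c hunks above: the newer series replaces the per-subtype callback arrays (one loop for PLUGIN_CB_REGULAR, a second for PLUGIN_CB_INLINE) with a single array whose entries carry a type tag, dispatched by a switch. The following is a minimal standalone sketch of that pattern, not QEMU code. The names cb_type, dyn_cb and dispatch_cbs are hypothetical, and the trace buffer stands in for the real code generation calls. Where the patch uses g_assert_not_reached() for an unknown tag, the sketch returns an error instead.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the plugin callback type tag and entry. */
enum cb_type { CB_REGULAR, CB_INLINE };

struct dyn_cb {
    enum cb_type type;
};

/*
 * Walk one array of tagged callbacks and record which handler each
 * entry would be routed to ('R' for regular, 'I' for inline).
 * Returns 0 on success, -1 if the trace buffer is too small or an
 * entry carries an unknown tag.
 */
static int dispatch_cbs(const struct dyn_cb *cbs, size_t n,
                        char *trace, size_t trace_len)
{
    size_t i;

    if (trace_len < n + 1) {
        return -1;
    }
    for (i = 0; i < n; i++) {
        switch (cbs[i].type) {
        case CB_REGULAR:
            trace[i] = 'R';
            break;
        case CB_INLINE:
            trace[i] = 'I';
            break;
        default:
            return -1;
        }
    }
    trace[n] = '\0';
    return 0;
}
```

In this sketch, entries are handled in array order regardless of type, whereas two separate loops would handle every regular entry before any inline entry. Whether that ordering difference matters to real plugins is not stated in the patch.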

-[PULL 52/56] tcg/optimize: Optimize sign extensions
+[PULL 15/20] plugins: Introduce PLUGIN_CB_MEM_REGULAR
-Certain targets, like riscv, produce signed 32-bit results.
+Use different enumerators for vcpu_udata and vcpu_mem callbacks.
 This can lead to lots of redundant extensions as values are
 manipulated.
-Begin by tracking only the obvious sign-extensions, and
+Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
 converting them to simple copies when possible.
 Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
 Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- tcg/optimize.c | 123 ++++++++++++++++++++++++++++++++++++++++---------
+ include/qemu/plugin.h  | 1 +
-file changed, 102 insertions(+), 21 deletions(-)
+ accel/tcg/plugin-gen.c | 2 +-
  plugins/core.c         | 4 ++--
 files changed, 4 insertions(+), 3 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
+diff --git a/include/qemu/plugin.h b/include/qemu/plugin.h
 index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
+--- a/include/qemu/plugin.h
-+++ b/tcg/optimize.c
++++ b/include/qemu/plugin.h
-@@ -XXX,XX +XXX,XX @@ typedef struct TempOptInfo {
+@@ -XXX,XX +XXX,XX @@ union qemu_plugin_cb_sig {
-     TCGTemp *next_copy;
-     uint64_t val;
+ enum plugin_dyn_cb_type {
-     uint64_t z_mask;  /* mask bit is 0 if and only if value bit is 0 */
+     PLUGIN_CB_REGULAR,
-+    uint64_t s_mask;  /* a left-aligned mask of clrsb(value) bits. */
++    PLUGIN_CB_MEM_REGULAR,
- } TempOptInfo;
+     PLUGIN_CB_INLINE,
+ };
- typedef struct OptContext {
-@@ -XXX,XX +XXX,XX @@ typedef struct OptContext {
+diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
-     /* In flight values from optimization. */
+index XXXXXXX..XXXXXXX 100644
-     uint64_t a_mask;  /* mask bit is 0 iff value identical to first input */
+--- a/accel/tcg/plugin-gen.c
-     uint64_t z_mask;  /* mask bit is 0 iff value bit is 0 */
++++ b/accel/tcg/plugin-gen.c
-+    uint64_t s_mask;  /* mask of clrsb(value) bits */
+@@ -XXX,XX +XXX,XX @@ static void plugin_gen_inject(struct qemu_plugin_tb *plugin_tb)
-     TCGType type;
- } OptContext;
+                 if (cb->rw & rw) {
+                     switch (cb->type) {
-+/* Calculate the smask for a specific value. */
+-                    case PLUGIN_CB_REGULAR:
-+static uint64_t smask_from_value(uint64_t value)
++                    case PLUGIN_CB_MEM_REGULAR:
-+{
+                         gen_mem_cb(cb, meminfo, addr);
-+    int rep = clrsb64(value);
+                         break;
-+    return ~(~0ull >> rep);
+                     case PLUGIN_CB_INLINE:
-+}
+diff --git a/plugins/core.c b/plugins/core.c
-+
+index XXXXXXX..XXXXXXX 100644
-+/*
+--- a/plugins/core.c
-+ * Calculate the smask for a given set of known-zeros.
++++ b/plugins/core.c
-+ * If there are lots of zeros on the left, we can consider the remainder
+@@ -XXX,XX +XXX,XX @@ void plugin_register_vcpu_mem_cb(GArray **arr,
-+ * an unsigned field, and thus the corresponding signed field is one bit
-+ * larger.
+     struct qemu_plugin_dyn_cb *dyn_cb = plugin_get_dyn_cb(arr);
-+ */
+     dyn_cb->userp = udata;
-+static uint64_t smask_from_zmask(uint64_t zmask)
+-    dyn_cb->type = PLUGIN_CB_REGULAR;
-+{
++    dyn_cb->type = PLUGIN_CB_MEM_REGULAR;
-+    /*
+     dyn_cb->rw = rw;
-+     * Only the 0 bits are significant for zmask, thus the msb itself
+     dyn_cb->regular.f.vcpu_mem = cb;
-+     * must be zero, else we have no sign information.
-+     */
+@@ -XXX,XX +XXX,XX @@ void qemu_plugin_vcpu_mem_cb(CPUState *cpu, uint64_t vaddr,
-+    int rep = clz64(zmask);
+                 break;
 +    if (rep == 0) {
 +        return 0;
 +    }
 +    rep -= 1;
 +    return ~(~0ull >> rep);
 +}
 +
  static inline TempOptInfo *ts_info(TCGTemp *ts)
  {
      return ts->state_ptr;
@@ -XXX,XX +XXX,XX @@ static void reset_ts(TCGTemp *ts)
      ti->prev_copy = ts;
      ti->is_const = false;
      ti->z_mask = -1;
 +    ti->s_mask = 0;
  }
  static void reset_temp(TCGArg arg)
@@ -XXX,XX +XXX,XX @@ static void init_ts_info(OptContext *ctx, TCGTemp *ts)
          ti->is_const = true;
          ti->val = ts->val;
          ti->z_mask = ts->val;
 +        ti->s_mask = smask_from_value(ts->val);
      } else {
          ti->is_const = false;
          ti->z_mask = -1;
 +        ti->s_mask = 0;
      }
  }
@@ -XXX,XX +XXX,XX @@ static bool tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, TCGArg src)
      op->args[1] = src;
      di->z_mask = si->z_mask;
 +    di->s_mask = si->s_mask;
      if (src_ts->type == dst_ts->type) {
          TempOptInfo *ni = ts_info(si->next_copy);
@@ -XXX,XX +XXX,XX @@ static void finish_folding(OptContext *ctx, TCGOp *op)
      nb_oargs = def->nb_oargs;
      for (i = 0; i < nb_oargs; i++) {
 -        reset_temp(op->args[i]);
 +        TCGTemp *ts = arg_temp(op->args[i]);
 +        reset_ts(ts);
          /*
 -         * Save the corresponding known-zero bits mask for the
 +         * Save the corresponding known-zero/sign bits mask for the
           * first output argument (only one supported so far).
           */
          if (i == 0) {
 -            arg_info(op->args[i])->z_mask = ctx->z_mask;
 +            ts_info(ts)->z_mask = ctx->z_mask;
 +            ts_info(ts)->s_mask = ctx->s_mask;
          }
-     }
+         switch (cb->type) {
- }
+-        case PLUGIN_CB_REGULAR:
-@@ -XXX,XX +XXX,XX @@ static bool fold_masks(OptContext *ctx, TCGOp *op)
++        case PLUGIN_CB_MEM_REGULAR:
- {
+             cb->regular.f.vcpu_mem(cpu->cpu_index, make_plugin_meminfo(oi, rw),
-     uint64_t a_mask = ctx->a_mask;
+                                    vaddr, cb->userp);
      uint64_t z_mask = ctx->z_mask;
 +    uint64_t s_mask = ctx->s_mask;
      /*
       * 32-bit ops generate 32-bit results, which for the purpose of
@@ -XXX,XX +XXX,XX @@ static bool fold_masks(OptContext *ctx, TCGOp *op)
      if (ctx->type == TCG_TYPE_I32) {
          a_mask = (int32_t)a_mask;
          z_mask = (int32_t)z_mask;
 +        s_mask |= MAKE_64BIT_MASK(32, 32);
          ctx->z_mask = z_mask;
 +        ctx->s_mask = s_mask;
      }
      if (z_mask == 0) {
@@ -XXX,XX +XXX,XX @@ static bool fold_brcond2(OptContext *ctx, TCGOp *op)
  static bool fold_bswap(OptContext *ctx, TCGOp *op)
  {
 -    uint64_t z_mask, sign;
 +    uint64_t z_mask, s_mask, sign;
      if (arg_is_const(op->args[1])) {
          uint64_t t = arg_info(op->args[1])->val;
@@ -XXX,XX +XXX,XX @@ static bool fold_bswap(OptContext *ctx, TCGOp *op)
      }
      z_mask = arg_info(op->args[1])->z_mask;
 +
      switch (op->opc) {
      case INDEX_op_bswap16_i32:
      case INDEX_op_bswap16_i64:
@@ -XXX,XX +XXX,XX @@ static bool fold_bswap(OptContext *ctx, TCGOp *op)
      default:
          g_assert_not_reached();
      }
 +    s_mask = smask_from_zmask(z_mask);
      switch (op->args[2] & (TCG_BSWAP_OZ | TCG_BSWAP_OS)) {
      case TCG_BSWAP_OZ:
@@ -XXX,XX +XXX,XX @@ static bool fold_bswap(OptContext *ctx, TCGOp *op)
          /* If the sign bit may be 1, force all the bits above to 1. */
          if (z_mask & sign) {
              z_mask |= sign;
 +            s_mask = sign << 1;
          }
          break;
      default:
          /* The high bits are undefined: force all bits above the sign to 1. */
          z_mask |= sign << 1;
 +        s_mask = 0;
          break;
      }
      ctx->z_mask = z_mask;
 +    ctx->s_mask = s_mask;
      return fold_masks(ctx, op);
  }
@@ -XXX,XX +XXX,XX @@ static bool fold_eqv(OptContext *ctx, TCGOp *op)
  static bool fold_extract(OptContext *ctx, TCGOp *op)
  {
      uint64_t z_mask_old, z_mask;
 +    int pos = op->args[2];
 +    int len = op->args[3];
      if (arg_is_const(op->args[1])) {
          uint64_t t;
          t = arg_info(op->args[1])->val;
 -        t = extract64(t, op->args[2], op->args[3]);
 +        t = extract64(t, pos, len);
          return tcg_opt_gen_movi(ctx, op, op->args[0], t);
      }
      z_mask_old = arg_info(op->args[1])->z_mask;
 -    z_mask = extract64(z_mask_old, op->args[2], op->args[3]);
 -    if (op->args[2] == 0) {
 +    z_mask = extract64(z_mask_old, pos, len);
 +    if (pos == 0) {
          ctx->a_mask = z_mask_old ^ z_mask;
      }
      ctx->z_mask = z_mask;
 +    ctx->s_mask = smask_from_zmask(z_mask);
      return fold_masks(ctx, op);
  }
@@ -XXX,XX +XXX,XX @@ static bool fold_extract2(OptContext *ctx, TCGOp *op)
  static bool fold_exts(OptContext *ctx, TCGOp *op)
  {
 -    uint64_t z_mask_old, z_mask, sign;
 +    uint64_t s_mask_old, s_mask, z_mask, sign;
      bool type_change = false;
      if (fold_const1(ctx, op)) {
          return true;
      }
 -    z_mask_old = z_mask = arg_info(op->args[1])->z_mask;
 +    z_mask = arg_info(op->args[1])->z_mask;
 +    s_mask = arg_info(op->args[1])->s_mask;
 +    s_mask_old = s_mask;
      switch (op->opc) {
      CASE_OP_32_64(ext8s):
@@ -XXX,XX +XXX,XX @@ static bool fold_exts(OptContext *ctx, TCGOp *op)
      if (z_mask & sign) {
          z_mask |= sign;
 -    } else if (!type_change) {
 -        ctx->a_mask = z_mask_old ^ z_mask;
      }
 +    s_mask |= sign << 1;
 +
      ctx->z_mask = z_mask;
 +    ctx->s_mask = s_mask;
 +    if (!type_change) {
 +        ctx->a_mask = s_mask & ~s_mask_old;
 +    }
      return fold_masks(ctx, op);
  }
@@ -XXX,XX +XXX,XX @@ static bool fold_extu(OptContext *ctx, TCGOp *op)
      }
      ctx->z_mask = z_mask;
 +    ctx->s_mask = smask_from_zmask(z_mask);
      if (!type_change) {
          ctx->a_mask = z_mask_old ^ z_mask;
      }
@@ -XXX,XX +XXX,XX @@ static bool fold_qemu_ld(OptContext *ctx, TCGOp *op)
      MemOp mop = get_memop(oi);
      int width = 8 * memop_size(mop);
 -    if (!(mop & MO_SIGN) && width < 64) {
 -        ctx->z_mask = MAKE_64BIT_MASK(0, width);
 +    if (width < 64) {
 +        ctx->s_mask = MAKE_64BIT_MASK(width, 64 - width);
 +        if (!(mop & MO_SIGN)) {
 +            ctx->z_mask = MAKE_64BIT_MASK(0, width);
 +            ctx->s_mask <<= 1;
 +        }
      }
      /* Opcodes that touch guest memory stop the mb optimization.  */
@@ -XXX,XX +XXX,XX @@ static bool fold_setcond2(OptContext *ctx, TCGOp *op)
  static bool fold_sextract(OptContext *ctx, TCGOp *op)
  {
 -    int64_t z_mask_old, z_mask;
 +    uint64_t z_mask, s_mask, s_mask_old;
 +    int pos = op->args[2];
 +    int len = op->args[3];
      if (arg_is_const(op->args[1])) {
          uint64_t t;
          t = arg_info(op->args[1])->val;
 -        t = sextract64(t, op->args[2], op->args[3]);
 +        t = sextract64(t, pos, len);
          return tcg_opt_gen_movi(ctx, op, op->args[0], t);
      }
 -    z_mask_old = arg_info(op->args[1])->z_mask;
 -    z_mask = sextract64(z_mask_old, op->args[2], op->args[3]);
 -    if (op->args[2] == 0 && z_mask >= 0) {
 -        ctx->a_mask = z_mask_old ^ z_mask;
 -    }
 +    z_mask = arg_info(op->args[1])->z_mask;
 +    z_mask = sextract64(z_mask, pos, len);
      ctx->z_mask = z_mask;
 +    s_mask_old = arg_info(op->args[1])->s_mask;
 +    s_mask = sextract64(s_mask_old, pos, len);
 +    s_mask |= MAKE_64BIT_MASK(len, 64 - len);
 +    ctx->s_mask = s_mask;
 +
 +    if (pos == 0) {
 +        ctx->a_mask = s_mask & ~s_mask_old;
 +    }
 +
      return fold_masks(ctx, op);
  }
@@ -XXX,XX +XXX,XX @@ static bool fold_tcg_ld(OptContext *ctx, TCGOp *op)
  {
      /* We can't do any folding with a load, but we can record bits. */
      switch (op->opc) {
 +    CASE_OP_32_64(ld8s):
 +        ctx->s_mask = MAKE_64BIT_MASK(8, 56);
 +        break;
      CASE_OP_32_64(ld8u):
          ctx->z_mask = MAKE_64BIT_MASK(0, 8);
 +        ctx->s_mask = MAKE_64BIT_MASK(9, 55);
 +        break;
 +    CASE_OP_32_64(ld16s):
 +        ctx->s_mask = MAKE_64BIT_MASK(16, 48);
          break;
      CASE_OP_32_64(ld16u):
          ctx->z_mask = MAKE_64BIT_MASK(0, 16);
 +        ctx->s_mask = MAKE_64BIT_MASK(17, 47);
 +        break;
 +    case INDEX_op_ld32s_i64:
 +        ctx->s_mask = MAKE_64BIT_MASK(32, 32);
          break;
      case INDEX_op_ld32u_i64:
          ctx->z_mask = MAKE_64BIT_MASK(0, 32);
 +        ctx->s_mask = MAKE_64BIT_MASK(33, 31);
          break;
      default:
          g_assert_not_reached();
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
              ctx.type = TCG_TYPE_I32;
          }
 -        /* Assume all bits affected, and no bits known zero. */
 +        /* Assume all bits affected, no bits known zero, no sign reps. */
          ctx.a_mask = -1;
          ctx.z_mask = -1;
 +        ctx.s_mask = 0;
          /*
           * Process each opcode.
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
          case INDEX_op_extrh_i64_i32:
              done = fold_extu(&ctx, op);
              break;
 +        CASE_OP_32_64(ld8s):
          CASE_OP_32_64(ld8u):
 +        CASE_OP_32_64(ld16s):
          CASE_OP_32_64(ld16u):
 +        case INDEX_op_ld32s_i64:
          case INDEX_op_ld32u_i64:
              done = fold_tcg_ld(&ctx, op);
              break;
 --
-.25.1
+.34.1
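
Note on the tcg/optimize.c hunks above: s_mask is described as a left-aligned mask of the redundant sign bits of a value, and the patch derives it either from a constant (smask_from_value) or from a known-zeros mask (smask_from_zmask). The sketch below restates those two helpers as a standalone program fragment so the arithmetic can be checked in isolation. It assumes a GCC or Clang compiler: clrsb64_sketch and clz64_sketch are hypothetical wrappers around the compiler builtins, standing in for QEMU's clrsb64() and clz64() from host-utils.

```c
#include <stdint.h>

/* Redundant sign bits: copies of the msb below the msb itself (0..63). */
static int clrsb64_sketch(uint64_t v)
{
    return __builtin_clrsbll((long long)v);
}

/* Leading zero count, defined as 64 for a zero input. */
static int clz64_sketch(uint64_t v)
{
    return v ? __builtin_clzll(v) : 64;
}

/* Mask of the redundant sign bits of a known constant. */
static uint64_t smask_from_value(uint64_t value)
{
    int rep = clrsb64_sketch(value);
    return ~(~0ull >> rep);
}

/*
 * Mask of redundant sign bits implied by a known-zeros mask.
 * If the msb of zmask is set, nothing is known about the sign.
 * Otherwise the leading zeros form an unsigned field, and the
 * corresponding signed field is one bit wider, hence rep - 1.
 */
static uint64_t smask_from_zmask(uint64_t zmask)
{
    int rep = clz64_sketch(zmask);
    if (rep == 0) {
        return 0;
    }
    rep -= 1;
    return ~(~0ull >> rep);
}
```

As a cross-check against the fold_tcg_ld hunk: an 8-bit unsigned load has z_mask = MAKE_64BIT_MASK(0, 8) = 0xff, and smask_from_zmask(0xff) yields bits 9..63, which equals the MAKE_64BIT_MASK(9, 55) that the patch assigns for ld8u.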

-[PULL 03/56] host-utils: move udiv_qrnnd() to host-utils
+[PULL 16/20] plugins: Replace pr_ops with a proper debug dump flag
-From: Luis Pires <luis.pires@eldorado.org.br>
+The DEBUG_PLUGIN_GEN_OPS ifdef is replaced with "-d op_plugin".
 The second pr_ops call can be obtained with "-d op".
-Move udiv_qrnnd() from include/fpu/softfloat-macros.h to host-utils,
+Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
 so it can be reused by divu128().
 Signed-off-by: Luis Pires <luis.pires@eldorado.org.br>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-Id: <20211025191154.350831-3-luis.pires@eldorado.org.br>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- include/fpu/softfloat-macros.h | 82 ----------------------------------
+ include/qemu/log.h     |  1 +
- include/qemu/host-utils.h      | 81 +++++++++++++++++++++++++++++++++
+ include/tcg/tcg.h      |  1 +
-files changed, 81 insertions(+), 82 deletions(-)
+ accel/tcg/plugin-gen.c | 67 +++++++-----------------------------------
  tcg/tcg.c              | 29 +++++++++++++++++-
  util/log.c             |  4 +++
 files changed, 45 insertions(+), 57 deletions(-)
-diff --git a/include/fpu/softfloat-macros.h b/include/fpu/softfloat-macros.h
+diff --git a/include/qemu/log.h b/include/qemu/log.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/fpu/softfloat-macros.h
+--- a/include/qemu/log.h
-+++ b/include/fpu/softfloat-macros.h
++++ b/include/qemu/log.h
@@ -XXX,XX +XXX,XX @@ bool qemu_log_separate(void);
  #define LOG_STRACE         (1 << 19)
  #define LOG_PER_THREAD     (1 << 20)
  #define CPU_LOG_TB_VPU     (1 << 21)
 +#define LOG_TB_OP_PLUGIN   (1 << 22)
  /* Lock/unlock output. */
 diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
 index XXXXXXX..XXXXXXX 100644
 --- a/include/tcg/tcg.h
 +++ b/include/tcg/tcg.h
@@ -XXX,XX +XXX,XX @@ static inline const TCGOpcode *tcg_swap_vecop_list(const TCGOpcode *n)
  }
  bool tcg_can_emit_vecop_list(const TCGOpcode *, TCGType, unsigned);
 +void tcg_dump_ops(TCGContext *s, FILE *f, bool have_prefs);
  #endif /* TCG_H */
 diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
 index XXXXXXX..XXXXXXX 100644
 --- a/accel/tcg/plugin-gen.c
 +++ b/accel/tcg/plugin-gen.c
 @@ -XXX,XX +XXX,XX @@
-  * so some portions are provided under:
-  *  the SoftFloat-2a license
-  *  the BSD license
-- *  GPL-v2-or-later
-  *
-  * Any future contributions to this file after December 1st 2014 will be
-  * taken to be licensed under the Softfloat-2a license unless specifically
-@@ -XXX,XX +XXX,XX @@ this code that are retained.
-  * THE POSSIBILITY OF SUCH DAMAGE.
   */
+ #include "qemu/osdep.h"
--/* Portions of this work are licensed under the terms of the GNU GPL,
+ #include "qemu/plugin.h"
-- * version 2 or later. See the COPYING file in the top-level directory.
++#include "qemu/log.h"
-- */
+ #include "cpu.h"
  #include "tcg/tcg.h"
  #include "tcg/tcg-temp-internal.h"
@@ -XXX,XX +XXX,XX @@ static void gen_mem_cb(struct qemu_plugin_dyn_cb *cb,
      tcg_temp_free_i32(cpu_index);
  }
 -/* #define DEBUG_PLUGIN_GEN_OPS */
 -static void pr_ops(void)
 -{
 -#ifdef DEBUG_PLUGIN_GEN_OPS
 -    TCGOp *op;
 -    int i = 0;
 -
- #ifndef FPU_SOFTFLOAT_MACROS_H
+-    QTAILQ_FOREACH(op, &tcg_ctx->ops, link) {
- #define FPU_SOFTFLOAT_MACROS_H
+-        const char *name = "";
+-        const char *type = "";
@@ -XXX,XX +XXX,XX @@ static inline uint64_t estimateDiv128To64(uint64_t a0, uint64_t a1, uint64_t b)
  }
 -/* From the GNU Multi Precision Library - longlong.h __udiv_qrnnd
 - * (https://gmplib.org/repo/gmp/file/tip/longlong.h)
 - *
 - * Licensed under the GPLv2/LGPLv3
 - */
 -static inline uint64_t udiv_qrnnd(uint64_t *r, uint64_t n1,
 -                                  uint64_t n0, uint64_t d)
 -{
 -#if defined(__x86_64__)
 -    uint64_t q;
 -    asm("divq %4" : "=a"(q), "=d"(*r) : "0"(n0), "1"(n1), "rm"(d));
 -    return q;
 -#elif defined(__s390x__) && !defined(__clang__)
 -    /* Need to use a TImode type to get an even register pair for DLGR.  */
 -    unsigned __int128 n = (unsigned __int128)n1 << 64 | n0;
 -    asm("dlgr %0, %1" : "+r"(n) : "r"(d));
 -    *r = n >> 64;
 -    return n;
 -#elif defined(_ARCH_PPC64) && defined(_ARCH_PWR7)
 -    /* From Power ISA 2.06, programming note for divdeu.  */
 -    uint64_t q1, q2, Q, r1, r2, R;
 -    asm("divdeu %0,%2,%4; divdu %1,%3,%4"
 -        : "=&r"(q1), "=r"(q2)
 -        : "r"(n1), "r"(n0), "r"(d));
 -    r1 = -(q1 * d);         /* low part of (n1<<64) - (q1 * d) */
 -    r2 = n0 - (q2 * d);
 -    Q = q1 + q2;
 -    R = r1 + r2;
 -    if (R >= d || R < r2) { /* overflow implies R > d */
 -        Q += 1;
 -        R -= d;
 -    }
 -    *r = R;
 -    return Q;
 -#else
 -    uint64_t d0, d1, q0, q1, r1, r0, m;
 -
--    d0 = (uint32_t)d;
+-        if (op->opc == INDEX_op_plugin_cb_start) {
--    d1 = d >> 32;
+-            switch (op->args[0]) {
--
+-            case PLUGIN_GEN_FROM_TB:
--    r1 = n1 % d1;
+-                name = "tb";
--    q1 = n1 / d1;
+-                break;
--    m = q1 * d0;
+-            case PLUGIN_GEN_FROM_INSN:
--    r1 = (r1 << 32) | (n0 >> 32);
+-                name = "insn";
--    if (r1 < m) {
+-                break;
--        q1 -= 1;
+-            case PLUGIN_GEN_FROM_MEM:
--        r1 += d;
+-                name = "mem";
--        if (r1 >= d) {
+-                break;
--            if (r1 < m) {
+-            case PLUGIN_GEN_AFTER_INSN:
--                q1 -= 1;
+-                name = "after insn";
--                r1 += d;
+-                break;
 -            default:
 -                break;
 -            }
 -            switch (op->args[1]) {
 -            case PLUGIN_GEN_CB_UDATA:
 -                type = "udata";
 -                break;
 -            case PLUGIN_GEN_CB_INLINE:
 -                type = "inline";
 -                break;
 -            case PLUGIN_GEN_CB_MEM:
 -                type = "mem";
 -                break;
 -            case PLUGIN_GEN_ENABLE_MEM_HELPER:
 -                type = "enable mem helper";
 -                break;
 -            case PLUGIN_GEN_DISABLE_MEM_HELPER:
 -                type = "disable mem helper";
 -                break;
 -            default:
 -                break;
 -            }
 -        }
+-        printf("op[%2i]: %s %s %s\n", i, tcg_op_defs[op->opc].name, name, type);
+-        i++;
 -    }
--    r1 -= m;
--
--    r0 = r1 % d1;
--    q0 = r1 / d1;
--    m = q0 * d0;
--    r0 = (r0 << 32) | (uint32_t)n0;
--    if (r0 < m) {
--        q0 -= 1;
--        r0 += d;
--        if (r0 >= d) {
--            if (r0 < m) {
--                q0 -= 1;
--                r0 += d;
--            }
--        }
--    }
--    r0 -= m;
--
--    *r = r0;
--    return (q1 << 32) | q0;
 -#endif
 -}
 -
- /*----------------------------------------------------------------------------
+ static void plugin_gen_inject(struct qemu_plugin_tb *plugin_tb)
- | Returns an approximation to the square root of the 32-bit significand given
+ {
- | by `a'.  Considered as an integer, `a' must be at least 2^31.  If bit 0 of
+     TCGOp *op, *next;
-diff --git a/include/qemu/host-utils.h b/include/qemu/host-utils.h
+     int insn_idx = -1;
-index XXXXXXX..XXXXXXX 100644
---- a/include/qemu/host-utils.h
+-    pr_ops();
-+++ b/include/qemu/host-utils.h
++    if (unlikely(qemu_loglevel_mask(LOG_TB_OP_PLUGIN)
-@@ -XXX,XX +XXX,XX @@
++                 && qemu_log_in_addr_range(plugin_tb->vaddr))) {
-  * THE SOFTWARE.
++        FILE *logfile = qemu_log_trylock();
-  */
++        if (logfile) {
++            fprintf(logfile, "OP before plugin injection:\n");
-+/* Portions of this work are licensed under the terms of the GNU GPL,
++            tcg_dump_ops(tcg_ctx, logfile, false);
-+ * version 2 or later. See the COPYING file in the top-level directory.
++            fprintf(logfile, "\n");
-+ */
++            qemu_log_unlock(logfile);
 +
  #ifndef HOST_UTILS_H
  #define HOST_UTILS_H
@@ -XXX,XX +XXX,XX @@ void urshift(uint64_t *plow, uint64_t *phigh, int32_t shift);
   */
  void ulshift(uint64_t *plow, uint64_t *phigh, int32_t shift, bool *overflow);
 +/* From the GNU Multi Precision Library - longlong.h __udiv_qrnnd
 + * (https://gmplib.org/repo/gmp/file/tip/longlong.h)
 + *
 + * Licensed under the GPLv2/LGPLv3
 + */
 +static inline uint64_t udiv_qrnnd(uint64_t *r, uint64_t n1,
 +                                  uint64_t n0, uint64_t d)
 +{
 +#if defined(__x86_64__)
 +    uint64_t q;
 +    asm("divq %4" : "=a"(q), "=d"(*r) : "0"(n0), "1"(n1), "rm"(d));
 +    return q;
 +#elif defined(__s390x__) && !defined(__clang__)
 +    /* Need to use a TImode type to get an even register pair for DLGR.  */
 +    unsigned __int128 n = (unsigned __int128)n1 << 64 | n0;
 +    asm("dlgr %0, %1" : "+r"(n) : "r"(d));
 +    *r = n >> 64;
 +    return n;
 +#elif defined(_ARCH_PPC64) && defined(_ARCH_PWR7)
 +    /* From Power ISA 2.06, programming note for divdeu.  */
 +    uint64_t q1, q2, Q, r1, r2, R;
 +    asm("divdeu %0,%2,%4; divdu %1,%3,%4"
 +        : "=&r"(q1), "=r"(q2)
 +        : "r"(n1), "r"(n0), "r"(d));
 +    r1 = -(q1 * d);         /* low part of (n1<<64) - (q1 * d) */
 +    r2 = n0 - (q2 * d);
 +    Q = q1 + q2;
 +    R = r1 + r2;
 +    if (R >= d || R < r2) { /* overflow implies R > d */
 +        Q += 1;
 +        R -= d;
 +    }
 +    *r = R;
 +    return Q;
 +#else
 +    uint64_t d0, d1, q0, q1, r1, r0, m;
 +
 +    d0 = (uint32_t)d;
 +    d1 = d >> 32;
 +
 +    r1 = n1 % d1;
 +    q1 = n1 / d1;
 +    m = q1 * d0;
 +    r1 = (r1 << 32) | (n0 >> 32);
 +    if (r1 < m) {
 +        q1 -= 1;
 +        r1 += d;
 +        if (r1 >= d) {
 +            if (r1 < m) {
 +                q1 -= 1;
 +                r1 += d;
 +            }
 +        }
 +    }
-+    r1 -= m;
      /*
       * While injecting code, we cannot afford to reuse any ebb temps
@@ -XXX,XX +XXX,XX @@ static void plugin_gen_inject(struct qemu_plugin_tb *plugin_tb)
              break;
          }
      }
 -    pr_ops();
  }
  bool plugin_gen_tb_start(CPUState *cpu, const DisasContextBase *db,
 diff --git a/tcg/tcg.c b/tcg/tcg.c
 index XXXXXXX..XXXXXXX 100644
 --- a/tcg/tcg.c
 +++ b/tcg/tcg.c
@@ -XXX,XX +XXX,XX @@ static const char bswap_flag_name[][6] = {
      [TCG_BSWAP_IZ | TCG_BSWAP_OS] = "iz,os",
  };
 +#ifdef CONFIG_PLUGIN
 +static const char * const plugin_from_name[] = {
 +    "from-tb",
 +    "from-insn",
 +    "after-insn",
 +    "after-tb",
 +};
 +#endif
 +
-+    r0 = r1 % d1;
+ static inline bool tcg_regset_single(TCGRegSet d)
-+    q0 = r1 / d1;
+ {
-+    m = q0 * d0;
+     return (d & (d - 1)) == 0;
-+    r0 = (r0 << 32) | (uint32_t)n0;
+@@ -XXX,XX +XXX,XX @@ static inline TCGReg tcg_regset_first(TCGRegSet d)
-+    if (r0 < m) {
+ #define ne_fprintf(...) \
-+        q0 -= 1;
+     ({ int ret_ = fprintf(__VA_ARGS__); ret_ >= 0 ? ret_ : 0; })
-+        r0 += d;
-+        if (r0 >= d) {
+-static void tcg_dump_ops(TCGContext *s, FILE *f, bool have_prefs)
-+            if (r0 < m) {
++void tcg_dump_ops(TCGContext *s, FILE *f, bool have_prefs)
-+                q0 -= 1;
+ {
-+                r0 += d;
+     char buf[128];
-+            }
+     TCGOp *op;
-+        }
+@@ -XXX,XX +XXX,XX @@ static void tcg_dump_ops(TCGContext *s, FILE *f, bool have_prefs)
-+    }
+                     i = k = 1;
-+    r0 -= m;
+                 }
                  break;
 +#ifdef CONFIG_PLUGIN
 +            case INDEX_op_plugin_cb:
 +                {
 +                    TCGArg from = op->args[k++];
 +                    const char *name = NULL;
 +
-+    *r = r0;
++                    if (from < ARRAY_SIZE(plugin_from_name)) {
-+    return (q1 << 32) | q0;
++                        name = plugin_from_name[from];
 +                    }
 +                    if (name) {
 +                        col += ne_fprintf(f, "%s", name);
 +                    } else {
 +                        col += ne_fprintf(f, "$0x%" TCG_PRIlx, from);
 +                    }
 +                    i = 1;
 +                }
 +                break;
 +#endif
-+}
+             default:
-+
+                 i = 0;
- #endif
+                 break;
 diff --git a/util/log.c b/util/log.c
 index XXXXXXX..XXXXXXX 100644
 --- a/util/log.c
 +++ b/util/log.c
@@ -XXX,XX +XXX,XX @@ const QEMULogItem qemu_log_items[] = {
        "show micro ops after optimization" },
      { CPU_LOG_TB_OP_IND, "op_ind",
        "show micro ops before indirect lowering" },
 +#ifdef CONFIG_PLUGIN
 +    { LOG_TB_OP_PLUGIN, "op_plugin",
 +      "show micro ops before plugin injection" },
 +#endif
      { CPU_LOG_INT, "int",
        "show interrupts/exceptions in short format" },
      { CPU_LOG_EXEC, "exec",
 --
-.25.1
+.34.1
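
Reviewer note on the two mechanisms this patch adds: a log-mask bit that gates the dump, and a bounds-checked name table for the plugin_cb "from" argument. This is a minimal standalone sketch, not the QEMU code. All sketch_* names are hypothetical, the log level is passed in explicitly rather than read from global state, and the array indices are assumed to line up with the plugin "from" enumeration.

```c
#include <assert.h>
#include <stddef.h>

/* Bit value taken from the patch: LOG_TB_OP_PLUGIN is (1 << 22). */
#define SKETCH_LOG_TB_OP_PLUGIN (1 << 22)
#define SKETCH_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Same strings, in the same order, as plugin_from_name[] in the patch. */
static const char *const sketch_plugin_from_name[] = {
    "from-tb",
    "from-insn",
    "after-insn",
    "after-tb",
};

/* Hypothetical stand-in for a log-mask test: true if any bit of mask is set. */
static inline int sketch_loglevel_mask(int loglevel, int mask)
{
    assert(mask != 0); /* a zero mask could never match */
    return (loglevel & mask) != 0;
}

/*
 * Bounds-checked lookup. Returns NULL for values outside the table so
 * that a caller can fall back to printing the raw number, as the dump
 * code in the patch does with its "$0x..." format.
 */
static inline const char *sketch_plugin_from_lookup(unsigned long from)
{
    if (from < SKETCH_ARRAY_SIZE(sketch_plugin_from_name)) {
        return sketch_plugin_from_name[from];
    }
    return NULL;
}
```

Per the commit message, the dump itself is requested with "-d op_plugin". The sketch covers only the mask test and the name lookup, not the address-range filter or the log locking.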

-[PULL 05/56] host-utils: add unit tests for divu128/divs128
+Deleted patch
-From: Luis Pires <luis.pires@eldorado.org.br>
-Signed-off-by: Luis Pires <luis.pires@eldorado.org.br>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-Id: <20211025191154.350831-5-luis.pires@eldorado.org.br>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
----
- tests/unit/test-div128.c | 197 +++++++++++++++++++++++++++++++++++++++
- tests/unit/meson.build   |   1 +
-files changed, 198 insertions(+)
- create mode 100644 tests/unit/test-div128.c
-diff --git a/tests/unit/test-div128.c b/tests/unit/test-div128.c
-new file mode 100644
-index XXXXXXX..XXXXXXX
---- /dev/null
-+++ b/tests/unit/test-div128.c
-@@ -XXX,XX +XXX,XX @@
-+/*
-+ * Test 128-bit division functions
-+ *
-+ * Copyright (c) 2021 Instituto de Pesquisas Eldorado (eldorado.org.br)
-+ *
-+ * This library is free software; you can redistribute it and/or
-+ * modify it under the terms of the GNU Lesser General Public
-+ * License as published by the Free Software Foundation; either
-+ * version 2.1 of the License, or (at your option) any later version.
-+ *
-+ * This library is distributed in the hope that it will be useful,
-+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
-+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-+ * Lesser General Public License for more details.
-+ *
-+ * You should have received a copy of the GNU Lesser General Public
-+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
-+ */
-+
-+#include "qemu/osdep.h"
-+#include "qemu/host-utils.h"
-+
-+typedef struct {
-+    uint64_t high;
-+    uint64_t low;
-+    uint64_t rhigh;
-+    uint64_t rlow;
-+    uint64_t divisor;
-+    uint64_t remainder;
-+} test_data_unsigned;
-+
-+typedef struct {
-+    int64_t high;
-+    uint64_t low;
-+    int64_t rhigh;
-+    uint64_t rlow;
-+    int64_t divisor;
-+    int64_t remainder;
-+} test_data_signed;
-+
-+static const test_data_unsigned test_table_unsigned[] = {
-+    /* Dividend fits in 64 bits */
-+    { 0x0000000000000000ULL, 0x0000000000000000ULL,
-+      0x0000000000000000ULL, 0x0000000000000000ULL,
-+      0x0000000000000001ULL, 0x0000000000000000ULL},
-+    { 0x0000000000000000ULL, 0x0000000000000001ULL,
-+      0x0000000000000000ULL, 0x0000000000000001ULL,
-+      0x0000000000000001ULL, 0x0000000000000000ULL},
-+    { 0x0000000000000000ULL, 0x0000000000000003ULL,
-+      0x0000000000000000ULL, 0x0000000000000001ULL,
-+      0x0000000000000002ULL, 0x0000000000000001ULL},
-+    { 0x0000000000000000ULL, 0x8000000000000000ULL,
-+      0x0000000000000000ULL, 0x8000000000000000ULL,
-+      0x0000000000000001ULL, 0x0000000000000000ULL},
-+    { 0x0000000000000000ULL, 0xa000000000000000ULL,
-+      0x0000000000000000ULL, 0x0000000000000002ULL,
-+      0x4000000000000000ULL, 0x2000000000000000ULL},
-+    { 0x0000000000000000ULL, 0x8000000000000000ULL,
-+      0x0000000000000000ULL, 0x0000000000000001ULL,
-+      0x8000000000000000ULL, 0x0000000000000000ULL},
-+
-+    /* Dividend > 64 bits, with MSB 0 */
-+    { 0x123456789abcdefeULL, 0xefedcba987654321ULL,
-+      0x123456789abcdefeULL, 0xefedcba987654321ULL,
-+      0x0000000000000001ULL, 0x0000000000000000ULL},
-+    { 0x123456789abcdefeULL, 0xefedcba987654321ULL,
-+      0x0000000000000001ULL, 0x000000000000000dULL,
-+      0x123456789abcdefeULL, 0x03456789abcdf03bULL},
-+    { 0x123456789abcdefeULL, 0xefedcba987654321ULL,
-+      0x0123456789abcdefULL, 0xeefedcba98765432ULL,
-+      0x0000000000000010ULL, 0x0000000000000001ULL},
-+
-+    /* Dividend > 64 bits, with MSB 1 */
-+    { 0xfeeddccbbaa99887ULL, 0x766554433221100fULL,
-+      0xfeeddccbbaa99887ULL, 0x766554433221100fULL,
-+      0x0000000000000001ULL, 0x0000000000000000ULL},
-+    { 0xfeeddccbbaa99887ULL, 0x766554433221100fULL,
-+      0x0000000000000001ULL, 0x0000000000000000ULL,
-+      0xfeeddccbbaa99887ULL, 0x766554433221100fULL},
-+    { 0xfeeddccbbaa99887ULL, 0x766554433221100fULL,
-+      0x0feeddccbbaa9988ULL, 0x7766554433221100ULL,
-+      0x0000000000000010ULL, 0x000000000000000fULL},
-+    { 0xfeeddccbbaa99887ULL, 0x766554433221100fULL,
-+      0x000000000000000eULL, 0x00f0f0f0f0f0f35aULL,
-+      0x123456789abcdefeULL, 0x0f8922bc55ef90c3ULL},
-+
-+    /**
-+     * Divisor == 64 bits, with MSB 1
-+     * and high 64 bits of dividend >= divisor
-+     * (for testing normalization)
-+     */
-+    { 0xfeeddccbbaa99887ULL, 0x766554433221100fULL,
-+      0x0000000000000001ULL, 0x0000000000000000ULL,
-+      0xfeeddccbbaa99887ULL, 0x766554433221100fULL},
-+    { 0xfeeddccbbaa99887ULL, 0x766554433221100fULL,
-+      0x0000000000000001ULL, 0xfddbb9977553310aULL,
-+      0x8000000000000001ULL, 0x78899aabbccddf05ULL},
-+
-+    /* Dividend > 64 bits, divisor almost as big */
-+    { 0x0000000000000001ULL, 0x23456789abcdef01ULL,
-+      0x0000000000000000ULL, 0x000000000000000fULL,
-+      0x123456789abcdefeULL, 0x123456789abcde1fULL},
-+};
-+
-+static const test_data_signed test_table_signed[] = {
-+    /* Positive dividend, positive/negative divisors */
-+    { 0x0000000000000000LL, 0x0000000000bc614eULL,
-+      0x0000000000000000LL, 0x0000000000bc614eULL,
-+      0x0000000000000001LL, 0x0000000000000000LL},
-+    { 0x0000000000000000LL, 0x0000000000bc614eULL,
-+      0xffffffffffffffffLL, 0xffffffffff439eb2ULL,
-+      0xffffffffffffffffLL, 0x0000000000000000LL},
-+    { 0x0000000000000000LL, 0x0000000000bc614eULL,
-+      0x0000000000000000LL, 0x00000000005e30a7ULL,
-+      0x0000000000000002LL, 0x0000000000000000LL},
-+    { 0x0000000000000000LL, 0x0000000000bc614eULL,
-+      0xffffffffffffffffLL, 0xffffffffffa1cf59ULL,
-+      0xfffffffffffffffeLL, 0x0000000000000000LL},
-+    { 0x0000000000000000LL, 0x0000000000bc614eULL,
-+      0x0000000000000000LL, 0x0000000000178c29ULL,
-+      0x0000000000000008LL, 0x0000000000000006LL},
-+    { 0x0000000000000000LL, 0x0000000000bc614eULL,
-+      0xffffffffffffffffLL, 0xffffffffffe873d7ULL,
-+      0xfffffffffffffff8LL, 0x0000000000000006LL},
-+    { 0x0000000000000000LL, 0x0000000000bc614eULL,
-+      0x0000000000000000LL, 0x000000000000550dULL,
-+      0x0000000000000237LL, 0x0000000000000183LL},
-+    { 0x0000000000000000LL, 0x0000000000bc614eULL,
-+      0xffffffffffffffffLL, 0xffffffffffffaaf3ULL,
-+      0xfffffffffffffdc9LL, 0x0000000000000183LL},
-+
-+    /* Negative dividend, positive/negative divisors */
-+    { 0xffffffffffffffffLL, 0xffffffffff439eb2ULL,
-+      0xffffffffffffffffLL, 0xffffffffff439eb2ULL,
-+      0x0000000000000001LL, 0x0000000000000000LL},
-+    { 0xffffffffffffffffLL, 0xffffffffff439eb2ULL,
-+      0x0000000000000000LL, 0x0000000000bc614eULL,
-+      0xffffffffffffffffLL, 0x0000000000000000LL},
-+    { 0xffffffffffffffffLL, 0xffffffffff439eb2ULL,
-+      0xffffffffffffffffLL, 0xffffffffffa1cf59ULL,
-+      0x0000000000000002LL, 0x0000000000000000LL},
-+    { 0xffffffffffffffffLL, 0xffffffffff439eb2ULL,
-+      0x0000000000000000LL, 0x00000000005e30a7ULL,
-+      0xfffffffffffffffeLL, 0x0000000000000000LL},
-+    { 0xffffffffffffffffLL, 0xffffffffff439eb2ULL,
-+      0xffffffffffffffffLL, 0xffffffffffe873d7ULL,
-+      0x0000000000000008LL, 0xfffffffffffffffaLL},
-+    { 0xffffffffffffffffLL, 0xffffffffff439eb2ULL,
-+      0x0000000000000000LL, 0x0000000000178c29ULL,
-+      0xfffffffffffffff8LL, 0xfffffffffffffffaLL},
-+    { 0xffffffffffffffffLL, 0xffffffffff439eb2ULL,
-+      0xffffffffffffffffLL, 0xffffffffffffaaf3ULL,
-+      0x0000000000000237LL, 0xfffffffffffffe7dLL},
-+    { 0xffffffffffffffffLL, 0xffffffffff439eb2ULL,
-+      0x0000000000000000LL, 0x000000000000550dULL,
-+      0xfffffffffffffdc9LL, 0xfffffffffffffe7dLL},
-+};
-+
-+static void test_divu128(void)
-+{
-+    int i;
-+    uint64_t rem;
-+    test_data_unsigned tmp;
-+
-+    for (i = 0; i < ARRAY_SIZE(test_table_unsigned); ++i) {
-+        tmp = test_table_unsigned[i];
-+
-+        rem = divu128(&tmp.low, &tmp.high, tmp.divisor);
-+        g_assert_cmpuint(tmp.low, ==, tmp.rlow);
-+        g_assert_cmpuint(tmp.high, ==, tmp.rhigh);
-+        g_assert_cmpuint(rem, ==, tmp.remainder);
-+    }
-+}
-+
-+static void test_divs128(void)
-+{
-+    int i;
-+    int64_t rem;
-+    test_data_signed tmp;
-+
-+    for (i = 0; i < ARRAY_SIZE(test_table_signed); ++i) {
-+        tmp = test_table_signed[i];
-+
-+        rem = divs128(&tmp.low, &tmp.high, tmp.divisor);
-+        g_assert_cmpuint(tmp.low, ==, tmp.rlow);
-+        g_assert_cmpuint(tmp.high, ==, tmp.rhigh);
-+        g_assert_cmpuint(rem, ==, tmp.remainder);
-+    }
-+}
-+
-+int main(int argc, char **argv)
-+{
-+    g_test_init(&argc, &argv, NULL);
-+    g_test_add_func("/host-utils/test_divu128", test_divu128);
-+    g_test_add_func("/host-utils/test_divs128", test_divs128);
-+    return g_test_run();
-+}
-diff --git a/tests/unit/meson.build b/tests/unit/meson.build
-index XXXXXXX..XXXXXXX 100644
---- a/tests/unit/meson.build
-+++ b/tests/unit/meson.build
-@@ -XXX,XX +XXX,XX @@ tests = {
-   # all code tested by test-x86-cpuid is inside topology.h
-   'test-x86-cpuid': [],
-   'test-cutils': [],
-+  'test-div128': [],
-   'test-shift128': [],
-   'test-mul64': [],
-   # all code tested by test-int128 is inside int128.h
---
-.25.1

-[PULL 08/56] tcg/optimize: Remove do_default label
+Deleted patch
-Break the final cleanup clause out of the main switch
-statement.  When fully folding an opcode to mov/movi,
-use "continue" to process the next opcode, else break
-to fall into the final cleanup.
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
-Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
----
- tcg/optimize.c | 190 ++++++++++++++++++++++++-------------------------
-file changed, 94 insertions(+), 96 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
-index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
-+++ b/tcg/optimize.c
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-         switch (opc) {
-         CASE_OP_32_64_VEC(mov):
-             tcg_opt_gen_mov(s, op, op->args[0], op->args[1]);
--            break;
-+            continue;
-         case INDEX_op_dup_vec:
-             if (arg_is_const(op->args[1])) {
-                 tmp = arg_info(op->args[1])->val;
-                 tmp = dup_const(TCGOP_VECE(op), tmp);
-                 tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
--                break;
-+                continue;
-             }
--            goto do_default;
-+            break;
-         case INDEX_op_dup2_vec:
-             assert(TCG_TARGET_REG_BITS == 32);
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-                 tcg_opt_gen_movi(s, &ctx, op, op->args[0],
-                                  deposit64(arg_info(op->args[1])->val, 32, 32,
-                                            arg_info(op->args[2])->val));
--                break;
-+                continue;
-             } else if (args_are_copies(op->args[1], op->args[2])) {
-                 op->opc = INDEX_op_dup_vec;
-                 TCGOP_VECE(op) = MO_32;
-                 nb_iargs = 1;
-             }
--            goto do_default;
-+            break;
-         CASE_OP_32_64(not):
-         CASE_OP_32_64(neg):
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-             if (arg_is_const(op->args[1])) {
-                 tmp = do_constant_folding(opc, arg_info(op->args[1])->val, 0);
-                 tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
--                break;
-+                continue;
-             }
--            goto do_default;
-+            break;
-         CASE_OP_32_64(bswap16):
-         CASE_OP_32_64(bswap32):
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-                 tmp = do_constant_folding(opc, arg_info(op->args[1])->val,
-                                           op->args[2]);
-                 tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
--                break;
-+                continue;
-             }
--            goto do_default;
-+            break;
-         CASE_OP_32_64(add):
-         CASE_OP_32_64(sub):
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-                 tmp = do_constant_folding(opc, arg_info(op->args[1])->val,
-                                           arg_info(op->args[2])->val);
-                 tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
--                break;
-+                continue;
-             }
--            goto do_default;
-+            break;
-         CASE_OP_32_64(clz):
-         CASE_OP_32_64(ctz):
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-                 } else {
-                     tcg_opt_gen_mov(s, op, op->args[0], op->args[2]);
-                 }
--                break;
-+                continue;
-             }
--            goto do_default;
-+            break;
-         CASE_OP_32_64(deposit):
-             if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-                                 op->args[3], op->args[4],
-                                 arg_info(op->args[2])->val);
-                 tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
--                break;
-+                continue;
-             }
--            goto do_default;
-+            break;
-         CASE_OP_32_64(extract):
-             if (arg_is_const(op->args[1])) {
-                 tmp = extract64(arg_info(op->args[1])->val,
-                                 op->args[2], op->args[3]);
-                 tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
--                break;
-+                continue;
-             }
--            goto do_default;
-+            break;
-         CASE_OP_32_64(sextract):
-             if (arg_is_const(op->args[1])) {
-                 tmp = sextract64(arg_info(op->args[1])->val,
-                                  op->args[2], op->args[3]);
-                 tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
--                break;
-+                continue;
-             }
--            goto do_default;
-+            break;
-         CASE_OP_32_64(extract2):
-             if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-                                     ((uint32_t)v2 << (32 - shr)));
-                 }
-                 tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
--                break;
-+                continue;
-             }
--            goto do_default;
-+            break;
-         CASE_OP_32_64(setcond):
-             tmp = do_constant_folding_cond(opc, op->args[1],
-                                            op->args[2], op->args[3]);
-             if (tmp != 2) {
-                 tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
--                break;
-+                continue;
-             }
--            goto do_default;
-+            break;
-         CASE_OP_32_64(brcond):
-             tmp = do_constant_folding_cond(opc, op->args[0],
-                                            op->args[1], op->args[2]);
--            if (tmp != 2) {
--                if (tmp) {
--                    memset(&ctx.temps_used, 0, sizeof(ctx.temps_used));
--                    op->opc = INDEX_op_br;
--                    op->args[0] = op->args[3];
--                } else {
--                    tcg_op_remove(s, op);
--                }
-+            switch (tmp) {
-+            case 0:
-+                tcg_op_remove(s, op);
-+                continue;
-+            case 1:
-+                memset(&ctx.temps_used, 0, sizeof(ctx.temps_used));
-+                op->opc = opc = INDEX_op_br;
-+                op->args[0] = op->args[3];
-                 break;
-             }
--            goto do_default;
-+            break;
-         CASE_OP_32_64(movcond):
-             tmp = do_constant_folding_cond(opc, op->args[1],
-                                            op->args[2], op->args[5]);
-             if (tmp != 2) {
-                 tcg_opt_gen_mov(s, op, op->args[0], op->args[4-tmp]);
--                break;
-+                continue;
-             }
-             if (arg_is_const(op->args[3]) && arg_is_const(op->args[4])) {
-                 uint64_t tv = arg_info(op->args[3])->val;
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-                 if (fv == 1 && tv == 0) {
-                     cond = tcg_invert_cond(cond);
-                 } else if (!(tv == 1 && fv == 0)) {
--                    goto do_default;
-+                    break;
-                 }
-                 op->args[3] = cond;
-                 op->opc = opc = (opc == INDEX_op_movcond_i32
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-                                  : INDEX_op_setcond_i64);
-                 nb_iargs = 2;
-             }
--            goto do_default;
-+            break;
-         case INDEX_op_add2_i32:
-         case INDEX_op_sub2_i32:
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-                 rh = op->args[1];
-                 tcg_opt_gen_movi(s, &ctx, op, rl, (int32_t)a);
-                 tcg_opt_gen_movi(s, &ctx, op2, rh, (int32_t)(a >> 32));
--                break;
-+                continue;
-             }
--            goto do_default;
-+            break;
-         case INDEX_op_mulu2_i32:
-             if (arg_is_const(op->args[2]) && arg_is_const(op->args[3])) {
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-                 rh = op->args[1];
-                 tcg_opt_gen_movi(s, &ctx, op, rl, (int32_t)r);
-                 tcg_opt_gen_movi(s, &ctx, op2, rh, (int32_t)(r >> 32));
--                break;
-+                continue;
-             }
--            goto do_default;
-+            break;
-         case INDEX_op_brcond2_i32:
-             tmp = do_constant_folding_cond2(&op->args[0], &op->args[2],
-                                             op->args[4]);
--            if (tmp != 2) {
--                if (tmp) {
--            do_brcond_true:
--                    memset(&ctx.temps_used, 0, sizeof(ctx.temps_used));
--                    op->opc = INDEX_op_br;
--                    op->args[0] = op->args[5];
--                } else {
-+            if (tmp == 0) {
-             do_brcond_false:
--                    tcg_op_remove(s, op);
--                }
--            } else if ((op->args[4] == TCG_COND_LT
--                        || op->args[4] == TCG_COND_GE)
--                       && arg_is_const(op->args[2])
--                       && arg_info(op->args[2])->val == 0
--                       && arg_is_const(op->args[3])
--                       && arg_info(op->args[3])->val == 0) {
-+                tcg_op_remove(s, op);
-+                continue;
-+            }
-+            if (tmp == 1) {
-+            do_brcond_true:
-+                op->opc = opc = INDEX_op_br;
-+                op->args[0] = op->args[5];
-+                break;
-+            }
-+            if ((op->args[4] == TCG_COND_LT || op->args[4] == TCG_COND_GE)
-+                 && arg_is_const(op->args[2])
-+                 && arg_info(op->args[2])->val == 0
-+                 && arg_is_const(op->args[3])
-+                 && arg_info(op->args[3])->val == 0) {
-                 /* Simplify LT/GE comparisons vs zero to a single compare
-                    vs the high word of the input.  */
-             do_brcond_high:
--                memset(&ctx.temps_used, 0, sizeof(ctx.temps_used));
--                op->opc = INDEX_op_brcond_i32;
-+                op->opc = opc = INDEX_op_brcond_i32;
-                 op->args[0] = op->args[1];
-                 op->args[1] = op->args[3];
-                 op->args[2] = op->args[4];
-                 op->args[3] = op->args[5];
--            } else if (op->args[4] == TCG_COND_EQ) {
-+                break;
-+            }
-+            if (op->args[4] == TCG_COND_EQ) {
-                 /* Simplify EQ comparisons where one of the pairs
-                    can be simplified.  */
-                 tmp = do_constant_folding_cond(INDEX_op_brcond_i32,
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-                 if (tmp == 0) {
-                     goto do_brcond_false;
-                 } else if (tmp != 1) {
--                    goto do_default;
-+                    break;
-                 }
-             do_brcond_low:
-                 memset(&ctx.temps_used, 0, sizeof(ctx.temps_used));
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-                 op->args[1] = op->args[2];
-                 op->args[2] = op->args[4];
-                 op->args[3] = op->args[5];
--            } else if (op->args[4] == TCG_COND_NE) {
-+                break;
-+            }
-+            if (op->args[4] == TCG_COND_NE) {
-                 /* Simplify NE comparisons where one of the pairs
-                    can be simplified.  */
-                 tmp = do_constant_folding_cond(INDEX_op_brcond_i32,
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-                 } else if (tmp == 1) {
-                     goto do_brcond_true;
-                 }
--                goto do_default;
--            } else {
--                goto do_default;
-             }
-             break;
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-             if (tmp != 2) {
-             do_setcond_const:
-                 tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
--            } else if ((op->args[5] == TCG_COND_LT
--                        || op->args[5] == TCG_COND_GE)
--                       && arg_is_const(op->args[3])
--                       && arg_info(op->args[3])->val == 0
--                       && arg_is_const(op->args[4])
--                       && arg_info(op->args[4])->val == 0) {
-+                continue;
-+            }
-+            if ((op->args[5] == TCG_COND_LT || op->args[5] == TCG_COND_GE)
-+                 && arg_is_const(op->args[3])
-+                 && arg_info(op->args[3])->val == 0
-+                 && arg_is_const(op->args[4])
-+                 && arg_info(op->args[4])->val == 0) {
-                 /* Simplify LT/GE comparisons vs zero to a single compare
-                    vs the high word of the input.  */
-             do_setcond_high:
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-                 op->args[1] = op->args[2];
-                 op->args[2] = op->args[4];
-                 op->args[3] = op->args[5];
--            } else if (op->args[5] == TCG_COND_EQ) {
-+                break;
-+            }
-+            if (op->args[5] == TCG_COND_EQ) {
-                 /* Simplify EQ comparisons where one of the pairs
-                    can be simplified.  */
-                 tmp = do_constant_folding_cond(INDEX_op_setcond_i32,
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-                 if (tmp == 0) {
-                     goto do_setcond_high;
-                 } else if (tmp != 1) {
--                    goto do_default;
-+                    break;
-                 }
-             do_setcond_low:
-                 reset_temp(op->args[0]);
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-                 op->opc = INDEX_op_setcond_i32;
-                 op->args[2] = op->args[3];
-                 op->args[3] = op->args[5];
--            } else if (op->args[5] == TCG_COND_NE) {
-+                break;
-+            }
-+            if (op->args[5] == TCG_COND_NE) {
-                 /* Simplify NE comparisons where one of the pairs
-                    can be simplified.  */
-                 tmp = do_constant_folding_cond(INDEX_op_setcond_i32,
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-                 } else if (tmp == 1) {
-                     goto do_setcond_const;
-                 }
--                goto do_default;
--            } else {
--                goto do_default;
-             }
-             break;
--        case INDEX_op_call:
--            if (!(tcg_call_flags(op)
-+        default:
-+            break;
-+        }
-+
-+        /* Some of the folding above can change opc. */
-+        opc = op->opc;
-+        def = &tcg_op_defs[opc];
-+        if (def->flags & TCG_OPF_BB_END) {
-+            memset(&ctx.temps_used, 0, sizeof(ctx.temps_used));
-+        } else {
-+            if (opc == INDEX_op_call &&
-+                !(tcg_call_flags(op)
-                   & (TCG_CALL_NO_READ_GLOBALS | TCG_CALL_NO_WRITE_GLOBALS))) {
-                 for (i = 0; i < nb_globals; i++) {
-                     if (test_bit(i, ctx.temps_used.l)) {
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-                     }
-                 }
-             }
--            goto do_reset_output;
--        default:
--        do_default:
--            /* Default case: we know nothing about operation (or were unable
--               to compute the operation result) so no propagation is done.
--               We trash everything if the operation is the end of a basic
--               block, otherwise we only trash the output args.  "z_mask" is
--               the non-zero bits mask for the first output arg.  */
--            if (def->flags & TCG_OPF_BB_END) {
--                memset(&ctx.temps_used, 0, sizeof(ctx.temps_used));
--            } else {
--        do_reset_output:
--                for (i = 0; i < nb_oargs; i++) {
--                    reset_temp(op->args[i]);
--                    /* Save the corresponding known-zero bits mask for the
--                       first output argument (only one supported so far). */
--                    if (i == 0) {
--                        arg_info(op->args[i])->z_mask = z_mask;
--                    }
-+            for (i = 0; i < nb_oargs; i++) {
-+                reset_temp(op->args[i]);
-+                /* Save the corresponding known-zero bits mask for the
-+                   first output argument (only one supported so far). */
-+                if (i == 0) {
-+                    arg_info(op->args[i])->z_mask = z_mask;
-                 }
-             }
--            break;
-         }
-         /* Eliminate duplicate and redundant fence instructions.  */
---
-.25.1
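Note on the brcond2/setcond2 hunks above: the "LT/GE comparisons vs zero" special case works because the sign of a double-word value lives entirely in its high word. A minimal sketch of that identity in plain C rather than TCG ops follows; the helper names `lt_zero_64`, `lt_zero_high` and `check_samples` are hypothetical and do not appear in QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Reference: signed "less than zero" evaluated on the full 64-bit value. */
static bool lt_zero_64(uint32_t lo, uint32_t hi)
{
    uint64_t v = ((uint64_t)hi << 32) | lo;
    return (v >> 63) != 0;          /* the sign bit is bit 63 */
}

/* Simplified form: only the high word is examined. */
static bool lt_zero_high(uint32_t lo, uint32_t hi)
{
    (void)lo;                       /* the low word cannot affect the sign */
    return (hi >> 31) != 0;
}

/* Compare both forms over a few boundary values for each word. */
static void check_samples(void)
{
    static const uint32_t w[] = { 0, 1, 0x7fffffffu, 0x80000000u, 0xffffffffu };
    const size_t n = sizeof(w) / sizeof(w[0]);

    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            assert(lt_zero_64(w[i], w[j]) == lt_zero_high(w[i], w[j]));
        }
    }
}
```

GE is the negation of LT, so the same reduction to a single 32-bit compare applies to it.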

-[PULL 10/56] tcg/optimize: Move prev_mb into OptContext
+Deleted patch
-This will expose the variable to subroutines that
-will be broken out of tcg_optimize.
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
-Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
----
- tcg/optimize.c | 11 ++++++-----
- 1 file changed, 6 insertions(+), 5 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
-index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
-+++ b/tcg/optimize.c
-@@ -XXX,XX +XXX,XX @@ typedef struct TempOptInfo {
- typedef struct OptContext {
-     TCGContext *tcg;
-+    TCGOp *prev_mb;
-     TCGTempSet temps_used;
- } OptContext;
-@@ -XXX,XX +XXX,XX @@ static bool swap_commutative2(TCGArg *p1, TCGArg *p2)
- void tcg_optimize(TCGContext *s)
- {
-     int nb_temps, nb_globals, i;
--    TCGOp *op, *op_next, *prev_mb = NULL;
-+    TCGOp *op, *op_next;
-     OptContext ctx = { .tcg = s };
-     /* Array VALS has an element for each temp.
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-         }
-         /* Eliminate duplicate and redundant fence instructions.  */
--        if (prev_mb) {
-+        if (ctx.prev_mb) {
-             switch (opc) {
-             case INDEX_op_mb:
-                 /* Merge two barriers of the same type into one,
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-                  * barrier.  This is stricter than specified but for
-                  * the purposes of TCG is better than not optimizing.
-                  */
--                prev_mb->args[0] |= op->args[0];
-+                ctx.prev_mb->args[0] |= op->args[0];
-                 tcg_op_remove(s, op);
-                 break;
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-             case INDEX_op_qemu_st_i64:
-             case INDEX_op_call:
-                 /* Opcodes that touch guest memory stop the optimization.  */
--                prev_mb = NULL;
-+                ctx.prev_mb = NULL;
-                 break;
-             }
-         } else if (opc == INDEX_op_mb) {
--            prev_mb = op;
-+            ctx.prev_mb = op;
-         }
-     }
- }
---
-.25.1
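Note on the patch above: moving `prev_mb` from a loop-local into the context struct is what lets a split-out helper read and update it. The sketch below illustrates the idea with hypothetical stand-in types (`Op`, `Ctx`) and a hypothetical helper `track_barrier`; it is a simplification, not the QEMU code, and it ignores the end-of-basic-block handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for TCGOp. */
typedef struct Op {
    bool is_barrier;        /* a memory-barrier op */
    bool touches_memory;    /* a load, store or call */
    unsigned barrier_type;  /* bitmask of barrier kinds */
    bool removed;           /* set when the op has been merged away */
} Op;

/* Hypothetical stand-in for OptContext. */
typedef struct Ctx {
    Op *prev_mb;            /* last barrier still eligible for merging */
} Ctx;

/* A helper like this needs prev_mb to live in the context. */
static void track_barrier(Ctx *ctx, Op *op)
{
    assert(ctx != NULL && op != NULL);
    if (ctx->prev_mb) {
        if (op->is_barrier) {
            /* Merge into the earlier barrier and drop this one. */
            ctx->prev_mb->barrier_type |= op->barrier_type;
            op->removed = true;
        } else if (op->touches_memory) {
            /* Memory accesses stop the merging window. */
            ctx->prev_mb = NULL;
        }
    } else if (op->is_barrier) {
        ctx->prev_mb = op;
    }
}

/* Driver loop: the state is carried by ctx, not by a local. */
static void run(Op *ops, size_t n)
{
    Ctx ctx = { .prev_mb = NULL };

    for (size_t i = 0; i < n; i++) {
        track_barrier(&ctx, &ops[i]);
    }
}
```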

-[PULL 12/56] tcg/optimize: Split out copy_propagate
+Deleted patch
-Continue splitting tcg_optimize.
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
-Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
----
- tcg/optimize.c | 22 ++++++++++++++--------
- 1 file changed, 14 insertions(+), 8 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
-index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
-+++ b/tcg/optimize.c
-@@ -XXX,XX +XXX,XX @@ static void init_arguments(OptContext *ctx, TCGOp *op, int nb_args)
-     }
- }
-+static void copy_propagate(OptContext *ctx, TCGOp *op,
-+                           int nb_oargs, int nb_iargs)
-+{
-+    TCGContext *s = ctx->tcg;
-+
-+    for (int i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
-+        TCGTemp *ts = arg_temp(op->args[i]);
-+        if (ts && ts_is_copy(ts)) {
-+            op->args[i] = temp_arg(find_better_copy(s, ts));
-+        }
-+    }
-+}
-+
- /* Propagate constants and copies, fold constant expressions. */
- void tcg_optimize(TCGContext *s)
- {
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-             nb_iargs = def->nb_iargs;
-         }
-         init_arguments(&ctx, op, nb_oargs + nb_iargs);
--
--        /* Do copy propagation */
--        for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
--            TCGTemp *ts = arg_temp(op->args[i]);
--            if (ts && ts_is_copy(ts)) {
--                op->args[i] = temp_arg(find_better_copy(s, ts));
--            }
--        }
-+        copy_propagate(&ctx, op, nb_oargs, nb_iargs);
-         /* For commutative operations make constant second argument */
-         switch (opc) {
---
-.25.1
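Note on the patch above: copy propagation rewrites only the input arguments of an op, which sit after the outputs in the argument array. A minimal sketch under simplifying assumptions: temps are plain integer indices, and a hypothetical `copy_of` table stands in for QEMU's `ts_is_copy`/`find_better_copy`, which select a preferred temp from a set of known copies.

```c
#include <assert.h>

#define NO_COPY (-1)

/*
 * copy_of[t] names the temp that t is known to duplicate, or NO_COPY.
 * This table is a hypothetical simplification of the optimizer's state.
 */
static void propagate_copies(int *args, int nb_oargs, int nb_iargs,
                             const int *copy_of)
{
    assert(nb_oargs >= 0 && nb_iargs >= 0);

    /* Outputs occupy args[0 .. nb_oargs-1]; only inputs are rewritten. */
    for (int i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
        int src = copy_of[args[i]];
        if (src != NO_COPY) {
            args[i] = src;
        }
    }
}
```

Passing the two counts explicitly, as the patch does, keeps the helper usable for calls, whose argument counts are not taken from the opcode definition.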

-[PULL 13/56] tcg/optimize: Split out fold_call
+Deleted patch
-Calls are special in that they have a variable number
-of arguments, and need to be able to clobber globals.
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
-Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
----
- tcg/optimize.c | 63 ++++++++++++++++++++++++++++++++------------------
- 1 file changed, 41 insertions(+), 22 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
-index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
-+++ b/tcg/optimize.c
-@@ -XXX,XX +XXX,XX @@ static void copy_propagate(OptContext *ctx, TCGOp *op,
-     }
- }
-+static bool fold_call(OptContext *ctx, TCGOp *op)
-+{
-+    TCGContext *s = ctx->tcg;
-+    int nb_oargs = TCGOP_CALLO(op);
-+    int nb_iargs = TCGOP_CALLI(op);
-+    int flags, i;
-+
-+    init_arguments(ctx, op, nb_oargs + nb_iargs);
-+    copy_propagate(ctx, op, nb_oargs, nb_iargs);
-+
-+    /* If the function reads or writes globals, reset temp data. */
-+    flags = tcg_call_flags(op);
-+    if (!(flags & (TCG_CALL_NO_READ_GLOBALS | TCG_CALL_NO_WRITE_GLOBALS))) {
-+        int nb_globals = s->nb_globals;
-+
-+        for (i = 0; i < nb_globals; i++) {
-+            if (test_bit(i, ctx->temps_used.l)) {
-+                reset_ts(&ctx->tcg->temps[i]);
-+            }
-+        }
-+    }
-+
-+    /* Reset temp data for outputs. */
-+    for (i = 0; i < nb_oargs; i++) {
-+        reset_temp(op->args[i]);
-+    }
-+
-+    /* Stop optimizing MB across calls. */
-+    ctx->prev_mb = NULL;
-+    return true;
-+}
-+
- /* Propagate constants and copies, fold constant expressions. */
- void tcg_optimize(TCGContext *s)
- {
--    int nb_temps, nb_globals, i;
-+    int nb_temps, i;
-     TCGOp *op, *op_next;
-     OptContext ctx = { .tcg = s };
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-        available through the doubly linked circular list. */
-     nb_temps = s->nb_temps;
--    nb_globals = s->nb_globals;
--
-     for (i = 0; i < nb_temps; ++i) {
-         s->temps[i].state_ptr = NULL;
-     }
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-         uint64_t z_mask, partmask, affected, tmp;
-         int nb_oargs, nb_iargs;
-         TCGOpcode opc = op->opc;
--        const TCGOpDef *def = &tcg_op_defs[opc];
-+        const TCGOpDef *def;
--        /* Count the arguments, and initialize the temps that are
--           going to be used */
-+        /* Calls are special. */
-         if (opc == INDEX_op_call) {
--            nb_oargs = TCGOP_CALLO(op);
--            nb_iargs = TCGOP_CALLI(op);
--        } else {
--            nb_oargs = def->nb_oargs;
--            nb_iargs = def->nb_iargs;
-+            fold_call(&ctx, op);
-+            continue;
-         }
-+
-+        def = &tcg_op_defs[opc];
-+        nb_oargs = def->nb_oargs;
-+        nb_iargs = def->nb_iargs;
-         init_arguments(&ctx, op, nb_oargs + nb_iargs);
-         copy_propagate(&ctx, op, nb_oargs, nb_iargs);
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-         if (def->flags & TCG_OPF_BB_END) {
-             memset(&ctx.temps_used, 0, sizeof(ctx.temps_used));
-         } else {
--            if (opc == INDEX_op_call &&
--                !(tcg_call_flags(op)
--                  & (TCG_CALL_NO_READ_GLOBALS | TCG_CALL_NO_WRITE_GLOBALS))) {
--                for (i = 0; i < nb_globals; i++) {
--                    if (test_bit(i, ctx.temps_used.l)) {
--                        reset_ts(&s->temps[i]);
--                    }
--                }
--            }
--
-             for (i = 0; i < nb_oargs; i++) {
-                 reset_temp(op->args[i]);
-                 /* Save the corresponding known-zero bits mask for the
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-             case INDEX_op_qemu_st_i32:
-             case INDEX_op_qemu_st8_i32:
-             case INDEX_op_qemu_st_i64:
--            case INDEX_op_call:
-                 /* Opcodes that touch guest memory stop the optimization.  */
-                 ctx.prev_mb = NULL;
-                 break;
---
-.25.1
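Note on the patch above: the effect of a call on the optimizer's knowledge can be sketched as below. The flag values, the `TempInfo` struct and `after_call` are hypothetical stand-ins. The condition mirrors the patch (globals are forgotten only when neither flag is set), but the `temps_used` bitmap test is omitted for brevity.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values, for illustration only. */
#define CALL_NO_READ_GLOBALS  0x1
#define CALL_NO_WRITE_GLOBALS 0x2

/* Hypothetical per-temp optimizer state. */
typedef struct TempInfo {
    bool known;   /* true while a constant or copy is tracked */
    long val;
} TempInfo;

/*
 * Globals occupy temps[0 .. nb_globals-1].  outs[] lists the temp
 * indices written by the call.
 */
static void after_call(TempInfo *temps, int nb_globals,
                       const int *outs, int nb_oargs, int flags)
{
    assert(nb_globals >= 0 && nb_oargs >= 0);

    /* If the callee may touch globals, forget what is known about them. */
    if (!(flags & (CALL_NO_READ_GLOBALS | CALL_NO_WRITE_GLOBALS))) {
        for (int i = 0; i < nb_globals; i++) {
            temps[i].known = false;
        }
    }

    /* The outputs are always overwritten. */
    for (int i = 0; i < nb_oargs; i++) {
        temps[outs[i]].known = false;
    }
}
```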

-[PULL 14/56] tcg/optimize: Drop nb_oargs, nb_iargs locals
+Deleted patch
-Rather than try to keep these up-to-date across folding,
-re-read nb_oargs at the end, after re-reading the opcode.
-A couple of asserts need dropping, but that will take care
-of itself as we split the function further.
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
-Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
----
- tcg/optimize.c | 14 ++++----------
- 1 file changed, 4 insertions(+), 10 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
-index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
-+++ b/tcg/optimize.c
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-     QTAILQ_FOREACH_SAFE(op, &s->ops, link, op_next) {
-         uint64_t z_mask, partmask, affected, tmp;
--        int nb_oargs, nb_iargs;
-         TCGOpcode opc = op->opc;
-         const TCGOpDef *def;
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-         }
-         def = &tcg_op_defs[opc];
--        nb_oargs = def->nb_oargs;
--        nb_iargs = def->nb_iargs;
--        init_arguments(&ctx, op, nb_oargs + nb_iargs);
--        copy_propagate(&ctx, op, nb_oargs, nb_iargs);
-+        init_arguments(&ctx, op, def->nb_oargs + def->nb_iargs);
-+        copy_propagate(&ctx, op, def->nb_oargs, def->nb_iargs);
-         /* For commutative operations make constant second argument */
-         switch (opc) {
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-         CASE_OP_32_64(qemu_ld):
-             {
--                MemOpIdx oi = op->args[nb_oargs + nb_iargs];
-+                MemOpIdx oi = op->args[def->nb_oargs + def->nb_iargs];
-                 MemOp mop = get_memop(oi);
-                 if (!(mop & MO_SIGN)) {
-                     z_mask = (2ULL << ((8 << (mop & MO_SIZE)) - 1)) - 1;
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-         }
-         if (partmask == 0) {
--            tcg_debug_assert(nb_oargs == 1);
-             tcg_opt_gen_movi(&ctx, op, op->args[0], 0);
-             continue;
-         }
-         if (affected == 0) {
--            tcg_debug_assert(nb_oargs == 1);
-             tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[1]);
-             continue;
-         }
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-             } else if (args_are_copies(op->args[1], op->args[2])) {
-                 op->opc = INDEX_op_dup_vec;
-                 TCGOP_VECE(op) = MO_32;
--                nb_iargs = 1;
-             }
-             break;
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-                 op->opc = opc = (opc == INDEX_op_movcond_i32
-                                  ? INDEX_op_setcond_i32
-                                  : INDEX_op_setcond_i64);
--                nb_iargs = 2;
-             }
-             break;
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-         if (def->flags & TCG_OPF_BB_END) {
-             memset(&ctx.temps_used, 0, sizeof(ctx.temps_used));
-         } else {
-+            int nb_oargs = def->nb_oargs;
-             for (i = 0; i < nb_oargs; i++) {
-                 reset_temp(op->args[i]);
-                 /* Save the corresponding known-zero bits mask for the
---
-.25.1
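Note on the `qemu_ld` hunk above: the expression `(2ULL << ((8 << (mop & MO_SIZE)) - 1)) - 1` builds the mask of bits that can be non-zero after a zero-extending load. A worked sketch follows, assuming the size field encodes log2 of the access size in bytes, as the `8 << size` term implies; the function name `zero_ext_mask` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* size_log2: 0 = 8-bit, 1 = 16-bit, 2 = 32-bit, 3 = 64-bit access. */
static uint64_t zero_ext_mask(unsigned size_log2)
{
    assert(size_log2 <= 3);

    unsigned bits = 8u << size_log2;

    /*
     * Written as 2 << (bits - 1) rather than 1 << bits so that the
     * shift count stays below 64 for a 64-bit access.  For bits == 64
     * the shift wraps to 0, and subtracting 1 gives all ones.
     */
    return (2ULL << (bits - 1)) - 1;
}
```

The four cases evaluate to 0xff, 0xffff, 0xffffffff and 0xffffffffffffffff.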

-[PULL 15/56] tcg/optimize: Change fail return for do_constant_folding_cond*
+Deleted patch
-Return -1 instead of 2 for failure, so that we can
-use comparisons against 0 for all cases.
-Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
----
- tcg/optimize.c | 145 +++++++++++++++++++++++++------------------------
- 1 file changed, 74 insertions(+), 71 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
-index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
-+++ b/tcg/optimize.c
-@@ -XXX,XX +XXX,XX @@ static bool do_constant_folding_cond_eq(TCGCond c)
-     }
- }
--/* Return 2 if the condition can't be simplified, and the result
--   of the condition (0 or 1) if it can */
--static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
--                                       TCGArg y, TCGCond c)
-+/*
-+ * Return -1 if the condition can't be simplified,
-+ * and the result of the condition (0 or 1) if it can.
-+ */
-+static int do_constant_folding_cond(TCGOpcode op, TCGArg x,
-+                                    TCGArg y, TCGCond c)
- {
-     uint64_t xv = arg_info(x)->val;
-     uint64_t yv = arg_info(y)->val;
-@@ -XXX,XX +XXX,XX @@ static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
-         case TCG_COND_GEU:
-             return 1;
-         default:
--            return 2;
-+            return -1;
-         }
-     }
--    return 2;
-+    return -1;
- }
--/* Return 2 if the condition can't be simplified, and the result
--   of the condition (0 or 1) if it can */
--static TCGArg do_constant_folding_cond2(TCGArg *p1, TCGArg *p2, TCGCond c)
-+/*
-+ * Return -1 if the condition can't be simplified,
-+ * and the result of the condition (0 or 1) if it can.
-+ */
-+static int do_constant_folding_cond2(TCGArg *p1, TCGArg *p2, TCGCond c)
- {
-     TCGArg al = p1[0], ah = p1[1];
-     TCGArg bl = p2[0], bh = p2[1];
-@@ -XXX,XX +XXX,XX @@ static TCGArg do_constant_folding_cond2(TCGArg *p1, TCGArg *p2, TCGCond c)
-     if (args_are_copies(al, bl) && args_are_copies(ah, bh)) {
-         return do_constant_folding_cond_eq(c);
-     }
--    return 2;
-+    return -1;
- }
- static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2)
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-             break;
-         CASE_OP_32_64(setcond):
--            tmp = do_constant_folding_cond(opc, op->args[1],
--                                           op->args[2], op->args[3]);
--            if (tmp != 2) {
--                tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
-+            i = do_constant_folding_cond(opc, op->args[1],
-+                                         op->args[2], op->args[3]);
-+            if (i >= 0) {
-+                tcg_opt_gen_movi(&ctx, op, op->args[0], i);
-                 continue;
-             }
-             break;
-         CASE_OP_32_64(brcond):
--            tmp = do_constant_folding_cond(opc, op->args[0],
--                                           op->args[1], op->args[2]);
--            switch (tmp) {
--            case 0:
-+            i = do_constant_folding_cond(opc, op->args[0],
-+                                         op->args[1], op->args[2]);
-+            if (i == 0) {
-                 tcg_op_remove(s, op);
-                 continue;
--            case 1:
-+            } else if (i > 0) {
-                 memset(&ctx.temps_used, 0, sizeof(ctx.temps_used));
-                 op->opc = opc = INDEX_op_br;
-                 op->args[0] = op->args[3];
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-             break;
-         CASE_OP_32_64(movcond):
--            tmp = do_constant_folding_cond(opc, op->args[1],
--                                           op->args[2], op->args[5]);
--            if (tmp != 2) {
--                tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[4-tmp]);
-+            i = do_constant_folding_cond(opc, op->args[1],
-+                                         op->args[2], op->args[5]);
-+            if (i >= 0) {
-+                tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[4 - i]);
-                 continue;
-             }
-             if (arg_is_const(op->args[3]) && arg_is_const(op->args[4])) {
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-             break;
-         case INDEX_op_brcond2_i32:
--            tmp = do_constant_folding_cond2(&op->args[0], &op->args[2],
--                                            op->args[4]);
--            if (tmp == 0) {
-+            i = do_constant_folding_cond2(&op->args[0], &op->args[2],
-+                                          op->args[4]);
-+            if (i == 0) {
-             do_brcond_false:
-                 tcg_op_remove(s, op);
-                 continue;
-             }
--            if (tmp == 1) {
-+            if (i > 0) {
-             do_brcond_true:
-                 op->opc = opc = INDEX_op_br;
-                 op->args[0] = op->args[5];
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-             if (op->args[4] == TCG_COND_EQ) {
-                 /* Simplify EQ comparisons where one of the pairs
-                    can be simplified.  */
--                tmp = do_constant_folding_cond(INDEX_op_brcond_i32,
--                                               op->args[0], op->args[2],
--                                               TCG_COND_EQ);
--                if (tmp == 0) {
-+                i = do_constant_folding_cond(INDEX_op_brcond_i32,
-+                                             op->args[0], op->args[2],
-+                                             TCG_COND_EQ);
-+                if (i == 0) {
-                     goto do_brcond_false;
--                } else if (tmp == 1) {
-+                } else if (i > 0) {
-                     goto do_brcond_high;
-                 }
--                tmp = do_constant_folding_cond(INDEX_op_brcond_i32,
--                                               op->args[1], op->args[3],
--                                               TCG_COND_EQ);
--                if (tmp == 0) {
-+                i = do_constant_folding_cond(INDEX_op_brcond_i32,
-+                                             op->args[1], op->args[3],
-+                                             TCG_COND_EQ);
-+                if (i == 0) {
-                     goto do_brcond_false;
--                } else if (tmp != 1) {
-+                } else if (i < 0) {
-                     break;
-                 }
-             do_brcond_low:
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-             if (op->args[4] == TCG_COND_NE) {
-                 /* Simplify NE comparisons where one of the pairs
-                    can be simplified.  */
--                tmp = do_constant_folding_cond(INDEX_op_brcond_i32,
--                                               op->args[0], op->args[2],
--                                               TCG_COND_NE);
--                if (tmp == 0) {
-+                i = do_constant_folding_cond(INDEX_op_brcond_i32,
-+                                             op->args[0], op->args[2],
-+                                             TCG_COND_NE);
-+                if (i == 0) {
-                     goto do_brcond_high;
--                } else if (tmp == 1) {
-+                } else if (i > 0) {
-                     goto do_brcond_true;
-                 }
--                tmp = do_constant_folding_cond(INDEX_op_brcond_i32,
--                                               op->args[1], op->args[3],
--                                               TCG_COND_NE);
--                if (tmp == 0) {
-+                i = do_constant_folding_cond(INDEX_op_brcond_i32,
-+                                             op->args[1], op->args[3],
-+                                             TCG_COND_NE);
-+                if (i == 0) {
-                     goto do_brcond_low;
--                } else if (tmp == 1) {
-+                } else if (i > 0) {
-                     goto do_brcond_true;
-                 }
-             }
-             break;
-         case INDEX_op_setcond2_i32:
--            tmp = do_constant_folding_cond2(&op->args[1], &op->args[3],
--                                            op->args[5]);
--            if (tmp != 2) {
-+            i = do_constant_folding_cond2(&op->args[1], &op->args[3],
-+                                          op->args[5]);
-+            if (i >= 0) {
-             do_setcond_const:
--                tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
-+                tcg_opt_gen_movi(&ctx, op, op->args[0], i);
-                 continue;
-             }
-             if ((op->args[5] == TCG_COND_LT || op->args[5] == TCG_COND_GE)
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-             if (op->args[5] == TCG_COND_EQ) {
-                 /* Simplify EQ comparisons where one of the pairs
-                    can be simplified.  */
--                tmp = do_constant_folding_cond(INDEX_op_setcond_i32,
--                                               op->args[1], op->args[3],
--                                               TCG_COND_EQ);
--                if (tmp == 0) {
-+                i = do_constant_folding_cond(INDEX_op_setcond_i32,
-+                                             op->args[1], op->args[3],
-+                                             TCG_COND_EQ);
-+                if (i == 0) {
-                     goto do_setcond_const;
--                } else if (tmp == 1) {
-+                } else if (i > 0) {
-                     goto do_setcond_high;
-                 }
--                tmp = do_constant_folding_cond(INDEX_op_setcond_i32,
--                                               op->args[2], op->args[4],
--                                               TCG_COND_EQ);
--                if (tmp == 0) {
-+                i = do_constant_folding_cond(INDEX_op_setcond_i32,
-+                                             op->args[2], op->args[4],
-+                                             TCG_COND_EQ);
-+                if (i == 0) {
-                     goto do_setcond_high;
--                } else if (tmp != 1) {
-+                } else if (i < 0) {
-                     break;
-                 }
-             do_setcond_low:
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-             if (op->args[5] == TCG_COND_NE) {
-                 /* Simplify NE comparisons where one of the pairs
-                    can be simplified.  */
--                tmp = do_constant_folding_cond(INDEX_op_setcond_i32,
--                                               op->args[1], op->args[3],
--                                               TCG_COND_NE);
--                if (tmp == 0) {
-+                i = do_constant_folding_cond(INDEX_op_setcond_i32,
-+                                             op->args[1], op->args[3],
-+                                             TCG_COND_NE);
-+                if (i == 0) {
-                     goto do_setcond_high;
--                } else if (tmp == 1) {
-+                } else if (i > 0) {
-                     goto do_setcond_const;
-                 }
--                tmp = do_constant_folding_cond(INDEX_op_setcond_i32,
--                                               op->args[2], op->args[4],
--                                               TCG_COND_NE);
--                if (tmp == 0) {
-+                i = do_constant_folding_cond(INDEX_op_setcond_i32,
-+                                             op->args[2], op->args[4],
-+                                             TCG_COND_NE);
-+                if (i == 0) {
-                     goto do_setcond_low;
--                } else if (tmp == 1) {
-+                } else if (i > 0) {
-                     goto do_setcond_const;
-                 }
-             }
---
-.25.1
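Note on the patch above: with -1 as the failure value, every caller can classify the result with a comparison against zero (`i < 0` unknown, `i == 0` false, `i > 0` true, `i >= 0` folded). A minimal sketch of that convention; the `Cond` enum, the `Arg` struct and both function names are hypothetical, and only three conditions are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { COND_EQ, COND_NE, COND_LTU } Cond;   /* hypothetical subset */

typedef struct {
    bool is_const;   /* value known at optimization time */
    uint64_t val;
} Arg;

/* Return -1 if the condition cannot be folded, else its result (0 or 1). */
static int fold_cond(Arg x, Arg y, Cond c)
{
    if (!x.is_const || !y.is_const) {
        return -1;
    }
    switch (c) {
    case COND_EQ:
        return x.val == y.val;
    case COND_NE:
        return x.val != y.val;
    case COND_LTU:
        return x.val < y.val;
    }
    return -1;
}

/*
 * Caller pattern from the movcond case: a folded result selects
 * argument index 4 - i, so 3 when true and 4 when false.
 * Returns -1 when nothing could be folded.
 */
static int pick_movcond_source(Arg x, Arg y, Cond c)
{
    int i = fold_cond(x, y, c);

    assert(i >= -1 && i <= 1);
    return i >= 0 ? 4 - i : -1;
}
```

Switching the return type from `TCGArg` (unsigned) to `int` is what makes the negative sentinel testable with `< 0`.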

-[PULL 20/56] tcg/optimize: Split out fold_const{1,2}
+Deleted patch
-Split out a whole bunch of placeholder functions, which are
-currently identical.  That won't last as more code gets moved.
-Use CASE_32_64_VEC for some logical operators that previously
-missed the addition of vectors.
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
-Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
----
- tcg/optimize.c | 271 +++++++++++++++++++++++++++++++++++++++----------
- 1 file changed, 219 insertions(+), 52 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
-index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
-+++ b/tcg/optimize.c
-@@ -XXX,XX +XXX,XX @@ static void finish_folding(OptContext *ctx, TCGOp *op)
-     }
- }
-+/*
-+ * The fold_* functions return true when processing is complete,
-+ * usually by folding the operation to a constant or to a copy,
-+ * and calling tcg_opt_gen_{mov,movi}.  They may do other things,
-+ * like collect information about the value produced, for use in
-+ * optimizing a subsequent operation.
-+ *
-+ * These first fold_* functions are all helpers, used by other
-+ * folders for more specific operations.
-+ */
-+
-+static bool fold_const1(OptContext *ctx, TCGOp *op)
-+{
-+    if (arg_is_const(op->args[1])) {
-+        uint64_t t;
-+
-+        t = arg_info(op->args[1])->val;
-+        t = do_constant_folding(op->opc, t, 0);
-+        return tcg_opt_gen_movi(ctx, op, op->args[0], t);
-+    }
-+    return false;
-+}
-+
-+static bool fold_const2(OptContext *ctx, TCGOp *op)
-+{
-+    if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
-+        uint64_t t1 = arg_info(op->args[1])->val;
-+        uint64_t t2 = arg_info(op->args[2])->val;
-+
-+        t1 = do_constant_folding(op->opc, t1, t2);
-+        return tcg_opt_gen_movi(ctx, op, op->args[0], t1);
-+    }
-+    return false;
-+}
-+
-+/*
-+ * These outermost fold_<op> functions are sorted alphabetically.
-+ */
-+
-+static bool fold_add(OptContext *ctx, TCGOp *op)
-+{
-+    return fold_const2(ctx, op);
-+}
-+
-+static bool fold_and(OptContext *ctx, TCGOp *op)
-+{
-+    return fold_const2(ctx, op);
-+}
-+
-+static bool fold_andc(OptContext *ctx, TCGOp *op)
-+{
-+    return fold_const2(ctx, op);
-+}
-+
- static bool fold_call(OptContext *ctx, TCGOp *op)
- {
-     TCGContext *s = ctx->tcg;
-@@ -XXX,XX +XXX,XX @@ static bool fold_call(OptContext *ctx, TCGOp *op)
-     return true;
- }
-+static bool fold_ctpop(OptContext *ctx, TCGOp *op)
-+{
-+    return fold_const1(ctx, op);
-+}
-+
-+static bool fold_divide(OptContext *ctx, TCGOp *op)
-+{
-+    return fold_const2(ctx, op);
-+}
-+
-+static bool fold_eqv(OptContext *ctx, TCGOp *op)
-+{
-+    return fold_const2(ctx, op);
-+}
-+
-+static bool fold_exts(OptContext *ctx, TCGOp *op)
-+{
-+    return fold_const1(ctx, op);
-+}
-+
-+static bool fold_extu(OptContext *ctx, TCGOp *op)
-+{
-+    return fold_const1(ctx, op);
-+}
-+
- static bool fold_mb(OptContext *ctx, TCGOp *op)
- {
-     /* Eliminate duplicate and redundant fence instructions.  */
-@@ -XXX,XX +XXX,XX @@ static bool fold_mb(OptContext *ctx, TCGOp *op)
-     return true;
- }
-+static bool fold_mul(OptContext *ctx, TCGOp *op)
-+{
-+    return fold_const2(ctx, op);
-+}
-+
-+static bool fold_mul_highpart(OptContext *ctx, TCGOp *op)
-+{
-+    return fold_const2(ctx, op);
-+}
-+
-+static bool fold_nand(OptContext *ctx, TCGOp *op)
-+{
-+    return fold_const2(ctx, op);
-+}
-+
-+static bool fold_neg(OptContext *ctx, TCGOp *op)
-+{
-+    return fold_const1(ctx, op);
-+}
-+
-+static bool fold_nor(OptContext *ctx, TCGOp *op)
-+{
-+    return fold_const2(ctx, op);
-+}
-+
-+static bool fold_not(OptContext *ctx, TCGOp *op)
-+{
-+    return fold_const1(ctx, op);
-+}
-+
-+static bool fold_or(OptContext *ctx, TCGOp *op)
-+{
-+    return fold_const2(ctx, op);
-+}
-+
-+static bool fold_orc(OptContext *ctx, TCGOp *op)
-+{
-+    return fold_const2(ctx, op);
-+}
-+
- static bool fold_qemu_ld(OptContext *ctx, TCGOp *op)
- {
-     /* Opcodes that touch guest memory stop the mb optimization.  */
-@@ -XXX,XX +XXX,XX @@ static bool fold_qemu_st(OptContext *ctx, TCGOp *op)
-     return false;
- }
-+static bool fold_remainder(OptContext *ctx, TCGOp *op)
-+{
-+    return fold_const2(ctx, op);
-+}
-+
-+static bool fold_shift(OptContext *ctx, TCGOp *op)
-+{
-+    return fold_const2(ctx, op);
-+}
-+
-+static bool fold_sub(OptContext *ctx, TCGOp *op)
-+{
-+    return fold_const2(ctx, op);
-+}
-+
-+static bool fold_xor(OptContext *ctx, TCGOp *op)
-+{
-+    return fold_const2(ctx, op);
-+}
-+
- /* Propagate constants and copies, fold constant expressions. */
- void tcg_optimize(TCGContext *s)
- {
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-             }
-             break;
--        CASE_OP_32_64(not):
--        CASE_OP_32_64(neg):
--        CASE_OP_32_64(ext8s):
--        CASE_OP_32_64(ext8u):
--        CASE_OP_32_64(ext16s):
--        CASE_OP_32_64(ext16u):
--        CASE_OP_32_64(ctpop):
--        case INDEX_op_ext32s_i64:
--        case INDEX_op_ext32u_i64:
--        case INDEX_op_ext_i32_i64:
--        case INDEX_op_extu_i32_i64:
--        case INDEX_op_extrl_i64_i32:
--        case INDEX_op_extrh_i64_i32:
--            if (arg_is_const(op->args[1])) {
--                tmp = do_constant_folding(opc, arg_info(op->args[1])->val, 0);
--                tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
--                continue;
--            }
--            break;
--
-         CASE_OP_32_64(bswap16):
-         CASE_OP_32_64(bswap32):
-         case INDEX_op_bswap64_i64:
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-             }
-             break;
--        CASE_OP_32_64(add):
--        CASE_OP_32_64(sub):
--        CASE_OP_32_64(mul):
--        CASE_OP_32_64(or):
--        CASE_OP_32_64(and):
--        CASE_OP_32_64(xor):
--        CASE_OP_32_64(shl):
--        CASE_OP_32_64(shr):
--        CASE_OP_32_64(sar):
--        CASE_OP_32_64(rotl):
--        CASE_OP_32_64(rotr):
--        CASE_OP_32_64(andc):
--        CASE_OP_32_64(orc):
--        CASE_OP_32_64(eqv):
--        CASE_OP_32_64(nand):
--        CASE_OP_32_64(nor):
--        CASE_OP_32_64(muluh):
--        CASE_OP_32_64(mulsh):
--        CASE_OP_32_64(div):
--        CASE_OP_32_64(divu):
--        CASE_OP_32_64(rem):
--        CASE_OP_32_64(remu):
--            if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
--                tmp = do_constant_folding(opc, arg_info(op->args[1])->val,
--                                          arg_info(op->args[2])->val);
--                tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
--                continue;
--            }
--            break;
--
-         CASE_OP_32_64(clz):
-         CASE_OP_32_64(ctz):
-             if (arg_is_const(op->args[1])) {
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-             }
-             break;
-+        default:
-+            break;
-+
-+        /* ---------------------------------------------------------- */
-+        /* Sorted alphabetically by opcode as much as possible. */
-+
-+        CASE_OP_32_64_VEC(add):
-+            done = fold_add(&ctx, op);
-+            break;
-+        CASE_OP_32_64_VEC(and):
-+            done = fold_and(&ctx, op);
-+            break;
-+        CASE_OP_32_64_VEC(andc):
-+            done = fold_andc(&ctx, op);
-+            break;
-+        CASE_OP_32_64(ctpop):
-+            done = fold_ctpop(&ctx, op);
-+            break;
-+        CASE_OP_32_64(div):
-+        CASE_OP_32_64(divu):
-+            done = fold_divide(&ctx, op);
-+            break;
-+        CASE_OP_32_64(eqv):
-+            done = fold_eqv(&ctx, op);
-+            break;
-+        CASE_OP_32_64(ext8s):
-+        CASE_OP_32_64(ext16s):
-+        case INDEX_op_ext32s_i64:
-+        case INDEX_op_ext_i32_i64:
-+            done = fold_exts(&ctx, op);
-+            break;
-+        CASE_OP_32_64(ext8u):
-+        CASE_OP_32_64(ext16u):
-+        case INDEX_op_ext32u_i64:
-+        case INDEX_op_extu_i32_i64:
-+        case INDEX_op_extrl_i64_i32:
-+        case INDEX_op_extrh_i64_i32:
-+            done = fold_extu(&ctx, op);
-+            break;
-         case INDEX_op_mb:
-             done = fold_mb(&ctx, op);
-             break;
-+        CASE_OP_32_64(mul):
-+            done = fold_mul(&ctx, op);
-+            break;
-+        CASE_OP_32_64(mulsh):
-+        CASE_OP_32_64(muluh):
-+            done = fold_mul_highpart(&ctx, op);
-+            break;
-+        CASE_OP_32_64(nand):
-+            done = fold_nand(&ctx, op);
-+            break;
-+        CASE_OP_32_64(neg):
-+            done = fold_neg(&ctx, op);
-+            break;
-+        CASE_OP_32_64(nor):
-+            done = fold_nor(&ctx, op);
-+            break;
-+        CASE_OP_32_64_VEC(not):
-+            done = fold_not(&ctx, op);
-+            break;
-+        CASE_OP_32_64_VEC(or):
-+            done = fold_or(&ctx, op);
-+            break;
-+        CASE_OP_32_64_VEC(orc):
-+            done = fold_orc(&ctx, op);
-+            break;
-         case INDEX_op_qemu_ld_i32:
-         case INDEX_op_qemu_ld_i64:
-             done = fold_qemu_ld(&ctx, op);
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-         case INDEX_op_qemu_st_i64:
-             done = fold_qemu_st(&ctx, op);
-             break;
--
--        default:
-+        CASE_OP_32_64(rem):
-+        CASE_OP_32_64(remu):
-+            done = fold_remainder(&ctx, op);
-+            break;
-+        CASE_OP_32_64(rotl):
-+        CASE_OP_32_64(rotr):
-+        CASE_OP_32_64(sar):
-+        CASE_OP_32_64(shl):
-+        CASE_OP_32_64(shr):
-+            done = fold_shift(&ctx, op);
-+            break;
-+        CASE_OP_32_64_VEC(sub):
-+            done = fold_sub(&ctx, op);
-+            break;
-+        CASE_OP_32_64_VEC(xor):
-+            done = fold_xor(&ctx, op);
-             break;
-         }
---
-.25.1
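
The split above gives each opcode a `fold_<op>` wrapper around one of two shared helpers. As a rough standalone model of that shape (not QEMU code: `ModelArg`, `ModelOp`, `model_eval`, `model_fold_const2`, and `model_fold_add` are hypothetical stand-ins for `TCGOp`, `arg_info()`, `do_constant_folding()`, `fold_const2()`, and `fold_add()`), the helper reports completion only when both inputs are known constants:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these types exist in QEMU. */
typedef enum { MODEL_ADD, MODEL_AND, MODEL_XOR } ModelOpc;

typedef struct {
    bool is_const;   /* is the value known at translation time? */
    uint64_t val;    /* meaningful only when is_const is true */
} ModelArg;

typedef struct {
    ModelOpc opc;
    ModelArg args[3];  /* args[0] = output, args[1..2] = inputs */
} ModelOp;

/* Evaluate one binary opcode on two constants. */
static uint64_t model_eval(ModelOpc opc, uint64_t x, uint64_t y)
{
    switch (opc) {
    case MODEL_ADD: return x + y;
    case MODEL_AND: return x & y;
    case MODEL_XOR: return x ^ y;
    }
    return 0;
}

/* Shared helper: fold to a constant when both inputs are constant. */
static bool model_fold_const2(ModelOp *op)
{
    assert(op);
    if (op->args[1].is_const && op->args[2].is_const) {
        op->args[0].val = model_eval(op->opc, op->args[1].val,
                                     op->args[2].val);
        op->args[0].is_const = true;
        return true;   /* processing is complete */
    }
    return false;      /* op is left untouched */
}

/* Per-opcode wrapper: a placeholder until it grows its own logic. */
static bool model_fold_add(ModelOp *op)
{
    return model_fold_const2(op);
}
```

Keeping one wrapper per opcode, even while the wrappers are identical, leaves a place for opcode-specific simplifications to be added without touching the dispatch switch again.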

-[PULL 21/56] tcg/optimize: Split out fold_setcond2
+Deleted patch
-Reduce some code duplication by folding the NE and EQ cases.
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
-Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
----
- tcg/optimize.c | 145 ++++++++++++++++++++++++-------------------------
- 1 file changed, 72 insertions(+), 73 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
-index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
-+++ b/tcg/optimize.c
-@@ -XXX,XX +XXX,XX @@ static bool fold_remainder(OptContext *ctx, TCGOp *op)
-     return fold_const2(ctx, op);
- }
-+static bool fold_setcond2(OptContext *ctx, TCGOp *op)
-+{
-+    TCGCond cond = op->args[5];
-+    int i = do_constant_folding_cond2(&op->args[1], &op->args[3], cond);
-+    int inv = 0;
-+
-+    if (i >= 0) {
-+        goto do_setcond_const;
-+    }
-+
-+    switch (cond) {
-+    case TCG_COND_LT:
-+    case TCG_COND_GE:
-+        /*
-+         * Simplify LT/GE comparisons vs zero to a single compare
-+         * vs the high word of the input.
-+         */
-+        if (arg_is_const(op->args[3]) && arg_info(op->args[3])->val == 0 &&
-+            arg_is_const(op->args[4]) && arg_info(op->args[4])->val == 0) {
-+            goto do_setcond_high;
-+        }
-+        break;
-+
-+    case TCG_COND_NE:
-+        inv = 1;
-+        QEMU_FALLTHROUGH;
-+    case TCG_COND_EQ:
-+        /*
-+         * Simplify EQ/NE comparisons where one of the pairs
-+         * can be simplified.
-+         */
-+        i = do_constant_folding_cond(INDEX_op_setcond_i32, op->args[1],
-+                                     op->args[3], cond);
-+        switch (i ^ inv) {
-+        case 0:
-+            goto do_setcond_const;
-+        case 1:
-+            goto do_setcond_high;
-+        }
-+
-+        i = do_constant_folding_cond(INDEX_op_setcond_i32, op->args[2],
-+                                     op->args[4], cond);
-+        switch (i ^ inv) {
-+        case 0:
-+            goto do_setcond_const;
-+        case 1:
-+            op->args[2] = op->args[3];
-+            op->args[3] = cond;
-+            op->opc = INDEX_op_setcond_i32;
-+            break;
-+        }
-+        break;
-+
-+    default:
-+        break;
-+
-+    do_setcond_high:
-+        op->args[1] = op->args[2];
-+        op->args[2] = op->args[4];
-+        op->args[3] = cond;
-+        op->opc = INDEX_op_setcond_i32;
-+        break;
-+    }
-+    return false;
-+
-+ do_setcond_const:
-+    return tcg_opt_gen_movi(ctx, op, op->args[0], i);
-+}
-+
- static bool fold_shift(OptContext *ctx, TCGOp *op)
- {
-     return fold_const2(ctx, op);
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-             }
-             break;
--        case INDEX_op_setcond2_i32:
--            i = do_constant_folding_cond2(&op->args[1], &op->args[3],
--                                          op->args[5]);
--            if (i >= 0) {
--            do_setcond_const:
--                tcg_opt_gen_movi(&ctx, op, op->args[0], i);
--                continue;
--            }
--            if ((op->args[5] == TCG_COND_LT || op->args[5] == TCG_COND_GE)
--                 && arg_is_const(op->args[3])
--                 && arg_info(op->args[3])->val == 0
--                 && arg_is_const(op->args[4])
--                 && arg_info(op->args[4])->val == 0) {
--                /* Simplify LT/GE comparisons vs zero to a single compare
--                   vs the high word of the input.  */
--            do_setcond_high:
--                reset_temp(op->args[0]);
--                arg_info(op->args[0])->z_mask = 1;
--                op->opc = INDEX_op_setcond_i32;
--                op->args[1] = op->args[2];
--                op->args[2] = op->args[4];
--                op->args[3] = op->args[5];
--                break;
--            }
--            if (op->args[5] == TCG_COND_EQ) {
--                /* Simplify EQ comparisons where one of the pairs
--                   can be simplified.  */
--                i = do_constant_folding_cond(INDEX_op_setcond_i32,
--                                             op->args[1], op->args[3],
--                                             TCG_COND_EQ);
--                if (i == 0) {
--                    goto do_setcond_const;
--                } else if (i > 0) {
--                    goto do_setcond_high;
--                }
--                i = do_constant_folding_cond(INDEX_op_setcond_i32,
--                                             op->args[2], op->args[4],
--                                             TCG_COND_EQ);
--                if (i == 0) {
--                    goto do_setcond_high;
--                } else if (i < 0) {
--                    break;
--                }
--            do_setcond_low:
--                reset_temp(op->args[0]);
--                arg_info(op->args[0])->z_mask = 1;
--                op->opc = INDEX_op_setcond_i32;
--                op->args[2] = op->args[3];
--                op->args[3] = op->args[5];
--                break;
--            }
--            if (op->args[5] == TCG_COND_NE) {
--                /* Simplify NE comparisons where one of the pairs
--                   can be simplified.  */
--                i = do_constant_folding_cond(INDEX_op_setcond_i32,
--                                             op->args[1], op->args[3],
--                                             TCG_COND_NE);
--                if (i == 0) {
--                    goto do_setcond_high;
--                } else if (i > 0) {
--                    goto do_setcond_const;
--                }
--                i = do_constant_folding_cond(INDEX_op_setcond_i32,
--                                             op->args[2], op->args[4],
--                                             TCG_COND_NE);
--                if (i == 0) {
--                    goto do_setcond_low;
--                } else if (i > 0) {
--                    goto do_setcond_const;
--                }
--            }
--            break;
--
-         default:
-             break;
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-         CASE_OP_32_64(shr):
-             done = fold_shift(&ctx, op);
-             break;
-+        case INDEX_op_setcond2_i32:
-+            done = fold_setcond2(&ctx, op);
-+            break;
-         CASE_OP_32_64_VEC(sub):
-             done = fold_sub(&ctx, op);
-             break;
---
-.25.1
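
The merged EQ/NE path depends on the folding result `i` being 0 or 1 when the comparison is decided and negative otherwise (the patch tests `i >= 0` and `i < 0`; the value -1 used below is an assumption). XOR with `inv` then lets one switch serve both conditions. A minimal standalone sketch of the first switch, for the low words, where `classify_low_pair` and the `MODEL_*` names are hypothetical:

```c
#include <assert.h>

typedef enum {
    MODEL_CONST,    /* whole comparison folds to the constant i */
    MODEL_HIGH,     /* result depends only on the high words */
    MODEL_UNKNOWN   /* nothing learned from this pair */
} ModelAction;

/*
 * i:   0 or 1 if the low-word comparison is known, negative otherwise.
 * inv: 0 for EQ, 1 for NE.
 */
static ModelAction classify_low_pair(int i, int inv)
{
    assert(inv == 0 || inv == 1);
    switch (i ^ inv) {
    case 0:
        /* EQ with low words unequal, or NE with low words unequal. */
        return MODEL_CONST;
    case 1:
        /* Low words are equal, so only the high words matter. */
        return MODEL_HIGH;
    default:
        /* -1 ^ 0 == -1 and -1 ^ 1 == -2: neither case matches. */
        return MODEL_UNKNOWN;
    }
}
```

For EQ, a known-false low comparison makes the whole result 0; for NE, a known-true low comparison makes it 1. In both cases `i` already holds the constant to emit, which is why the patch can jump to a single `do_setcond_const` label.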

-[PULL 22/56] tcg/optimize: Split out fold_brcond2
+Deleted patch
-Reduce some code duplication by folding the NE and EQ cases.
-Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
----
- tcg/optimize.c | 159 +++++++++++++++++++++++++------------------------
- 1 file changed, 81 insertions(+), 78 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
-index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
-+++ b/tcg/optimize.c
-@@ -XXX,XX +XXX,XX @@ static bool fold_andc(OptContext *ctx, TCGOp *op)
-     return fold_const2(ctx, op);
- }
-+static bool fold_brcond2(OptContext *ctx, TCGOp *op)
-+{
-+    TCGCond cond = op->args[4];
-+    int i = do_constant_folding_cond2(&op->args[0], &op->args[2], cond);
-+    TCGArg label = op->args[5];
-+    int inv = 0;
-+
-+    if (i >= 0) {
-+        goto do_brcond_const;
-+    }
-+
-+    switch (cond) {
-+    case TCG_COND_LT:
-+    case TCG_COND_GE:
-+        /*
-+         * Simplify LT/GE comparisons vs zero to a single compare
-+         * vs the high word of the input.
-+         */
-+        if (arg_is_const(op->args[2]) && arg_info(op->args[2])->val == 0 &&
-+            arg_is_const(op->args[3]) && arg_info(op->args[3])->val == 0) {
-+            goto do_brcond_high;
-+        }
-+        break;
-+
-+    case TCG_COND_NE:
-+        inv = 1;
-+        QEMU_FALLTHROUGH;
-+    case TCG_COND_EQ:
-+        /*
-+         * Simplify EQ/NE comparisons where one of the pairs
-+         * can be simplified.
-+         */
-+        i = do_constant_folding_cond(INDEX_op_brcond_i32, op->args[0],
-+                                     op->args[2], cond);
-+        switch (i ^ inv) {
-+        case 0:
-+            goto do_brcond_const;
-+        case 1:
-+            goto do_brcond_high;
-+        }
-+
-+        i = do_constant_folding_cond(INDEX_op_brcond_i32, op->args[1],
-+                                     op->args[3], cond);
-+        switch (i ^ inv) {
-+        case 0:
-+            goto do_brcond_const;
-+        case 1:
-+            op->opc = INDEX_op_brcond_i32;
-+            op->args[1] = op->args[2];
-+            op->args[2] = cond;
-+            op->args[3] = label;
-+            break;
-+        }
-+        break;
-+
-+    default:
-+        break;
-+
-+    do_brcond_high:
-+        op->opc = INDEX_op_brcond_i32;
-+        op->args[0] = op->args[1];
-+        op->args[1] = op->args[3];
-+        op->args[2] = cond;
-+        op->args[3] = label;
-+        break;
-+
-+    do_brcond_const:
-+        if (i == 0) {
-+            tcg_op_remove(ctx->tcg, op);
-+            return true;
-+        }
-+        op->opc = INDEX_op_br;
-+        op->args[0] = label;
-+        break;
-+    }
-+    return false;
-+}
-+
- static bool fold_call(OptContext *ctx, TCGOp *op)
- {
-     TCGContext *s = ctx->tcg;
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-             }
-             break;
--        case INDEX_op_brcond2_i32:
--            i = do_constant_folding_cond2(&op->args[0], &op->args[2],
--                                          op->args[4]);
--            if (i == 0) {
--            do_brcond_false:
--                tcg_op_remove(s, op);
--                continue;
--            }
--            if (i > 0) {
--            do_brcond_true:
--                op->opc = opc = INDEX_op_br;
--                op->args[0] = op->args[5];
--                break;
--            }
--            if ((op->args[4] == TCG_COND_LT || op->args[4] == TCG_COND_GE)
--                 && arg_is_const(op->args[2])
--                 && arg_info(op->args[2])->val == 0
--                 && arg_is_const(op->args[3])
--                 && arg_info(op->args[3])->val == 0) {
--                /* Simplify LT/GE comparisons vs zero to a single compare
--                   vs the high word of the input.  */
--            do_brcond_high:
--                op->opc = opc = INDEX_op_brcond_i32;
--                op->args[0] = op->args[1];
--                op->args[1] = op->args[3];
--                op->args[2] = op->args[4];
--                op->args[3] = op->args[5];
--                break;
--            }
--            if (op->args[4] == TCG_COND_EQ) {
--                /* Simplify EQ comparisons where one of the pairs
--                   can be simplified.  */
--                i = do_constant_folding_cond(INDEX_op_brcond_i32,
--                                             op->args[0], op->args[2],
--                                             TCG_COND_EQ);
--                if (i == 0) {
--                    goto do_brcond_false;
--                } else if (i > 0) {
--                    goto do_brcond_high;
--                }
--                i = do_constant_folding_cond(INDEX_op_brcond_i32,
--                                             op->args[1], op->args[3],
--                                             TCG_COND_EQ);
--                if (i == 0) {
--                    goto do_brcond_false;
--                } else if (i < 0) {
--                    break;
--                }
--            do_brcond_low:
--                memset(&ctx.temps_used, 0, sizeof(ctx.temps_used));
--                op->opc = INDEX_op_brcond_i32;
--                op->args[1] = op->args[2];
--                op->args[2] = op->args[4];
--                op->args[3] = op->args[5];
--                break;
--            }
--            if (op->args[4] == TCG_COND_NE) {
--                /* Simplify NE comparisons where one of the pairs
--                   can be simplified.  */
--                i = do_constant_folding_cond(INDEX_op_brcond_i32,
--                                             op->args[0], op->args[2],
--                                             TCG_COND_NE);
--                if (i == 0) {
--                    goto do_brcond_high;
--                } else if (i > 0) {
--                    goto do_brcond_true;
--                }
--                i = do_constant_folding_cond(INDEX_op_brcond_i32,
--                                             op->args[1], op->args[3],
--                                             TCG_COND_NE);
--                if (i == 0) {
--                    goto do_brcond_low;
--                } else if (i > 0) {
--                    goto do_brcond_true;
--                }
--            }
--            break;
--
-         default:
-             break;
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-         CASE_OP_32_64_VEC(andc):
-             done = fold_andc(&ctx, op);
-             break;
-+        case INDEX_op_brcond2_i32:
-+            done = fold_brcond2(&ctx, op);
-+            break;
-         CASE_OP_32_64(ctpop):
-             done = fold_ctpop(&ctx, op);
-             break;
---
-.25.1

-[PULL 23/56] tcg/optimize: Split out fold_brcond
+Deleted patch
-Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
----
- tcg/optimize.c | 33 +++++++++++++++++++--------------
- 1 file changed, 19 insertions(+), 14 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
-index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
-+++ b/tcg/optimize.c
-@@ -XXX,XX +XXX,XX @@ static bool fold_andc(OptContext *ctx, TCGOp *op)
-     return fold_const2(ctx, op);
- }
-+static bool fold_brcond(OptContext *ctx, TCGOp *op)
-+{
-+    TCGCond cond = op->args[2];
-+    int i = do_constant_folding_cond(op->opc, op->args[0], op->args[1], cond);
-+
-+    if (i == 0) {
-+        tcg_op_remove(ctx->tcg, op);
-+        return true;
-+    }
-+    if (i > 0) {
-+        op->opc = INDEX_op_br;
-+        op->args[0] = op->args[3];
-+    }
-+    return false;
-+}
-+
- static bool fold_brcond2(OptContext *ctx, TCGOp *op)
- {
-     TCGCond cond = op->args[4];
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-             }
-             break;
--        CASE_OP_32_64(brcond):
--            i = do_constant_folding_cond(opc, op->args[0],
--                                         op->args[1], op->args[2]);
--            if (i == 0) {
--                tcg_op_remove(s, op);
--                continue;
--            } else if (i > 0) {
--                memset(&ctx.temps_used, 0, sizeof(ctx.temps_used));
--                op->opc = opc = INDEX_op_br;
--                op->args[0] = op->args[3];
--                break;
--            }
--            break;
--
-         CASE_OP_32_64(movcond):
-             i = do_constant_folding_cond(opc, op->args[1],
-                                          op->args[2], op->args[5]);
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-         CASE_OP_32_64_VEC(andc):
-             done = fold_andc(&ctx, op);
-             break;
-+        CASE_OP_32_64(brcond):
-+            done = fold_brcond(&ctx, op);
-+            break;
-         case INDEX_op_brcond2_i32:
-             done = fold_brcond2(&ctx, op);
-             break;
---
-.25.1

-[PULL 24/56] tcg/optimize: Split out fold_setcond
+Deleted patch
-Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
----
- tcg/optimize.c | 23 ++++++++++++++---------
- 1 file changed, 14 insertions(+), 9 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
-index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
-+++ b/tcg/optimize.c
-@@ -XXX,XX +XXX,XX @@ static bool fold_remainder(OptContext *ctx, TCGOp *op)
-     return fold_const2(ctx, op);
- }
-+static bool fold_setcond(OptContext *ctx, TCGOp *op)
-+{
-+    TCGCond cond = op->args[3];
-+    int i = do_constant_folding_cond(op->opc, op->args[1], op->args[2], cond);
-+
-+    if (i >= 0) {
-+        return tcg_opt_gen_movi(ctx, op, op->args[0], i);
-+    }
-+    return false;
-+}
-+
- static bool fold_setcond2(OptContext *ctx, TCGOp *op)
- {
-     TCGCond cond = op->args[5];
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-             }
-             break;
--        CASE_OP_32_64(setcond):
--            i = do_constant_folding_cond(opc, op->args[1],
--                                         op->args[2], op->args[3]);
--            if (i >= 0) {
--                tcg_opt_gen_movi(&ctx, op, op->args[0], i);
--                continue;
--            }
--            break;
--
-         CASE_OP_32_64(movcond):
-             i = do_constant_folding_cond(opc, op->args[1],
-                                          op->args[2], op->args[5]);
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-         CASE_OP_32_64(shr):
-             done = fold_shift(&ctx, op);
-             break;
-+        CASE_OP_32_64(setcond):
-+            done = fold_setcond(&ctx, op);
-+            break;
-         case INDEX_op_setcond2_i32:
-             done = fold_setcond2(&ctx, op);
-             break;
---
-.25.1

-[PULL 25/56] tcg/optimize: Split out fold_mulu2_i32
+Deleted patch
-Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
----
- tcg/optimize.c | 37 +++++++++++++++++++++----------------
- 1 file changed, 21 insertions(+), 16 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
-index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
-+++ b/tcg/optimize.c
-@@ -XXX,XX +XXX,XX @@ static bool fold_mul_highpart(OptContext *ctx, TCGOp *op)
-     return fold_const2(ctx, op);
- }
-+static bool fold_mulu2_i32(OptContext *ctx, TCGOp *op)
-+{
-+    if (arg_is_const(op->args[2]) && arg_is_const(op->args[3])) {
-+        uint32_t a = arg_info(op->args[2])->val;
-+        uint32_t b = arg_info(op->args[3])->val;
-+        uint64_t r = (uint64_t)a * b;
-+        TCGArg rl, rh;
-+        TCGOp *op2 = tcg_op_insert_before(ctx->tcg, op, INDEX_op_mov_i32);
-+
-+        rl = op->args[0];
-+        rh = op->args[1];
-+        tcg_opt_gen_movi(ctx, op, rl, (int32_t)r);
-+        tcg_opt_gen_movi(ctx, op2, rh, (int32_t)(r >> 32));
-+        return true;
-+    }
-+    return false;
-+}
-+
- static bool fold_nand(OptContext *ctx, TCGOp *op)
- {
-     return fold_const2(ctx, op);
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-             }
-             break;
--        case INDEX_op_mulu2_i32:
--            if (arg_is_const(op->args[2]) && arg_is_const(op->args[3])) {
--                uint32_t a = arg_info(op->args[2])->val;
--                uint32_t b = arg_info(op->args[3])->val;
--                uint64_t r = (uint64_t)a * b;
--                TCGArg rl, rh;
--                TCGOp *op2 = tcg_op_insert_before(s, op, INDEX_op_mov_i32);
--
--                rl = op->args[0];
--                rh = op->args[1];
--                tcg_opt_gen_movi(&ctx, op, rl, (int32_t)r);
--                tcg_opt_gen_movi(&ctx, op2, rh, (int32_t)(r >> 32));
--                continue;
--            }
--            break;
--
-         default:
-             break;
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-         CASE_OP_32_64(muluh):
-             done = fold_mul_highpart(&ctx, op);
-             break;
-+        case INDEX_op_mulu2_i32:
-+            done = fold_mulu2_i32(&ctx, op);
-+            break;
-         CASE_OP_32_64(nand):
-             done = fold_nand(&ctx, op);
-             break;
---
-.25.1
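
The arithmetic inside `fold_mulu2_i32` can be checked in isolation: widen one operand to 64 bits before multiplying, then split the product into low and high words. A minimal sketch, where `ModelPair32` and `model_mulu2` are hypothetical names and the halves are kept unsigned for clarity (the patch casts each half to `int32_t` before passing it to `tcg_opt_gen_movi`):

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t lo;   /* low 32 bits of the product */
    uint32_t hi;   /* high 32 bits of the product */
} ModelPair32;

/* 32x32 -> 64 unsigned multiply, returned as two 32-bit words. */
static ModelPair32 model_mulu2(uint32_t a, uint32_t b)
{
    /* Widen first: a * b in 32-bit arithmetic would drop the high word. */
    uint64_t r = (uint64_t)a * b;
    ModelPair32 out = { (uint32_t)r, (uint32_t)(r >> 32) };

    /* Sanity check: the two halves recombine to the full product. */
    assert((((uint64_t)out.hi << 32) | out.lo) == r);
    return out;
}
```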

-[PULL 26/56] tcg/optimize: Split out fold_addsub2_i32
+Deleted patch
-Add two additional helpers, fold_add2_i32 and fold_sub2_i32,
-which will not be simple wrappers forever.
-Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
----
- tcg/optimize.c | 70 +++++++++++++++++++++++++++++++-------------------
- 1 file changed, 44 insertions(+), 26 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
-index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
-+++ b/tcg/optimize.c
-@@ -XXX,XX +XXX,XX @@ static bool fold_add(OptContext *ctx, TCGOp *op)
-     return fold_const2(ctx, op);
- }
-+static bool fold_addsub2_i32(OptContext *ctx, TCGOp *op, bool add)
-+{
-+    if (arg_is_const(op->args[2]) && arg_is_const(op->args[3]) &&
-+        arg_is_const(op->args[4]) && arg_is_const(op->args[5])) {
-+        uint32_t al = arg_info(op->args[2])->val;
-+        uint32_t ah = arg_info(op->args[3])->val;
-+        uint32_t bl = arg_info(op->args[4])->val;
-+        uint32_t bh = arg_info(op->args[5])->val;
-+        uint64_t a = ((uint64_t)ah << 32) | al;
-+        uint64_t b = ((uint64_t)bh << 32) | bl;
-+        TCGArg rl, rh;
-+        TCGOp *op2 = tcg_op_insert_before(ctx->tcg, op, INDEX_op_mov_i32);
-+
-+        if (add) {
-+            a += b;
-+        } else {
-+            a -= b;
-+        }
-+
-+        rl = op->args[0];
-+        rh = op->args[1];
-+        tcg_opt_gen_movi(ctx, op, rl, (int32_t)a);
-+        tcg_opt_gen_movi(ctx, op2, rh, (int32_t)(a >> 32));
-+        return true;
-+    }
-+    return false;
-+}
-+
-+static bool fold_add2_i32(OptContext *ctx, TCGOp *op)
-+{
-+    return fold_addsub2_i32(ctx, op, true);
-+}
-+
- static bool fold_and(OptContext *ctx, TCGOp *op)
- {
-     return fold_const2(ctx, op);
-@@ -XXX,XX +XXX,XX @@ static bool fold_sub(OptContext *ctx, TCGOp *op)
-     return fold_const2(ctx, op);
- }
-+static bool fold_sub2_i32(OptContext *ctx, TCGOp *op)
-+{
-+    return fold_addsub2_i32(ctx, op, false);
-+}
-+
- static bool fold_xor(OptContext *ctx, TCGOp *op)
- {
-     return fold_const2(ctx, op);
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-             }
-             break;
--        case INDEX_op_add2_i32:
--        case INDEX_op_sub2_i32:
--            if (arg_is_const(op->args[2]) && arg_is_const(op->args[3])
--                && arg_is_const(op->args[4]) && arg_is_const(op->args[5])) {
--                uint32_t al = arg_info(op->args[2])->val;
--                uint32_t ah = arg_info(op->args[3])->val;
--                uint32_t bl = arg_info(op->args[4])->val;
--                uint32_t bh = arg_info(op->args[5])->val;
--                uint64_t a = ((uint64_t)ah << 32) | al;
--                uint64_t b = ((uint64_t)bh << 32) | bl;
--                TCGArg rl, rh;
--                TCGOp *op2 = tcg_op_insert_before(s, op, INDEX_op_mov_i32);
--
--                if (opc == INDEX_op_add2_i32) {
--                    a += b;
--                } else {
--                    a -= b;
--                }
--
--                rl = op->args[0];
--                rh = op->args[1];
--                tcg_opt_gen_movi(&ctx, op, rl, (int32_t)a);
--                tcg_opt_gen_movi(&ctx, op2, rh, (int32_t)(a >> 32));
--                continue;
--            }
--            break;
-         default:
-             break;
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-         CASE_OP_32_64_VEC(add):
-             done = fold_add(&ctx, op);
-             break;
-+        case INDEX_op_add2_i32:
-+            done = fold_add2_i32(&ctx, op);
-+            break;
-         CASE_OP_32_64_VEC(and):
-             done = fold_and(&ctx, op);
-             break;
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-         CASE_OP_32_64_VEC(sub):
-             done = fold_sub(&ctx, op);
-             break;
-+        case INDEX_op_sub2_i32:
-+            done = fold_sub2_i32(&ctx, op);
-+            break;
-         CASE_OP_32_64_VEC(xor):
-             done = fold_xor(&ctx, op);
-             break;
---
-2.25.1

-[PULL 27/56] tcg/optimize: Split out fold_movcond
+Deleted patch
-Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
----
- tcg/optimize.c | 56 ++++++++++++++++++++++++++++----------------------
- 1 file changed, 31 insertions(+), 25 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
-index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
-+++ b/tcg/optimize.c
-@@ -XXX,XX +XXX,XX @@ static bool fold_mb(OptContext *ctx, TCGOp *op)
-     return true;
- }
-+static bool fold_movcond(OptContext *ctx, TCGOp *op)
-+{
-+    TCGOpcode opc = op->opc;
-+    TCGCond cond = op->args[5];
-+    int i = do_constant_folding_cond(opc, op->args[1], op->args[2], cond);
-+
-+    if (i >= 0) {
-+        return tcg_opt_gen_mov(ctx, op, op->args[0], op->args[4 - i]);
-+    }
-+
-+    if (arg_is_const(op->args[3]) && arg_is_const(op->args[4])) {
-+        uint64_t tv = arg_info(op->args[3])->val;
-+        uint64_t fv = arg_info(op->args[4])->val;
-+
-+        opc = (opc == INDEX_op_movcond_i32
-+               ? INDEX_op_setcond_i32 : INDEX_op_setcond_i64);
-+
-+        if (tv == 1 && fv == 0) {
-+            op->opc = opc;
-+            op->args[3] = cond;
-+        } else if (fv == 1 && tv == 0) {
-+            op->opc = opc;
-+            op->args[3] = tcg_invert_cond(cond);
-+        }
-+    }
-+    return false;
-+}
-+
- static bool fold_mul(OptContext *ctx, TCGOp *op)
- {
-     return fold_const2(ctx, op);
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-             }
-             break;
--        CASE_OP_32_64(movcond):
--            i = do_constant_folding_cond(opc, op->args[1],
--                                         op->args[2], op->args[5]);
--            if (i >= 0) {
--                tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[4 - i]);
--                continue;
--            }
--            if (arg_is_const(op->args[3]) && arg_is_const(op->args[4])) {
--                uint64_t tv = arg_info(op->args[3])->val;
--                uint64_t fv = arg_info(op->args[4])->val;
--                TCGCond cond = op->args[5];
--
--                if (fv == 1 && tv == 0) {
--                    cond = tcg_invert_cond(cond);
--                } else if (!(tv == 1 && fv == 0)) {
--                    break;
--                }
--                op->args[3] = cond;
--                op->opc = opc = (opc == INDEX_op_movcond_i32
--                                 ? INDEX_op_setcond_i32
--                                 : INDEX_op_setcond_i64);
--            }
--            break;
--
--
-         default:
-             break;
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-         case INDEX_op_mb:
-             done = fold_mb(&ctx, op);
-             break;
-+        CASE_OP_32_64(movcond):
-+            done = fold_movcond(&ctx, op);
-+            break;
-         CASE_OP_32_64(mul):
-             done = fold_mul(&ctx, op);
-             break;
---
-2.25.1

-[PULL 28/56] tcg/optimize: Split out fold_extract2
+Deleted patch
-Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
----
- tcg/optimize.c | 39 ++++++++++++++++++++++-----------------
- 1 file changed, 22 insertions(+), 17 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
-index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
-+++ b/tcg/optimize.c
-@@ -XXX,XX +XXX,XX @@ static bool fold_eqv(OptContext *ctx, TCGOp *op)
-     return fold_const2(ctx, op);
- }
-+static bool fold_extract2(OptContext *ctx, TCGOp *op)
-+{
-+    if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
-+        uint64_t v1 = arg_info(op->args[1])->val;
-+        uint64_t v2 = arg_info(op->args[2])->val;
-+        int shr = op->args[3];
-+
-+        if (op->opc == INDEX_op_extract2_i64) {
-+            v1 >>= shr;
-+            v2 <<= 64 - shr;
-+        } else {
-+            v1 = (uint32_t)v1 >> shr;
-+            v2 = (int32_t)v2 << (32 - shr);
-+        }
-+        return tcg_opt_gen_movi(ctx, op, op->args[0], v1 | v2);
-+    }
-+    return false;
-+}
-+
- static bool fold_exts(OptContext *ctx, TCGOp *op)
- {
-     return fold_const1(ctx, op);
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-             }
-             break;
--        CASE_OP_32_64(extract2):
--            if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
--                uint64_t v1 = arg_info(op->args[1])->val;
--                uint64_t v2 = arg_info(op->args[2])->val;
--                int shr = op->args[3];
--
--                if (opc == INDEX_op_extract2_i64) {
--                    tmp = (v1 >> shr) | (v2 << (64 - shr));
--                } else {
--                    tmp = (int32_t)(((uint32_t)v1 >> shr) |
--                                    ((uint32_t)v2 << (32 - shr)));
--                }
--                tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
--                continue;
--            }
--            break;
--
-         default:
-             break;
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-         CASE_OP_32_64(eqv):
-             done = fold_eqv(&ctx, op);
-             break;
-+        CASE_OP_32_64(extract2):
-+            done = fold_extract2(&ctx, op);
-+            break;
-         CASE_OP_32_64(ext8s):
-         CASE_OP_32_64(ext16s):
-         case INDEX_op_ext32s_i64:
---
-2.25.1

-[PULL 29/56] tcg/optimize: Split out fold_extract, fold_sextract
+Deleted patch
-Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
----
- tcg/optimize.c | 48 ++++++++++++++++++++++++++++++------------------
- 1 file changed, 30 insertions(+), 18 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
-index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
-+++ b/tcg/optimize.c
-@@ -XXX,XX +XXX,XX @@ static bool fold_eqv(OptContext *ctx, TCGOp *op)
-     return fold_const2(ctx, op);
- }
-+static bool fold_extract(OptContext *ctx, TCGOp *op)
-+{
-+    if (arg_is_const(op->args[1])) {
-+        uint64_t t;
-+
-+        t = arg_info(op->args[1])->val;
-+        t = extract64(t, op->args[2], op->args[3]);
-+        return tcg_opt_gen_movi(ctx, op, op->args[0], t);
-+    }
-+    return false;
-+}
-+
- static bool fold_extract2(OptContext *ctx, TCGOp *op)
- {
-     if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
-@@ -XXX,XX +XXX,XX @@ static bool fold_setcond2(OptContext *ctx, TCGOp *op)
-     return tcg_opt_gen_movi(ctx, op, op->args[0], i);
- }
-+static bool fold_sextract(OptContext *ctx, TCGOp *op)
-+{
-+    if (arg_is_const(op->args[1])) {
-+        uint64_t t;
-+
-+        t = arg_info(op->args[1])->val;
-+        t = sextract64(t, op->args[2], op->args[3]);
-+        return tcg_opt_gen_movi(ctx, op, op->args[0], t);
-+    }
-+    return false;
-+}
-+
- static bool fold_shift(OptContext *ctx, TCGOp *op)
- {
-     return fold_const2(ctx, op);
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-             }
-             break;
--        CASE_OP_32_64(extract):
--            if (arg_is_const(op->args[1])) {
--                tmp = extract64(arg_info(op->args[1])->val,
--                                op->args[2], op->args[3]);
--                tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
--                continue;
--            }
--            break;
--
--        CASE_OP_32_64(sextract):
--            if (arg_is_const(op->args[1])) {
--                tmp = sextract64(arg_info(op->args[1])->val,
--                                 op->args[2], op->args[3]);
--                tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
--                continue;
--            }
--            break;
--
-         default:
-             break;
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-         CASE_OP_32_64(eqv):
-             done = fold_eqv(&ctx, op);
-             break;
-+        CASE_OP_32_64(extract):
-+            done = fold_extract(&ctx, op);
-+            break;
-         CASE_OP_32_64(extract2):
-             done = fold_extract2(&ctx, op);
-             break;
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-         case INDEX_op_setcond2_i32:
-             done = fold_setcond2(&ctx, op);
-             break;
-+        CASE_OP_32_64(sextract):
-+            done = fold_sextract(&ctx, op);
-+            break;
-         CASE_OP_32_64_VEC(sub):
-             done = fold_sub(&ctx, op);
-             break;
---
-2.25.1

-[PULL 30/56] tcg/optimize: Split out fold_deposit
+Deleted patch
-Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
----
- tcg/optimize.c | 25 +++++++++++++++----------
- 1 file changed, 15 insertions(+), 10 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
-index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
-+++ b/tcg/optimize.c
-@@ -XXX,XX +XXX,XX @@ static bool fold_ctpop(OptContext *ctx, TCGOp *op)
-     return fold_const1(ctx, op);
- }
-+static bool fold_deposit(OptContext *ctx, TCGOp *op)
-+{
-+    if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
-+        uint64_t t1 = arg_info(op->args[1])->val;
-+        uint64_t t2 = arg_info(op->args[2])->val;
-+
-+        t1 = deposit64(t1, op->args[3], op->args[4], t2);
-+        return tcg_opt_gen_movi(ctx, op, op->args[0], t1);
-+    }
-+    return false;
-+}
-+
- static bool fold_divide(OptContext *ctx, TCGOp *op)
- {
-     return fold_const2(ctx, op);
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-             }
-             break;
--        CASE_OP_32_64(deposit):
--            if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
--                tmp = deposit64(arg_info(op->args[1])->val,
--                                op->args[3], op->args[4],
--                                arg_info(op->args[2])->val);
--                tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
--                continue;
--            }
--            break;
--
-         default:
-             break;
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-         CASE_OP_32_64(ctpop):
-             done = fold_ctpop(&ctx, op);
-             break;
-+        CASE_OP_32_64(deposit):
-+            done = fold_deposit(&ctx, op);
-+            break;
-         CASE_OP_32_64(div):
-         CASE_OP_32_64(divu):
-             done = fold_divide(&ctx, op);
---
-2.25.1

-[PULL 31/56] tcg/optimize: Split out fold_count_zeros
+Deleted patch
-Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
----
- tcg/optimize.c | 32 ++++++++++++++++++--------------
- 1 file changed, 18 insertions(+), 14 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
-index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
-+++ b/tcg/optimize.c
-@@ -XXX,XX +XXX,XX @@ static bool fold_call(OptContext *ctx, TCGOp *op)
-     return true;
- }
-+static bool fold_count_zeros(OptContext *ctx, TCGOp *op)
-+{
-+    if (arg_is_const(op->args[1])) {
-+        uint64_t t = arg_info(op->args[1])->val;
-+
-+        if (t != 0) {
-+            t = do_constant_folding(op->opc, t, 0);
-+            return tcg_opt_gen_movi(ctx, op, op->args[0], t);
-+        }
-+        return tcg_opt_gen_mov(ctx, op, op->args[0], op->args[2]);
-+    }
-+    return false;
-+}
-+
- static bool fold_ctpop(OptContext *ctx, TCGOp *op)
- {
-     return fold_const1(ctx, op);
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-             }
-             break;
--        CASE_OP_32_64(clz):
--        CASE_OP_32_64(ctz):
--            if (arg_is_const(op->args[1])) {
--                TCGArg v = arg_info(op->args[1])->val;
--                if (v != 0) {
--                    tmp = do_constant_folding(opc, v, 0);
--                    tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
--                } else {
--                    tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[2]);
--                }
--                continue;
--            }
--            break;
--
-         default:
-             break;
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-         case INDEX_op_brcond2_i32:
-             done = fold_brcond2(&ctx, op);
-             break;
-+        CASE_OP_32_64(clz):
-+        CASE_OP_32_64(ctz):
-+            done = fold_count_zeros(&ctx, op);
-+            break;
-         CASE_OP_32_64(ctpop):
-             done = fold_ctpop(&ctx, op);
-             break;
---
-2.25.1

-[PULL 32/56] tcg/optimize: Split out fold_bswap
+Deleted patch
-Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
----
- tcg/optimize.c | 27 ++++++++++++++++-----------
- 1 file changed, 16 insertions(+), 11 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
-index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
-+++ b/tcg/optimize.c
-@@ -XXX,XX +XXX,XX @@ static bool fold_brcond2(OptContext *ctx, TCGOp *op)
-     return false;
- }
-+static bool fold_bswap(OptContext *ctx, TCGOp *op)
-+{
-+    if (arg_is_const(op->args[1])) {
-+        uint64_t t = arg_info(op->args[1])->val;
-+
-+        t = do_constant_folding(op->opc, t, op->args[2]);
-+        return tcg_opt_gen_movi(ctx, op, op->args[0], t);
-+    }
-+    return false;
-+}
-+
- static bool fold_call(OptContext *ctx, TCGOp *op)
- {
-     TCGContext *s = ctx->tcg;
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-             }
-             break;
--        CASE_OP_32_64(bswap16):
--        CASE_OP_32_64(bswap32):
--        case INDEX_op_bswap64_i64:
--            if (arg_is_const(op->args[1])) {
--                tmp = do_constant_folding(opc, arg_info(op->args[1])->val,
--                                          op->args[2]);
--                tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
--                continue;
--            }
--            break;
--
-         default:
-             break;
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-         case INDEX_op_brcond2_i32:
-             done = fold_brcond2(&ctx, op);
-             break;
-+        CASE_OP_32_64(bswap16):
-+        CASE_OP_32_64(bswap32):
-+        case INDEX_op_bswap64_i64:
-+            done = fold_bswap(&ctx, op);
-+            break;
-         CASE_OP_32_64(clz):
-         CASE_OP_32_64(ctz):
-             done = fold_count_zeros(&ctx, op);
---
-2.25.1

-[PULL 33/56] tcg/optimize: Split out fold_dup, fold_dup2
+Deleted patch
-Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
----
- tcg/optimize.c | 53 +++++++++++++++++++++++++++++---------------------
- 1 file changed, 31 insertions(+), 22 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
-index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
-+++ b/tcg/optimize.c
-@@ -XXX,XX +XXX,XX @@ static bool fold_divide(OptContext *ctx, TCGOp *op)
-     return fold_const2(ctx, op);
- }
-+static bool fold_dup(OptContext *ctx, TCGOp *op)
-+{
-+    if (arg_is_const(op->args[1])) {
-+        uint64_t t = arg_info(op->args[1])->val;
-+        t = dup_const(TCGOP_VECE(op), t);
-+        return tcg_opt_gen_movi(ctx, op, op->args[0], t);
-+    }
-+    return false;
-+}
-+
-+static bool fold_dup2(OptContext *ctx, TCGOp *op)
-+{
-+    if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
-+        uint64_t t = deposit64(arg_info(op->args[1])->val, 32, 32,
-+                               arg_info(op->args[2])->val);
-+        return tcg_opt_gen_movi(ctx, op, op->args[0], t);
-+    }
-+
-+    if (args_are_copies(op->args[1], op->args[2])) {
-+        op->opc = INDEX_op_dup_vec;
-+        TCGOP_VECE(op) = MO_32;
-+    }
-+    return false;
-+}
-+
- static bool fold_eqv(OptContext *ctx, TCGOp *op)
- {
-     return fold_const2(ctx, op);
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-             done = tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[1]);
-             break;
--        case INDEX_op_dup_vec:
--            if (arg_is_const(op->args[1])) {
--                tmp = arg_info(op->args[1])->val;
--                tmp = dup_const(TCGOP_VECE(op), tmp);
--                tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
--                continue;
--            }
--            break;
--
--        case INDEX_op_dup2_vec:
--            assert(TCG_TARGET_REG_BITS == 32);
--            if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
--                tcg_opt_gen_movi(&ctx, op, op->args[0],
--                                 deposit64(arg_info(op->args[1])->val, 32, 32,
--                                           arg_info(op->args[2])->val));
--                continue;
--            } else if (args_are_copies(op->args[1], op->args[2])) {
--                op->opc = INDEX_op_dup_vec;
--                TCGOP_VECE(op) = MO_32;
--            }
--            break;
--
-         default:
-             break;
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-         CASE_OP_32_64(divu):
-             done = fold_divide(&ctx, op);
-             break;
-+        case INDEX_op_dup_vec:
-+            done = fold_dup(&ctx, op);
-+            break;
-+        case INDEX_op_dup2_vec:
-+            done = fold_dup2(&ctx, op);
-+            break;
-         CASE_OP_32_64(eqv):
-             done = fold_eqv(&ctx, op);
-             break;
---
-2.25.1

-[PULL 34/56] tcg/optimize: Split out fold_mov
+Deleted patch
-This is the final entry in the main switch that was in a
-different form.  After this, we have the option to convert
-the switch into a function dispatch table.
-Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
----
- tcg/optimize.c | 27 ++++++++++++++-------------
- 1 file changed, 14 insertions(+), 13 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
-index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
-+++ b/tcg/optimize.c
-@@ -XXX,XX +XXX,XX @@ static bool fold_mb(OptContext *ctx, TCGOp *op)
-     return true;
- }
-+static bool fold_mov(OptContext *ctx, TCGOp *op)
-+{
-+    return tcg_opt_gen_mov(ctx, op, op->args[0], op->args[1]);
-+}
-+
- static bool fold_movcond(OptContext *ctx, TCGOp *op)
- {
-     TCGOpcode opc = op->opc;
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-             break;
-         }
--        /* Propagate constants through copy operations and do constant
--           folding.  Constants will be substituted to arguments by register
--           allocator where needed and possible.  Also detect copies. */
-+        /*
-+         * Process each opcode.
-+         * Sorted alphabetically by opcode as much as possible.
-+         */
-         switch (opc) {
--        CASE_OP_32_64_VEC(mov):
--            done = tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[1]);
--            break;
--
--        default:
--            break;
--
--        /* ---------------------------------------------------------- */
--        /* Sorted alphabetically by opcode as much as possible. */
--
-         CASE_OP_32_64_VEC(add):
-             done = fold_add(&ctx, op);
-             break;
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-         case INDEX_op_mb:
-             done = fold_mb(&ctx, op);
-             break;
-+        CASE_OP_32_64_VEC(mov):
-+            done = fold_mov(&ctx, op);
-+            break;
-         CASE_OP_32_64(movcond):
-             done = fold_movcond(&ctx, op);
-             break;
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-         CASE_OP_32_64_VEC(xor):
-             done = fold_xor(&ctx, op);
-             break;
-+        default:
-+            break;
-         }
-         if (!done) {
---
-2.25.1

-[PULL 35/56] tcg/optimize: Split out fold_xx_to_i
+Deleted patch
-Pull the "op r, a, a => movi r, 0" optimization into a function,
-and use it in the outer opcode fold functions.
-Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
----
- tcg/optimize.c | 41 ++++++++++++++++++++++++-----------------
- 1 file changed, 24 insertions(+), 17 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
-index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
-+++ b/tcg/optimize.c
-@@ -XXX,XX +XXX,XX @@ static bool fold_const2(OptContext *ctx, TCGOp *op)
-     return false;
- }
-+/* If the binary operation has both arguments equal, fold to @i. */
-+static bool fold_xx_to_i(OptContext *ctx, TCGOp *op, uint64_t i)
-+{
-+    if (args_are_copies(op->args[1], op->args[2])) {
-+        return tcg_opt_gen_movi(ctx, op, op->args[0], i);
-+    }
-+    return false;
-+}
-+
- /*
-  * These outermost fold_<op> functions are sorted alphabetically.
-  */
-@@ -XXX,XX +XXX,XX @@ static bool fold_and(OptContext *ctx, TCGOp *op)
- static bool fold_andc(OptContext *ctx, TCGOp *op)
- {
--    return fold_const2(ctx, op);
-+    if (fold_const2(ctx, op) ||
-+        fold_xx_to_i(ctx, op, 0)) {
-+        return true;
-+    }
-+    return false;
- }
- static bool fold_brcond(OptContext *ctx, TCGOp *op)
-@@ -XXX,XX +XXX,XX @@ static bool fold_shift(OptContext *ctx, TCGOp *op)
- static bool fold_sub(OptContext *ctx, TCGOp *op)
- {
--    return fold_const2(ctx, op);
-+    if (fold_const2(ctx, op) ||
-+        fold_xx_to_i(ctx, op, 0)) {
-+        return true;
-+    }
-+    return false;
- }
- static bool fold_sub2_i32(OptContext *ctx, TCGOp *op)
-@@ -XXX,XX +XXX,XX @@ static bool fold_sub2_i32(OptContext *ctx, TCGOp *op)
- static bool fold_xor(OptContext *ctx, TCGOp *op)
- {
--    return fold_const2(ctx, op);
-+    if (fold_const2(ctx, op) ||
-+        fold_xx_to_i(ctx, op, 0)) {
-+        return true;
-+    }
-+    return false;
- }
- /* Propagate constants and copies, fold constant expressions. */
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-             break;
-         }
--        /* Simplify expression for "op r, a, a => movi r, 0" cases */
--        switch (opc) {
--        CASE_OP_32_64_VEC(andc):
--        CASE_OP_32_64_VEC(sub):
--        CASE_OP_32_64_VEC(xor):
--            if (args_are_copies(op->args[1], op->args[2])) {
--                tcg_opt_gen_movi(&ctx, op, op->args[0], 0);
--                continue;
--            }
--            break;
--        default:
--            break;
--        }
--
-         /*
-          * Process each opcode.
-          * Sorted alphabetically by opcode as much as possible.
---
-2.25.1

-[PULL 36/56] tcg/optimize: Split out fold_xx_to_x
+Deleted patch
-Pull the "op r, a, a => mov r, a" optimization into a function,
-and use it in the outer opcode fold functions.
-Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
----
- tcg/optimize.c | 39 ++++++++++++++++++++++++---------------
- 1 file changed, 24 insertions(+), 15 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
-index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
-+++ b/tcg/optimize.c
-@@ -XXX,XX +XXX,XX @@ static bool fold_xx_to_i(OptContext *ctx, TCGOp *op, uint64_t i)
-     return false;
- }
-+/* If the binary operation has both arguments equal, fold to identity. */
-+static bool fold_xx_to_x(OptContext *ctx, TCGOp *op)
-+{
-+    if (args_are_copies(op->args[1], op->args[2])) {
-+        return tcg_opt_gen_mov(ctx, op, op->args[0], op->args[1]);
-+    }
-+    return false;
-+}
-+
- /*
-  * These outermost fold_<op> functions are sorted alphabetically.
-+ *
-+ * The ordering of the transformations should be:
-+ *   1) those that produce a constant
-+ *   2) those that produce a copy
-+ *   3) those that produce information about the result value.
-  */
- static bool fold_add(OptContext *ctx, TCGOp *op)
-@@ -XXX,XX +XXX,XX @@ static bool fold_add2_i32(OptContext *ctx, TCGOp *op)
- static bool fold_and(OptContext *ctx, TCGOp *op)
- {
--    return fold_const2(ctx, op);
-+    if (fold_const2(ctx, op) ||
-+        fold_xx_to_x(ctx, op)) {
-+        return true;
-+    }
-+    return false;
- }
- static bool fold_andc(OptContext *ctx, TCGOp *op)
-@@ -XXX,XX +XXX,XX @@ static bool fold_not(OptContext *ctx, TCGOp *op)
- static bool fold_or(OptContext *ctx, TCGOp *op)
- {
--    return fold_const2(ctx, op);
-+    if (fold_const2(ctx, op) ||
-+        fold_xx_to_x(ctx, op)) {
-+        return true;
-+    }
-+    return false;
- }
- static bool fold_orc(OptContext *ctx, TCGOp *op)
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-             break;
-         }
--        /* Simplify expression for "op r, a, a => mov r, a" cases */
--        switch (opc) {
--        CASE_OP_32_64_VEC(or):
--        CASE_OP_32_64_VEC(and):
--            if (args_are_copies(op->args[1], op->args[2])) {
--                tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[1]);
--                continue;
--            }
--            break;
--        default:
--            break;
--        }
--
-         /*
-          * Process each opcode.
-          * Sorted alphabetically by opcode as much as possible.
---
-2.25.1

-[PULL 37/56] tcg/optimize: Split out fold_xi_to_i
+Deleted patch
-Pull the "op r, a, 0 => movi r, 0" optimization into a function,
-and use it in the outer opcode fold functions.
-Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
----
- tcg/optimize.c | 38 ++++++++++++++++++++------------------
- 1 file changed, 20 insertions(+), 18 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
-index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
-+++ b/tcg/optimize.c
-@@ -XXX,XX +XXX,XX @@ static bool fold_const2(OptContext *ctx, TCGOp *op)
-     return false;
- }
-+/* If the binary operation has second argument @i, fold to @i. */
-+static bool fold_xi_to_i(OptContext *ctx, TCGOp *op, uint64_t i)
-+{
-+    if (arg_is_const(op->args[2]) && arg_info(op->args[2])->val == i) {
-+        return tcg_opt_gen_movi(ctx, op, op->args[0], i);
-+    }
-+    return false;
-+}
-+
- /* If the binary operation has both arguments equal, fold to @i. */
- static bool fold_xx_to_i(OptContext *ctx, TCGOp *op, uint64_t i)
- {
-@@ -XXX,XX +XXX,XX @@ static bool fold_add2_i32(OptContext *ctx, TCGOp *op)
- static bool fold_and(OptContext *ctx, TCGOp *op)
- {
-     if (fold_const2(ctx, op) ||
-+        fold_xi_to_i(ctx, op, 0) ||
-         fold_xx_to_x(ctx, op)) {
-         return true;
-     }
-@@ -XXX,XX +XXX,XX @@ static bool fold_movcond(OptContext *ctx, TCGOp *op)
- static bool fold_mul(OptContext *ctx, TCGOp *op)
- {
--    return fold_const2(ctx, op);
-+    if (fold_const2(ctx, op) ||
-+        fold_xi_to_i(ctx, op, 0)) {
-+        return true;
-+    }
-+    return false;
- }
- static bool fold_mul_highpart(OptContext *ctx, TCGOp *op)
- {
--    return fold_const2(ctx, op);
-+    if (fold_const2(ctx, op) ||
-+        fold_xi_to_i(ctx, op, 0)) {
-+        return true;
-+    }
-+    return false;
- }
- static bool fold_mulu2_i32(OptContext *ctx, TCGOp *op)
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-             continue;
-         }
--        /* Simplify expression for "op r, a, 0 => movi r, 0" cases */
--        switch (opc) {
--        CASE_OP_32_64_VEC(and):
--        CASE_OP_32_64_VEC(mul):
--        CASE_OP_32_64(muluh):
--        CASE_OP_32_64(mulsh):
--            if (arg_is_const(op->args[2])
--                && arg_info(op->args[2])->val == 0) {
--                tcg_opt_gen_movi(&ctx, op, op->args[0], 0);
--                continue;
--            }
--            break;
--        default:
--            break;
--        }
--
-         /*
-          * Process each opcode.
-          * Sorted alphabetically by opcode as much as possible.
---
-2.25.1

-[PULL 38/56] tcg/optimize: Add type to OptContext
+Deleted patch
-Compute the type of the operation early.
-There are at least 4 places that used a def->flags ladder
-to determine the type of the operation being optimized.
-There were two places that assumed !TCG_OPF_64BIT means
-TCG_TYPE_I32, and so could potentially compute incorrect
-results for vector operations.
-Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
----
- tcg/optimize.c | 149 +++++++++++++++++++++++++++++--------------------
- 1 file changed, 89 insertions(+), 60 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
-index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
-+++ b/tcg/optimize.c
-@@ -XXX,XX +XXX,XX @@ typedef struct OptContext {
-     /* In flight values from optimization. */
-     uint64_t z_mask;
-+    TCGType type;
- } OptContext;
- static inline TempOptInfo *ts_info(TCGTemp *ts)
-@@ -XXX,XX +XXX,XX @@ static bool tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, TCGArg src)
- {
-     TCGTemp *dst_ts = arg_temp(dst);
-     TCGTemp *src_ts = arg_temp(src);
--    const TCGOpDef *def;
-     TempOptInfo *di;
-     TempOptInfo *si;
-     uint64_t z_mask;
-@@ -XXX,XX +XXX,XX @@ static bool tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, TCGArg src)
-     reset_ts(dst_ts);
-     di = ts_info(dst_ts);
-     si = ts_info(src_ts);
--    def = &tcg_op_defs[op->opc];
--    if (def->flags & TCG_OPF_VECTOR) {
--        new_op = INDEX_op_mov_vec;
--    } else if (def->flags & TCG_OPF_64BIT) {
--        new_op = INDEX_op_mov_i64;
--    } else {
-+
-+    switch (ctx->type) {
-+    case TCG_TYPE_I32:
-         new_op = INDEX_op_mov_i32;
-+        break;
-+    case TCG_TYPE_I64:
-+        new_op = INDEX_op_mov_i64;
-+        break;
-+    case TCG_TYPE_V64:
-+    case TCG_TYPE_V128:
-+    case TCG_TYPE_V256:
-+        /* TCGOP_VECL and TCGOP_VECE remain unchanged.  */
-+        new_op = INDEX_op_mov_vec;
-+        break;
-+    default:
-+        g_assert_not_reached();
-     }
-     op->opc = new_op;
--    /* TCGOP_VECL and TCGOP_VECE remain unchanged.  */
-     op->args[0] = dst;
-     op->args[1] = src;
-@@ -XXX,XX +XXX,XX @@ static bool tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, TCGArg src)
- static bool tcg_opt_gen_movi(OptContext *ctx, TCGOp *op,
-                              TCGArg dst, uint64_t val)
- {
--    const TCGOpDef *def = &tcg_op_defs[op->opc];
--    TCGType type;
--    TCGTemp *tv;
--
--    if (def->flags & TCG_OPF_VECTOR) {
--        type = TCGOP_VECL(op) + TCG_TYPE_V64;
--    } else if (def->flags & TCG_OPF_64BIT) {
--        type = TCG_TYPE_I64;
--    } else {
--        type = TCG_TYPE_I32;
--    }
--
-     /* Convert movi to mov with constant temp. */
--    tv = tcg_constant_internal(type, val);
-+    TCGTemp *tv = tcg_constant_internal(ctx->type, val);
-+
-     init_ts_info(ctx, tv);
-     return tcg_opt_gen_mov(ctx, op, dst, temp_arg(tv));
- }
-@@ -XXX,XX +XXX,XX @@ static uint64_t do_constant_folding_2(TCGOpcode op, uint64_t x, uint64_t y)
-     }
- }
--static uint64_t do_constant_folding(TCGOpcode op, uint64_t x, uint64_t y)
-+static uint64_t do_constant_folding(TCGOpcode op, TCGType type,
-+                                    uint64_t x, uint64_t y)
- {
--    const TCGOpDef *def = &tcg_op_defs[op];
-     uint64_t res = do_constant_folding_2(op, x, y);
--    if (!(def->flags & TCG_OPF_64BIT)) {
-+    if (type == TCG_TYPE_I32) {
-         res = (int32_t)res;
-     }
-     return res;
-@@ -XXX,XX +XXX,XX @@ static bool do_constant_folding_cond_eq(TCGCond c)
-  * Return -1 if the condition can't be simplified,
-  * and the result of the condition (0 or 1) if it can.
-  */
--static int do_constant_folding_cond(TCGOpcode op, TCGArg x,
-+static int do_constant_folding_cond(TCGType type, TCGArg x,
-                                     TCGArg y, TCGCond c)
- {
-     uint64_t xv = arg_info(x)->val;
-     uint64_t yv = arg_info(y)->val;
-     if (arg_is_const(x) && arg_is_const(y)) {
--        const TCGOpDef *def = &tcg_op_defs[op];
--        tcg_debug_assert(!(def->flags & TCG_OPF_VECTOR));
--        if (def->flags & TCG_OPF_64BIT) {
--            return do_constant_folding_cond_64(xv, yv, c);
--        } else {
-+        switch (type) {
-+        case TCG_TYPE_I32:
-             return do_constant_folding_cond_32(xv, yv, c);
-+        case TCG_TYPE_I64:
-+            return do_constant_folding_cond_64(xv, yv, c);
-+        default:
-+            /* Only scalar comparisons are optimizable */
-+            return -1;
-         }
-     } else if (args_are_copies(x, y)) {
-         return do_constant_folding_cond_eq(c);
-@@ -XXX,XX +XXX,XX @@ static bool fold_const1(OptContext *ctx, TCGOp *op)
-         uint64_t t;
-         t = arg_info(op->args[1])->val;
--        t = do_constant_folding(op->opc, t, 0);
-+        t = do_constant_folding(op->opc, ctx->type, t, 0);
-         return tcg_opt_gen_movi(ctx, op, op->args[0], t);
-     }
-     return false;
-@@ -XXX,XX +XXX,XX @@ static bool fold_const2(OptContext *ctx, TCGOp *op)
-         uint64_t t1 = arg_info(op->args[1])->val;
-         uint64_t t2 = arg_info(op->args[2])->val;
--        t1 = do_constant_folding(op->opc, t1, t2);
-+        t1 = do_constant_folding(op->opc, ctx->type, t1, t2);
-         return tcg_opt_gen_movi(ctx, op, op->args[0], t1);
-     }
-     return false;
-@@ -XXX,XX +XXX,XX @@ static bool fold_andc(OptContext *ctx, TCGOp *op)
- static bool fold_brcond(OptContext *ctx, TCGOp *op)
- {
-     TCGCond cond = op->args[2];
--    int i = do_constant_folding_cond(op->opc, op->args[0], op->args[1], cond);
-+    int i = do_constant_folding_cond(ctx->type, op->args[0], op->args[1], cond);
-     if (i == 0) {
-         tcg_op_remove(ctx->tcg, op);
-@@ -XXX,XX +XXX,XX @@ static bool fold_brcond2(OptContext *ctx, TCGOp *op)
-          * Simplify EQ/NE comparisons where one of the pairs
-          * can be simplified.
-          */
--        i = do_constant_folding_cond(INDEX_op_brcond_i32, op->args[0],
-+        i = do_constant_folding_cond(TCG_TYPE_I32, op->args[0],
-                                      op->args[2], cond);
-         switch (i ^ inv) {
-         case 0:
-@@ -XXX,XX +XXX,XX @@ static bool fold_brcond2(OptContext *ctx, TCGOp *op)
-             goto do_brcond_high;
-         }
--        i = do_constant_folding_cond(INDEX_op_brcond_i32, op->args[1],
-+        i = do_constant_folding_cond(TCG_TYPE_I32, op->args[1],
-                                      op->args[3], cond);
-         switch (i ^ inv) {
-         case 0:
-@@ -XXX,XX +XXX,XX @@ static bool fold_bswap(OptContext *ctx, TCGOp *op)
-     if (arg_is_const(op->args[1])) {
-         uint64_t t = arg_info(op->args[1])->val;
--        t = do_constant_folding(op->opc, t, op->args[2]);
-+        t = do_constant_folding(op->opc, ctx->type, t, op->args[2]);
-         return tcg_opt_gen_movi(ctx, op, op->args[0], t);
-     }
-     return false;
-@@ -XXX,XX +XXX,XX @@ static bool fold_count_zeros(OptContext *ctx, TCGOp *op)
-         uint64_t t = arg_info(op->args[1])->val;
-         if (t != 0) {
--            t = do_constant_folding(op->opc, t, 0);
-+            t = do_constant_folding(op->opc, ctx->type, t, 0);
-             return tcg_opt_gen_movi(ctx, op, op->args[0], t);
-         }
-         return tcg_opt_gen_mov(ctx, op, op->args[0], op->args[2]);
-@@ -XXX,XX +XXX,XX @@ static bool fold_mov(OptContext *ctx, TCGOp *op)
- static bool fold_movcond(OptContext *ctx, TCGOp *op)
- {
--    TCGOpcode opc = op->opc;
-     TCGCond cond = op->args[5];
--    int i = do_constant_folding_cond(opc, op->args[1], op->args[2], cond);
-+    int i = do_constant_folding_cond(ctx->type, op->args[1], op->args[2], cond);
-     if (i >= 0) {
-         return tcg_opt_gen_mov(ctx, op, op->args[0], op->args[4 - i]);
-@@ -XXX,XX +XXX,XX @@ static bool fold_movcond(OptContext *ctx, TCGOp *op)
-     if (arg_is_const(op->args[3]) && arg_is_const(op->args[4])) {
-         uint64_t tv = arg_info(op->args[3])->val;
-         uint64_t fv = arg_info(op->args[4])->val;
-+        TCGOpcode opc;
--        opc = (opc == INDEX_op_movcond_i32
--               ? INDEX_op_setcond_i32 : INDEX_op_setcond_i64);
-+        switch (ctx->type) {
-+        case TCG_TYPE_I32:
-+            opc = INDEX_op_setcond_i32;
-+            break;
-+        case TCG_TYPE_I64:
-+            opc = INDEX_op_setcond_i64;
-+            break;
-+        default:
-+            g_assert_not_reached();
-+        }
-         if (tv == 1 && fv == 0) {
-             op->opc = opc;
-@@ -XXX,XX +XXX,XX @@ static bool fold_remainder(OptContext *ctx, TCGOp *op)
- static bool fold_setcond(OptContext *ctx, TCGOp *op)
- {
-     TCGCond cond = op->args[3];
--    int i = do_constant_folding_cond(op->opc, op->args[1], op->args[2], cond);
-+    int i = do_constant_folding_cond(ctx->type, op->args[1], op->args[2], cond);
-     if (i >= 0) {
-         return tcg_opt_gen_movi(ctx, op, op->args[0], i);
-@@ -XXX,XX +XXX,XX @@ static bool fold_setcond2(OptContext *ctx, TCGOp *op)
-          * Simplify EQ/NE comparisons where one of the pairs
-          * can be simplified.
-          */
--        i = do_constant_folding_cond(INDEX_op_setcond_i32, op->args[1],
-+        i = do_constant_folding_cond(TCG_TYPE_I32, op->args[1],
-                                      op->args[3], cond);
-         switch (i ^ inv) {
-         case 0:
-@@ -XXX,XX +XXX,XX @@ static bool fold_setcond2(OptContext *ctx, TCGOp *op)
-             goto do_setcond_high;
-         }
--        i = do_constant_folding_cond(INDEX_op_setcond_i32, op->args[2],
-+        i = do_constant_folding_cond(TCG_TYPE_I32, op->args[2],
-                                      op->args[4], cond);
-         switch (i ^ inv) {
-         case 0:
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-         init_arguments(&ctx, op, def->nb_oargs + def->nb_iargs);
-         copy_propagate(&ctx, op, def->nb_oargs, def->nb_iargs);
-+        /* Pre-compute the type of the operation. */
-+        if (def->flags & TCG_OPF_VECTOR) {
-+            ctx.type = TCG_TYPE_V64 + TCGOP_VECL(op);
-+        } else if (def->flags & TCG_OPF_64BIT) {
-+            ctx.type = TCG_TYPE_I64;
-+        } else {
-+            ctx.type = TCG_TYPE_I32;
-+        }
-+
-         /* For commutative operations make constant second argument */
-         switch (opc) {
-         CASE_OP_32_64_VEC(add):
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-                     /* Proceed with possible constant folding. */
-                     break;
-                 }
--                if (opc == INDEX_op_sub_i32) {
-+                switch (ctx.type) {
-+                case TCG_TYPE_I32:
-                     neg_op = INDEX_op_neg_i32;
-                     have_neg = TCG_TARGET_HAS_neg_i32;
--                } else if (opc == INDEX_op_sub_i64) {
-+                    break;
-+                case TCG_TYPE_I64:
-                     neg_op = INDEX_op_neg_i64;
-                     have_neg = TCG_TARGET_HAS_neg_i64;
--                } else if (TCG_TARGET_HAS_neg_vec) {
--                    TCGType type = TCGOP_VECL(op) + TCG_TYPE_V64;
--                    unsigned vece = TCGOP_VECE(op);
--                    neg_op = INDEX_op_neg_vec;
--                    have_neg = tcg_can_emit_vec_op(neg_op, type, vece) > 0;
--                } else {
-                     break;
-+                case TCG_TYPE_V64:
-+                case TCG_TYPE_V128:
-+                case TCG_TYPE_V256:
-+                    neg_op = INDEX_op_neg_vec;
-+                    have_neg = tcg_can_emit_vec_op(neg_op, ctx.type,
-+                                                   TCGOP_VECE(op)) > 0;
-+                    break;
-+                default:
-+                    g_assert_not_reached();
-                 }
-                 if (!have_neg) {
-                     break;
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-                 TCGOpcode not_op;
-                 bool have_not;
--                if (def->flags & TCG_OPF_VECTOR) {
--                    not_op = INDEX_op_not_vec;
--                    have_not = TCG_TARGET_HAS_not_vec;
--                } else if (def->flags & TCG_OPF_64BIT) {
--                    not_op = INDEX_op_not_i64;
--                    have_not = TCG_TARGET_HAS_not_i64;
--                } else {
-+                switch (ctx.type) {
-+                case TCG_TYPE_I32:
-                     not_op = INDEX_op_not_i32;
-                     have_not = TCG_TARGET_HAS_not_i32;
-+                    break;
-+                case TCG_TYPE_I64:
-+                    not_op = INDEX_op_not_i64;
-+                    have_not = TCG_TARGET_HAS_not_i64;
-+                    break;
-+                case TCG_TYPE_V64:
-+                case TCG_TYPE_V128:
-+                case TCG_TYPE_V256:
-+                    not_op = INDEX_op_not_vec;
-+                    have_not = TCG_TARGET_HAS_not_vec;
-+                    break;
-+                default:
-+                    g_assert_not_reached();
-                 }
-                 if (!have_not) {
-                     break;
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-            below, we can ignore high bits, but for further optimizations we
-            need to record that the high bits contain garbage.  */
-         partmask = z_mask;
--        if (!(def->flags & TCG_OPF_64BIT)) {
-+        if (ctx.type == TCG_TYPE_I32) {
-             z_mask |= ~(tcg_target_ulong)0xffffffffu;
-             partmask &= 0xffffffffu;
-             affected &= 0xffffffffu;
---
-.25.1
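
Editorial sketch (not part of either series): the patch above replaces per-opcode flag tests with one precomputed operation type. The effect on constant folding is that a 32-bit result is kept as its low 32 bits, sign-extended to 64. The names SkType and sk_fold_add below are hypothetical stand-ins, not QEMU identifiers, and only the add case is modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for TCGType; only the two scalar types are modelled. */
typedef enum { SK_TYPE_I32, SK_TYPE_I64 } SkType;

/* Fold "add x, y" on constants, then normalise the result by operation type. */
static uint64_t sk_fold_add(SkType type, uint64_t x, uint64_t y)
{
    uint64_t res = x + y;   /* unsigned arithmetic wraps, so no overflow UB */

    if (type == SK_TYPE_I32) {
        /*
         * Keep the low 32 bits, sign-extended.  The narrowing conversion is
         * implementation-defined in ISO C; GCC and Clang truncate modulo 2^32.
         */
        res = (uint64_t)(int64_t)(int32_t)res;
    }
    return res;
}

/* Small self-check showing the difference between the two types. */
static void sk_check_fold_add(void)
{
    assert(sk_fold_add(SK_TYPE_I32, 0x7fffffffu, 1) == 0xffffffff80000000ull);
    assert(sk_fold_add(SK_TYPE_I64, 0x7fffffffu, 1) == 0x80000000ull);
}
```

Passing the type explicitly, as the patch does with ctx->type, means the folding routine no longer has to look up the opcode definition to find the operand width.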

-[PULL 39/56] tcg/optimize: Split out fold_to_not
+[PULL 17/20] plugins: Split out common cb expanders
-Split out the conditional conversion from a more complex logical
+Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
 operation to a simple NOT.  Create a couple more helpers to make
 this easy for the outer-most logical operations.
 Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- tcg/optimize.c | 158 +++++++++++++++++++++++++++----------------------
+ accel/tcg/plugin-gen.c | 84 +++++++++++++++++++++---------------------
-file changed, 86 insertions(+), 72 deletions(-)
+file changed, 41 insertions(+), 43 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
+diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
 index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
+--- a/accel/tcg/plugin-gen.c
-+++ b/tcg/optimize.c
++++ b/accel/tcg/plugin-gen.c
-@@ -XXX,XX +XXX,XX @@ static bool fold_const2(OptContext *ctx, TCGOp *op)
+@@ -XXX,XX +XXX,XX @@ static void gen_mem_cb(struct qemu_plugin_dyn_cb *cb,
-     return false;
+     tcg_temp_free_i32(cpu_index);
  }
-+/*
++static void inject_cb(struct qemu_plugin_dyn_cb *cb)
-+ * Convert @op to NOT, if NOT is supported by the host.
++
 + * Return true if the conversion is successful, which will still
 + * indicate that the processing is complete.
 + */
 +static bool fold_not(OptContext *ctx, TCGOp *op);
 +static bool fold_to_not(OptContext *ctx, TCGOp *op, int idx)
 +{
-+    TCGOpcode not_op;
++    switch (cb->type) {
-+    bool have_not;
++    case PLUGIN_CB_REGULAR:
-+
++        gen_udata_cb(cb);
 +    switch (ctx->type) {
 +    case TCG_TYPE_I32:
 +        not_op = INDEX_op_not_i32;
 +        have_not = TCG_TARGET_HAS_not_i32;
 +        break;
-+    case TCG_TYPE_I64:
++    case PLUGIN_CB_INLINE:
-+        not_op = INDEX_op_not_i64;
++        gen_inline_cb(cb);
 +        have_not = TCG_TARGET_HAS_not_i64;
 +        break;
 +    case TCG_TYPE_V64:
 +    case TCG_TYPE_V128:
 +    case TCG_TYPE_V256:
 +        not_op = INDEX_op_not_vec;
 +        have_not = TCG_TARGET_HAS_not_vec;
 +        break;
 +    default:
 +        g_assert_not_reached();
 +    }
-+    if (have_not) {
-+        op->opc = not_op;
-+        op->args[1] = op->args[idx];
-+        return fold_not(ctx, op);
-+    }
-+    return false;
 +}
 +
-+/* If the binary operation has first argument @i, fold to NOT. */
++static void inject_mem_cb(struct qemu_plugin_dyn_cb *cb,
-+static bool fold_ix_to_not(OptContext *ctx, TCGOp *op, uint64_t i)
++                          enum qemu_plugin_mem_rw rw,
 +                          qemu_plugin_meminfo_t meminfo, TCGv_i64 addr)
 +{
-+    if (arg_is_const(op->args[1]) && arg_info(op->args[1])->val == i) {
++    if (cb->rw & rw) {
-+        return fold_to_not(ctx, op, 2);
++        switch (cb->type) {
 +        case PLUGIN_CB_MEM_REGULAR:
 +            gen_mem_cb(cb, meminfo, addr);
 +            break;
 +        default:
 +            inject_cb(cb);
 +            break;
 +        }
 +    }
-+    return false;
 +}
 +
- /* If the binary operation has second argument @i, fold to @i. */
+ static void plugin_gen_inject(struct qemu_plugin_tb *plugin_tb)
  static bool fold_xi_to_i(OptContext *ctx, TCGOp *op, uint64_t i)
  {
-@@ -XXX,XX +XXX,XX @@ static bool fold_xi_to_i(OptContext *ctx, TCGOp *op, uint64_t i)
+     TCGOp *op, *next;
-     return false;
+@@ -XXX,XX +XXX,XX @@ static void plugin_gen_inject(struct qemu_plugin_tb *plugin_tb)
- }
+                 cbs = plugin_tb->cbs;
-+/* If the binary operation has second argument @i, fold to NOT. */
+                 for (i = 0, n = (cbs ? cbs->len : 0); i < n; i++) {
-+static bool fold_xi_to_not(OptContext *ctx, TCGOp *op, uint64_t i)
+-                    struct qemu_plugin_dyn_cb *cb =
-+{
+-                        &g_array_index(cbs, struct qemu_plugin_dyn_cb, i);
-+    if (arg_is_const(op->args[2]) && arg_info(op->args[2])->val == i) {
+-
-+        return fold_to_not(ctx, op, 1);
+-                    switch (cb->type) {
-+    }
+-                    case PLUGIN_CB_REGULAR:
-+    return false;
+-                        gen_udata_cb(cb);
-+}
+-                        break;
-+
+-                    case PLUGIN_CB_INLINE:
- /* If the binary operation has both arguments equal, fold to @i. */
+-                        gen_inline_cb(cb);
- static bool fold_xx_to_i(OptContext *ctx, TCGOp *op, uint64_t i)
+-                        break;
- {
+-                    default:
-@@ -XXX,XX +XXX,XX @@ static bool fold_and(OptContext *ctx, TCGOp *op)
+-                        g_assert_not_reached();
- static bool fold_andc(OptContext *ctx, TCGOp *op)
+-                    }
- {
++                    inject_cb(
-     if (fold_const2(ctx, op) ||
++                        &g_array_index(cbs, struct qemu_plugin_dyn_cb, i));
 -        fold_xx_to_i(ctx, op, 0)) {
 +        fold_xx_to_i(ctx, op, 0) ||
 +        fold_ix_to_not(ctx, op, -1)) {
          return true;
      }
      return false;
@@ -XXX,XX +XXX,XX @@ static bool fold_dup2(OptContext *ctx, TCGOp *op)
  static bool fold_eqv(OptContext *ctx, TCGOp *op)
  {
 -    return fold_const2(ctx, op);
 +    if (fold_const2(ctx, op) ||
 +        fold_xi_to_not(ctx, op, 0)) {
 +        return true;
 +    }
 +    return false;
  }
  static bool fold_extract(OptContext *ctx, TCGOp *op)
@@ -XXX,XX +XXX,XX @@ static bool fold_mulu2_i32(OptContext *ctx, TCGOp *op)
  static bool fold_nand(OptContext *ctx, TCGOp *op)
  {
 -    return fold_const2(ctx, op);
 +    if (fold_const2(ctx, op) ||
 +        fold_xi_to_not(ctx, op, -1)) {
 +        return true;
 +    }
 +    return false;
  }
  static bool fold_neg(OptContext *ctx, TCGOp *op)
@@ -XXX,XX +XXX,XX @@ static bool fold_neg(OptContext *ctx, TCGOp *op)
  static bool fold_nor(OptContext *ctx, TCGOp *op)
  {
 -    return fold_const2(ctx, op);
 +    if (fold_const2(ctx, op) ||
 +        fold_xi_to_not(ctx, op, 0)) {
 +        return true;
 +    }
 +    return false;
  }
  static bool fold_not(OptContext *ctx, TCGOp *op)
  {
 -    return fold_const1(ctx, op);
 +    if (fold_const1(ctx, op)) {
 +        return true;
 +    }
 +
 +    /* Because of fold_to_not, we want to always return true, via finish. */
 +    finish_folding(ctx, op);
 +    return true;
  }
  static bool fold_or(OptContext *ctx, TCGOp *op)
@@ -XXX,XX +XXX,XX @@ static bool fold_or(OptContext *ctx, TCGOp *op)
  static bool fold_orc(OptContext *ctx, TCGOp *op)
  {
 -    return fold_const2(ctx, op);
 +    if (fold_const2(ctx, op) ||
 +        fold_ix_to_not(ctx, op, 0)) {
 +        return true;
 +    }
 +    return false;
  }
  static bool fold_qemu_ld(OptContext *ctx, TCGOp *op)
@@ -XXX,XX +XXX,XX @@ static bool fold_sub2_i32(OptContext *ctx, TCGOp *op)
  static bool fold_xor(OptContext *ctx, TCGOp *op)
  {
      if (fold_const2(ctx, op) ||
 -        fold_xx_to_i(ctx, op, 0)) {
 +        fold_xx_to_i(ctx, op, 0) ||
 +        fold_xi_to_not(ctx, op, -1)) {
          return true;
      }
      return false;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                  }
+                 break;
+@@ -XXX,XX +XXX,XX @@ static void plugin_gen_inject(struct qemu_plugin_tb *plugin_tb)
+                 cbs = insn->insn_cbs;
+                 for (i = 0, n = (cbs ? cbs->len : 0); i < n; i++) {
+-                    struct qemu_plugin_dyn_cb *cb =
+-                        &g_array_index(cbs, struct qemu_plugin_dyn_cb, i);
+-
+-                    switch (cb->type) {
+-                    case PLUGIN_CB_REGULAR:
+-                        gen_udata_cb(cb);
+-                        break;
+-                    case PLUGIN_CB_INLINE:
+-                        gen_inline_cb(cb);
+-                        break;
+-                    default:
+-                        g_assert_not_reached();
+-                    }
++                    inject_cb(
++                        &g_array_index(cbs, struct qemu_plugin_dyn_cb, i));
+                 }
+                 break;
+@@ -XXX,XX +XXX,XX @@ static void plugin_gen_inject(struct qemu_plugin_tb *plugin_tb)
+         {
+             TCGv_i64 addr = temp_tcgv_i64(arg_temp(op->args[0]));
+             qemu_plugin_meminfo_t meminfo = op->args[1];
++            enum qemu_plugin_mem_rw rw =
++                (qemu_plugin_mem_is_store(meminfo)
++                 ? QEMU_PLUGIN_MEM_W : QEMU_PLUGIN_MEM_R);
+             struct qemu_plugin_insn *insn;
+             const GArray *cbs;
+-            int i, n, rw;
++            int i, n;
+             assert(insn_idx >= 0);
+             insn = g_ptr_array_index(plugin_tb->insns, insn_idx);
+-            rw = qemu_plugin_mem_is_store(meminfo) ? 2 : 1;
+             tcg_ctx->emit_before_op = op;
+             cbs = insn->mem_cbs;
+             for (i = 0, n = (cbs ? cbs->len : 0); i < n; i++) {
+-                struct qemu_plugin_dyn_cb *cb =
+-                    &g_array_index(cbs, struct qemu_plugin_dyn_cb, i);
+-
+-                if (cb->rw & rw) {
+-                    switch (cb->type) {
+-                    case PLUGIN_CB_MEM_REGULAR:
+-                        gen_mem_cb(cb, meminfo, addr);
+-                        break;
+-                    case PLUGIN_CB_INLINE:
+-                        gen_inline_cb(cb);
+-                        break;
+-                    default:
+-                        g_assert_not_reached();
+-                    }
+-                }
++                inject_mem_cb(&g_array_index(cbs, struct qemu_plugin_dyn_cb, i),
++                              rw, meminfo, addr);
              }
-             break;
--        CASE_OP_32_64_VEC(xor):
+             tcg_ctx->emit_before_op = NULL;
 -        CASE_OP_32_64(nand):
 -            if (!arg_is_const(op->args[1])
 -                && arg_is_const(op->args[2])
 -                && arg_info(op->args[2])->val == -1) {
 -                i = 1;
 -                goto try_not;
 -            }
 -            break;
 -        CASE_OP_32_64(nor):
 -            if (!arg_is_const(op->args[1])
 -                && arg_is_const(op->args[2])
 -                && arg_info(op->args[2])->val == 0) {
 -                i = 1;
 -                goto try_not;
 -            }
 -            break;
 -        CASE_OP_32_64_VEC(andc):
 -            if (!arg_is_const(op->args[2])
 -                && arg_is_const(op->args[1])
 -                && arg_info(op->args[1])->val == -1) {
 -                i = 2;
 -                goto try_not;
 -            }
 -            break;
 -        CASE_OP_32_64_VEC(orc):
 -        CASE_OP_32_64(eqv):
 -            if (!arg_is_const(op->args[2])
 -                && arg_is_const(op->args[1])
 -                && arg_info(op->args[1])->val == 0) {
 -                i = 2;
 -                goto try_not;
 -            }
 -            break;
 -        try_not:
 -            {
 -                TCGOpcode not_op;
 -                bool have_not;
 -
 -                switch (ctx.type) {
 -                case TCG_TYPE_I32:
 -                    not_op = INDEX_op_not_i32;
 -                    have_not = TCG_TARGET_HAS_not_i32;
 -                    break;
 -                case TCG_TYPE_I64:
 -                    not_op = INDEX_op_not_i64;
 -                    have_not = TCG_TARGET_HAS_not_i64;
 -                    break;
 -                case TCG_TYPE_V64:
 -                case TCG_TYPE_V128:
 -                case TCG_TYPE_V256:
 -                    not_op = INDEX_op_not_vec;
 -                    have_not = TCG_TARGET_HAS_not_vec;
 -                    break;
 -                default:
 -                    g_assert_not_reached();
 -                }
 -                if (!have_not) {
 -                    break;
 -                }
 -                op->opc = not_op;
 -                reset_temp(op->args[0]);
 -                op->args[1] = op->args[i];
 -                continue;
 -            }
          default:
              break;
          }
 --
-.25.1
+.34.1
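
Editorial sketch (not part of either series): the fold_xi_to_not and fold_ix_to_not calls in the old patch rest on a handful of bitwise identities, each reducing a two-operand logical op with one constant operand to a plain NOT. The helper names below are hypothetical, and the operations are modelled on 64-bit scalars only.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar models of the TCG logical ops named in the patch. */
static uint64_t sk_andc(uint64_t a, uint64_t b) { return a & ~b; }
static uint64_t sk_orc(uint64_t a, uint64_t b)  { return a | ~b; }
static uint64_t sk_eqv(uint64_t a, uint64_t b)  { return ~(a ^ b); }
static uint64_t sk_nand(uint64_t a, uint64_t b) { return ~(a & b); }
static uint64_t sk_nor(uint64_t a, uint64_t b)  { return ~(a | b); }

/* Return 1 if every "fold to NOT" identity used by the patch holds for x. */
static int sk_not_identities_hold(uint64_t x)
{
    const uint64_t ones = UINT64_MAX;   /* the constant written as -1 */

    return (x ^ ones) == ~x             /* xor  x, -1  => not x */
        && sk_nand(x, ones) == ~x       /* nand x, -1  => not x */
        && sk_nor(x, 0) == ~x           /* nor  x, 0   => not x */
        && sk_eqv(x, 0) == ~x           /* eqv  x, 0   => not x */
        && sk_andc(ones, x) == ~x       /* andc -1, x  => not x */
        && sk_orc(0, x) == ~x;          /* orc  0, x   => not x */
}
```

The first four identities have the constant in the second operand (the "xi" helpers), the last two in the first operand (the "ix" helpers), which is why the patch needs both variants.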

-[PULL 41/56] tcg/optimize: Split out fold_xi_to_x
+Deleted patch
-Pull the "op r, a, i => mov r, a" optimization into a function,
-and use it in the outer-most logical operations.
-Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
----
- tcg/optimize.c | 61 +++++++++++++++++++++-----------------------------
-file changed, 26 insertions(+), 35 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
-index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
-+++ b/tcg/optimize.c
-@@ -XXX,XX +XXX,XX @@ static bool fold_xi_to_i(OptContext *ctx, TCGOp *op, uint64_t i)
-     return false;
- }
-+/* If the binary operation has second argument @i, fold to identity. */
-+static bool fold_xi_to_x(OptContext *ctx, TCGOp *op, uint64_t i)
-+{
-+    if (arg_is_const(op->args[2]) && arg_info(op->args[2])->val == i) {
-+        return tcg_opt_gen_mov(ctx, op, op->args[0], op->args[1]);
-+    }
-+    return false;
-+}
-+
- /* If the binary operation has second argument @i, fold to NOT. */
- static bool fold_xi_to_not(OptContext *ctx, TCGOp *op, uint64_t i)
- {
-@@ -XXX,XX +XXX,XX @@ static bool fold_xx_to_x(OptContext *ctx, TCGOp *op)
- static bool fold_add(OptContext *ctx, TCGOp *op)
- {
--    return fold_const2(ctx, op);
-+    if (fold_const2(ctx, op) ||
-+        fold_xi_to_x(ctx, op, 0)) {
-+        return true;
-+    }
-+    return false;
- }
- static bool fold_addsub2_i32(OptContext *ctx, TCGOp *op, bool add)
-@@ -XXX,XX +XXX,XX @@ static bool fold_and(OptContext *ctx, TCGOp *op)
- {
-     if (fold_const2(ctx, op) ||
-         fold_xi_to_i(ctx, op, 0) ||
-+        fold_xi_to_x(ctx, op, -1) ||
-         fold_xx_to_x(ctx, op)) {
-         return true;
-     }
-@@ -XXX,XX +XXX,XX @@ static bool fold_andc(OptContext *ctx, TCGOp *op)
- {
-     if (fold_const2(ctx, op) ||
-         fold_xx_to_i(ctx, op, 0) ||
-+        fold_xi_to_x(ctx, op, 0) ||
-         fold_ix_to_not(ctx, op, -1)) {
-         return true;
-     }
-@@ -XXX,XX +XXX,XX @@ static bool fold_dup2(OptContext *ctx, TCGOp *op)
- static bool fold_eqv(OptContext *ctx, TCGOp *op)
- {
-     if (fold_const2(ctx, op) ||
-+        fold_xi_to_x(ctx, op, -1) ||
-         fold_xi_to_not(ctx, op, 0)) {
-         return true;
-     }
-@@ -XXX,XX +XXX,XX @@ static bool fold_not(OptContext *ctx, TCGOp *op)
- static bool fold_or(OptContext *ctx, TCGOp *op)
- {
-     if (fold_const2(ctx, op) ||
-+        fold_xi_to_x(ctx, op, 0) ||
-         fold_xx_to_x(ctx, op)) {
-         return true;
-     }
-@@ -XXX,XX +XXX,XX @@ static bool fold_or(OptContext *ctx, TCGOp *op)
- static bool fold_orc(OptContext *ctx, TCGOp *op)
- {
-     if (fold_const2(ctx, op) ||
-+        fold_xi_to_x(ctx, op, -1) ||
-         fold_ix_to_not(ctx, op, 0)) {
-         return true;
-     }
-@@ -XXX,XX +XXX,XX @@ static bool fold_sextract(OptContext *ctx, TCGOp *op)
- static bool fold_shift(OptContext *ctx, TCGOp *op)
- {
--    return fold_const2(ctx, op);
-+    if (fold_const2(ctx, op) ||
-+        fold_xi_to_x(ctx, op, 0)) {
-+        return true;
-+    }
-+    return false;
- }
- static bool fold_sub_to_neg(OptContext *ctx, TCGOp *op)
-@@ -XXX,XX +XXX,XX @@ static bool fold_sub(OptContext *ctx, TCGOp *op)
- {
-     if (fold_const2(ctx, op) ||
-         fold_xx_to_i(ctx, op, 0) ||
-+        fold_xi_to_x(ctx, op, 0) ||
-         fold_sub_to_neg(ctx, op)) {
-         return true;
-     }
-@@ -XXX,XX +XXX,XX @@ static bool fold_xor(OptContext *ctx, TCGOp *op)
- {
-     if (fold_const2(ctx, op) ||
-         fold_xx_to_i(ctx, op, 0) ||
-+        fold_xi_to_x(ctx, op, 0) ||
-         fold_xi_to_not(ctx, op, -1)) {
-         return true;
-     }
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-             break;
-         }
--        /* Simplify expression for "op r, a, const => mov r, a" cases */
--        switch (opc) {
--        CASE_OP_32_64_VEC(add):
--        CASE_OP_32_64_VEC(sub):
--        CASE_OP_32_64_VEC(or):
--        CASE_OP_32_64_VEC(xor):
--        CASE_OP_32_64_VEC(andc):
--        CASE_OP_32_64(shl):
--        CASE_OP_32_64(shr):
--        CASE_OP_32_64(sar):
--        CASE_OP_32_64(rotl):
--        CASE_OP_32_64(rotr):
--            if (!arg_is_const(op->args[1])
--                && arg_is_const(op->args[2])
--                && arg_info(op->args[2])->val == 0) {
--                tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[1]);
--                continue;
--            }
--            break;
--        CASE_OP_32_64_VEC(and):
--        CASE_OP_32_64_VEC(orc):
--        CASE_OP_32_64(eqv):
--            if (!arg_is_const(op->args[1])
--                && arg_is_const(op->args[2])
--                && arg_info(op->args[2])->val == -1) {
--                tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[1]);
--                continue;
--            }
--            break;
--        default:
--            break;
--        }
--
-         /* Simplify using known-zero bits. Currently only ops with a single
-            output argument is supported. */
-         z_mask = -1;
---
-.25.1
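
Editorial sketch (not part of either series): fold_xi_to_x in the deleted patch turns "op r, a, i" into "mov r, a" whenever the constant i is the identity element for the op. The checks below model those cases on 64-bit scalars; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t sk_andc(uint64_t a, uint64_t b) { return a & ~b; }
static uint64_t sk_orc(uint64_t a, uint64_t b)  { return a | ~b; }
static uint64_t sk_eqv(uint64_t a, uint64_t b)  { return ~(a ^ b); }

/* Return 1 if each "op x, i => x" identity from the patch holds for x. */
static int sk_identity_folds_hold(uint64_t x)
{
    const uint64_t ones = UINT64_MAX;   /* the constant written as -1 */

    return (x + 0) == x                 /* add  x, 0  */
        && (x - 0) == x                 /* sub  x, 0  */
        && (x | 0) == x                 /* or   x, 0  */
        && (x ^ 0) == x                 /* xor  x, 0  */
        && sk_andc(x, 0) == x           /* andc x, 0  */
        && (x << 0) == x                /* shift x, 0 */
        && (x & ones) == x              /* and  x, -1 */
        && sk_orc(x, ones) == x         /* orc  x, -1 */
        && sk_eqv(x, ones) == x;        /* eqv  x, -1 */
}
```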

-[PULL 42/56] tcg/optimize: Split out fold_ix_to_i
+Deleted patch
-Pull the "op r, 0, b => movi r, 0" optimization into a function,
-and use it in fold_shift.
-Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
----
- tcg/optimize.c | 28 ++++++++++------------------
-file changed, 10 insertions(+), 18 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
-index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
-+++ b/tcg/optimize.c
-@@ -XXX,XX +XXX,XX @@ static bool fold_to_not(OptContext *ctx, TCGOp *op, int idx)
-     return false;
- }
-+/* If the binary operation has first argument @i, fold to @i. */
-+static bool fold_ix_to_i(OptContext *ctx, TCGOp *op, uint64_t i)
-+{
-+    if (arg_is_const(op->args[1]) && arg_info(op->args[1])->val == i) {
-+        return tcg_opt_gen_movi(ctx, op, op->args[0], i);
-+    }
-+    return false;
-+}
-+
- /* If the binary operation has first argument @i, fold to NOT. */
- static bool fold_ix_to_not(OptContext *ctx, TCGOp *op, uint64_t i)
- {
-@@ -XXX,XX +XXX,XX @@ static bool fold_sextract(OptContext *ctx, TCGOp *op)
- static bool fold_shift(OptContext *ctx, TCGOp *op)
- {
-     if (fold_const2(ctx, op) ||
-+        fold_ix_to_i(ctx, op, 0) ||
-         fold_xi_to_x(ctx, op, 0)) {
-         return true;
-     }
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-             break;
-         }
--        /* Simplify expressions for "shift/rot r, 0, a => movi r, 0",
--           and "sub r, 0, a => neg r, a" case.  */
--        switch (opc) {
--        CASE_OP_32_64(shl):
--        CASE_OP_32_64(shr):
--        CASE_OP_32_64(sar):
--        CASE_OP_32_64(rotl):
--        CASE_OP_32_64(rotr):
--            if (arg_is_const(op->args[1])
--                && arg_info(op->args[1])->val == 0) {
--                tcg_opt_gen_movi(&ctx, op, op->args[0], 0);
--                continue;
--            }
--            break;
--        default:
--            break;
--        }
--
-         /* Simplify using known-zero bits. Currently only ops with a single
-            output argument is supported. */
-         z_mask = -1;
---
-.25.1
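
Editorial sketch (not part of either series): fold_ix_to_i in the deleted patch relies on the fact that shifting or rotating a zero value yields zero for any count, so "shift r, 0, b" can become "movi r, 0". The rotate helper below is a hypothetical scalar model that masks the count to stay within defined behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left by n (mod 64); masking both counts avoids shifting by 64. */
static uint64_t sk_rotl64(uint64_t x, unsigned n)
{
    return (x << (n & 63)) | (x >> ((64 - n) & 63));
}

/* Return 1 if shifting or rotating zero gives zero for every count 0..63. */
static int sk_zero_shifts_to_zero(void)
{
    const uint64_t zero = 0;

    for (unsigned n = 0; n < 64; n++) {
        if ((zero << n) != 0 ||
            (zero >> n) != 0 ||
            ((int64_t)zero >> n) != 0 ||    /* arithmetic shift right */
            sk_rotl64(zero, n) != 0) {
            return 0;
        }
    }
    return 1;
}
```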

-[PULL 43/56] tcg/optimize: Split out fold_masks
+[PULL 18/20] plugins: Merge qemu_plugin_tb_insn_get to plugin-gen.c
-Move all of the known-zero optimizations into the per-opcode
+Merge qemu_plugin_insn_alloc and qemu_plugin_tb_insn_get into
-functions.  Use fold_masks when there is a possibility of the
+plugin_gen_insn_start, since it is used nowhere else.
 result being determined, and simply set ctx->z_mask otherwise.
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
+Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
 Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- tcg/optimize.c | 545 ++++++++++++++++++++++++++-----------------------
+ include/qemu/plugin.h  | 39 ---------------------------------------
-file changed, 294 insertions(+), 251 deletions(-)
+ accel/tcg/plugin-gen.c | 39 ++++++++++++++++++++++++++++++++-------
 files changed, 32 insertions(+), 46 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
+diff --git a/include/qemu/plugin.h b/include/qemu/plugin.h
 index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
+--- a/include/qemu/plugin.h
-+++ b/tcg/optimize.c
++++ b/include/qemu/plugin.h
-@@ -XXX,XX +XXX,XX @@ typedef struct OptContext {
+@@ -XXX,XX +XXX,XX @@ static inline void qemu_plugin_insn_cleanup_fn(gpointer data)
-     TCGTempSet temps_used;
+     g_byte_array_free(insn->data, true);
      /* In flight values from optimization. */
 -    uint64_t z_mask;
 +    uint64_t a_mask;  /* mask bit is 0 iff value identical to first input */
 +    uint64_t z_mask;  /* mask bit is 0 iff value bit is 0 */
      TCGType type;
  } OptContext;
@@ -XXX,XX +XXX,XX @@ static bool fold_const2(OptContext *ctx, TCGOp *op)
      return false;
  }
-+static bool fold_masks(OptContext *ctx, TCGOp *op)
+-static inline struct qemu_plugin_insn *qemu_plugin_insn_alloc(void)
-+{
+-{
-+    uint64_t a_mask = ctx->a_mask;
+-    struct qemu_plugin_insn *insn = g_new0(struct qemu_plugin_insn, 1);
-+    uint64_t z_mask = ctx->z_mask;
+-
-+
+-    insn->data = g_byte_array_sized_new(4);
-+    /*
+-    return insn;
-+     * 32-bit ops generate 32-bit results.  For the result is zero test
+-}
-+     * below, we can ignore high bits, but for further optimizations we
+-
-+     * need to record that the high bits contain garbage.
+ /* Internal context for this TranslationBlock */
-+     */
+ struct qemu_plugin_tb {
-+    if (ctx->type == TCG_TYPE_I32) {
+     GPtrArray *insns;
-+        ctx->z_mask |= MAKE_64BIT_MASK(32, 32);
+@@ -XXX,XX +XXX,XX @@ struct qemu_plugin_tb {
-+        a_mask &= MAKE_64BIT_MASK(0, 32);
+     GArray *cbs;
-+        z_mask &= MAKE_64BIT_MASK(0, 32);
+ };
 -/**
 - * qemu_plugin_tb_insn_get(): get next plugin record for translation.
 - * @tb: the internal tb context
 - * @pc: address of instruction
 - */
 -static inline
 -struct qemu_plugin_insn *qemu_plugin_tb_insn_get(struct qemu_plugin_tb *tb,
 -                                                 uint64_t pc)
 -{
 -    struct qemu_plugin_insn *insn;
 -
 -    if (unlikely(tb->n == tb->insns->len)) {
 -        struct qemu_plugin_insn *new_insn = qemu_plugin_insn_alloc();
 -        g_ptr_array_add(tb->insns, new_insn);
 -    }
 -
 -    insn = g_ptr_array_index(tb->insns, tb->n++);
 -    g_byte_array_set_size(insn->data, 0);
 -    insn->calls_helpers = false;
 -    insn->mem_helper = false;
 -    insn->vaddr = pc;
 -    if (insn->insn_cbs) {
 -        g_array_set_size(insn->insn_cbs, 0);
 -    }
 -    if (insn->mem_cbs) {
 -        g_array_set_size(insn->mem_cbs, 0);
 -    }
 -
 -    return insn;
 -}
 -
  /**
   * struct CPUPluginState - per-CPU state for plugins
   * @event_mask: plugin event bitmap. Modified only via async work.
 diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
 index XXXXXXX..XXXXXXX 100644
 --- a/accel/tcg/plugin-gen.c
 +++ b/accel/tcg/plugin-gen.c
@@ -XXX,XX +XXX,XX @@ bool plugin_gen_tb_start(CPUState *cpu, const DisasContextBase *db,
  void plugin_gen_insn_start(CPUState *cpu, const DisasContextBase *db)
  {
      struct qemu_plugin_tb *ptb = tcg_ctx->plugin_tb;
 -    struct qemu_plugin_insn *pinsn;
 +    struct qemu_plugin_insn *insn;
 +    size_t n = db->num_insns;
 +    vaddr pc;
 -    pinsn = qemu_plugin_tb_insn_get(ptb, db->pc_next);
 -    tcg_ctx->plugin_insn = pinsn;
 -    plugin_gen_empty_callback(PLUGIN_GEN_FROM_INSN);
 +    assert(n >= 1);
 +    ptb->n = n;
 +    if (n <= ptb->insns->len) {
 +        insn = g_ptr_array_index(ptb->insns, n - 1);
 +        g_byte_array_set_size(insn->data, 0);
 +    } else {
 +        assert(n - 1 == ptb->insns->len);
 +        insn = g_new0(struct qemu_plugin_insn, 1);
 +        insn->data = g_byte_array_sized_new(4);
 +        g_ptr_array_add(ptb->insns, insn);
 +    }
 +
-+    if (z_mask == 0) {
++    tcg_ctx->plugin_insn = insn;
-+        return tcg_opt_gen_movi(ctx, op, op->args[0], 0);
++    insn->calls_helpers = false;
 +    insn->mem_helper = false;
 +    if (insn->insn_cbs) {
 +        g_array_set_size(insn->insn_cbs, 0);
 +    }
-+    if (a_mask == 0) {
++    if (insn->mem_cbs) {
-+        return tcg_opt_gen_mov(ctx, op, op->args[0], op->args[1]);
++        g_array_set_size(insn->mem_cbs, 0);
 +    }
 +    return false;
 +}
 +
  /*
   * Convert @op to NOT, if NOT is supported by the host.
  * Return true if the conversion is successful, which will still
@@ -XXX,XX +XXX,XX @@ static bool fold_add2_i32(OptContext *ctx, TCGOp *op)
  static bool fold_and(OptContext *ctx, TCGOp *op)
  {
 +    uint64_t z1, z2;
 +
      if (fold_const2(ctx, op) ||
          fold_xi_to_i(ctx, op, 0) ||
          fold_xi_to_x(ctx, op, -1) ||
          fold_xx_to_x(ctx, op)) {
          return true;
      }
 -    return false;
 +
 +    z1 = arg_info(op->args[1])->z_mask;
 +    z2 = arg_info(op->args[2])->z_mask;
 +    ctx->z_mask = z1 & z2;
 +
 +    /*
 +     * Known-zeros does not imply known-ones.  Therefore unless
 +     * arg2 is constant, we can't infer affected bits from it.
 +     */
 +    if (arg_is_const(op->args[2])) {
 +        ctx->a_mask = z1 & ~z2;
 +    }
 +
-+    return fold_masks(ctx, op);
++    pc = db->pc_next;
- }
++    insn->vaddr = pc;
- static bool fold_andc(OptContext *ctx, TCGOp *op)
+     /*
- {
+      * Detect page crossing to get the new host address.
-+    uint64_t z1;
+@@ -XXX,XX +XXX,XX @@ void plugin_gen_insn_start(CPUState *cpu, const DisasContextBase *db)
-+
+      * fetching instructions from a region not backed by RAM.
-     if (fold_const2(ctx, op) ||
+      */
-         fold_xx_to_i(ctx, op, 0) ||
+     if (ptb->haddr1 == NULL) {
-         fold_xi_to_x(ctx, op, 0) ||
+-        pinsn->haddr = NULL;
-         fold_ix_to_not(ctx, op, -1)) {
++        insn->haddr = NULL;
-         return true;
+     } else if (is_same_page(db, db->pc_next)) {
-     }
+-        pinsn->haddr = ptb->haddr1 + pinsn->vaddr - ptb->vaddr;
--    return false;
++        insn->haddr = ptb->haddr1 + pc - ptb->vaddr;
-+
+     } else {
-+    z1 = arg_info(op->args[1])->z_mask;
+         if (ptb->vaddr2 == -1) {
-+
+             ptb->vaddr2 = TARGET_PAGE_ALIGN(db->pc_first);
-+    /*
+             get_page_addr_code_hostp(cpu_env(cpu), ptb->vaddr2, &ptb->haddr2);
 +     * Known-zeros does not imply known-ones.  Therefore unless
 +     * arg2 is constant, we can't infer anything from it.
 +     */
 +    if (arg_is_const(op->args[2])) {
 +        uint64_t z2 = ~arg_info(op->args[2])->z_mask;
 +        ctx->a_mask = z1 & ~z2;
 +        z1 &= z2;
 +    }
 +    ctx->z_mask = z1;
 +
 +    return fold_masks(ctx, op);
  }
  static bool fold_brcond(OptContext *ctx, TCGOp *op)
@@ -XXX,XX +XXX,XX @@ static bool fold_brcond2(OptContext *ctx, TCGOp *op)
  static bool fold_bswap(OptContext *ctx, TCGOp *op)
  {
 +    uint64_t z_mask, sign;
 +
      if (arg_is_const(op->args[1])) {
          uint64_t t = arg_info(op->args[1])->val;
          t = do_constant_folding(op->opc, ctx->type, t, op->args[2]);
          return tcg_opt_gen_movi(ctx, op, op->args[0], t);
      }
 -    return false;
 +
 +    z_mask = arg_info(op->args[1])->z_mask;
 +    switch (op->opc) {
 +    case INDEX_op_bswap16_i32:
 +    case INDEX_op_bswap16_i64:
 +        z_mask = bswap16(z_mask);
 +        sign = INT16_MIN;
 +        break;
 +    case INDEX_op_bswap32_i32:
 +    case INDEX_op_bswap32_i64:
 +        z_mask = bswap32(z_mask);
 +        sign = INT32_MIN;
 +        break;
 +    case INDEX_op_bswap64_i64:
 +        z_mask = bswap64(z_mask);
 +        sign = INT64_MIN;
 +        break;
 +    default:
 +        g_assert_not_reached();
 +    }
 +
 +    switch (op->args[2] & (TCG_BSWAP_OZ | TCG_BSWAP_OS)) {
 +    case TCG_BSWAP_OZ:
 +        break;
 +    case TCG_BSWAP_OS:
 +        /* If the sign bit may be 1, force all the bits above to 1. */
 +        if (z_mask & sign) {
 +            z_mask |= sign;
 +        }
 +        break;
 +    default:
 +        /* The high bits are undefined: force all bits above the sign to 1. */
 +        z_mask |= sign << 1;
 +        break;
 +    }
 +    ctx->z_mask = z_mask;
 +
 +    return fold_masks(ctx, op);
  }
  static bool fold_call(OptContext *ctx, TCGOp *op)
@@ -XXX,XX +XXX,XX @@ static bool fold_call(OptContext *ctx, TCGOp *op)
  static bool fold_count_zeros(OptContext *ctx, TCGOp *op)
  {
 +    uint64_t z_mask;
 +
      if (arg_is_const(op->args[1])) {
          uint64_t t = arg_info(op->args[1])->val;
@@ -XXX,XX +XXX,XX @@ static bool fold_count_zeros(OptContext *ctx, TCGOp *op)
          }
-         return tcg_opt_gen_mov(ctx, op, op->args[0], op->args[2]);
+-        pinsn->haddr = ptb->haddr2 + pinsn->vaddr - ptb->vaddr2;
 +        insn->haddr = ptb->haddr2 + pc - ptb->vaddr2;
      }
 +
-+    switch (ctx->type) {
++    plugin_gen_empty_callback(PLUGIN_GEN_FROM_INSN);
 +    case TCG_TYPE_I32:
 +        z_mask = 31;
 +        break;
 +    case TCG_TYPE_I64:
 +        z_mask = 63;
 +        break;
 +    default:
 +        g_assert_not_reached();
 +    }
 +    ctx->z_mask = arg_info(op->args[2])->z_mask | z_mask;
 +
      return false;
  }
- static bool fold_ctpop(OptContext *ctx, TCGOp *op)
+ void plugin_gen_insn_end(void)
  {
 -    return fold_const1(ctx, op);
 +    if (fold_const1(ctx, op)) {
 +        return true;
 +    }
 +
 +    switch (ctx->type) {
 +    case TCG_TYPE_I32:
 +        ctx->z_mask = 32 | 31;
 +        break;
 +    case TCG_TYPE_I64:
 +        ctx->z_mask = 64 | 63;
 +        break;
 +    default:
 +        g_assert_not_reached();
 +    }
 +    return false;
  }
  static bool fold_deposit(OptContext *ctx, TCGOp *op)
@@ -XXX,XX +XXX,XX @@ static bool fold_deposit(OptContext *ctx, TCGOp *op)
          t1 = deposit64(t1, op->args[3], op->args[4], t2);
          return tcg_opt_gen_movi(ctx, op, op->args[0], t1);
      }
 +
 +    ctx->z_mask = deposit64(arg_info(op->args[1])->z_mask,
 +                            op->args[3], op->args[4],
 +                            arg_info(op->args[2])->z_mask);
      return false;
  }
@@ -XXX,XX +XXX,XX @@ static bool fold_eqv(OptContext *ctx, TCGOp *op)
  static bool fold_extract(OptContext *ctx, TCGOp *op)
  {
 +    uint64_t z_mask_old, z_mask;
 +
      if (arg_is_const(op->args[1])) {
          uint64_t t;
@@ -XXX,XX +XXX,XX @@ static bool fold_extract(OptContext *ctx, TCGOp *op)
          t = extract64(t, op->args[2], op->args[3]);
          return tcg_opt_gen_movi(ctx, op, op->args[0], t);
      }
 -    return false;
 +
 +    z_mask_old = arg_info(op->args[1])->z_mask;
 +    z_mask = extract64(z_mask_old, op->args[2], op->args[3]);
 +    if (op->args[2] == 0) {
 +        ctx->a_mask = z_mask_old ^ z_mask;
 +    }
 +    ctx->z_mask = z_mask;
 +
 +    return fold_masks(ctx, op);
  }
  static bool fold_extract2(OptContext *ctx, TCGOp *op)
@@ -XXX,XX +XXX,XX @@ static bool fold_extract2(OptContext *ctx, TCGOp *op)
  static bool fold_exts(OptContext *ctx, TCGOp *op)
  {
 -    return fold_const1(ctx, op);
 +    uint64_t z_mask_old, z_mask, sign;
 +    bool type_change = false;
 +
 +    if (fold_const1(ctx, op)) {
 +        return true;
 +    }
 +
 +    z_mask_old = z_mask = arg_info(op->args[1])->z_mask;
 +
 +    switch (op->opc) {
 +    CASE_OP_32_64(ext8s):
 +        sign = INT8_MIN;
 +        z_mask = (uint8_t)z_mask;
 +        break;
 +    CASE_OP_32_64(ext16s):
 +        sign = INT16_MIN;
 +        z_mask = (uint16_t)z_mask;
 +        break;
 +    case INDEX_op_ext_i32_i64:
 +        type_change = true;
 +        QEMU_FALLTHROUGH;
 +    case INDEX_op_ext32s_i64:
 +        sign = INT32_MIN;
 +        z_mask = (uint32_t)z_mask;
 +        break;
 +    default:
 +        g_assert_not_reached();
 +    }
 +
 +    if (z_mask & sign) {
 +        z_mask |= sign;
 +    } else if (!type_change) {
 +        ctx->a_mask = z_mask_old ^ z_mask;
 +    }
 +    ctx->z_mask = z_mask;
 +
 +    return fold_masks(ctx, op);
  }
  static bool fold_extu(OptContext *ctx, TCGOp *op)
  {
 -    return fold_const1(ctx, op);
 +    uint64_t z_mask_old, z_mask;
 +    bool type_change = false;
 +
 +    if (fold_const1(ctx, op)) {
 +        return true;
 +    }
 +
 +    z_mask_old = z_mask = arg_info(op->args[1])->z_mask;
 +
 +    switch (op->opc) {
 +    CASE_OP_32_64(ext8u):
 +        z_mask = (uint8_t)z_mask;
 +        break;
 +    CASE_OP_32_64(ext16u):
 +        z_mask = (uint16_t)z_mask;
 +        break;
 +    case INDEX_op_extrl_i64_i32:
 +    case INDEX_op_extu_i32_i64:
 +        type_change = true;
 +        QEMU_FALLTHROUGH;
 +    case INDEX_op_ext32u_i64:
 +        z_mask = (uint32_t)z_mask;
 +        break;
 +    case INDEX_op_extrh_i64_i32:
 +        type_change = true;
 +        z_mask >>= 32;
 +        break;
 +    default:
 +        g_assert_not_reached();
 +    }
 +
 +    ctx->z_mask = z_mask;
 +    if (!type_change) {
 +        ctx->a_mask = z_mask_old ^ z_mask;
 +    }
 +    return fold_masks(ctx, op);
  }
  static bool fold_mb(OptContext *ctx, TCGOp *op)
@@ -XXX,XX +XXX,XX @@ static bool fold_movcond(OptContext *ctx, TCGOp *op)
          return tcg_opt_gen_mov(ctx, op, op->args[0], op->args[4 - i]);
      }
 +    ctx->z_mask = arg_info(op->args[3])->z_mask
 +                | arg_info(op->args[4])->z_mask;
 +
      if (arg_is_const(op->args[3]) && arg_is_const(op->args[4])) {
          uint64_t tv = arg_info(op->args[3])->val;
          uint64_t fv = arg_info(op->args[4])->val;
@@ -XXX,XX +XXX,XX @@ static bool fold_nand(OptContext *ctx, TCGOp *op)
  static bool fold_neg(OptContext *ctx, TCGOp *op)
  {
 +    uint64_t z_mask;
 +
      if (fold_const1(ctx, op)) {
          return true;
      }
 +
 +    /* Set to 1 all bits to the left of the rightmost.  */
 +    z_mask = arg_info(op->args[1])->z_mask;
 +    ctx->z_mask = -(z_mask & -z_mask);
 +
      /*
       * Because of fold_sub_to_neg, we want to always return true,
       * via finish_folding.
@@ -XXX,XX +XXX,XX @@ static bool fold_or(OptContext *ctx, TCGOp *op)
          fold_xx_to_x(ctx, op)) {
          return true;
      }
 -    return false;
 +
 +    ctx->z_mask = arg_info(op->args[1])->z_mask
 +                | arg_info(op->args[2])->z_mask;
 +    return fold_masks(ctx, op);
  }
  static bool fold_orc(OptContext *ctx, TCGOp *op)
@@ -XXX,XX +XXX,XX @@ static bool fold_orc(OptContext *ctx, TCGOp *op)
  static bool fold_qemu_ld(OptContext *ctx, TCGOp *op)
  {
 +    const TCGOpDef *def = &tcg_op_defs[op->opc];
 +    MemOpIdx oi = op->args[def->nb_oargs + def->nb_iargs];
 +    MemOp mop = get_memop(oi);
 +    int width = 8 * memop_size(mop);
 +
 +    if (!(mop & MO_SIGN) && width < 64) {
 +        ctx->z_mask = MAKE_64BIT_MASK(0, width);
 +    }
 +
      /* Opcodes that touch guest memory stop the mb optimization.  */
      ctx->prev_mb = NULL;
      return false;
@@ -XXX,XX +XXX,XX @@ static bool fold_setcond(OptContext *ctx, TCGOp *op)
      if (i >= 0) {
          return tcg_opt_gen_movi(ctx, op, op->args[0], i);
      }
 +
 +    ctx->z_mask = 1;
      return false;
  }
@@ -XXX,XX +XXX,XX @@ static bool fold_setcond2(OptContext *ctx, TCGOp *op)
          op->opc = INDEX_op_setcond_i32;
          break;
      }
 +
 +    ctx->z_mask = 1;
      return false;
   do_setcond_const:
@@ -XXX,XX +XXX,XX @@ static bool fold_setcond2(OptContext *ctx, TCGOp *op)
  static bool fold_sextract(OptContext *ctx, TCGOp *op)
  {
 +    int64_t z_mask_old, z_mask;
 +
      if (arg_is_const(op->args[1])) {
          uint64_t t;
@@ -XXX,XX +XXX,XX @@ static bool fold_sextract(OptContext *ctx, TCGOp *op)
          t = sextract64(t, op->args[2], op->args[3]);
          return tcg_opt_gen_movi(ctx, op, op->args[0], t);
      }
 -    return false;
 +
 +    z_mask_old = arg_info(op->args[1])->z_mask;
 +    z_mask = sextract64(z_mask_old, op->args[2], op->args[3]);
 +    if (op->args[2] == 0 && z_mask >= 0) {
 +        ctx->a_mask = z_mask_old ^ z_mask;
 +    }
 +    ctx->z_mask = z_mask;
 +
 +    return fold_masks(ctx, op);
  }
  static bool fold_shift(OptContext *ctx, TCGOp *op)
@@ -XXX,XX +XXX,XX @@ static bool fold_shift(OptContext *ctx, TCGOp *op)
          fold_xi_to_x(ctx, op, 0)) {
          return true;
      }
 +
 +    if (arg_is_const(op->args[2])) {
 +        ctx->z_mask = do_constant_folding(op->opc, ctx->type,
 +                                          arg_info(op->args[1])->z_mask,
 +                                          arg_info(op->args[2])->val);
 +        return fold_masks(ctx, op);
 +    }
      return false;
  }
@@ -XXX,XX +XXX,XX @@ static bool fold_sub2_i32(OptContext *ctx, TCGOp *op)
      return fold_addsub2_i32(ctx, op, false);
  }
 +static bool fold_tcg_ld(OptContext *ctx, TCGOp *op)
 +{
 +    /* We can't do any folding with a load, but we can record bits. */
 +    switch (op->opc) {
 +    CASE_OP_32_64(ld8u):
 +        ctx->z_mask = MAKE_64BIT_MASK(0, 8);
 +        break;
 +    CASE_OP_32_64(ld16u):
 +        ctx->z_mask = MAKE_64BIT_MASK(0, 16);
 +        break;
 +    case INDEX_op_ld32u_i64:
 +        ctx->z_mask = MAKE_64BIT_MASK(0, 32);
 +        break;
 +    default:
 +        g_assert_not_reached();
 +    }
 +    return false;
 +}
 +
  static bool fold_xor(OptContext *ctx, TCGOp *op)
  {
      if (fold_const2(ctx, op) ||
@@ -XXX,XX +XXX,XX @@ static bool fold_xor(OptContext *ctx, TCGOp *op)
          fold_xi_to_not(ctx, op, -1)) {
          return true;
      }
 -    return false;
 +
 +    ctx->z_mask = arg_info(op->args[1])->z_mask
 +                | arg_info(op->args[2])->z_mask;
 +    return fold_masks(ctx, op);
  }
  /* Propagate constants and copies, fold constant expressions. */
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
      }
      QTAILQ_FOREACH_SAFE(op, &s->ops, link, op_next) {
 -        uint64_t z_mask, partmask, affected, tmp;
          TCGOpcode opc = op->opc;
          const TCGOpDef *def;
          bool done = false;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
              break;
          }
 -        /* Simplify using known-zero bits. Currently only ops with a single
 -           output argument is supported. */
 -        z_mask = -1;
 -        affected = -1;
 -        switch (opc) {
 -        CASE_OP_32_64(ext8s):
 -            if ((arg_info(op->args[1])->z_mask & 0x80) != 0) {
 -                break;
 -            }
 -            QEMU_FALLTHROUGH;
 -        CASE_OP_32_64(ext8u):
 -            z_mask = 0xff;
 -            goto and_const;
 -        CASE_OP_32_64(ext16s):
 -            if ((arg_info(op->args[1])->z_mask & 0x8000) != 0) {
 -                break;
 -            }
 -            QEMU_FALLTHROUGH;
 -        CASE_OP_32_64(ext16u):
 -            z_mask = 0xffff;
 -            goto and_const;
 -        case INDEX_op_ext32s_i64:
 -            if ((arg_info(op->args[1])->z_mask & 0x80000000) != 0) {
 -                break;
 -            }
 -            QEMU_FALLTHROUGH;
 -        case INDEX_op_ext32u_i64:
 -            z_mask = 0xffffffffU;
 -            goto and_const;
 -
 -        CASE_OP_32_64(and):
 -            z_mask = arg_info(op->args[2])->z_mask;
 -            if (arg_is_const(op->args[2])) {
 -        and_const:
 -                affected = arg_info(op->args[1])->z_mask & ~z_mask;
 -            }
 -            z_mask = arg_info(op->args[1])->z_mask & z_mask;
 -            break;
 -
 -        case INDEX_op_ext_i32_i64:
 -            if ((arg_info(op->args[1])->z_mask & 0x80000000) != 0) {
 -                break;
 -            }
 -            QEMU_FALLTHROUGH;
 -        case INDEX_op_extu_i32_i64:
 -            /* We do not compute affected as it is a size changing op.  */
 -            z_mask = (uint32_t)arg_info(op->args[1])->z_mask;
 -            break;
 -
 -        CASE_OP_32_64(andc):
 -            /* Known-zeros does not imply known-ones.  Therefore unless
 -               op->args[2] is constant, we can't infer anything from it.  */
 -            if (arg_is_const(op->args[2])) {
 -                z_mask = ~arg_info(op->args[2])->z_mask;
 -                goto and_const;
 -            }
 -            /* But we certainly know nothing outside args[1] may be set. */
 -            z_mask = arg_info(op->args[1])->z_mask;
 -            break;
 -
 -        case INDEX_op_sar_i32:
 -            if (arg_is_const(op->args[2])) {
 -                tmp = arg_info(op->args[2])->val & 31;
 -                z_mask = (int32_t)arg_info(op->args[1])->z_mask >> tmp;
 -            }
 -            break;
 -        case INDEX_op_sar_i64:
 -            if (arg_is_const(op->args[2])) {
 -                tmp = arg_info(op->args[2])->val & 63;
 -                z_mask = (int64_t)arg_info(op->args[1])->z_mask >> tmp;
 -            }
 -            break;
 -
 -        case INDEX_op_shr_i32:
 -            if (arg_is_const(op->args[2])) {
 -                tmp = arg_info(op->args[2])->val & 31;
 -                z_mask = (uint32_t)arg_info(op->args[1])->z_mask >> tmp;
 -            }
 -            break;
 -        case INDEX_op_shr_i64:
 -            if (arg_is_const(op->args[2])) {
 -                tmp = arg_info(op->args[2])->val & 63;
 -                z_mask = (uint64_t)arg_info(op->args[1])->z_mask >> tmp;
 -            }
 -            break;
 -
 -        case INDEX_op_extrl_i64_i32:
 -            z_mask = (uint32_t)arg_info(op->args[1])->z_mask;
 -            break;
 -        case INDEX_op_extrh_i64_i32:
 -            z_mask = (uint64_t)arg_info(op->args[1])->z_mask >> 32;
 -            break;
 -
 -        CASE_OP_32_64(shl):
 -            if (arg_is_const(op->args[2])) {
 -                tmp = arg_info(op->args[2])->val & (TCG_TARGET_REG_BITS - 1);
 -                z_mask = arg_info(op->args[1])->z_mask << tmp;
 -            }
 -            break;
 -
 -        CASE_OP_32_64(neg):
 -            /* Set to 1 all bits to the left of the rightmost.  */
 -            z_mask = -(arg_info(op->args[1])->z_mask
 -                       & -arg_info(op->args[1])->z_mask);
 -            break;
 -
 -        CASE_OP_32_64(deposit):
 -            z_mask = deposit64(arg_info(op->args[1])->z_mask,
 -                               op->args[3], op->args[4],
 -                               arg_info(op->args[2])->z_mask);
 -            break;
 -
 -        CASE_OP_32_64(extract):
 -            z_mask = extract64(arg_info(op->args[1])->z_mask,
 -                               op->args[2], op->args[3]);
 -            if (op->args[2] == 0) {
 -                affected = arg_info(op->args[1])->z_mask & ~z_mask;
 -            }
 -            break;
 -        CASE_OP_32_64(sextract):
 -            z_mask = sextract64(arg_info(op->args[1])->z_mask,
 -                                op->args[2], op->args[3]);
 -            if (op->args[2] == 0 && (tcg_target_long)z_mask >= 0) {
 -                affected = arg_info(op->args[1])->z_mask & ~z_mask;
 -            }
 -            break;
 -
 -        CASE_OP_32_64(or):
 -        CASE_OP_32_64(xor):
 -            z_mask = arg_info(op->args[1])->z_mask
 -                   | arg_info(op->args[2])->z_mask;
 -            break;
 -
 -        case INDEX_op_clz_i32:
 -        case INDEX_op_ctz_i32:
 -            z_mask = arg_info(op->args[2])->z_mask | 31;
 -            break;
 -
 -        case INDEX_op_clz_i64:
 -        case INDEX_op_ctz_i64:
 -            z_mask = arg_info(op->args[2])->z_mask | 63;
 -            break;
 -
 -        case INDEX_op_ctpop_i32:
 -            z_mask = 32 | 31;
 -            break;
 -        case INDEX_op_ctpop_i64:
 -            z_mask = 64 | 63;
 -            break;
 -
 -        CASE_OP_32_64(setcond):
 -        case INDEX_op_setcond2_i32:
 -            z_mask = 1;
 -            break;
 -
 -        CASE_OP_32_64(movcond):
 -            z_mask = arg_info(op->args[3])->z_mask
 -                   | arg_info(op->args[4])->z_mask;
 -            break;
 -
 -        CASE_OP_32_64(ld8u):
 -            z_mask = 0xff;
 -            break;
 -        CASE_OP_32_64(ld16u):
 -            z_mask = 0xffff;
 -            break;
 -        case INDEX_op_ld32u_i64:
 -            z_mask = 0xffffffffu;
 -            break;
 -
 -        CASE_OP_32_64(qemu_ld):
 -            {
 -                MemOpIdx oi = op->args[def->nb_oargs + def->nb_iargs];
 -                MemOp mop = get_memop(oi);
 -                if (!(mop & MO_SIGN)) {
 -                    z_mask = (2ULL << ((8 << (mop & MO_SIZE)) - 1)) - 1;
 -                }
 -            }
 -            break;
 -
 -        CASE_OP_32_64(bswap16):
 -            z_mask = arg_info(op->args[1])->z_mask;
 -            if (z_mask <= 0xffff) {
 -                op->args[2] |= TCG_BSWAP_IZ;
 -            }
 -            z_mask = bswap16(z_mask);
 -            switch (op->args[2] & (TCG_BSWAP_OZ | TCG_BSWAP_OS)) {
 -            case TCG_BSWAP_OZ:
 -                break;
 -            case TCG_BSWAP_OS:
 -                z_mask = (int16_t)z_mask;
 -                break;
 -            default: /* undefined high bits */
 -                z_mask |= MAKE_64BIT_MASK(16, 48);
 -                break;
 -            }
 -            break;
 -
 -        case INDEX_op_bswap32_i64:
 -            z_mask = arg_info(op->args[1])->z_mask;
 -            if (z_mask <= 0xffffffffu) {
 -                op->args[2] |= TCG_BSWAP_IZ;
 -            }
 -            z_mask = bswap32(z_mask);
 -            switch (op->args[2] & (TCG_BSWAP_OZ | TCG_BSWAP_OS)) {
 -            case TCG_BSWAP_OZ:
 -                break;
 -            case TCG_BSWAP_OS:
 -                z_mask = (int32_t)z_mask;
 -                break;
 -            default: /* undefined high bits */
 -                z_mask |= MAKE_64BIT_MASK(32, 32);
 -                break;
 -            }
 -            break;
 -
 -        default:
 -            break;
 -        }
 -
 -        /* 32-bit ops generate 32-bit results.  For the result is zero test
 -           below, we can ignore high bits, but for further optimizations we
 -           need to record that the high bits contain garbage.  */
 -        partmask = z_mask;
 -        if (ctx.type == TCG_TYPE_I32) {
 -            z_mask |= ~(tcg_target_ulong)0xffffffffu;
 -            partmask &= 0xffffffffu;
 -            affected &= 0xffffffffu;
 -        }
 -        ctx.z_mask = z_mask;
 -
 -        if (partmask == 0) {
 -            tcg_opt_gen_movi(&ctx, op, op->args[0], 0);
 -            continue;
 -        }
 -        if (affected == 0) {
 -            tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[1]);
 -            continue;
 -        }
 +        /* Assume all bits affected, and no bits known zero. */
 +        ctx.a_mask = -1;
 +        ctx.z_mask = -1;
          /*
           * Process each opcode.
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
          case INDEX_op_extrh_i64_i32:
              done = fold_extu(&ctx, op);
              break;
 +        CASE_OP_32_64(ld8u):
 +        CASE_OP_32_64(ld16u):
 +        case INDEX_op_ld32u_i64:
 +            done = fold_tcg_ld(&ctx, op);
 +            break;
          case INDEX_op_mb:
              done = fold_mb(&ctx, op);
              break;
 --
-.25.1
+.34.1
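
Note for readers comparing the two series: the a_mask/z_mask comments in the tcg/optimize.c hunk above describe two per-op masks. Below is a minimal sketch of how they could drive folding for an AND with a constant second operand. The names FoldResult and classify_and_const are hypothetical and not part of QEMU. The real logic lives in fold_and and fold_masks, which also adjust the masks for 32-bit ops; this sketch omits that step.

```c
#include <stdint.h>

/* Hypothetical result type, for illustration only. */
typedef enum { FOLD_NONE, FOLD_TO_ZERO, FOLD_TO_COPY } FoldResult;

/*
 * z1: known-zero mask of the first input (a 0 bit means the value bit
 *     is known to be 0; a 1 bit means the value bit may be set).
 * c:  the constant second operand; for a constant, the known-zero mask
 *     is the value itself.
 */
static FoldResult classify_and_const(uint64_t z1, uint64_t c, uint64_t *z_out)
{
    uint64_t z_mask = z1 & c;   /* bits that may be set in the result */
    uint64_t a_mask = z1 & ~c;  /* possibly-set input bits the AND clears */

    *z_out = z_mask;
    if (z_mask == 0) {
        return FOLD_TO_ZERO;    /* result is always 0: emit movi 0 */
    }
    if (a_mask == 0) {
        return FOLD_TO_COPY;    /* AND changes nothing: emit mov from input */
    }
    return FOLD_NONE;
}
```

For example, an input already known to fit in 8 bits (z1 = 0xff) ANDed with 0xff yields FOLD_TO_COPY, which is the case the "redundant extension" optimizations rely on.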

-[PULL 44/56] tcg/optimize: Expand fold_mulu2_i32 to all 4-arg multiplies
+Deleted patch
-Rename to fold_multiply2, and handle muls2_i32, mulu2_i64,
-and muls2_i64.
-Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
----
- tcg/optimize.c | 44 +++++++++++++++++++++++++++++++++++---------
-file changed, 35 insertions(+), 9 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
-index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
-+++ b/tcg/optimize.c
-@@ -XXX,XX +XXX,XX @@ static bool fold_mul_highpart(OptContext *ctx, TCGOp *op)
-     return false;
- }
--static bool fold_mulu2_i32(OptContext *ctx, TCGOp *op)
-+static bool fold_multiply2(OptContext *ctx, TCGOp *op)
- {
-     if (arg_is_const(op->args[2]) && arg_is_const(op->args[3])) {
--        uint32_t a = arg_info(op->args[2])->val;
--        uint32_t b = arg_info(op->args[3])->val;
--        uint64_t r = (uint64_t)a * b;
-+        uint64_t a = arg_info(op->args[2])->val;
-+        uint64_t b = arg_info(op->args[3])->val;
-+        uint64_t h, l;
-         TCGArg rl, rh;
--        TCGOp *op2 = tcg_op_insert_before(ctx->tcg, op, INDEX_op_mov_i32);
-+        TCGOp *op2;
-+
-+        switch (op->opc) {
-+        case INDEX_op_mulu2_i32:
-+            l = (uint64_t)(uint32_t)a * (uint32_t)b;
-+            h = (int32_t)(l >> 32);
-+            l = (int32_t)l;
-+            break;
-+        case INDEX_op_muls2_i32:
-+            l = (int64_t)(int32_t)a * (int32_t)b;
-+            h = l >> 32;
-+            l = (int32_t)l;
-+            break;
-+        case INDEX_op_mulu2_i64:
-+            mulu64(&l, &h, a, b);
-+            break;
-+        case INDEX_op_muls2_i64:
-+            muls64(&l, &h, a, b);
-+            break;
-+        default:
-+            g_assert_not_reached();
-+        }
-         rl = op->args[0];
-         rh = op->args[1];
--        tcg_opt_gen_movi(ctx, op, rl, (int32_t)r);
--        tcg_opt_gen_movi(ctx, op2, rh, (int32_t)(r >> 32));
-+
-+        /* The proper opcode is supplied by tcg_opt_gen_mov. */
-+        op2 = tcg_op_insert_before(ctx->tcg, op, 0);
-+
-+        tcg_opt_gen_movi(ctx, op, rl, l);
-+        tcg_opt_gen_movi(ctx, op2, rh, h);
-         return true;
-     }
-     return false;
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-         CASE_OP_32_64(muluh):
-             done = fold_mul_highpart(&ctx, op);
-             break;
--        case INDEX_op_mulu2_i32:
--            done = fold_mulu2_i32(&ctx, op);
-+        CASE_OP_32_64(muls2):
-+        CASE_OP_32_64(mulu2):
-+            done = fold_multiply2(&ctx, op);
-             break;
-         CASE_OP_32_64(nand):
-             done = fold_nand(&ctx, op);
---
-.25.1
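
As an illustration of the constant folding described in the deleted patch above, here is a small sketch of the 32-bit unsigned and signed double-width multiplies. The helper names fold_mulu2_32 and fold_muls2_32 are hypothetical. Unlike the patch, which stores sign-extended halves in 64-bit constants, this sketch returns the raw 32-bit halves to avoid implementation-defined conversions.

```c
#include <stdint.h>

/* Unsigned 32x32 -> 64 multiply, split into low and high halves. */
static void fold_mulu2_32(uint32_t a, uint32_t b, uint32_t *lo, uint32_t *hi)
{
    uint64_t r = (uint64_t)a * b;

    *lo = (uint32_t)r;
    *hi = (uint32_t)(r >> 32);
}

/* Signed 32x32 -> 64 multiply, split into low and high halves. */
static void fold_muls2_32(int32_t a, int32_t b, uint32_t *lo, uint32_t *hi)
{
    /* Widen before multiplying so the product cannot overflow. */
    uint64_t r = (uint64_t)((int64_t)a * (int64_t)b);

    *lo = (uint32_t)r;
    *hi = (uint32_t)(r >> 32);
}
```

The only difference between the two cases is how the inputs are widened, which mirrors the separate mulu2_i32 and muls2_i32 arms in the patch.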

-[PULL 45/56] tcg/optimize: Expand fold_addsub2_i32 to 64-bit ops
+Deleted patch
-Rename to fold_addsub2.
-Use Int128 to implement the wider operation.
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
-Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
----
- tcg/optimize.c | 65 ++++++++++++++++++++++++++++++++++----------------
-file changed, 44 insertions(+), 21 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
-index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
-+++ b/tcg/optimize.c
-@@ -XXX,XX +XXX,XX @@
-  */
- #include "qemu/osdep.h"
-+#include "qemu/int128.h"
- #include "tcg/tcg-op.h"
- #include "tcg-internal.h"
-@@ -XXX,XX +XXX,XX @@ static bool fold_add(OptContext *ctx, TCGOp *op)
-     return false;
- }
--static bool fold_addsub2_i32(OptContext *ctx, TCGOp *op, bool add)
-+static bool fold_addsub2(OptContext *ctx, TCGOp *op, bool add)
- {
-     if (arg_is_const(op->args[2]) && arg_is_const(op->args[3]) &&
-         arg_is_const(op->args[4]) && arg_is_const(op->args[5])) {
--        uint32_t al = arg_info(op->args[2])->val;
--        uint32_t ah = arg_info(op->args[3])->val;
--        uint32_t bl = arg_info(op->args[4])->val;
--        uint32_t bh = arg_info(op->args[5])->val;
--        uint64_t a = ((uint64_t)ah << 32) | al;
--        uint64_t b = ((uint64_t)bh << 32) | bl;
-+        uint64_t al = arg_info(op->args[2])->val;
-+        uint64_t ah = arg_info(op->args[3])->val;
-+        uint64_t bl = arg_info(op->args[4])->val;
-+        uint64_t bh = arg_info(op->args[5])->val;
-         TCGArg rl, rh;
--        TCGOp *op2 = tcg_op_insert_before(ctx->tcg, op, INDEX_op_mov_i32);
-+        TCGOp *op2;
--        if (add) {
--            a += b;
-+        if (ctx->type == TCG_TYPE_I32) {
-+            uint64_t a = deposit64(al, 32, 32, ah);
-+            uint64_t b = deposit64(bl, 32, 32, bh);
-+
-+            if (add) {
-+                a += b;
-+            } else {
-+                a -= b;
-+            }
-+
-+            al = sextract64(a, 0, 32);
-+            ah = sextract64(a, 32, 32);
-         } else {
--            a -= b;
-+            Int128 a = int128_make128(al, ah);
-+            Int128 b = int128_make128(bl, bh);
-+
-+            if (add) {
-+                a = int128_add(a, b);
-+            } else {
-+                a = int128_sub(a, b);
-+            }
-+
-+            al = int128_getlo(a);
-+            ah = int128_gethi(a);
-         }
-         rl = op->args[0];
-         rh = op->args[1];
--        tcg_opt_gen_movi(ctx, op, rl, (int32_t)a);
--        tcg_opt_gen_movi(ctx, op2, rh, (int32_t)(a >> 32));
-+
-+        /* The proper opcode is supplied by tcg_opt_gen_mov. */
-+        op2 = tcg_op_insert_before(ctx->tcg, op, 0);
-+
-+        tcg_opt_gen_movi(ctx, op, rl, al);
-+        tcg_opt_gen_movi(ctx, op2, rh, ah);
-         return true;
-     }
-     return false;
- }
--static bool fold_add2_i32(OptContext *ctx, TCGOp *op)
-+static bool fold_add2(OptContext *ctx, TCGOp *op)
- {
--    return fold_addsub2_i32(ctx, op, true);
-+    return fold_addsub2(ctx, op, true);
- }
- static bool fold_and(OptContext *ctx, TCGOp *op)
-@@ -XXX,XX +XXX,XX @@ static bool fold_sub(OptContext *ctx, TCGOp *op)
-     return false;
- }
--static bool fold_sub2_i32(OptContext *ctx, TCGOp *op)
-+static bool fold_sub2(OptContext *ctx, TCGOp *op)
- {
--    return fold_addsub2_i32(ctx, op, false);
-+    return fold_addsub2(ctx, op, false);
- }
- static bool fold_tcg_ld(OptContext *ctx, TCGOp *op)
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-         CASE_OP_32_64_VEC(add):
-             done = fold_add(&ctx, op);
-             break;
--        case INDEX_op_add2_i32:
--            done = fold_add2_i32(&ctx, op);
-+        CASE_OP_32_64(add2):
-+            done = fold_add2(&ctx, op);
-             break;
-         CASE_OP_32_64_VEC(and):
-             done = fold_and(&ctx, op);
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-         CASE_OP_32_64_VEC(sub):
-             done = fold_sub(&ctx, op);
-             break;
--        case INDEX_op_sub2_i32:
--            done = fold_sub2_i32(&ctx, op);
-+        CASE_OP_32_64(sub2):
-+            done = fold_sub2(&ctx, op);
-             break;
-         CASE_OP_32_64_VEC(xor):
-             done = fold_xor(&ctx, op);
---
-2.25.1
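
The constant folding for add2/sub2 in the patch above operates on a value split across a low and a high word. A minimal standalone sketch of that double-word arithmetic follows. The `Pair128` type and `fold_addsub_pair` name are hypothetical and are not part of QEMU, which uses its own `Int128` helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical two-word value: lo holds bits 0..63, hi holds bits 64..127. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} Pair128;

/* Add or subtract two double-word constants, propagating carry or borrow. */
static Pair128 fold_addsub_pair(Pair128 a, Pair128 b, bool add)
{
    Pair128 r;

    if (add) {
        r.lo = a.lo + b.lo;
        r.hi = a.hi + b.hi + (r.lo < a.lo);   /* carry out of the low word */
    } else {
        r.lo = a.lo - b.lo;
        r.hi = a.hi - b.hi - (a.lo < b.lo);   /* borrow from the high word */
    }
    return r;
}
```

The two result words correspond to what the fold would then hand to the two generated move-immediate ops.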

-[PULL 46/56] tcg/optimize: Sink commutative operand swapping into fold functions
+Deleted patch
-Most of these are handled by creating a fold_const2_commutative
-to handle all of the binary operators.  The rest were already
-handled on a case-by-case basis in the switch, and have their
-own fold function in which to place the call.
-We now have only one major switch on TCGOpcode.
-Introduce NO_DEST and a block comment for swap_commutative in
-order to make the handling of brcond and movcond opcodes cleaner.
-Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
----
- tcg/optimize.c | 142 ++++++++++++++++++++++++-------------------------
- 1 file changed, 70 insertions(+), 72 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
-index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
-+++ b/tcg/optimize.c
-@@ -XXX,XX +XXX,XX @@ static int do_constant_folding_cond2(TCGArg *p1, TCGArg *p2, TCGCond c)
-     return -1;
- }
-+/**
-+ * swap_commutative:
-+ * @dest: TCGArg of the destination argument, or NO_DEST.
-+ * @p1: first paired argument
-+ * @p2: second paired argument
-+ *
-+ * If *@p1 is a constant and *@p2 is not, swap.
-+ * If *@p2 matches @dest, swap.
-+ * Return true if a swap was performed.
-+ */
-+
-+#define NO_DEST  temp_arg(NULL)
-+
- static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2)
- {
-     TCGArg a1 = *p1, a2 = *p2;
-@@ -XXX,XX +XXX,XX @@ static bool fold_const2(OptContext *ctx, TCGOp *op)
-     return false;
- }
-+static bool fold_const2_commutative(OptContext *ctx, TCGOp *op)
-+{
-+    swap_commutative(op->args[0], &op->args[1], &op->args[2]);
-+    return fold_const2(ctx, op);
-+}
-+
- static bool fold_masks(OptContext *ctx, TCGOp *op)
- {
-     uint64_t a_mask = ctx->a_mask;
-@@ -XXX,XX +XXX,XX @@ static bool fold_xx_to_x(OptContext *ctx, TCGOp *op)
- static bool fold_add(OptContext *ctx, TCGOp *op)
- {
--    if (fold_const2(ctx, op) ||
-+    if (fold_const2_commutative(ctx, op) ||
-         fold_xi_to_x(ctx, op, 0)) {
-         return true;
-     }
-@@ -XXX,XX +XXX,XX @@ static bool fold_addsub2(OptContext *ctx, TCGOp *op, bool add)
- static bool fold_add2(OptContext *ctx, TCGOp *op)
- {
-+    /* Note that the high and low parts may be independently swapped. */
-+    swap_commutative(op->args[0], &op->args[2], &op->args[4]);
-+    swap_commutative(op->args[1], &op->args[3], &op->args[5]);
-+
-     return fold_addsub2(ctx, op, true);
- }
-@@ -XXX,XX +XXX,XX @@ static bool fold_and(OptContext *ctx, TCGOp *op)
- {
-     uint64_t z1, z2;
--    if (fold_const2(ctx, op) ||
-+    if (fold_const2_commutative(ctx, op) ||
-         fold_xi_to_i(ctx, op, 0) ||
-         fold_xi_to_x(ctx, op, -1) ||
-         fold_xx_to_x(ctx, op)) {
-@@ -XXX,XX +XXX,XX @@ static bool fold_andc(OptContext *ctx, TCGOp *op)
- static bool fold_brcond(OptContext *ctx, TCGOp *op)
- {
-     TCGCond cond = op->args[2];
--    int i = do_constant_folding_cond(ctx->type, op->args[0], op->args[1], cond);
-+    int i;
-+    if (swap_commutative(NO_DEST, &op->args[0], &op->args[1])) {
-+        op->args[2] = cond = tcg_swap_cond(cond);
-+    }
-+
-+    i = do_constant_folding_cond(ctx->type, op->args[0], op->args[1], cond);
-     if (i == 0) {
-         tcg_op_remove(ctx->tcg, op);
-         return true;
-@@ -XXX,XX +XXX,XX @@ static bool fold_brcond(OptContext *ctx, TCGOp *op)
- static bool fold_brcond2(OptContext *ctx, TCGOp *op)
- {
-     TCGCond cond = op->args[4];
--    int i = do_constant_folding_cond2(&op->args[0], &op->args[2], cond);
-     TCGArg label = op->args[5];
--    int inv = 0;
-+    int i, inv = 0;
-+    if (swap_commutative2(&op->args[0], &op->args[2])) {
-+        op->args[4] = cond = tcg_swap_cond(cond);
-+    }
-+
-+    i = do_constant_folding_cond2(&op->args[0], &op->args[2], cond);
-     if (i >= 0) {
-         goto do_brcond_const;
-     }
-@@ -XXX,XX +XXX,XX @@ static bool fold_dup2(OptContext *ctx, TCGOp *op)
- static bool fold_eqv(OptContext *ctx, TCGOp *op)
- {
--    if (fold_const2(ctx, op) ||
-+    if (fold_const2_commutative(ctx, op) ||
-         fold_xi_to_x(ctx, op, -1) ||
-         fold_xi_to_not(ctx, op, 0)) {
-         return true;
-@@ -XXX,XX +XXX,XX @@ static bool fold_mov(OptContext *ctx, TCGOp *op)
- static bool fold_movcond(OptContext *ctx, TCGOp *op)
- {
-     TCGCond cond = op->args[5];
--    int i = do_constant_folding_cond(ctx->type, op->args[1], op->args[2], cond);
-+    int i;
-+    if (swap_commutative(NO_DEST, &op->args[1], &op->args[2])) {
-+        op->args[5] = cond = tcg_swap_cond(cond);
-+    }
-+    /*
-+     * Canonicalize the "false" input reg to match the destination reg so
-+     * that the tcg backend can implement a "move if true" operation.
-+     */
-+    if (swap_commutative(op->args[0], &op->args[4], &op->args[3])) {
-+        op->args[5] = cond = tcg_invert_cond(cond);
-+    }
-+
-+    i = do_constant_folding_cond(ctx->type, op->args[1], op->args[2], cond);
-     if (i >= 0) {
-         return tcg_opt_gen_mov(ctx, op, op->args[0], op->args[4 - i]);
-     }
-@@ -XXX,XX +XXX,XX @@ static bool fold_mul(OptContext *ctx, TCGOp *op)
- static bool fold_mul_highpart(OptContext *ctx, TCGOp *op)
- {
--    if (fold_const2(ctx, op) ||
-+    if (fold_const2_commutative(ctx, op) ||
-         fold_xi_to_i(ctx, op, 0)) {
-         return true;
-     }
-@@ -XXX,XX +XXX,XX @@ static bool fold_mul_highpart(OptContext *ctx, TCGOp *op)
- static bool fold_multiply2(OptContext *ctx, TCGOp *op)
- {
-+    swap_commutative(op->args[0], &op->args[2], &op->args[3]);
-+
-     if (arg_is_const(op->args[2]) && arg_is_const(op->args[3])) {
-         uint64_t a = arg_info(op->args[2])->val;
-         uint64_t b = arg_info(op->args[3])->val;
-@@ -XXX,XX +XXX,XX @@ static bool fold_multiply2(OptContext *ctx, TCGOp *op)
- static bool fold_nand(OptContext *ctx, TCGOp *op)
- {
--    if (fold_const2(ctx, op) ||
-+    if (fold_const2_commutative(ctx, op) ||
-         fold_xi_to_not(ctx, op, -1)) {
-         return true;
-     }
-@@ -XXX,XX +XXX,XX @@ static bool fold_neg(OptContext *ctx, TCGOp *op)
- static bool fold_nor(OptContext *ctx, TCGOp *op)
- {
--    if (fold_const2(ctx, op) ||
-+    if (fold_const2_commutative(ctx, op) ||
-         fold_xi_to_not(ctx, op, 0)) {
-         return true;
-     }
-@@ -XXX,XX +XXX,XX @@ static bool fold_not(OptContext *ctx, TCGOp *op)
- static bool fold_or(OptContext *ctx, TCGOp *op)
- {
--    if (fold_const2(ctx, op) ||
-+    if (fold_const2_commutative(ctx, op) ||
-         fold_xi_to_x(ctx, op, 0) ||
-         fold_xx_to_x(ctx, op)) {
-         return true;
-@@ -XXX,XX +XXX,XX @@ static bool fold_remainder(OptContext *ctx, TCGOp *op)
- static bool fold_setcond(OptContext *ctx, TCGOp *op)
- {
-     TCGCond cond = op->args[3];
--    int i = do_constant_folding_cond(ctx->type, op->args[1], op->args[2], cond);
-+    int i;
-+    if (swap_commutative(op->args[0], &op->args[1], &op->args[2])) {
-+        op->args[3] = cond = tcg_swap_cond(cond);
-+    }
-+
-+    i = do_constant_folding_cond(ctx->type, op->args[1], op->args[2], cond);
-     if (i >= 0) {
-         return tcg_opt_gen_movi(ctx, op, op->args[0], i);
-     }
-@@ -XXX,XX +XXX,XX @@ static bool fold_setcond(OptContext *ctx, TCGOp *op)
- static bool fold_setcond2(OptContext *ctx, TCGOp *op)
- {
-     TCGCond cond = op->args[5];
--    int i = do_constant_folding_cond2(&op->args[1], &op->args[3], cond);
--    int inv = 0;
-+    int i, inv = 0;
-+    if (swap_commutative2(&op->args[1], &op->args[3])) {
-+        op->args[5] = cond = tcg_swap_cond(cond);
-+    }
-+
-+    i = do_constant_folding_cond2(&op->args[1], &op->args[3], cond);
-     if (i >= 0) {
-         goto do_setcond_const;
-     }
-@@ -XXX,XX +XXX,XX @@ static bool fold_tcg_ld(OptContext *ctx, TCGOp *op)
- static bool fold_xor(OptContext *ctx, TCGOp *op)
- {
--    if (fold_const2(ctx, op) ||
-+    if (fold_const2_commutative(ctx, op) ||
-         fold_xx_to_i(ctx, op, 0) ||
-         fold_xi_to_x(ctx, op, 0) ||
-         fold_xi_to_not(ctx, op, -1)) {
-@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
-             ctx.type = TCG_TYPE_I32;
-         }
--        /* For commutative operations make constant second argument */
--        switch (opc) {
--        CASE_OP_32_64_VEC(add):
--        CASE_OP_32_64_VEC(mul):
--        CASE_OP_32_64_VEC(and):
--        CASE_OP_32_64_VEC(or):
--        CASE_OP_32_64_VEC(xor):
--        CASE_OP_32_64(eqv):
--        CASE_OP_32_64(nand):
--        CASE_OP_32_64(nor):
--        CASE_OP_32_64(muluh):
--        CASE_OP_32_64(mulsh):
--            swap_commutative(op->args[0], &op->args[1], &op->args[2]);
--            break;
--        CASE_OP_32_64(brcond):
--            if (swap_commutative(-1, &op->args[0], &op->args[1])) {
--                op->args[2] = tcg_swap_cond(op->args[2]);
--            }
--            break;
--        CASE_OP_32_64(setcond):
--            if (swap_commutative(op->args[0], &op->args[1], &op->args[2])) {
--                op->args[3] = tcg_swap_cond(op->args[3]);
--            }
--            break;
--        CASE_OP_32_64(movcond):
--            if (swap_commutative(-1, &op->args[1], &op->args[2])) {
--                op->args[5] = tcg_swap_cond(op->args[5]);
--            }
--            /* For movcond, we canonicalize the "false" input reg to match
--               the destination reg so that the tcg backend can implement
--               a "move if true" operation.  */
--            if (swap_commutative(op->args[0], &op->args[4], &op->args[3])) {
--                op->args[5] = tcg_invert_cond(op->args[5]);
--            }
--            break;
--        CASE_OP_32_64(add2):
--            swap_commutative(op->args[0], &op->args[2], &op->args[4]);
--            swap_commutative(op->args[1], &op->args[3], &op->args[5]);
--            break;
--        CASE_OP_32_64(mulu2):
--        CASE_OP_32_64(muls2):
--            swap_commutative(op->args[0], &op->args[2], &op->args[3]);
--            break;
--        case INDEX_op_brcond2_i32:
--            if (swap_commutative2(&op->args[0], &op->args[2])) {
--                op->args[4] = tcg_swap_cond(op->args[4]);
--            }
--            break;
--        case INDEX_op_setcond2_i32:
--            if (swap_commutative2(&op->args[1], &op->args[3])) {
--                op->args[5] = tcg_swap_cond(op->args[5]);
--            }
--            break;
--        default:
--            break;
--        }
--
-         /* Assume all bits affected, and no bits known zero. */
-         ctx.a_mask = -1;
-         ctx.z_mask = -1;
---
-2.25.1
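
The canonicalization rule documented in the swap_commutative block comment above can be sketched outside of TCG as follows. The `Arg` type, the `swap_pair` name, and the use of a negative id to stand in for NO_DEST are assumptions of this sketch, not QEMU definitions. The tie-break (only consider the destination match when both operands have the same constness) is one reading of the comment.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a TCG argument: an id plus a constant flag. */
typedef struct {
    int id;
    bool is_const;
} Arg;

/*
 * Canonicalize a commutative operand pair:
 *  - prefer a constant in the second slot;
 *  - otherwise prefer the form "op dest, dest, other".
 * A negative dest_id means "no destination" in this sketch.
 * Returns true if the operands were swapped.
 */
static bool swap_pair(int dest_id, Arg *p1, Arg *p2)
{
    int sum = (int)p1->is_const - (int)p2->is_const;

    if (sum > 0 || (sum == 0 && dest_id >= 0 && dest_id == p2->id)) {
        Arg tmp = *p1;
        *p1 = *p2;
        *p2 = tmp;
        return true;
    }
    return false;
}
```

For comparison ops such as brcond and setcond, a caller that sees `true` must also swap the condition (for example, less-than becomes greater-than), which is what the fold functions in the patch do with tcg_swap_cond.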

-[PULL 47/56] tcg/optimize: Stop forcing z_mask to "garbage" for 32-bit values
+Deleted patch
-This "garbage" setting pre-dates the addition of the type
-changing opcodes INDEX_op_ext_i32_i64, INDEX_op_extu_i32_i64,
-and INDEX_op_extr{l,h}_i64_i32.
-So now we have definitive points at which to adjust z_mask
-to eliminate such bits from the 32-bit operands.
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
-Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
----
- tcg/optimize.c | 35 ++++++++++++++++-------------------
- 1 file changed, 16 insertions(+), 19 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
-index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
-+++ b/tcg/optimize.c
-@@ -XXX,XX +XXX,XX @@ static void init_ts_info(OptContext *ctx, TCGTemp *ts)
-         ti->is_const = true;
-         ti->val = ts->val;
-         ti->z_mask = ts->val;
--        if (TCG_TARGET_REG_BITS > 32 && ts->type == TCG_TYPE_I32) {
--            /* High bits of a 32-bit quantity are garbage.  */
--            ti->z_mask |= ~0xffffffffull;
--        }
-     } else {
-         ti->is_const = false;
-         ti->z_mask = -1;
-@@ -XXX,XX +XXX,XX @@ static bool tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, TCGArg src)
-     TCGTemp *src_ts = arg_temp(src);
-     TempOptInfo *di;
-     TempOptInfo *si;
--    uint64_t z_mask;
-     TCGOpcode new_op;
-     if (ts_are_copies(dst_ts, src_ts)) {
-@@ -XXX,XX +XXX,XX @@ static bool tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, TCGArg src)
-     op->args[0] = dst;
-     op->args[1] = src;
--    z_mask = si->z_mask;
--    if (TCG_TARGET_REG_BITS > 32 && new_op == INDEX_op_mov_i32) {
--        /* High bits of the destination are now garbage.  */
--        z_mask |= ~0xffffffffull;
--    }
--    di->z_mask = z_mask;
-+    di->z_mask = si->z_mask;
-     if (src_ts->type == dst_ts->type) {
-         TempOptInfo *ni = ts_info(si->next_copy);
-@@ -XXX,XX +XXX,XX @@ static bool tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, TCGArg src)
- static bool tcg_opt_gen_movi(OptContext *ctx, TCGOp *op,
-                              TCGArg dst, uint64_t val)
- {
--    /* Convert movi to mov with constant temp. */
--    TCGTemp *tv = tcg_constant_internal(ctx->type, val);
-+    TCGTemp *tv;
-+    if (ctx->type == TCG_TYPE_I32) {
-+        val = (int32_t)val;
-+    }
-+
-+    /* Convert movi to mov with constant temp. */
-+    tv = tcg_constant_internal(ctx->type, val);
-     init_ts_info(ctx, tv);
-     return tcg_opt_gen_mov(ctx, op, dst, temp_arg(tv));
- }
-@@ -XXX,XX +XXX,XX @@ static bool fold_masks(OptContext *ctx, TCGOp *op)
-     uint64_t z_mask = ctx->z_mask;
-     /*
--     * 32-bit ops generate 32-bit results.  For the result is zero test
--     * below, we can ignore high bits, but for further optimizations we
--     * need to record that the high bits contain garbage.
-+     * 32-bit ops generate 32-bit results, which for the purpose of
-+     * simplifying tcg are sign-extended.  Certainly that's how we
-+     * represent our constants elsewhere.  Note that the bits will
-+     * be reset properly for a 64-bit value when encountering the
-+     * type changing opcodes.
-      */
-     if (ctx->type == TCG_TYPE_I32) {
--        ctx->z_mask |= MAKE_64BIT_MASK(32, 32);
--        a_mask &= MAKE_64BIT_MASK(0, 32);
--        z_mask &= MAKE_64BIT_MASK(0, 32);
-+        a_mask = (int32_t)a_mask;
-+        z_mask = (int32_t)z_mask;
-+        ctx->z_mask = z_mask;
-     }
-     if (z_mask == 0) {
---
-2.25.1
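
The patch above replaces "high bits are garbage" with "32-bit values are kept sign-extended". A small sketch of what the `(int32_t)` cast does to a 64-bit mask is shown below. The `sext32_mask` name is hypothetical, and the narrowing conversion relies on the two's-complement wrapping that GCC and Clang provide.

```c
#include <stdint.h>

/*
 * Keep the low 32 bits of a mask and replicate bit 31 into bits 32..63,
 * mirroring the "(int32_t)z_mask" pattern used in the patch.
 */
static uint64_t sext32_mask(uint64_t mask)
{
    return (uint64_t)(int64_t)(int32_t)mask;
}
```

A mask whose bit 31 may be set therefore marks every higher bit as possibly set too, while a mask with bit 31 clear keeps all high bits known zero.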

-[PULL 48/56] tcg/optimize: Use fold_xx_to_i for orc
+Deleted patch
-Recognize the constant function for or-complement.
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
-Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
----
- tcg/optimize.c | 1 +
- 1 file changed, 1 insertion(+)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
-index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
-+++ b/tcg/optimize.c
-@@ -XXX,XX +XXX,XX @@ static bool fold_or(OptContext *ctx, TCGOp *op)
- static bool fold_orc(OptContext *ctx, TCGOp *op)
- {
-     if (fold_const2(ctx, op) ||
-+        fold_xx_to_i(ctx, op, -1) ||
-         fold_xi_to_x(ctx, op, -1) ||
-         fold_ix_to_not(ctx, op, 0)) {
-         return true;
---
-2.25.1
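
The three algebraic facts that fold_orc relies on after the patch above can be checked with a plain C model of or-complement. The `orc64` helper is a hypothetical illustration, not a QEMU function.

```c
#include <stdint.h>

/* or-complement: the first operand OR the bitwise inverse of the second. */
static uint64_t orc64(uint64_t a, uint64_t b)
{
    return a | ~b;
}
```

With this model, `orc64(x, x)` is all ones for any `x` (the new fold_xx_to_i case), `orc64(x, -1)` is `x`, and `orc64(0, x)` is `~x`.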

-[PULL 49/56] tcg/optimize: Use fold_xi_to_x for mul
+Deleted patch
-Recognize the identity function for low-part multiply.
-Suggested-by: Luis Pires <luis.pires@eldorado.org.br>
-Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
----
- tcg/optimize.c | 3 ++-
- 1 file changed, 2 insertions(+), 1 deletion(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
-index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
-+++ b/tcg/optimize.c
-@@ -XXX,XX +XXX,XX @@ static bool fold_movcond(OptContext *ctx, TCGOp *op)
- static bool fold_mul(OptContext *ctx, TCGOp *op)
- {
-     if (fold_const2(ctx, op) ||
--        fold_xi_to_i(ctx, op, 0)) {
-+        fold_xi_to_i(ctx, op, 0) ||
-+        fold_xi_to_x(ctx, op, 1)) {
-         return true;
-     }
-     return false;
---
-2.25.1

-[PULL 50/56] tcg/optimize: Use fold_xi_to_x for div
+Deleted patch
-Recognize the identity function for division.
-Suggested-by: Luis Pires <luis.pires@eldorado.org.br>
-Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
----
- tcg/optimize.c | 6 +++++-
- 1 file changed, 5 insertions(+), 1 deletion(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
-index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
-+++ b/tcg/optimize.c
-@@ -XXX,XX +XXX,XX @@ static bool fold_deposit(OptContext *ctx, TCGOp *op)
- static bool fold_divide(OptContext *ctx, TCGOp *op)
- {
--    return fold_const2(ctx, op);
-+    if (fold_const2(ctx, op) ||
-+        fold_xi_to_x(ctx, op, 1)) {
-+        return true;
-+    }
-+    return false;
- }
- static bool fold_dup(OptContext *ctx, TCGOp *op)
---
-2.25.1
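
The multiply and divide patches above both recognize "x OP 1" as the identity function. The idea can be sketched with a toy op structure. Every name here (`ToyOpc`, `ToyOp`, `toy_fold_xi_to_x`) is hypothetical and far simpler than the real OptContext/TCGOp machinery.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy opcode set; not QEMU's TCGOpcode. */
typedef enum { TOY_MUL, TOY_DIV, TOY_MOV } ToyOpc;

/* Toy binary op "dst = a OP b", where b may be a known constant. */
typedef struct {
    ToyOpc opc;
    bool b_is_const;
    uint64_t b_val;
} ToyOp;

/*
 * If the second operand is the constant i, the op computes "dst = a",
 * so rewrite it as a move and report that folding succeeded.
 */
static bool toy_fold_xi_to_x(ToyOp *op, uint64_t i)
{
    if (op->b_is_const && op->b_val == i) {
        op->opc = TOY_MOV;
        return true;
    }
    return false;
}
```

Because the check only looks at the second operand, it pairs naturally with the operand canonicalization that moves constants into that slot for commutative ops.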

-[PULL 53/56] tcg/optimize: Propagate sign info for logical operations
+[PULL 19/20] plugins: Inline plugin_gen_empty_callback
-Sign repetitions are perforce all identical, whether they are 1 or 0.
+Each caller can use tcg_gen_plugin_cb directly.
 Bitwise operations preserve the relative quantity of the repetitions.
 Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
-Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- tcg/optimize.c | 29 +++++++++++++++++++++++++++++
+ accel/tcg/plugin-gen.c | 19 +++----------------
- 1 file changed, 29 insertions(+)
+ 1 file changed, 3 insertions(+), 16 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
+diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
 index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
+--- a/accel/tcg/plugin-gen.c
-+++ b/tcg/optimize.c
++++ b/accel/tcg/plugin-gen.c
-@@ -XXX,XX +XXX,XX @@ static bool fold_and(OptContext *ctx, TCGOp *op)
+@@ -XXX,XX +XXX,XX @@ enum plugin_gen_from {
-     z2 = arg_info(op->args[2])->z_mask;
+     PLUGIN_GEN_AFTER_TB,
-     ctx->z_mask = z1 & z2;
+ };
-+    /*
+-static void plugin_gen_empty_callback(enum plugin_gen_from from)
-+     * Sign repetitions are perforce all identical, whether they are 1 or 0.
+-{
-+     * Bitwise operations preserve the relative quantity of the repetitions.
+-    switch (from) {
-+     */
+-    case PLUGIN_GEN_AFTER_INSN:
-+    ctx->s_mask = arg_info(op->args[1])->s_mask
+-    case PLUGIN_GEN_FROM_TB:
-+                & arg_info(op->args[2])->s_mask;
+-    case PLUGIN_GEN_FROM_INSN:
-+
+-        tcg_gen_plugin_cb(from);
-     /*
+-        break;
-      * Known-zeros does not imply known-ones.  Therefore unless
+-    default:
-      * arg2 is constant, we can't infer affected bits from it.
+-        g_assert_not_reached();
-@@ -XXX,XX +XXX,XX @@ static bool fold_andc(OptContext *ctx, TCGOp *op)
+-    }
 -}
 -
  /* called before finishing a TB with exit_tb, goto_tb or goto_ptr */
  void plugin_gen_disable_mem_helpers(void)
  {
@@ -XXX,XX +XXX,XX @@ bool plugin_gen_tb_start(CPUState *cpu, const DisasContextBase *db,
          ptb->mem_only = mem_only;
          ptb->mem_helper = false;
 -        plugin_gen_empty_callback(PLUGIN_GEN_FROM_TB);
 +        tcg_gen_plugin_cb(PLUGIN_GEN_FROM_TB);
      }
-     ctx->z_mask = z1;
+     tcg_ctx->plugin_insn = NULL;
-+    ctx->s_mask = arg_info(op->args[1])->s_mask
+@@ -XXX,XX +XXX,XX @@ void plugin_gen_insn_start(CPUState *cpu, const DisasContextBase *db)
-+                & arg_info(op->args[2])->s_mask;
+         insn->haddr = ptb->haddr2 + pc - ptb->vaddr2;
-     return fold_masks(ctx, op);
+     }
 -    plugin_gen_empty_callback(PLUGIN_GEN_FROM_INSN);
 +    tcg_gen_plugin_cb(PLUGIN_GEN_FROM_INSN);
  }
-@@ -XXX,XX +XXX,XX @@ static bool fold_eqv(OptContext *ctx, TCGOp *op)
+ void plugin_gen_insn_end(void)
-         fold_xi_to_not(ctx, op, 0)) {
+ {
-         return true;
+-    plugin_gen_empty_callback(PLUGIN_GEN_AFTER_INSN);
-     }
++    tcg_gen_plugin_cb(PLUGIN_GEN_AFTER_INSN);
 +
 +    ctx->s_mask = arg_info(op->args[1])->s_mask
 +                & arg_info(op->args[2])->s_mask;
      return false;
  }
-@@ -XXX,XX +XXX,XX @@ static bool fold_movcond(OptContext *ctx, TCGOp *op)
+ /*
      ctx->z_mask = arg_info(op->args[3])->z_mask
                  | arg_info(op->args[4])->z_mask;
 +    ctx->s_mask = arg_info(op->args[3])->s_mask
 +                & arg_info(op->args[4])->s_mask;
      if (arg_is_const(op->args[3]) && arg_is_const(op->args[4])) {
          uint64_t tv = arg_info(op->args[3])->val;
@@ -XXX,XX +XXX,XX @@ static bool fold_nand(OptContext *ctx, TCGOp *op)
          fold_xi_to_not(ctx, op, -1)) {
          return true;
      }
 +
 +    ctx->s_mask = arg_info(op->args[1])->s_mask
 +                & arg_info(op->args[2])->s_mask;
      return false;
  }
@@ -XXX,XX +XXX,XX @@ static bool fold_nor(OptContext *ctx, TCGOp *op)
          fold_xi_to_not(ctx, op, 0)) {
          return true;
      }
 +
 +    ctx->s_mask = arg_info(op->args[1])->s_mask
 +                & arg_info(op->args[2])->s_mask;
      return false;
  }
@@ -XXX,XX +XXX,XX @@ static bool fold_not(OptContext *ctx, TCGOp *op)
          return true;
      }
 +    ctx->s_mask = arg_info(op->args[1])->s_mask;
 +
      /* Because of fold_to_not, we want to always return true, via finish. */
      finish_folding(ctx, op);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool fold_or(OptContext *ctx, TCGOp *op)
      ctx->z_mask = arg_info(op->args[1])->z_mask
                  | arg_info(op->args[2])->z_mask;
 +    ctx->s_mask = arg_info(op->args[1])->s_mask
 +                & arg_info(op->args[2])->s_mask;
      return fold_masks(ctx, op);
  }
@@ -XXX,XX +XXX,XX @@ static bool fold_orc(OptContext *ctx, TCGOp *op)
          fold_ix_to_not(ctx, op, 0)) {
          return true;
      }
 +
 +    ctx->s_mask = arg_info(op->args[1])->s_mask
 +                & arg_info(op->args[2])->s_mask;
      return false;
  }
@@ -XXX,XX +XXX,XX @@ static bool fold_xor(OptContext *ctx, TCGOp *op)
      ctx->z_mask = arg_info(op->args[1])->z_mask
                  | arg_info(op->args[2])->z_mask;
 +    ctx->s_mask = arg_info(op->args[1])->s_mask
 +                & arg_info(op->args[2])->s_mask;
      return fold_masks(ctx, op);
  }
 --
-2.25.1
+2.34.1
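
The sign-propagation commit message above says that bitwise operations preserve the repetitions of the sign bit, which is why the patch intersects the two operands' s_mask values. A standalone sketch of that property follows. The `sign_rep_mask` helper is hypothetical and includes the sign bit itself in the mask, which may differ from the representation used in tcg/optimize.c.

```c
#include <stdint.h>

/*
 * Return a mask of the high bits of v that are all copies of the sign
 * bit (bit 63), with the sign bit itself included.
 */
static uint64_t sign_rep_mask(uint64_t v)
{
    uint64_t sign = (v >> 63) ? ~0ull : 0;
    uint64_t diff = v ^ sign;       /* set where a bit differs from the sign */
    uint64_t mask = ~0ull;

    /* Shrink the mask from the bottom until it covers no differing bit. */
    while (diff & mask) {
        mask <<= 1;
    }
    return mask;
}
```

For any `a` and `b`, the bits in `sign_rep_mask(a) & sign_rep_mask(b)` are also sign copies in `a & b`, `a | b`, and `a ^ b`, so the intersection is a safe lower bound for the result.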

-[PULL 51/56] tcg/optimize: Use fold_xx_to_i for rem
+[PULL 20/20] plugins: Update the documentation block for plugin-gen.c
-Recognize the constant function for remainder.
+Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
 Suggested-by: Luis Pires <luis.pires@eldorado.org.br>
 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- tcg/optimize.c | 6 +++++-
+ accel/tcg/plugin-gen.c | 31 ++++---------------------------
- 1 file changed, 5 insertions(+), 1 deletion(-)
+ 1 file changed, 4 insertions(+), 27 deletions(-)
-diff --git a/tcg/optimize.c b/tcg/optimize.c
+diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
 index XXXXXXX..XXXXXXX 100644
---- a/tcg/optimize.c
+--- a/accel/tcg/plugin-gen.c
-+++ b/tcg/optimize.c
++++ b/accel/tcg/plugin-gen.c
-@@ -XXX,XX +XXX,XX @@ static bool fold_qemu_st(OptContext *ctx, TCGOp *op)
+@@ -XXX,XX +XXX,XX @@
+  * Injecting the desired instrumentation could be done with a second
- static bool fold_remainder(OptContext *ctx, TCGOp *op)
+  * translation pass that combined the instrumentation requests, but that
- {
+  * would be ugly and inefficient since we would decode the guest code twice.
--    return fold_const2(ctx, op);
+- * Instead, during TB translation we add "empty" instrumentation calls for all
-+    if (fold_const2(ctx, op) ||
+- * possible instrumentation events, and then once we collect the instrumentation
-+        fold_xx_to_i(ctx, op, 0)) {
+- * requests from plugins, we either "fill in" those empty events or remove them
-+        return true;
+- * if they have no requests.
-+    }
+- *
-+    return false;
+- * When "filling in" an event we first copy the empty callback's TCG ops. This
- }
+- * might seem unnecessary, but it is done to support an arbitrary number
+- * of callbacks per event. Take for example a regular instruction callback.
- static bool fold_setcond(OptContext *ctx, TCGOp *op)
+- * We first generate a callback to an empty helper function. Then, if two
 - * plugins register one callback each for this instruction, we make two copies
 - * of the TCG ops generated for the empty callback, substituting the function
 - * pointer that points to the empty helper function with the plugins' desired
 - * callback functions. After that we remove the empty callback's ops.
 - *
 - * Note that the location in TCGOp.args[] of the pointer to a helper function
 - * varies across different guest and host architectures. Instead of duplicating
 - * the logic that figures this out, we rely on the fact that the empty
 - * callbacks point to empty functions that are unique pointers in the program.
 - * Thus, to find the right location we just have to look for a match in
 - * TCGOp.args[]. This is the main reason why we first copy an empty callback's
 - * TCG ops and then fill them in; regardless of whether we have one or many
 - * callbacks for that event, the logic to add all of them is the same.
 - *
 - * When generating more than one callback per event, we make a small
 - * optimization to avoid generating redundant operations. For instance, for the
 - * second and all subsequent callbacks of an event, we do not need to reload the
 - * CPU's index into a TCG temp, since the first callback did it already.
 + * Instead, during TB translation we add "plugin_cb" marker opcodes
 + * for all possible instrumentation events, and then once we collect the
 + * instrumentation requests from plugins, we generate code for those markers
 + * or remove them if they have no requests.
   */
  #include "qemu/osdep.h"
  #include "qemu/plugin.h"
 --
-2.25.1
+2.34.1

The following changes since commit c52d69e7dbaaed0ffdef8125e79218672c30161d:

Merge remote-tracking branch 'remotes/cschoenebeck/tags/pull-9p-20211027' into staging (2021-10-27 11:45:18 -0700)

are available in the Git repository at:

https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20211027

for you to fetch changes up to 820c025f0dcacf2f3c12735b1f162893fbfa7bc6:

tcg/optimize: Propagate sign info for shifting (2021-10-27 17:11:23 -0700)

----------------------------------------------------------------
Improvements to qemu/int128
Fixes for 128/64 division.
Cleanup tcg/optimize.c
Optimize redundant sign extensions

----------------------------------------------------------------
Frédéric Pétrot (1):
      qemu/int128: Add int128_{not,xor}

Luis Pires (4):
      host-utils: move checks out of divu128/divs128
      host-utils: move udiv_qrnnd() to host-utils
      host-utils: add 128-bit quotient support to divu128/divs128
      host-utils: add unit tests for divu128/divs128

Richard Henderson (51):
      tcg/optimize: Rename "mask" to "z_mask"
      tcg/optimize: Split out OptContext
      tcg/optimize: Remove do_default label
      tcg/optimize: Change tcg_opt_gen_{mov,movi} interface
      tcg/optimize: Move prev_mb into OptContext
      tcg/optimize: Split out init_arguments
      tcg/optimize: Split out copy_propagate
      tcg/optimize: Split out fold_call
      tcg/optimize: Drop nb_oargs, nb_iargs locals
      tcg/optimize: Change fail return for do_constant_folding_cond*
      tcg/optimize: Return true from tcg_opt_gen_{mov,movi}
      tcg/optimize: Split out finish_folding
      tcg/optimize: Use a boolean to avoid a mass of continues
      tcg/optimize: Split out fold_mb, fold_qemu_{ld,st}
      tcg/optimize: Split out fold_const{1,2}
      tcg/optimize: Split out fold_setcond2
      tcg/optimize: Split out fold_brcond2
      tcg/optimize: Split out fold_brcond
      tcg/optimize: Split out fold_setcond
      tcg/optimize: Split out fold_mulu2_i32
      tcg/optimize: Split out fold_addsub2_i32
      tcg/optimize: Split out fold_movcond
      tcg/optimize: Split out fold_extract2
      tcg/optimize: Split out fold_extract, fold_sextract
      tcg/optimize: Split out fold_deposit
      tcg/optimize: Split out fold_count_zeros
      tcg/optimize: Split out fold_bswap
      tcg/optimize: Split out fold_dup, fold_dup2
      tcg/optimize: Split out fold_mov
      tcg/optimize: Split out fold_xx_to_i
      tcg/optimize: Split out fold_xx_to_x
      tcg/optimize: Split out fold_xi_to_i
      tcg/optimize: Add type to OptContext
      tcg/optimize: Split out fold_to_not
      tcg/optimize: Split out fold_sub_to_neg
      tcg/optimize: Split out fold_xi_to_x
      tcg/optimize: Split out fold_ix_to_i
      tcg/optimize: Split out fold_masks
      tcg/optimize: Expand fold_mulu2_i32 to all 4-arg multiplies
      tcg/optimize: Expand fold_addsub2_i32 to 64-bit ops
      tcg/optimize: Sink commutative operand swapping into fold functions
      tcg/optimize: Stop forcing z_mask to "garbage" for 32-bit values
      tcg/optimize: Use fold_xx_to_i for orc
      tcg/optimize: Use fold_xi_to_x for mul
      tcg/optimize: Use fold_xi_to_x for div
      tcg/optimize: Use fold_xx_to_i for rem
      tcg/optimize: Optimize sign extensions
      tcg/optimize: Propagate sign info for logical operations
      tcg/optimize: Propagate sign info for setcond
      tcg/optimize: Propagate sign info for bit counting
      tcg/optimize: Propagate sign info for shifting

From: Frédéric Pétrot <frederic.petrot@univ-grenoble-alpes.fr>

Addition of not and xor on 128-bit integers.

Signed-off-by: Frédéric Pétrot <frederic.petrot@univ-grenoble-alpes.fr>
Co-authored-by: Fabien Portas <fabien.portas@grenoble-inp.org>
Message-Id: <20211025122818.168890-3-frederic.petrot@univ-grenoble-alpes.fr>
[rth: Split out logical operations.]
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/qemu/int128.h | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/include/qemu/int128.h b/include/qemu/int128.h
index XXXXXXX..XXXXXXX 100644
--- a/include/qemu/int128.h
+++ b/include/qemu/int128.h
@@ -XXX,XX +XXX,XX @@ static inline Int128 int128_exts64(int64_t a)
     return a;
 }
 
+static inline Int128 int128_not(Int128 a)
+{
+    return ~a;
+}
+
 static inline Int128 int128_and(Int128 a, Int128 b)
 {
     return a & b;
@@ -XXX,XX +XXX,XX @@ static inline Int128 int128_or(Int128 a, Int128 b)
     return a | b;
 }
 
+static inline Int128 int128_xor(Int128 a, Int128 b)
+{
+    return a ^ b;
+}
+
 static inline Int128 int128_rshift(Int128 a, int n)
 {
     return a >> n;
@@ -XXX,XX +XXX,XX @@ static inline Int128 int128_exts64(int64_t a)
     return int128_make128(a, (a < 0) ? -1 : 0);
 }
 
+static inline Int128 int128_not(Int128 a)
+{
+    return int128_make128(~a.lo, ~a.hi);
+}
+
 static inline Int128 int128_and(Int128 a, Int128 b)
 {
     return int128_make128(a.lo & b.lo, a.hi & b.hi);
@@ -XXX,XX +XXX,XX @@ static inline Int128 int128_or(Int128 a, Int128 b)
     return int128_make128(a.lo | b.lo, a.hi | b.hi);
 }
 
+static inline Int128 int128_xor(Int128 a, Int128 b)
+{
+    return int128_make128(a.lo ^ b.lo, a.hi ^ b.hi);
+}
+
 static inline Int128 int128_rshift(Int128 a, int n)
 {
     int64_t h;
-- 
2.25.1
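
The second half of the patch above covers hosts without a native 128-bit type, where the value is a pair of 64-bit words and each logical operation is applied word by word. A self-contained sketch of that fallback is given below. The `Pair` type and `pair_*` names are hypothetical stand-ins for Int128 and its helpers, and the field layout is an assumption.

```c
#include <stdint.h>

/* Hypothetical two-word integer: unsigned low half, signed high half. */
typedef struct {
    uint64_t lo;
    int64_t hi;
} Pair;

static Pair pair_make(uint64_t lo, int64_t hi)
{
    Pair r = { lo, hi };
    return r;
}

/* Bitwise NOT applied independently to each half. */
static Pair pair_not(Pair a)
{
    return pair_make(~a.lo, ~a.hi);
}

/* Bitwise XOR applied independently to each half. */
static Pair pair_xor(Pair a, Pair b)
{
    return pair_make(a.lo ^ b.lo, a.hi ^ b.hi);
}
```

No carry crosses the word boundary for NOT or XOR, which is why the halves can be handled separately, unlike addition or shifts.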

From: Luis Pires <luis.pires@eldorado.org.br>

In preparation for changing the divu128/divs128 implementations
to allow for quotients larger than 64 bits, move the div-by-zero
and overflow checks to the callers.

Signed-off-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20211025191154.350831-2-luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/hw/clock.h        |  5 +++--
 include/qemu/host-utils.h | 34 ++++++++++++---------------------
 target/ppc/int_helper.c   | 14 +++++++++-----
 util/host-utils.c         | 40 ++++++++++++++++++---------------------
 4 files changed, 42 insertions(+), 51 deletions(-)

diff --git a/include/hw/clock.h b/include/hw/clock.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/clock.h
+++ b/include/hw/clock.h
@@ -XXX,XX +XXX,XX @@ static inline uint64_t clock_ns_to_ticks(const Clock *clk, uint64_t ns)
         return 0;
     }
     /*
-     * Ignore divu128() return value as we've caught div-by-zero and don't
-     * need different behaviour for overflow.
+     * BUG: when CONFIG_INT128 is not defined, the current implementation of
+     * divu128 does not return a valid truncated quotient, so the result will
+     * be wrong.
      */
     divu128(&lo, &hi, clk->period);
     return lo;
diff --git a/include/qemu/host-utils.h b/include/qemu/host-utils.h
index XXXXXXX..XXXXXXX 100644
--- a/include/qemu/host-utils.h
+++ b/include/qemu/host-utils.h
@@ -XXX,XX +XXX,XX @@ static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
     return (__int128_t)a * b / c;
 }
 
-static inline int divu128(uint64_t *plow, uint64_t *phigh, uint64_t divisor)
+static inline void divu128(uint64_t *plow, uint64_t *phigh, uint64_t divisor)
 {
-    if (divisor == 0) {
-        return 1;
-    } else {
-        __uint128_t dividend = ((__uint128_t)*phigh << 64) | *plow;
-        __uint128_t result = dividend / divisor;
-        *plow = result;
-        *phigh = dividend % divisor;
-        return result > UINT64_MAX;
-    }
+    __uint128_t dividend = ((__uint128_t)*phigh << 64) | *plow;
+    __uint128_t result = dividend / divisor;
+    *plow = result;
+    *phigh = dividend % divisor;
 }
 
-static inline int divs128(int64_t *plow, int64_t *phigh, int64_t divisor)
+static inline void divs128(int64_t *plow, int64_t *phigh, int64_t divisor)
 {
-    if (divisor == 0) {
-        return 1;
-    } else {
-        __int128_t dividend = ((__int128_t)*phigh << 64) | (uint64_t)*plow;
-        __int128_t result = dividend / divisor;
-        *plow = result;
-        *phigh = dividend % divisor;
-        return result != *plow;
-    }
+    __int128_t dividend = ((__int128_t)*phigh << 64) | (uint64_t)*plow;
+    __int128_t result = dividend / divisor;
+    *plow = result;
+    *phigh = dividend % divisor;
 }
 #else
 void muls64(uint64_t *plow, uint64_t *phigh, int64_t a, int64_t b);
 void mulu64(uint64_t *plow, uint64_t *phigh, uint64_t a, uint64_t b);
-int divu128(uint64_t *plow, uint64_t *phigh, uint64_t divisor);
-int divs128(int64_t *plow, int64_t *phigh, int64_t divisor);
+void divu128(uint64_t *plow, uint64_t *phigh, uint64_t divisor);
+void divs128(int64_t *plow, int64_t *phigh, int64_t divisor);
 
 static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
 {
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -XXX,XX +XXX,XX @@ uint64_t helper_divdeu(CPUPPCState *env, uint64_t ra, uint64_t rb, uint32_t oe)
     uint64_t rt = 0;
     int overflow = 0;
 
-    overflow = divu128(&rt, &ra, rb);
-
-    if (unlikely(overflow)) {
+    if (unlikely(rb == 0 || ra >= rb)) {
+        overflow = 1;
         rt = 0; /* Undefined */
+    } else {
+        divu128(&rt, &ra, rb);
     }
 
     if (oe) {
@@ -XXX,XX +XXX,XX @@ uint64_t helper_divde(CPUPPCState *env, uint64_t rau, uint64_t rbu, uint32_t oe)
     int64_t rt = 0;
     int64_t ra = (int64_t)rau;
     int64_t rb = (int64_t)rbu;
-    int overflow = divs128(&rt, &ra, rb);
+    int overflow = 0;
 
-    if (unlikely(overflow)) {
+    if (unlikely(rb == 0 || uabs64(ra) >= uabs64(rb))) {
+        overflow = 1;
         rt = 0; /* Undefined */
+    } else {
+        divs128(&rt, &ra, rb);
     }
 
     if (oe) {
diff --git a/util/host-utils.c b/util/host-utils.c
index XXXXXXX..XXXXXXX 100644
--- a/util/host-utils.c
+++ b/util/host-utils.c
@@ -XXX,XX +XXX,XX @@ void muls64 (uint64_t *plow, uint64_t *phigh, int64_t a, int64_t b)
     *phigh = rh;
 }
 
-/* Unsigned 128x64 division.  Returns 1 if overflow (divide by zero or */
-/* quotient exceeds 64 bits).  Otherwise returns quotient via plow and */
-/* remainder via phigh. */
-int divu128(uint64_t *plow, uint64_t *phigh, uint64_t divisor)
+/*
+ * Unsigned 128-by-64 division. Returns quotient via plow and
+ * remainder via phigh.
+ * The result must fit in 64 bits (plow) - otherwise, the result
+ * is undefined.
+ * This function will cause a division by zero if passed a zero divisor.
+ */
+void divu128(uint64_t *plow, uint64_t *phigh, uint64_t divisor)
 {
     uint64_t dhi = *phigh;
     uint64_t dlo = *plow;
     unsigned i;
     uint64_t carry = 0;
 
-    if (divisor == 0) {
-        return 1;
-    } else if (dhi == 0) {
+    if (divisor == 0 || dhi == 0) {
         *plow  = dlo / divisor;
         *phigh = dlo % divisor;
-        return 0;
-    } else if (dhi >= divisor) {
-        return 1;
     } else {
 
         for (i = 0; i < 64; i++) {
@@ -XXX,XX +XXX,XX @@ int divu128(uint64_t *plow, uint64_t *phigh, uint64_t divisor)
 
         *plow = dlo;
         *phigh = dhi;
-        return 0;
     }
 }
 
-int divs128(int64_t *plow, int64_t *phigh, int64_t divisor)
+/*
+ * Signed 128-by-64 division. Returns quotient via plow and
+ * remainder via phigh.
+ * The result must fit in 64 bits (plow) - otherwise, the result
+ * is undefined.
+ * This function will cause a division by zero if passed a zero divisor.
+ */
+void divs128(int64_t *plow, int64_t *phigh, int64_t divisor)
 {
     int sgn_dvdnd = *phigh < 0;
     int sgn_divsr = divisor < 0;
-    int overflow = 0;
 
     if (sgn_dvdnd) {
         *plow = ~(*plow);
@@ -XXX,XX +XXX,XX @@ int divs128(int64_t *plow, int64_t *phigh, int64_t divisor)
         divisor = 0 - divisor;
     }
 
-    overflow = divu128((uint64_t *)plow, (uint64_t *)phigh, (uint64_t)divisor);
+    divu128((uint64_t *)plow, (uint64_t *)phigh, (uint64_t)divisor);
 
     if (sgn_dvdnd  ^ sgn_divsr) {
         *plow = 0 - *plow;
     }
-
-    if (!overflow) {
-        if ((*plow < 0) ^ (sgn_dvdnd ^ sgn_divsr)) {
-            overflow = 1;
-        }
-    }
-
-    return overflow;
 }
 #endif
 
-- 
2.25.1

From: Luis Pires <luis.pires@eldorado.org.br>

Move udiv_qrnnd() from include/fpu/softfloat-macros.h to host-utils,
so it can be reused by divu128().

Signed-off-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20211025191154.350831-3-luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/fpu/softfloat-macros.h | 82 ----------------------------------
 include/qemu/host-utils.h      | 81 +++++++++++++++++++++++++++++++++
 2 files changed, 81 insertions(+), 82 deletions(-)

diff --git a/include/fpu/softfloat-macros.h b/include/fpu/softfloat-macros.h
index XXXXXXX..XXXXXXX 100644
--- a/include/fpu/softfloat-macros.h
+++ b/include/fpu/softfloat-macros.h
@@ -XXX,XX +XXX,XX @@
  * so some portions are provided under:
  *  the SoftFloat-2a license
  *  the BSD license
- *  GPL-v2-or-later
  *
  * Any future contributions to this file after December 1st 2014 will be
  * taken to be licensed under the Softfloat-2a license unless specifically
@@ -XXX,XX +XXX,XX @@ this code that are retained.
  * THE POSSIBILITY OF SUCH DAMAGE.
  */
 
-/* Portions of this work are licensed under the terms of the GNU GPL,
- * version 2 or later. See the COPYING file in the top-level directory.
- */
-
 #ifndef FPU_SOFTFLOAT_MACROS_H
 #define FPU_SOFTFLOAT_MACROS_H
 
@@ -XXX,XX +XXX,XX @@ static inline uint64_t estimateDiv128To64(uint64_t a0, uint64_t a1, uint64_t b)
 
 }
 
-/* From the GNU Multi Precision Library - longlong.h __udiv_qrnnd
- * (https://gmplib.org/repo/gmp/file/tip/longlong.h)
- *
- * Licensed under the GPLv2/LGPLv3
- */
-static inline uint64_t udiv_qrnnd(uint64_t *r, uint64_t n1,
-                                  uint64_t n0, uint64_t d)
-{
-#if defined(__x86_64__)
-    uint64_t q;
-    asm("divq %4" : "=a"(q), "=d"(*r) : "0"(n0), "1"(n1), "rm"(d));
-    return q;
-#elif defined(__s390x__) && !defined(__clang__)
-    /* Need to use a TImode type to get an even register pair for DLGR.  */
-    unsigned __int128 n = (unsigned __int128)n1 << 64 | n0;
-    asm("dlgr %0, %1" : "+r"(n) : "r"(d));
-    *r = n >> 64;
-    return n;
-#elif defined(_ARCH_PPC64) && defined(_ARCH_PWR7)
-    /* From Power ISA 2.06, programming note for divdeu.  */
-    uint64_t q1, q2, Q, r1, r2, R;
-    asm("divdeu %0,%2,%4; divdu %1,%3,%4"
-        : "=&r"(q1), "=r"(q2)
-        : "r"(n1), "r"(n0), "r"(d));
-    r1 = -(q1 * d);         /* low part of (n1<<64) - (q1 * d) */
-    r2 = n0 - (q2 * d);
-    Q = q1 + q2;
-    R = r1 + r2;
-    if (R >= d || R < r2) { /* overflow implies R > d */
-        Q += 1;
-        R -= d;
-    }
-    *r = R;
-    return Q;
-#else
-    uint64_t d0, d1, q0, q1, r1, r0, m;
-
-    d0 = (uint32_t)d;
-    d1 = d >> 32;
-
-    r1 = n1 % d1;
-    q1 = n1 / d1;
-    m = q1 * d0;
-    r1 = (r1 << 32) | (n0 >> 32);
-    if (r1 < m) {
-        q1 -= 1;
-        r1 += d;
-        if (r1 >= d) {
-            if (r1 < m) {
-                q1 -= 1;
-                r1 += d;
-            }
-        }
-    }
-    r1 -= m;
-
-    r0 = r1 % d1;
-    q0 = r1 / d1;
-    m = q0 * d0;
-    r0 = (r0 << 32) | (uint32_t)n0;
-    if (r0 < m) {
-        q0 -= 1;
-        r0 += d;
-        if (r0 >= d) {
-            if (r0 < m) {
-                q0 -= 1;
-                r0 += d;
-            }
-        }
-    }
-    r0 -= m;
-
-    *r = r0;
-    return (q1 << 32) | q0;
-#endif
-}
-
 /*----------------------------------------------------------------------------
 | Returns an approximation to the square root of the 32-bit significand given
 | by `a'.  Considered as an integer, `a' must be at least 2^31.  If bit 0 of
diff --git a/include/qemu/host-utils.h b/include/qemu/host-utils.h
index XXXXXXX..XXXXXXX 100644
--- a/include/qemu/host-utils.h
+++ b/include/qemu/host-utils.h
@@ -XXX,XX +XXX,XX @@
  * THE SOFTWARE.
  */
 
+/* Portions of this work are licensed under the terms of the GNU GPL,
+ * version 2 or later. See the COPYING file in the top-level directory.
+ */
+
 #ifndef HOST_UTILS_H
 #define HOST_UTILS_H
 
@@ -XXX,XX +XXX,XX @@ void urshift(uint64_t *plow, uint64_t *phigh, int32_t shift);
  */
 void ulshift(uint64_t *plow, uint64_t *phigh, int32_t shift, bool *overflow);
 
+/* From the GNU Multi Precision Library - longlong.h __udiv_qrnnd
+ * (https://gmplib.org/repo/gmp/file/tip/longlong.h)
+ *
+ * Licensed under the GPLv2/LGPLv3
+ */
+static inline uint64_t udiv_qrnnd(uint64_t *r, uint64_t n1,
+                                  uint64_t n0, uint64_t d)
+{
+#if defined(__x86_64__)
+    uint64_t q;
+    asm("divq %4" : "=a"(q), "=d"(*r) : "0"(n0), "1"(n1), "rm"(d));
+    return q;
+#elif defined(__s390x__) && !defined(__clang__)
+    /* Need to use a TImode type to get an even register pair for DLGR.  */
+    unsigned __int128 n = (unsigned __int128)n1 << 64 | n0;
+    asm("dlgr %0, %1" : "+r"(n) : "r"(d));
+    *r = n >> 64;
+    return n;
+#elif defined(_ARCH_PPC64) && defined(_ARCH_PWR7)
+    /* From Power ISA 2.06, programming note for divdeu.  */
+    uint64_t q1, q2, Q, r1, r2, R;
+    asm("divdeu %0,%2,%4; divdu %1,%3,%4"
+        : "=&r"(q1), "=r"(q2)
+        : "r"(n1), "r"(n0), "r"(d));
+    r1 = -(q1 * d);         /* low part of (n1<<64) - (q1 * d) */
+    r2 = n0 - (q2 * d);
+    Q = q1 + q2;
+    R = r1 + r2;
+    if (R >= d || R < r2) { /* overflow implies R > d */
+        Q += 1;
+        R -= d;
+    }
+    *r = R;
+    return Q;
+#else
+    uint64_t d0, d1, q0, q1, r1, r0, m;
+
+    d0 = (uint32_t)d;
+    d1 = d >> 32;
+
+    r1 = n1 % d1;
+    q1 = n1 / d1;
+    m = q1 * d0;
+    r1 = (r1 << 32) | (n0 >> 32);
+    if (r1 < m) {
+        q1 -= 1;
+        r1 += d;
+        if (r1 >= d) {
+            if (r1 < m) {
+                q1 -= 1;
+                r1 += d;
+            }
+        }
+    }
+    r1 -= m;
+
+    r0 = r1 % d1;
+    q0 = r1 / d1;
+    m = q0 * d0;
+    r0 = (r0 << 32) | (uint32_t)n0;
+    if (r0 < m) {
+        q0 -= 1;
+        r0 += d;
+        if (r0 >= d) {
+            if (r0 < m) {
+                q0 -= 1;
+                r0 += d;
+            }
+        }
+    }
+    r0 -= m;
+
+    *r = r0;
+    return (q1 << 32) | q0;
+#endif
+}
+
 #endif
-- 
2.25.1

From: Luis Pires <luis.pires@eldorado.org.br>

Allow divu128/divs128 to produce quotients larger than 64 bits.
These will be used to implement new decimal floating-point
instructions from Power ISA 3.1.

The remainder is now returned directly by divu128/divs128,
freeing up phigh to receive the high 64 bits of the quotient.

Signed-off-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20211025191154.350831-4-luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
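Note: the normalization step relies on the fact that shifting both the
dividend and the divisor left by the same amount leaves the quotient
unchanged and scales the remainder by the same power of two, which is why
the remainder is shifted back at the end. A minimal sketch of that
identity for the case where the high word is smaller than the divisor
follows. `div_normalized` is a hypothetical helper, not the code in this
patch; it uses `unsigned __int128` and the GCC/Clang builtin
`__builtin_clzll` in place of udiv_qrnnd() and clz64().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical illustration of normalized division: divide (hi:lo) by d
 * after shifting both so that the divisor's most significant bit is set.
 * Requires d != 0 and hi < d, so the quotient fits in 64 bits.
 */
static uint64_t div_normalized(uint64_t hi, uint64_t lo, uint64_t d,
                               uint64_t *rem)
{
    assert(d != 0 && hi < d);

    int sh = __builtin_clzll(d);          /* leading zero bits of d */
    unsigned __int128 n = ((unsigned __int128)hi << 64) | lo;
    unsigned __int128 dn = (unsigned __int128)d << sh;

    /* Cannot overflow 128 bits: hi < d < 2^(64 - sh). */
    n <<= sh;

    /* Same quotient as the unshifted division. */
    uint64_t q = (uint64_t)(n / dn);

    /* The remainder was scaled by 2^sh, so shift it back. */
    *rem = (uint64_t)((n % dn) >> sh);
    return q;
}
```

For example, dividing 0x1:0x23456789abcdef01 by 0x123456789abcdefe this
way gives quotient 0xf and remainder 0x123456789abcde1f.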
 include/hw/clock.h        |   6 +-
 include/qemu/host-utils.h |  20 ++++--
 target/ppc/int_helper.c   |   9 +--
 util/host-utils.c         | 133 +++++++++++++++++++++++++-------------
 4 files changed, 108 insertions(+), 60 deletions(-)

diff --git a/include/hw/clock.h b/include/hw/clock.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/clock.h
+++ b/include/hw/clock.h
@@ -XXX,XX +XXX,XX @@ static inline uint64_t clock_ns_to_ticks(const Clock *clk, uint64_t ns)
     if (clk->period == 0) {
         return 0;
     }
-    /*
-     * BUG: when CONFIG_INT128 is not defined, the current implementation of
-     * divu128 does not return a valid truncated quotient, so the result will
-     * be wrong.
-     */
+
     divu128(&lo, &hi, clk->period);
     return lo;
 }
diff --git a/include/qemu/host-utils.h b/include/qemu/host-utils.h
index XXXXXXX..XXXXXXX 100644
--- a/include/qemu/host-utils.h
+++ b/include/qemu/host-utils.h
@@ -XXX,XX +XXX,XX @@ static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
     return (__int128_t)a * b / c;
 }
 
-static inline void divu128(uint64_t *plow, uint64_t *phigh, uint64_t divisor)
+static inline uint64_t divu128(uint64_t *plow, uint64_t *phigh,
+                               uint64_t divisor)
 {
     __uint128_t dividend = ((__uint128_t)*phigh << 64) | *plow;
     __uint128_t result = dividend / divisor;
+
     *plow = result;
-    *phigh = dividend % divisor;
+    *phigh = result >> 64;
+    return dividend % divisor;
 }
 
-static inline void divs128(int64_t *plow, int64_t *phigh, int64_t divisor)
+static inline int64_t divs128(uint64_t *plow, int64_t *phigh,
+                              int64_t divisor)
 {
-    __int128_t dividend = ((__int128_t)*phigh << 64) | (uint64_t)*plow;
+    __int128_t dividend = ((__int128_t)*phigh << 64) | *plow;
     __int128_t result = dividend / divisor;
+
     *plow = result;
-    *phigh = dividend % divisor;
+    *phigh = result >> 64;
+    return dividend % divisor;
 }
 #else
 void muls64(uint64_t *plow, uint64_t *phigh, int64_t a, int64_t b);
 void mulu64(uint64_t *plow, uint64_t *phigh, uint64_t a, uint64_t b);
-void divu128(uint64_t *plow, uint64_t *phigh, uint64_t divisor);
-void divs128(int64_t *plow, int64_t *phigh, int64_t divisor);
+uint64_t divu128(uint64_t *plow, uint64_t *phigh, uint64_t divisor);
+int64_t divs128(uint64_t *plow, int64_t *phigh, int64_t divisor);
 
 static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
 {
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -XXX,XX +XXX,XX @@ uint64_t helper_divdeu(CPUPPCState *env, uint64_t ra, uint64_t rb, uint32_t oe)
 
 uint64_t helper_divde(CPUPPCState *env, uint64_t rau, uint64_t rbu, uint32_t oe)
 {
-    int64_t rt = 0;
+    uint64_t rt = 0;
     int64_t ra = (int64_t)rau;
     int64_t rb = (int64_t)rbu;
     int overflow = 0;
@@ -XXX,XX +XXX,XX @@ uint32_t helper_bcdcfsq(ppc_avr_t *r, ppc_avr_t *b, uint32_t ps)
     int cr;
     uint64_t lo_value;
     uint64_t hi_value;
+    uint64_t rem;
     ppc_avr_t ret = { .u64 = { 0, 0 } };
 
     if (b->VsrSD(0) < 0) {
@@ -XXX,XX +XXX,XX @@ uint32_t helper_bcdcfsq(ppc_avr_t *r, ppc_avr_t *b, uint32_t ps)
          * In that case, we leave r unchanged.
          */
     } else {
-        divu128(&lo_value, &hi_value, 1000000000000000ULL);
+        rem = divu128(&lo_value, &hi_value, 1000000000000000ULL);
 
-        for (i = 1; i < 16; hi_value /= 10, i++) {
-            bcd_put_digit(&ret, hi_value % 10, i);
+        for (i = 1; i < 16; rem /= 10, i++) {
+            bcd_put_digit(&ret, rem % 10, i);
         }
 
         for (; i < 32; lo_value /= 10, i++) {
diff --git a/util/host-utils.c b/util/host-utils.c
index XXXXXXX..XXXXXXX 100644
--- a/util/host-utils.c
+++ b/util/host-utils.c
@@ -XXX,XX +XXX,XX @@ void muls64 (uint64_t *plow, uint64_t *phigh, int64_t a, int64_t b)
 }
 
 /*
- * Unsigned 128-by-64 division. Returns quotient via plow and
- * remainder via phigh.
- * The result must fit in 64 bits (plow) - otherwise, the result
- * is undefined.
- * This function will cause a division by zero if passed a zero divisor.
+ * Unsigned 128-by-64 division.
+ * Returns quotient via plow and phigh.
+ * Also returns the remainder via the function return value.
  */
-void divu128(uint64_t *plow, uint64_t *phigh, uint64_t divisor)
+uint64_t divu128(uint64_t *plow, uint64_t *phigh, uint64_t divisor)
 {
     uint64_t dhi = *phigh;
     uint64_t dlo = *plow;
-    unsigned i;
-    uint64_t carry = 0;
+    uint64_t rem, dhighest;
+    int sh;
 
     if (divisor == 0 || dhi == 0) {
         *plow  = dlo / divisor;
-        *phigh = dlo % divisor;
+        *phigh = 0;
+        return dlo % divisor;
     } else {
+        sh = clz64(divisor);
 
-        for (i = 0; i < 64; i++) {
-            carry = dhi >> 63;
-            dhi = (dhi << 1) | (dlo >> 63);
-            if (carry || (dhi >= divisor)) {
-                dhi -= divisor;
-                carry = 1;
-            } else {
-                carry = 0;
+        if (dhi < divisor) {
+            if (sh != 0) {
+                /* normalize the divisor, shifting the dividend accordingly */
+                divisor <<= sh;
+                dhi = (dhi << sh) | (dlo >> (64 - sh));
+                dlo <<= sh;
             }
-            dlo = (dlo << 1) | carry;
+
+            *phigh = 0;
+            *plow = udiv_qrnnd(&rem, dhi, dlo, divisor);
+        } else {
+            if (sh != 0) {
+                /* normalize the divisor, shifting the dividend accordingly */
+                divisor <<= sh;
+                dhighest = dhi >> (64 - sh);
+                dhi = (dhi << sh) | (dlo >> (64 - sh));
+                dlo <<= sh;
+
+                *phigh = udiv_qrnnd(&dhi, dhighest, dhi, divisor);
+            } else {
+                /**
+                 * dhi >= divisor
+                 * Since the MSB of divisor is set (sh == 0),
+                 * (dhi - divisor) < divisor
+                 *
+                 * Thus, the high part of the quotient is 1, and we can
+                 * calculate the low part with a single call to udiv_qrnnd
+                 * after subtracting divisor from dhi
+                 */
+                dhi -= divisor;
+                *phigh = 1;
+            }
+
+            *plow = udiv_qrnnd(&rem, dhi, dlo, divisor);
         }
 
-        *plow = dlo;
-        *phigh = dhi;
+        /*
+         * since the dividend/divisor might have been normalized,
+         * the remainder might also have to be shifted back
+         */
+        return rem >> sh;
     }
 }
 
 /*
- * Signed 128-by-64 division. Returns quotient via plow and
- * remainder via phigh.
- * The result must fit in 64 bits (plow) - otherwise, the result
- * is undefined.
- * This function will cause a division by zero if passed a zero divisor.
+ * Signed 128-by-64 division.
+ * Returns quotient via plow and phigh.
+ * Also returns the remainder via the function return value.
  */
-void divs128(int64_t *plow, int64_t *phigh, int64_t divisor)
+int64_t divs128(uint64_t *plow, int64_t *phigh, int64_t divisor)
 {
-    int sgn_dvdnd = *phigh < 0;
-    int sgn_divsr = divisor < 0;
+    bool neg_quotient = false, neg_remainder = false;
+    uint64_t unsig_hi = *phigh, unsig_lo = *plow;
+    uint64_t rem;
 
-    if (sgn_dvdnd) {
-        *plow = ~(*plow);
-        *phigh = ~(*phigh);
-        if (*plow == (int64_t)-1) {
+    if (*phigh < 0) {
+        neg_quotient = !neg_quotient;
+        neg_remainder = !neg_remainder;
+
+        if (unsig_lo == 0) {
+            unsig_hi = -unsig_hi;
+        } else {
+            unsig_hi = ~unsig_hi;
+            unsig_lo = -unsig_lo;
+        }
+    }
+
+    if (divisor < 0) {
+        neg_quotient = !neg_quotient;
+
+        divisor = -divisor;
+    }
+
+    rem = divu128(&unsig_lo, &unsig_hi, (uint64_t)divisor);
+
+    if (neg_quotient) {
+        if (unsig_lo == 0) {
+            *phigh = -unsig_hi;
             *plow = 0;
-            (*phigh)++;
-         } else {
-            (*plow)++;
-         }
+        } else {
+            *phigh = ~unsig_hi;
+            *plow = -unsig_lo;
+        }
+    } else {
+        *phigh = unsig_hi;
+        *plow = unsig_lo;
     }
 
-    if (sgn_divsr) {
-        divisor = 0 - divisor;
-    }
-
-    divu128((uint64_t *)plow, (uint64_t *)phigh, (uint64_t)divisor);
-
-    if (sgn_dvdnd  ^ sgn_divsr) {
-        *plow = 0 - *plow;
+    if (neg_remainder) {
+        return -rem;
+    } else {
+        return rem;
     }
 }
 #endif
-- 
2.25.1

From: Luis Pires <luis.pires@eldorado.org.br>

Signed-off-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20211025191154.350831-5-luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
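Note: each table entry is { high, low, rhigh, rlow, divisor, remainder },
where high:low is the dividend and rhigh:rlow is the expected quotient.
The signed expectations follow truncating division: the quotient is
negative when the operand signs differ, and the remainder takes the sign
of the dividend. A minimal sketch of that sign handling follows.
`SDivResult` and `sdiv128_by_magnitude` are hypothetical names, not part
of the test. The sketch assumes `unsigned __int128` and the wrapping
unsigned-to-signed conversions that GCC and Clang provide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result holder: 128-bit quotient plus 64-bit remainder. */
typedef struct {
    int64_t qhi;
    uint64_t qlo;
    int64_t rem;
} SDivResult;

/* Divide the magnitudes, then fix up the signs afterwards. */
static SDivResult sdiv128_by_magnitude(int64_t hi, uint64_t lo,
                                       int64_t divisor)
{
    assert(divisor != 0);

    bool neg_dividend = hi < 0;
    bool neg_quotient = neg_dividend != (divisor < 0);
    unsigned __int128 mag = ((unsigned __int128)(uint64_t)hi << 64) | lo;
    uint64_t dmag = divisor < 0 ? 0 - (uint64_t)divisor : (uint64_t)divisor;

    if (neg_dividend) {
        mag = 0 - mag;                  /* negate modulo 2^128 */
    }

    unsigned __int128 q = mag / dmag;
    uint64_t r = (uint64_t)(mag % dmag);

    if (neg_quotient) {
        q = 0 - q;
    }

    SDivResult res = {
        (int64_t)(uint64_t)(q >> 64),
        (uint64_t)q,
        neg_dividend ? (int64_t)(0 - r) : (int64_t)r,
    };
    return res;
}
```

With the table's dividend of -0xbc614e and divisor 8, this yields
quotient -0x178c29 and remainder -6, matching the corresponding entry.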
 tests/unit/test-div128.c | 197 +++++++++++++++++++++++++++++++++++++++
 tests/unit/meson.build   |   1 +
 2 files changed, 198 insertions(+)
 create mode 100644 tests/unit/test-div128.c

diff --git a/tests/unit/test-div128.c b/tests/unit/test-div128.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/tests/unit/test-div128.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * Test 128-bit division functions
+ *
+ * Copyright (c) 2021 Instituto de Pesquisas Eldorado (eldorado.org.br)
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/host-utils.h"
+
+typedef struct {
+    uint64_t high;
+    uint64_t low;
+    uint64_t rhigh;
+    uint64_t rlow;
+    uint64_t divisor;
+    uint64_t remainder;
+} test_data_unsigned;
+
+typedef struct {
+    int64_t high;
+    uint64_t low;
+    int64_t rhigh;
+    uint64_t rlow;
+    int64_t divisor;
+    int64_t remainder;
+} test_data_signed;
+
+static const test_data_unsigned test_table_unsigned[] = {
+    /* Dividend fits in 64 bits */
+    { 0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000000ULL, 0x0000000000000000ULL,
+      0x0000000000000001ULL, 0x0000000000000000ULL},
+    { 0x0000000000000000ULL, 0x0000000000000001ULL,
+      0x0000000000000000ULL, 0x0000000000000001ULL,
+      0x0000000000000001ULL, 0x0000000000000000ULL},
+    { 0x0000000000000000ULL, 0x0000000000000003ULL,
+      0x0000000000000000ULL, 0x0000000000000001ULL,
+      0x0000000000000002ULL, 0x0000000000000001ULL},
+    { 0x0000000000000000ULL, 0x8000000000000000ULL,
+      0x0000000000000000ULL, 0x8000000000000000ULL,
+      0x0000000000000001ULL, 0x0000000000000000ULL},
+    { 0x0000000000000000ULL, 0xa000000000000000ULL,
+      0x0000000000000000ULL, 0x0000000000000002ULL,
+      0x4000000000000000ULL, 0x2000000000000000ULL},
+    { 0x0000000000000000ULL, 0x8000000000000000ULL,
+      0x0000000000000000ULL, 0x0000000000000001ULL,
+      0x8000000000000000ULL, 0x0000000000000000ULL},
+
+    /* Dividend > 64 bits, with MSB 0 */
+    { 0x123456789abcdefeULL, 0xefedcba987654321ULL,
+      0x123456789abcdefeULL, 0xefedcba987654321ULL,
+      0x0000000000000001ULL, 0x0000000000000000ULL},
+    { 0x123456789abcdefeULL, 0xefedcba987654321ULL,
+      0x0000000000000001ULL, 0x000000000000000dULL,
+      0x123456789abcdefeULL, 0x03456789abcdf03bULL},
+    { 0x123456789abcdefeULL, 0xefedcba987654321ULL,
+      0x0123456789abcdefULL, 0xeefedcba98765432ULL,
+      0x0000000000000010ULL, 0x0000000000000001ULL},
+
+    /* Dividend > 64 bits, with MSB 1 */
+    { 0xfeeddccbbaa99887ULL, 0x766554433221100fULL,
+      0xfeeddccbbaa99887ULL, 0x766554433221100fULL,
+      0x0000000000000001ULL, 0x0000000000000000ULL},
+    { 0xfeeddccbbaa99887ULL, 0x766554433221100fULL,
+      0x0000000000000001ULL, 0x0000000000000000ULL,
+      0xfeeddccbbaa99887ULL, 0x766554433221100fULL},
+    { 0xfeeddccbbaa99887ULL, 0x766554433221100fULL,
+      0x0feeddccbbaa9988ULL, 0x7766554433221100ULL,
+      0x0000000000000010ULL, 0x000000000000000fULL},
+    { 0xfeeddccbbaa99887ULL, 0x766554433221100fULL,
+      0x000000000000000eULL, 0x00f0f0f0f0f0f35aULL,
+      0x123456789abcdefeULL, 0x0f8922bc55ef90c3ULL},
+
+    /**
+     * Divisor == 64 bits, with MSB 1
+     * and high 64 bits of dividend >= divisor
+     * (for testing normalization)
+     */
+    { 0xfeeddccbbaa99887ULL, 0x766554433221100fULL,
+      0x0000000000000001ULL, 0x0000000000000000ULL,
+      0xfeeddccbbaa99887ULL, 0x766554433221100fULL},
+    { 0xfeeddccbbaa99887ULL, 0x766554433221100fULL,
+      0x0000000000000001ULL, 0xfddbb9977553310aULL,
+      0x8000000000000001ULL, 0x78899aabbccddf05ULL},
+
+    /* Dividend > 64 bits, divisor almost as big */
+    { 0x0000000000000001ULL, 0x23456789abcdef01ULL,
+      0x0000000000000000ULL, 0x000000000000000fULL,
+      0x123456789abcdefeULL, 0x123456789abcde1fULL},
+};
+
+static const test_data_signed test_table_signed[] = {
+    /* Positive dividend, positive/negative divisors */
+    { 0x0000000000000000LL, 0x0000000000bc614eULL,
+      0x0000000000000000LL, 0x0000000000bc614eULL,
+      0x0000000000000001LL, 0x0000000000000000LL},
+    { 0x0000000000000000LL, 0x0000000000bc614eULL,
+      0xffffffffffffffffLL, 0xffffffffff439eb2ULL,
+      0xffffffffffffffffLL, 0x0000000000000000LL},
+    { 0x0000000000000000LL, 0x0000000000bc614eULL,
+      0x0000000000000000LL, 0x00000000005e30a7ULL,
+      0x0000000000000002LL, 0x0000000000000000LL},
+    { 0x0000000000000000LL, 0x0000000000bc614eULL,
+      0xffffffffffffffffLL, 0xffffffffffa1cf59ULL,
+      0xfffffffffffffffeLL, 0x0000000000000000LL},
+    { 0x0000000000000000LL, 0x0000000000bc614eULL,
+      0x0000000000000000LL, 0x0000000000178c29ULL,
+      0x0000000000000008LL, 0x0000000000000006LL},
+    { 0x0000000000000000LL, 0x0000000000bc614eULL,
+      0xffffffffffffffffLL, 0xffffffffffe873d7ULL,
+      0xfffffffffffffff8LL, 0x0000000000000006LL},
+    { 0x0000000000000000LL, 0x0000000000bc614eULL,
+      0x0000000000000000LL, 0x000000000000550dULL,
+      0x0000000000000237LL, 0x0000000000000183LL},
+    { 0x0000000000000000LL, 0x0000000000bc614eULL,
+      0xffffffffffffffffLL, 0xffffffffffffaaf3ULL,
+      0xfffffffffffffdc9LL, 0x0000000000000183LL},
+
+    /* Negative dividend, positive/negative divisors */
+    { 0xffffffffffffffffLL, 0xffffffffff439eb2ULL,
+      0xffffffffffffffffLL, 0xffffffffff439eb2ULL,
+      0x0000000000000001LL, 0x0000000000000000LL},
+    { 0xffffffffffffffffLL, 0xffffffffff439eb2ULL,
+      0x0000000000000000LL, 0x0000000000bc614eULL,
+      0xffffffffffffffffLL, 0x0000000000000000LL},
+    { 0xffffffffffffffffLL, 0xffffffffff439eb2ULL,
+      0xffffffffffffffffLL, 0xffffffffffa1cf59ULL,
+      0x0000000000000002LL, 0x0000000000000000LL},
+    { 0xffffffffffffffffLL, 0xffffffffff439eb2ULL,
+      0x0000000000000000LL, 0x00000000005e30a7ULL,
+      0xfffffffffffffffeLL, 0x0000000000000000LL},
+    { 0xffffffffffffffffLL, 0xffffffffff439eb2ULL,
+      0xffffffffffffffffLL, 0xffffffffffe873d7ULL,
+      0x0000000000000008LL, 0xfffffffffffffffaLL},
+    { 0xffffffffffffffffLL, 0xffffffffff439eb2ULL,
+      0x0000000000000000LL, 0x0000000000178c29ULL,
+      0xfffffffffffffff8LL, 0xfffffffffffffffaLL},
+    { 0xffffffffffffffffLL, 0xffffffffff439eb2ULL,
+      0xffffffffffffffffLL, 0xffffffffffffaaf3ULL,
+      0x0000000000000237LL, 0xfffffffffffffe7dLL},
+    { 0xffffffffffffffffLL, 0xffffffffff439eb2ULL,
+      0x0000000000000000LL, 0x000000000000550dULL,
+      0xfffffffffffffdc9LL, 0xfffffffffffffe7dLL},
+};
+
+static void test_divu128(void)
+{
+    int i;
+    uint64_t rem;
+    test_data_unsigned tmp;
+
+    for (i = 0; i < ARRAY_SIZE(test_table_unsigned); ++i) {
+        tmp = test_table_unsigned[i];
+
+        rem = divu128(&tmp.low, &tmp.high, tmp.divisor);
+        g_assert_cmpuint(tmp.low, ==, tmp.rlow);
+        g_assert_cmpuint(tmp.high, ==, tmp.rhigh);
+        g_assert_cmpuint(rem, ==, tmp.remainder);
+    }
+}
+
+static void test_divs128(void)
+{
+    int i;
+    int64_t rem;
+    test_data_signed tmp;
+
+    for (i = 0; i < ARRAY_SIZE(test_table_signed); ++i) {
+        tmp = test_table_signed[i];
+
+        rem = divs128(&tmp.low, &tmp.high, tmp.divisor);
+        g_assert_cmpuint(tmp.low, ==, tmp.rlow);
+        g_assert_cmpint(tmp.high, ==, tmp.rhigh);
+        g_assert_cmpint(rem, ==, tmp.remainder);
+    }
+}
+
+int main(int argc, char **argv)
+{
+    g_test_init(&argc, &argv, NULL);
+    g_test_add_func("/host-utils/test_divu128", test_divu128);
+    g_test_add_func("/host-utils/test_divs128", test_divs128);
+    return g_test_run();
+}
diff --git a/tests/unit/meson.build b/tests/unit/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/tests/unit/meson.build
+++ b/tests/unit/meson.build
@@ -XXX,XX +XXX,XX @@ tests = {
   # all code tested by test-x86-cpuid is inside topology.h
   'test-x86-cpuid': [],
   'test-cutils': [],
+  'test-div128': [],
   'test-shift128': [],
   'test-mul64': [],
   # all code tested by test-int128 is inside int128.h
-- 
2.25.1

Prepare for tracking different masks by renaming the existing
TempOptInfo mask field, and the local variables derived from it,
to z_mask.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
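Note: one reading of z_mask, based on the "known-zero bits" comment in
tcg_optimize(), is that a clear bit means the value bit is known to be
zero, while a set bit means the value bit is unknown. A minimal sketch
of that reading and of the ext8s case follows. `z_mask_admits`,
`ext8s_acts_like_ext8u` and `z_mask_demo` are hypothetical names, not
functions in tcg/optimize.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A value is consistent with z_mask if it has no bit set outside it. */
static bool z_mask_admits(uint64_t z_mask, uint64_t value)
{
    return (value & ~z_mask) == 0;
}

/*
 * If bit 7 of the input is known to be zero, sign-extending from 8 bits
 * gives the same result as zero-extending from 8 bits.
 */
static bool ext8s_acts_like_ext8u(uint64_t in_z_mask)
{
    return (in_z_mask & 0x80) == 0;
}

/* Check the claim for every value admitted by a mask of 0x7f. */
void z_mask_demo(void)
{
    uint64_t z_mask = 0x7f;

    assert(ext8s_acts_like_ext8u(z_mask));
    for (uint64_t v = 0; v <= 0xff; v++) {
        if (z_mask_admits(z_mask, v)) {
            uint64_t sext = (uint64_t)(int64_t)(int8_t)v;
            uint64_t zext = v & 0xff;
            assert(sext == zext);
        }
    }
}
```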
 tcg/optimize.c | 142 +++++++++++++++++++++++++------------------------
 1 file changed, 72 insertions(+), 70 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -XXX,XX +XXX,XX @@ typedef struct TempOptInfo {
     TCGTemp *prev_copy;
     TCGTemp *next_copy;
     uint64_t val;
-    uint64_t mask;
+    uint64_t z_mask;  /* mask bit is 0 if and only if value bit is 0 */
 } TempOptInfo;
 
 static inline TempOptInfo *ts_info(TCGTemp *ts)
@@ -XXX,XX +XXX,XX @@ static void reset_ts(TCGTemp *ts)
     ti->next_copy = ts;
     ti->prev_copy = ts;
     ti->is_const = false;
-    ti->mask = -1;
+    ti->z_mask = -1;
 }
 
 static void reset_temp(TCGArg arg)
@@ -XXX,XX +XXX,XX @@ static void init_ts_info(TCGTempSet *temps_used, TCGTemp *ts)
     if (ts->kind == TEMP_CONST) {
         ti->is_const = true;
         ti->val = ts->val;
-        ti->mask = ts->val;
+        ti->z_mask = ts->val;
         if (TCG_TARGET_REG_BITS > 32 && ts->type == TCG_TYPE_I32) {
             /* High bits of a 32-bit quantity are garbage.  */
-            ti->mask |= ~0xffffffffull;
+            ti->z_mask |= ~0xffffffffull;
         }
     } else {
         ti->is_const = false;
-        ti->mask = -1;
+        ti->z_mask = -1;
     }
 }
 
@@ -XXX,XX +XXX,XX @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg src)
     const TCGOpDef *def;
     TempOptInfo *di;
     TempOptInfo *si;
-    uint64_t mask;
+    uint64_t z_mask;
     TCGOpcode new_op;
 
     if (ts_are_copies(dst_ts, src_ts)) {
@@ -XXX,XX +XXX,XX @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg src)
     op->args[0] = dst;
     op->args[1] = src;
 
-    mask = si->mask;
+    z_mask = si->z_mask;
     if (TCG_TARGET_REG_BITS > 32 && new_op == INDEX_op_mov_i32) {
         /* High bits of the destination are now garbage.  */
-        mask |= ~0xffffffffull;
+        z_mask |= ~0xffffffffull;
     }
-    di->mask = mask;
+    di->z_mask = z_mask;
 
     if (src_ts->type == dst_ts->type) {
         TempOptInfo *ni = ts_info(si->next_copy);
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
     }
 
     QTAILQ_FOREACH_SAFE(op, &s->ops, link, op_next) {
-        uint64_t mask, partmask, affected, tmp;
+        uint64_t z_mask, partmask, affected, tmp;
         int nb_oargs, nb_iargs;
         TCGOpcode opc = op->opc;
         const TCGOpDef *def = &tcg_op_defs[opc];
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
 
         /* Simplify using known-zero bits. Currently only ops with a single
            output argument is supported. */
-        mask = -1;
+        z_mask = -1;
         affected = -1;
         switch (opc) {
         CASE_OP_32_64(ext8s):
-            if ((arg_info(op->args[1])->mask & 0x80) != 0) {
+            if ((arg_info(op->args[1])->z_mask & 0x80) != 0) {
                 break;
             }
             QEMU_FALLTHROUGH;
         CASE_OP_32_64(ext8u):
-            mask = 0xff;
+            z_mask = 0xff;
             goto and_const;
         CASE_OP_32_64(ext16s):
-            if ((arg_info(op->args[1])->mask & 0x8000) != 0) {
+            if ((arg_info(op->args[1])->z_mask & 0x8000) != 0) {
                 break;
             }
             QEMU_FALLTHROUGH;
         CASE_OP_32_64(ext16u):
-            mask = 0xffff;
+            z_mask = 0xffff;
             goto and_const;
         case INDEX_op_ext32s_i64:
-            if ((arg_info(op->args[1])->mask & 0x80000000) != 0) {
+            if ((arg_info(op->args[1])->z_mask & 0x80000000) != 0) {
                 break;
             }
             QEMU_FALLTHROUGH;
         case INDEX_op_ext32u_i64:
-            mask = 0xffffffffU;
+            z_mask = 0xffffffffU;
             goto and_const;
 
         CASE_OP_32_64(and):
-            mask = arg_info(op->args[2])->mask;
+            z_mask = arg_info(op->args[2])->z_mask;
             if (arg_is_const(op->args[2])) {
         and_const:
-                affected = arg_info(op->args[1])->mask & ~mask;
+                affected = arg_info(op->args[1])->z_mask & ~z_mask;
             }
-            mask = arg_info(op->args[1])->mask & mask;
+            z_mask = arg_info(op->args[1])->z_mask & z_mask;
             break;
 
         case INDEX_op_ext_i32_i64:
-            if ((arg_info(op->args[1])->mask & 0x80000000) != 0) {
+            if ((arg_info(op->args[1])->z_mask & 0x80000000) != 0) {
                 break;
             }
             QEMU_FALLTHROUGH;
         case INDEX_op_extu_i32_i64:
             /* We do not compute affected as it is a size changing op.  */
-            mask = (uint32_t)arg_info(op->args[1])->mask;
+            z_mask = (uint32_t)arg_info(op->args[1])->z_mask;
             break;
 
         CASE_OP_32_64(andc):
             /* Known-zeros does not imply known-ones.  Therefore unless
                op->args[2] is constant, we can't infer anything from it.  */
             if (arg_is_const(op->args[2])) {
-                mask = ~arg_info(op->args[2])->mask;
+                z_mask = ~arg_info(op->args[2])->z_mask;
                 goto and_const;
             }
             /* But we certainly know nothing outside args[1] may be set. */
-            mask = arg_info(op->args[1])->mask;
+            z_mask = arg_info(op->args[1])->z_mask;
             break;
 
         case INDEX_op_sar_i32:
             if (arg_is_const(op->args[2])) {
                 tmp = arg_info(op->args[2])->val & 31;
-                mask = (int32_t)arg_info(op->args[1])->mask >> tmp;
+                z_mask = (int32_t)arg_info(op->args[1])->z_mask >> tmp;
             }
             break;
         case INDEX_op_sar_i64:
             if (arg_is_const(op->args[2])) {
                 tmp = arg_info(op->args[2])->val & 63;
-                mask = (int64_t)arg_info(op->args[1])->mask >> tmp;
+                z_mask = (int64_t)arg_info(op->args[1])->z_mask >> tmp;
             }
             break;
 
         case INDEX_op_shr_i32:
             if (arg_is_const(op->args[2])) {
                 tmp = arg_info(op->args[2])->val & 31;
-                mask = (uint32_t)arg_info(op->args[1])->mask >> tmp;
+                z_mask = (uint32_t)arg_info(op->args[1])->z_mask >> tmp;
             }
             break;
         case INDEX_op_shr_i64:
             if (arg_is_const(op->args[2])) {
                 tmp = arg_info(op->args[2])->val & 63;
-                mask = (uint64_t)arg_info(op->args[1])->mask >> tmp;
+                z_mask = (uint64_t)arg_info(op->args[1])->z_mask >> tmp;
             }
             break;
 
         case INDEX_op_extrl_i64_i32:
-            mask = (uint32_t)arg_info(op->args[1])->mask;
+            z_mask = (uint32_t)arg_info(op->args[1])->z_mask;
             break;
         case INDEX_op_extrh_i64_i32:
-            mask = (uint64_t)arg_info(op->args[1])->mask >> 32;
+            z_mask = (uint64_t)arg_info(op->args[1])->z_mask >> 32;
             break;
 
         CASE_OP_32_64(shl):
             if (arg_is_const(op->args[2])) {
                 tmp = arg_info(op->args[2])->val & (TCG_TARGET_REG_BITS - 1);
-                mask = arg_info(op->args[1])->mask << tmp;
+                z_mask = arg_info(op->args[1])->z_mask << tmp;
             }
             break;
 
         CASE_OP_32_64(neg):
             /* Set to 1 all bits to the left of the rightmost.  */
-            mask = -(arg_info(op->args[1])->mask
-                     & -arg_info(op->args[1])->mask);
+            z_mask = -(arg_info(op->args[1])->z_mask
+                       & -arg_info(op->args[1])->z_mask);
             break;
 
         CASE_OP_32_64(deposit):
-            mask = deposit64(arg_info(op->args[1])->mask,
-                             op->args[3], op->args[4],
-                             arg_info(op->args[2])->mask);
+            z_mask = deposit64(arg_info(op->args[1])->z_mask,
+                               op->args[3], op->args[4],
+                               arg_info(op->args[2])->z_mask);
             break;
 
         CASE_OP_32_64(extract):
-            mask = extract64(arg_info(op->args[1])->mask,
-                             op->args[2], op->args[3]);
+            z_mask = extract64(arg_info(op->args[1])->z_mask,
+                               op->args[2], op->args[3]);
             if (op->args[2] == 0) {
-                affected = arg_info(op->args[1])->mask & ~mask;
+                affected = arg_info(op->args[1])->z_mask & ~z_mask;
             }
             break;
         CASE_OP_32_64(sextract):
-            mask = sextract64(arg_info(op->args[1])->mask,
-                              op->args[2], op->args[3]);
-            if (op->args[2] == 0 && (tcg_target_long)mask >= 0) {
-                affected = arg_info(op->args[1])->mask & ~mask;
+            z_mask = sextract64(arg_info(op->args[1])->z_mask,
+                                op->args[2], op->args[3]);
+            if (op->args[2] == 0 && (tcg_target_long)z_mask >= 0) {
+                affected = arg_info(op->args[1])->z_mask & ~z_mask;
             }
             break;
 
         CASE_OP_32_64(or):
         CASE_OP_32_64(xor):
-            mask = arg_info(op->args[1])->mask | arg_info(op->args[2])->mask;
+            z_mask = arg_info(op->args[1])->z_mask
+                   | arg_info(op->args[2])->z_mask;
             break;
 
         case INDEX_op_clz_i32:
         case INDEX_op_ctz_i32:
-            mask = arg_info(op->args[2])->mask | 31;
+            z_mask = arg_info(op->args[2])->z_mask | 31;
             break;
 
         case INDEX_op_clz_i64:
         case INDEX_op_ctz_i64:
-            mask = arg_info(op->args[2])->mask | 63;
+            z_mask = arg_info(op->args[2])->z_mask | 63;
             break;
 
         case INDEX_op_ctpop_i32:
-            mask = 32 | 31;
+            z_mask = 32 | 31;
             break;
         case INDEX_op_ctpop_i64:
-            mask = 64 | 63;
+            z_mask = 64 | 63;
             break;
 
         CASE_OP_32_64(setcond):
         case INDEX_op_setcond2_i32:
-            mask = 1;
+            z_mask = 1;
             break;
 
         CASE_OP_32_64(movcond):
-            mask = arg_info(op->args[3])->mask | arg_info(op->args[4])->mask;
+            z_mask = arg_info(op->args[3])->z_mask
+                   | arg_info(op->args[4])->z_mask;
             break;
 
         CASE_OP_32_64(ld8u):
-            mask = 0xff;
+            z_mask = 0xff;
             break;
         CASE_OP_32_64(ld16u):
-            mask = 0xffff;
+            z_mask = 0xffff;
             break;
         case INDEX_op_ld32u_i64:
-            mask = 0xffffffffu;
+            z_mask = 0xffffffffu;
             break;
 
         CASE_OP_32_64(qemu_ld):
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                 MemOpIdx oi = op->args[nb_oargs + nb_iargs];
                 MemOp mop = get_memop(oi);
                 if (!(mop & MO_SIGN)) {
-                    mask = (2ULL << ((8 << (mop & MO_SIZE)) - 1)) - 1;
+                    z_mask = (2ULL << ((8 << (mop & MO_SIZE)) - 1)) - 1;
                 }
             }
             break;
 
         CASE_OP_32_64(bswap16):
-            mask = arg_info(op->args[1])->mask;
-            if (mask <= 0xffff) {
+            z_mask = arg_info(op->args[1])->z_mask;
+            if (z_mask <= 0xffff) {
                 op->args[2] |= TCG_BSWAP_IZ;
             }
-            mask = bswap16(mask);
+            z_mask = bswap16(z_mask);
             switch (op->args[2] & (TCG_BSWAP_OZ | TCG_BSWAP_OS)) {
             case TCG_BSWAP_OZ:
                 break;
             case TCG_BSWAP_OS:
-                mask = (int16_t)mask;
+                z_mask = (int16_t)z_mask;
                 break;
             default: /* undefined high bits */
-                mask |= MAKE_64BIT_MASK(16, 48);
+                z_mask |= MAKE_64BIT_MASK(16, 48);
                 break;
             }
             break;
 
         case INDEX_op_bswap32_i64:
-            mask = arg_info(op->args[1])->mask;
-            if (mask <= 0xffffffffu) {
+            z_mask = arg_info(op->args[1])->z_mask;
+            if (z_mask <= 0xffffffffu) {
                 op->args[2] |= TCG_BSWAP_IZ;
             }
-            mask = bswap32(mask);
+            z_mask = bswap32(z_mask);
             switch (op->args[2] & (TCG_BSWAP_OZ | TCG_BSWAP_OS)) {
             case TCG_BSWAP_OZ:
                 break;
             case TCG_BSWAP_OS:
-                mask = (int32_t)mask;
+                z_mask = (int32_t)z_mask;
                 break;
             default: /* undefined high bits */
-                mask |= MAKE_64BIT_MASK(32, 32);
+                z_mask |= MAKE_64BIT_MASK(32, 32);
                 break;
             }
             break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
         /* 32-bit ops generate 32-bit results.  For the result is zero test
            below, we can ignore high bits, but for further optimizations we
            need to record that the high bits contain garbage.  */
-        partmask = mask;
+        partmask = z_mask;
         if (!(def->flags & TCG_OPF_64BIT)) {
-            mask |= ~(tcg_target_ulong)0xffffffffu;
+            z_mask |= ~(tcg_target_ulong)0xffffffffu;
             partmask &= 0xffffffffu;
             affected &= 0xffffffffu;
         }
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                    vs the high word of the input.  */
             do_setcond_high:
                 reset_temp(op->args[0]);
-                arg_info(op->args[0])->mask = 1;
+                arg_info(op->args[0])->z_mask = 1;
                 op->opc = INDEX_op_setcond_i32;
                 op->args[1] = op->args[2];
                 op->args[2] = op->args[4];
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                 }
             do_setcond_low:
                 reset_temp(op->args[0]);
-                arg_info(op->args[0])->mask = 1;
+                arg_info(op->args[0])->z_mask = 1;
                 op->opc = INDEX_op_setcond_i32;
                 op->args[2] = op->args[3];
                 op->args[3] = op->args[5];
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             /* Default case: we know nothing about operation (or were unable
                to compute the operation result) so no propagation is done.
                We trash everything if the operation is the end of a basic
-               block, otherwise we only trash the output args.  "mask" is
+               block, otherwise we only trash the output args.  "z_mask" is
                the non-zero bits mask for the first output arg.  */
             if (def->flags & TCG_OPF_BB_END) {
                 memset(&temps_used, 0, sizeof(temps_used));
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                     /* Save the corresponding known-zero bits mask for the
                        first output argument (only one supported so far). */
                     if (i == 0) {
-                        arg_info(op->args[i])->mask = mask;
+                        arg_info(op->args[i])->z_mask = z_mask;
                     }
                 }
             }
-- 
2.25.1

Provide what will become a larger context for splitting
the very large tcg_optimize function.
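
The pattern can be sketched in isolation as follows. All names
prefixed with "Sketch"/"sketch_" and the fixed set size are
assumptions made for illustration; the real TCGTempSet and the bit
helpers used by tcg/optimize.c are defined elsewhere in QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; sizes and names are not taken from QEMU. */
#define SKETCH_MAX_TEMPS     512
#define SKETCH_BITS_PER_WORD (8 * sizeof(unsigned long))

typedef struct SketchTempSet {
    unsigned long l[SKETCH_MAX_TEMPS / SKETCH_BITS_PER_WORD];
} SketchTempSet;

/* One struct carries the optimizer state; fields can be added later
   without touching the signature of every helper again. */
typedef struct SketchOptContext {
    SketchTempSet temps_used;
} SketchOptContext;

/* Mark temp IDX as seen.  Returns true if it had not been seen yet. */
static inline bool sketch_init_temp(SketchOptContext *ctx, size_t idx)
{
    assert(idx < SKETCH_MAX_TEMPS);
    unsigned long bit = 1ul << (idx % SKETCH_BITS_PER_WORD);
    unsigned long *word = &ctx->temps_used.l[idx / SKETCH_BITS_PER_WORD];

    if (*word & bit) {
        return false;
    }
    *word |= bit;
    return true;
}

/* Forget everything, as done at the end of a basic block. */
static inline void sketch_forget_all(SketchOptContext *ctx)
{
    memset(&ctx->temps_used, 0, sizeof(ctx->temps_used));
}
```

Passing a pointer to the context rather than to the bare set is what
lets later patches grow the struct while callers keep writing "&ctx".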

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 77 ++++++++++++++++++++++++++------------------------
 1 file changed, 40 insertions(+), 37 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -XXX,XX +XXX,XX @@ typedef struct TempOptInfo {
     uint64_t z_mask;  /* mask bit is 0 if and only if value bit is 0 */
 } TempOptInfo;
 
+typedef struct OptContext {
+    TCGTempSet temps_used;
+} OptContext;
+
 static inline TempOptInfo *ts_info(TCGTemp *ts)
 {
     return ts->state_ptr;
@@ -XXX,XX +XXX,XX @@ static void reset_temp(TCGArg arg)
 }
 
 /* Initialize and activate a temporary.  */
-static void init_ts_info(TCGTempSet *temps_used, TCGTemp *ts)
+static void init_ts_info(OptContext *ctx, TCGTemp *ts)
 {
     size_t idx = temp_idx(ts);
     TempOptInfo *ti;
 
-    if (test_bit(idx, temps_used->l)) {
+    if (test_bit(idx, ctx->temps_used.l)) {
         return;
     }
-    set_bit(idx, temps_used->l);
+    set_bit(idx, ctx->temps_used.l);
 
     ti = ts->state_ptr;
     if (ti == NULL) {
@@ -XXX,XX +XXX,XX @@ static void init_ts_info(TCGTempSet *temps_used, TCGTemp *ts)
     }
 }
 
-static void init_arg_info(TCGTempSet *temps_used, TCGArg arg)
+static void init_arg_info(OptContext *ctx, TCGArg arg)
 {
-    init_ts_info(temps_used, arg_temp(arg));
+    init_ts_info(ctx, arg_temp(arg));
 }
 
 static TCGTemp *find_better_copy(TCGContext *s, TCGTemp *ts)
@@ -XXX,XX +XXX,XX @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg src)
     }
 }
 
-static void tcg_opt_gen_movi(TCGContext *s, TCGTempSet *temps_used,
+static void tcg_opt_gen_movi(TCGContext *s, OptContext *ctx,
                              TCGOp *op, TCGArg dst, uint64_t val)
 {
     const TCGOpDef *def = &tcg_op_defs[op->opc];
@@ -XXX,XX +XXX,XX @@ static void tcg_opt_gen_movi(TCGContext *s, TCGTempSet *temps_used,
 
     /* Convert movi to mov with constant temp. */
     tv = tcg_constant_internal(type, val);
-    init_ts_info(temps_used, tv);
+    init_ts_info(ctx, tv);
     tcg_opt_gen_mov(s, op, dst, temp_arg(tv));
 }
 
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
 {
     int nb_temps, nb_globals, i;
     TCGOp *op, *op_next, *prev_mb = NULL;
-    TCGTempSet temps_used;
+    OptContext ctx = {};
 
     /* Array VALS has an element for each temp.
        If this temp holds a constant then its value is kept in VALS' element.
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
     nb_temps = s->nb_temps;
     nb_globals = s->nb_globals;
 
-    memset(&temps_used, 0, sizeof(temps_used));
     for (i = 0; i < nb_temps; ++i) {
         s->temps[i].state_ptr = NULL;
     }
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             for (i = 0; i < nb_oargs + nb_iargs; i++) {
                 TCGTemp *ts = arg_temp(op->args[i]);
                 if (ts) {
-                    init_ts_info(&temps_used, ts);
+                    init_ts_info(&ctx, ts);
                 }
             }
         } else {
             nb_oargs = def->nb_oargs;
             nb_iargs = def->nb_iargs;
             for (i = 0; i < nb_oargs + nb_iargs; i++) {
-                init_arg_info(&temps_used, op->args[i]);
+                init_arg_info(&ctx, op->args[i]);
             }
         }
 
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(rotr):
             if (arg_is_const(op->args[1])
                 && arg_info(op->args[1])->val == 0) {
-                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], 0);
+                tcg_opt_gen_movi(s, &ctx, op, op->args[0], 0);
                 continue;
             }
             break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
 
         if (partmask == 0) {
             tcg_debug_assert(nb_oargs == 1);
-            tcg_opt_gen_movi(s, &temps_used, op, op->args[0], 0);
+            tcg_opt_gen_movi(s, &ctx, op, op->args[0], 0);
             continue;
         }
         if (affected == 0) {
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(mulsh):
             if (arg_is_const(op->args[2])
                 && arg_info(op->args[2])->val == 0) {
-                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], 0);
+                tcg_opt_gen_movi(s, &ctx, op, op->args[0], 0);
                 continue;
             }
             break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64_VEC(sub):
         CASE_OP_32_64_VEC(xor):
             if (args_are_copies(op->args[1], op->args[2])) {
-                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], 0);
+                tcg_opt_gen_movi(s, &ctx, op, op->args[0], 0);
                 continue;
             }
             break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             if (arg_is_const(op->args[1])) {
                 tmp = arg_info(op->args[1])->val;
                 tmp = dup_const(TCGOP_VECE(op), tmp);
-                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
+                tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
                 break;
             }
             goto do_default;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
         case INDEX_op_dup2_vec:
             assert(TCG_TARGET_REG_BITS == 32);
             if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
-                tcg_opt_gen_movi(s, &temps_used, op, op->args[0],
+                tcg_opt_gen_movi(s, &ctx, op, op->args[0],
                                  deposit64(arg_info(op->args[1])->val, 32, 32,
                                            arg_info(op->args[2])->val));
                 break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
         case INDEX_op_extrh_i64_i32:
             if (arg_is_const(op->args[1])) {
                 tmp = do_constant_folding(opc, arg_info(op->args[1])->val, 0);
-                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
+                tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
                 break;
             }
             goto do_default;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             if (arg_is_const(op->args[1])) {
                 tmp = do_constant_folding(opc, arg_info(op->args[1])->val,
                                           op->args[2]);
-                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
+                tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
                 break;
             }
             goto do_default;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
                 tmp = do_constant_folding(opc, arg_info(op->args[1])->val,
                                           arg_info(op->args[2])->val);
-                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
+                tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
                 break;
             }
             goto do_default;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                 TCGArg v = arg_info(op->args[1])->val;
                 if (v != 0) {
                     tmp = do_constant_folding(opc, v, 0);
-                    tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
+                    tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
                 } else {
                     tcg_opt_gen_mov(s, op, op->args[0], op->args[2]);
                 }
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                 tmp = deposit64(arg_info(op->args[1])->val,
                                 op->args[3], op->args[4],
                                 arg_info(op->args[2])->val);
-                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
+                tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
                 break;
             }
             goto do_default;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             if (arg_is_const(op->args[1])) {
                 tmp = extract64(arg_info(op->args[1])->val,
                                 op->args[2], op->args[3]);
-                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
+                tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
                 break;
             }
             goto do_default;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             if (arg_is_const(op->args[1])) {
                 tmp = sextract64(arg_info(op->args[1])->val,
                                  op->args[2], op->args[3]);
-                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
+                tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
                 break;
             }
             goto do_default;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                     tmp = (int32_t)(((uint32_t)v1 >> shr) |
                                     ((uint32_t)v2 << (32 - shr)));
                 }
-                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
+                tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
                 break;
             }
             goto do_default;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             tmp = do_constant_folding_cond(opc, op->args[1],
                                            op->args[2], op->args[3]);
             if (tmp != 2) {
-                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
+                tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
                 break;
             }
             goto do_default;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                                            op->args[1], op->args[2]);
             if (tmp != 2) {
                 if (tmp) {
-                    memset(&temps_used, 0, sizeof(temps_used));
+                    memset(&ctx.temps_used, 0, sizeof(ctx.temps_used));
                     op->opc = INDEX_op_br;
                     op->args[0] = op->args[3];
                 } else {
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
 
                 rl = op->args[0];
                 rh = op->args[1];
-                tcg_opt_gen_movi(s, &temps_used, op, rl, (int32_t)a);
-                tcg_opt_gen_movi(s, &temps_used, op2, rh, (int32_t)(a >> 32));
+                tcg_opt_gen_movi(s, &ctx, op, rl, (int32_t)a);
+                tcg_opt_gen_movi(s, &ctx, op2, rh, (int32_t)(a >> 32));
                 break;
             }
             goto do_default;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
 
                 rl = op->args[0];
                 rh = op->args[1];
-                tcg_opt_gen_movi(s, &temps_used, op, rl, (int32_t)r);
-                tcg_opt_gen_movi(s, &temps_used, op2, rh, (int32_t)(r >> 32));
+                tcg_opt_gen_movi(s, &ctx, op, rl, (int32_t)r);
+                tcg_opt_gen_movi(s, &ctx, op2, rh, (int32_t)(r >> 32));
                 break;
             }
             goto do_default;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             if (tmp != 2) {
                 if (tmp) {
             do_brcond_true:
-                    memset(&temps_used, 0, sizeof(temps_used));
+                    memset(&ctx.temps_used, 0, sizeof(ctx.temps_used));
                     op->opc = INDEX_op_br;
                     op->args[0] = op->args[5];
                 } else {
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                 /* Simplify LT/GE comparisons vs zero to a single compare
                    vs the high word of the input.  */
             do_brcond_high:
-                memset(&temps_used, 0, sizeof(temps_used));
+                memset(&ctx.temps_used, 0, sizeof(ctx.temps_used));
                 op->opc = INDEX_op_brcond_i32;
                 op->args[0] = op->args[1];
                 op->args[1] = op->args[3];
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                     goto do_default;
                 }
             do_brcond_low:
-                memset(&temps_used, 0, sizeof(temps_used));
+                memset(&ctx.temps_used, 0, sizeof(ctx.temps_used));
                 op->opc = INDEX_op_brcond_i32;
                 op->args[1] = op->args[2];
                 op->args[2] = op->args[4];
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                                             op->args[5]);
             if (tmp != 2) {
             do_setcond_const:
-                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
+                tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
             } else if ((op->args[5] == TCG_COND_LT
                         || op->args[5] == TCG_COND_GE)
                        && arg_is_const(op->args[3])
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             if (!(tcg_call_flags(op)
                   & (TCG_CALL_NO_READ_GLOBALS | TCG_CALL_NO_WRITE_GLOBALS))) {
                 for (i = 0; i < nb_globals; i++) {
-                    if (test_bit(i, temps_used.l)) {
+                    if (test_bit(i, ctx.temps_used.l)) {
                         reset_ts(&s->temps[i]);
                     }
                 }
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                block, otherwise we only trash the output args.  "z_mask" is
                the non-zero bits mask for the first output arg.  */
             if (def->flags & TCG_OPF_BB_END) {
-                memset(&temps_used, 0, sizeof(temps_used));
+                memset(&ctx.temps_used, 0, sizeof(ctx.temps_used));
             } else {
         do_reset_output:
                 for (i = 0; i < nb_oargs; i++) {
-- 
2.25.1

Break the final cleanup clause out of the main switch
statement.  When fully folding an opcode to mov/movi,
use "continue" to process the next opcode, else break
to fall into the final cleanup.
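
A stripped-down sketch of the resulting control flow, relying on the
C rule that "continue" inside a switch applies to the enclosing loop
while "break" only leaves the switch. The enum and the function name
are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical op kinds; not TCG opcodes. */
enum sketch_op { SK_MOV, SK_FOLDED, SK_UNFOLDED };

/* Returns how many ops reached the shared cleanup after the switch. */
static inline size_t sketch_count_cleanup(const enum sketch_op *ops, size_t n)
{
    size_t cleaned = 0;

    assert(ops != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        switch (ops[i]) {
        case SK_MOV:
        case SK_FOLDED:
            /* Fully folded to mov/movi: skip straight to the next op. */
            continue;
        default:
            /* Not folded: leave the switch and fall into the cleanup. */
            break;
        }

        /* Final cleanup, now placed after the switch instead of
           behind a label inside it. */
        cleaned++;
    }
    return cleaned;
}
```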

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 190 ++++++++++++++++++++++++-------------------------
 1 file changed, 94 insertions(+), 96 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
         switch (opc) {
         CASE_OP_32_64_VEC(mov):
             tcg_opt_gen_mov(s, op, op->args[0], op->args[1]);
-            break;
+            continue;
 
         case INDEX_op_dup_vec:
             if (arg_is_const(op->args[1])) {
                 tmp = arg_info(op->args[1])->val;
                 tmp = dup_const(TCGOP_VECE(op), tmp);
                 tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
-                break;
+                continue;
             }
-            goto do_default;
+            break;
 
         case INDEX_op_dup2_vec:
             assert(TCG_TARGET_REG_BITS == 32);
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                 tcg_opt_gen_movi(s, &ctx, op, op->args[0],
                                  deposit64(arg_info(op->args[1])->val, 32, 32,
                                            arg_info(op->args[2])->val));
-                break;
+                continue;
             } else if (args_are_copies(op->args[1], op->args[2])) {
                 op->opc = INDEX_op_dup_vec;
                 TCGOP_VECE(op) = MO_32;
                 nb_iargs = 1;
             }
-            goto do_default;
+            break;
 
         CASE_OP_32_64(not):
         CASE_OP_32_64(neg):
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             if (arg_is_const(op->args[1])) {
                 tmp = do_constant_folding(opc, arg_info(op->args[1])->val, 0);
                 tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
-                break;
+                continue;
             }
-            goto do_default;
+            break;
 
         CASE_OP_32_64(bswap16):
         CASE_OP_32_64(bswap32):
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                 tmp = do_constant_folding(opc, arg_info(op->args[1])->val,
                                           op->args[2]);
                 tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
-                break;
+                continue;
             }
-            goto do_default;
+            break;
 
         CASE_OP_32_64(add):
         CASE_OP_32_64(sub):
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                 tmp = do_constant_folding(opc, arg_info(op->args[1])->val,
                                           arg_info(op->args[2])->val);
                 tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
-                break;
+                continue;
             }
-            goto do_default;
+            break;
 
         CASE_OP_32_64(clz):
         CASE_OP_32_64(ctz):
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                 } else {
                     tcg_opt_gen_mov(s, op, op->args[0], op->args[2]);
                 }
-                break;
+                continue;
             }
-            goto do_default;
+            break;
 
         CASE_OP_32_64(deposit):
             if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                                 op->args[3], op->args[4],
                                 arg_info(op->args[2])->val);
                 tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
-                break;
+                continue;
             }
-            goto do_default;
+            break;
 
         CASE_OP_32_64(extract):
             if (arg_is_const(op->args[1])) {
                 tmp = extract64(arg_info(op->args[1])->val,
                                 op->args[2], op->args[3]);
                 tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
-                break;
+                continue;
             }
-            goto do_default;
+            break;
 
         CASE_OP_32_64(sextract):
             if (arg_is_const(op->args[1])) {
                 tmp = sextract64(arg_info(op->args[1])->val,
                                  op->args[2], op->args[3]);
                 tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
-                break;
+                continue;
             }
-            goto do_default;
+            break;
 
         CASE_OP_32_64(extract2):
             if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                                     ((uint32_t)v2 << (32 - shr)));
                 }
                 tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
-                break;
+                continue;
             }
-            goto do_default;
+            break;
 
         CASE_OP_32_64(setcond):
             tmp = do_constant_folding_cond(opc, op->args[1],
                                            op->args[2], op->args[3]);
             if (tmp != 2) {
                 tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
-                break;
+                continue;
             }
-            goto do_default;
+            break;
 
         CASE_OP_32_64(brcond):
             tmp = do_constant_folding_cond(opc, op->args[0],
                                            op->args[1], op->args[2]);
-            if (tmp != 2) {
-                if (tmp) {
-                    memset(&ctx.temps_used, 0, sizeof(ctx.temps_used));
-                    op->opc = INDEX_op_br;
-                    op->args[0] = op->args[3];
-                } else {
-                    tcg_op_remove(s, op);
-                }
+            switch (tmp) {
+            case 0:
+                tcg_op_remove(s, op);
+                continue;
+            case 1:
+                memset(&ctx.temps_used, 0, sizeof(ctx.temps_used));
+                op->opc = opc = INDEX_op_br;
+                op->args[0] = op->args[3];
                 break;
             }
-            goto do_default;
+            break;
 
         CASE_OP_32_64(movcond):
             tmp = do_constant_folding_cond(opc, op->args[1],
                                            op->args[2], op->args[5]);
             if (tmp != 2) {
                 tcg_opt_gen_mov(s, op, op->args[0], op->args[4-tmp]);
-                break;
+                continue;
             }
             if (arg_is_const(op->args[3]) && arg_is_const(op->args[4])) {
                 uint64_t tv = arg_info(op->args[3])->val;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                 if (fv == 1 && tv == 0) {
                     cond = tcg_invert_cond(cond);
                 } else if (!(tv == 1 && fv == 0)) {
-                    goto do_default;
+                    break;
                 }
                 op->args[3] = cond;
                 op->opc = opc = (opc == INDEX_op_movcond_i32
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                                  : INDEX_op_setcond_i64);
                 nb_iargs = 2;
             }
-            goto do_default;
+            break;
 
         case INDEX_op_add2_i32:
         case INDEX_op_sub2_i32:
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                 rh = op->args[1];
                 tcg_opt_gen_movi(s, &ctx, op, rl, (int32_t)a);
                 tcg_opt_gen_movi(s, &ctx, op2, rh, (int32_t)(a >> 32));
-                break;
+                continue;
             }
-            goto do_default;
+            break;
 
         case INDEX_op_mulu2_i32:
             if (arg_is_const(op->args[2]) && arg_is_const(op->args[3])) {
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                 rh = op->args[1];
                 tcg_opt_gen_movi(s, &ctx, op, rl, (int32_t)r);
                 tcg_opt_gen_movi(s, &ctx, op2, rh, (int32_t)(r >> 32));
-                break;
+                continue;
             }
-            goto do_default;
+            break;
 
         case INDEX_op_brcond2_i32:
             tmp = do_constant_folding_cond2(&op->args[0], &op->args[2],
                                             op->args[4]);
-            if (tmp != 2) {
-                if (tmp) {
-            do_brcond_true:
-                    memset(&ctx.temps_used, 0, sizeof(ctx.temps_used));
-                    op->opc = INDEX_op_br;
-                    op->args[0] = op->args[5];
-                } else {
+            if (tmp == 0) {
             do_brcond_false:
-                    tcg_op_remove(s, op);
-                }
-            } else if ((op->args[4] == TCG_COND_LT
-                        || op->args[4] == TCG_COND_GE)
-                       && arg_is_const(op->args[2])
-                       && arg_info(op->args[2])->val == 0
-                       && arg_is_const(op->args[3])
-                       && arg_info(op->args[3])->val == 0) {
+                tcg_op_remove(s, op);
+                continue;
+            }
+            if (tmp == 1) {
+            do_brcond_true:
+                op->opc = opc = INDEX_op_br;
+                op->args[0] = op->args[5];
+                break;
+            }
+            if ((op->args[4] == TCG_COND_LT || op->args[4] == TCG_COND_GE)
+                 && arg_is_const(op->args[2])
+                 && arg_info(op->args[2])->val == 0
+                 && arg_is_const(op->args[3])
+                 && arg_info(op->args[3])->val == 0) {
                 /* Simplify LT/GE comparisons vs zero to a single compare
                    vs the high word of the input.  */
             do_brcond_high:
-                memset(&ctx.temps_used, 0, sizeof(ctx.temps_used));
-                op->opc = INDEX_op_brcond_i32;
+                op->opc = opc = INDEX_op_brcond_i32;
                 op->args[0] = op->args[1];
                 op->args[1] = op->args[3];
                 op->args[2] = op->args[4];
                 op->args[3] = op->args[5];
-            } else if (op->args[4] == TCG_COND_EQ) {
+                break;
+            }
+            if (op->args[4] == TCG_COND_EQ) {
                 /* Simplify EQ comparisons where one of the pairs
                    can be simplified.  */
                 tmp = do_constant_folding_cond(INDEX_op_brcond_i32,
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                 if (tmp == 0) {
                     goto do_brcond_false;
                 } else if (tmp != 1) {
-                    goto do_default;
+                    break;
                 }
             do_brcond_low:
                 memset(&ctx.temps_used, 0, sizeof(ctx.temps_used));
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                 op->args[1] = op->args[2];
                 op->args[2] = op->args[4];
                 op->args[3] = op->args[5];
-            } else if (op->args[4] == TCG_COND_NE) {
+                break;
+            }
+            if (op->args[4] == TCG_COND_NE) {
                 /* Simplify NE comparisons where one of the pairs
                    can be simplified.  */
                 tmp = do_constant_folding_cond(INDEX_op_brcond_i32,
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                 } else if (tmp == 1) {
                     goto do_brcond_true;
                 }
-                goto do_default;
-            } else {
-                goto do_default;
             }
             break;
 
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             if (tmp != 2) {
             do_setcond_const:
                 tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
-            } else if ((op->args[5] == TCG_COND_LT
-                        || op->args[5] == TCG_COND_GE)
-                       && arg_is_const(op->args[3])
-                       && arg_info(op->args[3])->val == 0
-                       && arg_is_const(op->args[4])
-                       && arg_info(op->args[4])->val == 0) {
+                continue;
+            }
+            if ((op->args[5] == TCG_COND_LT || op->args[5] == TCG_COND_GE)
+                 && arg_is_const(op->args[3])
+                 && arg_info(op->args[3])->val == 0
+                 && arg_is_const(op->args[4])
+                 && arg_info(op->args[4])->val == 0) {
                 /* Simplify LT/GE comparisons vs zero to a single compare
                    vs the high word of the input.  */
             do_setcond_high:
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                 op->args[1] = op->args[2];
                 op->args[2] = op->args[4];
                 op->args[3] = op->args[5];
-            } else if (op->args[5] == TCG_COND_EQ) {
+                break;
+            }
+            if (op->args[5] == TCG_COND_EQ) {
                 /* Simplify EQ comparisons where one of the pairs
                    can be simplified.  */
                 tmp = do_constant_folding_cond(INDEX_op_setcond_i32,
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                 if (tmp == 0) {
                     goto do_setcond_high;
                 } else if (tmp != 1) {
-                    goto do_default;
+                    break;
                 }
             do_setcond_low:
                 reset_temp(op->args[0]);
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                 op->opc = INDEX_op_setcond_i32;
                 op->args[2] = op->args[3];
                 op->args[3] = op->args[5];
-            } else if (op->args[5] == TCG_COND_NE) {
+                break;
+            }
+            if (op->args[5] == TCG_COND_NE) {
                 /* Simplify NE comparisons where one of the pairs
                    can be simplified.  */
                 tmp = do_constant_folding_cond(INDEX_op_setcond_i32,
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                 } else if (tmp == 1) {
                     goto do_setcond_const;
                 }
-                goto do_default;
-            } else {
-                goto do_default;
             }
             break;
 
-        case INDEX_op_call:
-            if (!(tcg_call_flags(op)
+        default:
+            break;
+        }
+
+        /* Some of the folding above can change opc. */
+        opc = op->opc;
+        def = &tcg_op_defs[opc];
+        if (def->flags & TCG_OPF_BB_END) {
+            memset(&ctx.temps_used, 0, sizeof(ctx.temps_used));
+        } else {
+            if (opc == INDEX_op_call &&
+                !(tcg_call_flags(op)
                   & (TCG_CALL_NO_READ_GLOBALS | TCG_CALL_NO_WRITE_GLOBALS))) {
                 for (i = 0; i < nb_globals; i++) {
                     if (test_bit(i, ctx.temps_used.l)) {
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                     }
                 }
             }
-            goto do_reset_output;
 
-        default:
-        do_default:
-            /* Default case: we know nothing about operation (or were unable
-               to compute the operation result) so no propagation is done.
-               We trash everything if the operation is the end of a basic
-               block, otherwise we only trash the output args.  "z_mask" is
-               the non-zero bits mask for the first output arg.  */
-            if (def->flags & TCG_OPF_BB_END) {
-                memset(&ctx.temps_used, 0, sizeof(ctx.temps_used));
-            } else {
-        do_reset_output:
-                for (i = 0; i < nb_oargs; i++) {
-                    reset_temp(op->args[i]);
-                    /* Save the corresponding known-zero bits mask for the
-                       first output argument (only one supported so far). */
-                    if (i == 0) {
-                        arg_info(op->args[i])->z_mask = z_mask;
-                    }
+            for (i = 0; i < nb_oargs; i++) {
+                reset_temp(op->args[i]);
+                /* Save the corresponding known-zero bits mask for the
+                   first output argument (only one supported so far). */
+                if (i == 0) {
+                    arg_info(op->args[i])->z_mask = z_mask;
                 }
             }
-            break;
         }
 
         /* Eliminate duplicate and redundant fence instructions.  */
-- 
2.25.1

Adjust tcg_opt_gen_mov and tcg_opt_gen_movi to take the OptContext
parameter instead of TCGContext or both.  The TCGContext is now
reachable through the new OptContext.tcg field.
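
As a rough illustration of the pattern only (not the QEMU code itself),
the sketch below shows a helper that receives a single pass context and
reaches the parent object through it.  Backend, PassContext,
backend_remove_op and pass_gen_mov are hypothetical stand-ins for
TCGContext, OptContext, tcg_op_remove and tcg_opt_gen_mov.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for TCGContext. */
typedef struct Backend {
    int removed_ops;        /* how many ops were dropped */
} Backend;

/* Hypothetical stand-in for OptContext. */
typedef struct PassContext {
    Backend *backend;       /* plays the role of OptContext.tcg */
    int movs_emitted;       /* per-pass bookkeeping */
} PassContext;

static void backend_remove_op(Backend *b)
{
    b->removed_ops++;
}

/*
 * The helper takes only the pass context.  When it needs the parent
 * object it follows ctx->backend instead of requiring a second
 * parameter from every caller.
 */
static void pass_gen_mov(PassContext *ctx, int dst, int src)
{
    assert(ctx->backend != NULL);
    if (dst == src) {
        /* A move onto itself is redundant: drop the op. */
        backend_remove_op(ctx->backend);
        return;
    }
    ctx->movs_emitted++;
}
```

The caller initializes the back pointer once, in the same spirit as
"OptContext ctx = { .tcg = s }", and every helper signature shrinks by
one argument.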

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 67 +++++++++++++++++++++++++-------------------------
 1 file changed, 34 insertions(+), 33 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -XXX,XX +XXX,XX @@ typedef struct TempOptInfo {
 } TempOptInfo;
 
 typedef struct OptContext {
+    TCGContext *tcg;
     TCGTempSet temps_used;
 } OptContext;
 
@@ -XXX,XX +XXX,XX @@ static bool args_are_copies(TCGArg arg1, TCGArg arg2)
     return ts_are_copies(arg_temp(arg1), arg_temp(arg2));
 }
 
-static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg src)
+static void tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, TCGArg src)
 {
     TCGTemp *dst_ts = arg_temp(dst);
     TCGTemp *src_ts = arg_temp(src);
@@ -XXX,XX +XXX,XX @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg src)
     TCGOpcode new_op;
 
     if (ts_are_copies(dst_ts, src_ts)) {
-        tcg_op_remove(s, op);
+        tcg_op_remove(ctx->tcg, op);
         return;
     }
 
@@ -XXX,XX +XXX,XX @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg src)
     }
 }
 
-static void tcg_opt_gen_movi(TCGContext *s, OptContext *ctx,
-                             TCGOp *op, TCGArg dst, uint64_t val)
+static void tcg_opt_gen_movi(OptContext *ctx, TCGOp *op,
+                             TCGArg dst, uint64_t val)
 {
     const TCGOpDef *def = &tcg_op_defs[op->opc];
     TCGType type;
@@ -XXX,XX +XXX,XX @@ static void tcg_opt_gen_movi(TCGContext *s, OptContext *ctx,
     /* Convert movi to mov with constant temp. */
     tv = tcg_constant_internal(type, val);
     init_ts_info(ctx, tv);
-    tcg_opt_gen_mov(s, op, dst, temp_arg(tv));
+    tcg_opt_gen_mov(ctx, op, dst, temp_arg(tv));
 }
 
 static uint64_t do_constant_folding_2(TCGOpcode op, uint64_t x, uint64_t y)
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
 {
     int nb_temps, nb_globals, i;
     TCGOp *op, *op_next, *prev_mb = NULL;
-    OptContext ctx = {};
+    OptContext ctx = { .tcg = s };
 
     /* Array VALS has an element for each temp.
        If this temp holds a constant then its value is kept in VALS' element.
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(rotr):
             if (arg_is_const(op->args[1])
                 && arg_info(op->args[1])->val == 0) {
-                tcg_opt_gen_movi(s, &ctx, op, op->args[0], 0);
+                tcg_opt_gen_movi(&ctx, op, op->args[0], 0);
                 continue;
             }
             break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             if (!arg_is_const(op->args[1])
                 && arg_is_const(op->args[2])
                 && arg_info(op->args[2])->val == 0) {
-                tcg_opt_gen_mov(s, op, op->args[0], op->args[1]);
+                tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[1]);
                 continue;
             }
             break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             if (!arg_is_const(op->args[1])
                 && arg_is_const(op->args[2])
                 && arg_info(op->args[2])->val == -1) {
-                tcg_opt_gen_mov(s, op, op->args[0], op->args[1]);
+                tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[1]);
                 continue;
             }
             break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
 
         if (partmask == 0) {
             tcg_debug_assert(nb_oargs == 1);
-            tcg_opt_gen_movi(s, &ctx, op, op->args[0], 0);
+            tcg_opt_gen_movi(&ctx, op, op->args[0], 0);
             continue;
         }
         if (affected == 0) {
             tcg_debug_assert(nb_oargs == 1);
-            tcg_opt_gen_mov(s, op, op->args[0], op->args[1]);
+            tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[1]);
             continue;
         }
 
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(mulsh):
             if (arg_is_const(op->args[2])
                 && arg_info(op->args[2])->val == 0) {
-                tcg_opt_gen_movi(s, &ctx, op, op->args[0], 0);
+                tcg_opt_gen_movi(&ctx, op, op->args[0], 0);
                 continue;
             }
             break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64_VEC(or):
         CASE_OP_32_64_VEC(and):
             if (args_are_copies(op->args[1], op->args[2])) {
-                tcg_opt_gen_mov(s, op, op->args[0], op->args[1]);
+                tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[1]);
                 continue;
             }
             break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64_VEC(sub):
         CASE_OP_32_64_VEC(xor):
             if (args_are_copies(op->args[1], op->args[2])) {
-                tcg_opt_gen_movi(s, &ctx, op, op->args[0], 0);
+                tcg_opt_gen_movi(&ctx, op, op->args[0], 0);
                 continue;
             }
             break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
            allocator where needed and possible.  Also detect copies. */
         switch (opc) {
         CASE_OP_32_64_VEC(mov):
-            tcg_opt_gen_mov(s, op, op->args[0], op->args[1]);
+            tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[1]);
             continue;
 
         case INDEX_op_dup_vec:
             if (arg_is_const(op->args[1])) {
                 tmp = arg_info(op->args[1])->val;
                 tmp = dup_const(TCGOP_VECE(op), tmp);
-                tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
+                tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
                 continue;
             }
             break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
         case INDEX_op_dup2_vec:
             assert(TCG_TARGET_REG_BITS == 32);
             if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
-                tcg_opt_gen_movi(s, &ctx, op, op->args[0],
+                tcg_opt_gen_movi(&ctx, op, op->args[0],
                                  deposit64(arg_info(op->args[1])->val, 32, 32,
                                            arg_info(op->args[2])->val));
                 continue;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
         case INDEX_op_extrh_i64_i32:
             if (arg_is_const(op->args[1])) {
                 tmp = do_constant_folding(opc, arg_info(op->args[1])->val, 0);
-                tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
+                tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
                 continue;
             }
             break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             if (arg_is_const(op->args[1])) {
                 tmp = do_constant_folding(opc, arg_info(op->args[1])->val,
                                           op->args[2]);
-                tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
+                tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
                 continue;
             }
             break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
                 tmp = do_constant_folding(opc, arg_info(op->args[1])->val,
                                           arg_info(op->args[2])->val);
-                tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
+                tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
                 continue;
             }
             break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                 TCGArg v = arg_info(op->args[1])->val;
                 if (v != 0) {
                     tmp = do_constant_folding(opc, v, 0);
-                    tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
+                    tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
                 } else {
-                    tcg_opt_gen_mov(s, op, op->args[0], op->args[2]);
+                    tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[2]);
                 }
                 continue;
             }
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                 tmp = deposit64(arg_info(op->args[1])->val,
                                 op->args[3], op->args[4],
                                 arg_info(op->args[2])->val);
-                tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
+                tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
                 continue;
             }
             break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             if (arg_is_const(op->args[1])) {
                 tmp = extract64(arg_info(op->args[1])->val,
                                 op->args[2], op->args[3]);
-                tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
+                tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
                 continue;
             }
             break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             if (arg_is_const(op->args[1])) {
                 tmp = sextract64(arg_info(op->args[1])->val,
                                  op->args[2], op->args[3]);
-                tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
+                tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
                 continue;
             }
             break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                     tmp = (int32_t)(((uint32_t)v1 >> shr) |
                                     ((uint32_t)v2 << (32 - shr)));
                 }
-                tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
+                tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
                 continue;
             }
             break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             tmp = do_constant_folding_cond(opc, op->args[1],
                                            op->args[2], op->args[3]);
             if (tmp != 2) {
-                tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
+                tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
                 continue;
             }
             break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             tmp = do_constant_folding_cond(opc, op->args[1],
                                            op->args[2], op->args[5]);
             if (tmp != 2) {
-                tcg_opt_gen_mov(s, op, op->args[0], op->args[4-tmp]);
+                tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[4-tmp]);
                 continue;
             }
             if (arg_is_const(op->args[3]) && arg_is_const(op->args[4])) {
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
 
                 rl = op->args[0];
                 rh = op->args[1];
-                tcg_opt_gen_movi(s, &ctx, op, rl, (int32_t)a);
-                tcg_opt_gen_movi(s, &ctx, op2, rh, (int32_t)(a >> 32));
+                tcg_opt_gen_movi(&ctx, op, rl, (int32_t)a);
+                tcg_opt_gen_movi(&ctx, op2, rh, (int32_t)(a >> 32));
                 continue;
             }
             break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
 
                 rl = op->args[0];
                 rh = op->args[1];
-                tcg_opt_gen_movi(s, &ctx, op, rl, (int32_t)r);
-                tcg_opt_gen_movi(s, &ctx, op2, rh, (int32_t)(r >> 32));
+                tcg_opt_gen_movi(&ctx, op, rl, (int32_t)r);
+                tcg_opt_gen_movi(&ctx, op2, rh, (int32_t)(r >> 32));
                 continue;
             }
             break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                                             op->args[5]);
             if (tmp != 2) {
             do_setcond_const:
-                tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
+                tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
                 continue;
             }
             if ((op->args[5] == TCG_COND_LT || op->args[5] == TCG_COND_GE)
-- 
2.25.1

Move prev_mb from a local variable of tcg_optimize into OptContext.
This will expose the variable to subroutines that will be broken
out of tcg_optimize.
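
A minimal sketch of why the state has to live in the context: once the
barrier merging is a separate function, it needs to remember the
previous barrier across calls.  The types, opcodes and fold_barrier
below are hypothetical simplifications, loosely modelled on the fence
elimination at the end of tcg_optimize.

```c
#include <assert.h>
#include <stddef.h>

enum { OP_MB, OP_ST, OP_ADD };      /* hypothetical opcodes */

typedef struct Op {
    int opc;
    unsigned arg;       /* barrier type bits when opc == OP_MB */
    int removed;        /* set when the op has been eliminated */
} Op;

typedef struct PassContext {
    Op *prev_mb;        /* last barrier still eligible for merging */
} PassContext;

/* One step of the pass, callable from outside the main loop. */
static void fold_barrier(PassContext *ctx, Op *op)
{
    if (ctx->prev_mb) {
        switch (op->opc) {
        case OP_MB:
            /* Merge this barrier into the previous one. */
            ctx->prev_mb->arg |= op->arg;
            op->removed = 1;
            break;
        case OP_ST:
            /* Memory access: later barriers must not merge across it. */
            ctx->prev_mb = NULL;
            break;
        default:
            break;
        }
    } else if (op->opc == OP_MB) {
        ctx->prev_mb = op;
    }
}
```

With prev_mb as a local of the caller, fold_barrier would need an extra
in/out parameter; keeping it in the context avoids that.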

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -XXX,XX +XXX,XX @@ typedef struct TempOptInfo {
 
 typedef struct OptContext {
     TCGContext *tcg;
+    TCGOp *prev_mb;
     TCGTempSet temps_used;
 } OptContext;
 
@@ -XXX,XX +XXX,XX @@ static bool swap_commutative2(TCGArg *p1, TCGArg *p2)
 void tcg_optimize(TCGContext *s)
 {
     int nb_temps, nb_globals, i;
-    TCGOp *op, *op_next, *prev_mb = NULL;
+    TCGOp *op, *op_next;
     OptContext ctx = { .tcg = s };
 
     /* Array VALS has an element for each temp.
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
         }
 
         /* Eliminate duplicate and redundant fence instructions.  */
-        if (prev_mb) {
+        if (ctx.prev_mb) {
             switch (opc) {
             case INDEX_op_mb:
                 /* Merge two barriers of the same type into one,
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                  * barrier.  This is stricter than specified but for
                  * the purposes of TCG is better than not optimizing.
                  */
-                prev_mb->args[0] |= op->args[0];
+                ctx.prev_mb->args[0] |= op->args[0];
                 tcg_op_remove(s, op);
                 break;
 
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             case INDEX_op_qemu_st_i64:
             case INDEX_op_call:
                 /* Opcodes that touch guest memory stop the optimization.  */
-                prev_mb = NULL;
+                ctx.prev_mb = NULL;
                 break;
             }
         } else if (opc == INDEX_op_mb) {
-            prev_mb = op;
+            ctx.prev_mb = op;
         }
     }
 }
-- 
2.25.1

There was no real reason for calls to have separate code here.
Unify init for calls vs non-calls using the call path, which
handles TCG_CALL_DUMMY_ARG.
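
The idea can be sketched as follows, under the assumption that a dummy
argument shows up as a NULL temp pointer.  Temp, Op and MAX_ARGS are
hypothetical simplifications, not the real TCG types; only the shape of
the loop mirrors init_arguments.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ARGS 4

typedef struct Temp {
    int initialized;
} Temp;

typedef struct Op {
    Temp *args[MAX_ARGS];   /* NULL models a dummy argument slot */
} Op;

/*
 * Single init loop for calls and non-calls alike.  The NULL check is
 * only strictly needed for calls, but it is harmless for other ops,
 * so one loop serves both.  Returns how many temps were newly
 * initialized.
 */
static int init_arguments(Op *op, int nb_args)
{
    int count = 0;

    assert(nb_args <= MAX_ARGS);
    for (int i = 0; i < nb_args; i++) {
        Temp *ts = op->args[i];
        if (ts && !ts->initialized) {
            ts->initialized = 1;
            count++;
        }
    }
    return count;
}
```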

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 25 +++++++++++--------------
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -XXX,XX +XXX,XX @@ static void init_ts_info(OptContext *ctx, TCGTemp *ts)
     }
 }
 
-static void init_arg_info(OptContext *ctx, TCGArg arg)
-{
-    init_ts_info(ctx, arg_temp(arg));
-}
-
 static TCGTemp *find_better_copy(TCGContext *s, TCGTemp *ts)
 {
     TCGTemp *i, *g, *l;
@@ -XXX,XX +XXX,XX @@ static bool swap_commutative2(TCGArg *p1, TCGArg *p2)
     return false;
 }
 
+static void init_arguments(OptContext *ctx, TCGOp *op, int nb_args)
+{
+    for (int i = 0; i < nb_args; i++) {
+        TCGTemp *ts = arg_temp(op->args[i]);
+        if (ts) {
+            init_ts_info(ctx, ts);
+        }
+    }
+}
+
 /* Propagate constants and copies, fold constant expressions. */
 void tcg_optimize(TCGContext *s)
 {
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
         if (opc == INDEX_op_call) {
             nb_oargs = TCGOP_CALLO(op);
             nb_iargs = TCGOP_CALLI(op);
-            for (i = 0; i < nb_oargs + nb_iargs; i++) {
-                TCGTemp *ts = arg_temp(op->args[i]);
-                if (ts) {
-                    init_ts_info(&ctx, ts);
-                }
-            }
         } else {
             nb_oargs = def->nb_oargs;
             nb_iargs = def->nb_iargs;
-            for (i = 0; i < nb_oargs + nb_iargs; i++) {
-                init_arg_info(&ctx, op->args[i]);
-            }
         }
+        init_arguments(&ctx, op, nb_oargs + nb_iargs);
 
         /* Do copy propagation */
         for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
-- 
2.25.1

Continue splitting tcg_optimize: move the copy propagation loop
into a new copy_propagate subroutine.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -XXX,XX +XXX,XX @@ static void init_arguments(OptContext *ctx, TCGOp *op, int nb_args)
     }
 }
 
+static void copy_propagate(OptContext *ctx, TCGOp *op,
+                           int nb_oargs, int nb_iargs)
+{
+    TCGContext *s = ctx->tcg;
+
+    for (int i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
+        TCGTemp *ts = arg_temp(op->args[i]);
+        if (ts && ts_is_copy(ts)) {
+            op->args[i] = temp_arg(find_better_copy(s, ts));
+        }
+    }
+}
+
 /* Propagate constants and copies, fold constant expressions. */
 void tcg_optimize(TCGContext *s)
 {
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             nb_iargs = def->nb_iargs;
         }
         init_arguments(&ctx, op, nb_oargs + nb_iargs);
-
-        /* Do copy propagation */
-        for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
-            TCGTemp *ts = arg_temp(op->args[i]);
-            if (ts && ts_is_copy(ts)) {
-                op->args[i] = temp_arg(find_better_copy(s, ts));
-            }
-        }
+        copy_propagate(&ctx, op, nb_oargs, nb_iargs);
 
         /* For commutative operations make constant second argument */
         switch (opc) {
-- 
2.25.1

Calls are special in that they have a variable number
of arguments, and need to be able to clobber globals.
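
A simplified model of the clobbering rule, for illustration only.  It
assumes globals occupy the first NB_GLOBALS temp indices and tracks a
single "value known" bit per temp; the flag names, sizes and the
fold_call signature here are hypothetical and do not match the real
function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits, modelled on TCG_CALL_NO_{READ,WRITE}_GLOBALS. */
enum { NO_READ_GLOBALS = 1, NO_WRITE_GLOBALS = 2 };

#define NB_GLOBALS 3    /* temps [0, NB_GLOBALS) are globals */
#define NB_TEMPS   5

typedef struct PassContext {
    bool known[NB_TEMPS];   /* true if the pass knows the temp's value */
} PassContext;

/*
 * Unless the callee is flagged as not reading or not writing globals,
 * forget everything known about globals.  Outputs are always
 * forgotten, and their count comes from the op rather than from a
 * static opcode definition.
 */
static void fold_call(PassContext *ctx, int flags,
                      const int *outs, int nb_outs)
{
    if (!(flags & (NO_READ_GLOBALS | NO_WRITE_GLOBALS))) {
        for (int i = 0; i < NB_GLOBALS; i++) {
            ctx->known[i] = false;
        }
    }
    for (int i = 0; i < nb_outs; i++) {
        assert(outs[i] >= 0 && outs[i] < NB_TEMPS);
        ctx->known[outs[i]] = false;
    }
}
```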

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 63 ++++++++++++++++++++++++++++++++------------------
 1 file changed, 41 insertions(+), 22 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -XXX,XX +XXX,XX @@ static void copy_propagate(OptContext *ctx, TCGOp *op,
     }
 }
 
+static bool fold_call(OptContext *ctx, TCGOp *op)
+{
+    TCGContext *s = ctx->tcg;
+    int nb_oargs = TCGOP_CALLO(op);
+    int nb_iargs = TCGOP_CALLI(op);
+    int flags, i;
+
+    init_arguments(ctx, op, nb_oargs + nb_iargs);
+    copy_propagate(ctx, op, nb_oargs, nb_iargs);
+
+    /* If the function reads or writes globals, reset temp data. */
+    flags = tcg_call_flags(op);
+    if (!(flags & (TCG_CALL_NO_READ_GLOBALS | TCG_CALL_NO_WRITE_GLOBALS))) {
+        int nb_globals = s->nb_globals;
+
+        for (i = 0; i < nb_globals; i++) {
+            if (test_bit(i, ctx->temps_used.l)) {
+                reset_ts(&ctx->tcg->temps[i]);
+            }
+        }
+    }
+
+    /* Reset temp data for outputs. */
+    for (i = 0; i < nb_oargs; i++) {
+        reset_temp(op->args[i]);
+    }
+
+    /* Stop optimizing MB across calls. */
+    ctx->prev_mb = NULL;
+    return true;
+}
+
 /* Propagate constants and copies, fold constant expressions. */
 void tcg_optimize(TCGContext *s)
 {
-    int nb_temps, nb_globals, i;
+    int nb_temps, i;
     TCGOp *op, *op_next;
     OptContext ctx = { .tcg = s };
 
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
        available through the doubly linked circular list. */
 
     nb_temps = s->nb_temps;
-    nb_globals = s->nb_globals;
-
     for (i = 0; i < nb_temps; ++i) {
         s->temps[i].state_ptr = NULL;
     }
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
         uint64_t z_mask, partmask, affected, tmp;
         int nb_oargs, nb_iargs;
         TCGOpcode opc = op->opc;
-        const TCGOpDef *def = &tcg_op_defs[opc];
+        const TCGOpDef *def;
 
-        /* Count the arguments, and initialize the temps that are
-           going to be used */
+        /* Calls are special. */
         if (opc == INDEX_op_call) {
-            nb_oargs = TCGOP_CALLO(op);
-            nb_iargs = TCGOP_CALLI(op);
-        } else {
-            nb_oargs = def->nb_oargs;
-            nb_iargs = def->nb_iargs;
+            fold_call(&ctx, op);
+            continue;
         }
+
+        def = &tcg_op_defs[opc];
+        nb_oargs = def->nb_oargs;
+        nb_iargs = def->nb_iargs;
         init_arguments(&ctx, op, nb_oargs + nb_iargs);
         copy_propagate(&ctx, op, nb_oargs, nb_iargs);
 
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
         if (def->flags & TCG_OPF_BB_END) {
             memset(&ctx.temps_used, 0, sizeof(ctx.temps_used));
         } else {
-            if (opc == INDEX_op_call &&
-                !(tcg_call_flags(op)
-                  & (TCG_CALL_NO_READ_GLOBALS | TCG_CALL_NO_WRITE_GLOBALS))) {
-                for (i = 0; i < nb_globals; i++) {
-                    if (test_bit(i, ctx.temps_used.l)) {
-                        reset_ts(&s->temps[i]);
-                    }
-                }
-            }
-
             for (i = 0; i < nb_oargs; i++) {
                 reset_temp(op->args[i]);
                 /* Save the corresponding known-zero bits mask for the
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             case INDEX_op_qemu_st_i32:
             case INDEX_op_qemu_st8_i32:
             case INDEX_op_qemu_st_i64:
-            case INDEX_op_call:
                 /* Opcodes that touch guest memory stop the optimization.  */
                 ctx.prev_mb = NULL;
                 break;
-- 
2.25.1

Rather than try to keep the nb_oargs and nb_iargs locals up-to-date
across folding, re-read nb_oargs at the end, after re-reading the opcode.

A couple of asserts need dropping, but that will take care
of itself as we split the function further.
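
A minimal sketch of the idea, using a hypothetical two-entry opcode
table (ToyOpc, toy_defs and the helpers are illustrative names, not
QEMU's): once folding may rewrite the opcode, the argument counts are
looked up from the table when they are needed instead of being cached
in locals that must be patched by hand.

```c
/* Hypothetical opcodes; argument counts follow movcond/setcond. */
typedef enum { TOY_MOVCOND, TOY_SETCOND, TOY_NB_OPS } ToyOpc;

typedef struct {
    int nb_oargs;
    int nb_iargs;
} ToyDef;

static const ToyDef toy_defs[TOY_NB_OPS] = {
    [TOY_MOVCOND] = { 1, 4 },
    [TOY_SETCOND] = { 1, 2 },
};

typedef struct {
    ToyOpc opc;
} ToyOp;

/* Folding may change the opcode in place. */
static void toy_fold(ToyOp *op)
{
    if (op->opc == TOY_MOVCOND) {
        op->opc = TOY_SETCOND;
    }
}

/* Re-read the definition after folding; nothing needs to be kept in sync. */
static int toy_nb_iargs_after_fold(ToyOp *op)
{
    toy_fold(op);
    return toy_defs[op->opc].nb_iargs;
}
```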

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
 
     QTAILQ_FOREACH_SAFE(op, &s->ops, link, op_next) {
         uint64_t z_mask, partmask, affected, tmp;
-        int nb_oargs, nb_iargs;
         TCGOpcode opc = op->opc;
         const TCGOpDef *def;
 
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
         }
 
         def = &tcg_op_defs[opc];
-        nb_oargs = def->nb_oargs;
-        nb_iargs = def->nb_iargs;
-        init_arguments(&ctx, op, nb_oargs + nb_iargs);
-        copy_propagate(&ctx, op, nb_oargs, nb_iargs);
+        init_arguments(&ctx, op, def->nb_oargs + def->nb_iargs);
+        copy_propagate(&ctx, op, def->nb_oargs, def->nb_iargs);
 
         /* For commutative operations make constant second argument */
         switch (opc) {
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
 
         CASE_OP_32_64(qemu_ld):
             {
-                MemOpIdx oi = op->args[nb_oargs + nb_iargs];
+                MemOpIdx oi = op->args[def->nb_oargs + def->nb_iargs];
                 MemOp mop = get_memop(oi);
                 if (!(mop & MO_SIGN)) {
                     z_mask = (2ULL << ((8 << (mop & MO_SIZE)) - 1)) - 1;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
         }
 
         if (partmask == 0) {
-            tcg_debug_assert(nb_oargs == 1);
             tcg_opt_gen_movi(&ctx, op, op->args[0], 0);
             continue;
         }
         if (affected == 0) {
-            tcg_debug_assert(nb_oargs == 1);
             tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[1]);
             continue;
         }
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             } else if (args_are_copies(op->args[1], op->args[2])) {
                 op->opc = INDEX_op_dup_vec;
                 TCGOP_VECE(op) = MO_32;
-                nb_iargs = 1;
             }
             break;
 
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                 op->opc = opc = (opc == INDEX_op_movcond_i32
                                  ? INDEX_op_setcond_i32
                                  : INDEX_op_setcond_i64);
-                nb_iargs = 2;
             }
             break;
 
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
         if (def->flags & TCG_OPF_BB_END) {
             memset(&ctx.temps_used, 0, sizeof(ctx.temps_used));
         } else {
+            int nb_oargs = def->nb_oargs;
             for (i = 0; i < nb_oargs; i++) {
                 reset_temp(op->args[i]);
                 /* Save the corresponding known-zero bits mask for the
-- 
2.25.1

Return -1 instead of 2 for failure, so that we can
use comparisons against 0 for all cases.
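
A small sketch of the resulting convention, assuming a cut-down
condition type (Cond, fold_cond and classify_branch are hypothetical
names used only for illustration): every caller can be written with
"== 0", "> 0", ">= 0" or "< 0".

```c
#include <stdint.h>

/* Hypothetical stand-in for TCGCond; only two conditions are modelled. */
typedef enum { COND_EQ, COND_NE } Cond;

/*
 * Return -1 if the comparison cannot be resolved at translation time,
 * otherwise the result of the comparison (0 or 1).
 */
static int fold_cond(int x_known, uint64_t x, int y_known, uint64_t y, Cond c)
{
    if (!x_known || !y_known) {
        return -1;
    }
    return (c == COND_EQ) ? (x == y) : (x != y);
}

/* Describe what a brcond-style caller would do with the result. */
static const char *classify_branch(int r)
{
    if (r == 0) {
        return "remove";        /* never taken: drop the branch */
    } else if (r > 0) {
        return "unconditional"; /* always taken: rewrite to a plain branch */
    }
    return "keep";              /* r < 0: unknown, leave the op alone */
}
```

A setcond-style caller would test "r >= 0" and use r directly as the
constant result.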

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 145 +++++++++++++++++++++++++------------------------
 1 file changed, 74 insertions(+), 71 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -XXX,XX +XXX,XX @@ static bool do_constant_folding_cond_eq(TCGCond c)
     }
 }
 
-/* Return 2 if the condition can't be simplified, and the result
-   of the condition (0 or 1) if it can */
-static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
-                                       TCGArg y, TCGCond c)
+/*
+ * Return -1 if the condition can't be simplified,
+ * and the result of the condition (0 or 1) if it can.
+ */
+static int do_constant_folding_cond(TCGOpcode op, TCGArg x,
+                                    TCGArg y, TCGCond c)
 {
     uint64_t xv = arg_info(x)->val;
     uint64_t yv = arg_info(y)->val;
@@ -XXX,XX +XXX,XX @@ static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
         case TCG_COND_GEU:
             return 1;
         default:
-            return 2;
+            return -1;
         }
     }
-    return 2;
+    return -1;
 }
 
-/* Return 2 if the condition can't be simplified, and the result
-   of the condition (0 or 1) if it can */
-static TCGArg do_constant_folding_cond2(TCGArg *p1, TCGArg *p2, TCGCond c)
+/*
+ * Return -1 if the condition can't be simplified,
+ * and the result of the condition (0 or 1) if it can.
+ */
+static int do_constant_folding_cond2(TCGArg *p1, TCGArg *p2, TCGCond c)
 {
     TCGArg al = p1[0], ah = p1[1];
     TCGArg bl = p2[0], bh = p2[1];
@@ -XXX,XX +XXX,XX @@ static TCGArg do_constant_folding_cond2(TCGArg *p1, TCGArg *p2, TCGCond c)
     if (args_are_copies(al, bl) && args_are_copies(ah, bh)) {
         return do_constant_folding_cond_eq(c);
     }
-    return 2;
+    return -1;
 }
 
 static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2)
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             break;
 
         CASE_OP_32_64(setcond):
-            tmp = do_constant_folding_cond(opc, op->args[1],
-                                           op->args[2], op->args[3]);
-            if (tmp != 2) {
-                tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
+            i = do_constant_folding_cond(opc, op->args[1],
+                                         op->args[2], op->args[3]);
+            if (i >= 0) {
+                tcg_opt_gen_movi(&ctx, op, op->args[0], i);
                 continue;
             }
             break;
 
         CASE_OP_32_64(brcond):
-            tmp = do_constant_folding_cond(opc, op->args[0],
-                                           op->args[1], op->args[2]);
-            switch (tmp) {
-            case 0:
+            i = do_constant_folding_cond(opc, op->args[0],
+                                         op->args[1], op->args[2]);
+            if (i == 0) {
                 tcg_op_remove(s, op);
                 continue;
-            case 1:
+            } else if (i > 0) {
                 memset(&ctx.temps_used, 0, sizeof(ctx.temps_used));
                 op->opc = opc = INDEX_op_br;
                 op->args[0] = op->args[3];
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             break;
 
         CASE_OP_32_64(movcond):
-            tmp = do_constant_folding_cond(opc, op->args[1],
-                                           op->args[2], op->args[5]);
-            if (tmp != 2) {
-                tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[4-tmp]);
+            i = do_constant_folding_cond(opc, op->args[1],
+                                         op->args[2], op->args[5]);
+            if (i >= 0) {
+                tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[4 - i]);
                 continue;
             }
             if (arg_is_const(op->args[3]) && arg_is_const(op->args[4])) {
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             break;
 
         case INDEX_op_brcond2_i32:
-            tmp = do_constant_folding_cond2(&op->args[0], &op->args[2],
-                                            op->args[4]);
-            if (tmp == 0) {
+            i = do_constant_folding_cond2(&op->args[0], &op->args[2],
+                                          op->args[4]);
+            if (i == 0) {
             do_brcond_false:
                 tcg_op_remove(s, op);
                 continue;
             }
-            if (tmp == 1) {
+            if (i > 0) {
             do_brcond_true:
                 op->opc = opc = INDEX_op_br;
                 op->args[0] = op->args[5];
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             if (op->args[4] == TCG_COND_EQ) {
                 /* Simplify EQ comparisons where one of the pairs
                    can be simplified.  */
-                tmp = do_constant_folding_cond(INDEX_op_brcond_i32,
-                                               op->args[0], op->args[2],
-                                               TCG_COND_EQ);
-                if (tmp == 0) {
+                i = do_constant_folding_cond(INDEX_op_brcond_i32,
+                                             op->args[0], op->args[2],
+                                             TCG_COND_EQ);
+                if (i == 0) {
                     goto do_brcond_false;
-                } else if (tmp == 1) {
+                } else if (i > 0) {
                     goto do_brcond_high;
                 }
-                tmp = do_constant_folding_cond(INDEX_op_brcond_i32,
-                                               op->args[1], op->args[3],
-                                               TCG_COND_EQ);
-                if (tmp == 0) {
+                i = do_constant_folding_cond(INDEX_op_brcond_i32,
+                                             op->args[1], op->args[3],
+                                             TCG_COND_EQ);
+                if (i == 0) {
                     goto do_brcond_false;
-                } else if (tmp != 1) {
+                } else if (i < 0) {
                     break;
                 }
             do_brcond_low:
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             if (op->args[4] == TCG_COND_NE) {
                 /* Simplify NE comparisons where one of the pairs
                    can be simplified.  */
-                tmp = do_constant_folding_cond(INDEX_op_brcond_i32,
-                                               op->args[0], op->args[2],
-                                               TCG_COND_NE);
-                if (tmp == 0) {
+                i = do_constant_folding_cond(INDEX_op_brcond_i32,
+                                             op->args[0], op->args[2],
+                                             TCG_COND_NE);
+                if (i == 0) {
                     goto do_brcond_high;
-                } else if (tmp == 1) {
+                } else if (i > 0) {
                     goto do_brcond_true;
                 }
-                tmp = do_constant_folding_cond(INDEX_op_brcond_i32,
-                                               op->args[1], op->args[3],
-                                               TCG_COND_NE);
-                if (tmp == 0) {
+                i = do_constant_folding_cond(INDEX_op_brcond_i32,
+                                             op->args[1], op->args[3],
+                                             TCG_COND_NE);
+                if (i == 0) {
                     goto do_brcond_low;
-                } else if (tmp == 1) {
+                } else if (i > 0) {
                     goto do_brcond_true;
                 }
             }
             break;
 
         case INDEX_op_setcond2_i32:
-            tmp = do_constant_folding_cond2(&op->args[1], &op->args[3],
-                                            op->args[5]);
-            if (tmp != 2) {
+            i = do_constant_folding_cond2(&op->args[1], &op->args[3],
+                                          op->args[5]);
+            if (i >= 0) {
             do_setcond_const:
-                tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
+                tcg_opt_gen_movi(&ctx, op, op->args[0], i);
                 continue;
             }
             if ((op->args[5] == TCG_COND_LT || op->args[5] == TCG_COND_GE)
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             if (op->args[5] == TCG_COND_EQ) {
                 /* Simplify EQ comparisons where one of the pairs
                    can be simplified.  */
-                tmp = do_constant_folding_cond(INDEX_op_setcond_i32,
-                                               op->args[1], op->args[3],
-                                               TCG_COND_EQ);
-                if (tmp == 0) {
+                i = do_constant_folding_cond(INDEX_op_setcond_i32,
+                                             op->args[1], op->args[3],
+                                             TCG_COND_EQ);
+                if (i == 0) {
                     goto do_setcond_const;
-                } else if (tmp == 1) {
+                } else if (i > 0) {
                     goto do_setcond_high;
                 }
-                tmp = do_constant_folding_cond(INDEX_op_setcond_i32,
-                                               op->args[2], op->args[4],
-                                               TCG_COND_EQ);
-                if (tmp == 0) {
+                i = do_constant_folding_cond(INDEX_op_setcond_i32,
+                                             op->args[2], op->args[4],
+                                             TCG_COND_EQ);
+                if (i == 0) {
                     goto do_setcond_high;
-                } else if (tmp != 1) {
+                } else if (i < 0) {
                     break;
                 }
             do_setcond_low:
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             if (op->args[5] == TCG_COND_NE) {
                 /* Simplify NE comparisons where one of the pairs
                    can be simplified.  */
-                tmp = do_constant_folding_cond(INDEX_op_setcond_i32,
-                                               op->args[1], op->args[3],
-                                               TCG_COND_NE);
-                if (tmp == 0) {
+                i = do_constant_folding_cond(INDEX_op_setcond_i32,
+                                             op->args[1], op->args[3],
+                                             TCG_COND_NE);
+                if (i == 0) {
                     goto do_setcond_high;
-                } else if (tmp == 1) {
+                } else if (i > 0) {
                     goto do_setcond_const;
                 }
-                tmp = do_constant_folding_cond(INDEX_op_setcond_i32,
-                                               op->args[2], op->args[4],
-                                               TCG_COND_NE);
-                if (tmp == 0) {
+                i = do_constant_folding_cond(INDEX_op_setcond_i32,
+                                             op->args[2], op->args[4],
+                                             TCG_COND_NE);
+                if (i == 0) {
                     goto do_setcond_low;
-                } else if (tmp == 1) {
+                } else if (i > 0) {
                     goto do_setcond_const;
                 }
             }
-- 
2.25.1

This will allow callers to tail-call these functions
and return true, indicating that processing is complete.
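
For illustration only, a toy version of the pattern (ToyOp,
toy_gen_movi and toy_fold_add are hypothetical and far simpler than
the real helpers): because the generator returns true, a folder can
end with "return toy_gen_movi(...)" rather than a call followed by a
separate "return true".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified op: not QEMU's TCGOp. */
typedef struct {
    bool is_const[2];   /* whether each input is a known constant */
    uint64_t val[2];    /* input values, valid when is_const[i] is set */
    bool folded;        /* set once the op has been replaced by a constant */
    uint64_t result;    /* the constant that replaced it */
} ToyOp;

/* Replace the op with a constant; always reports "processing complete". */
static bool toy_gen_movi(ToyOp *op, uint64_t v)
{
    op->folded = true;
    op->result = v;
    return true;
}

/* The folder ends with a tail call when it can fold. */
static bool toy_fold_add(ToyOp *op)
{
    if (op->is_const[0] && op->is_const[1]) {
        return toy_gen_movi(op, op->val[0] + op->val[1]);
    }
    return false;   /* nothing folded; the caller continues as usual */
}
```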

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -XXX,XX +XXX,XX @@ static bool args_are_copies(TCGArg arg1, TCGArg arg2)
     return ts_are_copies(arg_temp(arg1), arg_temp(arg2));
 }
 
-static void tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, TCGArg src)
+static bool tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, TCGArg src)
 {
     TCGTemp *dst_ts = arg_temp(dst);
     TCGTemp *src_ts = arg_temp(src);
@@ -XXX,XX +XXX,XX @@ static void tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, TCGArg src)
 
     if (ts_are_copies(dst_ts, src_ts)) {
         tcg_op_remove(ctx->tcg, op);
-        return;
+        return true;
     }
 
     reset_ts(dst_ts);
@@ -XXX,XX +XXX,XX @@ static void tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, TCGArg src)
         di->is_const = si->is_const;
         di->val = si->val;
     }
+    return true;
 }
 
-static void tcg_opt_gen_movi(OptContext *ctx, TCGOp *op,
+static bool tcg_opt_gen_movi(OptContext *ctx, TCGOp *op,
                              TCGArg dst, uint64_t val)
 {
     const TCGOpDef *def = &tcg_op_defs[op->opc];
@@ -XXX,XX +XXX,XX @@ static void tcg_opt_gen_movi(OptContext *ctx, TCGOp *op,
     /* Convert movi to mov with constant temp. */
     tv = tcg_constant_internal(type, val);
     init_ts_info(ctx, tv);
-    tcg_opt_gen_mov(ctx, op, dst, temp_arg(tv));
+    return tcg_opt_gen_mov(ctx, op, dst, temp_arg(tv));
 }
 
 static uint64_t do_constant_folding_2(TCGOpcode op, uint64_t x, uint64_t y)
-- 
2.25.1

Copy z_mask into OptContext, for writeback to the
first output within the new function.
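
Sketched with hypothetical miniature types (ToyCtx, ToyTemp and both
functions are illustrative only), and assuming that a set bit in the
mask means "this bit may be nonzero", as the qemu_ld case in this
series suggests: the per-op analysis leaves its result in the context,
and the common tail writes it back to the first output.

```c
#include <stdint.h>

/* Hypothetical miniature of the context/temp pair; not QEMU's types. */
typedef struct {
    uint64_t z_mask;    /* per-temp information */
} ToyTemp;

typedef struct {
    uint64_t z_mask;    /* in-flight value for the op being processed */
} ToyCtx;

/* Per-op analysis: a zero-extending 8-bit load can only set the low 8 bits. */
static void toy_analyze_ld8u(ToyCtx *ctx)
{
    ctx->z_mask = 0xff;
}

/* Common tail: write the in-flight mask back to the first output. */
static void toy_finish(ToyCtx *ctx, ToyTemp *out0)
{
    out0->z_mask = ctx->z_mask;
}
```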

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 49 +++++++++++++++++++++++++++++++++----------------
 1 file changed, 33 insertions(+), 16 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -XXX,XX +XXX,XX @@ typedef struct OptContext {
     TCGContext *tcg;
     TCGOp *prev_mb;
     TCGTempSet temps_used;
+
+    /* In flight values from optimization. */
+    uint64_t z_mask;
 } OptContext;
 
 static inline TempOptInfo *ts_info(TCGTemp *ts)
@@ -XXX,XX +XXX,XX @@ static void copy_propagate(OptContext *ctx, TCGOp *op,
     }
 }
 
+static void finish_folding(OptContext *ctx, TCGOp *op)
+{
+    const TCGOpDef *def = &tcg_op_defs[op->opc];
+    int i, nb_oargs;
+
+    /*
+     * For an opcode that ends a BB, reset all temp data.
+     * We do no cross-BB optimization.
+     */
+    if (def->flags & TCG_OPF_BB_END) {
+        memset(&ctx->temps_used, 0, sizeof(ctx->temps_used));
+        ctx->prev_mb = NULL;
+        return;
+    }
+
+    nb_oargs = def->nb_oargs;
+    for (i = 0; i < nb_oargs; i++) {
+        reset_temp(op->args[i]);
+        /*
+         * Save the corresponding known-zero bits mask for the
+         * first output argument (only one supported so far).
+         */
+        if (i == 0) {
+            arg_info(op->args[i])->z_mask = ctx->z_mask;
+        }
+    }
+}
+
 static bool fold_call(OptContext *ctx, TCGOp *op)
 {
     TCGContext *s = ctx->tcg;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             partmask &= 0xffffffffu;
             affected &= 0xffffffffu;
         }
+        ctx.z_mask = z_mask;
 
         if (partmask == 0) {
             tcg_opt_gen_movi(&ctx, op, op->args[0], 0);
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             break;
         }
 
-        /* Some of the folding above can change opc. */
-        opc = op->opc;
-        def = &tcg_op_defs[opc];
-        if (def->flags & TCG_OPF_BB_END) {
-            memset(&ctx.temps_used, 0, sizeof(ctx.temps_used));
-        } else {
-            int nb_oargs = def->nb_oargs;
-            for (i = 0; i < nb_oargs; i++) {
-                reset_temp(op->args[i]);
-                /* Save the corresponding known-zero bits mask for the
-                   first output argument (only one supported so far). */
-                if (i == 0) {
-                    arg_info(op->args[i])->z_mask = z_mask;
-                }
-            }
-        }
+        finish_folding(&ctx, op);
 
         /* Eliminate duplicate and redundant fence instructions.  */
         if (ctx.prev_mb) {
-- 
2.25.1

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
         uint64_t z_mask, partmask, affected, tmp;
         TCGOpcode opc = op->opc;
         const TCGOpDef *def;
+        bool done = false;
 
         /* Calls are special. */
         if (opc == INDEX_op_call) {
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
            allocator where needed and possible.  Also detect copies. */
         switch (opc) {
         CASE_OP_32_64_VEC(mov):
-            tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[1]);
-            continue;
+            done = tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[1]);
+            break;
 
         case INDEX_op_dup_vec:
             if (arg_is_const(op->args[1])) {
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             break;
         }
 
-        finish_folding(&ctx, op);
+        if (!done) {
+            finish_folding(&ctx, op);
+        }
 
         /* Eliminate duplicate and redundant fence instructions.  */
         if (ctx.prev_mb) {
-- 
2.25.1

This puts the separate mb optimization into the same framework
as the others.  While fold_qemu_{ld,st} are currently identical,
that won't last as more code gets moved.
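
The merging rule can be sketched on a toy op list (ToyOp, ToyKind and
toy_merge_barriers are hypothetical; the reset at the end of a basic
block is left out): a barrier is merged into the previous one unless a
guest memory access came in between.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy op kinds; real TCG has many more opcodes. */
typedef enum { TOY_MB, TOY_LD, TOY_ST, TOY_OTHER } ToyKind;

typedef struct {
    ToyKind kind;
    unsigned barrier;   /* barrier bits, meaningful for TOY_MB only */
    bool removed;       /* marks an op dropped by the optimizer */
} ToyOp;

/* Merge adjacent barriers; return the number of ops removed. */
static int toy_merge_barriers(ToyOp *ops, size_t n)
{
    ToyOp *prev_mb = NULL;
    int removed = 0;

    for (size_t i = 0; i < n; i++) {
        ToyOp *op = &ops[i];

        switch (op->kind) {
        case TOY_MB:
            if (prev_mb) {
                /* mb X; mb Y => mb X|Y */
                prev_mb->barrier |= op->barrier;
                op->removed = true;
                removed++;
            } else {
                prev_mb = op;
            }
            break;
        case TOY_LD:
        case TOY_ST:
            /* Guest memory accesses stop the merging. */
            prev_mb = NULL;
            break;
        default:
            break;
        }
    }
    return removed;
}
```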

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 89 +++++++++++++++++++++++++++++---------------------
 1 file changed, 51 insertions(+), 38 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -XXX,XX +XXX,XX @@ static bool fold_call(OptContext *ctx, TCGOp *op)
     return true;
 }
 
+static bool fold_mb(OptContext *ctx, TCGOp *op)
+{
+    /* Eliminate duplicate and redundant fence instructions.  */
+    if (ctx->prev_mb) {
+        /*
+         * Merge two barriers of the same type into one,
+         * or a weaker barrier into a stronger one,
+         * or two weaker barriers into a stronger one.
+         *   mb X; mb Y => mb X|Y
+         *   mb; strl => mb; st
+         *   ldaq; mb => ld; mb
+         *   ldaq; strl => ld; mb; st
+         * Other combinations are also merged into a strong
+         * barrier.  This is stricter than specified but for
+         * the purposes of TCG is better than not optimizing.
+         */
+        ctx->prev_mb->args[0] |= op->args[0];
+        tcg_op_remove(ctx->tcg, op);
+    } else {
+        ctx->prev_mb = op;
+    }
+    return true;
+}
+
+static bool fold_qemu_ld(OptContext *ctx, TCGOp *op)
+{
+    /* Opcodes that touch guest memory stop the mb optimization.  */
+    ctx->prev_mb = NULL;
+    return false;
+}
+
+static bool fold_qemu_st(OptContext *ctx, TCGOp *op)
+{
+    /* Opcodes that touch guest memory stop the mb optimization.  */
+    ctx->prev_mb = NULL;
+    return false;
+}
+
 /* Propagate constants and copies, fold constant expressions. */
 void tcg_optimize(TCGContext *s)
 {
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             }
             break;
 
+        case INDEX_op_mb:
+            done = fold_mb(&ctx, op);
+            break;
+        case INDEX_op_qemu_ld_i32:
+        case INDEX_op_qemu_ld_i64:
+            done = fold_qemu_ld(&ctx, op);
+            break;
+        case INDEX_op_qemu_st_i32:
+        case INDEX_op_qemu_st8_i32:
+        case INDEX_op_qemu_st_i64:
+            done = fold_qemu_st(&ctx, op);
+            break;
+
         default:
             break;
         }
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
         if (!done) {
             finish_folding(&ctx, op);
         }
-
-        /* Eliminate duplicate and redundant fence instructions.  */
-        if (ctx.prev_mb) {
-            switch (opc) {
-            case INDEX_op_mb:
-                /* Merge two barriers of the same type into one,
-                 * or a weaker barrier into a stronger one,
-                 * or two weaker barriers into a stronger one.
-                 *   mb X; mb Y => mb X|Y
-                 *   mb; strl => mb; st
-                 *   ldaq; mb => ld; mb
-                 *   ldaq; strl => ld; mb; st
-                 * Other combinations are also merged into a strong
-                 * barrier.  This is stricter than specified but for
-                 * the purposes of TCG is better than not optimizing.
-                 */
-                ctx.prev_mb->args[0] |= op->args[0];
-                tcg_op_remove(s, op);
-                break;
-
-            default:
-                /* Opcodes that end the block stop the optimization.  */
-                if ((def->flags & TCG_OPF_BB_END) == 0) {
-                    break;
-                }
-                /* fallthru */
-            case INDEX_op_qemu_ld_i32:
-            case INDEX_op_qemu_ld_i64:
-            case INDEX_op_qemu_st_i32:
-            case INDEX_op_qemu_st8_i32:
-            case INDEX_op_qemu_st_i64:
-                /* Opcodes that touch guest memory stop the optimization.  */
-                ctx.prev_mb = NULL;
-                break;
-            }
-        } else if (opc == INDEX_op_mb) {
-            ctx.prev_mb = op;
-        }
     }
 }
-- 
2.25.1

Split out a whole bunch of placeholder functions, which are
currently identical.  That won't last as more code gets moved.

Use CASE_OP_32_64_VEC for some logical operators that previously
missed the addition of vectors.
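
The resulting shape, reduced to three hypothetical opcodes (ToyOpc,
ToyOp and the toy_* functions are illustrative, not the QEMU code):
each opcode gets its own folder, and for now every folder simply
defers to a shared constant-folding helper.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { TOY_ADD, TOY_AND, TOY_XOR } ToyOpc;

typedef struct {
    ToyOpc opc;
    bool in_const[2];   /* whether each input is a known constant */
    uint64_t in_val[2]; /* input values, valid when in_const[i] is set */
    bool folded;        /* set once the op has become a constant */
    uint64_t result;
} ToyOp;

static uint64_t toy_eval(ToyOpc opc, uint64_t a, uint64_t b)
{
    switch (opc) {
    case TOY_ADD: return a + b;
    case TOY_AND: return a & b;
    default:      return a ^ b;
    }
}

/* Shared helper: fold when both inputs are constant. */
static bool toy_fold_const2(ToyOp *op)
{
    if (op->in_const[0] && op->in_const[1]) {
        op->result = toy_eval(op->opc, op->in_val[0], op->in_val[1]);
        op->folded = true;
        return true;
    }
    return false;
}

/* Placeholders: identical today, free to diverge later. */
static bool toy_fold_add(ToyOp *op) { return toy_fold_const2(op); }
static bool toy_fold_and(ToyOp *op) { return toy_fold_const2(op); }
static bool toy_fold_xor(ToyOp *op) { return toy_fold_const2(op); }

/* Dispatch by opcode, as the switch in the main loop does. */
static bool toy_fold(ToyOp *op)
{
    switch (op->opc) {
    case TOY_ADD: return toy_fold_add(op);
    case TOY_AND: return toy_fold_and(op);
    case TOY_XOR: return toy_fold_xor(op);
    }
    return false;
}
```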

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 271 +++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 219 insertions(+), 52 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -XXX,XX +XXX,XX @@ static void finish_folding(OptContext *ctx, TCGOp *op)
     }
 }
 
+/*
+ * The fold_* functions return true when processing is complete,
+ * usually by folding the operation to a constant or to a copy,
+ * and calling tcg_opt_gen_{mov,movi}.  They may do other things,
+ * like collect information about the value produced, for use in
+ * optimizing a subsequent operation.
+ *
+ * These first fold_* functions are all helpers, used by other
+ * folders for more specific operations.
+ */
+
+static bool fold_const1(OptContext *ctx, TCGOp *op)
+{
+    if (arg_is_const(op->args[1])) {
+        uint64_t t;
+
+        t = arg_info(op->args[1])->val;
+        t = do_constant_folding(op->opc, t, 0);
+        return tcg_opt_gen_movi(ctx, op, op->args[0], t);
+    }
+    return false;
+}
+
+static bool fold_const2(OptContext *ctx, TCGOp *op)
+{
+    if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
+        uint64_t t1 = arg_info(op->args[1])->val;
+        uint64_t t2 = arg_info(op->args[2])->val;
+
+        t1 = do_constant_folding(op->opc, t1, t2);
+        return tcg_opt_gen_movi(ctx, op, op->args[0], t1);
+    }
+    return false;
+}
+
+/*
+ * These outermost fold_<op> functions are sorted alphabetically.
+ */
+
+static bool fold_add(OptContext *ctx, TCGOp *op)
+{
+    return fold_const2(ctx, op);
+}
+
+static bool fold_and(OptContext *ctx, TCGOp *op)
+{
+    return fold_const2(ctx, op);
+}
+
+static bool fold_andc(OptContext *ctx, TCGOp *op)
+{
+    return fold_const2(ctx, op);
+}
+
 static bool fold_call(OptContext *ctx, TCGOp *op)
 {
     TCGContext *s = ctx->tcg;
@@ -XXX,XX +XXX,XX @@ static bool fold_call(OptContext *ctx, TCGOp *op)
     return true;
 }
 
+static bool fold_ctpop(OptContext *ctx, TCGOp *op)
+{
+    return fold_const1(ctx, op);
+}
+
+static bool fold_divide(OptContext *ctx, TCGOp *op)
+{
+    return fold_const2(ctx, op);
+}
+
+static bool fold_eqv(OptContext *ctx, TCGOp *op)
+{
+    return fold_const2(ctx, op);
+}
+
+static bool fold_exts(OptContext *ctx, TCGOp *op)
+{
+    return fold_const1(ctx, op);
+}
+
+static bool fold_extu(OptContext *ctx, TCGOp *op)
+{
+    return fold_const1(ctx, op);
+}
+
 static bool fold_mb(OptContext *ctx, TCGOp *op)
 {
     /* Eliminate duplicate and redundant fence instructions.  */
@@ -XXX,XX +XXX,XX @@ static bool fold_mb(OptContext *ctx, TCGOp *op)
     return true;
 }
 
+static bool fold_mul(OptContext *ctx, TCGOp *op)
+{
+    return fold_const2(ctx, op);
+}
+
+static bool fold_mul_highpart(OptContext *ctx, TCGOp *op)
+{
+    return fold_const2(ctx, op);
+}
+
+static bool fold_nand(OptContext *ctx, TCGOp *op)
+{
+    return fold_const2(ctx, op);
+}
+
+static bool fold_neg(OptContext *ctx, TCGOp *op)
+{
+    return fold_const1(ctx, op);
+}
+
+static bool fold_nor(OptContext *ctx, TCGOp *op)
+{
+    return fold_const2(ctx, op);
+}
+
+static bool fold_not(OptContext *ctx, TCGOp *op)
+{
+    return fold_const1(ctx, op);
+}
+
+static bool fold_or(OptContext *ctx, TCGOp *op)
+{
+    return fold_const2(ctx, op);
+}
+
+static bool fold_orc(OptContext *ctx, TCGOp *op)
+{
+    return fold_const2(ctx, op);
+}
+
 static bool fold_qemu_ld(OptContext *ctx, TCGOp *op)
 {
     /* Opcodes that touch guest memory stop the mb optimization.  */
@@ -XXX,XX +XXX,XX @@ static bool fold_qemu_st(OptContext *ctx, TCGOp *op)
     return false;
 }
 
+static bool fold_remainder(OptContext *ctx, TCGOp *op)
+{
+    return fold_const2(ctx, op);
+}
+
+static bool fold_shift(OptContext *ctx, TCGOp *op)
+{
+    return fold_const2(ctx, op);
+}
+
+static bool fold_sub(OptContext *ctx, TCGOp *op)
+{
+    return fold_const2(ctx, op);
+}
+
+static bool fold_xor(OptContext *ctx, TCGOp *op)
+{
+    return fold_const2(ctx, op);
+}
+
 /* Propagate constants and copies, fold constant expressions. */
 void tcg_optimize(TCGContext *s)
 {
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             }
             break;
 
-        CASE_OP_32_64(not):
-        CASE_OP_32_64(neg):
-        CASE_OP_32_64(ext8s):
-        CASE_OP_32_64(ext8u):
-        CASE_OP_32_64(ext16s):
-        CASE_OP_32_64(ext16u):
-        CASE_OP_32_64(ctpop):
-        case INDEX_op_ext32s_i64:
-        case INDEX_op_ext32u_i64:
-        case INDEX_op_ext_i32_i64:
-        case INDEX_op_extu_i32_i64:
-        case INDEX_op_extrl_i64_i32:
-        case INDEX_op_extrh_i64_i32:
-            if (arg_is_const(op->args[1])) {
-                tmp = do_constant_folding(opc, arg_info(op->args[1])->val, 0);
-                tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
-                continue;
-            }
-            break;
-
         CASE_OP_32_64(bswap16):
         CASE_OP_32_64(bswap32):
         case INDEX_op_bswap64_i64:
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             }
             break;
 
-        CASE_OP_32_64(add):
-        CASE_OP_32_64(sub):
-        CASE_OP_32_64(mul):
-        CASE_OP_32_64(or):
-        CASE_OP_32_64(and):
-        CASE_OP_32_64(xor):
-        CASE_OP_32_64(shl):
-        CASE_OP_32_64(shr):
-        CASE_OP_32_64(sar):
-        CASE_OP_32_64(rotl):
-        CASE_OP_32_64(rotr):
-        CASE_OP_32_64(andc):
-        CASE_OP_32_64(orc):
-        CASE_OP_32_64(eqv):
-        CASE_OP_32_64(nand):
-        CASE_OP_32_64(nor):
-        CASE_OP_32_64(muluh):
-        CASE_OP_32_64(mulsh):
-        CASE_OP_32_64(div):
-        CASE_OP_32_64(divu):
-        CASE_OP_32_64(rem):
-        CASE_OP_32_64(remu):
-            if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
-                tmp = do_constant_folding(opc, arg_info(op->args[1])->val,
-                                          arg_info(op->args[2])->val);
-                tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
-                continue;
-            }
-            break;
-
         CASE_OP_32_64(clz):
         CASE_OP_32_64(ctz):
             if (arg_is_const(op->args[1])) {
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             }
             break;
 
+        default:
+            break;
+
+        /* ---------------------------------------------------------- */
+        /* Sorted alphabetically by opcode as much as possible. */
+
+        CASE_OP_32_64_VEC(add):
+            done = fold_add(&ctx, op);
+            break;
+        CASE_OP_32_64_VEC(and):
+            done = fold_and(&ctx, op);
+            break;
+        CASE_OP_32_64_VEC(andc):
+            done = fold_andc(&ctx, op);
+            break;
+        CASE_OP_32_64(ctpop):
+            done = fold_ctpop(&ctx, op);
+            break;
+        CASE_OP_32_64(div):
+        CASE_OP_32_64(divu):
+            done = fold_divide(&ctx, op);
+            break;
+        CASE_OP_32_64(eqv):
+            done = fold_eqv(&ctx, op);
+            break;
+        CASE_OP_32_64(ext8s):
+        CASE_OP_32_64(ext16s):
+        case INDEX_op_ext32s_i64:
+        case INDEX_op_ext_i32_i64:
+            done = fold_exts(&ctx, op);
+            break;
+        CASE_OP_32_64(ext8u):
+        CASE_OP_32_64(ext16u):
+        case INDEX_op_ext32u_i64:
+        case INDEX_op_extu_i32_i64:
+        case INDEX_op_extrl_i64_i32:
+        case INDEX_op_extrh_i64_i32:
+            done = fold_extu(&ctx, op);
+            break;
         case INDEX_op_mb:
             done = fold_mb(&ctx, op);
             break;
+        CASE_OP_32_64(mul):
+            done = fold_mul(&ctx, op);
+            break;
+        CASE_OP_32_64(mulsh):
+        CASE_OP_32_64(muluh):
+            done = fold_mul_highpart(&ctx, op);
+            break;
+        CASE_OP_32_64(nand):
+            done = fold_nand(&ctx, op);
+            break;
+        CASE_OP_32_64(neg):
+            done = fold_neg(&ctx, op);
+            break;
+        CASE_OP_32_64(nor):
+            done = fold_nor(&ctx, op);
+            break;
+        CASE_OP_32_64_VEC(not):
+            done = fold_not(&ctx, op);
+            break;
+        CASE_OP_32_64_VEC(or):
+            done = fold_or(&ctx, op);
+            break;
+        CASE_OP_32_64_VEC(orc):
+            done = fold_orc(&ctx, op);
+            break;
         case INDEX_op_qemu_ld_i32:
         case INDEX_op_qemu_ld_i64:
             done = fold_qemu_ld(&ctx, op);
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
         case INDEX_op_qemu_st_i64:
             done = fold_qemu_st(&ctx, op);
             break;
-
-        default:
+        CASE_OP_32_64(rem):
+        CASE_OP_32_64(remu):
+            done = fold_remainder(&ctx, op);
+            break;
+        CASE_OP_32_64(rotl):
+        CASE_OP_32_64(rotr):
+        CASE_OP_32_64(sar):
+        CASE_OP_32_64(shl):
+        CASE_OP_32_64(shr):
+            done = fold_shift(&ctx, op);
+            break;
+        CASE_OP_32_64_VEC(sub):
+            done = fold_sub(&ctx, op);
+            break;
+        CASE_OP_32_64_VEC(xor):
+            done = fold_xor(&ctx, op);
             break;
         }
 
-- 
2.25.1

Reduce some code duplication by folding the NE and EQ cases.
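
As an illustration of how one code path can serve both conditions, here
is a minimal standalone sketch. It assumes the per-pair fold result is
0 for "known false", 1 for "known true" and negative for "unknown", as
the checks in this file suggest. The names action and classify_pair are
hypothetical and exist only for this example.

```c
#include <assert.h>

/* Hypothetical outcome labels; they mirror the goto targets in the patch. */
enum action { ACT_CONST, ACT_OTHER_HALF, ACT_NONE };

/*
 * i is the folded result of comparing one pair of halves with the
 * original condition: 0 = known false, 1 = known true, -1 = unknown.
 * inv is 0 for EQ and 1 for NE.
 */
static enum action classify_pair(int i, int inv)
{
    assert(inv == 0 || inv == 1);
    switch (i ^ inv) {
    case 0:
        /* EQ known false, or NE known true: this pair decides the result. */
        return ACT_CONST;
    case 1:
        /* EQ known true, or NE known false: only the other pair matters. */
        return ACT_OTHER_HALF;
    default:
        /* Unknown (-1 ^ inv is -1 or -2): no simplification. */
        return ACT_NONE;
    }
}
```

When the sketch returns ACT_CONST, the constant to emit is i itself
(0 for EQ, 1 for NE), which is why the patch can pass i unchanged to
the constant path.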

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 145 ++++++++++++++++++++++++-------------------------
 1 file changed, 72 insertions(+), 73 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -XXX,XX +XXX,XX @@ static bool fold_remainder(OptContext *ctx, TCGOp *op)
     return fold_const2(ctx, op);
 }
 
+static bool fold_setcond2(OptContext *ctx, TCGOp *op)
+{
+    TCGCond cond = op->args[5];
+    int i = do_constant_folding_cond2(&op->args[1], &op->args[3], cond);
+    int inv = 0;
+
+    if (i >= 0) {
+        goto do_setcond_const;
+    }
+
+    switch (cond) {
+    case TCG_COND_LT:
+    case TCG_COND_GE:
+        /*
+         * Simplify LT/GE comparisons vs zero to a single compare
+         * vs the high word of the input.
+         */
+        if (arg_is_const(op->args[3]) && arg_info(op->args[3])->val == 0 &&
+            arg_is_const(op->args[4]) && arg_info(op->args[4])->val == 0) {
+            goto do_setcond_high;
+        }
+        break;
+
+    case TCG_COND_NE:
+        inv = 1;
+        QEMU_FALLTHROUGH;
+    case TCG_COND_EQ:
+        /*
+         * Simplify EQ/NE comparisons where one of the pairs
+         * can be simplified.
+         */
+        i = do_constant_folding_cond(INDEX_op_setcond_i32, op->args[1],
+                                     op->args[3], cond);
+        switch (i ^ inv) {
+        case 0:
+            goto do_setcond_const;
+        case 1:
+            goto do_setcond_high;
+        }
+
+        i = do_constant_folding_cond(INDEX_op_setcond_i32, op->args[2],
+                                     op->args[4], cond);
+        switch (i ^ inv) {
+        case 0:
+            goto do_setcond_const;
+        case 1:
+            op->args[2] = op->args[3];
+            op->args[3] = cond;
+            op->opc = INDEX_op_setcond_i32;
+            break;
+        }
+        break;
+
+    default:
+        break;
+
+    do_setcond_high:
+        op->args[1] = op->args[2];
+        op->args[2] = op->args[4];
+        op->args[3] = cond;
+        op->opc = INDEX_op_setcond_i32;
+        break;
+    }
+    return false;
+
+ do_setcond_const:
+    return tcg_opt_gen_movi(ctx, op, op->args[0], i);
+}
+
 static bool fold_shift(OptContext *ctx, TCGOp *op)
 {
     return fold_const2(ctx, op);
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             }
             break;
 
-        case INDEX_op_setcond2_i32:
-            i = do_constant_folding_cond2(&op->args[1], &op->args[3],
-                                          op->args[5]);
-            if (i >= 0) {
-            do_setcond_const:
-                tcg_opt_gen_movi(&ctx, op, op->args[0], i);
-                continue;
-            }
-            if ((op->args[5] == TCG_COND_LT || op->args[5] == TCG_COND_GE)
-                 && arg_is_const(op->args[3])
-                 && arg_info(op->args[3])->val == 0
-                 && arg_is_const(op->args[4])
-                 && arg_info(op->args[4])->val == 0) {
-                /* Simplify LT/GE comparisons vs zero to a single compare
-                   vs the high word of the input.  */
-            do_setcond_high:
-                reset_temp(op->args[0]);
-                arg_info(op->args[0])->z_mask = 1;
-                op->opc = INDEX_op_setcond_i32;
-                op->args[1] = op->args[2];
-                op->args[2] = op->args[4];
-                op->args[3] = op->args[5];
-                break;
-            }
-            if (op->args[5] == TCG_COND_EQ) {
-                /* Simplify EQ comparisons where one of the pairs
-                   can be simplified.  */
-                i = do_constant_folding_cond(INDEX_op_setcond_i32,
-                                             op->args[1], op->args[3],
-                                             TCG_COND_EQ);
-                if (i == 0) {
-                    goto do_setcond_const;
-                } else if (i > 0) {
-                    goto do_setcond_high;
-                }
-                i = do_constant_folding_cond(INDEX_op_setcond_i32,
-                                             op->args[2], op->args[4],
-                                             TCG_COND_EQ);
-                if (i == 0) {
-                    goto do_setcond_high;
-                } else if (i < 0) {
-                    break;
-                }
-            do_setcond_low:
-                reset_temp(op->args[0]);
-                arg_info(op->args[0])->z_mask = 1;
-                op->opc = INDEX_op_setcond_i32;
-                op->args[2] = op->args[3];
-                op->args[3] = op->args[5];
-                break;
-            }
-            if (op->args[5] == TCG_COND_NE) {
-                /* Simplify NE comparisons where one of the pairs
-                   can be simplified.  */
-                i = do_constant_folding_cond(INDEX_op_setcond_i32,
-                                             op->args[1], op->args[3],
-                                             TCG_COND_NE);
-                if (i == 0) {
-                    goto do_setcond_high;
-                } else if (i > 0) {
-                    goto do_setcond_const;
-                }
-                i = do_constant_folding_cond(INDEX_op_setcond_i32,
-                                             op->args[2], op->args[4],
-                                             TCG_COND_NE);
-                if (i == 0) {
-                    goto do_setcond_low;
-                } else if (i > 0) {
-                    goto do_setcond_const;
-                }
-            }
-            break;
-
         default:
             break;
 
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(shr):
             done = fold_shift(&ctx, op);
             break;
+        case INDEX_op_setcond2_i32:
+            done = fold_setcond2(&ctx, op);
+            break;
         CASE_OP_32_64_VEC(sub):
             done = fold_sub(&ctx, op);
             break;
-- 
2.25.1

Reduce some code duplication by folding the NE and EQ cases.
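
The LT/GE special case kept by this fold relies on the sign of a
double-word value living entirely in its high word. A minimal
standalone sketch of that equivalence follows. The helper names
lt_zero_full and lt_zero_high are hypothetical, and the casts assume
the usual two's complement conversion.

```c
#include <assert.h>
#include <stdint.h>

/* Signed 64-bit "less than zero" test on a value split into halves. */
static int lt_zero_full(uint32_t lo, uint32_t hi)
{
    int64_t v = (int64_t)(((uint64_t)hi << 32) | lo);
    return v < 0;
}

/* The same test using only the high word; the low word is irrelevant. */
static int lt_zero_high(uint32_t hi)
{
    return (int32_t)hi < 0;
}
```

GE is the negation of LT, so the same reduction to a single 32-bit
compare against the high word applies to it as well.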

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 159 +++++++++++++++++++++++++------------------------
 1 file changed, 81 insertions(+), 78 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -XXX,XX +XXX,XX @@ static bool fold_andc(OptContext *ctx, TCGOp *op)
     return fold_const2(ctx, op);
 }
 
+static bool fold_brcond2(OptContext *ctx, TCGOp *op)
+{
+    TCGCond cond = op->args[4];
+    int i = do_constant_folding_cond2(&op->args[0], &op->args[2], cond);
+    TCGArg label = op->args[5];
+    int inv = 0;
+
+    if (i >= 0) {
+        goto do_brcond_const;
+    }
+
+    switch (cond) {
+    case TCG_COND_LT:
+    case TCG_COND_GE:
+        /*
+         * Simplify LT/GE comparisons vs zero to a single compare
+         * vs the high word of the input.
+         */
+        if (arg_is_const(op->args[2]) && arg_info(op->args[2])->val == 0 &&
+            arg_is_const(op->args[3]) && arg_info(op->args[3])->val == 0) {
+            goto do_brcond_high;
+        }
+        break;
+
+    case TCG_COND_NE:
+        inv = 1;
+        QEMU_FALLTHROUGH;
+    case TCG_COND_EQ:
+        /*
+         * Simplify EQ/NE comparisons where one of the pairs
+         * can be simplified.
+         */
+        i = do_constant_folding_cond(INDEX_op_brcond_i32, op->args[0],
+                                     op->args[2], cond);
+        switch (i ^ inv) {
+        case 0:
+            goto do_brcond_const;
+        case 1:
+            goto do_brcond_high;
+        }
+
+        i = do_constant_folding_cond(INDEX_op_brcond_i32, op->args[1],
+                                     op->args[3], cond);
+        switch (i ^ inv) {
+        case 0:
+            goto do_brcond_const;
+        case 1:
+            op->opc = INDEX_op_brcond_i32;
+            op->args[1] = op->args[2];
+            op->args[2] = cond;
+            op->args[3] = label;
+            break;
+        }
+        break;
+
+    default:
+        break;
+
+    do_brcond_high:
+        op->opc = INDEX_op_brcond_i32;
+        op->args[0] = op->args[1];
+        op->args[1] = op->args[3];
+        op->args[2] = cond;
+        op->args[3] = label;
+        break;
+
+    do_brcond_const:
+        if (i == 0) {
+            tcg_op_remove(ctx->tcg, op);
+            return true;
+        }
+        op->opc = INDEX_op_br;
+        op->args[0] = label;
+        break;
+    }
+    return false;
+}
+
 static bool fold_call(OptContext *ctx, TCGOp *op)
 {
     TCGContext *s = ctx->tcg;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             }
             break;
 
-        case INDEX_op_brcond2_i32:
-            i = do_constant_folding_cond2(&op->args[0], &op->args[2],
-                                          op->args[4]);
-            if (i == 0) {
-            do_brcond_false:
-                tcg_op_remove(s, op);
-                continue;
-            }
-            if (i > 0) {
-            do_brcond_true:
-                op->opc = opc = INDEX_op_br;
-                op->args[0] = op->args[5];
-                break;
-            }
-            if ((op->args[4] == TCG_COND_LT || op->args[4] == TCG_COND_GE)
-                 && arg_is_const(op->args[2])
-                 && arg_info(op->args[2])->val == 0
-                 && arg_is_const(op->args[3])
-                 && arg_info(op->args[3])->val == 0) {
-                /* Simplify LT/GE comparisons vs zero to a single compare
-                   vs the high word of the input.  */
-            do_brcond_high:
-                op->opc = opc = INDEX_op_brcond_i32;
-                op->args[0] = op->args[1];
-                op->args[1] = op->args[3];
-                op->args[2] = op->args[4];
-                op->args[3] = op->args[5];
-                break;
-            }
-            if (op->args[4] == TCG_COND_EQ) {
-                /* Simplify EQ comparisons where one of the pairs
-                   can be simplified.  */
-                i = do_constant_folding_cond(INDEX_op_brcond_i32,
-                                             op->args[0], op->args[2],
-                                             TCG_COND_EQ);
-                if (i == 0) {
-                    goto do_brcond_false;
-                } else if (i > 0) {
-                    goto do_brcond_high;
-                }
-                i = do_constant_folding_cond(INDEX_op_brcond_i32,
-                                             op->args[1], op->args[3],
-                                             TCG_COND_EQ);
-                if (i == 0) {
-                    goto do_brcond_false;
-                } else if (i < 0) {
-                    break;
-                }
-            do_brcond_low:
-                memset(&ctx.temps_used, 0, sizeof(ctx.temps_used));
-                op->opc = INDEX_op_brcond_i32;
-                op->args[1] = op->args[2];
-                op->args[2] = op->args[4];
-                op->args[3] = op->args[5];
-                break;
-            }
-            if (op->args[4] == TCG_COND_NE) {
-                /* Simplify NE comparisons where one of the pairs
-                   can be simplified.  */
-                i = do_constant_folding_cond(INDEX_op_brcond_i32,
-                                             op->args[0], op->args[2],
-                                             TCG_COND_NE);
-                if (i == 0) {
-                    goto do_brcond_high;
-                } else if (i > 0) {
-                    goto do_brcond_true;
-                }
-                i = do_constant_folding_cond(INDEX_op_brcond_i32,
-                                             op->args[1], op->args[3],
-                                             TCG_COND_NE);
-                if (i == 0) {
-                    goto do_brcond_low;
-                } else if (i > 0) {
-                    goto do_brcond_true;
-                }
-            }
-            break;
-
         default:
             break;
 
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64_VEC(andc):
             done = fold_andc(&ctx, op);
             break;
+        case INDEX_op_brcond2_i32:
+            done = fold_brcond2(&ctx, op);
+            break;
         CASE_OP_32_64(ctpop):
             done = fold_ctpop(&ctx, op);
             break;
-- 
2.25.1

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 33 +++++++++++++++++++--------------
 1 file changed, 19 insertions(+), 14 deletions(-)

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 37 +++++++++++++++++++++----------------
 1 file changed, 21 insertions(+), 16 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -XXX,XX +XXX,XX @@ static bool fold_mul_highpart(OptContext *ctx, TCGOp *op)
     return fold_const2(ctx, op);
 }
 
+static bool fold_mulu2_i32(OptContext *ctx, TCGOp *op)
+{
+    if (arg_is_const(op->args[2]) && arg_is_const(op->args[3])) {
+        uint32_t a = arg_info(op->args[2])->val;
+        uint32_t b = arg_info(op->args[3])->val;
+        uint64_t r = (uint64_t)a * b;
+        TCGArg rl, rh;
+        TCGOp *op2 = tcg_op_insert_before(ctx->tcg, op, INDEX_op_mov_i32);
+
+        rl = op->args[0];
+        rh = op->args[1];
+        tcg_opt_gen_movi(ctx, op, rl, (int32_t)r);
+        tcg_opt_gen_movi(ctx, op2, rh, (int32_t)(r >> 32));
+        return true;
+    }
+    return false;
+}
+
 static bool fold_nand(OptContext *ctx, TCGOp *op)
 {
     return fold_const2(ctx, op);
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             }
             break;
 
-        case INDEX_op_mulu2_i32:
-            if (arg_is_const(op->args[2]) && arg_is_const(op->args[3])) {
-                uint32_t a = arg_info(op->args[2])->val;
-                uint32_t b = arg_info(op->args[3])->val;
-                uint64_t r = (uint64_t)a * b;
-                TCGArg rl, rh;
-                TCGOp *op2 = tcg_op_insert_before(s, op, INDEX_op_mov_i32);
-
-                rl = op->args[0];
-                rh = op->args[1];
-                tcg_opt_gen_movi(&ctx, op, rl, (int32_t)r);
-                tcg_opt_gen_movi(&ctx, op2, rh, (int32_t)(r >> 32));
-                continue;
-            }
-            break;
-
         default:
             break;
 
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(muluh):
             done = fold_mul_highpart(&ctx, op);
             break;
+        case INDEX_op_mulu2_i32:
+            done = fold_mulu2_i32(&ctx, op);
+            break;
         CASE_OP_32_64(nand):
             done = fold_nand(&ctx, op);
             break;
-- 
2.25.1

Add two additional helpers, fold_add2_i32 and fold_sub2_i32,
which will not remain simple wrappers forever.
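
For reference, the constant fold shared by both helpers can be sketched
outside of TCG as below. This is only an illustration of the
arithmetic: struct pair32 and fold_addsub2 are hypothetical names
standing in for the two output temps and the shared helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result pair, standing in for the two output temps. */
struct pair32 {
    uint32_t lo;
    uint32_t hi;
};

static struct pair32 fold_addsub2(uint32_t al, uint32_t ah,
                                  uint32_t bl, uint32_t bh, bool add)
{
    uint64_t a = ((uint64_t)ah << 32) | al;
    uint64_t b = ((uint64_t)bh << 32) | bl;
    struct pair32 r;

    /*
     * Unsigned arithmetic wraps modulo 2^64, so carry and borrow
     * between the halves are handled by the 64-bit operation.
     */
    a = add ? a + b : a - b;

    r.lo = (uint32_t)a;
    r.hi = (uint32_t)(a >> 32);
    return r;
}
```

For example, adding {lo = 0xffffffff, hi = 0} and {lo = 1, hi = 0}
carries into the high word and gives {lo = 0, hi = 1}.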

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 70 +++++++++++++++++++++++++++++++-------------------
 1 file changed, 44 insertions(+), 26 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -XXX,XX +XXX,XX @@ static bool fold_add(OptContext *ctx, TCGOp *op)
     return fold_const2(ctx, op);
 }
 
+static bool fold_addsub2_i32(OptContext *ctx, TCGOp *op, bool add)
+{
+    if (arg_is_const(op->args[2]) && arg_is_const(op->args[3]) &&
+        arg_is_const(op->args[4]) && arg_is_const(op->args[5])) {
+        uint32_t al = arg_info(op->args[2])->val;
+        uint32_t ah = arg_info(op->args[3])->val;
+        uint32_t bl = arg_info(op->args[4])->val;
+        uint32_t bh = arg_info(op->args[5])->val;
+        uint64_t a = ((uint64_t)ah << 32) | al;
+        uint64_t b = ((uint64_t)bh << 32) | bl;
+        TCGArg rl, rh;
+        TCGOp *op2 = tcg_op_insert_before(ctx->tcg, op, INDEX_op_mov_i32);
+
+        if (add) {
+            a += b;
+        } else {
+            a -= b;
+        }
+
+        rl = op->args[0];
+        rh = op->args[1];
+        tcg_opt_gen_movi(ctx, op, rl, (int32_t)a);
+        tcg_opt_gen_movi(ctx, op2, rh, (int32_t)(a >> 32));
+        return true;
+    }
+    return false;
+}
+
+static bool fold_add2_i32(OptContext *ctx, TCGOp *op)
+{
+    return fold_addsub2_i32(ctx, op, true);
+}
+
 static bool fold_and(OptContext *ctx, TCGOp *op)
 {
     return fold_const2(ctx, op);
@@ -XXX,XX +XXX,XX @@ static bool fold_sub(OptContext *ctx, TCGOp *op)
     return fold_const2(ctx, op);
 }
 
+static bool fold_sub2_i32(OptContext *ctx, TCGOp *op)
+{
+    return fold_addsub2_i32(ctx, op, false);
+}
+
 static bool fold_xor(OptContext *ctx, TCGOp *op)
 {
     return fold_const2(ctx, op);
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             }
             break;
 
-        case INDEX_op_add2_i32:
-        case INDEX_op_sub2_i32:
-            if (arg_is_const(op->args[2]) && arg_is_const(op->args[3])
-                && arg_is_const(op->args[4]) && arg_is_const(op->args[5])) {
-                uint32_t al = arg_info(op->args[2])->val;
-                uint32_t ah = arg_info(op->args[3])->val;
-                uint32_t bl = arg_info(op->args[4])->val;
-                uint32_t bh = arg_info(op->args[5])->val;
-                uint64_t a = ((uint64_t)ah << 32) | al;
-                uint64_t b = ((uint64_t)bh << 32) | bl;
-                TCGArg rl, rh;
-                TCGOp *op2 = tcg_op_insert_before(s, op, INDEX_op_mov_i32);
-
-                if (opc == INDEX_op_add2_i32) {
-                    a += b;
-                } else {
-                    a -= b;
-                }
-
-                rl = op->args[0];
-                rh = op->args[1];
-                tcg_opt_gen_movi(&ctx, op, rl, (int32_t)a);
-                tcg_opt_gen_movi(&ctx, op2, rh, (int32_t)(a >> 32));
-                continue;
-            }
-            break;
 
         default:
             break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64_VEC(add):
             done = fold_add(&ctx, op);
             break;
+        case INDEX_op_add2_i32:
+            done = fold_add2_i32(&ctx, op);
+            break;
         CASE_OP_32_64_VEC(and):
             done = fold_and(&ctx, op);
             break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64_VEC(sub):
             done = fold_sub(&ctx, op);
             break;
+        case INDEX_op_sub2_i32:
+            done = fold_sub2_i32(&ctx, op);
+            break;
         CASE_OP_32_64_VEC(xor):
             done = fold_xor(&ctx, op);
             break;
-- 
2.25.1
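
The movcond fold in the patch below rewrites a movcond whose two value
operands are the constants 1 and 0 into a setcond. A minimal standalone
sketch of why that is valid follows. The functions movcond and setcond
here are hypothetical models of the op semantics, not TCG APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reference semantics: pick tv when the condition holds, else fv. */
static uint64_t movcond(bool c, uint64_t tv, uint64_t fv)
{
    return c ? tv : fv;
}

/* setcond yields 1 when the condition holds, else 0. */
static uint64_t setcond(bool c)
{
    return c ? 1 : 0;
}
```

With tv = 1 and fv = 0 the two agree directly. With tv = 0 and fv = 1
they agree once the condition is inverted, which is the role of
tcg_invert_cond in the patch.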

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 56 ++++++++++++++++++++++++++++----------------------
 1 file changed, 31 insertions(+), 25 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -XXX,XX +XXX,XX @@ static bool fold_mb(OptContext *ctx, TCGOp *op)
     return true;
 }
 
+static bool fold_movcond(OptContext *ctx, TCGOp *op)
+{
+    TCGOpcode opc = op->opc;
+    TCGCond cond = op->args[5];
+    int i = do_constant_folding_cond(opc, op->args[1], op->args[2], cond);
+
+    if (i >= 0) {
+        return tcg_opt_gen_mov(ctx, op, op->args[0], op->args[4 - i]);
+    }
+
+    if (arg_is_const(op->args[3]) && arg_is_const(op->args[4])) {
+        uint64_t tv = arg_info(op->args[3])->val;
+        uint64_t fv = arg_info(op->args[4])->val;
+
+        opc = (opc == INDEX_op_movcond_i32
+               ? INDEX_op_setcond_i32 : INDEX_op_setcond_i64);
+
+        if (tv == 1 && fv == 0) {
+            op->opc = opc;
+            op->args[3] = cond;
+        } else if (fv == 1 && tv == 0) {
+            op->opc = opc;
+            op->args[3] = tcg_invert_cond(cond);
+        }
+    }
+    return false;
+}
+
 static bool fold_mul(OptContext *ctx, TCGOp *op)
 {
     return fold_const2(ctx, op);
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             }
             break;
 
-        CASE_OP_32_64(movcond):
-            i = do_constant_folding_cond(opc, op->args[1],
-                                         op->args[2], op->args[5]);
-            if (i >= 0) {
-                tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[4 - i]);
-                continue;
-            }
-            if (arg_is_const(op->args[3]) && arg_is_const(op->args[4])) {
-                uint64_t tv = arg_info(op->args[3])->val;
-                uint64_t fv = arg_info(op->args[4])->val;
-                TCGCond cond = op->args[5];
-
-                if (fv == 1 && tv == 0) {
-                    cond = tcg_invert_cond(cond);
-                } else if (!(tv == 1 && fv == 0)) {
-                    break;
-                }
-                op->args[3] = cond;
-                op->opc = opc = (opc == INDEX_op_movcond_i32
-                                 ? INDEX_op_setcond_i32
-                                 : INDEX_op_setcond_i64);
-            }
-            break;
-
-
         default:
             break;
 
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
         case INDEX_op_mb:
             done = fold_mb(&ctx, op);
             break;
+        CASE_OP_32_64(movcond):
+            done = fold_movcond(&ctx, op);
+            break;
         CASE_OP_32_64(mul):
             done = fold_mul(&ctx, op);
             break;
-- 
2.25.1
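
The 64-bit branch of the extract2 fold in the patch below takes 64 bits
from the concatenation of the two inputs. A minimal standalone sketch
follows, assuming 0 < shr < 64 so that both shifts are well defined.
The name extract2_64 is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Take 64 bits starting at bit 'shr' from the 128-bit concatenation
 * v2:v1, where v1 holds the low half.
 */
static uint64_t extract2_64(uint64_t v1, uint64_t v2, int shr)
{
    /* A shift count of 64 on a 64-bit value would be undefined. */
    assert(shr > 0 && shr < 64);
    return (v1 >> shr) | (v2 << (64 - shr));
}
```

For example, with v1 = 0xf0, v2 = 1 and shr = 4, the low nibble of v1
is dropped and bit 0 of v2 lands at bit 60 of the result.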

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 39 ++++++++++++++++++++++-----------------
 1 file changed, 22 insertions(+), 17 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -XXX,XX +XXX,XX @@ static bool fold_eqv(OptContext *ctx, TCGOp *op)
     return fold_const2(ctx, op);
 }
 
+static bool fold_extract2(OptContext *ctx, TCGOp *op)
+{
+    if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
+        uint64_t v1 = arg_info(op->args[1])->val;
+        uint64_t v2 = arg_info(op->args[2])->val;
+        int shr = op->args[3];
+
+        if (op->opc == INDEX_op_extract2_i64) {
+            v1 >>= shr;
+            v2 <<= 64 - shr;
+        } else {
+            v1 = (uint32_t)v1 >> shr;
+            v2 = (int32_t)v2 << (32 - shr);
+        }
+        return tcg_opt_gen_movi(ctx, op, op->args[0], v1 | v2);
+    }
+    return false;
+}
+
 static bool fold_exts(OptContext *ctx, TCGOp *op)
 {
     return fold_const1(ctx, op);
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             }
             break;
 
-        CASE_OP_32_64(extract2):
-            if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
-                uint64_t v1 = arg_info(op->args[1])->val;
-                uint64_t v2 = arg_info(op->args[2])->val;
-                int shr = op->args[3];
-
-                if (opc == INDEX_op_extract2_i64) {
-                    tmp = (v1 >> shr) | (v2 << (64 - shr));
-                } else {
-                    tmp = (int32_t)(((uint32_t)v1 >> shr) |
-                                    ((uint32_t)v2 << (32 - shr)));
-                }
-                tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
-                continue;
-            }
-            break;
-
         default:
             break;
 
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(eqv):
             done = fold_eqv(&ctx, op);
             break;
+        CASE_OP_32_64(extract2):
+            done = fold_extract2(&ctx, op);
+            break;
         CASE_OP_32_64(ext8s):
         CASE_OP_32_64(ext16s):
         case INDEX_op_ext32s_i64:
-- 
2.25.1

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 48 ++++++++++++++++++++++++++++++------------------
 1 file changed, 30 insertions(+), 18 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -XXX,XX +XXX,XX @@ static bool fold_eqv(OptContext *ctx, TCGOp *op)
     return fold_const2(ctx, op);
 }
 
+static bool fold_extract(OptContext *ctx, TCGOp *op)
+{
+    if (arg_is_const(op->args[1])) {
+        uint64_t t;
+
+        t = arg_info(op->args[1])->val;
+        t = extract64(t, op->args[2], op->args[3]);
+        return tcg_opt_gen_movi(ctx, op, op->args[0], t);
+    }
+    return false;
+}
+
 static bool fold_extract2(OptContext *ctx, TCGOp *op)
 {
     if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
@@ -XXX,XX +XXX,XX @@ static bool fold_setcond2(OptContext *ctx, TCGOp *op)
     return tcg_opt_gen_movi(ctx, op, op->args[0], i);
 }
 
+static bool fold_sextract(OptContext *ctx, TCGOp *op)
+{
+    if (arg_is_const(op->args[1])) {
+        uint64_t t;
+
+        t = arg_info(op->args[1])->val;
+        t = sextract64(t, op->args[2], op->args[3]);
+        return tcg_opt_gen_movi(ctx, op, op->args[0], t);
+    }
+    return false;
+}
+
 static bool fold_shift(OptContext *ctx, TCGOp *op)
 {
     return fold_const2(ctx, op);
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             }
             break;
 
-        CASE_OP_32_64(extract):
-            if (arg_is_const(op->args[1])) {
-                tmp = extract64(arg_info(op->args[1])->val,
-                                op->args[2], op->args[3]);
-                tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
-                continue;
-            }
-            break;
-
-        CASE_OP_32_64(sextract):
-            if (arg_is_const(op->args[1])) {
-                tmp = sextract64(arg_info(op->args[1])->val,
-                                 op->args[2], op->args[3]);
-                tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
-                continue;
-            }
-            break;
-
         default:
             break;
 
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(eqv):
             done = fold_eqv(&ctx, op);
             break;
+        CASE_OP_32_64(extract):
+            done = fold_extract(&ctx, op);
+            break;
         CASE_OP_32_64(extract2):
             done = fold_extract2(&ctx, op);
             break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
         case INDEX_op_setcond2_i32:
             done = fold_setcond2(&ctx, op);
             break;
+        CASE_OP_32_64(sextract):
+            done = fold_sextract(&ctx, op);
+            break;
         CASE_OP_32_64_VEC(sub):
             done = fold_sub(&ctx, op);
             break;
-- 
2.25.1

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -XXX,XX +XXX,XX @@ static bool fold_ctpop(OptContext *ctx, TCGOp *op)
     return fold_const1(ctx, op);
 }
 
+static bool fold_deposit(OptContext *ctx, TCGOp *op)
+{
+    if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
+        uint64_t t1 = arg_info(op->args[1])->val;
+        uint64_t t2 = arg_info(op->args[2])->val;
+
+        t1 = deposit64(t1, op->args[3], op->args[4], t2);
+        return tcg_opt_gen_movi(ctx, op, op->args[0], t1);
+    }
+    return false;
+}
+
 static bool fold_divide(OptContext *ctx, TCGOp *op)
 {
     return fold_const2(ctx, op);
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             }
             break;
 
-        CASE_OP_32_64(deposit):
-            if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
-                tmp = deposit64(arg_info(op->args[1])->val,
-                                op->args[3], op->args[4],
-                                arg_info(op->args[2])->val);
-                tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
-                continue;
-            }
-            break;
-
         default:
             break;
 
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(ctpop):
             done = fold_ctpop(&ctx, op);
             break;
+        CASE_OP_32_64(deposit):
+            done = fold_deposit(&ctx, op);
+            break;
         CASE_OP_32_64(div):
         CASE_OP_32_64(divu):
             done = fold_divide(&ctx, op);
-- 
2.25.1

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 32 ++++++++++++++++++--------------
 1 file changed, 18 insertions(+), 14 deletions(-)

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 27 ++++++++++++++++-----------
 1 file changed, 16 insertions(+), 11 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -XXX,XX +XXX,XX @@ static bool fold_brcond2(OptContext *ctx, TCGOp *op)
     return false;
 }
 
+static bool fold_bswap(OptContext *ctx, TCGOp *op)
+{
+    if (arg_is_const(op->args[1])) {
+        uint64_t t = arg_info(op->args[1])->val;
+
+        t = do_constant_folding(op->opc, t, op->args[2]);
+        return tcg_opt_gen_movi(ctx, op, op->args[0], t);
+    }
+    return false;
+}
+
 static bool fold_call(OptContext *ctx, TCGOp *op)
 {
     TCGContext *s = ctx->tcg;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             }
             break;
 
-        CASE_OP_32_64(bswap16):
-        CASE_OP_32_64(bswap32):
-        case INDEX_op_bswap64_i64:
-            if (arg_is_const(op->args[1])) {
-                tmp = do_constant_folding(opc, arg_info(op->args[1])->val,
-                                          op->args[2]);
-                tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
-                continue;
-            }
-            break;
-
         default:
             break;
 
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
         case INDEX_op_brcond2_i32:
             done = fold_brcond2(&ctx, op);
             break;
+        CASE_OP_32_64(bswap16):
+        CASE_OP_32_64(bswap32):
+        case INDEX_op_bswap64_i64:
+            done = fold_bswap(&ctx, op);
+            break;
         CASE_OP_32_64(clz):
         CASE_OP_32_64(ctz):
             done = fold_count_zeros(&ctx, op);
-- 
2.25.1

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 53 +++++++++++++++++++++++++++++---------------------
 1 file changed, 31 insertions(+), 22 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -XXX,XX +XXX,XX @@ static bool fold_divide(OptContext *ctx, TCGOp *op)
     return fold_const2(ctx, op);
 }
 
+static bool fold_dup(OptContext *ctx, TCGOp *op)
+{
+    if (arg_is_const(op->args[1])) {
+        uint64_t t = arg_info(op->args[1])->val;
+        t = dup_const(TCGOP_VECE(op), t);
+        return tcg_opt_gen_movi(ctx, op, op->args[0], t);
+    }
+    return false;
+}
+
+static bool fold_dup2(OptContext *ctx, TCGOp *op)
+{
+    if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
+        uint64_t t = deposit64(arg_info(op->args[1])->val, 32, 32,
+                               arg_info(op->args[2])->val);
+        return tcg_opt_gen_movi(ctx, op, op->args[0], t);
+    }
+
+    if (args_are_copies(op->args[1], op->args[2])) {
+        op->opc = INDEX_op_dup_vec;
+        TCGOP_VECE(op) = MO_32;
+    }
+    return false;
+}
+
 static bool fold_eqv(OptContext *ctx, TCGOp *op)
 {
     return fold_const2(ctx, op);
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             done = tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[1]);
             break;
 
-        case INDEX_op_dup_vec:
-            if (arg_is_const(op->args[1])) {
-                tmp = arg_info(op->args[1])->val;
-                tmp = dup_const(TCGOP_VECE(op), tmp);
-                tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
-                continue;
-            }
-            break;
-
-        case INDEX_op_dup2_vec:
-            assert(TCG_TARGET_REG_BITS == 32);
-            if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
-                tcg_opt_gen_movi(&ctx, op, op->args[0],
-                                 deposit64(arg_info(op->args[1])->val, 32, 32,
-                                           arg_info(op->args[2])->val));
-                continue;
-            } else if (args_are_copies(op->args[1], op->args[2])) {
-                op->opc = INDEX_op_dup_vec;
-                TCGOP_VECE(op) = MO_32;
-            }
-            break;
-
         default:
             break;
 
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(divu):
             done = fold_divide(&ctx, op);
             break;
+        case INDEX_op_dup_vec:
+            done = fold_dup(&ctx, op);
+            break;
+        case INDEX_op_dup2_vec:
+            done = fold_dup2(&ctx, op);
+            break;
         CASE_OP_32_64(eqv):
             done = fold_eqv(&ctx, op);
             break;
-- 
2.25.1

This is the final entry in the main switch that was in a
different form.  After this, we have the option to convert
the switch into a function dispatch table.

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)
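
A rough sketch of what the function dispatch table mentioned above
could look like. Everything here is hypothetical (the ToyOpcode enum,
the toy_fold_* functions and the table itself); it is not code from
tcg/optimize.c, only an illustration of replacing a switch with one
function pointer per opcode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy opcodes; not the real TCGOpcode enumeration. */
typedef enum { TOY_OP_ADD, TOY_OP_MOV, TOY_OP_NOP, TOY_OP_COUNT } ToyOpcode;

typedef struct {
    ToyOpcode opc;
    int folded_by;      /* records which handler ran, for illustration */
} ToyOp;

typedef bool (*ToyFoldFn)(ToyOp *op);

static bool toy_fold_add(ToyOp *op) { op->folded_by = 1; return false; }
static bool toy_fold_mov(ToyOp *op) { op->folded_by = 2; return true; }

/* One slot per opcode; a NULL slot plays the role of "default: break". */
static const ToyFoldFn toy_fold_table[TOY_OP_COUNT] = {
    [TOY_OP_ADD] = toy_fold_add,
    [TOY_OP_MOV] = toy_fold_mov,
};

/* Returns the handler's "done" flag, or false if there is no handler. */
static bool toy_dispatch(ToyOp *op)
{
    assert(op->opc < TOY_OP_COUNT);
    ToyFoldFn fn = toy_fold_table[op->opc];
    return fn ? fn(op) : false;
}
```

A table like this only works once every case has the same shape,
"done = fold_xxx(&ctx, op)", which is what the conversion in this
patch provides.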

diff --git a/tcg/optimize.c b/tcg/optimize.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -XXX,XX +XXX,XX @@ static bool fold_mb(OptContext *ctx, TCGOp *op)
     return true;
 }
 
+static bool fold_mov(OptContext *ctx, TCGOp *op)
+{
+    return tcg_opt_gen_mov(ctx, op, op->args[0], op->args[1]);
+}
+
 static bool fold_movcond(OptContext *ctx, TCGOp *op)
 {
     TCGOpcode opc = op->opc;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             break;
         }
 
-        /* Propagate constants through copy operations and do constant
-           folding.  Constants will be substituted to arguments by register
-           allocator where needed and possible.  Also detect copies. */
+        /*
+         * Process each opcode.
+         * Sorted alphabetically by opcode as much as possible.
+         */
         switch (opc) {
-        CASE_OP_32_64_VEC(mov):
-            done = tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[1]);
-            break;
-
-        default:
-            break;
-
-        /* ---------------------------------------------------------- */
-        /* Sorted alphabetically by opcode as much as possible. */
-
         CASE_OP_32_64_VEC(add):
             done = fold_add(&ctx, op);
             break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
         case INDEX_op_mb:
             done = fold_mb(&ctx, op);
             break;
+        CASE_OP_32_64_VEC(mov):
+            done = fold_mov(&ctx, op);
+            break;
         CASE_OP_32_64(movcond):
             done = fold_movcond(&ctx, op);
             break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64_VEC(xor):
             done = fold_xor(&ctx, op);
             break;
+        default:
+            break;
         }
 
         if (!done) {
-- 
2.25.1

Pull the "op r, a, a => movi r, 0" optimization into a function,
and use it in the outer opcode fold functions.

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 41 ++++++++++++++++++++++++-----------------
 1 file changed, 24 insertions(+), 17 deletions(-)
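
For reference, a small standalone model of why "op r, a, a" can become
"movi r, 0" for andc, sub and xor. The ToyBinop enum and both functions
are made up for this note and do not appear in tcg/optimize.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy operations; TOY_ADD is included as a counterexample. */
typedef enum { TOY_ANDC, TOY_SUB, TOY_XOR, TOY_ADD } ToyBinop;

/* Reference semantics for the toy binary operations. */
static uint64_t toy_eval(ToyBinop op, uint64_t a, uint64_t b)
{
    switch (op) {
    case TOY_ANDC: return a & ~b;
    case TOY_SUB:  return a - b;
    case TOY_XOR:  return a ^ b;
    case TOY_ADD:  return a + b;
    }
    assert(!"unknown op");
    return 0;
}

/* True if "op r, a, a" is 0 for every a, so it may become "movi r, 0". */
static bool toy_xx_folds_to_zero(ToyBinop op)
{
    return op == TOY_ANDC || op == TOY_SUB || op == TOY_XOR;
}
```

The add case shows why the helper has to be called per opcode rather
than applied blindly: "add r, a, a" is 2*a, not 0.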

diff --git a/tcg/optimize.c b/tcg/optimize.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -XXX,XX +XXX,XX @@ static bool fold_const2(OptContext *ctx, TCGOp *op)
     return false;
 }
 
+/* If the binary operation has both arguments equal, fold to @i. */
+static bool fold_xx_to_i(OptContext *ctx, TCGOp *op, uint64_t i)
+{
+    if (args_are_copies(op->args[1], op->args[2])) {
+        return tcg_opt_gen_movi(ctx, op, op->args[0], i);
+    }
+    return false;
+}
+
 /*
  * These outermost fold_<op> functions are sorted alphabetically.
  */
@@ -XXX,XX +XXX,XX @@ static bool fold_and(OptContext *ctx, TCGOp *op)
 
 static bool fold_andc(OptContext *ctx, TCGOp *op)
 {
-    return fold_const2(ctx, op);
+    if (fold_const2(ctx, op) ||
+        fold_xx_to_i(ctx, op, 0)) {
+        return true;
+    }
+    return false;
 }
 
 static bool fold_brcond(OptContext *ctx, TCGOp *op)
@@ -XXX,XX +XXX,XX @@ static bool fold_shift(OptContext *ctx, TCGOp *op)
 
 static bool fold_sub(OptContext *ctx, TCGOp *op)
 {
-    return fold_const2(ctx, op);
+    if (fold_const2(ctx, op) ||
+        fold_xx_to_i(ctx, op, 0)) {
+        return true;
+    }
+    return false;
 }
 
 static bool fold_sub2_i32(OptContext *ctx, TCGOp *op)
@@ -XXX,XX +XXX,XX @@ static bool fold_sub2_i32(OptContext *ctx, TCGOp *op)
 
 static bool fold_xor(OptContext *ctx, TCGOp *op)
 {
-    return fold_const2(ctx, op);
+    if (fold_const2(ctx, op) ||
+        fold_xx_to_i(ctx, op, 0)) {
+        return true;
+    }
+    return false;
 }
 
 /* Propagate constants and copies, fold constant expressions. */
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             break;
         }
 
-        /* Simplify expression for "op r, a, a => movi r, 0" cases */
-        switch (opc) {
-        CASE_OP_32_64_VEC(andc):
-        CASE_OP_32_64_VEC(sub):
-        CASE_OP_32_64_VEC(xor):
-            if (args_are_copies(op->args[1], op->args[2])) {
-                tcg_opt_gen_movi(&ctx, op, op->args[0], 0);
-                continue;
-            }
-            break;
-        default:
-            break;
-        }
-
         /*
          * Process each opcode.
          * Sorted alphabetically by opcode as much as possible.
-- 
2.25.1

Pull the "op r, a, a => mov r, a" optimization into a function,
and use it in the outer opcode fold functions.

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 39 ++++++++++++++++++++++++---------------
 1 file changed, 24 insertions(+), 15 deletions(-)
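
A minimal model of the ordering this patch documents (try the
constant-producing fold first, then the copy-producing one), using
"and" as the example. ToyArg, ToyResult and toy_fold_and are invented
for illustration and only loosely mirror the real helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical operand: either a known constant or an opaque temp id. */
typedef struct { bool is_const; uint64_t val; int temp_id; } ToyArg;

typedef enum { TOY_NONE, TOY_CONST, TOY_COPY } ToyKind;
typedef struct { ToyKind kind; uint64_t val; int temp_id; } ToyResult;

/* Two operands are "copies" if they are the same constant or temp. */
static bool toy_same(ToyArg a, ToyArg b)
{
    if (a.is_const != b.is_const) {
        return false;
    }
    return a.is_const ? a.val == b.val : a.temp_id == b.temp_id;
}

/* Fold "and r, a, b": constant result first, then copy, else give up. */
static ToyResult toy_fold_and(ToyArg a, ToyArg b)
{
    ToyResult r = { TOY_NONE, 0, -1 };

    if (a.is_const && b.is_const) {
        r.kind = TOY_CONST;
        r.val = a.val & b.val;
    } else if (toy_same(a, b)) {
        /* Both constant was handled above, so these are temps. */
        assert(!a.is_const);
        r.kind = TOY_COPY;
        r.temp_id = a.temp_id;
    }
    return r;
}
```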

diff --git a/tcg/optimize.c b/tcg/optimize.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -XXX,XX +XXX,XX @@ static bool fold_xx_to_i(OptContext *ctx, TCGOp *op, uint64_t i)
     return false;
 }
 
+/* If the binary operation has both arguments equal, fold to identity. */
+static bool fold_xx_to_x(OptContext *ctx, TCGOp *op)
+{
+    if (args_are_copies(op->args[1], op->args[2])) {
+        return tcg_opt_gen_mov(ctx, op, op->args[0], op->args[1]);
+    }
+    return false;
+}
+
 /*
  * These outermost fold_<op> functions are sorted alphabetically.
+ *
+ * The ordering of the transformations should be:
+ *   1) those that produce a constant
+ *   2) those that produce a copy
+ *   3) those that produce information about the result value.
  */
 
 static bool fold_add(OptContext *ctx, TCGOp *op)
@@ -XXX,XX +XXX,XX @@ static bool fold_add2_i32(OptContext *ctx, TCGOp *op)
 
 static bool fold_and(OptContext *ctx, TCGOp *op)
 {
-    return fold_const2(ctx, op);
+    if (fold_const2(ctx, op) ||
+        fold_xx_to_x(ctx, op)) {
+        return true;
+    }
+    return false;
 }
 
 static bool fold_andc(OptContext *ctx, TCGOp *op)
@@ -XXX,XX +XXX,XX @@ static bool fold_not(OptContext *ctx, TCGOp *op)
 
 static bool fold_or(OptContext *ctx, TCGOp *op)
 {
-    return fold_const2(ctx, op);
+    if (fold_const2(ctx, op) ||
+        fold_xx_to_x(ctx, op)) {
+        return true;
+    }
+    return false;
 }
 
 static bool fold_orc(OptContext *ctx, TCGOp *op)
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             break;
         }
 
-        /* Simplify expression for "op r, a, a => mov r, a" cases */
-        switch (opc) {
-        CASE_OP_32_64_VEC(or):
-        CASE_OP_32_64_VEC(and):
-            if (args_are_copies(op->args[1], op->args[2])) {
-                tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[1]);
-                continue;
-            }
-            break;
-        default:
-            break;
-        }
-
         /*
          * Process each opcode.
          * Sorted alphabetically by opcode as much as possible.
-- 
2.25.1

Pull the "op r, a, 0 => movi r, 0" optimization into a function,
and use it in the outer opcode fold functions.

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 38 ++++++++++++++++++++------------------
 1 file changed, 20 insertions(+), 18 deletions(-)
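
A quick standalone check of the "op r, a, 0 => movi r, 0" rule for the
four operations the removed switch covered (and, mul, muluh, mulsh).
It is written for 32-bit operands to stay portable; the toy_* names are
hypothetical and not part of QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* High 32 bits of the unsigned 32x32->64 product. */
static uint32_t toy_muluh32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* High 32 bits of the signed 32x32->64 product. */
static uint32_t toy_mulsh32(int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * b;
    return (uint32_t)((uint64_t)p >> 32);
}

/* Assert that a zero second operand forces a zero result. */
static void toy_check_zero_absorbs(uint32_t a)
{
    uint32_t zero = 0;

    assert((a & zero) == 0);
    assert((uint32_t)(a * zero) == 0);
    assert(toy_muluh32(a, zero) == 0);
    assert(toy_mulsh32((int32_t)a, 0) == 0);
}
```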

diff --git a/tcg/optimize.c b/tcg/optimize.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -XXX,XX +XXX,XX @@ static bool fold_const2(OptContext *ctx, TCGOp *op)
     return false;
 }
 
+/* If the binary operation has second argument @i, fold to @i. */
+static bool fold_xi_to_i(OptContext *ctx, TCGOp *op, uint64_t i)
+{
+    if (arg_is_const(op->args[2]) && arg_info(op->args[2])->val == i) {
+        return tcg_opt_gen_movi(ctx, op, op->args[0], i);
+    }
+    return false;
+}
+
 /* If the binary operation has both arguments equal, fold to @i. */
 static bool fold_xx_to_i(OptContext *ctx, TCGOp *op, uint64_t i)
 {
@@ -XXX,XX +XXX,XX @@ static bool fold_add2_i32(OptContext *ctx, TCGOp *op)
 static bool fold_and(OptContext *ctx, TCGOp *op)
 {
     if (fold_const2(ctx, op) ||
+        fold_xi_to_i(ctx, op, 0) ||
         fold_xx_to_x(ctx, op)) {
         return true;
     }
@@ -XXX,XX +XXX,XX @@ static bool fold_movcond(OptContext *ctx, TCGOp *op)
 
 static bool fold_mul(OptContext *ctx, TCGOp *op)
 {
-    return fold_const2(ctx, op);
+    if (fold_const2(ctx, op) ||
+        fold_xi_to_i(ctx, op, 0)) {
+        return true;
+    }
+    return false;
 }
 
 static bool fold_mul_highpart(OptContext *ctx, TCGOp *op)
 {
-    return fold_const2(ctx, op);
+    if (fold_const2(ctx, op) ||
+        fold_xi_to_i(ctx, op, 0)) {
+        return true;
+    }
+    return false;
 }
 
 static bool fold_mulu2_i32(OptContext *ctx, TCGOp *op)
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             continue;
         }
 
-        /* Simplify expression for "op r, a, 0 => movi r, 0" cases */
-        switch (opc) {
-        CASE_OP_32_64_VEC(and):
-        CASE_OP_32_64_VEC(mul):
-        CASE_OP_32_64(muluh):
-        CASE_OP_32_64(mulsh):
-            if (arg_is_const(op->args[2])
-                && arg_info(op->args[2])->val == 0) {
-                tcg_opt_gen_movi(&ctx, op, op->args[0], 0);
-                continue;
-            }
-            break;
-        default:
-            break;
-        }
-
         /*
          * Process each opcode.
          * Sorted alphabetically by opcode as much as possible.
-- 
2.25.1

Compute the type of the operation early.

At least four places used a def->flags ladder
to determine the type of the operation being optimized.

There were two places that assumed !TCG_OPF_64BIT means
TCG_TYPE_I32, and so could potentially compute incorrect
results for vector operations.

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 149 +++++++++++++++++++++++++++++--------------------
 1 file changed, 89 insertions(+), 60 deletions(-)
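
To make the vector hazard concrete, here is a small model of the two
ideas in this patch: derive the type once from the flags, and apply
the 32-bit sign extension only when the type really is I32. The flag
values, the ToyType enum and both functions are stand-ins invented for
this note, not the real TCG definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the flag bits and type enumeration. */
enum { TOY_OPF_64BIT = 1, TOY_OPF_VECTOR = 2 };
typedef enum { TOY_I32, TOY_I64, TOY_V64, TOY_V128, TOY_V256 } ToyType;

/* Derive the type once; vecl is the vector-length index (0, 1 or 2). */
static ToyType toy_op_type(unsigned flags, unsigned vecl)
{
    if (flags & TOY_OPF_VECTOR) {
        assert(vecl <= 2);
        return (ToyType)(TOY_V64 + vecl);
    }
    return (flags & TOY_OPF_64BIT) ? TOY_I64 : TOY_I32;
}

/* Only genuine 32-bit scalar results are sign-extended from bit 31. */
static uint64_t toy_finish_result(ToyType type, uint64_t res)
{
    if (type == TOY_I32) {
        res = (uint64_t)(int64_t)(int32_t)res;
    }
    return res;
}
```

In this model a vector op carries no 64BIT flag, so a test of the form
"!(flags & 64BIT)" would treat it as I32 and sign-extend its result,
while a comparison against the computed type does not.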

diff --git a/tcg/optimize.c b/tcg/optimize.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -XXX,XX +XXX,XX @@ typedef struct OptContext {
 
     /* In flight values from optimization. */
     uint64_t z_mask;
+    TCGType type;
 } OptContext;
 
 static inline TempOptInfo *ts_info(TCGTemp *ts)
@@ -XXX,XX +XXX,XX @@ static bool tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, TCGArg src)
 {
     TCGTemp *dst_ts = arg_temp(dst);
     TCGTemp *src_ts = arg_temp(src);
-    const TCGOpDef *def;
     TempOptInfo *di;
     TempOptInfo *si;
     uint64_t z_mask;
@@ -XXX,XX +XXX,XX @@ static bool tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, TCGArg src)
     reset_ts(dst_ts);
     di = ts_info(dst_ts);
     si = ts_info(src_ts);
-    def = &tcg_op_defs[op->opc];
-    if (def->flags & TCG_OPF_VECTOR) {
-        new_op = INDEX_op_mov_vec;
-    } else if (def->flags & TCG_OPF_64BIT) {
-        new_op = INDEX_op_mov_i64;
-    } else {
+
+    switch (ctx->type) {
+    case TCG_TYPE_I32:
         new_op = INDEX_op_mov_i32;
+        break;
+    case TCG_TYPE_I64:
+        new_op = INDEX_op_mov_i64;
+        break;
+    case TCG_TYPE_V64:
+    case TCG_TYPE_V128:
+    case TCG_TYPE_V256:
+        /* TCGOP_VECL and TCGOP_VECE remain unchanged.  */
+        new_op = INDEX_op_mov_vec;
+        break;
+    default:
+        g_assert_not_reached();
     }
     op->opc = new_op;
-    /* TCGOP_VECL and TCGOP_VECE remain unchanged.  */
     op->args[0] = dst;
     op->args[1] = src;
 
@@ -XXX,XX +XXX,XX @@ static bool tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, TCGArg src)
 static bool tcg_opt_gen_movi(OptContext *ctx, TCGOp *op,
                              TCGArg dst, uint64_t val)
 {
-    const TCGOpDef *def = &tcg_op_defs[op->opc];
-    TCGType type;
-    TCGTemp *tv;
-
-    if (def->flags & TCG_OPF_VECTOR) {
-        type = TCGOP_VECL(op) + TCG_TYPE_V64;
-    } else if (def->flags & TCG_OPF_64BIT) {
-        type = TCG_TYPE_I64;
-    } else {
-        type = TCG_TYPE_I32;
-    }
-
     /* Convert movi to mov with constant temp. */
-    tv = tcg_constant_internal(type, val);
+    TCGTemp *tv = tcg_constant_internal(ctx->type, val);
+
     init_ts_info(ctx, tv);
     return tcg_opt_gen_mov(ctx, op, dst, temp_arg(tv));
 }
@@ -XXX,XX +XXX,XX @@ static uint64_t do_constant_folding_2(TCGOpcode op, uint64_t x, uint64_t y)
     }
 }
 
-static uint64_t do_constant_folding(TCGOpcode op, uint64_t x, uint64_t y)
+static uint64_t do_constant_folding(TCGOpcode op, TCGType type,
+                                    uint64_t x, uint64_t y)
 {
-    const TCGOpDef *def = &tcg_op_defs[op];
     uint64_t res = do_constant_folding_2(op, x, y);
-    if (!(def->flags & TCG_OPF_64BIT)) {
+    if (type == TCG_TYPE_I32) {
         res = (int32_t)res;
     }
     return res;
@@ -XXX,XX +XXX,XX @@ static bool do_constant_folding_cond_eq(TCGCond c)
  * Return -1 if the condition can't be simplified,
  * and the result of the condition (0 or 1) if it can.
  */
-static int do_constant_folding_cond(TCGOpcode op, TCGArg x,
+static int do_constant_folding_cond(TCGType type, TCGArg x,
                                     TCGArg y, TCGCond c)
 {
     uint64_t xv = arg_info(x)->val;
     uint64_t yv = arg_info(y)->val;
 
     if (arg_is_const(x) && arg_is_const(y)) {
-        const TCGOpDef *def = &tcg_op_defs[op];
-        tcg_debug_assert(!(def->flags & TCG_OPF_VECTOR));
-        if (def->flags & TCG_OPF_64BIT) {
-            return do_constant_folding_cond_64(xv, yv, c);
-        } else {
+        switch (type) {
+        case TCG_TYPE_I32:
             return do_constant_folding_cond_32(xv, yv, c);
+        case TCG_TYPE_I64:
+            return do_constant_folding_cond_64(xv, yv, c);
+        default:
+            /* Only scalar comparisons are optimizable */
+            return -1;
         }
     } else if (args_are_copies(x, y)) {
         return do_constant_folding_cond_eq(c);
@@ -XXX,XX +XXX,XX @@ static bool fold_const1(OptContext *ctx, TCGOp *op)
         uint64_t t;
 
         t = arg_info(op->args[1])->val;
-        t = do_constant_folding(op->opc, t, 0);
+        t = do_constant_folding(op->opc, ctx->type, t, 0);
         return tcg_opt_gen_movi(ctx, op, op->args[0], t);
     }
     return false;
@@ -XXX,XX +XXX,XX @@ static bool fold_const2(OptContext *ctx, TCGOp *op)
         uint64_t t1 = arg_info(op->args[1])->val;
         uint64_t t2 = arg_info(op->args[2])->val;
 
-        t1 = do_constant_folding(op->opc, t1, t2);
+        t1 = do_constant_folding(op->opc, ctx->type, t1, t2);
         return tcg_opt_gen_movi(ctx, op, op->args[0], t1);
     }
     return false;
@@ -XXX,XX +XXX,XX @@ static bool fold_andc(OptContext *ctx, TCGOp *op)
 static bool fold_brcond(OptContext *ctx, TCGOp *op)
 {
     TCGCond cond = op->args[2];
-    int i = do_constant_folding_cond(op->opc, op->args[0], op->args[1], cond);
+    int i = do_constant_folding_cond(ctx->type, op->args[0], op->args[1], cond);
 
     if (i == 0) {
         tcg_op_remove(ctx->tcg, op);
@@ -XXX,XX +XXX,XX @@ static bool fold_brcond2(OptContext *ctx, TCGOp *op)
          * Simplify EQ/NE comparisons where one of the pairs
          * can be simplified.
          */
-        i = do_constant_folding_cond(INDEX_op_brcond_i32, op->args[0],
+        i = do_constant_folding_cond(TCG_TYPE_I32, op->args[0],
                                      op->args[2], cond);
         switch (i ^ inv) {
         case 0:
@@ -XXX,XX +XXX,XX @@ static bool fold_brcond2(OptContext *ctx, TCGOp *op)
             goto do_brcond_high;
         }
 
-        i = do_constant_folding_cond(INDEX_op_brcond_i32, op->args[1],
+        i = do_constant_folding_cond(TCG_TYPE_I32, op->args[1],
                                      op->args[3], cond);
         switch (i ^ inv) {
         case 0:
@@ -XXX,XX +XXX,XX @@ static bool fold_bswap(OptContext *ctx, TCGOp *op)
     if (arg_is_const(op->args[1])) {
         uint64_t t = arg_info(op->args[1])->val;
 
-        t = do_constant_folding(op->opc, t, op->args[2]);
+        t = do_constant_folding(op->opc, ctx->type, t, op->args[2]);
         return tcg_opt_gen_movi(ctx, op, op->args[0], t);
     }
     return false;
@@ -XXX,XX +XXX,XX @@ static bool fold_count_zeros(OptContext *ctx, TCGOp *op)
         uint64_t t = arg_info(op->args[1])->val;
 
         if (t != 0) {
-            t = do_constant_folding(op->opc, t, 0);
+            t = do_constant_folding(op->opc, ctx->type, t, 0);
             return tcg_opt_gen_movi(ctx, op, op->args[0], t);
         }
         return tcg_opt_gen_mov(ctx, op, op->args[0], op->args[2]);
@@ -XXX,XX +XXX,XX @@ static bool fold_mov(OptContext *ctx, TCGOp *op)
 
 static bool fold_movcond(OptContext *ctx, TCGOp *op)
 {
-    TCGOpcode opc = op->opc;
     TCGCond cond = op->args[5];
-    int i = do_constant_folding_cond(opc, op->args[1], op->args[2], cond);
+    int i = do_constant_folding_cond(ctx->type, op->args[1], op->args[2], cond);
 
     if (i >= 0) {
         return tcg_opt_gen_mov(ctx, op, op->args[0], op->args[4 - i]);
@@ -XXX,XX +XXX,XX @@ static bool fold_movcond(OptContext *ctx, TCGOp *op)
     if (arg_is_const(op->args[3]) && arg_is_const(op->args[4])) {
         uint64_t tv = arg_info(op->args[3])->val;
         uint64_t fv = arg_info(op->args[4])->val;
+        TCGOpcode opc;
 
-        opc = (opc == INDEX_op_movcond_i32
-               ? INDEX_op_setcond_i32 : INDEX_op_setcond_i64);
+        switch (ctx->type) {
+        case TCG_TYPE_I32:
+            opc = INDEX_op_setcond_i32;
+            break;
+        case TCG_TYPE_I64:
+            opc = INDEX_op_setcond_i64;
+            break;
+        default:
+            g_assert_not_reached();
+        }
 
         if (tv == 1 && fv == 0) {
             op->opc = opc;
@@ -XXX,XX +XXX,XX @@ static bool fold_remainder(OptContext *ctx, TCGOp *op)
 static bool fold_setcond(OptContext *ctx, TCGOp *op)
 {
     TCGCond cond = op->args[3];
-    int i = do_constant_folding_cond(op->opc, op->args[1], op->args[2], cond);
+    int i = do_constant_folding_cond(ctx->type, op->args[1], op->args[2], cond);
 
     if (i >= 0) {
         return tcg_opt_gen_movi(ctx, op, op->args[0], i);
@@ -XXX,XX +XXX,XX @@ static bool fold_setcond2(OptContext *ctx, TCGOp *op)
          * Simplify EQ/NE comparisons where one of the pairs
          * can be simplified.
          */
-        i = do_constant_folding_cond(INDEX_op_setcond_i32, op->args[1],
+        i = do_constant_folding_cond(TCG_TYPE_I32, op->args[1],
                                      op->args[3], cond);
         switch (i ^ inv) {
         case 0:
@@ -XXX,XX +XXX,XX @@ static bool fold_setcond2(OptContext *ctx, TCGOp *op)
             goto do_setcond_high;
         }
 
-        i = do_constant_folding_cond(INDEX_op_setcond_i32, op->args[2],
+        i = do_constant_folding_cond(TCG_TYPE_I32, op->args[2],
                                      op->args[4], cond);
         switch (i ^ inv) {
         case 0:
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
         init_arguments(&ctx, op, def->nb_oargs + def->nb_iargs);
         copy_propagate(&ctx, op, def->nb_oargs, def->nb_iargs);
 
+        /* Pre-compute the type of the operation. */
+        if (def->flags & TCG_OPF_VECTOR) {
+            ctx.type = TCG_TYPE_V64 + TCGOP_VECL(op);
+        } else if (def->flags & TCG_OPF_64BIT) {
+            ctx.type = TCG_TYPE_I64;
+        } else {
+            ctx.type = TCG_TYPE_I32;
+        }
+
         /* For commutative operations make constant second argument */
         switch (opc) {
         CASE_OP_32_64_VEC(add):
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                     /* Proceed with possible constant folding. */
                     break;
                 }
-                if (opc == INDEX_op_sub_i32) {
+                switch (ctx.type) {
+                case TCG_TYPE_I32:
                     neg_op = INDEX_op_neg_i32;
                     have_neg = TCG_TARGET_HAS_neg_i32;
-                } else if (opc == INDEX_op_sub_i64) {
+                    break;
+                case TCG_TYPE_I64:
                     neg_op = INDEX_op_neg_i64;
                     have_neg = TCG_TARGET_HAS_neg_i64;
-                } else if (TCG_TARGET_HAS_neg_vec) {
-                    TCGType type = TCGOP_VECL(op) + TCG_TYPE_V64;
-                    unsigned vece = TCGOP_VECE(op);
-                    neg_op = INDEX_op_neg_vec;
-                    have_neg = tcg_can_emit_vec_op(neg_op, type, vece) > 0;
-                } else {
                     break;
+                case TCG_TYPE_V64:
+                case TCG_TYPE_V128:
+                case TCG_TYPE_V256:
+                    neg_op = INDEX_op_neg_vec;
+                    have_neg = tcg_can_emit_vec_op(neg_op, ctx.type,
+                                                   TCGOP_VECE(op)) > 0;
+                    break;
+                default:
+                    g_assert_not_reached();
                 }
                 if (!have_neg) {
                     break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                 TCGOpcode not_op;
                 bool have_not;
 
-                if (def->flags & TCG_OPF_VECTOR) {
-                    not_op = INDEX_op_not_vec;
-                    have_not = TCG_TARGET_HAS_not_vec;
-                } else if (def->flags & TCG_OPF_64BIT) {
-                    not_op = INDEX_op_not_i64;
-                    have_not = TCG_TARGET_HAS_not_i64;
-                } else {
+                switch (ctx.type) {
+                case TCG_TYPE_I32:
                     not_op = INDEX_op_not_i32;
                     have_not = TCG_TARGET_HAS_not_i32;
+                    break;
+                case TCG_TYPE_I64:
+                    not_op = INDEX_op_not_i64;
+                    have_not = TCG_TARGET_HAS_not_i64;
+                    break;
+                case TCG_TYPE_V64:
+                case TCG_TYPE_V128:
+                case TCG_TYPE_V256:
+                    not_op = INDEX_op_not_vec;
+                    have_not = TCG_TARGET_HAS_not_vec;
+                    break;
+                default:
+                    g_assert_not_reached();
                 }
                 if (!have_not) {
                     break;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
            below, we can ignore high bits, but for further optimizations we
            need to record that the high bits contain garbage.  */
         partmask = z_mask;
-        if (!(def->flags & TCG_OPF_64BIT)) {
+        if (ctx.type == TCG_TYPE_I32) {
             z_mask |= ~(tcg_target_ulong)0xffffffffu;
             partmask &= 0xffffffffu;
             affected &= 0xffffffffu;
-- 
2.25.1

Split out the conditional conversion from a more complex logical
operation to a simple NOT.  Create a couple more helpers to make
this easy for the outer-most logical operations.

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 158 +++++++++++++++++++++++++++----------------------
 1 file changed, 86 insertions(+), 72 deletions(-)
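
For readers checking the algebra, a standalone sketch of identities
that reduce a two-operand logical operation to NOT when one operand is
a particular constant. The pairing of each identity with the "ix" or
"xi" helper shape is my reading of the helper names, not a statement of
which callers this patch adds; the toy_* functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t toy_andc(uint64_t a, uint64_t b) { return a & ~b; }
static uint64_t toy_orc(uint64_t a, uint64_t b)  { return a | ~b; }
static uint64_t toy_eqv(uint64_t a, uint64_t b)  { return ~(a ^ b); }
static uint64_t toy_nand(uint64_t a, uint64_t b) { return ~(a & b); }
static uint64_t toy_nor(uint64_t a, uint64_t b)  { return ~(a | b); }

/* Assert that each special-case operand reduces the operation to NOT x. */
static void toy_check_not_identities(uint64_t x)
{
    const uint64_t ones = ~(uint64_t)0;

    /* Constant in the first operand: the "ix" shape. */
    assert(toy_andc(ones, x) == ~x);
    assert(toy_orc(0, x) == ~x);

    /* Constant in the second operand: the "xi" shape. */
    assert((x ^ ones) == ~x);
    assert(toy_eqv(x, 0) == ~x);
    assert(toy_nand(x, ones) == ~x);
    assert(toy_nor(x, 0) == ~x);
}
```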

diff --git a/tcg/optimize.c b/tcg/optimize.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -XXX,XX +XXX,XX @@ static bool fold_const2(OptContext *ctx, TCGOp *op)
     return false;
 }
 
+/*
+ * Convert @op to NOT, if NOT is supported by the host.
+ * Return true if the conversion is successful, which will still
+ * indicate that the processing is complete.
+ */
+static bool fold_not(OptContext *ctx, TCGOp *op);
+static bool fold_to_not(OptContext *ctx, TCGOp *op, int idx)
+{
+    TCGOpcode not_op;
+    bool have_not;
+
+    switch (ctx->type) {
+    case TCG_TYPE_I32:
+        not_op = INDEX_op_not_i32;
+        have_not = TCG_TARGET_HAS_not_i32;
+        break;
+    case TCG_TYPE_I64:
+        not_op = INDEX_op_not_i64;
+        have_not = TCG_TARGET_HAS_not_i64;
+        break;
+    case TCG_TYPE_V64:
+    case TCG_TYPE_V128:
+    case TCG_TYPE_V256:
+        not_op = INDEX_op_not_vec;
+        have_not = TCG_TARGET_HAS_not_vec;
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    if (have_not) {
+        op->opc = not_op;
+        op->args[1] = op->args[idx];
+        return fold_not(ctx, op);
+    }
+    return false;
+}
+
+/* If the binary operation has first argument @i, fold to NOT. */
+static bool fold_ix_to_not(OptContext *ctx, TCGOp *op, uint64_t i)
+{
+    if (arg_is_const(op->args[1]) && arg_info(op->args[1])->val == i) {
+        return fold_to_not(ctx, op, 2);
+    }
+    return false;
+}
+
 /* If the binary operation has second argument @i, fold to @i. */
 static bool fold_xi_to_i(OptContext *ctx, TCGOp *op, uint64_t i)
 {
@@ -XXX,XX +XXX,XX @@ static bool fold_xi_to_i(OptContext *ctx, TCGOp *op, uint64_t i)
     return false;
 }
 
+/* If the binary operation has second argument @i, fold to NOT. */
+static bool fold_xi_to_not(OptContext *ctx, TCGOp *op, uint64_t i)
+{
+    if (arg_is_const(op->args[2]) && arg_info(op->args[2])->val == i) {
+        return fold_to_not(ctx, op, 1);
+    }
+    return false;
+}
+
 /* If the binary operation has both arguments equal, fold to @i. */
 static bool fold_xx_to_i(OptContext *ctx, TCGOp *op, uint64_t i)
 {
@@ -XXX,XX +XXX,XX @@ static bool fold_and(OptContext *ctx, TCGOp *op)
 static bool fold_andc(OptContext *ctx, TCGOp *op)
 {
     if (fold_const2(ctx, op) ||
-        fold_xx_to_i(ctx, op, 0)) {
+        fold_xx_to_i(ctx, op, 0) ||
+        fold_ix_to_not(ctx, op, -1)) {
         return true;
     }
     return false;
@@ -XXX,XX +XXX,XX @@ static bool fold_dup2(OptContext *ctx, TCGOp *op)
 
 static bool fold_eqv(OptContext *ctx, TCGOp *op)
 {
-    return fold_const2(ctx, op);
+    if (fold_const2(ctx, op) ||
+        fold_xi_to_not(ctx, op, 0)) {
+        return true;
+    }
+    return false;
 }
 
 static bool fold_extract(OptContext *ctx, TCGOp *op)
@@ -XXX,XX +XXX,XX @@ static bool fold_mulu2_i32(OptContext *ctx, TCGOp *op)
 
 static bool fold_nand(OptContext *ctx, TCGOp *op)
 {
-    return fold_const2(ctx, op);
+    if (fold_const2(ctx, op) ||
+        fold_xi_to_not(ctx, op, -1)) {
+        return true;
+    }
+    return false;
 }
 
 static bool fold_neg(OptContext *ctx, TCGOp *op)
@@ -XXX,XX +XXX,XX @@ static bool fold_neg(OptContext *ctx, TCGOp *op)
 
 static bool fold_nor(OptContext *ctx, TCGOp *op)
 {
-    return fold_const2(ctx, op);
+    if (fold_const2(ctx, op) ||
+        fold_xi_to_not(ctx, op, 0)) {
+        return true;
+    }
+    return false;
 }
 
 static bool fold_not(OptContext *ctx, TCGOp *op)
 {
-    return fold_const1(ctx, op);
+    if (fold_const1(ctx, op)) {
+        return true;
+    }
+
+    /* Because of fold_to_not, always return true, via finish_folding. */
+    finish_folding(ctx, op);
+    return true;
 }
 
 static bool fold_or(OptContext *ctx, TCGOp *op)
@@ -XXX,XX +XXX,XX @@ static bool fold_or(OptContext *ctx, TCGOp *op)
 
 static bool fold_orc(OptContext *ctx, TCGOp *op)
 {
-    return fold_const2(ctx, op);
+    if (fold_const2(ctx, op) ||
+        fold_ix_to_not(ctx, op, 0)) {
+        return true;
+    }
+    return false;
 }
 
 static bool fold_qemu_ld(OptContext *ctx, TCGOp *op)
@@ -XXX,XX +XXX,XX @@ static bool fold_sub2_i32(OptContext *ctx, TCGOp *op)
 static bool fold_xor(OptContext *ctx, TCGOp *op)
 {
     if (fold_const2(ctx, op) ||
-        fold_xx_to_i(ctx, op, 0)) {
+        fold_xx_to_i(ctx, op, 0) ||
+        fold_xi_to_not(ctx, op, -1)) {
         return true;
     }
     return false;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                 }
             }
             break;
-        CASE_OP_32_64_VEC(xor):
-        CASE_OP_32_64(nand):
-            if (!arg_is_const(op->args[1])
-                && arg_is_const(op->args[2])
-                && arg_info(op->args[2])->val == -1) {
-                i = 1;
-                goto try_not;
-            }
-            break;
-        CASE_OP_32_64(nor):
-            if (!arg_is_const(op->args[1])
-                && arg_is_const(op->args[2])
-                && arg_info(op->args[2])->val == 0) {
-                i = 1;
-                goto try_not;
-            }
-            break;
-        CASE_OP_32_64_VEC(andc):
-            if (!arg_is_const(op->args[2])
-                && arg_is_const(op->args[1])
-                && arg_info(op->args[1])->val == -1) {
-                i = 2;
-                goto try_not;
-            }
-            break;
-        CASE_OP_32_64_VEC(orc):
-        CASE_OP_32_64(eqv):
-            if (!arg_is_const(op->args[2])
-                && arg_is_const(op->args[1])
-                && arg_info(op->args[1])->val == 0) {
-                i = 2;
-                goto try_not;
-            }
-            break;
-        try_not:
-            {
-                TCGOpcode not_op;
-                bool have_not;
-
-                switch (ctx.type) {
-                case TCG_TYPE_I32:
-                    not_op = INDEX_op_not_i32;
-                    have_not = TCG_TARGET_HAS_not_i32;
-                    break;
-                case TCG_TYPE_I64:
-                    not_op = INDEX_op_not_i64;
-                    have_not = TCG_TARGET_HAS_not_i64;
-                    break;
-                case TCG_TYPE_V64:
-                case TCG_TYPE_V128:
-                case TCG_TYPE_V256:
-                    not_op = INDEX_op_not_vec;
-                    have_not = TCG_TARGET_HAS_not_vec;
-                    break;
-                default:
-                    g_assert_not_reached();
-                }
-                if (!have_not) {
-                    break;
-                }
-                op->opc = not_op;
-                reset_temp(op->args[0]);
-                op->args[1] = op->args[i];
-                continue;
-            }
         default:
             break;
         }
-- 
2.25.1

Even though there is only one user, place this more complex
conversion into its own helper.
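
The shape of the rewrite can be sketched on a miniature IR. Everything below (mini_opc, mini_arg, mini_op, mini_fold_sub_to_neg and the host_has_neg flag) is hypothetical and only mirrors the control flow of the helper in the diff; it is not QEMU's TCGOp API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical miniature IR; these are not QEMU's TCGOp/TCGArg types. */
enum mini_opc { MINI_SUB, MINI_NEG };

struct mini_arg {
    bool is_const;
    uint64_t val;       /* meaningful only when is_const is true */
    int temp;           /* temp index, meaningful otherwise */
};

struct mini_op {
    enum mini_opc opc;
    struct mini_arg args[3];    /* args[0] = output, args[1..2] = inputs */
};

/*
 * Rewrite "sub r, 0, b" as "neg r, b" when the host has a negate op.
 * Returns true if the op was rewritten.
 */
static bool mini_fold_sub_to_neg(struct mini_op *op, bool host_has_neg)
{
    assert(op->opc == MINI_SUB);

    if (!op->args[1].is_const || op->args[1].val != 0) {
        return false;           /* first input is not the constant 0 */
    }
    if (!host_has_neg) {
        return false;           /* keep the sub; the host cannot negate */
    }
    op->opc = MINI_NEG;
    op->args[1] = op->args[2];  /* the old subtrahend becomes the operand */
    return true;
}
```

As in the patch, the constant test comes first, so an op that cannot match never reaches the host-capability check.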

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 89 ++++++++++++++++++++++++++------------------------
 1 file changed, 47 insertions(+), 42 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -XXX,XX +XXX,XX @@ static bool fold_nand(OptContext *ctx, TCGOp *op)
 
 static bool fold_neg(OptContext *ctx, TCGOp *op)
 {
-    return fold_const1(ctx, op);
+    if (fold_const1(ctx, op)) {
+        return true;
+    }
+    /*
+     * Because of fold_sub_to_neg, we want to always return true,
+     * via finish_folding.
+     */
+    finish_folding(ctx, op);
+    return true;
 }
 
 static bool fold_nor(OptContext *ctx, TCGOp *op)
@@ -XXX,XX +XXX,XX @@ static bool fold_shift(OptContext *ctx, TCGOp *op)
     return fold_const2(ctx, op);
 }
 
+static bool fold_sub_to_neg(OptContext *ctx, TCGOp *op)
+{
+    TCGOpcode neg_op;
+    bool have_neg;
+
+    if (!arg_is_const(op->args[1]) || arg_info(op->args[1])->val != 0) {
+        return false;
+    }
+
+    switch (ctx->type) {
+    case TCG_TYPE_I32:
+        neg_op = INDEX_op_neg_i32;
+        have_neg = TCG_TARGET_HAS_neg_i32;
+        break;
+    case TCG_TYPE_I64:
+        neg_op = INDEX_op_neg_i64;
+        have_neg = TCG_TARGET_HAS_neg_i64;
+        break;
+    case TCG_TYPE_V64:
+    case TCG_TYPE_V128:
+    case TCG_TYPE_V256:
+        neg_op = INDEX_op_neg_vec;
+        have_neg = (TCG_TARGET_HAS_neg_vec &&
+                    tcg_can_emit_vec_op(neg_op, ctx->type, TCGOP_VECE(op)) > 0);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    if (have_neg) {
+        op->opc = neg_op;
+        op->args[1] = op->args[2];
+        return fold_neg(ctx, op);
+    }
+    return false;
+}
+
 static bool fold_sub(OptContext *ctx, TCGOp *op)
 {
     if (fold_const2(ctx, op) ||
-        fold_xx_to_i(ctx, op, 0)) {
+        fold_xx_to_i(ctx, op, 0) ||
+        fold_sub_to_neg(ctx, op)) {
         return true;
     }
     return false;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
                 continue;
             }
             break;
-        CASE_OP_32_64_VEC(sub):
-            {
-                TCGOpcode neg_op;
-                bool have_neg;
-
-                if (arg_is_const(op->args[2])) {
-                    /* Proceed with possible constant folding. */
-                    break;
-                }
-                switch (ctx.type) {
-                case TCG_TYPE_I32:
-                    neg_op = INDEX_op_neg_i32;
-                    have_neg = TCG_TARGET_HAS_neg_i32;
-                    break;
-                case TCG_TYPE_I64:
-                    neg_op = INDEX_op_neg_i64;
-                    have_neg = TCG_TARGET_HAS_neg_i64;
-                    break;
-                case TCG_TYPE_V64:
-                case TCG_TYPE_V128:
-                case TCG_TYPE_V256:
-                    neg_op = INDEX_op_neg_vec;
-                    have_neg = tcg_can_emit_vec_op(neg_op, ctx.type,
-                                                   TCGOP_VECE(op)) > 0;
-                    break;
-                default:
-                    g_assert_not_reached();
-                }
-                if (!have_neg) {
-                    break;
-                }
-                if (arg_is_const(op->args[1])
-                    && arg_info(op->args[1])->val == 0) {
-                    op->opc = neg_op;
-                    reset_temp(op->args[0]);
-                    op->args[1] = op->args[2];
-                    continue;
-                }
-            }
-            break;
         default:
             break;
         }
-- 
2.25.1

Pull the "op r, a, i => mov r, a" optimization into a function,
and use it in the outer-most logical operations.
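
Each caller passes the constant that acts as a right identity for its operation. A table-driven sketch of those pairs, using plain-C models of the ops, is given below; the id_* functions, the xi_to_x table and count_xi_to_x_failures are names invented for this illustration, and only shl and rotl stand in for the five shift and rotate ops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t (*binop_fn)(uint64_t, uint64_t);

/* Plain-C models of the ops; the names are illustrative only. */
static uint64_t id_add(uint64_t a, uint64_t b)  { return a + b; }
static uint64_t id_sub(uint64_t a, uint64_t b)  { return a - b; }
static uint64_t id_and(uint64_t a, uint64_t b)  { return a & b; }
static uint64_t id_or(uint64_t a, uint64_t b)   { return a | b; }
static uint64_t id_xor(uint64_t a, uint64_t b)  { return a ^ b; }
static uint64_t id_andc(uint64_t a, uint64_t b) { return a & ~b; }
static uint64_t id_orc(uint64_t a, uint64_t b)  { return a | ~b; }
static uint64_t id_eqv(uint64_t a, uint64_t b)  { return ~(a ^ b); }
static uint64_t id_shl(uint64_t a, uint64_t b)  { return a << (b & 63); }
static uint64_t id_rotl(uint64_t a, uint64_t b)
{
    return (a << (b & 63)) | (a >> (-b & 63));
}

/* (operation, constant i) pairs for which "op x, i" is just x. */
static const struct { const char *name; binop_fn fn; uint64_t i; } xi_to_x[] = {
    { "add",  id_add,  0 },
    { "sub",  id_sub,  0 },
    { "or",   id_or,   0 },
    { "xor",  id_xor,  0 },
    { "andc", id_andc, 0 },
    { "shl",  id_shl,  0 },
    { "rotl", id_rotl, 0 },
    { "and",  id_and,  UINT64_MAX },
    { "orc",  id_orc,  UINT64_MAX },
    { "eqv",  id_eqv,  UINT64_MAX },
};

/* Return the number of table entries for which fn(x, i) != x. */
static size_t count_xi_to_x_failures(uint64_t x)
{
    size_t failures = 0;

    for (size_t k = 0; k < sizeof(xi_to_x) / sizeof(xi_to_x[0]); k++) {
        assert(xi_to_x[k].fn != NULL);
        if (xi_to_x[k].fn(x, xi_to_x[k].i) != x) {
            failures++;
        }
    }
    return failures;
}
```

The table has the same two groups as the switch removed from tcg_optimize: constant 0 for the first group and constant -1 for and, orc and eqv.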

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 61 +++++++++++++++++++++-----------------------------
 1 file changed, 26 insertions(+), 35 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -XXX,XX +XXX,XX @@ static bool fold_xi_to_i(OptContext *ctx, TCGOp *op, uint64_t i)
     return false;
 }
 
+/* If the binary operation has second argument @i, fold to identity. */
+static bool fold_xi_to_x(OptContext *ctx, TCGOp *op, uint64_t i)
+{
+    if (arg_is_const(op->args[2]) && arg_info(op->args[2])->val == i) {
+        return tcg_opt_gen_mov(ctx, op, op->args[0], op->args[1]);
+    }
+    return false;
+}
+
 /* If the binary operation has second argument @i, fold to NOT. */
 static bool fold_xi_to_not(OptContext *ctx, TCGOp *op, uint64_t i)
 {
@@ -XXX,XX +XXX,XX @@ static bool fold_xx_to_x(OptContext *ctx, TCGOp *op)
 
 static bool fold_add(OptContext *ctx, TCGOp *op)
 {
-    return fold_const2(ctx, op);
+    if (fold_const2(ctx, op) ||
+        fold_xi_to_x(ctx, op, 0)) {
+        return true;
+    }
+    return false;
 }
 
 static bool fold_addsub2_i32(OptContext *ctx, TCGOp *op, bool add)
@@ -XXX,XX +XXX,XX @@ static bool fold_and(OptContext *ctx, TCGOp *op)
 {
     if (fold_const2(ctx, op) ||
         fold_xi_to_i(ctx, op, 0) ||
+        fold_xi_to_x(ctx, op, -1) ||
         fold_xx_to_x(ctx, op)) {
         return true;
     }
@@ -XXX,XX +XXX,XX @@ static bool fold_andc(OptContext *ctx, TCGOp *op)
 {
     if (fold_const2(ctx, op) ||
         fold_xx_to_i(ctx, op, 0) ||
+        fold_xi_to_x(ctx, op, 0) ||
         fold_ix_to_not(ctx, op, -1)) {
         return true;
     }
@@ -XXX,XX +XXX,XX @@ static bool fold_dup2(OptContext *ctx, TCGOp *op)
 static bool fold_eqv(OptContext *ctx, TCGOp *op)
 {
     if (fold_const2(ctx, op) ||
+        fold_xi_to_x(ctx, op, -1) ||
         fold_xi_to_not(ctx, op, 0)) {
         return true;
     }
@@ -XXX,XX +XXX,XX @@ static bool fold_not(OptContext *ctx, TCGOp *op)
 static bool fold_or(OptContext *ctx, TCGOp *op)
 {
     if (fold_const2(ctx, op) ||
+        fold_xi_to_x(ctx, op, 0) ||
         fold_xx_to_x(ctx, op)) {
         return true;
     }
@@ -XXX,XX +XXX,XX @@ static bool fold_or(OptContext *ctx, TCGOp *op)
 static bool fold_orc(OptContext *ctx, TCGOp *op)
 {
     if (fold_const2(ctx, op) ||
+        fold_xi_to_x(ctx, op, -1) ||
         fold_ix_to_not(ctx, op, 0)) {
         return true;
     }
@@ -XXX,XX +XXX,XX @@ static bool fold_sextract(OptContext *ctx, TCGOp *op)
 
 static bool fold_shift(OptContext *ctx, TCGOp *op)
 {
-    return fold_const2(ctx, op);
+    if (fold_const2(ctx, op) ||
+        fold_xi_to_x(ctx, op, 0)) {
+        return true;
+    }
+    return false;
 }
 
 static bool fold_sub_to_neg(OptContext *ctx, TCGOp *op)
@@ -XXX,XX +XXX,XX @@ static bool fold_sub(OptContext *ctx, TCGOp *op)
 {
     if (fold_const2(ctx, op) ||
         fold_xx_to_i(ctx, op, 0) ||
+        fold_xi_to_x(ctx, op, 0) ||
         fold_sub_to_neg(ctx, op)) {
         return true;
     }
@@ -XXX,XX +XXX,XX @@ static bool fold_xor(OptContext *ctx, TCGOp *op)
 {
     if (fold_const2(ctx, op) ||
         fold_xx_to_i(ctx, op, 0) ||
+        fold_xi_to_x(ctx, op, 0) ||
         fold_xi_to_not(ctx, op, -1)) {
         return true;
     }
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             break;
         }
 
-        /* Simplify expression for "op r, a, const => mov r, a" cases */
-        switch (opc) {
-        CASE_OP_32_64_VEC(add):
-        CASE_OP_32_64_VEC(sub):
-        CASE_OP_32_64_VEC(or):
-        CASE_OP_32_64_VEC(xor):
-        CASE_OP_32_64_VEC(andc):
-        CASE_OP_32_64(shl):
-        CASE_OP_32_64(shr):
-        CASE_OP_32_64(sar):
-        CASE_OP_32_64(rotl):
-        CASE_OP_32_64(rotr):
-            if (!arg_is_const(op->args[1])
-                && arg_is_const(op->args[2])
-                && arg_info(op->args[2])->val == 0) {
-                tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[1]);
-                continue;
-            }
-            break;
-        CASE_OP_32_64_VEC(and):
-        CASE_OP_32_64_VEC(orc):
-        CASE_OP_32_64(eqv):
-            if (!arg_is_const(op->args[1])
-                && arg_is_const(op->args[2])
-                && arg_info(op->args[2])->val == -1) {
-                tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[1]);
-                continue;
-            }
-            break;
-        default:
-            break;
-        }
-
         /* Simplify using known-zero bits. Currently only ops with a single
            output argument is supported. */
         z_mask = -1;
-- 
2.25.1

Pull the "op r, 0, b => movi r, 0" optimization into a function,
and use it in fold_shift.
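
The fold is valid because shifting or rotating zero by any count leaves zero. A small check in plain C follows; the z_* functions and check_shift_of_zero are hypothetical names used only for this sketch, with counts masked to 0..63 to keep the C shifts well defined.

```c
#include <assert.h>
#include <stdint.h>

/* Plain-C models of the 64-bit shift and rotate ops (illustrative names). */
static uint64_t z_shl(uint64_t a, unsigned n)  { return a << (n & 63); }
static uint64_t z_shr(uint64_t a, unsigned n)  { return a >> (n & 63); }
static uint64_t z_sar(uint64_t a, unsigned n)
{
    /* Right shift of a negative value is implementation-defined in C;
       only the value 0 is exercised below, which is always well defined. */
    return (uint64_t)((int64_t)a >> (n & 63));
}
static uint64_t z_rotl(uint64_t a, unsigned n)
{
    return (a << (n & 63)) | (a >> (-n & 63));
}
static uint64_t z_rotr(uint64_t a, unsigned n)
{
    return (a >> (n & 63)) | (a << (-n & 63));
}

/* Assert "op r, 0, b => movi r, 0" for every count b in 0..63. */
static void check_shift_of_zero(void)
{
    for (unsigned n = 0; n < 64; n++) {
        assert(z_shl(0, n) == 0);
        assert(z_shr(0, n) == 0);
        assert(z_sar(0, n) == 0);
        assert(z_rotl(0, n) == 0);
        assert(z_rotr(0, n) == 0);
    }
}
```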

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 28 ++++++++++------------------
 1 file changed, 10 insertions(+), 18 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -XXX,XX +XXX,XX @@ static bool fold_to_not(OptContext *ctx, TCGOp *op, int idx)
     return false;
 }
 
+/* If the binary operation has first argument @i, fold to @i. */
+static bool fold_ix_to_i(OptContext *ctx, TCGOp *op, uint64_t i)
+{
+    if (arg_is_const(op->args[1]) && arg_info(op->args[1])->val == i) {
+        return tcg_opt_gen_movi(ctx, op, op->args[0], i);
+    }
+    return false;
+}
+
 /* If the binary operation has first argument @i, fold to NOT. */
 static bool fold_ix_to_not(OptContext *ctx, TCGOp *op, uint64_t i)
 {
@@ -XXX,XX +XXX,XX @@ static bool fold_sextract(OptContext *ctx, TCGOp *op)
 static bool fold_shift(OptContext *ctx, TCGOp *op)
 {
     if (fold_const2(ctx, op) ||
+        fold_ix_to_i(ctx, op, 0) ||
         fold_xi_to_x(ctx, op, 0)) {
         return true;
     }
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             break;
         }
 
-        /* Simplify expressions for "shift/rot r, 0, a => movi r, 0",
-           and "sub r, 0, a => neg r, a" case.  */
-        switch (opc) {
-        CASE_OP_32_64(shl):
-        CASE_OP_32_64(shr):
-        CASE_OP_32_64(sar):
-        CASE_OP_32_64(rotl):
-        CASE_OP_32_64(rotr):
-            if (arg_is_const(op->args[1])
-                && arg_info(op->args[1])->val == 0) {
-                tcg_opt_gen_movi(&ctx, op, op->args[0], 0);
-                continue;
-            }
-            break;
-        default:
-            break;
-        }
-
         /* Simplify using known-zero bits. Currently only ops with a single
            output argument is supported. */
         z_mask = -1;
-- 
2.25.1

Move all of the known-zero optimizations into the per-opcode
functions.  Use fold_masks when there is a possibility of the
result being determined, and simply set ctx->z_mask otherwise.
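
As a rough illustration of the two masks, the sketch below models the decision made by fold_masks for a 64-bit AND with a constant. The enum and both function names are hypothetical, the 32-bit truncation step is omitted, and the sketch assumes that a constant's z_mask equals its value.

```c
#include <assert.h>
#include <stdint.h>

enum mask_fold { MASK_FOLD_NONE, MASK_FOLD_ZERO, MASK_FOLD_COPY };

/* Decision step modelled on fold_masks, for a 64-bit operation. */
static enum mask_fold decide_from_masks(uint64_t z_mask, uint64_t a_mask)
{
    if (z_mask == 0) {
        return MASK_FOLD_ZERO;  /* no result bit can be set: movi r, 0 */
    }
    if (a_mask == 0) {
        return MASK_FOLD_COPY;  /* no input bit is changed: mov r, a */
    }
    return MASK_FOLD_NONE;
}

/*
 * "and r, a, c" with c constant.  z1 has a 1 for every bit of a that
 * may be set.  Assumption: the z_mask of a constant is its value.
 */
static enum mask_fold fold_and_const_sketch(uint64_t z1, uint64_t c)
{
    uint64_t z2 = c;
    uint64_t z_mask = z1 & z2;      /* bits that may be set in the result */
    uint64_t a_mask = z1 & ~z2;     /* input bits the AND may clear */

    /* The two masks split z1 into disjoint parts. */
    assert((z_mask & a_mask) == 0);
    return decide_from_masks(z_mask, a_mask);
}
```

For instance, an input known to fit in the low byte (z1 = 0xff) ANDed with 0xff00 folds to zero, while the same input ANDed with 0xffff folds to a copy.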

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 545 ++++++++++++++++++++++++++-----------------------
 1 file changed, 294 insertions(+), 251 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -XXX,XX +XXX,XX @@ typedef struct OptContext {
     TCGTempSet temps_used;
 
     /* In flight values from optimization. */
-    uint64_t z_mask;
+    uint64_t a_mask;  /* mask bit is 0 iff value identical to first input */
+    uint64_t z_mask;  /* mask bit is 0 iff value bit is 0 */
     TCGType type;
 } OptContext;
 
@@ -XXX,XX +XXX,XX @@ static bool fold_const2(OptContext *ctx, TCGOp *op)
     return false;
 }
 
+static bool fold_masks(OptContext *ctx, TCGOp *op)
+{
+    uint64_t a_mask = ctx->a_mask;
+    uint64_t z_mask = ctx->z_mask;
+
+    /*
+     * 32-bit ops generate 32-bit results.  For the result-is-zero test
+     * below, we can ignore high bits, but for further optimizations we
+     * need to record that the high bits contain garbage.
+     */
+    if (ctx->type == TCG_TYPE_I32) {
+        ctx->z_mask |= MAKE_64BIT_MASK(32, 32);
+        a_mask &= MAKE_64BIT_MASK(0, 32);
+        z_mask &= MAKE_64BIT_MASK(0, 32);
+    }
+
+    if (z_mask == 0) {
+        return tcg_opt_gen_movi(ctx, op, op->args[0], 0);
+    }
+    if (a_mask == 0) {
+        return tcg_opt_gen_mov(ctx, op, op->args[0], op->args[1]);
+    }
+    return false;
+}
+
 /*
  * Convert @op to NOT, if NOT is supported by the host.
  * Return true if the conversion is successful, which will still
@@ -XXX,XX +XXX,XX @@ static bool fold_add2_i32(OptContext *ctx, TCGOp *op)
 
 static bool fold_and(OptContext *ctx, TCGOp *op)
 {
+    uint64_t z1, z2;
+
     if (fold_const2(ctx, op) ||
         fold_xi_to_i(ctx, op, 0) ||
         fold_xi_to_x(ctx, op, -1) ||
         fold_xx_to_x(ctx, op)) {
         return true;
     }
-    return false;
+
+    z1 = arg_info(op->args[1])->z_mask;
+    z2 = arg_info(op->args[2])->z_mask;
+    ctx->z_mask = z1 & z2;
+
+    /*
+     * Known-zeros does not imply known-ones.  Therefore unless
+     * arg2 is constant, we can't infer affected bits from it.
+     */
+    if (arg_is_const(op->args[2])) {
+        ctx->a_mask = z1 & ~z2;
+    }
+
+    return fold_masks(ctx, op);
 }
 
 static bool fold_andc(OptContext *ctx, TCGOp *op)
 {
+    uint64_t z1;
+
     if (fold_const2(ctx, op) ||
         fold_xx_to_i(ctx, op, 0) ||
         fold_xi_to_x(ctx, op, 0) ||
         fold_ix_to_not(ctx, op, -1)) {
         return true;
     }
-    return false;
+
+    z1 = arg_info(op->args[1])->z_mask;
+
+    /*
+     * Known-zeros does not imply known-ones.  Therefore unless
+     * arg2 is constant, we can't infer anything from it.
+     */
+    if (arg_is_const(op->args[2])) {
+        uint64_t z2 = ~arg_info(op->args[2])->z_mask;
+        ctx->a_mask = z1 & ~z2;
+        z1 &= z2;
+    }
+    ctx->z_mask = z1;
+
+    return fold_masks(ctx, op);
 }
 
 static bool fold_brcond(OptContext *ctx, TCGOp *op)
@@ -XXX,XX +XXX,XX @@ static bool fold_brcond2(OptContext *ctx, TCGOp *op)
 
 static bool fold_bswap(OptContext *ctx, TCGOp *op)
 {
+    uint64_t z_mask, sign;
+
     if (arg_is_const(op->args[1])) {
         uint64_t t = arg_info(op->args[1])->val;
 
         t = do_constant_folding(op->opc, ctx->type, t, op->args[2]);
         return tcg_opt_gen_movi(ctx, op, op->args[0], t);
     }
-    return false;
+
+    z_mask = arg_info(op->args[1])->z_mask;
+    switch (op->opc) {
+    case INDEX_op_bswap16_i32:
+    case INDEX_op_bswap16_i64:
+        z_mask = bswap16(z_mask);
+        sign = INT16_MIN;
+        break;
+    case INDEX_op_bswap32_i32:
+    case INDEX_op_bswap32_i64:
+        z_mask = bswap32(z_mask);
+        sign = INT32_MIN;
+        break;
+    case INDEX_op_bswap64_i64:
+        z_mask = bswap64(z_mask);
+        sign = INT64_MIN;
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    switch (op->args[2] & (TCG_BSWAP_OZ | TCG_BSWAP_OS)) {
+    case TCG_BSWAP_OZ:
+        break;
+    case TCG_BSWAP_OS:
+        /* If the sign bit may be 1, force all the bits above to 1. */
+        if (z_mask & sign) {
+            z_mask |= sign;
+        }
+        break;
+    default:
+        /* The high bits are undefined: force all bits above the sign to 1. */
+        z_mask |= sign << 1;
+        break;
+    }
+    ctx->z_mask = z_mask;
+
+    return fold_masks(ctx, op);
 }
 
 static bool fold_call(OptContext *ctx, TCGOp *op)
@@ -XXX,XX +XXX,XX @@ static bool fold_call(OptContext *ctx, TCGOp *op)
 
 static bool fold_count_zeros(OptContext *ctx, TCGOp *op)
 {
+    uint64_t z_mask;
+
     if (arg_is_const(op->args[1])) {
         uint64_t t = arg_info(op->args[1])->val;
 
@@ -XXX,XX +XXX,XX @@ static bool fold_count_zeros(OptContext *ctx, TCGOp *op)
         }
         return tcg_opt_gen_mov(ctx, op, op->args[0], op->args[2]);
     }
+
+    switch (ctx->type) {
+    case TCG_TYPE_I32:
+        z_mask = 31;
+        break;
+    case TCG_TYPE_I64:
+        z_mask = 63;
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    ctx->z_mask = arg_info(op->args[2])->z_mask | z_mask;
+
     return false;
 }
 
 static bool fold_ctpop(OptContext *ctx, TCGOp *op)
 {
-    return fold_const1(ctx, op);
+    if (fold_const1(ctx, op)) {
+        return true;
+    }
+
+    switch (ctx->type) {
+    case TCG_TYPE_I32:
+        ctx->z_mask = 32 | 31;
+        break;
+    case TCG_TYPE_I64:
+        ctx->z_mask = 64 | 63;
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    return false;
 }
 
 static bool fold_deposit(OptContext *ctx, TCGOp *op)
@@ -XXX,XX +XXX,XX @@ static bool fold_deposit(OptContext *ctx, TCGOp *op)
         t1 = deposit64(t1, op->args[3], op->args[4], t2);
         return tcg_opt_gen_movi(ctx, op, op->args[0], t1);
     }
+
+    ctx->z_mask = deposit64(arg_info(op->args[1])->z_mask,
+                            op->args[3], op->args[4],
+                            arg_info(op->args[2])->z_mask);
     return false;
 }
 
@@ -XXX,XX +XXX,XX @@ static bool fold_eqv(OptContext *ctx, TCGOp *op)
 
 static bool fold_extract(OptContext *ctx, TCGOp *op)
 {
+    uint64_t z_mask_old, z_mask;
+
     if (arg_is_const(op->args[1])) {
         uint64_t t;
 
@@ -XXX,XX +XXX,XX @@ static bool fold_extract(OptContext *ctx, TCGOp *op)
         t = extract64(t, op->args[2], op->args[3]);
         return tcg_opt_gen_movi(ctx, op, op->args[0], t);
     }
-    return false;
+
+    z_mask_old = arg_info(op->args[1])->z_mask;
+    z_mask = extract64(z_mask_old, op->args[2], op->args[3]);
+    if (op->args[2] == 0) {
+        ctx->a_mask = z_mask_old ^ z_mask;
+    }
+    ctx->z_mask = z_mask;
+
+    return fold_masks(ctx, op);
 }
 
 static bool fold_extract2(OptContext *ctx, TCGOp *op)
@@ -XXX,XX +XXX,XX @@ static bool fold_extract2(OptContext *ctx, TCGOp *op)
 
 static bool fold_exts(OptContext *ctx, TCGOp *op)
 {
-    return fold_const1(ctx, op);
+    uint64_t z_mask_old, z_mask, sign;
+    bool type_change = false;
+
+    if (fold_const1(ctx, op)) {
+        return true;
+    }
+
+    z_mask_old = z_mask = arg_info(op->args[1])->z_mask;
+
+    switch (op->opc) {
+    CASE_OP_32_64(ext8s):
+        sign = INT8_MIN;
+        z_mask = (uint8_t)z_mask;
+        break;
+    CASE_OP_32_64(ext16s):
+        sign = INT16_MIN;
+        z_mask = (uint16_t)z_mask;
+        break;
+    case INDEX_op_ext_i32_i64:
+        type_change = true;
+        QEMU_FALLTHROUGH;
+    case INDEX_op_ext32s_i64:
+        sign = INT32_MIN;
+        z_mask = (uint32_t)z_mask;
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    if (z_mask & sign) {
+        z_mask |= sign;
+    } else if (!type_change) {
+        ctx->a_mask = z_mask_old ^ z_mask;
+    }
+    ctx->z_mask = z_mask;
+
+    return fold_masks(ctx, op);
 }
 
 static bool fold_extu(OptContext *ctx, TCGOp *op)
 {
-    return fold_const1(ctx, op);
+    uint64_t z_mask_old, z_mask;
+    bool type_change = false;
+
+    if (fold_const1(ctx, op)) {
+        return true;
+    }
+
+    z_mask_old = z_mask = arg_info(op->args[1])->z_mask;
+
+    switch (op->opc) {
+    CASE_OP_32_64(ext8u):
+        z_mask = (uint8_t)z_mask;
+        break;
+    CASE_OP_32_64(ext16u):
+        z_mask = (uint16_t)z_mask;
+        break;
+    case INDEX_op_extrl_i64_i32:
+    case INDEX_op_extu_i32_i64:
+        type_change = true;
+        QEMU_FALLTHROUGH;
+    case INDEX_op_ext32u_i64:
+        z_mask = (uint32_t)z_mask;
+        break;
+    case INDEX_op_extrh_i64_i32:
+        type_change = true;
+        z_mask >>= 32;
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    ctx->z_mask = z_mask;
+    if (!type_change) {
+        ctx->a_mask = z_mask_old ^ z_mask;
+    }
+    return fold_masks(ctx, op);
 }
 
 static bool fold_mb(OptContext *ctx, TCGOp *op)
@@ -XXX,XX +XXX,XX @@ static bool fold_movcond(OptContext *ctx, TCGOp *op)
         return tcg_opt_gen_mov(ctx, op, op->args[0], op->args[4 - i]);
     }
 
+    ctx->z_mask = arg_info(op->args[3])->z_mask
+                | arg_info(op->args[4])->z_mask;
+
     if (arg_is_const(op->args[3]) && arg_is_const(op->args[4])) {
         uint64_t tv = arg_info(op->args[3])->val;
         uint64_t fv = arg_info(op->args[4])->val;
@@ -XXX,XX +XXX,XX @@ static bool fold_nand(OptContext *ctx, TCGOp *op)
 
 static bool fold_neg(OptContext *ctx, TCGOp *op)
 {
+    uint64_t z_mask;
+
     if (fold_const1(ctx, op)) {
         return true;
     }
+
+    /* Set to 1 the rightmost possibly-set bit and all bits to its left.  */
+    z_mask = arg_info(op->args[1])->z_mask;
+    ctx->z_mask = -(z_mask & -z_mask);
+
     /*
      * Because of fold_sub_to_neg, we want to always return true,
      * via finish_folding.
@@ -XXX,XX +XXX,XX @@ static bool fold_or(OptContext *ctx, TCGOp *op)
         fold_xx_to_x(ctx, op)) {
         return true;
     }
-    return false;
+
+    ctx->z_mask = arg_info(op->args[1])->z_mask
+                | arg_info(op->args[2])->z_mask;
+    return fold_masks(ctx, op);
 }
 
 static bool fold_orc(OptContext *ctx, TCGOp *op)
@@ -XXX,XX +XXX,XX @@ static bool fold_orc(OptContext *ctx, TCGOp *op)
 
 static bool fold_qemu_ld(OptContext *ctx, TCGOp *op)
 {
+    const TCGOpDef *def = &tcg_op_defs[op->opc];
+    MemOpIdx oi = op->args[def->nb_oargs + def->nb_iargs];
+    MemOp mop = get_memop(oi);
+    int width = 8 * memop_size(mop);
+
+    if (!(mop & MO_SIGN) && width < 64) {
+        ctx->z_mask = MAKE_64BIT_MASK(0, width);
+    }
+
     /* Opcodes that touch guest memory stop the mb optimization.  */
     ctx->prev_mb = NULL;
     return false;
@@ -XXX,XX +XXX,XX @@ static bool fold_setcond(OptContext *ctx, TCGOp *op)
     if (i >= 0) {
         return tcg_opt_gen_movi(ctx, op, op->args[0], i);
     }
+
+    ctx->z_mask = 1;
     return false;
 }
 
@@ -XXX,XX +XXX,XX @@ static bool fold_setcond2(OptContext *ctx, TCGOp *op)
         op->opc = INDEX_op_setcond_i32;
         break;
     }
+
+    ctx->z_mask = 1;
     return false;
 
  do_setcond_const:
@@ -XXX,XX +XXX,XX @@ static bool fold_setcond2(OptContext *ctx, TCGOp *op)
 
 static bool fold_sextract(OptContext *ctx, TCGOp *op)
 {
+    int64_t z_mask_old, z_mask;
+
     if (arg_is_const(op->args[1])) {
         uint64_t t;
 
@@ -XXX,XX +XXX,XX @@ static bool fold_sextract(OptContext *ctx, TCGOp *op)
         t = sextract64(t, op->args[2], op->args[3]);
         return tcg_opt_gen_movi(ctx, op, op->args[0], t);
     }
-    return false;
+
+    z_mask_old = arg_info(op->args[1])->z_mask;
+    z_mask = sextract64(z_mask_old, op->args[2], op->args[3]);
+    if (op->args[2] == 0 && z_mask >= 0) {
+        ctx->a_mask = z_mask_old ^ z_mask;
+    }
+    ctx->z_mask = z_mask;
+
+    return fold_masks(ctx, op);
 }
 
 static bool fold_shift(OptContext *ctx, TCGOp *op)
@@ -XXX,XX +XXX,XX @@ static bool fold_shift(OptContext *ctx, TCGOp *op)
         fold_xi_to_x(ctx, op, 0)) {
         return true;
     }
+
+    if (arg_is_const(op->args[2])) {
+        ctx->z_mask = do_constant_folding(op->opc, ctx->type,
+                                          arg_info(op->args[1])->z_mask,
+                                          arg_info(op->args[2])->val);
+        return fold_masks(ctx, op);
+    }
     return false;
 }
 
@@ -XXX,XX +XXX,XX @@ static bool fold_sub2_i32(OptContext *ctx, TCGOp *op)
     return fold_addsub2_i32(ctx, op, false);
 }
 
+static bool fold_tcg_ld(OptContext *ctx, TCGOp *op)
+{
+    /* We can't do any folding with a load, but we can record bits. */
+    switch (op->opc) {
+    CASE_OP_32_64(ld8u):
+        ctx->z_mask = MAKE_64BIT_MASK(0, 8);
+        break;
+    CASE_OP_32_64(ld16u):
+        ctx->z_mask = MAKE_64BIT_MASK(0, 16);
+        break;
+    case INDEX_op_ld32u_i64:
+        ctx->z_mask = MAKE_64BIT_MASK(0, 32);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    return false;
+}
+
 static bool fold_xor(OptContext *ctx, TCGOp *op)
 {
     if (fold_const2(ctx, op) ||
@@ -XXX,XX +XXX,XX @@ static bool fold_xor(OptContext *ctx, TCGOp *op)
         fold_xi_to_not(ctx, op, -1)) {
         return true;
     }
-    return false;
+
+    ctx->z_mask = arg_info(op->args[1])->z_mask
+                | arg_info(op->args[2])->z_mask;
+    return fold_masks(ctx, op);
 }
 
 /* Propagate constants and copies, fold constant expressions. */
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
     }
 
     QTAILQ_FOREACH_SAFE(op, &s->ops, link, op_next) {
-        uint64_t z_mask, partmask, affected, tmp;
         TCGOpcode opc = op->opc;
         const TCGOpDef *def;
         bool done = false;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             break;
         }
 
-        /* Simplify using known-zero bits. Currently only ops with a single
-           output argument is supported. */
-        z_mask = -1;
-        affected = -1;
-        switch (opc) {
-        CASE_OP_32_64(ext8s):
-            if ((arg_info(op->args[1])->z_mask & 0x80) != 0) {
-                break;
-            }
-            QEMU_FALLTHROUGH;
-        CASE_OP_32_64(ext8u):
-            z_mask = 0xff;
-            goto and_const;
-        CASE_OP_32_64(ext16s):
-            if ((arg_info(op->args[1])->z_mask & 0x8000) != 0) {
-                break;
-            }
-            QEMU_FALLTHROUGH;
-        CASE_OP_32_64(ext16u):
-            z_mask = 0xffff;
-            goto and_const;
-        case INDEX_op_ext32s_i64:
-            if ((arg_info(op->args[1])->z_mask & 0x80000000) != 0) {
-                break;
-            }
-            QEMU_FALLTHROUGH;
-        case INDEX_op_ext32u_i64:
-            z_mask = 0xffffffffU;
-            goto and_const;
-
-        CASE_OP_32_64(and):
-            z_mask = arg_info(op->args[2])->z_mask;
-            if (arg_is_const(op->args[2])) {
-        and_const:
-                affected = arg_info(op->args[1])->z_mask & ~z_mask;
-            }
-            z_mask = arg_info(op->args[1])->z_mask & z_mask;
-            break;
-
-        case INDEX_op_ext_i32_i64:
-            if ((arg_info(op->args[1])->z_mask & 0x80000000) != 0) {
-                break;
-            }
-            QEMU_FALLTHROUGH;
-        case INDEX_op_extu_i32_i64:
-            /* We do not compute affected as it is a size changing op.  */
-            z_mask = (uint32_t)arg_info(op->args[1])->z_mask;
-            break;
-
-        CASE_OP_32_64(andc):
-            /* Known-zeros does not imply known-ones.  Therefore unless
-               op->args[2] is constant, we can't infer anything from it.  */
-            if (arg_is_const(op->args[2])) {
-                z_mask = ~arg_info(op->args[2])->z_mask;
-                goto and_const;
-            }
-            /* But we certainly know nothing outside args[1] may be set. */
-            z_mask = arg_info(op->args[1])->z_mask;
-            break;
-
-        case INDEX_op_sar_i32:
-            if (arg_is_const(op->args[2])) {
-                tmp = arg_info(op->args[2])->val & 31;
-                z_mask = (int32_t)arg_info(op->args[1])->z_mask >> tmp;
-            }
-            break;
-        case INDEX_op_sar_i64:
-            if (arg_is_const(op->args[2])) {
-                tmp = arg_info(op->args[2])->val & 63;
-                z_mask = (int64_t)arg_info(op->args[1])->z_mask >> tmp;
-            }
-            break;
-
-        case INDEX_op_shr_i32:
-            if (arg_is_const(op->args[2])) {
-                tmp = arg_info(op->args[2])->val & 31;
-                z_mask = (uint32_t)arg_info(op->args[1])->z_mask >> tmp;
-            }
-            break;
-        case INDEX_op_shr_i64:
-            if (arg_is_const(op->args[2])) {
-                tmp = arg_info(op->args[2])->val & 63;
-                z_mask = (uint64_t)arg_info(op->args[1])->z_mask >> tmp;
-            }
-            break;
-
-        case INDEX_op_extrl_i64_i32:
-            z_mask = (uint32_t)arg_info(op->args[1])->z_mask;
-            break;
-        case INDEX_op_extrh_i64_i32:
-            z_mask = (uint64_t)arg_info(op->args[1])->z_mask >> 32;
-            break;
-
-        CASE_OP_32_64(shl):
-            if (arg_is_const(op->args[2])) {
-                tmp = arg_info(op->args[2])->val & (TCG_TARGET_REG_BITS - 1);
-                z_mask = arg_info(op->args[1])->z_mask << tmp;
-            }
-            break;
-
-        CASE_OP_32_64(neg):
-            /* Set to 1 all bits to the left of the rightmost.  */
-            z_mask = -(arg_info(op->args[1])->z_mask
-                       & -arg_info(op->args[1])->z_mask);
-            break;
-
-        CASE_OP_32_64(deposit):
-            z_mask = deposit64(arg_info(op->args[1])->z_mask,
-                               op->args[3], op->args[4],
-                               arg_info(op->args[2])->z_mask);
-            break;
-
-        CASE_OP_32_64(extract):
-            z_mask = extract64(arg_info(op->args[1])->z_mask,
-                               op->args[2], op->args[3]);
-            if (op->args[2] == 0) {
-                affected = arg_info(op->args[1])->z_mask & ~z_mask;
-            }
-            break;
-        CASE_OP_32_64(sextract):
-            z_mask = sextract64(arg_info(op->args[1])->z_mask,
-                                op->args[2], op->args[3]);
-            if (op->args[2] == 0 && (tcg_target_long)z_mask >= 0) {
-                affected = arg_info(op->args[1])->z_mask & ~z_mask;
-            }
-            break;
-
-        CASE_OP_32_64(or):
-        CASE_OP_32_64(xor):
-            z_mask = arg_info(op->args[1])->z_mask
-                   | arg_info(op->args[2])->z_mask;
-            break;
-
-        case INDEX_op_clz_i32:
-        case INDEX_op_ctz_i32:
-            z_mask = arg_info(op->args[2])->z_mask | 31;
-            break;
-
-        case INDEX_op_clz_i64:
-        case INDEX_op_ctz_i64:
-            z_mask = arg_info(op->args[2])->z_mask | 63;
-            break;
-
-        case INDEX_op_ctpop_i32:
-            z_mask = 32 | 31;
-            break;
-        case INDEX_op_ctpop_i64:
-            z_mask = 64 | 63;
-            break;
-
-        CASE_OP_32_64(setcond):
-        case INDEX_op_setcond2_i32:
-            z_mask = 1;
-            break;
-
-        CASE_OP_32_64(movcond):
-            z_mask = arg_info(op->args[3])->z_mask
-                   | arg_info(op->args[4])->z_mask;
-            break;
-
-        CASE_OP_32_64(ld8u):
-            z_mask = 0xff;
-            break;
-        CASE_OP_32_64(ld16u):
-            z_mask = 0xffff;
-            break;
-        case INDEX_op_ld32u_i64:
-            z_mask = 0xffffffffu;
-            break;
-
-        CASE_OP_32_64(qemu_ld):
-            {
-                MemOpIdx oi = op->args[def->nb_oargs + def->nb_iargs];
-                MemOp mop = get_memop(oi);
-                if (!(mop & MO_SIGN)) {
-                    z_mask = (2ULL << ((8 << (mop & MO_SIZE)) - 1)) - 1;
-                }
-            }
-            break;
-
-        CASE_OP_32_64(bswap16):
-            z_mask = arg_info(op->args[1])->z_mask;
-            if (z_mask <= 0xffff) {
-                op->args[2] |= TCG_BSWAP_IZ;
-            }
-            z_mask = bswap16(z_mask);
-            switch (op->args[2] & (TCG_BSWAP_OZ | TCG_BSWAP_OS)) {
-            case TCG_BSWAP_OZ:
-                break;
-            case TCG_BSWAP_OS:
-                z_mask = (int16_t)z_mask;
-                break;
-            default: /* undefined high bits */
-                z_mask |= MAKE_64BIT_MASK(16, 48);
-                break;
-            }
-            break;
-
-        case INDEX_op_bswap32_i64:
-            z_mask = arg_info(op->args[1])->z_mask;
-            if (z_mask <= 0xffffffffu) {
-                op->args[2] |= TCG_BSWAP_IZ;
-            }
-            z_mask = bswap32(z_mask);
-            switch (op->args[2] & (TCG_BSWAP_OZ | TCG_BSWAP_OS)) {
-            case TCG_BSWAP_OZ:
-                break;
-            case TCG_BSWAP_OS:
-                z_mask = (int32_t)z_mask;
-                break;
-            default: /* undefined high bits */
-                z_mask |= MAKE_64BIT_MASK(32, 32);
-                break;
-            }
-            break;
-
-        default:
-            break;
-        }
-
-        /* 32-bit ops generate 32-bit results.  For the result is zero test
-           below, we can ignore high bits, but for further optimizations we
-           need to record that the high bits contain garbage.  */
-        partmask = z_mask;
-        if (ctx.type == TCG_TYPE_I32) {
-            z_mask |= ~(tcg_target_ulong)0xffffffffu;
-            partmask &= 0xffffffffu;
-            affected &= 0xffffffffu;
-        }
-        ctx.z_mask = z_mask;
-
-        if (partmask == 0) {
-            tcg_opt_gen_movi(&ctx, op, op->args[0], 0);
-            continue;
-        }
-        if (affected == 0) {
-            tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[1]);
-            continue;
-        }
+        /* Assume all bits affected, and no bits known zero. */
+        ctx.a_mask = -1;
+        ctx.z_mask = -1;
 
         /*
          * Process each opcode.
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
         case INDEX_op_extrh_i64_i32:
             done = fold_extu(&ctx, op);
             break;
+        CASE_OP_32_64(ld8u):
+        CASE_OP_32_64(ld16u):
+        case INDEX_op_ld32u_i64:
+            done = fold_tcg_ld(&ctx, op);
+            break;
         case INDEX_op_mb:
             done = fold_mb(&ctx, op);
             break;
-- 
2.25.1

Rename to fold_multiply2, and handle muls2_i32, mulu2_i64,
and muls2_i64.
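
For illustration only, the 32-bit folding can be sketched outside of
QEMU as below.  The names Pair64, fold_mulu2_32 and fold_muls2_32 are
hypothetical and do not exist in tcg/optimize.c; the sketch assumes the
usual two's-complement narrowing that GCC and Clang provide.  Both
halves come back sign-extended from 32 bits, which is how the patch
stores its 32-bit constants.

```c
#include <stdint.h>

/* Hypothetical result pair: the low and high halves of the product. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} Pair64;

/* Unsigned 32x32->64 multiply; each half is sign-extended from 32 bits. */
static Pair64 fold_mulu2_32(uint64_t a, uint64_t b)
{
    uint64_t p = (uint64_t)(uint32_t)a * (uint32_t)b;
    Pair64 r;

    r.lo = (uint64_t)(int64_t)(int32_t)p;
    r.hi = (uint64_t)(int64_t)(int32_t)(p >> 32);
    return r;
}

/* Signed 32x32->64 multiply; each half is sign-extended from 32 bits. */
static Pair64 fold_muls2_32(uint64_t a, uint64_t b)
{
    uint64_t p = (uint64_t)((int64_t)(int32_t)a * (int32_t)b);
    Pair64 r;

    r.lo = (uint64_t)(int64_t)(int32_t)p;
    r.hi = (uint64_t)(int64_t)(int32_t)(p >> 32);
    return r;
}
```

The 64-bit cases need a full 128-bit product, which the patch obtains
from mulu64 and muls64 rather than from plain C arithmetic.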

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 44 +++++++++++++++++++++++++++++++++++---------
 1 file changed, 35 insertions(+), 9 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -XXX,XX +XXX,XX @@ static bool fold_mul_highpart(OptContext *ctx, TCGOp *op)
     return false;
 }
 
-static bool fold_mulu2_i32(OptContext *ctx, TCGOp *op)
+static bool fold_multiply2(OptContext *ctx, TCGOp *op)
 {
     if (arg_is_const(op->args[2]) && arg_is_const(op->args[3])) {
-        uint32_t a = arg_info(op->args[2])->val;
-        uint32_t b = arg_info(op->args[3])->val;
-        uint64_t r = (uint64_t)a * b;
+        uint64_t a = arg_info(op->args[2])->val;
+        uint64_t b = arg_info(op->args[3])->val;
+        uint64_t h, l;
         TCGArg rl, rh;
-        TCGOp *op2 = tcg_op_insert_before(ctx->tcg, op, INDEX_op_mov_i32);
+        TCGOp *op2;
+
+        switch (op->opc) {
+        case INDEX_op_mulu2_i32:
+            l = (uint64_t)(uint32_t)a * (uint32_t)b;
+            h = (int32_t)(l >> 32);
+            l = (int32_t)l;
+            break;
+        case INDEX_op_muls2_i32:
+            l = (int64_t)(int32_t)a * (int32_t)b;
+            h = l >> 32;
+            l = (int32_t)l;
+            break;
+        case INDEX_op_mulu2_i64:
+            mulu64(&l, &h, a, b);
+            break;
+        case INDEX_op_muls2_i64:
+            muls64(&l, &h, a, b);
+            break;
+        default:
+            g_assert_not_reached();
+        }
 
         rl = op->args[0];
         rh = op->args[1];
-        tcg_opt_gen_movi(ctx, op, rl, (int32_t)r);
-        tcg_opt_gen_movi(ctx, op2, rh, (int32_t)(r >> 32));
+
+        /* The proper opcode is supplied by tcg_opt_gen_mov. */
+        op2 = tcg_op_insert_before(ctx->tcg, op, 0);
+
+        tcg_opt_gen_movi(ctx, op, rl, l);
+        tcg_opt_gen_movi(ctx, op2, rh, h);
         return true;
     }
     return false;
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(muluh):
             done = fold_mul_highpart(&ctx, op);
             break;
-        case INDEX_op_mulu2_i32:
-            done = fold_mulu2_i32(&ctx, op);
+        CASE_OP_32_64(muls2):
+        CASE_OP_32_64(mulu2):
+            done = fold_multiply2(&ctx, op);
             break;
         CASE_OP_32_64(nand):
             done = fold_nand(&ctx, op);
-- 
2.25.1

Rename to fold_addsub2.
Use Int128 to implement the wider operation.
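
As a rough sketch of what the wider operation computes, the 64-bit case
can be written with explicit carry and borrow propagation.  This is a
stand-in technique, not the patch's method: the patch uses the Int128
helpers from qemu/int128.h, and the names U128Parts and addsub2_64 are
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pair holding the two 64-bit halves of a 128-bit value. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} U128Parts;

/* Compute (ah:al) + (bh:bl) or (ah:al) - (bh:bl) modulo 2^128. */
static U128Parts addsub2_64(uint64_t al, uint64_t ah,
                            uint64_t bl, uint64_t bh, bool add)
{
    U128Parts r;

    if (add) {
        r.lo = al + bl;
        /* Unsigned wrap-around of the low half means a carry out. */
        r.hi = ah + bh + (r.lo < al);
    } else {
        r.lo = al - bl;
        /* A borrow is needed when the low minuend is the smaller one. */
        r.hi = ah - bh - (al < bl);
    }
    return r;
}
```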

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 65 ++++++++++++++++++++++++++++++++++----------------
 1 file changed, 44 insertions(+), 21 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -XXX,XX +XXX,XX @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/int128.h"
 #include "tcg/tcg-op.h"
 #include "tcg-internal.h"
 
@@ -XXX,XX +XXX,XX @@ static bool fold_add(OptContext *ctx, TCGOp *op)
     return false;
 }
 
-static bool fold_addsub2_i32(OptContext *ctx, TCGOp *op, bool add)
+static bool fold_addsub2(OptContext *ctx, TCGOp *op, bool add)
 {
     if (arg_is_const(op->args[2]) && arg_is_const(op->args[3]) &&
         arg_is_const(op->args[4]) && arg_is_const(op->args[5])) {
-        uint32_t al = arg_info(op->args[2])->val;
-        uint32_t ah = arg_info(op->args[3])->val;
-        uint32_t bl = arg_info(op->args[4])->val;
-        uint32_t bh = arg_info(op->args[5])->val;
-        uint64_t a = ((uint64_t)ah << 32) | al;
-        uint64_t b = ((uint64_t)bh << 32) | bl;
+        uint64_t al = arg_info(op->args[2])->val;
+        uint64_t ah = arg_info(op->args[3])->val;
+        uint64_t bl = arg_info(op->args[4])->val;
+        uint64_t bh = arg_info(op->args[5])->val;
         TCGArg rl, rh;
-        TCGOp *op2 = tcg_op_insert_before(ctx->tcg, op, INDEX_op_mov_i32);
+        TCGOp *op2;
 
-        if (add) {
-            a += b;
+        if (ctx->type == TCG_TYPE_I32) {
+            uint64_t a = deposit64(al, 32, 32, ah);
+            uint64_t b = deposit64(bl, 32, 32, bh);
+
+            if (add) {
+                a += b;
+            } else {
+                a -= b;
+            }
+
+            al = sextract64(a, 0, 32);
+            ah = sextract64(a, 32, 32);
         } else {
-            a -= b;
+            Int128 a = int128_make128(al, ah);
+            Int128 b = int128_make128(bl, bh);
+
+            if (add) {
+                a = int128_add(a, b);
+            } else {
+                a = int128_sub(a, b);
+            }
+
+            al = int128_getlo(a);
+            ah = int128_gethi(a);
         }
 
         rl = op->args[0];
         rh = op->args[1];
-        tcg_opt_gen_movi(ctx, op, rl, (int32_t)a);
-        tcg_opt_gen_movi(ctx, op2, rh, (int32_t)(a >> 32));
+
+        /* The proper opcode is supplied by tcg_opt_gen_mov. */
+        op2 = tcg_op_insert_before(ctx->tcg, op, 0);
+
+        tcg_opt_gen_movi(ctx, op, rl, al);
+        tcg_opt_gen_movi(ctx, op2, rh, ah);
         return true;
     }
     return false;
 }
 
-static bool fold_add2_i32(OptContext *ctx, TCGOp *op)
+static bool fold_add2(OptContext *ctx, TCGOp *op)
 {
-    return fold_addsub2_i32(ctx, op, true);
+    return fold_addsub2(ctx, op, true);
 }
 
 static bool fold_and(OptContext *ctx, TCGOp *op)
@@ -XXX,XX +XXX,XX @@ static bool fold_sub(OptContext *ctx, TCGOp *op)
     return false;
 }
 
-static bool fold_sub2_i32(OptContext *ctx, TCGOp *op)
+static bool fold_sub2(OptContext *ctx, TCGOp *op)
 {
-    return fold_addsub2_i32(ctx, op, false);
+    return fold_addsub2(ctx, op, false);
 }
 
 static bool fold_tcg_ld(OptContext *ctx, TCGOp *op)
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64_VEC(add):
             done = fold_add(&ctx, op);
             break;
-        case INDEX_op_add2_i32:
-            done = fold_add2_i32(&ctx, op);
+        CASE_OP_32_64(add2):
+            done = fold_add2(&ctx, op);
             break;
         CASE_OP_32_64_VEC(and):
             done = fold_and(&ctx, op);
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64_VEC(sub):
             done = fold_sub(&ctx, op);
             break;
-        case INDEX_op_sub2_i32:
-            done = fold_sub2_i32(&ctx, op);
+        CASE_OP_32_64(sub2):
+            done = fold_sub2(&ctx, op);
             break;
         CASE_OP_32_64_VEC(xor):
             done = fold_xor(&ctx, op);
-- 
2.25.1

Most of these are handled by creating a fold_const2_commutative
to handle all of the binary operators.  The rest were already
handled on a case-by-case basis in the switch, and have their
own fold function in which to place the call.

We now have only one major switch on TCGOpcode.

Introduce NO_DEST and a block comment for swap_commutative in
order to make the handling of brcond and movcond opcodes cleaner.
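
The canonicalization can be pictured with the stand-alone sketch below.
Operand, Cond, swap_cond and swap_commutative_sketch are hypothetical
names; the real code operates on TCGArg values.  The sketch assumes that
the "matches dest" rule applies only when both operands are equally
constant, and it models NO_DEST as a NULL destination.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical operand record standing in for a TCG temp. */
typedef struct {
    int id;         /* identity of the temp */
    bool is_const;  /* true when the temp holds a known constant */
} Operand;

/* Hypothetical subset of comparison conditions. */
typedef enum { COND_EQ, COND_LT, COND_GT } Cond;

/* Mirror a comparison: "a COND b" is the same as "b swap_cond(COND) a". */
static Cond swap_cond(Cond c)
{
    switch (c) {
    case COND_LT:
        return COND_GT;
    case COND_GT:
        return COND_LT;
    default:
        return c;
    }
}

/* Move a constant to the second slot; otherwise prefer dest first. */
static bool swap_commutative_sketch(const Operand *dest,
                                    const Operand **p1, const Operand **p2)
{
    const Operand *a1 = *p1, *a2 = *p2;
    int sum = (int)a1->is_const - (int)a2->is_const;

    if (sum > 0 || (sum == 0 && dest != NULL && dest == a2)) {
        *p1 = a2;
        *p2 = a1;
        return true;
    }
    return false;
}
```

For a comparison such as "5 < x", a successful swap must be paired with
swap_cond so that the op becomes "x > 5"; this is why the brcond,
setcond and movcond folders update their condition argument whenever
the swap reports true.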

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 142 ++++++++++++++++++++++++-------------------------
 1 file changed, 70 insertions(+), 72 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -XXX,XX +XXX,XX @@ static int do_constant_folding_cond2(TCGArg *p1, TCGArg *p2, TCGCond c)
     return -1;
 }
 
+/**
+ * swap_commutative:
+ * @dest: TCGArg of the destination argument, or NO_DEST.
+ * @p1: first paired argument
+ * @p2: second paired argument
+ *
+ * If *@p1 is a constant and *@p2 is not, swap.
+ * If *@p2 matches @dest, swap.
+ * Return true if a swap was performed.
+ */
+
+#define NO_DEST  temp_arg(NULL)
+
 static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2)
 {
     TCGArg a1 = *p1, a2 = *p2;
@@ -XXX,XX +XXX,XX @@ static bool fold_const2(OptContext *ctx, TCGOp *op)
     return false;
 }
 
+static bool fold_const2_commutative(OptContext *ctx, TCGOp *op)
+{
+    swap_commutative(op->args[0], &op->args[1], &op->args[2]);
+    return fold_const2(ctx, op);
+}
+
 static bool fold_masks(OptContext *ctx, TCGOp *op)
 {
     uint64_t a_mask = ctx->a_mask;
@@ -XXX,XX +XXX,XX @@ static bool fold_xx_to_x(OptContext *ctx, TCGOp *op)
 
 static bool fold_add(OptContext *ctx, TCGOp *op)
 {
-    if (fold_const2(ctx, op) ||
+    if (fold_const2_commutative(ctx, op) ||
         fold_xi_to_x(ctx, op, 0)) {
         return true;
     }
@@ -XXX,XX +XXX,XX @@ static bool fold_addsub2(OptContext *ctx, TCGOp *op, bool add)
 
 static bool fold_add2(OptContext *ctx, TCGOp *op)
 {
+    /* Note that the high and low parts may be independently swapped. */
+    swap_commutative(op->args[0], &op->args[2], &op->args[4]);
+    swap_commutative(op->args[1], &op->args[3], &op->args[5]);
+
     return fold_addsub2(ctx, op, true);
 }
 
@@ -XXX,XX +XXX,XX @@ static bool fold_and(OptContext *ctx, TCGOp *op)
 {
     uint64_t z1, z2;
 
-    if (fold_const2(ctx, op) ||
+    if (fold_const2_commutative(ctx, op) ||
         fold_xi_to_i(ctx, op, 0) ||
         fold_xi_to_x(ctx, op, -1) ||
         fold_xx_to_x(ctx, op)) {
@@ -XXX,XX +XXX,XX @@ static bool fold_andc(OptContext *ctx, TCGOp *op)
 static bool fold_brcond(OptContext *ctx, TCGOp *op)
 {
     TCGCond cond = op->args[2];
-    int i = do_constant_folding_cond(ctx->type, op->args[0], op->args[1], cond);
+    int i;
 
+    if (swap_commutative(NO_DEST, &op->args[0], &op->args[1])) {
+        op->args[2] = cond = tcg_swap_cond(cond);
+    }
+
+    i = do_constant_folding_cond(ctx->type, op->args[0], op->args[1], cond);
     if (i == 0) {
         tcg_op_remove(ctx->tcg, op);
         return true;
@@ -XXX,XX +XXX,XX @@ static bool fold_brcond(OptContext *ctx, TCGOp *op)
 static bool fold_brcond2(OptContext *ctx, TCGOp *op)
 {
     TCGCond cond = op->args[4];
-    int i = do_constant_folding_cond2(&op->args[0], &op->args[2], cond);
     TCGArg label = op->args[5];
-    int inv = 0;
+    int i, inv = 0;
 
+    if (swap_commutative2(&op->args[0], &op->args[2])) {
+        op->args[4] = cond = tcg_swap_cond(cond);
+    }
+
+    i = do_constant_folding_cond2(&op->args[0], &op->args[2], cond);
     if (i >= 0) {
         goto do_brcond_const;
     }
@@ -XXX,XX +XXX,XX @@ static bool fold_dup2(OptContext *ctx, TCGOp *op)
 
 static bool fold_eqv(OptContext *ctx, TCGOp *op)
 {
-    if (fold_const2(ctx, op) ||
+    if (fold_const2_commutative(ctx, op) ||
         fold_xi_to_x(ctx, op, -1) ||
         fold_xi_to_not(ctx, op, 0)) {
         return true;
@@ -XXX,XX +XXX,XX @@ static bool fold_mov(OptContext *ctx, TCGOp *op)
 static bool fold_movcond(OptContext *ctx, TCGOp *op)
 {
     TCGCond cond = op->args[5];
-    int i = do_constant_folding_cond(ctx->type, op->args[1], op->args[2], cond);
+    int i;
 
+    if (swap_commutative(NO_DEST, &op->args[1], &op->args[2])) {
+        op->args[5] = cond = tcg_swap_cond(cond);
+    }
+    /*
+     * Canonicalize the "false" input reg to match the destination reg so
+     * that the tcg backend can implement a "move if true" operation.
+     */
+    if (swap_commutative(op->args[0], &op->args[4], &op->args[3])) {
+        op->args[5] = cond = tcg_invert_cond(cond);
+    }
+
+    i = do_constant_folding_cond(ctx->type, op->args[1], op->args[2], cond);
     if (i >= 0) {
         return tcg_opt_gen_mov(ctx, op, op->args[0], op->args[4 - i]);
     }
@@ -XXX,XX +XXX,XX @@ static bool fold_mul(OptContext *ctx, TCGOp *op)
 
 static bool fold_mul_highpart(OptContext *ctx, TCGOp *op)
 {
-    if (fold_const2(ctx, op) ||
+    if (fold_const2_commutative(ctx, op) ||
         fold_xi_to_i(ctx, op, 0)) {
         return true;
     }
@@ -XXX,XX +XXX,XX @@ static bool fold_mul_highpart(OptContext *ctx, TCGOp *op)
 
 static bool fold_multiply2(OptContext *ctx, TCGOp *op)
 {
+    swap_commutative(op->args[0], &op->args[2], &op->args[3]);
+
     if (arg_is_const(op->args[2]) && arg_is_const(op->args[3])) {
         uint64_t a = arg_info(op->args[2])->val;
         uint64_t b = arg_info(op->args[3])->val;
@@ -XXX,XX +XXX,XX @@ static bool fold_multiply2(OptContext *ctx, TCGOp *op)
 
 static bool fold_nand(OptContext *ctx, TCGOp *op)
 {
-    if (fold_const2(ctx, op) ||
+    if (fold_const2_commutative(ctx, op) ||
         fold_xi_to_not(ctx, op, -1)) {
         return true;
     }
@@ -XXX,XX +XXX,XX @@ static bool fold_neg(OptContext *ctx, TCGOp *op)
 
 static bool fold_nor(OptContext *ctx, TCGOp *op)
 {
-    if (fold_const2(ctx, op) ||
+    if (fold_const2_commutative(ctx, op) ||
         fold_xi_to_not(ctx, op, 0)) {
         return true;
     }
@@ -XXX,XX +XXX,XX @@ static bool fold_not(OptContext *ctx, TCGOp *op)
 
 static bool fold_or(OptContext *ctx, TCGOp *op)
 {
-    if (fold_const2(ctx, op) ||
+    if (fold_const2_commutative(ctx, op) ||
         fold_xi_to_x(ctx, op, 0) ||
         fold_xx_to_x(ctx, op)) {
         return true;
@@ -XXX,XX +XXX,XX @@ static bool fold_remainder(OptContext *ctx, TCGOp *op)
 static bool fold_setcond(OptContext *ctx, TCGOp *op)
 {
     TCGCond cond = op->args[3];
-    int i = do_constant_folding_cond(ctx->type, op->args[1], op->args[2], cond);
+    int i;
 
+    if (swap_commutative(op->args[0], &op->args[1], &op->args[2])) {
+        op->args[3] = cond = tcg_swap_cond(cond);
+    }
+
+    i = do_constant_folding_cond(ctx->type, op->args[1], op->args[2], cond);
     if (i >= 0) {
         return tcg_opt_gen_movi(ctx, op, op->args[0], i);
     }
@@ -XXX,XX +XXX,XX @@ static bool fold_setcond(OptContext *ctx, TCGOp *op)
 static bool fold_setcond2(OptContext *ctx, TCGOp *op)
 {
     TCGCond cond = op->args[5];
-    int i = do_constant_folding_cond2(&op->args[1], &op->args[3], cond);
-    int inv = 0;
+    int i, inv = 0;
 
+    if (swap_commutative2(&op->args[1], &op->args[3])) {
+        op->args[5] = cond = tcg_swap_cond(cond);
+    }
+
+    i = do_constant_folding_cond2(&op->args[1], &op->args[3], cond);
     if (i >= 0) {
         goto do_setcond_const;
     }
@@ -XXX,XX +XXX,XX @@ static bool fold_tcg_ld(OptContext *ctx, TCGOp *op)
 
 static bool fold_xor(OptContext *ctx, TCGOp *op)
 {
-    if (fold_const2(ctx, op) ||
+    if (fold_const2_commutative(ctx, op) ||
         fold_xx_to_i(ctx, op, 0) ||
         fold_xi_to_x(ctx, op, 0) ||
         fold_xi_to_not(ctx, op, -1)) {
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             ctx.type = TCG_TYPE_I32;
         }
 
-        /* For commutative operations make constant second argument */
-        switch (opc) {
-        CASE_OP_32_64_VEC(add):
-        CASE_OP_32_64_VEC(mul):
-        CASE_OP_32_64_VEC(and):
-        CASE_OP_32_64_VEC(or):
-        CASE_OP_32_64_VEC(xor):
-        CASE_OP_32_64(eqv):
-        CASE_OP_32_64(nand):
-        CASE_OP_32_64(nor):
-        CASE_OP_32_64(muluh):
-        CASE_OP_32_64(mulsh):
-            swap_commutative(op->args[0], &op->args[1], &op->args[2]);
-            break;
-        CASE_OP_32_64(brcond):
-            if (swap_commutative(-1, &op->args[0], &op->args[1])) {
-                op->args[2] = tcg_swap_cond(op->args[2]);
-            }
-            break;
-        CASE_OP_32_64(setcond):
-            if (swap_commutative(op->args[0], &op->args[1], &op->args[2])) {
-                op->args[3] = tcg_swap_cond(op->args[3]);
-            }
-            break;
-        CASE_OP_32_64(movcond):
-            if (swap_commutative(-1, &op->args[1], &op->args[2])) {
-                op->args[5] = tcg_swap_cond(op->args[5]);
-            }
-            /* For movcond, we canonicalize the "false" input reg to match
-               the destination reg so that the tcg backend can implement
-               a "move if true" operation.  */
-            if (swap_commutative(op->args[0], &op->args[4], &op->args[3])) {
-                op->args[5] = tcg_invert_cond(op->args[5]);
-            }
-            break;
-        CASE_OP_32_64(add2):
-            swap_commutative(op->args[0], &op->args[2], &op->args[4]);
-            swap_commutative(op->args[1], &op->args[3], &op->args[5]);
-            break;
-        CASE_OP_32_64(mulu2):
-        CASE_OP_32_64(muls2):
-            swap_commutative(op->args[0], &op->args[2], &op->args[3]);
-            break;
-        case INDEX_op_brcond2_i32:
-            if (swap_commutative2(&op->args[0], &op->args[2])) {
-                op->args[4] = tcg_swap_cond(op->args[4]);
-            }
-            break;
-        case INDEX_op_setcond2_i32:
-            if (swap_commutative2(&op->args[1], &op->args[3])) {
-                op->args[5] = tcg_swap_cond(op->args[5]);
-            }
-            break;
-        default:
-            break;
-        }
-
         /* Assume all bits affected, and no bits known zero. */
         ctx.a_mask = -1;
         ctx.z_mask = -1;
-- 
2.25.1

This "garbage" setting pre-dates the addition of the type
changing opcodes INDEX_op_ext_i32_i64, INDEX_op_extu_i32_i64,
and INDEX_op_extr{l,h}_i64_i32.

So now we have definitive points at which to adjust z_mask
to eliminate such bits from the 32-bit operands.
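
A minimal sketch of the replacement convention, assuming the usual
two's-complement narrowing of GCC and Clang: masks and constants of a
32-bit op are sign-extended from bit 31 instead of having their high
bits forced to one.  The helper name sext32_mask is hypothetical; the
patch simply writes the (int32_t) cast inline.

```c
#include <stdint.h>

/* Sign-extend the low 32 bits of a mask or constant to 64 bits. */
static uint64_t sext32_mask(uint64_t mask)
{
    return (uint64_t)(int64_t)(int32_t)mask;
}
```

With this rule a mask whose bit 31 is clear keeps zero high bits, so
the knowledge that the value is small survives a 32-bit op.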

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 35 ++++++++++++++++-------------------
 1 file changed, 16 insertions(+), 19 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -XXX,XX +XXX,XX @@ static void init_ts_info(OptContext *ctx, TCGTemp *ts)
         ti->is_const = true;
         ti->val = ts->val;
         ti->z_mask = ts->val;
-        if (TCG_TARGET_REG_BITS > 32 && ts->type == TCG_TYPE_I32) {
-            /* High bits of a 32-bit quantity are garbage.  */
-            ti->z_mask |= ~0xffffffffull;
-        }
     } else {
         ti->is_const = false;
         ti->z_mask = -1;
@@ -XXX,XX +XXX,XX @@ static bool tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, TCGArg src)
     TCGTemp *src_ts = arg_temp(src);
     TempOptInfo *di;
     TempOptInfo *si;
-    uint64_t z_mask;
     TCGOpcode new_op;
 
     if (ts_are_copies(dst_ts, src_ts)) {
@@ -XXX,XX +XXX,XX @@ static bool tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, TCGArg src)
     op->args[0] = dst;
     op->args[1] = src;
 
-    z_mask = si->z_mask;
-    if (TCG_TARGET_REG_BITS > 32 && new_op == INDEX_op_mov_i32) {
-        /* High bits of the destination are now garbage.  */
-        z_mask |= ~0xffffffffull;
-    }
-    di->z_mask = z_mask;
+    di->z_mask = si->z_mask;
 
     if (src_ts->type == dst_ts->type) {
         TempOptInfo *ni = ts_info(si->next_copy);
@@ -XXX,XX +XXX,XX @@ static bool tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, TCGArg src)
 static bool tcg_opt_gen_movi(OptContext *ctx, TCGOp *op,
                              TCGArg dst, uint64_t val)
 {
-    /* Convert movi to mov with constant temp. */
-    TCGTemp *tv = tcg_constant_internal(ctx->type, val);
+    TCGTemp *tv;
 
+    if (ctx->type == TCG_TYPE_I32) {
+        val = (int32_t)val;
+    }
+
+    /* Convert movi to mov with constant temp. */
+    tv = tcg_constant_internal(ctx->type, val);
     init_ts_info(ctx, tv);
     return tcg_opt_gen_mov(ctx, op, dst, temp_arg(tv));
 }
@@ -XXX,XX +XXX,XX @@ static bool fold_masks(OptContext *ctx, TCGOp *op)
     uint64_t z_mask = ctx->z_mask;
 
     /*
-     * 32-bit ops generate 32-bit results.  For the result is zero test
-     * below, we can ignore high bits, but for further optimizations we
-     * need to record that the high bits contain garbage.
+     * 32-bit ops generate 32-bit results, which for the purpose of
+     * simplifying tcg are sign-extended.  Certainly that's how we
+     * represent our constants elsewhere.  Note that the bits will
+     * be reset properly for a 64-bit value when encountering the
+     * type changing opcodes.
      */
     if (ctx->type == TCG_TYPE_I32) {
-        ctx->z_mask |= MAKE_64BIT_MASK(32, 32);
-        a_mask &= MAKE_64BIT_MASK(0, 32);
-        z_mask &= MAKE_64BIT_MASK(0, 32);
+        a_mask = (int32_t)a_mask;
+        z_mask = (int32_t)z_mask;
+        ctx->z_mask = z_mask;
     }
 
     if (z_mask == 0) {
-- 
2.25.1

Certain targets, like riscv, produce signed 32-bit results.
This can lead to lots of redundant extensions as values are
manipulated.

Begin by tracking only the obvious sign-extensions, and
converting them to simple copies when possible.
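
The two mask constructors added by this patch can be tried in isolation
with the sketch below.  smask_from_value and smask_from_zmask follow the
patch; clrsb64 and clz64 are portable loop-based stand-ins written for
this example, not QEMU's host-utils implementations.

```c
#include <stdint.h>

/* Stand-in: count the bits below the msb that are equal to the msb. */
static int clrsb64(uint64_t v)
{
    uint64_t sign = v >> 63;
    int n = 0;

    for (int bit = 62; bit >= 0 && ((v >> bit) & 1) == sign; bit--) {
        n++;
    }
    return n;
}

/* Stand-in: count leading zero bits, returning 64 for zero. */
static int clz64(uint64_t v)
{
    int n = 0;

    for (int bit = 63; bit >= 0 && ((v >> bit) & 1) == 0; bit--) {
        n++;
    }
    return n;
}

/* Left-aligned mask of the redundant sign bits of a known value. */
static uint64_t smask_from_value(uint64_t value)
{
    int rep = clrsb64(value);
    return ~(~0ull >> rep);
}

/* Same, derived from known-zero bits; nothing is known if the msb may be set. */
static uint64_t smask_from_zmask(uint64_t zmask)
{
    int rep = clz64(zmask);

    if (rep == 0) {
        return 0;
    }
    rep -= 1;
    return ~(~0ull >> rep);
}
```

For the constant 0xff, and equally for a z_mask of 0xff, both functions
report that bits 63 down to 9 repeat the sign bit.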

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 123 ++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 102 insertions(+), 21 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -XXX,XX +XXX,XX @@ typedef struct TempOptInfo {
     TCGTemp *next_copy;
     uint64_t val;
     uint64_t z_mask;  /* mask bit is 0 if and only if value bit is 0 */
+    uint64_t s_mask;  /* a left-aligned mask of clrsb(value) bits. */
 } TempOptInfo;
 
 typedef struct OptContext {
@@ -XXX,XX +XXX,XX @@ typedef struct OptContext {
     /* In flight values from optimization. */
     uint64_t a_mask;  /* mask bit is 0 iff value identical to first input */
     uint64_t z_mask;  /* mask bit is 0 iff value bit is 0 */
+    uint64_t s_mask;  /* mask of clrsb(value) bits */
     TCGType type;
 } OptContext;
 
+/* Calculate the smask for a specific value. */
+static uint64_t smask_from_value(uint64_t value)
+{
+    int rep = clrsb64(value);
+    return ~(~0ull >> rep);
+}
+
+/*
+ * Calculate the smask for a given set of known-zeros.
+ * If there are lots of zeros on the left, we can consider the remainder
+ * an unsigned field, and thus the corresponding signed field is one bit
+ * larger.
+ */
+static uint64_t smask_from_zmask(uint64_t zmask)
+{
+    /*
+     * Only the 0 bits are significant for zmask, thus the msb itself
+     * must be zero, else we have no sign information.
+     */
+    int rep = clz64(zmask);
+    if (rep == 0) {
+        return 0;
+    }
+    rep -= 1;
+    return ~(~0ull >> rep);
+}
+
 static inline TempOptInfo *ts_info(TCGTemp *ts)
 {
     return ts->state_ptr;
@@ -XXX,XX +XXX,XX @@ static void reset_ts(TCGTemp *ts)
     ti->prev_copy = ts;
     ti->is_const = false;
     ti->z_mask = -1;
+    ti->s_mask = 0;
 }
 
 static void reset_temp(TCGArg arg)
@@ -XXX,XX +XXX,XX @@ static void init_ts_info(OptContext *ctx, TCGTemp *ts)
         ti->is_const = true;
         ti->val = ts->val;
         ti->z_mask = ts->val;
+        ti->s_mask = smask_from_value(ts->val);
     } else {
         ti->is_const = false;
         ti->z_mask = -1;
+        ti->s_mask = 0;
     }
 }
 
@@ -XXX,XX +XXX,XX @@ static bool tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, TCGArg src)
     op->args[1] = src;
 
     di->z_mask = si->z_mask;
+    di->s_mask = si->s_mask;
 
     if (src_ts->type == dst_ts->type) {
         TempOptInfo *ni = ts_info(si->next_copy);
@@ -XXX,XX +XXX,XX @@ static void finish_folding(OptContext *ctx, TCGOp *op)
 
     nb_oargs = def->nb_oargs;
     for (i = 0; i < nb_oargs; i++) {
-        reset_temp(op->args[i]);
+        TCGTemp *ts = arg_temp(op->args[i]);
+        reset_ts(ts);
         /*
-         * Save the corresponding known-zero bits mask for the
+         * Save the corresponding known-zero/sign bits mask for the
          * first output argument (only one supported so far).
          */
         if (i == 0) {
-            arg_info(op->args[i])->z_mask = ctx->z_mask;
+            ts_info(ts)->z_mask = ctx->z_mask;
+            ts_info(ts)->s_mask = ctx->s_mask;
         }
     }
 }
@@ -XXX,XX +XXX,XX @@ static bool fold_masks(OptContext *ctx, TCGOp *op)
 {
     uint64_t a_mask = ctx->a_mask;
     uint64_t z_mask = ctx->z_mask;
+    uint64_t s_mask = ctx->s_mask;
 
     /*
      * 32-bit ops generate 32-bit results, which for the purpose of
@@ -XXX,XX +XXX,XX @@ static bool fold_masks(OptContext *ctx, TCGOp *op)
     if (ctx->type == TCG_TYPE_I32) {
         a_mask = (int32_t)a_mask;
         z_mask = (int32_t)z_mask;
+        s_mask |= MAKE_64BIT_MASK(32, 32);
         ctx->z_mask = z_mask;
+        ctx->s_mask = s_mask;
     }
 
     if (z_mask == 0) {
@@ -XXX,XX +XXX,XX @@ static bool fold_brcond2(OptContext *ctx, TCGOp *op)
 
 static bool fold_bswap(OptContext *ctx, TCGOp *op)
 {
-    uint64_t z_mask, sign;
+    uint64_t z_mask, s_mask, sign;
 
     if (arg_is_const(op->args[1])) {
         uint64_t t = arg_info(op->args[1])->val;
@@ -XXX,XX +XXX,XX @@ static bool fold_bswap(OptContext *ctx, TCGOp *op)
     }
 
     z_mask = arg_info(op->args[1])->z_mask;
+
     switch (op->opc) {
     case INDEX_op_bswap16_i32:
     case INDEX_op_bswap16_i64:
@@ -XXX,XX +XXX,XX @@ static bool fold_bswap(OptContext *ctx, TCGOp *op)
     default:
         g_assert_not_reached();
     }
+    s_mask = smask_from_zmask(z_mask);
 
     switch (op->args[2] & (TCG_BSWAP_OZ | TCG_BSWAP_OS)) {
     case TCG_BSWAP_OZ:
@@ -XXX,XX +XXX,XX @@ static bool fold_bswap(OptContext *ctx, TCGOp *op)
         /* If the sign bit may be 1, force all the bits above to 1. */
         if (z_mask & sign) {
             z_mask |= sign;
+            s_mask = sign << 1;
         }
         break;
     default:
         /* The high bits are undefined: force all bits above the sign to 1. */
         z_mask |= sign << 1;
+        s_mask = 0;
         break;
     }
     ctx->z_mask = z_mask;
+    ctx->s_mask = s_mask;
 
     return fold_masks(ctx, op);
 }
@@ -XXX,XX +XXX,XX @@ static bool fold_eqv(OptContext *ctx, TCGOp *op)
 static bool fold_extract(OptContext *ctx, TCGOp *op)
 {
     uint64_t z_mask_old, z_mask;
+    int pos = op->args[2];
+    int len = op->args[3];
 
     if (arg_is_const(op->args[1])) {
         uint64_t t;
 
         t = arg_info(op->args[1])->val;
-        t = extract64(t, op->args[2], op->args[3]);
+        t = extract64(t, pos, len);
         return tcg_opt_gen_movi(ctx, op, op->args[0], t);
     }
 
     z_mask_old = arg_info(op->args[1])->z_mask;
-    z_mask = extract64(z_mask_old, op->args[2], op->args[3]);
-    if (op->args[2] == 0) {
+    z_mask = extract64(z_mask_old, pos, len);
+    if (pos == 0) {
         ctx->a_mask = z_mask_old ^ z_mask;
     }
     ctx->z_mask = z_mask;
+    ctx->s_mask = smask_from_zmask(z_mask);
 
     return fold_masks(ctx, op);
 }
@@ -XXX,XX +XXX,XX @@ static bool fold_extract2(OptContext *ctx, TCGOp *op)
 
 static bool fold_exts(OptContext *ctx, TCGOp *op)
 {
-    uint64_t z_mask_old, z_mask, sign;
+    uint64_t s_mask_old, s_mask, z_mask, sign;
     bool type_change = false;
 
     if (fold_const1(ctx, op)) {
         return true;
     }
 
-    z_mask_old = z_mask = arg_info(op->args[1])->z_mask;
+    z_mask = arg_info(op->args[1])->z_mask;
+    s_mask = arg_info(op->args[1])->s_mask;
+    s_mask_old = s_mask;
 
     switch (op->opc) {
     CASE_OP_32_64(ext8s):
@@ -XXX,XX +XXX,XX @@ static bool fold_exts(OptContext *ctx, TCGOp *op)
 
     if (z_mask & sign) {
         z_mask |= sign;
-    } else if (!type_change) {
-        ctx->a_mask = z_mask_old ^ z_mask;
     }
+    s_mask |= sign << 1;
+
     ctx->z_mask = z_mask;
+    ctx->s_mask = s_mask;
+    if (!type_change) {
+        ctx->a_mask = s_mask & ~s_mask_old;
+    }
 
     return fold_masks(ctx, op);
 }
@@ -XXX,XX +XXX,XX @@ static bool fold_extu(OptContext *ctx, TCGOp *op)
     }
 
     ctx->z_mask = z_mask;
+    ctx->s_mask = smask_from_zmask(z_mask);
     if (!type_change) {
         ctx->a_mask = z_mask_old ^ z_mask;
     }
@@ -XXX,XX +XXX,XX @@ static bool fold_qemu_ld(OptContext *ctx, TCGOp *op)
     MemOp mop = get_memop(oi);
     int width = 8 * memop_size(mop);
 
-    if (!(mop & MO_SIGN) && width < 64) {
-        ctx->z_mask = MAKE_64BIT_MASK(0, width);
+    if (width < 64) {
+        ctx->s_mask = MAKE_64BIT_MASK(width, 64 - width);
+        if (!(mop & MO_SIGN)) {
+            ctx->z_mask = MAKE_64BIT_MASK(0, width);
+            ctx->s_mask <<= 1;
+        }
     }
 
     /* Opcodes that touch guest memory stop the mb optimization.  */
@@ -XXX,XX +XXX,XX @@ static bool fold_setcond2(OptContext *ctx, TCGOp *op)
 
 static bool fold_sextract(OptContext *ctx, TCGOp *op)
 {
-    int64_t z_mask_old, z_mask;
+    uint64_t z_mask, s_mask, s_mask_old;
+    int pos = op->args[2];
+    int len = op->args[3];
 
     if (arg_is_const(op->args[1])) {
         uint64_t t;
 
         t = arg_info(op->args[1])->val;
-        t = sextract64(t, op->args[2], op->args[3]);
+        t = sextract64(t, pos, len);
         return tcg_opt_gen_movi(ctx, op, op->args[0], t);
     }
 
-    z_mask_old = arg_info(op->args[1])->z_mask;
-    z_mask = sextract64(z_mask_old, op->args[2], op->args[3]);
-    if (op->args[2] == 0 && z_mask >= 0) {
-        ctx->a_mask = z_mask_old ^ z_mask;
-    }
+    z_mask = arg_info(op->args[1])->z_mask;
+    z_mask = sextract64(z_mask, pos, len);
     ctx->z_mask = z_mask;
 
+    s_mask_old = arg_info(op->args[1])->s_mask;
+    s_mask = sextract64(s_mask_old, pos, len);
+    s_mask |= MAKE_64BIT_MASK(len, 64 - len);
+    ctx->s_mask = s_mask;
+
+    if (pos == 0) {
+        ctx->a_mask = s_mask & ~s_mask_old;
+    }
+
     return fold_masks(ctx, op);
 }
 
@@ -XXX,XX +XXX,XX @@ static bool fold_tcg_ld(OptContext *ctx, TCGOp *op)
 {
     /* We can't do any folding with a load, but we can record bits. */
     switch (op->opc) {
+    CASE_OP_32_64(ld8s):
+        ctx->s_mask = MAKE_64BIT_MASK(8, 56);
+        break;
     CASE_OP_32_64(ld8u):
         ctx->z_mask = MAKE_64BIT_MASK(0, 8);
+        ctx->s_mask = MAKE_64BIT_MASK(9, 55);
+        break;
+    CASE_OP_32_64(ld16s):
+        ctx->s_mask = MAKE_64BIT_MASK(16, 48);
         break;
     CASE_OP_32_64(ld16u):
         ctx->z_mask = MAKE_64BIT_MASK(0, 16);
+        ctx->s_mask = MAKE_64BIT_MASK(17, 47);
+        break;
+    case INDEX_op_ld32s_i64:
+        ctx->s_mask = MAKE_64BIT_MASK(32, 32);
         break;
     case INDEX_op_ld32u_i64:
         ctx->z_mask = MAKE_64BIT_MASK(0, 32);
+        ctx->s_mask = MAKE_64BIT_MASK(33, 31);
         break;
     default:
         g_assert_not_reached();
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
             ctx.type = TCG_TYPE_I32;
         }
 
-        /* Assume all bits affected, and no bits known zero. */
+        /* Assume all bits affected, no bits known zero, no sign reps. */
         ctx.a_mask = -1;
         ctx.z_mask = -1;
+        ctx.s_mask = 0;
 
         /*
          * Process each opcode.
@@ -XXX,XX +XXX,XX @@ void tcg_optimize(TCGContext *s)
         case INDEX_op_extrh_i64_i32:
             done = fold_extu(&ctx, op);
             break;
+        CASE_OP_32_64(ld8s):
         CASE_OP_32_64(ld8u):
+        CASE_OP_32_64(ld16s):
         CASE_OP_32_64(ld16u):
+        case INDEX_op_ld32s_i64:
         case INDEX_op_ld32u_i64:
             done = fold_tcg_ld(&ctx, op);
             break;
-- 
2.25.1

Sign repetitions are perforce all identical, whether they are 1 or 0.
Bitwise operations preserve the relative quantity of the repetitions.
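
As an illustration only, here is a minimal standalone sketch of the s_mask
idea. The helper names (count_sign_reps, smask_of, smask_bitwise,
smask_covers) are hypothetical stand-ins and not part of tcg/optimize.c.
smask_of follows the same formula as smask_from_value, with a portable loop
in place of clrsb64.

```c
#include <assert.h>
#include <stdint.h>

/* Count the bits below bit 63 that repeat the sign bit (stand-in for clrsb64). */
static int count_sign_reps(uint64_t value)
{
    uint64_t sign = value >> 63;
    int rep = 0;

    /* Walk down from bit 62 and stop at the first bit that differs. */
    for (int bit = 62; bit >= 0 && ((value >> bit) & 1) == sign; bit--) {
        rep++;
    }
    return rep;
}

/* Same formula as smask_from_value(): the top 'rep' bits are set. */
static uint64_t smask_of(uint64_t value)
{
    int rep = count_sign_reps(value);

    assert(rep >= 0 && rep <= 63);
    return ~(~0ull >> rep);
}

/* For and/or/xor and friends, the result keeps the repetitions common to both inputs. */
static uint64_t smask_bitwise(uint64_t s1, uint64_t s2)
{
    return s1 & s2;
}

/* True if 'actual' has at least the sign repetitions that 'claimed' promises. */
static int smask_covers(uint64_t actual, uint64_t claimed)
{
    return (actual & claimed) == claimed;
}
```

For example, a sign-extended 8-bit value combined with a zero-extended
15-bit value yields smask_bitwise(...) == 0xffffffffffff0000, and the real
results of both AND and XOR on those inputs satisfy smask_covers() for
that mask.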

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -XXX,XX +XXX,XX @@ static bool fold_and(OptContext *ctx, TCGOp *op)
     z2 = arg_info(op->args[2])->z_mask;
     ctx->z_mask = z1 & z2;
 
+    /*
+     * Sign repetitions are perforce all identical, whether they are 1 or 0.
+     * Bitwise operations preserve the relative quantity of the repetitions.
+     */
+    ctx->s_mask = arg_info(op->args[1])->s_mask
+                & arg_info(op->args[2])->s_mask;
+
     /*
      * Known-zeros does not imply known-ones.  Therefore unless
      * arg2 is constant, we can't infer affected bits from it.
@@ -XXX,XX +XXX,XX @@ static bool fold_andc(OptContext *ctx, TCGOp *op)
     }
     ctx->z_mask = z1;
 
+    ctx->s_mask = arg_info(op->args[1])->s_mask
+                & arg_info(op->args[2])->s_mask;
     return fold_masks(ctx, op);
 }
 
@@ -XXX,XX +XXX,XX @@ static bool fold_eqv(OptContext *ctx, TCGOp *op)
         fold_xi_to_not(ctx, op, 0)) {
         return true;
     }
+
+    ctx->s_mask = arg_info(op->args[1])->s_mask
+                & arg_info(op->args[2])->s_mask;
     return false;
 }
 
@@ -XXX,XX +XXX,XX @@ static bool fold_movcond(OptContext *ctx, TCGOp *op)
 
     ctx->z_mask = arg_info(op->args[3])->z_mask
                 | arg_info(op->args[4])->z_mask;
+    ctx->s_mask = arg_info(op->args[3])->s_mask
+                & arg_info(op->args[4])->s_mask;
 
     if (arg_is_const(op->args[3]) && arg_is_const(op->args[4])) {
         uint64_t tv = arg_info(op->args[3])->val;
@@ -XXX,XX +XXX,XX @@ static bool fold_nand(OptContext *ctx, TCGOp *op)
         fold_xi_to_not(ctx, op, -1)) {
         return true;
     }
+
+    ctx->s_mask = arg_info(op->args[1])->s_mask
+                & arg_info(op->args[2])->s_mask;
     return false;
 }
 
@@ -XXX,XX +XXX,XX @@ static bool fold_nor(OptContext *ctx, TCGOp *op)
         fold_xi_to_not(ctx, op, 0)) {
         return true;
     }
+
+    ctx->s_mask = arg_info(op->args[1])->s_mask
+                & arg_info(op->args[2])->s_mask;
     return false;
 }
 
@@ -XXX,XX +XXX,XX @@ static bool fold_not(OptContext *ctx, TCGOp *op)
         return true;
     }
 
+    ctx->s_mask = arg_info(op->args[1])->s_mask;
+
     /* Because of fold_to_not, we want to always return true, via finish. */
     finish_folding(ctx, op);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool fold_or(OptContext *ctx, TCGOp *op)
 
     ctx->z_mask = arg_info(op->args[1])->z_mask
                 | arg_info(op->args[2])->z_mask;
+    ctx->s_mask = arg_info(op->args[1])->s_mask
+                & arg_info(op->args[2])->s_mask;
     return fold_masks(ctx, op);
 }
 
@@ -XXX,XX +XXX,XX @@ static bool fold_orc(OptContext *ctx, TCGOp *op)
         fold_ix_to_not(ctx, op, 0)) {
         return true;
     }
+
+    ctx->s_mask = arg_info(op->args[1])->s_mask
+                & arg_info(op->args[2])->s_mask;
     return false;
 }
 
@@ -XXX,XX +XXX,XX @@ static bool fold_xor(OptContext *ctx, TCGOp *op)
 
     ctx->z_mask = arg_info(op->args[1])->z_mask
                 | arg_info(op->args[2])->z_mask;
+    ctx->s_mask = arg_info(op->args[1])->s_mask
+                & arg_info(op->args[2])->s_mask;
     return fold_masks(ctx, op);
 }
 
-- 
2.25.1

For constant shifts, we can simply shift the s_mask.

For variable shifts, we know that sar does not reduce
the s_mask, which helps for sequences like

    ext32s_i64  t, in
    sar_i64     t, t, v
    ext32s_i64  out, t

allowing the final extend to be eliminated.
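
A rough standalone sketch of both cases follows. The names
(count_leading_zeros, realign_smask, sar64, is_sext32) are hypothetical and
only illustrate the idea. realign_smask mirrors smask_from_smask as written
in this patch, which goes through smask_from_zmask and so keeps one bit
fewer than the leading run of ones.

```c
#include <assert.h>
#include <stdint.h>

/* Count leading zero bits, returning 64 for zero (stand-in for clz64). */
static int count_leading_zeros(uint64_t v)
{
    int n = 0;

    for (uint64_t bit = 1ull << 63; bit != 0 && !(v & bit); bit >>= 1) {
        n++;
    }
    return n;
}

/* Keep only the left-aligned run of ones from a shifted s_mask. */
static uint64_t realign_smask(uint64_t smask)
{
    int rep = count_leading_zeros(~smask);   /* leading ones of smask */

    if (rep == 0) {
        return 0;                            /* msb clear: no sign information */
    }
    rep -= 1;                                /* same adjustment as smask_from_zmask */
    return ~(~0ull >> rep);
}

/* Arithmetic right shift on uint64_t without relying on signed shifts. */
static uint64_t sar64(uint64_t v, int sh)
{
    assert(sh >= 0 && sh < 64);
    return (v >> sh) | ((v >> 63) ? ~(~0ull >> sh) : 0);
}

/* True if v is the sign extension of its low 32 bits (bits 63..31 all equal). */
static int is_sext32(uint64_t v)
{
    uint64_t top = v >> 31;

    return top == 0 || top == 0x1FFFFFFFFull;
}
```

Starting from the s_mask of an ext32s result (0xffffffff00000000), a
constant sar by 8 realigns to 0xfffffffffe000000, a constant shl by 8 to
0xfffffe0000000000, and a constant shr by 8 to 0. For the variable case,
sar64() of any value that passes is_sext32() still passes is_sext32(), which
is why the second ext32s in the sequence above is redundant.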

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 47 insertions(+), 3 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -XXX,XX +XXX,XX @@ static uint64_t smask_from_zmask(uint64_t zmask)
     return ~(~0ull >> rep);
 }
 
+/*
+ * Recreate a properly left-aligned smask after manipulation.
+ * Some bit-shuffling, particularly shifts and rotates, may
+ * retain sign bits on the left, but may scatter disconnected
+ * sign bits on the right.  Retain only what remains to the left.
+ */
+static uint64_t smask_from_smask(int64_t smask)
+{
+    /* Only the 1 bits are significant for smask */
+    return smask_from_zmask(~smask);
+}
+
 static inline TempOptInfo *ts_info(TCGTemp *ts)
 {
     return ts->state_ptr;
@@ -XXX,XX +XXX,XX @@ static bool fold_sextract(OptContext *ctx, TCGOp *op)
 
 static bool fold_shift(OptContext *ctx, TCGOp *op)
 {
+    uint64_t s_mask, z_mask, sign;
+
     if (fold_const2(ctx, op) ||
         fold_ix_to_i(ctx, op, 0) ||
         fold_xi_to_x(ctx, op, 0)) {
         return true;
     }
 
+    s_mask = arg_info(op->args[1])->s_mask;
+    z_mask = arg_info(op->args[1])->z_mask;
+
     if (arg_is_const(op->args[2])) {
-        ctx->z_mask = do_constant_folding(op->opc, ctx->type,
-                                          arg_info(op->args[1])->z_mask,
-                                          arg_info(op->args[2])->val);
+        int sh = arg_info(op->args[2])->val;
+
+        ctx->z_mask = do_constant_folding(op->opc, ctx->type, z_mask, sh);
+
+        s_mask = do_constant_folding(op->opc, ctx->type, s_mask, sh);
+        ctx->s_mask = smask_from_smask(s_mask);
+
         return fold_masks(ctx, op);
     }
+
+    switch (op->opc) {
+    CASE_OP_32_64(sar):
+        /*
+         * Arithmetic right shift will not reduce the number of
+         * input sign repetitions.
+         */
+        ctx->s_mask = s_mask;
+        break;
+    CASE_OP_32_64(shr):
+        /*
+         * If the sign bit is known zero, then logical right shift
+         * will not reduce the number of input sign repetitions.
+         */
+        sign = (s_mask & -s_mask) >> 1;
+        if (!(z_mask & sign)) {
+            ctx->s_mask = s_mask;
+        }
+        break;
+    default:
+        break;
+    }
+
     return false;
 }
 
-- 
2.25.1

The following changes since commit 9c6c079bc6723da8061ccfb44361d67b1dd785dd:

Merge tag 'pull-target-arm-20240430' of https://git.linaro.org/people/pmaydell/qemu-arm into staging (2024-04-30 09:58:54 -0700)

are available in the Git repository at:

https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20240501

for you to fetch changes up to 917d7f8d948d706e275c9f33169b9dd0149ded1e:

plugins: Update the documentation block for plugin-gen.c (2024-04-30 16:12:05 -0700)

----------------------------------------------------------------
plugins: Rewrite plugin tcg expansion

----------------------------------------------------------------
Richard Henderson (20):
      tcg: Make tcg/helper-info.h self-contained
      tcg: Pass function pointer to tcg_gen_call*
      plugins: Zero new qemu_plugin_dyn_cb entries
      plugins: Move function pointer in qemu_plugin_dyn_cb
      plugins: Create TCGHelperInfo for all out-of-line callbacks
      plugins: Use emit_before_op for PLUGIN_GEN_AFTER_INSN
      plugins: Use emit_before_op for PLUGIN_GEN_FROM_TB
      plugins: Add PLUGIN_GEN_AFTER_TB
      plugins: Use emit_before_op for PLUGIN_GEN_FROM_INSN
      plugins: Use emit_before_op for PLUGIN_GEN_FROM_MEM
      plugins: Remove plugin helpers
      tcg: Remove TCG_CALL_PLUGIN
      tcg: Remove INDEX_op_plugin_cb_{start,end}
      plugins: Simplify callback queues
      plugins: Introduce PLUGIN_CB_MEM_REGULAR
      plugins: Replace pr_ops with a proper debug dump flag
      plugins: Split out common cb expanders
      plugins: Merge qemu_plugin_tb_insn_get to plugin-gen.c
      plugins: Inline plugin_gen_empty_callback
      plugins: Update the documentation block for plugin-gen.c

Move MAX_CALL_IARGS from tcg.h and include tcg-target-reg-bits.h
for the definition of TCG_TARGET_REG_BITS.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/helper-info.h | 3 +++
 include/tcg/tcg.h         | 2 --
 tcg/tci.c                 | 1 +
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/tcg/helper-info.h b/include/tcg/helper-info.h
index XXXXXXX..XXXXXXX 100644
--- a/include/tcg/helper-info.h
+++ b/include/tcg/helper-info.h
@@ -XXX,XX +XXX,XX @@
 #ifdef CONFIG_TCG_INTERPRETER
 #include <ffi.h>
 #endif
+#include "tcg-target-reg-bits.h"
+
+#define MAX_CALL_IARGS  7
 
 /*
  * Describe the calling convention of a given argument type.
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index XXXXXXX..XXXXXXX 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -XXX,XX +XXX,XX @@
 /* XXX: make safe guess about sizes */
 #define MAX_OP_PER_INSTR 266
 
-#define MAX_CALL_IARGS  7
-
 #define CPU_TEMP_BUF_NLONGS 128
 #define TCG_STATIC_FRAME_SIZE  (CPU_TEMP_BUF_NLONGS * sizeof(long))
 
diff --git a/tcg/tci.c b/tcg/tci.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -XXX,XX +XXX,XX @@
 
 #include "qemu/osdep.h"
 #include "tcg/tcg.h"
+#include "tcg/helper-info.h"
 #include "tcg/tcg-ldst.h"
 #include <ffi.h>
 
-- 
2.34.1

For normal helpers, read the function pointer from the
structure earlier.  For plugins, this will allow the
function pointer to come from elsewhere.
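
The shape of the change can be sketched in isolation as below. All types
and names here (HelperFn, HelperInfo, CallOp, emit_call, emit_helper_call,
the sample callbacks) are hypothetical simplifications, not the real TCG
structures. The point is only that one call description can be shared by
several targets once the pointer is a separate argument.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified function type for call targets. */
typedef void (*HelperFn)(void);

/* Hypothetical stand-in for TCGHelperInfo: default target plus call flags. */
typedef struct HelperInfo {
    HelperFn func;
    unsigned flags;
} HelperInfo;

/* Hypothetical stand-in for the two trailing arguments of a call op. */
typedef struct CallOp {
    HelperFn func;
    const HelperInfo *info;
} CallOp;

/* The target is passed explicitly instead of being read from info->func. */
static CallOp emit_call(HelperFn func, const HelperInfo *info)
{
    CallOp op = { func, info };
    return op;
}

/* Ordinary helpers read the pointer from the structure at the call site. */
static CallOp emit_helper_call(const HelperInfo *info)
{
    return emit_call(info->func, info);
}

static void helper_a(void) { }
static void plugin_cb_x(void) { }
static void plugin_cb_y(void) { }

/* One description with a fixed target, one shared by many callbacks. */
static const HelperInfo normal_info = { helper_a, 0 };
static const HelperInfo plugin_info = { NULL, 0 };
```

With this split, emit_call(plugin_cb_x, &plugin_info) and
emit_call(plugin_cb_y, &plugin_info) describe two different targets that
share one call description, while emit_helper_call(&normal_info) behaves as
before.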

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/tcg.h             | 21 +++++++++-------
 include/exec/helper-gen.h.inc | 24 ++++++++++++-------
 tcg/tcg.c                     | 45 +++++++++++++++++++----------------
 3 files changed, 52 insertions(+), 38 deletions(-)

diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index XXXXXXX..XXXXXXX 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -XXX,XX +XXX,XX @@ typedef struct TCGTargetOpDef {
 
 bool tcg_op_supported(TCGOpcode op);
 
-void tcg_gen_call0(TCGHelperInfo *, TCGTemp *ret);
-void tcg_gen_call1(TCGHelperInfo *, TCGTemp *ret, TCGTemp *);
-void tcg_gen_call2(TCGHelperInfo *, TCGTemp *ret, TCGTemp *, TCGTemp *);
-void tcg_gen_call3(TCGHelperInfo *, TCGTemp *ret, TCGTemp *,
+void tcg_gen_call0(void *func, TCGHelperInfo *, TCGTemp *ret);
+void tcg_gen_call1(void *func, TCGHelperInfo *, TCGTemp *ret, TCGTemp *);
+void tcg_gen_call2(void *func, TCGHelperInfo *, TCGTemp *ret,
                    TCGTemp *, TCGTemp *);
-void tcg_gen_call4(TCGHelperInfo *, TCGTemp *ret, TCGTemp *, TCGTemp *,
-                   TCGTemp *, TCGTemp *);
-void tcg_gen_call5(TCGHelperInfo *, TCGTemp *ret, TCGTemp *, TCGTemp *,
+void tcg_gen_call3(void *func, TCGHelperInfo *, TCGTemp *ret,
                    TCGTemp *, TCGTemp *, TCGTemp *);
-void tcg_gen_call6(TCGHelperInfo *, TCGTemp *ret, TCGTemp *, TCGTemp *,
+void tcg_gen_call4(void *func, TCGHelperInfo *, TCGTemp *ret,
                    TCGTemp *, TCGTemp *, TCGTemp *, TCGTemp *);
-void tcg_gen_call7(TCGHelperInfo *, TCGTemp *ret, TCGTemp *, TCGTemp *,
+void tcg_gen_call5(void *func, TCGHelperInfo *, TCGTemp *ret,
                    TCGTemp *, TCGTemp *, TCGTemp *, TCGTemp *, TCGTemp *);
+void tcg_gen_call6(void *func, TCGHelperInfo *, TCGTemp *ret,
+                   TCGTemp *, TCGTemp *, TCGTemp *, TCGTemp *,
+                   TCGTemp *, TCGTemp *);
+void tcg_gen_call7(void *func, TCGHelperInfo *, TCGTemp *ret,
+                   TCGTemp *, TCGTemp *, TCGTemp *, TCGTemp *,
+                   TCGTemp *, TCGTemp *, TCGTemp *);
 
 TCGOp *tcg_emit_op(TCGOpcode opc, unsigned nargs);
 void tcg_op_remove(TCGContext *s, TCGOp *op);
diff --git a/include/exec/helper-gen.h.inc b/include/exec/helper-gen.h.inc
index XXXXXXX..XXXXXXX 100644
--- a/include/exec/helper-gen.h.inc
+++ b/include/exec/helper-gen.h.inc
@@ -XXX,XX +XXX,XX @@
 extern TCGHelperInfo glue(helper_info_, name);                          \
 static inline void glue(gen_helper_, name)(dh_retvar_decl0(ret))        \
 {                                                                       \
-    tcg_gen_call0(&glue(helper_info_, name), dh_retvar(ret));           \
+    tcg_gen_call0(glue(helper_info_,name).func,                         \
+                  &glue(helper_info_,name), dh_retvar(ret));            \
 }
 
 #define DEF_HELPER_FLAGS_1(name, flags, ret, t1)                        \
@@ -XXX,XX +XXX,XX @@ extern TCGHelperInfo glue(helper_info_, name);                          \
 static inline void glue(gen_helper_, name)(dh_retvar_decl(ret)          \
     dh_arg_decl(t1, 1))                                                 \
 {                                                                       \
-    tcg_gen_call1(&glue(helper_info_, name), dh_retvar(ret),            \
+    tcg_gen_call1(glue(helper_info_,name).func,                         \
+                  &glue(helper_info_,name), dh_retvar(ret),             \
                   dh_arg(t1, 1));                                       \
 }
 
@@ -XXX,XX +XXX,XX @@ extern TCGHelperInfo glue(helper_info_, name);                          \
 static inline void glue(gen_helper_, name)(dh_retvar_decl(ret)          \
     dh_arg_decl(t1, 1), dh_arg_decl(t2, 2))                             \
 {                                                                       \
-    tcg_gen_call2(&glue(helper_info_, name), dh_retvar(ret),            \
+    tcg_gen_call2(glue(helper_info_,name).func,                         \
+                  &glue(helper_info_,name), dh_retvar(ret),             \
                   dh_arg(t1, 1), dh_arg(t2, 2));                        \
 }
 
@@ -XXX,XX +XXX,XX @@ extern TCGHelperInfo glue(helper_info_, name);                          \
 static inline void glue(gen_helper_, name)(dh_retvar_decl(ret)          \
     dh_arg_decl(t1, 1), dh_arg_decl(t2, 2), dh_arg_decl(t3, 3))         \
 {                                                                       \
-    tcg_gen_call3(&glue(helper_info_, name), dh_retvar(ret),            \
+    tcg_gen_call3(glue(helper_info_,name).func,                         \
+                  &glue(helper_info_,name), dh_retvar(ret),             \
                   dh_arg(t1, 1), dh_arg(t2, 2), dh_arg(t3, 3));         \
 }
 
@@ -XXX,XX +XXX,XX @@ static inline void glue(gen_helper_, name)(dh_retvar_decl(ret)          \
     dh_arg_decl(t1, 1), dh_arg_decl(t2, 2),                             \
     dh_arg_decl(t3, 3), dh_arg_decl(t4, 4))                             \
 {                                                                       \
-    tcg_gen_call4(&glue(helper_info_, name), dh_retvar(ret),            \
+    tcg_gen_call4(glue(helper_info_,name).func,                         \
+                  &glue(helper_info_,name), dh_retvar(ret),             \
                   dh_arg(t1, 1), dh_arg(t2, 2),                         \
                   dh_arg(t3, 3), dh_arg(t4, 4));                        \
 }
@@ -XXX,XX +XXX,XX @@ static inline void glue(gen_helper_, name)(dh_retvar_decl(ret)          \
     dh_arg_decl(t1, 1), dh_arg_decl(t2, 2), dh_arg_decl(t3, 3),         \
     dh_arg_decl(t4, 4), dh_arg_decl(t5, 5))                             \
 {                                                                       \
-    tcg_gen_call5(&glue(helper_info_, name), dh_retvar(ret),            \
+    tcg_gen_call5(glue(helper_info_,name).func,                         \
+                  &glue(helper_info_,name), dh_retvar(ret),             \
                   dh_arg(t1, 1), dh_arg(t2, 2), dh_arg(t3, 3),          \
                   dh_arg(t4, 4), dh_arg(t5, 5));                        \
 }
@@ -XXX,XX +XXX,XX @@ static inline void glue(gen_helper_, name)(dh_retvar_decl(ret)          \
     dh_arg_decl(t1, 1), dh_arg_decl(t2, 2), dh_arg_decl(t3, 3),         \
     dh_arg_decl(t4, 4), dh_arg_decl(t5, 5), dh_arg_decl(t6, 6))         \
 {                                                                       \
-    tcg_gen_call6(&glue(helper_info_, name), dh_retvar(ret),            \
+    tcg_gen_call6(glue(helper_info_,name).func,                         \
+                  &glue(helper_info_,name), dh_retvar(ret),             \
                   dh_arg(t1, 1), dh_arg(t2, 2), dh_arg(t3, 3),          \
                   dh_arg(t4, 4), dh_arg(t5, 5), dh_arg(t6, 6));         \
 }
@@ -XXX,XX +XXX,XX @@ static inline void glue(gen_helper_, name)(dh_retvar_decl(ret)          \
     dh_arg_decl(t4, 4), dh_arg_decl(t5, 5), dh_arg_decl(t6, 6),         \
     dh_arg_decl(t7, 7))                                                 \
 {                                                                       \
-    tcg_gen_call7(&glue(helper_info_, name), dh_retvar(ret),            \
+    tcg_gen_call7(glue(helper_info_,name).func,                         \
+                  &glue(helper_info_,name), dh_retvar(ret),             \
                   dh_arg(t1, 1), dh_arg(t2, 2), dh_arg(t3, 3),          \
                   dh_arg(t4, 4), dh_arg(t5, 5), dh_arg(t6, 6),          \
                   dh_arg(t7, 7));                                       \
diff --git a/tcg/tcg.c b/tcg/tcg.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -XXX,XX +XXX,XX @@ bool tcg_op_supported(TCGOpcode op)
 
 static TCGOp *tcg_op_alloc(TCGOpcode opc, unsigned nargs);
 
-static void tcg_gen_callN(TCGHelperInfo *info, TCGTemp *ret, TCGTemp **args)
+static void tcg_gen_callN(void *func, TCGHelperInfo *info,
+                          TCGTemp *ret, TCGTemp **args)
 {
     TCGv_i64 extend_free[MAX_CALL_IARGS];
     int n_extend = 0;
@@ -XXX,XX +XXX,XX @@ static void tcg_gen_callN(TCGHelperInfo *info, TCGTemp *ret, TCGTemp **args)
             g_assert_not_reached();
         }
     }
-    op->args[pi++] = (uintptr_t)info->func;
+    op->args[pi++] = (uintptr_t)func;
     op->args[pi++] = (uintptr_t)info;
     tcg_debug_assert(pi == total_args);
 
@@ -XXX,XX +XXX,XX @@ static void tcg_gen_callN(TCGHelperInfo *info, TCGTemp *ret, TCGTemp **args)
     }
 }
 
-void tcg_gen_call0(TCGHelperInfo *info, TCGTemp *ret)
+void tcg_gen_call0(void *func, TCGHelperInfo *info, TCGTemp *ret)
 {
-    tcg_gen_callN(info, ret, NULL);
+    tcg_gen_callN(func, info, ret, NULL);
 }
 
-void tcg_gen_call1(TCGHelperInfo *info, TCGTemp *ret, TCGTemp *t1)
+void tcg_gen_call1(void *func, TCGHelperInfo *info, TCGTemp *ret, TCGTemp *t1)
 {
-    tcg_gen_callN(info, ret, &t1);
+    tcg_gen_callN(func, info, ret, &t1);
 }
 
-void tcg_gen_call2(TCGHelperInfo *info, TCGTemp *ret, TCGTemp *t1, TCGTemp *t2)
+void tcg_gen_call2(void *func, TCGHelperInfo *info, TCGTemp *ret,
+                   TCGTemp *t1, TCGTemp *t2)
 {
     TCGTemp *args[2] = { t1, t2 };
-    tcg_gen_callN(info, ret, args);
+    tcg_gen_callN(func, info, ret, args);
 }
 
-void tcg_gen_call3(TCGHelperInfo *info, TCGTemp *ret, TCGTemp *t1,
-                   TCGTemp *t2, TCGTemp *t3)
+void tcg_gen_call3(void *func, TCGHelperInfo *info, TCGTemp *ret,
+                   TCGTemp *t1, TCGTemp *t2, TCGTemp *t3)
 {
     TCGTemp *args[3] = { t1, t2, t3 };
-    tcg_gen_callN(info, ret, args);
+    tcg_gen_callN(func, info, ret, args);
 }
 
-void tcg_gen_call4(TCGHelperInfo *info, TCGTemp *ret, TCGTemp *t1,
-                   TCGTemp *t2, TCGTemp *t3, TCGTemp *t4)
+void tcg_gen_call4(void *func, TCGHelperInfo *info, TCGTemp *ret,
+                   TCGTemp *t1, TCGTemp *t2, TCGTemp *t3, TCGTemp *t4)
 {
     TCGTemp *args[4] = { t1, t2, t3, t4 };
-    tcg_gen_callN(info, ret, args);
+    tcg_gen_callN(func, info, ret, args);
 }
 
-void tcg_gen_call5(TCGHelperInfo *info, TCGTemp *ret, TCGTemp *t1,
+void tcg_gen_call5(void *func, TCGHelperInfo *info, TCGTemp *ret, TCGTemp *t1,
                    TCGTemp *t2, TCGTemp *t3, TCGTemp *t4, TCGTemp *t5)
 {
     TCGTemp *args[5] = { t1, t2, t3, t4, t5 };
-    tcg_gen_callN(info, ret, args);
+    tcg_gen_callN(func, info, ret, args);
 }
 
-void tcg_gen_call6(TCGHelperInfo *info, TCGTemp *ret, TCGTemp *t1, TCGTemp *t2,
-                   TCGTemp *t3, TCGTemp *t4, TCGTemp *t5, TCGTemp *t6)
+void tcg_gen_call6(void *func, TCGHelperInfo *info, TCGTemp *ret,
+                   TCGTemp *t1, TCGTemp *t2, TCGTemp *t3,
+                   TCGTemp *t4, TCGTemp *t5, TCGTemp *t6)
 {
     TCGTemp *args[6] = { t1, t2, t3, t4, t5, t6 };
-    tcg_gen_callN(info, ret, args);
+    tcg_gen_callN(func, info, ret, args);
 }
 
-void tcg_gen_call7(TCGHelperInfo *info, TCGTemp *ret, TCGTemp *t1,
+void tcg_gen_call7(void *func, TCGHelperInfo *info, TCGTemp *ret, TCGTemp *t1,
                    TCGTemp *t2, TCGTemp *t3, TCGTemp *t4,
                    TCGTemp *t5, TCGTemp *t6, TCGTemp *t7)
 {
     TCGTemp *args[7] = { t1, t2, t3, t4, t5, t6, t7 };
-    tcg_gen_callN(info, ret, args);
+    tcg_gen_callN(func, info, ret, args);
 }
 
 static void tcg_reg_alloc_start(TCGContext *s)
-- 
2.34.1

The out-of-line function pointer is mutually exclusive
with inline expansion, so move it into the union.
Wrap the pointer in a structure named 'regular' to match
PLUGIN_CB_REGULAR.
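
A minimal sketch of the resulting layout, assuming a C11 compiler for the
anonymous union. The type and function names (cb_type, dyn_cb, make_regular,
make_inline, run_cb, count_call) are hypothetical and much reduced from
struct qemu_plugin_dyn_cb. The type tag selects which union member is valid.

```c
#include <assert.h>
#include <stdint.h>

enum cb_type { CB_REGULAR, CB_INLINE };

typedef void (*udata_cb_fn)(unsigned vcpu_index, void *userp);

struct dyn_cb {
    void *userp;
    enum cb_type type;
    /* Only the member selected by 'type' is meaningful. */
    union {
        struct {
            udata_cb_fn f;
        } regular;
        struct {
            uint64_t *counter;
            uint64_t imm;
        } inline_insn;
    };
};

static struct dyn_cb make_regular(udata_cb_fn f, void *userp)
{
    struct dyn_cb cb = { 0 };

    cb.userp = userp;
    cb.type = CB_REGULAR;
    cb.regular.f = f;
    return cb;
}

static struct dyn_cb make_inline(uint64_t *counter, uint64_t imm)
{
    struct dyn_cb cb = { 0 };

    cb.type = CB_INLINE;
    cb.inline_insn.counter = counter;
    cb.inline_insn.imm = imm;
    return cb;
}

/* Dispatch on the tag before touching the union. */
static void run_cb(const struct dyn_cb *cb, unsigned vcpu_index)
{
    switch (cb->type) {
    case CB_REGULAR:
        cb->regular.f(vcpu_index, cb->userp);
        break;
    case CB_INLINE:
        *cb->inline_insn.counter += cb->inline_insn.imm;
        break;
    }
}

/* Sample out-of-line callback: counts invocations through userp. */
static void count_call(unsigned vcpu_index, void *userp)
{
    (void)vcpu_index;
    *(unsigned *)userp += 1;
}
```

Because an entry is either out-of-line or inline, never both, placing the
pointer in the union costs nothing and makes the exclusivity explicit at
every access (cb->regular.f rather than cb->f).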

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/qemu/plugin.h  | 4 +++-
 accel/tcg/plugin-gen.c | 4 ++--
 plugins/core.c         | 8 ++++----
 3 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/include/qemu/plugin.h b/include/qemu/plugin.h
index XXXXXXX..XXXXXXX 100644
--- a/include/qemu/plugin.h
+++ b/include/qemu/plugin.h
@@ -XXX,XX +XXX,XX @@ enum plugin_dyn_cb_subtype {
  * instance of a callback to be called upon the execution of a particular TB.
  */
 struct qemu_plugin_dyn_cb {
-    union qemu_plugin_cb_sig f;
     void *userp;
     enum plugin_dyn_cb_subtype type;
     /* @rw applies to mem callbacks only (both regular and inline) */
     enum qemu_plugin_mem_rw rw;
     /* fields specific to each dyn_cb type go here */
     union {
+        struct {
+            union qemu_plugin_cb_sig f;
+        } regular;
         struct {
             qemu_plugin_u64 entry;
             enum qemu_plugin_op op;
diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/tcg/plugin-gen.c
+++ b/accel/tcg/plugin-gen.c
@@ -XXX,XX +XXX,XX @@ static TCGOp *append_udata_cb(const struct qemu_plugin_dyn_cb *cb,
     }
 
     /* call */
-    op = copy_call(&begin_op, op, cb->f.vcpu_udata, cb_idx);
+    op = copy_call(&begin_op, op, cb->regular.f.vcpu_udata, cb_idx);
 
     return op;
 }
@@ -XXX,XX +XXX,XX @@ static TCGOp *append_mem_cb(const struct qemu_plugin_dyn_cb *cb,
 
     if (type == PLUGIN_GEN_CB_MEM) {
         /* call */
-        op = copy_call(&begin_op, op, cb->f.vcpu_udata, cb_idx);
+        op = copy_call(&begin_op, op, cb->regular.f.vcpu_udata, cb_idx);
     }
 
     return op;
diff --git a/plugins/core.c b/plugins/core.c
index XXXXXXX..XXXXXXX 100644
--- a/plugins/core.c
+++ b/plugins/core.c
@@ -XXX,XX +XXX,XX @@ void plugin_register_dyn_cb__udata(GArray **arr,
 
     dyn_cb->userp = udata;
     /* Note flags are discarded as unused. */
-    dyn_cb->f.vcpu_udata = cb;
+    dyn_cb->regular.f.vcpu_udata = cb;
     dyn_cb->type = PLUGIN_CB_REGULAR;
 }
 
@@ -XXX,XX +XXX,XX @@ void plugin_register_vcpu_mem_cb(GArray **arr,
     /* Note flags are discarded as unused. */
     dyn_cb->type = PLUGIN_CB_REGULAR;
     dyn_cb->rw = rw;
-    dyn_cb->f.generic = cb;
+    dyn_cb->regular.f.vcpu_mem = cb;
 }
 
 /*
@@ -XXX,XX +XXX,XX @@ void qemu_plugin_vcpu_mem_cb(CPUState *cpu, uint64_t vaddr,
         }
         switch (cb->type) {
         case PLUGIN_CB_REGULAR:
-            cb->f.vcpu_mem(cpu->cpu_index, make_plugin_meminfo(oi, rw),
-                           vaddr, cb->userp);
+            cb->regular.f.vcpu_mem(cpu->cpu_index, make_plugin_meminfo(oi, rw),
+                                   vaddr, cb->userp);
             break;
         case PLUGIN_CB_INLINE:
             exec_inline_op(cb, cpu->cpu_index);
-- 
2.34.1

TCGHelperInfo includes the ABI for every function call.
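
As a rough illustration of what such a record carries, the sketch below
models a per-flag table of call descriptors, looked up by the flags value a
plugin passes at registration. Every name in it (cb_flags, helper_info,
TYPEMASK, the CALL_* constants, the 3-bit slot width) is a hypothetical
stand-in chosen for this example, not the QEMU definition.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the plugin callback flags. */
enum cb_flags { CB_NO_REGS, CB_R_REGS, CB_RW_REGS, CB_N_FLAGS };

/* Hypothetical call flags: what the callee promises not to touch. */
enum { CALL_NO_READ_GLOBALS = 1, CALL_NO_WRITE_GLOBALS = 2 };

/* Assumed encoding: one small type code per slot, slot 0 is the return. */
enum typecode { TC_VOID = 0, TC_I32 = 2, TC_S32 = 3, TC_I64 = 4, TC_PTR = 6 };
#define TYPEMASK(code, slot) ((uint32_t)(code) << ((slot) * 3))

struct helper_info {
    unsigned flags;     /* side-effect promises for the call */
    uint32_t typemask;  /* return and argument types, i.e. the call ABI */
};

/* Signature modelled here: void (*)(uint32_t, void *) */
#define UDATA_TYPEMASK \
    (TYPEMASK(TC_VOID, 0) | TYPEMASK(TC_I32, 1) | TYPEMASK(TC_PTR, 2))

static const struct helper_info udata_info[CB_N_FLAGS] = {
    [CB_NO_REGS] = { CALL_NO_READ_GLOBALS | CALL_NO_WRITE_GLOBALS,
                     UDATA_TYPEMASK },
    [CB_R_REGS]  = { CALL_NO_WRITE_GLOBALS, UDATA_TYPEMASK },
    [CB_RW_REGS] = { 0, UDATA_TYPEMASK },
};

/* Pick the descriptor for a registration; reject out-of-range flags. */
static const struct helper_info *lookup_udata_info(enum cb_flags flags)
{
    assert((unsigned)flags < sizeof(udata_info) / sizeof(udata_info[0]));
    return &udata_info[flags];
}

/* Decode the type code stored in one slot of a typemask. */
static unsigned typemask_slot(uint32_t mask, unsigned slot)
{
    return (mask >> (slot * 3)) & 7;
}
```

Because each table entry is static and immutable, a registered callback only
needs to keep a pointer to it; the three entries differ in flags but share
the same signature.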

Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/qemu/plugin.h |  1 +
 plugins/core.c        | 51 ++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 46 insertions(+), 6 deletions(-)

diff --git a/include/qemu/plugin.h b/include/qemu/plugin.h
index XXXXXXX..XXXXXXX 100644
--- a/include/qemu/plugin.h
+++ b/include/qemu/plugin.h
@@ -XXX,XX +XXX,XX @@ struct qemu_plugin_dyn_cb {
     union {
         struct {
             union qemu_plugin_cb_sig f;
+            TCGHelperInfo *info;
         } regular;
         struct {
             qemu_plugin_u64 entry;
diff --git a/plugins/core.c b/plugins/core.c
index XXXXXXX..XXXXXXX 100644
--- a/plugins/core.c
+++ b/plugins/core.c
@@ -XXX,XX +XXX,XX @@ void plugin_register_dyn_cb__udata(GArray **arr,
                                    enum qemu_plugin_cb_flags flags,
                                    void *udata)
 {
-    struct qemu_plugin_dyn_cb *dyn_cb = plugin_get_dyn_cb(arr);
+    static TCGHelperInfo info[3] = {
+        [QEMU_PLUGIN_CB_NO_REGS].flags = TCG_CALL_NO_RWG | TCG_CALL_PLUGIN,
+        [QEMU_PLUGIN_CB_R_REGS].flags = TCG_CALL_NO_WG | TCG_CALL_PLUGIN,
+        [QEMU_PLUGIN_CB_RW_REGS].flags = TCG_CALL_PLUGIN,
+        /*
+         * Match qemu_plugin_vcpu_udata_cb_t:
+         *   void (*)(uint32_t, void *)
+         */
+        [0 ... 2].typemask = (dh_typemask(void, 0) |
+                              dh_typemask(i32, 1) |
+                              dh_typemask(ptr, 2))
+    };
 
+    struct qemu_plugin_dyn_cb *dyn_cb = plugin_get_dyn_cb(arr);
     dyn_cb->userp = udata;
-    /* Note flags are discarded as unused. */
-    dyn_cb->regular.f.vcpu_udata = cb;
     dyn_cb->type = PLUGIN_CB_REGULAR;
+    dyn_cb->regular.f.vcpu_udata = cb;
+
+    assert((unsigned)flags < ARRAY_SIZE(info));
+    dyn_cb->regular.info = &info[flags];
 }
 
 void plugin_register_vcpu_mem_cb(GArray **arr,
@@ -XXX,XX +XXX,XX @@ void plugin_register_vcpu_mem_cb(GArray **arr,
                                  enum qemu_plugin_mem_rw rw,
                                  void *udata)
 {
-    struct qemu_plugin_dyn_cb *dyn_cb;
+    /*
+     * Expect that the underlying type for enum qemu_plugin_meminfo_t
+     * is either int32_t or uint32_t, aka int or unsigned int.
+     */
+    QEMU_BUILD_BUG_ON(
+        !__builtin_types_compatible_p(qemu_plugin_meminfo_t, uint32_t) &&
+        !__builtin_types_compatible_p(qemu_plugin_meminfo_t, int32_t));
 
-    dyn_cb = plugin_get_dyn_cb(arr);
+    static TCGHelperInfo info[3] = {
+        [QEMU_PLUGIN_CB_NO_REGS].flags = TCG_CALL_NO_RWG | TCG_CALL_PLUGIN,
+        [QEMU_PLUGIN_CB_R_REGS].flags = TCG_CALL_NO_WG | TCG_CALL_PLUGIN,
+        [QEMU_PLUGIN_CB_RW_REGS].flags = TCG_CALL_PLUGIN,
+        /*
+         * Match qemu_plugin_vcpu_mem_cb_t:
+         *   void (*)(uint32_t, qemu_plugin_meminfo_t, uint64_t, void *)
+         */
+        [0 ... 2].typemask =
+            (dh_typemask(void, 0) |
+             dh_typemask(i32, 1) |
+             (__builtin_types_compatible_p(qemu_plugin_meminfo_t, uint32_t)
+              ? dh_typemask(i32, 2) : dh_typemask(s32, 2)) |
+             dh_typemask(i64, 3) |
+             dh_typemask(ptr, 4))
+    };
+
+    struct qemu_plugin_dyn_cb *dyn_cb = plugin_get_dyn_cb(arr);
     dyn_cb->userp = udata;
-    /* Note flags are discarded as unused. */
     dyn_cb->type = PLUGIN_CB_REGULAR;
     dyn_cb->rw = rw;
     dyn_cb->regular.f.vcpu_mem = cb;
+
+    assert((unsigned)flags < ARRAY_SIZE(info));
+    dyn_cb->regular.info = &info[flags];
 }
 
 /*
-- 
2.34.1

Introduce a new plugin_cb op and migrate one operation.
By using emit_before_op, we do not need to emit opcodes
early and modify them later -- we can simply emit the
final set of opcodes once.
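
The idea can be modelled on a plain doubly linked list: translation leaves a
single placeholder op, and a later pass emits the final ops directly in front
of it and then deletes it. The sketch below is a toy model under that
assumption; op_list, emit, remove_op and inject are hypothetical names, not
the TCG API.

```c
#include <assert.h>
#include <stdlib.h>

enum opc { OP_INSN, OP_PLACEHOLDER, OP_STORE };

struct op {
    enum opc opc;
    struct op *prev, *next;
};

struct op_list {
    struct op *head, *tail;
    struct op *emit_before;   /* when set, new ops go in front of this op */
};

/* Emit one op: before emit_before if it is set, otherwise at the tail. */
static struct op *emit(struct op_list *l, enum opc opc)
{
    struct op *op = calloc(1, sizeof(*op));
    struct op *at = l->emit_before;

    if (!op) {
        abort();
    }
    op->opc = opc;
    if (at) {
        op->next = at;
        op->prev = at->prev;
        if (at->prev) {
            at->prev->next = op;
        } else {
            l->head = op;
        }
        at->prev = op;
    } else {
        op->prev = l->tail;
        if (l->tail) {
            l->tail->next = op;
        } else {
            l->head = op;
        }
        l->tail = op;
    }
    return op;
}

/* Unlink and free one op. */
static void remove_op(struct op_list *l, struct op *op)
{
    assert(op != NULL);
    if (op->prev) {
        op->prev->next = op->next;
    } else {
        l->head = op->next;
    }
    if (op->next) {
        op->next->prev = op->prev;
    } else {
        l->tail = op->prev;
    }
    free(op);
}

/* Single pass: expand each placeholder in place, then drop it. */
static void inject(struct op_list *l)
{
    struct op *op, *next;

    for (op = l->head; op; op = next) {
        next = op->next;          /* saved first: op may be removed below */
        if (op->opc == OP_PLACEHOLDER) {
            l->emit_before = op;
            emit(l, OP_STORE);    /* the final op, emitted exactly once */
            l->emit_before = NULL;
            remove_op(l, op);
        }
    }
}
```

Saving the successor before touching the current op is what makes removal
during the walk safe, which is the same reason the loop in the patch switches
to QTAILQ_FOREACH_SAFE.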

Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/tcg-op-common.h |  1 +
 include/tcg/tcg-opc.h       |  1 +
 accel/tcg/plugin-gen.c      | 74 +++++++++++++++++++++----------------
 tcg/tcg-op.c                |  5 +++
 4 files changed, 50 insertions(+), 31 deletions(-)

diff --git a/include/tcg/tcg-op-common.h b/include/tcg/tcg-op-common.h
index XXXXXXX..XXXXXXX 100644
--- a/include/tcg/tcg-op-common.h
+++ b/include/tcg/tcg-op-common.h
@@ -XXX,XX +XXX,XX @@ void tcg_gen_goto_tb(unsigned idx);
  */
 void tcg_gen_lookup_and_goto_ptr(void);
 
+void tcg_gen_plugin_cb(unsigned from);
 void tcg_gen_plugin_cb_start(unsigned from, unsigned type, unsigned wr);
 void tcg_gen_plugin_cb_end(void);
 
diff --git a/include/tcg/tcg-opc.h b/include/tcg/tcg-opc.h
index XXXXXXX..XXXXXXX 100644
--- a/include/tcg/tcg-opc.h
+++ b/include/tcg/tcg-opc.h
@@ -XXX,XX +XXX,XX @@ DEF(exit_tb, 0, 0, 1, TCG_OPF_BB_EXIT | TCG_OPF_BB_END)
 DEF(goto_tb, 0, 0, 1, TCG_OPF_BB_EXIT | TCG_OPF_BB_END)
 DEF(goto_ptr, 0, 1, 0, TCG_OPF_BB_EXIT | TCG_OPF_BB_END)
 
+DEF(plugin_cb, 0, 0, 1, TCG_OPF_NOT_PRESENT)
 DEF(plugin_cb_start, 0, 0, 3, TCG_OPF_NOT_PRESENT)
 DEF(plugin_cb_end, 0, 0, 0, TCG_OPF_NOT_PRESENT)
 
diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/tcg/plugin-gen.c
+++ b/accel/tcg/plugin-gen.c
@@ -XXX,XX +XXX,XX @@ static void plugin_gen_empty_callback(enum plugin_gen_from from)
 {
     switch (from) {
     case PLUGIN_GEN_AFTER_INSN:
-        gen_wrapped(from, PLUGIN_GEN_DISABLE_MEM_HELPER,
-                    gen_empty_mem_helper);
+        tcg_gen_plugin_cb(from);
         break;
     case PLUGIN_GEN_FROM_INSN:
         /*
@@ -XXX,XX +XXX,XX @@ static void inject_mem_enable_helper(struct qemu_plugin_tb *ptb,
     inject_mem_helper(begin_op, arr);
 }
 
-static void inject_mem_disable_helper(struct qemu_plugin_insn *plugin_insn,
-                                      TCGOp *begin_op)
-{
-    if (likely(!plugin_insn->mem_helper)) {
-        rm_ops(begin_op);
-        return;
-    }
-    inject_mem_helper(begin_op, NULL);
-}
-
 /* called before finishing a TB with exit_tb, goto_tb or goto_ptr */
 void plugin_gen_disable_mem_helpers(void)
 {
@@ -XXX,XX +XXX,XX @@ static void plugin_gen_enable_mem_helper(struct qemu_plugin_tb *ptb,
     inject_mem_enable_helper(ptb, insn, begin_op);
 }
 
-static void plugin_gen_disable_mem_helper(struct qemu_plugin_tb *ptb,
-                                          TCGOp *begin_op, int insn_idx)
+static void gen_disable_mem_helper(struct qemu_plugin_tb *ptb,
+                                   struct qemu_plugin_insn *insn)
 {
-    struct qemu_plugin_insn *insn = g_ptr_array_index(ptb->insns, insn_idx);
-    inject_mem_disable_helper(insn, begin_op);
+    if (insn->mem_helper) {
+        tcg_gen_st_ptr(tcg_constant_ptr(0), tcg_env,
+                       offsetof(CPUState, plugin_mem_cbs) -
+                       offsetof(ArchCPU, env));
+    }
 }
 
 /* #define DEBUG_PLUGIN_GEN_OPS */
@@ -XXX,XX +XXX,XX @@ static void pr_ops(void)
 
 static void plugin_gen_inject(struct qemu_plugin_tb *plugin_tb)
 {
-    TCGOp *op;
+    TCGOp *op, *next;
     int insn_idx = -1;
 
     pr_ops();
 
-    QTAILQ_FOREACH(op, &tcg_ctx->ops, link) {
+    /*
+     * While injecting code, we cannot afford to reuse any ebb temps
+     * that might be live within the existing opcode stream.
+     * The simplest solution is to release them all and create new.
+     */
+    memset(tcg_ctx->free_temps, 0, sizeof(tcg_ctx->free_temps));
+
+    QTAILQ_FOREACH_SAFE(op, &tcg_ctx->ops, link, next) {
         switch (op->opc) {
         case INDEX_op_insn_start:
             insn_idx++;
             break;
+
+        case INDEX_op_plugin_cb:
+        {
+            enum plugin_gen_from from = op->args[0];
+            struct qemu_plugin_insn *insn = NULL;
+
+            if (insn_idx >= 0) {
+                insn = g_ptr_array_index(plugin_tb->insns, insn_idx);
+            }
+
+            tcg_ctx->emit_before_op = op;
+
+            switch (from) {
+            case PLUGIN_GEN_AFTER_INSN:
+                assert(insn != NULL);
+                gen_disable_mem_helper(plugin_tb, insn);
+                break;
+            default:
+                g_assert_not_reached();
+            }
+
+            tcg_ctx->emit_before_op = NULL;
+            tcg_op_remove(tcg_ctx, op);
+            break;
+        }
+
         case INDEX_op_plugin_cb_start:
         {
             enum plugin_gen_from from = op->args[0];
@@ -XXX,XX +XXX,XX @@ static void plugin_gen_inject(struct qemu_plugin_tb *plugin_tb)
 
                 break;
             }
-            case PLUGIN_GEN_AFTER_INSN:
-            {
-                g_assert(insn_idx >= 0);
-
-                switch (type) {
-                case PLUGIN_GEN_DISABLE_MEM_HELPER:
-                    plugin_gen_disable_mem_helper(plugin_tb, op, insn_idx);
-                    break;
-                default:
-                    g_assert_not_reached();
-                }
-                break;
-            }
             default:
                 g_assert_not_reached();
             }
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -XXX,XX +XXX,XX @@ void tcg_gen_mb(TCGBar mb_type)
     }
 }
 
+void tcg_gen_plugin_cb(unsigned from)
+{
+    tcg_gen_op1(INDEX_op_plugin_cb, from);
+}
+
 void tcg_gen_plugin_cb_start(unsigned from, unsigned type, unsigned wr)
 {
     tcg_gen_op3(INDEX_op_plugin_cb_start, from, type, wr);
-- 
2.34.1

Now that the qemu_plugin_cb_flags are recorded in the TCGHelperInfo,
we no longer need to distinguish PLUGIN_CB_REGULAR from
PLUGIN_CB_REGULAR_R, so place all TB callbacks in the same queue.
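
For the inline callbacks emitted from the same queue walk, the generated ops
amount to "find this vCPU's slot, load, add an immediate, store". A plain C
rendering of that arithmetic might look like the sketch below. The function
name and parameters are hypothetical, and the index is widened to size_t
before the multiply, which is a simplification relative to the 32-bit
multiply emitted in the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Add imm to the 64-bit counter that lives at byte offset 'offset'
 * inside element 'cpu_index' of an array of 'elem_size'-byte elements.
 */
static void inline_add_u64(void *base, size_t elem_size, size_t offset,
                           uint32_t cpu_index, uint64_t imm)
{
    unsigned char *slot =
        (unsigned char *)base + (size_t)cpu_index * elem_size + offset;
    uint64_t val;

    memcpy(&val, slot, sizeof(val));   /* load  */
    val += imm;                        /* add   */
    memcpy(slot, &val, sizeof(val));   /* store */
}
```

Each vCPU only ever touches its own element, so the update needs no
synchronisation with the other elements of the array.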

Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/plugin-gen.c | 96 +++++++++++++++++++++++++-----------------
 plugins/api.c          |  6 +--
 2 files changed, 58 insertions(+), 44 deletions(-)

diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/tcg/plugin-gen.c
+++ b/accel/tcg/plugin-gen.c
@@ -XXX,XX +XXX,XX @@ static void plugin_gen_empty_callback(enum plugin_gen_from from)
 {
     switch (from) {
     case PLUGIN_GEN_AFTER_INSN:
+    case PLUGIN_GEN_FROM_TB:
         tcg_gen_plugin_cb(from);
         break;
     case PLUGIN_GEN_FROM_INSN:
@@ -XXX,XX +XXX,XX @@ static void plugin_gen_empty_callback(enum plugin_gen_from from)
          */
         gen_wrapped(from, PLUGIN_GEN_ENABLE_MEM_HELPER,
                     gen_empty_mem_helper);
-        /* fall through */
-    case PLUGIN_GEN_FROM_TB:
         gen_wrapped(from, PLUGIN_GEN_CB_UDATA, gen_empty_udata_cb_no_rwg);
         gen_wrapped(from, PLUGIN_GEN_CB_UDATA_R, gen_empty_udata_cb_no_wg);
         gen_wrapped(from, PLUGIN_GEN_CB_INLINE, gen_empty_inline_cb);
@@ -XXX,XX +XXX,XX @@ void plugin_gen_disable_mem_helpers(void)
                    offsetof(CPUState, plugin_mem_cbs) - offsetof(ArchCPU, env));
 }
 
-static void plugin_gen_tb_udata(const struct qemu_plugin_tb *ptb,
-                                TCGOp *begin_op)
-{
-    inject_udata_cb(ptb->cbs[PLUGIN_CB_REGULAR], begin_op);
-}
-
-static void plugin_gen_tb_udata_r(const struct qemu_plugin_tb *ptb,
-                                  TCGOp *begin_op)
-{
-    inject_udata_cb(ptb->cbs[PLUGIN_CB_REGULAR_R], begin_op);
-}
-
-static void plugin_gen_tb_inline(const struct qemu_plugin_tb *ptb,
-                                 TCGOp *begin_op)
-{
-    inject_inline_cb(ptb->cbs[PLUGIN_CB_INLINE], begin_op, op_ok);
-}
-
 static void plugin_gen_insn_udata(const struct qemu_plugin_tb *ptb,
                                   TCGOp *begin_op, int insn_idx)
 {
@@ -XXX,XX +XXX,XX @@ static void gen_disable_mem_helper(struct qemu_plugin_tb *ptb,
     }
 }
 
+static void gen_udata_cb(struct qemu_plugin_dyn_cb *cb)
+{
+    TCGv_i32 cpu_index = tcg_temp_ebb_new_i32();
+
+    tcg_gen_ld_i32(cpu_index, tcg_env,
+                   -offsetof(ArchCPU, env) + offsetof(CPUState, cpu_index));
+    tcg_gen_call2(cb->regular.f.vcpu_udata, cb->regular.info, NULL,
+                  tcgv_i32_temp(cpu_index),
+                  tcgv_ptr_temp(tcg_constant_ptr(cb->userp)));
+    tcg_temp_free_i32(cpu_index);
+}
+
+static void gen_inline_cb(struct qemu_plugin_dyn_cb *cb)
+{
+    GArray *arr = cb->inline_insn.entry.score->data;
+    size_t offset = cb->inline_insn.entry.offset;
+    TCGv_i32 cpu_index = tcg_temp_ebb_new_i32();
+    TCGv_i64 val = tcg_temp_ebb_new_i64();
+    TCGv_ptr ptr = tcg_temp_ebb_new_ptr();
+
+    tcg_gen_ld_i32(cpu_index, tcg_env,
+                   -offsetof(ArchCPU, env) + offsetof(CPUState, cpu_index));
+    tcg_gen_muli_i32(cpu_index, cpu_index, g_array_get_element_size(arr));
+    tcg_gen_ext_i32_ptr(ptr, cpu_index);
+    tcg_temp_free_i32(cpu_index);
+
+    tcg_gen_addi_ptr(ptr, ptr, (intptr_t)arr->data);
+    tcg_gen_ld_i64(val, ptr, offset);
+    tcg_gen_addi_i64(val, val, cb->inline_insn.imm);
+    tcg_gen_st_i64(val, ptr, offset);
+
+    tcg_temp_free_i64(val);
+    tcg_temp_free_ptr(ptr);
+}
+
 /* #define DEBUG_PLUGIN_GEN_OPS */
 static void pr_ops(void)
 {
@@ -XXX,XX +XXX,XX @@ static void plugin_gen_inject(struct qemu_plugin_tb *plugin_tb)
         {
             enum plugin_gen_from from = op->args[0];
             struct qemu_plugin_insn *insn = NULL;
+            const GArray *cbs;
+            int i, n;
 
             if (insn_idx >= 0) {
                 insn = g_ptr_array_index(plugin_tb->insns, insn_idx);
@@ -XXX,XX +XXX,XX @@ static void plugin_gen_inject(struct qemu_plugin_tb *plugin_tb)
                 assert(insn != NULL);
                 gen_disable_mem_helper(plugin_tb, insn);
                 break;
+
+            case PLUGIN_GEN_FROM_TB:
+                assert(insn == NULL);
+
+                cbs = plugin_tb->cbs[PLUGIN_CB_REGULAR];
+                for (i = 0, n = (cbs ? cbs->len : 0); i < n; i++) {
+                    struct qemu_plugin_dyn_cb *cb =
+                        &g_array_index(cbs, struct qemu_plugin_dyn_cb, i);
+                    gen_udata_cb(cb);
+                }
+
+                cbs = plugin_tb->cbs[PLUGIN_CB_INLINE];
+                for (i = 0, n = (cbs ? cbs->len : 0); i < n; i++) {
+                    struct qemu_plugin_dyn_cb *cb =
+                        &g_array_index(cbs, struct qemu_plugin_dyn_cb, i);
+                    gen_inline_cb(cb);
+                }
+                break;
+
             default:
                 g_assert_not_reached();
             }
@@ -XXX,XX +XXX,XX @@ static void plugin_gen_inject(struct qemu_plugin_tb *plugin_tb)
             enum plugin_gen_cb type = op->args[1];
 
             switch (from) {
-            case PLUGIN_GEN_FROM_TB:
-            {
-                g_assert(insn_idx == -1);
-
-                switch (type) {
-                case PLUGIN_GEN_CB_UDATA:
-                    plugin_gen_tb_udata(plugin_tb, op);
-                    break;
-                case PLUGIN_GEN_CB_UDATA_R:
-                    plugin_gen_tb_udata_r(plugin_tb, op);
-                    break;
-                case PLUGIN_GEN_CB_INLINE:
-                    plugin_gen_tb_inline(plugin_tb, op);
-                    break;
-                default:
-                    g_assert_not_reached();
-                }
-                break;
-            }
             case PLUGIN_GEN_FROM_INSN:
             {
                 g_assert(insn_idx >= 0);
diff --git a/plugins/api.c b/plugins/api.c
index XXXXXXX..XXXXXXX 100644
--- a/plugins/api.c
+++ b/plugins/api.c
@@ -XXX,XX +XXX,XX @@ void qemu_plugin_register_vcpu_tb_exec_cb(struct qemu_plugin_tb *tb,
                                           void *udata)
 {
     if (!tb->mem_only) {
-        int index = flags == QEMU_PLUGIN_CB_R_REGS ||
-                    flags == QEMU_PLUGIN_CB_RW_REGS ?
-                    PLUGIN_CB_REGULAR_R : PLUGIN_CB_REGULAR;
-
-        plugin_register_dyn_cb__udata(&tb->cbs[index],
+        plugin_register_dyn_cb__udata(&tb->cbs[PLUGIN_CB_REGULAR],
                                       cb, flags, udata);
     }
 }
-- 
2.34.1

Delay the test of plugin_tb->mem_helper until the inject pass.

Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/plugin-gen.c | 37 ++++++++++++++++---------------------
 1 file changed, 16 insertions(+), 21 deletions(-)

diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/tcg/plugin-gen.c
+++ b/accel/tcg/plugin-gen.c
@@ -XXX,XX +XXX,XX @@ enum plugin_gen_from {
     PLUGIN_GEN_FROM_INSN,
     PLUGIN_GEN_FROM_MEM,
     PLUGIN_GEN_AFTER_INSN,
+    PLUGIN_GEN_AFTER_TB,
     PLUGIN_GEN_N_FROMS,
 };
 
@@ -XXX,XX +XXX,XX @@ static void inject_mem_enable_helper(struct qemu_plugin_tb *ptb,
 /* called before finishing a TB with exit_tb, goto_tb or goto_ptr */
 void plugin_gen_disable_mem_helpers(void)
 {
-    /*
-     * We could emit the clearing unconditionally and be done. However, this can
-     * be wasteful if for instance plugins don't track memory accesses, or if
-     * most TBs don't use helpers. Instead, emit the clearing iff the TB calls
-     * helpers that might access guest memory.
-     *
-     * Note: we do not reset plugin_tb->mem_helper here; a TB might have several
-     * exit points, and we want to emit the clearing from all of them.
-     */
-    if (!tcg_ctx->plugin_tb->mem_helper) {
-        return;
+    if (tcg_ctx->plugin_insn) {
+        tcg_gen_plugin_cb(PLUGIN_GEN_AFTER_TB);
     }
-    tcg_gen_st_ptr(tcg_constant_ptr(NULL), tcg_env,
-                   offsetof(CPUState, plugin_mem_cbs) - offsetof(ArchCPU, env));
 }
 
 static void plugin_gen_insn_udata(const struct qemu_plugin_tb *ptb,
@@ -XXX,XX +XXX,XX @@ static void plugin_gen_enable_mem_helper(struct qemu_plugin_tb *ptb,
     inject_mem_enable_helper(ptb, insn, begin_op);
 }
 
-static void gen_disable_mem_helper(struct qemu_plugin_tb *ptb,
-                                   struct qemu_plugin_insn *insn)
+static void gen_disable_mem_helper(void)
 {
-    if (insn->mem_helper) {
-        tcg_gen_st_ptr(tcg_constant_ptr(0), tcg_env,
-                       offsetof(CPUState, plugin_mem_cbs) -
-                       offsetof(ArchCPU, env));
-    }
+    tcg_gen_st_ptr(tcg_constant_ptr(0), tcg_env,
+                   offsetof(CPUState, plugin_mem_cbs) -
+                   offsetof(ArchCPU, env));
 }
 
 static void gen_udata_cb(struct qemu_plugin_dyn_cb *cb)
@@ -XXX,XX +XXX,XX @@ static void plugin_gen_inject(struct qemu_plugin_tb *plugin_tb)
             tcg_ctx->emit_before_op = op;
 
             switch (from) {
+            case PLUGIN_GEN_AFTER_TB:
+                if (plugin_tb->mem_helper) {
+                    gen_disable_mem_helper();
+                }
+                break;
+
             case PLUGIN_GEN_AFTER_INSN:
                 assert(insn != NULL);
-                gen_disable_mem_helper(plugin_tb, insn);
+                if (insn->mem_helper) {
+                    gen_disable_mem_helper();
+                }
                 break;
 
             case PLUGIN_GEN_FROM_TB:
-- 
2.34.1

Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/qemu/plugin.h  |   1 -
 accel/tcg/plugin-gen.c | 286 ++++++++++-------------------------------
 plugins/api.c          |   8 +-
 3 files changed, 67 insertions(+), 228 deletions(-)

diff --git a/include/qemu/plugin.h b/include/qemu/plugin.h
index XXXXXXX..XXXXXXX 100644
--- a/include/qemu/plugin.h
+++ b/include/qemu/plugin.h
@@ -XXX,XX +XXX,XX @@ enum plugin_dyn_cb_type {
 
 enum plugin_dyn_cb_subtype {
     PLUGIN_CB_REGULAR,
-    PLUGIN_CB_REGULAR_R,
     PLUGIN_CB_INLINE,
     PLUGIN_N_CB_SUBTYPES,
 };
diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/tcg/plugin-gen.c
+++ b/accel/tcg/plugin-gen.c
@@ -XXX,XX +XXX,XX @@ void HELPER(plugin_vcpu_mem_cb)(unsigned int vcpu_index,
                                 void *userdata)
 { }
 
-static void gen_empty_udata_cb(void (*gen_helper)(TCGv_i32, TCGv_ptr))
-{
-    TCGv_i32 cpu_index = tcg_temp_ebb_new_i32();
-    TCGv_ptr udata = tcg_temp_ebb_new_ptr();
-
-    tcg_gen_movi_ptr(udata, 0);
-    tcg_gen_ld_i32(cpu_index, tcg_env,
-                   -offsetof(ArchCPU, env) + offsetof(CPUState, cpu_index));
-    gen_helper(cpu_index, udata);
-
-    tcg_temp_free_ptr(udata);
-    tcg_temp_free_i32(cpu_index);
-}
-
-static void gen_empty_udata_cb_no_wg(void)
-{
-    gen_empty_udata_cb(gen_helper_plugin_vcpu_udata_cb_no_wg);
-}
-
-static void gen_empty_udata_cb_no_rwg(void)
-{
-    gen_empty_udata_cb(gen_helper_plugin_vcpu_udata_cb_no_rwg);
-}
-
 /*
  * For now we only support addi_i64.
  * When we support more ops, we can generate one empty inline cb for each.
@@ -XXX,XX +XXX,XX @@ static void gen_empty_mem_cb(TCGv_i64 addr, uint32_t info)
     tcg_temp_free_i32(cpu_index);
 }
 
-/*
- * Share the same function for enable/disable. When enabling, the NULL
- * pointer will be overwritten later.
- */
-static void gen_empty_mem_helper(void)
-{
-    TCGv_ptr ptr = tcg_temp_ebb_new_ptr();
-
-    tcg_gen_movi_ptr(ptr, 0);
-    tcg_gen_st_ptr(ptr, tcg_env, offsetof(CPUState, plugin_mem_cbs) -
-                                 offsetof(ArchCPU, env));
-    tcg_temp_free_ptr(ptr);
-}
-
 static void gen_plugin_cb_start(enum plugin_gen_from from,
                                 enum plugin_gen_cb type, unsigned wr)
 {
     tcg_gen_plugin_cb_start(from, type, wr);
 }
 
-static void gen_wrapped(enum plugin_gen_from from,
-                        enum plugin_gen_cb type, void (*func)(void))
-{
-    gen_plugin_cb_start(from, type, 0);
-    func();
-    tcg_gen_plugin_cb_end();
-}
-
 static void plugin_gen_empty_callback(enum plugin_gen_from from)
 {
     switch (from) {
     case PLUGIN_GEN_AFTER_INSN:
     case PLUGIN_GEN_FROM_TB:
-        tcg_gen_plugin_cb(from);
-        break;
     case PLUGIN_GEN_FROM_INSN:
-        /*
-         * Note: plugin_gen_inject() relies on ENABLE_MEM_HELPER being
-         * the first callback of an instruction
-         */
-        gen_wrapped(from, PLUGIN_GEN_ENABLE_MEM_HELPER,
-                    gen_empty_mem_helper);
-        gen_wrapped(from, PLUGIN_GEN_CB_UDATA, gen_empty_udata_cb_no_rwg);
-        gen_wrapped(from, PLUGIN_GEN_CB_UDATA_R, gen_empty_udata_cb_no_wg);
-        gen_wrapped(from, PLUGIN_GEN_CB_INLINE, gen_empty_inline_cb);
+        tcg_gen_plugin_cb(from);
         break;
     default:
         g_assert_not_reached();
@@ -XXX,XX +XXX,XX @@ static TCGOp *copy_mul_i32(TCGOp **begin_op, TCGOp *op, uint32_t v)
     return op;
 }
 
-static TCGOp *copy_st_ptr(TCGOp **begin_op, TCGOp *op)
-{
-    if (UINTPTR_MAX == UINT32_MAX) {
-        /* st_i32 */
-        op = copy_op(begin_op, op, INDEX_op_st_i32);
-    } else {
-        /* st_i64 */
-        op = copy_st_i64(begin_op, op);
-    }
-    return op;
-}
-
 static TCGOp *copy_call(TCGOp **begin_op, TCGOp *op, void *func, int *cb_idx)
 {
     TCGOp *old_op;
@@ -XXX,XX +XXX,XX @@ static TCGOp *copy_call(TCGOp **begin_op, TCGOp *op, void *func, int *cb_idx)
     return op;
 }
 
-/*
- * When we append/replace ops here we are sensitive to changing patterns of
- * TCGOps generated by the tcg_gen_FOO calls when we generated the
- * empty callbacks. This will assert very quickly in a debug build as
- * we assert the ops we are replacing are the correct ones.
- */
-static TCGOp *append_udata_cb(const struct qemu_plugin_dyn_cb *cb,
-                              TCGOp *begin_op, TCGOp *op, int *cb_idx)
-{
-    /* const_ptr */
-    op = copy_const_ptr(&begin_op, op, cb->userp);
-
-    /* copy the ld_i32, but note that we only have to copy it once */
-    if (*cb_idx == -1) {
-        op = copy_op(&begin_op, op, INDEX_op_ld_i32);
-    } else {
-        begin_op = QTAILQ_NEXT(begin_op, link);
-        tcg_debug_assert(begin_op && begin_op->opc == INDEX_op_ld_i32);
-    }
-
-    /* call */
-    op = copy_call(&begin_op, op, cb->regular.f.vcpu_udata, cb_idx);
-
-    return op;
-}
-
 static TCGOp *append_inline_cb(const struct qemu_plugin_dyn_cb *cb,
                                TCGOp *begin_op, TCGOp *op,
                                int *unused)
@@ -XXX,XX +XXX,XX @@ typedef TCGOp *(*inject_fn)(const struct qemu_plugin_dyn_cb *cb,
                             TCGOp *begin_op, TCGOp *op, int *intp);
 typedef bool (*op_ok_fn)(const TCGOp *op, const struct qemu_plugin_dyn_cb *cb);
 
-static bool op_ok(const TCGOp *op, const struct qemu_plugin_dyn_cb *cb)
-{
-    return true;
-}
-
 static bool op_rw(const TCGOp *op, const struct qemu_plugin_dyn_cb *cb)
 {
     int w;
@@ -XXX,XX +XXX,XX @@ static void inject_cb_type(const GArray *cbs, TCGOp *begin_op,
     rm_ops_range(begin_op, end_op);
 }
 
-static void
-inject_udata_cb(const GArray *cbs, TCGOp *begin_op)
-{
-    inject_cb_type(cbs, begin_op, append_udata_cb, op_ok);
-}
-
 static void
 inject_inline_cb(const GArray *cbs, TCGOp *begin_op, op_ok_fn ok)
 {
@@ -XXX,XX +XXX,XX @@ inject_mem_cb(const GArray *cbs, TCGOp *begin_op)
     inject_cb_type(cbs, begin_op, append_mem_cb, op_rw);
 }
 
-/* we could change the ops in place, but we can reuse more code by copying */
-static void inject_mem_helper(TCGOp *begin_op, GArray *arr)
-{
-    TCGOp *orig_op = begin_op;
-    TCGOp *end_op;
-    TCGOp *op;
-
-    end_op = find_op(begin_op, INDEX_op_plugin_cb_end);
-    tcg_debug_assert(end_op);
-
-    /* const ptr */
-    op = copy_const_ptr(&begin_op, end_op, arr);
-
-    /* st_ptr */
-    op = copy_st_ptr(&begin_op, op);
-
-    rm_ops_range(orig_op, end_op);
-}
-
-/*
- * Tracking memory accesses performed from helpers requires extra work.
- * If an instruction is emulated with helpers, we do two things:
- * (1) copy the CB descriptors, and keep track of it so that they can be
- * freed later on, and (2) point CPUState.plugin_mem_cbs to the descriptors, so
- * that we can read them at run-time (i.e. when the helper executes).
- * This run-time access is performed from qemu_plugin_vcpu_mem_cb.
- *
- * Note that plugin_gen_disable_mem_helpers undoes (2). Since it
- * is possible that the code we generate after the instruction is
- * dead, we also add checks before generating tb_exit etc.
- */
-static void inject_mem_enable_helper(struct qemu_plugin_tb *ptb,
-                                     struct qemu_plugin_insn *plugin_insn,
-                                     TCGOp *begin_op)
-{
-    GArray *cbs[2];
-    GArray *arr;
-    size_t n_cbs, i;
-
-    cbs[0] = plugin_insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR];
-    cbs[1] = plugin_insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_INLINE];
-
-    n_cbs = 0;
-    for (i = 0; i < ARRAY_SIZE(cbs); i++) {
-        n_cbs += cbs[i]->len;
-    }
-
-    plugin_insn->mem_helper = plugin_insn->calls_helpers && n_cbs;
-    if (likely(!plugin_insn->mem_helper)) {
-        rm_ops(begin_op);
-        return;
-    }
-    ptb->mem_helper = true;
-
-    arr = g_array_sized_new(false, false,
-                            sizeof(struct qemu_plugin_dyn_cb), n_cbs);
-
-    for (i = 0; i < ARRAY_SIZE(cbs); i++) {
-        g_array_append_vals(arr, cbs[i]->data, cbs[i]->len);
-    }
-
-    qemu_plugin_add_dyn_cb_arr(arr);
-    inject_mem_helper(begin_op, arr);
-}
-
 /* called before finishing a TB with exit_tb, goto_tb or goto_ptr */
 void plugin_gen_disable_mem_helpers(void)
 {
@@ -XXX,XX +XXX,XX @@ void plugin_gen_disable_mem_helpers(void)
     }
 }
 
-static void plugin_gen_insn_udata(const struct qemu_plugin_tb *ptb,
-                                  TCGOp *begin_op, int insn_idx)
-{
-    struct qemu_plugin_insn *insn = g_ptr_array_index(ptb->insns, insn_idx);
-
-    inject_udata_cb(insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_REGULAR], begin_op);
-}
-
-static void plugin_gen_insn_udata_r(const struct qemu_plugin_tb *ptb,
-                                    TCGOp *begin_op, int insn_idx)
-{
-    struct qemu_plugin_insn *insn = g_ptr_array_index(ptb->insns, insn_idx);
-
-    inject_udata_cb(insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_REGULAR_R], begin_op);
-}
-
-static void plugin_gen_insn_inline(const struct qemu_plugin_tb *ptb,
-                                   TCGOp *begin_op, int insn_idx)
-{
-    struct qemu_plugin_insn *insn = g_ptr_array_index(ptb->insns, insn_idx);
-    inject_inline_cb(insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_INLINE],
-                     begin_op, op_ok);
-}
-
 static void plugin_gen_mem_regular(const struct qemu_plugin_tb *ptb,
                                    TCGOp *begin_op, int insn_idx)
 {
@@ -XXX,XX +XXX,XX @@ static void plugin_gen_mem_inline(const struct qemu_plugin_tb *ptb,
     inject_inline_cb(cbs, begin_op, op_rw);
 }
 
-static void plugin_gen_enable_mem_helper(struct qemu_plugin_tb *ptb,
-                                         TCGOp *begin_op, int insn_idx)
+static void gen_enable_mem_helper(struct qemu_plugin_tb *ptb,
+                                  struct qemu_plugin_insn *insn)
 {
-    struct qemu_plugin_insn *insn = g_ptr_array_index(ptb->insns, insn_idx);
-    inject_mem_enable_helper(ptb, insn, begin_op);
+    GArray *cbs[2];
+    GArray *arr;
+    size_t n_cbs;
+
+    /*
+     * Tracking memory accesses performed from helpers requires extra work.
+     * If an instruction is emulated with helpers, we do two things:
+     * (1) copy the CB descriptors, and keep track of it so that they can be
+     * freed later on, and (2) point CPUState.plugin_mem_cbs to the
+     * descriptors, so that we can read them at run-time
+     * (i.e. when the helper executes).
+     * This run-time access is performed from qemu_plugin_vcpu_mem_cb.
+     *
+     * Note that plugin_gen_disable_mem_helpers undoes (2). Since it
+     * is possible that the code we generate after the instruction is
+     * dead, we also add checks before generating tb_exit etc.
+     */
+    if (!insn->calls_helpers) {
+        return;
+    }
+
+    cbs[0] = insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR];
+    cbs[1] = insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_INLINE];
+    n_cbs = cbs[0]->len + cbs[1]->len;
+
+    if (n_cbs == 0) {
+        insn->mem_helper = false;
+        return;
+    }
+    insn->mem_helper = true;
+    ptb->mem_helper = true;
+
+    arr = g_array_sized_new(false, false,
+                            sizeof(struct qemu_plugin_dyn_cb), n_cbs);
+    g_array_append_vals(arr, cbs[0]->data, cbs[0]->len);
+    g_array_append_vals(arr, cbs[1]->data, cbs[1]->len);
+
+    qemu_plugin_add_dyn_cb_arr(arr);
+
+    tcg_gen_st_ptr(tcg_constant_ptr((intptr_t)arr), tcg_env,
+                   offsetof(CPUState, plugin_mem_cbs) -
+                   offsetof(ArchCPU, env));
 }
 
 static void gen_disable_mem_helper(void)
@@ -XXX,XX +XXX,XX @@ static void plugin_gen_inject(struct qemu_plugin_tb *plugin_tb)
                 }
                 break;
 
+            case PLUGIN_GEN_FROM_INSN:
+                assert(insn != NULL);
+
+                gen_enable_mem_helper(plugin_tb, insn);
+
+                cbs = insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_REGULAR];
+                for (i = 0, n = (cbs ? cbs->len : 0); i < n; i++) {
+                    struct qemu_plugin_dyn_cb *cb =
+                        &g_array_index(cbs, struct qemu_plugin_dyn_cb, i);
+                    gen_udata_cb(cb);
+                }
+
+                cbs = insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_INLINE];
+                for (i = 0, n = (cbs ? cbs->len : 0); i < n; i++) {
+                    struct qemu_plugin_dyn_cb *cb =
+                        &g_array_index(cbs, struct qemu_plugin_dyn_cb, i);
+                    gen_inline_cb(cb);
+                }
+                break;
+
             default:
                 g_assert_not_reached();
             }
@@ -XXX,XX +XXX,XX @@ static void plugin_gen_inject(struct qemu_plugin_tb *plugin_tb)
             enum plugin_gen_cb type = op->args[1];
 
             switch (from) {
-            case PLUGIN_GEN_FROM_INSN:
-            {
-                g_assert(insn_idx >= 0);
-
-                switch (type) {
-                case PLUGIN_GEN_CB_UDATA:
-                    plugin_gen_insn_udata(plugin_tb, op, insn_idx);
-                    break;
-                case PLUGIN_GEN_CB_UDATA_R:
-                    plugin_gen_insn_udata_r(plugin_tb, op, insn_idx);
-                    break;
-                case PLUGIN_GEN_CB_INLINE:
-                    plugin_gen_insn_inline(plugin_tb, op, insn_idx);
-                    break;
-                case PLUGIN_GEN_ENABLE_MEM_HELPER:
-                    plugin_gen_enable_mem_helper(plugin_tb, op, insn_idx);
-                    break;
-                default:
-                    g_assert_not_reached();
-                }
-                break;
-            }
             case PLUGIN_GEN_FROM_MEM:
             {
                 g_assert(insn_idx >= 0);
diff --git a/plugins/api.c b/plugins/api.c
index XXXXXXX..XXXXXXX 100644
--- a/plugins/api.c
+++ b/plugins/api.c
@@ -XXX,XX +XXX,XX @@ void qemu_plugin_register_vcpu_insn_exec_cb(struct qemu_plugin_insn *insn,
                                             void *udata)
 {
     if (!insn->mem_only) {
-        int index = flags == QEMU_PLUGIN_CB_R_REGS ||
-                    flags == QEMU_PLUGIN_CB_RW_REGS ?
-                    PLUGIN_CB_REGULAR_R : PLUGIN_CB_REGULAR;
-
-        plugin_register_dyn_cb__udata(&insn->cbs[PLUGIN_CB_INSN][index],
-                                      cb, flags, udata);
+        plugin_register_dyn_cb__udata(
+            &insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_REGULAR], cb, flags, udata);
     }
 }
 
-- 
2.34.1

Introduce a new plugin_mem_cb op to hold the address temp
and meminfo computed by tcg-op-ldst.c.  Because this now
has its own opcode, we no longer need PLUGIN_GEN_FROM_MEM.

Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/exec/plugin-gen.h   |   4 -
 include/tcg/tcg-op-common.h |   1 +
 include/tcg/tcg-opc.h       |   1 +
 accel/tcg/plugin-gen.c      | 408 ++++--------------------------------
 tcg/tcg-op-ldst.c           |   6 +-
 tcg/tcg-op.c                |   5 +
 6 files changed, 54 insertions(+), 371 deletions(-)

diff --git a/include/exec/plugin-gen.h b/include/exec/plugin-gen.h
index XXXXXXX..XXXXXXX 100644
--- a/include/exec/plugin-gen.h
+++ b/include/exec/plugin-gen.h
@@ -XXX,XX +XXX,XX @@ void plugin_gen_insn_start(CPUState *cpu, const struct DisasContextBase *db);
 void plugin_gen_insn_end(void);
 
 void plugin_gen_disable_mem_helpers(void);
-void plugin_gen_empty_mem_callback(TCGv_i64 addr, uint32_t info);
 
 #else /* !CONFIG_PLUGIN */
 
@@ -XXX,XX +XXX,XX @@ static inline void plugin_gen_tb_end(CPUState *cpu, size_t num_insns)
 static inline void plugin_gen_disable_mem_helpers(void)
 { }
 
-static inline void plugin_gen_empty_mem_callback(TCGv_i64 addr, uint32_t info)
-{ }
-
 #endif /* CONFIG_PLUGIN */
 
 #endif /* QEMU_PLUGIN_GEN_H */
diff --git a/include/tcg/tcg-op-common.h b/include/tcg/tcg-op-common.h
index XXXXXXX..XXXXXXX 100644
--- a/include/tcg/tcg-op-common.h
+++ b/include/tcg/tcg-op-common.h
@@ -XXX,XX +XXX,XX @@ void tcg_gen_goto_tb(unsigned idx);
 void tcg_gen_lookup_and_goto_ptr(void);
 
 void tcg_gen_plugin_cb(unsigned from);
+void tcg_gen_plugin_mem_cb(TCGv_i64 addr, unsigned meminfo);
 void tcg_gen_plugin_cb_start(unsigned from, unsigned type, unsigned wr);
 void tcg_gen_plugin_cb_end(void);
 
diff --git a/include/tcg/tcg-opc.h b/include/tcg/tcg-opc.h
index XXXXXXX..XXXXXXX 100644
--- a/include/tcg/tcg-opc.h
+++ b/include/tcg/tcg-opc.h
@@ -XXX,XX +XXX,XX @@ DEF(goto_tb, 0, 0, 1, TCG_OPF_BB_EXIT | TCG_OPF_BB_END)
 DEF(goto_ptr, 0, 1, 0, TCG_OPF_BB_EXIT | TCG_OPF_BB_END)
 
 DEF(plugin_cb, 0, 0, 1, TCG_OPF_NOT_PRESENT)
+DEF(plugin_mem_cb, 0, 1, 1, TCG_OPF_NOT_PRESENT)
 DEF(plugin_cb_start, 0, 0, 3, TCG_OPF_NOT_PRESENT)
 DEF(plugin_cb_end, 0, 0, 0, TCG_OPF_NOT_PRESENT)
 
diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/tcg/plugin-gen.c
+++ b/accel/tcg/plugin-gen.c
@@ -XXX,XX +XXX,XX @@
 enum plugin_gen_from {
     PLUGIN_GEN_FROM_TB,
     PLUGIN_GEN_FROM_INSN,
-    PLUGIN_GEN_FROM_MEM,
     PLUGIN_GEN_AFTER_INSN,
     PLUGIN_GEN_AFTER_TB,
     PLUGIN_GEN_N_FROMS,
@@ -XXX,XX +XXX,XX @@ void HELPER(plugin_vcpu_mem_cb)(unsigned int vcpu_index,
                                 void *userdata)
 { }
 
-/*
- * For now we only support addi_i64.
- * When we support more ops, we can generate one empty inline cb for each.
- */
-static void gen_empty_inline_cb(void)
-{
-    TCGv_i32 cpu_index = tcg_temp_ebb_new_i32();
-    TCGv_ptr cpu_index_as_ptr = tcg_temp_ebb_new_ptr();
-    TCGv_i64 val = tcg_temp_ebb_new_i64();
-    TCGv_ptr ptr = tcg_temp_ebb_new_ptr();
-
-    tcg_gen_ld_i32(cpu_index, tcg_env,
-                   -offsetof(ArchCPU, env) + offsetof(CPUState, cpu_index));
-    /* second operand will be replaced by immediate value */
-    tcg_gen_mul_i32(cpu_index, cpu_index, cpu_index);
-    tcg_gen_ext_i32_ptr(cpu_index_as_ptr, cpu_index);
-
-    tcg_gen_movi_ptr(ptr, 0);
-    tcg_gen_add_ptr(ptr, ptr, cpu_index_as_ptr);
-    tcg_gen_ld_i64(val, ptr, 0);
-    /* second operand will be replaced by immediate value */
-    tcg_gen_add_i64(val, val, val);
-
-    tcg_gen_st_i64(val, ptr, 0);
-    tcg_temp_free_ptr(ptr);
-    tcg_temp_free_i64(val);
-    tcg_temp_free_ptr(cpu_index_as_ptr);
-    tcg_temp_free_i32(cpu_index);
-}
-
-static void gen_empty_mem_cb(TCGv_i64 addr, uint32_t info)
-{
-    TCGv_i32 cpu_index = tcg_temp_ebb_new_i32();
-    TCGv_i32 meminfo = tcg_temp_ebb_new_i32();
-    TCGv_ptr udata = tcg_temp_ebb_new_ptr();
-
-    tcg_gen_movi_i32(meminfo, info);
-    tcg_gen_movi_ptr(udata, 0);
-    tcg_gen_ld_i32(cpu_index, tcg_env,
-                   -offsetof(ArchCPU, env) + offsetof(CPUState, cpu_index));
-
-    gen_helper_plugin_vcpu_mem_cb(cpu_index, meminfo, addr, udata);
-
-    tcg_temp_free_ptr(udata);
-    tcg_temp_free_i32(meminfo);
-    tcg_temp_free_i32(cpu_index);
-}
-
-static void gen_plugin_cb_start(enum plugin_gen_from from,
-                                enum plugin_gen_cb type, unsigned wr)
-{
-    tcg_gen_plugin_cb_start(from, type, wr);
-}
-
 static void plugin_gen_empty_callback(enum plugin_gen_from from)
 {
     switch (from) {
@@ -XXX,XX +XXX,XX @@ static void plugin_gen_empty_callback(enum plugin_gen_from from)
     }
 }
 
-void plugin_gen_empty_mem_callback(TCGv_i64 addr, uint32_t info)
-{
-    enum qemu_plugin_mem_rw rw = get_plugin_meminfo_rw(info);
-
-    gen_plugin_cb_start(PLUGIN_GEN_FROM_MEM, PLUGIN_GEN_CB_MEM, rw);
-    gen_empty_mem_cb(addr, info);
-    tcg_gen_plugin_cb_end();
-
-    gen_plugin_cb_start(PLUGIN_GEN_FROM_MEM, PLUGIN_GEN_CB_INLINE, rw);
-    gen_empty_inline_cb();
-    tcg_gen_plugin_cb_end();
-}
-
-static TCGOp *find_op(TCGOp *op, TCGOpcode opc)
-{
-    while (op) {
-        if (op->opc == opc) {
-            return op;
-        }
-        op = QTAILQ_NEXT(op, link);
-    }
-    return NULL;
-}
-
-static TCGOp *rm_ops_range(TCGOp *begin, TCGOp *end)
-{
-    TCGOp *ret = QTAILQ_NEXT(end, link);
-
-    QTAILQ_REMOVE_SEVERAL(&tcg_ctx->ops, begin, end, link);
-    return ret;
-}
-
-/* remove all ops until (and including) plugin_cb_end */
-static TCGOp *rm_ops(TCGOp *op)
-{
-    TCGOp *end_op = find_op(op, INDEX_op_plugin_cb_end);
-
-    tcg_debug_assert(end_op);
-    return rm_ops_range(op, end_op);
-}
-
-static TCGOp *copy_op_nocheck(TCGOp **begin_op, TCGOp *op)
-{
-    TCGOp *old_op = QTAILQ_NEXT(*begin_op, link);
-    unsigned nargs = old_op->nargs;
-
-    *begin_op = old_op;
-    op = tcg_op_insert_after(tcg_ctx, op, old_op->opc, nargs);
-    memcpy(op->args, old_op->args, sizeof(op->args[0]) * nargs);
-
-    return op;
-}
-
-static TCGOp *copy_op(TCGOp **begin_op, TCGOp *op, TCGOpcode opc)
-{
-    op = copy_op_nocheck(begin_op, op);
-    tcg_debug_assert((*begin_op)->opc == opc);
-    return op;
-}
-
-static TCGOp *copy_const_ptr(TCGOp **begin_op, TCGOp *op, void *ptr)
-{
-    if (UINTPTR_MAX == UINT32_MAX) {
-        /* mov_i32 */
-        op = copy_op(begin_op, op, INDEX_op_mov_i32);
-        op->args[1] = tcgv_i32_arg(tcg_constant_i32((uintptr_t)ptr));
-    } else {
-        /* mov_i64 */
-        op = copy_op(begin_op, op, INDEX_op_mov_i64);
-        op->args[1] = tcgv_i64_arg(tcg_constant_i64((uintptr_t)ptr));
-    }
-    return op;
-}
-
-static TCGOp *copy_ld_i32(TCGOp **begin_op, TCGOp *op)
-{
-    return copy_op(begin_op, op, INDEX_op_ld_i32);
-}
-
-static TCGOp *copy_ext_i32_ptr(TCGOp **begin_op, TCGOp *op)
-{
-    if (UINTPTR_MAX == UINT32_MAX) {
-        op = copy_op(begin_op, op, INDEX_op_mov_i32);
-    } else {
-        op = copy_op(begin_op, op, INDEX_op_ext_i32_i64);
-    }
-    return op;
-}
-
-static TCGOp *copy_add_ptr(TCGOp **begin_op, TCGOp *op)
-{
-    if (UINTPTR_MAX == UINT32_MAX) {
-        op = copy_op(begin_op, op, INDEX_op_add_i32);
-    } else {
-        op = copy_op(begin_op, op, INDEX_op_add_i64);
-    }
-    return op;
-}
-
-static TCGOp *copy_ld_i64(TCGOp **begin_op, TCGOp *op)
-{
-    if (TCG_TARGET_REG_BITS == 32) {
-        /* 2x ld_i32 */
-        op = copy_ld_i32(begin_op, op);
-        op = copy_ld_i32(begin_op, op);
-    } else {
-        /* ld_i64 */
-        op = copy_op(begin_op, op, INDEX_op_ld_i64);
-    }
-    return op;
-}
-
-static TCGOp *copy_st_i64(TCGOp **begin_op, TCGOp *op)
-{
-    if (TCG_TARGET_REG_BITS == 32) {
-        /* 2x st_i32 */
-        op = copy_op(begin_op, op, INDEX_op_st_i32);
-        op = copy_op(begin_op, op, INDEX_op_st_i32);
-    } else {
-        /* st_i64 */
-        op = copy_op(begin_op, op, INDEX_op_st_i64);
-    }
-    return op;
-}
-
-static TCGOp *copy_add_i64(TCGOp **begin_op, TCGOp *op, uint64_t v)
-{
-    if (TCG_TARGET_REG_BITS == 32) {
-        /* all 32-bit backends must implement add2_i32 */
-        g_assert(TCG_TARGET_HAS_add2_i32);
-        op = copy_op(begin_op, op, INDEX_op_add2_i32);
-        op->args[4] = tcgv_i32_arg(tcg_constant_i32(v));
-        op->args[5] = tcgv_i32_arg(tcg_constant_i32(v >> 32));
-    } else {
-        op = copy_op(begin_op, op, INDEX_op_add_i64);
-        op->args[2] = tcgv_i64_arg(tcg_constant_i64(v));
-    }
-    return op;
-}
-
-static TCGOp *copy_mul_i32(TCGOp **begin_op, TCGOp *op, uint32_t v)
-{
-    op = copy_op(begin_op, op, INDEX_op_mul_i32);
-    op->args[2] = tcgv_i32_arg(tcg_constant_i32(v));
-    return op;
-}
-
-static TCGOp *copy_call(TCGOp **begin_op, TCGOp *op, void *func, int *cb_idx)
-{
-    TCGOp *old_op;
-    int func_idx;
-
-    /* copy all ops until the call */
-    do {
-        op = copy_op_nocheck(begin_op, op);
-    } while (op->opc != INDEX_op_call);
-
-    /* fill in the op call */
-    old_op = *begin_op;
-    TCGOP_CALLI(op) = TCGOP_CALLI(old_op);
-    TCGOP_CALLO(op) = TCGOP_CALLO(old_op);
-    tcg_debug_assert(op->life == 0);
-
-    func_idx = TCGOP_CALLO(op) + TCGOP_CALLI(op);
-    *cb_idx = func_idx;
-    op->args[func_idx] = (uintptr_t)func;
-
-    return op;
-}
-
-static TCGOp *append_inline_cb(const struct qemu_plugin_dyn_cb *cb,
-                               TCGOp *begin_op, TCGOp *op,
-                               int *unused)
-{
-    char *ptr = cb->inline_insn.entry.score->data->data;
-    size_t elem_size = g_array_get_element_size(
-        cb->inline_insn.entry.score->data);
-    size_t offset = cb->inline_insn.entry.offset;
-
-    op = copy_ld_i32(&begin_op, op);
-    op = copy_mul_i32(&begin_op, op, elem_size);
-    op = copy_ext_i32_ptr(&begin_op, op);
-    op = copy_const_ptr(&begin_op, op, ptr + offset);
-    op = copy_add_ptr(&begin_op, op);
-    op = copy_ld_i64(&begin_op, op);
-    op = copy_add_i64(&begin_op, op, cb->inline_insn.imm);
-    op = copy_st_i64(&begin_op, op);
-    return op;
-}
-
-static TCGOp *append_mem_cb(const struct qemu_plugin_dyn_cb *cb,
-                            TCGOp *begin_op, TCGOp *op, int *cb_idx)
-{
-    enum plugin_gen_cb type = begin_op->args[1];
-
-    tcg_debug_assert(type == PLUGIN_GEN_CB_MEM);
-
-    /* const_i32 == mov_i32 ("info", so it remains as is) */
-    op = copy_op(&begin_op, op, INDEX_op_mov_i32);
-
-    /* const_ptr */
-    op = copy_const_ptr(&begin_op, op, cb->userp);
-
-    /* copy the ld_i32, but note that we only have to copy it once */
-    if (*cb_idx == -1) {
-        op = copy_op(&begin_op, op, INDEX_op_ld_i32);
-    } else {
-        begin_op = QTAILQ_NEXT(begin_op, link);
-        tcg_debug_assert(begin_op && begin_op->opc == INDEX_op_ld_i32);
-    }
-
-    if (type == PLUGIN_GEN_CB_MEM) {
-        /* call */
-        op = copy_call(&begin_op, op, cb->regular.f.vcpu_udata, cb_idx);
-    }
-
-    return op;
-}
-
-typedef TCGOp *(*inject_fn)(const struct qemu_plugin_dyn_cb *cb,
-                            TCGOp *begin_op, TCGOp *op, int *intp);
-typedef bool (*op_ok_fn)(const TCGOp *op, const struct qemu_plugin_dyn_cb *cb);
-
-static bool op_rw(const TCGOp *op, const struct qemu_plugin_dyn_cb *cb)
-{
-    int w;
-
-    w = op->args[2];
-    return !!(cb->rw & (w + 1));
-}
-
-static void inject_cb_type(const GArray *cbs, TCGOp *begin_op,
-                           inject_fn inject, op_ok_fn ok)
-{
-    TCGOp *end_op;
-    TCGOp *op;
-    int cb_idx = -1;
-    int i;
-
-    if (!cbs || cbs->len == 0) {
-        rm_ops(begin_op);
-        return;
-    }
-
-    end_op = find_op(begin_op, INDEX_op_plugin_cb_end);
-    tcg_debug_assert(end_op);
-
-    op = end_op;
-    for (i = 0; i < cbs->len; i++) {
-        struct qemu_plugin_dyn_cb *cb =
-            &g_array_index(cbs, struct qemu_plugin_dyn_cb, i);
-
-        if (!ok(begin_op, cb)) {
-            continue;
-        }
-        op = inject(cb, begin_op, op, &cb_idx);
-    }
-    rm_ops_range(begin_op, end_op);
-}
-
-static void
-inject_inline_cb(const GArray *cbs, TCGOp *begin_op, op_ok_fn ok)
-{
-    inject_cb_type(cbs, begin_op, append_inline_cb, ok);
-}
-
-static void
-inject_mem_cb(const GArray *cbs, TCGOp *begin_op)
-{
-    inject_cb_type(cbs, begin_op, append_mem_cb, op_rw);
-}
-
 /* called before finishing a TB with exit_tb, goto_tb or goto_ptr */
 void plugin_gen_disable_mem_helpers(void)
 {
@@ -XXX,XX +XXX,XX @@ void plugin_gen_disable_mem_helpers(void)
     }
 }
 
-static void plugin_gen_mem_regular(const struct qemu_plugin_tb *ptb,
-                                   TCGOp *begin_op, int insn_idx)
-{
-    struct qemu_plugin_insn *insn = g_ptr_array_index(ptb->insns, insn_idx);
-    inject_mem_cb(insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR], begin_op);
-}
-
-static void plugin_gen_mem_inline(const struct qemu_plugin_tb *ptb,
-                                  TCGOp *begin_op, int insn_idx)
-{
-    const GArray *cbs;
-    struct qemu_plugin_insn *insn = g_ptr_array_index(ptb->insns, insn_idx);
-
-    cbs = insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_INLINE];
-    inject_inline_cb(cbs, begin_op, op_rw);
-}
-
 static void gen_enable_mem_helper(struct qemu_plugin_tb *ptb,
                                   struct qemu_plugin_insn *insn)
 {
@@ -XXX,XX +XXX,XX @@ static void gen_inline_cb(struct qemu_plugin_dyn_cb *cb)
     tcg_temp_free_ptr(ptr);
 }
 
+static void gen_mem_cb(struct qemu_plugin_dyn_cb *cb,
+                       qemu_plugin_meminfo_t meminfo, TCGv_i64 addr)
+{
+    TCGv_i32 cpu_index = tcg_temp_ebb_new_i32();
+
+    tcg_gen_ld_i32(cpu_index, tcg_env,
+                   -offsetof(ArchCPU, env) + offsetof(CPUState, cpu_index));
+    tcg_gen_call4(cb->regular.f.vcpu_mem, cb->regular.info, NULL,
+                  tcgv_i32_temp(cpu_index),
+                  tcgv_i32_temp(tcg_constant_i32(meminfo)),
+                  tcgv_i64_temp(addr),
+                  tcgv_ptr_temp(tcg_constant_ptr(cb->userp)));
+    tcg_temp_free_i32(cpu_index);
+}
+
 /* #define DEBUG_PLUGIN_GEN_OPS */
 static void pr_ops(void)
 {
@@ -XXX,XX +XXX,XX @@ static void plugin_gen_inject(struct qemu_plugin_tb *plugin_tb)
             break;
         }
 
-        case INDEX_op_plugin_cb_start:
+        case INDEX_op_plugin_mem_cb:
         {
-            enum plugin_gen_from from = op->args[0];
-            enum plugin_gen_cb type = op->args[1];
+            TCGv_i64 addr = temp_tcgv_i64(arg_temp(op->args[0]));
+            qemu_plugin_meminfo_t meminfo = op->args[1];
+            struct qemu_plugin_insn *insn;
+            const GArray *cbs;
+            int i, n, rw;
 
-            switch (from) {
-            case PLUGIN_GEN_FROM_MEM:
-            {
-                g_assert(insn_idx >= 0);
+            assert(insn_idx >= 0);
+            insn = g_ptr_array_index(plugin_tb->insns, insn_idx);
+            rw = qemu_plugin_mem_is_store(meminfo) ? 2 : 1;
 
-                switch (type) {
-                case PLUGIN_GEN_CB_MEM:
-                    plugin_gen_mem_regular(plugin_tb, op, insn_idx);
-                    break;
-                case PLUGIN_GEN_CB_INLINE:
-                    plugin_gen_mem_inline(plugin_tb, op, insn_idx);
-                    break;
-                default:
-                    g_assert_not_reached();
+            tcg_ctx->emit_before_op = op;
+
+            cbs = insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR];
+            for (i = 0, n = (cbs ? cbs->len : 0); i < n; i++) {
+                struct qemu_plugin_dyn_cb *cb =
+                    &g_array_index(cbs, struct qemu_plugin_dyn_cb, i);
+                if (cb->rw & rw) {
+                    gen_mem_cb(cb, meminfo, addr);
                 }
+            }
 
-                break;
-            }
-            default:
-                g_assert_not_reached();
+            cbs = insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_INLINE];
+            for (i = 0, n = (cbs ? cbs->len : 0); i < n; i++) {
+                struct qemu_plugin_dyn_cb *cb =
+                    &g_array_index(cbs, struct qemu_plugin_dyn_cb, i);
+                if (cb->rw & rw) {
+                    gen_inline_cb(cb);
+                }
             }
+
+            tcg_ctx->emit_before_op = NULL;
+            tcg_op_remove(tcg_ctx, op);
             break;
         }
+
         default:
             /* plugins don't care about any other ops */
             break;
diff --git a/tcg/tcg-op-ldst.c b/tcg/tcg-op-ldst.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/tcg-op-ldst.c
+++ b/tcg/tcg-op-ldst.c
@@ -XXX,XX +XXX,XX @@ plugin_gen_mem_callbacks(TCGv_i64 copy_addr, TCGTemp *orig_addr, MemOpIdx oi,
                 copy_addr = tcg_temp_ebb_new_i64();
                 tcg_gen_extu_i32_i64(copy_addr, temp_tcgv_i32(orig_addr));
             }
-            plugin_gen_empty_mem_callback(copy_addr, info);
+            tcg_gen_plugin_mem_cb(copy_addr, info);
             tcg_temp_free_i64(copy_addr);
         } else {
             if (copy_addr) {
-                plugin_gen_empty_mem_callback(copy_addr, info);
+                tcg_gen_plugin_mem_cb(copy_addr, info);
                 tcg_temp_free_i64(copy_addr);
             } else {
-                plugin_gen_empty_mem_callback(temp_tcgv_i64(orig_addr), info);
+                tcg_gen_plugin_mem_cb(temp_tcgv_i64(orig_addr), info);
             }
         }
     }
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -XXX,XX +XXX,XX @@ void tcg_gen_plugin_cb(unsigned from)
     tcg_gen_op1(INDEX_op_plugin_cb, from);
 }
 
+void tcg_gen_plugin_mem_cb(TCGv_i64 addr, unsigned meminfo)
+{
+    tcg_gen_op2(INDEX_op_plugin_mem_cb, tcgv_i64_arg(addr), meminfo);
+}
+
 void tcg_gen_plugin_cb_start(unsigned from, unsigned type, unsigned wr)
 {
     tcg_gen_op3(INDEX_op_plugin_cb_start, from, type, wr);
-- 
2.34.1

The placeholder helpers plugin_vcpu_udata_cb_no_wg,
plugin_vcpu_udata_cb_no_rwg and plugin_vcpu_mem_cb are no longer
required.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/plugin-helpers.h         |  5 -----
 include/exec/helper-gen-common.h   |  4 ----
 include/exec/helper-proto-common.h |  4 ----
 accel/tcg/plugin-gen.c             | 20 --------------------
 4 files changed, 33 deletions(-)
 delete mode 100644 accel/tcg/plugin-helpers.h

diff --git a/accel/tcg/plugin-helpers.h b/accel/tcg/plugin-helpers.h
deleted file mode 100644
index XXXXXXX..XXXXXXX
--- a/accel/tcg/plugin-helpers.h
+++ /dev/null
@@ -XXX,XX +XXX,XX @@
-#ifdef CONFIG_PLUGIN
-DEF_HELPER_FLAGS_2(plugin_vcpu_udata_cb_no_wg, TCG_CALL_NO_WG | TCG_CALL_PLUGIN, void, i32, ptr)
-DEF_HELPER_FLAGS_2(plugin_vcpu_udata_cb_no_rwg, TCG_CALL_NO_RWG | TCG_CALL_PLUGIN, void, i32, ptr)
-DEF_HELPER_FLAGS_4(plugin_vcpu_mem_cb, TCG_CALL_NO_RWG | TCG_CALL_PLUGIN, void, i32, i32, i64, ptr)
-#endif
diff --git a/include/exec/helper-gen-common.h b/include/exec/helper-gen-common.h
index XXXXXXX..XXXXXXX 100644
--- a/include/exec/helper-gen-common.h
+++ b/include/exec/helper-gen-common.h
@@ -XXX,XX +XXX,XX @@
 #include "exec/helper-gen.h.inc"
 #undef  HELPER_H
 
-#define HELPER_H "accel/tcg/plugin-helpers.h"
-#include "exec/helper-gen.h.inc"
-#undef  HELPER_H
-
 #endif /* HELPER_GEN_COMMON_H */
diff --git a/include/exec/helper-proto-common.h b/include/exec/helper-proto-common.h
index XXXXXXX..XXXXXXX 100644
--- a/include/exec/helper-proto-common.h
+++ b/include/exec/helper-proto-common.h
@@ -XXX,XX +XXX,XX @@
 #include "exec/helper-proto.h.inc"
 #undef  HELPER_H
 
-#define HELPER_H "accel/tcg/plugin-helpers.h"
-#include "exec/helper-proto.h.inc"
-#undef  HELPER_H
-
 #endif /* HELPER_PROTO_COMMON_H */
diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/tcg/plugin-gen.c
+++ b/accel/tcg/plugin-gen.c
@@ -XXX,XX +XXX,XX @@
 #include "exec/exec-all.h"
 #include "exec/plugin-gen.h"
 #include "exec/translator.h"
-#include "exec/helper-proto-common.h"
-
-#define HELPER_H  "accel/tcg/plugin-helpers.h"
-#include "exec/helper-info.c.inc"
-#undef  HELPER_H
 
 /*
  * plugin_cb_start TCG op args[]:
@@ -XXX,XX +XXX,XX @@ enum plugin_gen_cb {
     PLUGIN_GEN_N_CBS,
 };
 
-/*
- * These helpers are stubs that get dynamically switched out for calls
- * direct to the plugin if they are subscribed to.
- */
-void HELPER(plugin_vcpu_udata_cb_no_wg)(uint32_t cpu_index, void *udata)
-{ }
-
-void HELPER(plugin_vcpu_udata_cb_no_rwg)(uint32_t cpu_index, void *udata)
-{ }
-
-void HELPER(plugin_vcpu_mem_cb)(unsigned int vcpu_index,
-                                qemu_plugin_meminfo_t info, uint64_t vaddr,
-                                void *userdata)
-{ }
-
 static void plugin_gen_empty_callback(enum plugin_gen_from from)
 {
     switch (from) {
-- 
2.34.1

Since we no longer emit plugin helpers during the initial code
translation phase, we don't need to specially mark plugin helpers.

Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
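Note on the simplified check: the condition in tcg_gen_callN() that sets
calls_helpers can be modelled as a pure predicate. The sketch below is a
standalone illustration under assumptions, not code from the patch. The
function names are hypothetical; the two flag values are the ones quoted
in the include/tcg/tcg.h hunk.

```c
#include <stdbool.h>

/* Flag values as quoted in the include/tcg/tcg.h hunk. */
#define CALL_NO_SIDE_EFFECTS 0x0004u
#define CALL_PLUGIN          0x0010u   /* removed by this patch */

/* Old condition: plugin helpers were explicitly exempted. */
static bool old_marks_calls_helpers(bool in_plugin_insn, unsigned flags)
{
    return in_plugin_insn &&
           !(flags & CALL_PLUGIN) &&
           !(flags & CALL_NO_SIDE_EFFECTS);
}

/* New condition: only the side-effect flag is consulted. */
static bool new_marks_calls_helpers(bool in_plugin_insn, unsigned flags)
{
    return in_plugin_insn && !(flags & CALL_NO_SIDE_EFFECTS);
}
```

The two predicates differ only for helpers that carry the plugin flag,
which is the case the commit message says no longer arises during the
initial translation.
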
 include/tcg/tcg.h |  2 --
 plugins/core.c    | 10 ++++------
 tcg/tcg.c         |  4 +---
 3 files changed, 5 insertions(+), 11 deletions(-)

diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index XXXXXXX..XXXXXXX 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -XXX,XX +XXX,XX @@ typedef TCGv_ptr TCGv_env;
 #define TCG_CALL_NO_SIDE_EFFECTS    0x0004
 /* Helper is G_NORETURN.  */
 #define TCG_CALL_NO_RETURN          0x0008
-/* Helper is part of Plugins.  */
-#define TCG_CALL_PLUGIN             0x0010
 
 /* convenience version of most used call flags */
 #define TCG_CALL_NO_RWG         TCG_CALL_NO_READ_GLOBALS
diff --git a/plugins/core.c b/plugins/core.c
index XXXXXXX..XXXXXXX 100644
--- a/plugins/core.c
+++ b/plugins/core.c
@@ -XXX,XX +XXX,XX @@ void plugin_register_dyn_cb__udata(GArray **arr,
                                    void *udata)
 {
     static TCGHelperInfo info[3] = {
-        [QEMU_PLUGIN_CB_NO_REGS].flags = TCG_CALL_NO_RWG | TCG_CALL_PLUGIN,
-        [QEMU_PLUGIN_CB_R_REGS].flags = TCG_CALL_NO_WG | TCG_CALL_PLUGIN,
-        [QEMU_PLUGIN_CB_RW_REGS].flags = TCG_CALL_PLUGIN,
+        [QEMU_PLUGIN_CB_NO_REGS].flags = TCG_CALL_NO_RWG,
+        [QEMU_PLUGIN_CB_R_REGS].flags = TCG_CALL_NO_WG,
         /*
          * Match qemu_plugin_vcpu_udata_cb_t:
          *   void (*)(uint32_t, void *)
@@ -XXX,XX +XXX,XX @@ void plugin_register_vcpu_mem_cb(GArray **arr,
         !__builtin_types_compatible_p(qemu_plugin_meminfo_t, int32_t));
 
     static TCGHelperInfo info[3] = {
-        [QEMU_PLUGIN_CB_NO_REGS].flags = TCG_CALL_NO_RWG | TCG_CALL_PLUGIN,
-        [QEMU_PLUGIN_CB_R_REGS].flags = TCG_CALL_NO_WG | TCG_CALL_PLUGIN,
-        [QEMU_PLUGIN_CB_RW_REGS].flags = TCG_CALL_PLUGIN,
+        [QEMU_PLUGIN_CB_NO_REGS].flags = TCG_CALL_NO_RWG,
+        [QEMU_PLUGIN_CB_R_REGS].flags = TCG_CALL_NO_WG,
         /*
          * Match qemu_plugin_vcpu_mem_cb_t:
          *   void (*)(uint32_t, qemu_plugin_meminfo_t, uint64_t, void *)
diff --git a/tcg/tcg.c b/tcg/tcg.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -XXX,XX +XXX,XX @@ static void tcg_gen_callN(void *func, TCGHelperInfo *info,
 
 #ifdef CONFIG_PLUGIN
     /* Flag helpers that may affect guest state */
-    if (tcg_ctx->plugin_insn &&
-        !(info->flags & TCG_CALL_PLUGIN) &&
-        !(info->flags & TCG_CALL_NO_SIDE_EFFECTS)) {
+    if (tcg_ctx->plugin_insn && !(info->flags & TCG_CALL_NO_SIDE_EFFECTS)) {
         tcg_ctx->plugin_insn->calls_helpers = true;
     }
 #endif
-- 
2.34.1

The plugin_cb_start and plugin_cb_end opcodes are no longer used.

Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/tcg-op-common.h |  2 --
 include/tcg/tcg-opc.h       |  2 --
 accel/tcg/plugin-gen.c      | 18 ------------------
 tcg/tcg-op.c                | 10 ----------
 4 files changed, 32 deletions(-)

We have qemu_plugin_dyn_cb.type to differentiate the various
callback types, so we do not need to keep them in separate queues.

Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
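Note on the single queue: with the type stored in each entry, one loop
with a switch replaces the two per-subtype loops. The fragment below is
a minimal standalone sketch of that dispatch, not code from the patch.
The names cb_type, dyn_cb, counts and dispatch are hypothetical, and the
counters stand in for the gen_udata_cb() and gen_inline_cb() calls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for enum plugin_dyn_cb_type. */
enum cb_type { CB_REGULAR, CB_INLINE };

/* Hypothetical stand-in for a queue entry; only the type tag matters. */
struct dyn_cb {
    enum cb_type type;
};

struct counts {
    size_t regular;
    size_t inlined;
};

/* Walk one mixed queue and dispatch on the per-entry type tag. */
static struct counts dispatch(const struct dyn_cb *cbs, size_t n)
{
    struct counts c = { 0, 0 };

    for (size_t i = 0; i < n; i++) {
        switch (cbs[i].type) {
        case CB_REGULAR:
            c.regular++;    /* the patch calls gen_udata_cb(cb) here */
            break;
        case CB_INLINE:
            c.inlined++;    /* the patch calls gen_inline_cb(cb) here */
            break;
        default:
            assert(!"unknown callback type");
        }
    }
    return c;
}
```

One visible effect of this layout is that entries are handled in queue
order, so regular and inline callbacks can interleave instead of being
grouped by subtype.
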
 include/qemu/plugin.h  | 35 ++++++----------
 accel/tcg/plugin-gen.c | 90 ++++++++++++++++++++++--------------------
 plugins/api.c          | 18 +++------
 3 files changed, 65 insertions(+), 78 deletions(-)

diff --git a/include/qemu/plugin.h b/include/qemu/plugin.h
index XXXXXXX..XXXXXXX 100644
--- a/include/qemu/plugin.h
+++ b/include/qemu/plugin.h
@@ -XXX,XX +XXX,XX @@ union qemu_plugin_cb_sig {
 };
 
 enum plugin_dyn_cb_type {
-    PLUGIN_CB_INSN,
-    PLUGIN_CB_MEM,
-    PLUGIN_N_CB_TYPES,
-};
-
-enum plugin_dyn_cb_subtype {
     PLUGIN_CB_REGULAR,
     PLUGIN_CB_INLINE,
-    PLUGIN_N_CB_SUBTYPES,
 };
 
 /*
@@ -XXX,XX +XXX,XX @@ enum plugin_dyn_cb_subtype {
  */
 struct qemu_plugin_dyn_cb {
     void *userp;
-    enum plugin_dyn_cb_subtype type;
+    enum plugin_dyn_cb_type type;
     /* @rw applies to mem callbacks only (both regular and inline) */
     enum qemu_plugin_mem_rw rw;
     /* fields specific to each dyn_cb type go here */
@@ -XXX,XX +XXX,XX @@ struct qemu_plugin_insn {
     GByteArray *data;
     uint64_t vaddr;
     void *haddr;
-    GArray *cbs[PLUGIN_N_CB_TYPES][PLUGIN_N_CB_SUBTYPES];
+    GArray *insn_cbs;
+    GArray *mem_cbs;
     bool calls_helpers;
 
     /* if set, the instruction calls helpers that might access guest memory */
@@ -XXX,XX +XXX,XX @@ static inline void qemu_plugin_insn_cleanup_fn(gpointer data)
 
 static inline struct qemu_plugin_insn *qemu_plugin_insn_alloc(void)
 {
-    int i, j;
     struct qemu_plugin_insn *insn = g_new0(struct qemu_plugin_insn, 1);
-    insn->data = g_byte_array_sized_new(4);
 
-    for (i = 0; i < PLUGIN_N_CB_TYPES; i++) {
-        for (j = 0; j < PLUGIN_N_CB_SUBTYPES; j++) {
-            insn->cbs[i][j] = g_array_new(false, false,
-                                          sizeof(struct qemu_plugin_dyn_cb));
-        }
-    }
+    insn->data = g_byte_array_sized_new(4);
     return insn;
 }
 
@@ -XXX,XX +XXX,XX @@ struct qemu_plugin_tb {
     /* if set, the TB calls helpers that might access guest memory */
     bool mem_helper;
 
-    GArray *cbs[PLUGIN_N_CB_SUBTYPES];
+    GArray *cbs;
 };
 
 /**
@@ -XXX,XX +XXX,XX @@ struct qemu_plugin_insn *qemu_plugin_tb_insn_get(struct qemu_plugin_tb *tb,
                                                  uint64_t pc)
 {
     struct qemu_plugin_insn *insn;
-    int i, j;
 
     if (unlikely(tb->n == tb->insns->len)) {
         struct qemu_plugin_insn *new_insn = qemu_plugin_insn_alloc();
         g_ptr_array_add(tb->insns, new_insn);
     }
+
     insn = g_ptr_array_index(tb->insns, tb->n++);
     g_byte_array_set_size(insn->data, 0);
     insn->calls_helpers = false;
     insn->mem_helper = false;
     insn->vaddr = pc;
-
-    for (i = 0; i < PLUGIN_N_CB_TYPES; i++) {
-        for (j = 0; j < PLUGIN_N_CB_SUBTYPES; j++) {
-            g_array_set_size(insn->cbs[i][j], 0);
-        }
+    if (insn->insn_cbs) {
+        g_array_set_size(insn->insn_cbs, 0);
+    }
+    if (insn->mem_cbs) {
+        g_array_set_size(insn->mem_cbs, 0);
     }
 
     return insn;
diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/tcg/plugin-gen.c
+++ b/accel/tcg/plugin-gen.c
@@ -XXX,XX +XXX,XX @@ void plugin_gen_disable_mem_helpers(void)
 static void gen_enable_mem_helper(struct qemu_plugin_tb *ptb,
                                   struct qemu_plugin_insn *insn)
 {
-    GArray *cbs[2];
     GArray *arr;
-    size_t n_cbs;
+    size_t len;
 
     /*
      * Tracking memory accesses performed from helpers requires extra work.
@@ -XXX,XX +XXX,XX @@ static void gen_enable_mem_helper(struct qemu_plugin_tb *ptb,
         return;
     }
 
-    cbs[0] = insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR];
-    cbs[1] = insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_INLINE];
-    n_cbs = cbs[0]->len + cbs[1]->len;
-
-    if (n_cbs == 0) {
+    if (!insn->mem_cbs || !insn->mem_cbs->len) {
         insn->mem_helper = false;
         return;
     }
     insn->mem_helper = true;
     ptb->mem_helper = true;
 
+    /*
+     * TODO: It seems like we should be able to use ref/unref
+     * to avoid needing to actually copy this array.
+     * Alternately, perhaps we could allocate new memory adjacent
+     * to the TranslationBlock itself, so that we do not have to
+     * actively manage the lifetime after this.
+     */
+    len = insn->mem_cbs->len;
     arr = g_array_sized_new(false, false,
-                            sizeof(struct qemu_plugin_dyn_cb), n_cbs);
-    g_array_append_vals(arr, cbs[0]->data, cbs[0]->len);
-    g_array_append_vals(arr, cbs[1]->data, cbs[1]->len);
-
+                            sizeof(struct qemu_plugin_dyn_cb), len);
+    /* Append rather than memcpy so that arr->len is set. */
+    g_array_append_vals(arr, insn->mem_cbs->data, len);
     qemu_plugin_add_dyn_cb_arr(arr);
 
     tcg_gen_st_ptr(tcg_constant_ptr((intptr_t)arr), tcg_env,
@@ -XXX,XX +XXX,XX @@ static void plugin_gen_inject(struct qemu_plugin_tb *plugin_tb)
             case PLUGIN_GEN_FROM_TB:
                 assert(insn == NULL);
 
-                cbs = plugin_tb->cbs[PLUGIN_CB_REGULAR];
+                cbs = plugin_tb->cbs;
                 for (i = 0, n = (cbs ? cbs->len : 0); i < n; i++) {
                     struct qemu_plugin_dyn_cb *cb =
                         &g_array_index(cbs, struct qemu_plugin_dyn_cb, i);
-                    gen_udata_cb(cb);
-                }
 
-                cbs = plugin_tb->cbs[PLUGIN_CB_INLINE];
-                for (i = 0, n = (cbs ? cbs->len : 0); i < n; i++) {
-                    struct qemu_plugin_dyn_cb *cb =
-                        &g_array_index(cbs, struct qemu_plugin_dyn_cb, i);
-                    gen_inline_cb(cb);
+                    switch (cb->type) {
+                    case PLUGIN_CB_REGULAR:
+                        gen_udata_cb(cb);
+                        break;
+                    case PLUGIN_CB_INLINE:
+                        gen_inline_cb(cb);
+                        break;
+                    default:
+                        g_assert_not_reached();
+                    }
                 }
                 break;
 
@@ -XXX,XX +XXX,XX @@ static void plugin_gen_inject(struct qemu_plugin_tb *plugin_tb)
 
                 gen_enable_mem_helper(plugin_tb, insn);
 
-                cbs = insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_REGULAR];
+                cbs = insn->insn_cbs;
                 for (i = 0, n = (cbs ? cbs->len : 0); i < n; i++) {
                     struct qemu_plugin_dyn_cb *cb =
                         &g_array_index(cbs, struct qemu_plugin_dyn_cb, i);
-                    gen_udata_cb(cb);
-                }
 
-                cbs = insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_INLINE];
-                for (i = 0, n = (cbs ? cbs->len : 0); i < n; i++) {
-                    struct qemu_plugin_dyn_cb *cb =
-                        &g_array_index(cbs, struct qemu_plugin_dyn_cb, i);
-                    gen_inline_cb(cb);
+                    switch (cb->type) {
+                    case PLUGIN_CB_REGULAR:
+                        gen_udata_cb(cb);
+                        break;
+                    case PLUGIN_CB_INLINE:
+                        gen_inline_cb(cb);
+                        break;
+                    default:
+                        g_assert_not_reached();
+                    }
                 }
                 break;
 
@@ -XXX,XX +XXX,XX @@ static void plugin_gen_inject(struct qemu_plugin_tb *plugin_tb)
 
             tcg_ctx->emit_before_op = op;
 
-            cbs = insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR];
+            cbs = insn->mem_cbs;
             for (i = 0, n = (cbs ? cbs->len : 0); i < n; i++) {
                 struct qemu_plugin_dyn_cb *cb =
                     &g_array_index(cbs, struct qemu_plugin_dyn_cb, i);
-                if (cb->rw & rw) {
-                    gen_mem_cb(cb, meminfo, addr);
-                }
-            }
 
-            cbs = insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_INLINE];
-            for (i = 0, n = (cbs ? cbs->len : 0); i < n; i++) {
-                struct qemu_plugin_dyn_cb *cb =
-                    &g_array_index(cbs, struct qemu_plugin_dyn_cb, i);
                 if (cb->rw & rw) {
-                    gen_inline_cb(cb);
+                    switch (cb->type) {
+                    case PLUGIN_CB_REGULAR:
+                        gen_mem_cb(cb, meminfo, addr);
+                        break;
+                    case PLUGIN_CB_INLINE:
+                        gen_inline_cb(cb);
+                        break;
+                    default:
+                        g_assert_not_reached();
+                    }
                 }
             }
 
@@ -XXX,XX +XXX,XX @@ bool plugin_gen_tb_start(CPUState *cpu, const DisasContextBase *db,
 
     if (test_bit(QEMU_PLUGIN_EV_VCPU_TB_TRANS, cpu->plugin_state->event_mask)) {
         struct qemu_plugin_tb *ptb = tcg_ctx->plugin_tb;
-        int i;
 
         /* reset callbacks */
-        for (i = 0; i < PLUGIN_N_CB_SUBTYPES; i++) {
-            if (ptb->cbs[i]) {
-                g_array_set_size(ptb->cbs[i], 0);
-            }
+        if (ptb->cbs) {
+            g_array_set_size(ptb->cbs, 0);
         }
         ptb->n = 0;
 
diff --git a/plugins/api.c b/plugins/api.c
index XXXXXXX..XXXXXXX 100644
--- a/plugins/api.c
+++ b/plugins/api.c
@@ -XXX,XX +XXX,XX @@ void qemu_plugin_register_vcpu_tb_exec_cb(struct qemu_plugin_tb *tb,
                                           void *udata)
 {
     if (!tb->mem_only) {
-        plugin_register_dyn_cb__udata(&tb->cbs[PLUGIN_CB_REGULAR],
-                                      cb, flags, udata);
+        plugin_register_dyn_cb__udata(&tb->cbs, cb, flags, udata);
     }
 }
 
@@ -XXX,XX +XXX,XX @@ void qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
     uint64_t imm)
 {
     if (!tb->mem_only) {
-        plugin_register_inline_op_on_entry(
-            &tb->cbs[PLUGIN_CB_INLINE], 0, op, entry, imm);
+        plugin_register_inline_op_on_entry(&tb->cbs, 0, op, entry, imm);
     }
 }
 
@@ -XXX,XX +XXX,XX @@ void qemu_plugin_register_vcpu_insn_exec_cb(struct qemu_plugin_insn *insn,
                                             void *udata)
 {
     if (!insn->mem_only) {
-        plugin_register_dyn_cb__udata(
-            &insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_REGULAR], cb, flags, udata);
+        plugin_register_dyn_cb__udata(&insn->insn_cbs, cb, flags, udata);
     }
 }
 
@@ -XXX,XX +XXX,XX @@ void qemu_plugin_register_vcpu_insn_exec_inline_per_vcpu(
     uint64_t imm)
 {
     if (!insn->mem_only) {
-        plugin_register_inline_op_on_entry(
-            &insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_INLINE], 0, op, entry, imm);
+        plugin_register_inline_op_on_entry(&insn->insn_cbs, 0, op, entry, imm);
     }
 }
 
@@ -XXX,XX +XXX,XX @@ void qemu_plugin_register_vcpu_mem_cb(struct qemu_plugin_insn *insn,
                                       enum qemu_plugin_mem_rw rw,
                                       void *udata)
 {
-    plugin_register_vcpu_mem_cb(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR],
-                                cb, flags, rw, udata);
+    plugin_register_vcpu_mem_cb(&insn->mem_cbs, cb, flags, rw, udata);
 }
 
 void qemu_plugin_register_vcpu_mem_inline_per_vcpu(
@@ -XXX,XX +XXX,XX @@ void qemu_plugin_register_vcpu_mem_inline_per_vcpu(
     qemu_plugin_u64 entry,
     uint64_t imm)
 {
-    plugin_register_inline_op_on_entry(
-        &insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_INLINE], rw, op, entry, imm);
+    plugin_register_inline_op_on_entry(&insn->mem_cbs, rw, op, entry, imm);
 }
 
 void qemu_plugin_register_vcpu_tb_trans_cb(qemu_plugin_id_t id,
-- 
2.34.1

Use different enumerators for vcpu_udata and vcpu_mem callbacks.

Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/qemu/plugin.h  | 1 +
 accel/tcg/plugin-gen.c | 2 +-
 plugins/core.c         | 4 ++--
 3 files changed, 4 insertions(+), 3 deletions(-)

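Replace the compile-time DEBUG_PLUGIN_GEN_OPS ifdef with the runtime
log option "-d op_plugin", which dumps the ops before plugin injection.
The output of the second pr_ops call, after injection, can be obtained
with "-d op".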

Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
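Note for reviewers: the new INDEX_op_plugin_cb case in tcg_dump_ops
prints a symbolic name when the "from" argument indexes into
plugin_from_name[], and falls back to a hex constant otherwise. A
minimal standalone sketch of that lookup follows. The helper name
format_plugin_from, the snprintf-based interface and the unsigned long
argument type are assumptions made for illustration; only the string
table is copied from the patch.

```c
#include <stddef.h>
#include <stdio.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Same strings, in the same order, as plugin_from_name[] in the patch. */
static const char * const plugin_from_name[] = {
    "from-tb",
    "from-insn",
    "after-insn",
    "after-tb",
};

/*
 * Hypothetical helper (not part of QEMU): format the "from" argument
 * of a plugin_cb op into buf.  Known values get their symbolic name,
 * anything out of range is printed as a hex constant.
 * Returns the snprintf result.
 */
static int format_plugin_from(char *buf, size_t len, unsigned long from)
{
    if (from < ARRAY_SIZE(plugin_from_name)) {
        return snprintf(buf, len, "%s", plugin_from_name[from]);
    }
    return snprintf(buf, len, "$0x%lx", from);
}
```

Bounds-checking the index before the table lookup keeps the dump
usable even if a new enumerator is added without updating the table.
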
 include/qemu/log.h     |  1 +
 include/tcg/tcg.h      |  1 +
 accel/tcg/plugin-gen.c | 67 +++++++-----------------------------------
 tcg/tcg.c              | 29 +++++++++++++++++-
 util/log.c             |  4 +++
 5 files changed, 45 insertions(+), 57 deletions(-)

diff --git a/include/qemu/log.h b/include/qemu/log.h
index XXXXXXX..XXXXXXX 100644
--- a/include/qemu/log.h
+++ b/include/qemu/log.h
@@ -XXX,XX +XXX,XX @@ bool qemu_log_separate(void);
 #define LOG_STRACE         (1 << 19)
 #define LOG_PER_THREAD     (1 << 20)
 #define CPU_LOG_TB_VPU     (1 << 21)
+#define LOG_TB_OP_PLUGIN   (1 << 22)
 
 /* Lock/unlock output. */
 
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index XXXXXXX..XXXXXXX 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -XXX,XX +XXX,XX @@ static inline const TCGOpcode *tcg_swap_vecop_list(const TCGOpcode *n)
 }
 
 bool tcg_can_emit_vecop_list(const TCGOpcode *, TCGType, unsigned);
+void tcg_dump_ops(TCGContext *s, FILE *f, bool have_prefs);
 
 #endif /* TCG_H */
diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/tcg/plugin-gen.c
+++ b/accel/tcg/plugin-gen.c
@@ -XXX,XX +XXX,XX @@
  */
 #include "qemu/osdep.h"
 #include "qemu/plugin.h"
+#include "qemu/log.h"
 #include "cpu.h"
 #include "tcg/tcg.h"
 #include "tcg/tcg-temp-internal.h"
@@ -XXX,XX +XXX,XX @@ static void gen_mem_cb(struct qemu_plugin_dyn_cb *cb,
     tcg_temp_free_i32(cpu_index);
 }
 
-/* #define DEBUG_PLUGIN_GEN_OPS */
-static void pr_ops(void)
-{
-#ifdef DEBUG_PLUGIN_GEN_OPS
-    TCGOp *op;
-    int i = 0;
-
-    QTAILQ_FOREACH(op, &tcg_ctx->ops, link) {
-        const char *name = "";
-        const char *type = "";
-
-        if (op->opc == INDEX_op_plugin_cb_start) {
-            switch (op->args[0]) {
-            case PLUGIN_GEN_FROM_TB:
-                name = "tb";
-                break;
-            case PLUGIN_GEN_FROM_INSN:
-                name = "insn";
-                break;
-            case PLUGIN_GEN_FROM_MEM:
-                name = "mem";
-                break;
-            case PLUGIN_GEN_AFTER_INSN:
-                name = "after insn";
-                break;
-            default:
-                break;
-            }
-            switch (op->args[1]) {
-            case PLUGIN_GEN_CB_UDATA:
-                type = "udata";
-                break;
-            case PLUGIN_GEN_CB_INLINE:
-                type = "inline";
-                break;
-            case PLUGIN_GEN_CB_MEM:
-                type = "mem";
-                break;
-            case PLUGIN_GEN_ENABLE_MEM_HELPER:
-                type = "enable mem helper";
-                break;
-            case PLUGIN_GEN_DISABLE_MEM_HELPER:
-                type = "disable mem helper";
-                break;
-            default:
-                break;
-            }
-        }
-        printf("op[%2i]: %s %s %s\n", i, tcg_op_defs[op->opc].name, name, type);
-        i++;
-    }
-#endif
-}
-
 static void plugin_gen_inject(struct qemu_plugin_tb *plugin_tb)
 {
     TCGOp *op, *next;
     int insn_idx = -1;
 
-    pr_ops();
+    if (unlikely(qemu_loglevel_mask(LOG_TB_OP_PLUGIN)
+                 && qemu_log_in_addr_range(plugin_tb->vaddr))) {
+        FILE *logfile = qemu_log_trylock();
+        if (logfile) {
+            fprintf(logfile, "OP before plugin injection:\n");
+            tcg_dump_ops(tcg_ctx, logfile, false);
+            fprintf(logfile, "\n");
+            qemu_log_unlock(logfile);
+        }
+    }
 
     /*
      * While injecting code, we cannot afford to reuse any ebb temps
@@ -XXX,XX +XXX,XX @@ static void plugin_gen_inject(struct qemu_plugin_tb *plugin_tb)
             break;
         }
     }
-    pr_ops();
 }
 
 bool plugin_gen_tb_start(CPUState *cpu, const DisasContextBase *db,
diff --git a/tcg/tcg.c b/tcg/tcg.c
index XXXXXXX..XXXXXXX 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -XXX,XX +XXX,XX @@ static const char bswap_flag_name[][6] = {
     [TCG_BSWAP_IZ | TCG_BSWAP_OS] = "iz,os",
 };
 
+#ifdef CONFIG_PLUGIN
+static const char * const plugin_from_name[] = {
+    "from-tb",
+    "from-insn",
+    "after-insn",
+    "after-tb",
+};
+#endif
+
 static inline bool tcg_regset_single(TCGRegSet d)
 {
     return (d & (d - 1)) == 0;
@@ -XXX,XX +XXX,XX @@ static inline TCGReg tcg_regset_first(TCGRegSet d)
 #define ne_fprintf(...) \
     ({ int ret_ = fprintf(__VA_ARGS__); ret_ >= 0 ? ret_ : 0; })
 
-static void tcg_dump_ops(TCGContext *s, FILE *f, bool have_prefs)
+void tcg_dump_ops(TCGContext *s, FILE *f, bool have_prefs)
 {
     char buf[128];
     TCGOp *op;
@@ -XXX,XX +XXX,XX @@ static void tcg_dump_ops(TCGContext *s, FILE *f, bool have_prefs)
                     i = k = 1;
                 }
                 break;
+#ifdef CONFIG_PLUGIN
+            case INDEX_op_plugin_cb:
+                {
+                    TCGArg from = op->args[k++];
+                    const char *name = NULL;
+
+                    if (from < ARRAY_SIZE(plugin_from_name)) {
+                        name = plugin_from_name[from];
+                    }
+                    if (name) {
+                        col += ne_fprintf(f, "%s", name);
+                    } else {
+                        col += ne_fprintf(f, "$0x%" TCG_PRIlx, from);
+                    }
+                    i = 1;
+                }
+                break;
+#endif
             default:
                 i = 0;
                 break;
diff --git a/util/log.c b/util/log.c
index XXXXXXX..XXXXXXX 100644
--- a/util/log.c
+++ b/util/log.c
@@ -XXX,XX +XXX,XX @@ const QEMULogItem qemu_log_items[] = {
       "show micro ops after optimization" },
     { CPU_LOG_TB_OP_IND, "op_ind",
       "show micro ops before indirect lowering" },
+#ifdef CONFIG_PLUGIN
+    { LOG_TB_OP_PLUGIN, "op_plugin",
+      "show micro ops before plugin injection" },
+#endif
     { CPU_LOG_INT, "int",
       "show interrupts/exceptions in short format" },
     { CPU_LOG_EXEC, "exec",
-- 
2.34.1

Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
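Note for reviewers: this patch factors the per-callback switch into
inject_cb and inject_mem_cb. The sketch below models only the dispatch
logic, outside of QEMU. The struct and enum names, the enumerator
values for the callback types, and the counters that stand in for the
gen_*_cb() emitters are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-ins for the QEMU enumerators; names and values are illustrative. */
enum cb_type { CB_REGULAR, CB_MEM_REGULAR, CB_INLINE };
enum mem_rw { MEM_R = 1, MEM_W = 2, MEM_RW = 3 };

struct dyn_cb {
    enum cb_type type;
    enum mem_rw rw;
};

/* Counters take the place of the code emitted by gen_*_cb(). */
struct emitted {
    int udata;
    int inline_op;
    int mem;
};

/* Dispatch for callbacks that take no memory arguments. */
static void inject_cb(const struct dyn_cb *cb, struct emitted *out)
{
    switch (cb->type) {
    case CB_REGULAR:
        out->udata++;
        break;
    case CB_INLINE:
        out->inline_op++;
        break;
    default:
        abort();    /* unexpected callback type */
    }
}

/* Filter on the access direction, then dispatch on the callback type. */
static void inject_mem_cb(const struct dyn_cb *cb, enum mem_rw rw,
                          struct emitted *out)
{
    if (cb->rw & rw) {
        switch (cb->type) {
        case CB_MEM_REGULAR:
            out->mem++;
            break;
        default:
            inject_cb(cb, out);
            break;
        }
    }
}
```

Only the memory-specific type needs its own case; every other type
falls through to the common helper, which still aborts on anything it
does not recognize.
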
 accel/tcg/plugin-gen.c | 84 +++++++++++++++++++++---------------------
 1 file changed, 41 insertions(+), 43 deletions(-)

diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/tcg/plugin-gen.c
+++ b/accel/tcg/plugin-gen.c
@@ -XXX,XX +XXX,XX @@ static void gen_mem_cb(struct qemu_plugin_dyn_cb *cb,
     tcg_temp_free_i32(cpu_index);
 }
 
+static void inject_cb(struct qemu_plugin_dyn_cb *cb)
+
+{
+    switch (cb->type) {
+    case PLUGIN_CB_REGULAR:
+        gen_udata_cb(cb);
+        break;
+    case PLUGIN_CB_INLINE:
+        gen_inline_cb(cb);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void inject_mem_cb(struct qemu_plugin_dyn_cb *cb,
+                          enum qemu_plugin_mem_rw rw,
+                          qemu_plugin_meminfo_t meminfo, TCGv_i64 addr)
+{
+    if (cb->rw & rw) {
+        switch (cb->type) {
+        case PLUGIN_CB_MEM_REGULAR:
+            gen_mem_cb(cb, meminfo, addr);
+            break;
+        default:
+            inject_cb(cb);
+            break;
+        }
+    }
+}
+
 static void plugin_gen_inject(struct qemu_plugin_tb *plugin_tb)
 {
     TCGOp *op, *next;
@@ -XXX,XX +XXX,XX @@ static void plugin_gen_inject(struct qemu_plugin_tb *plugin_tb)
 
                 cbs = plugin_tb->cbs;
                 for (i = 0, n = (cbs ? cbs->len : 0); i < n; i++) {
-                    struct qemu_plugin_dyn_cb *cb =
-                        &g_array_index(cbs, struct qemu_plugin_dyn_cb, i);
-
-                    switch (cb->type) {
-                    case PLUGIN_CB_REGULAR:
-                        gen_udata_cb(cb);
-                        break;
-                    case PLUGIN_CB_INLINE:
-                        gen_inline_cb(cb);
-                        break;
-                    default:
-                        g_assert_not_reached();
-                    }
+                    inject_cb(
+                        &g_array_index(cbs, struct qemu_plugin_dyn_cb, i));
                 }
                 break;
 
@@ -XXX,XX +XXX,XX @@ static void plugin_gen_inject(struct qemu_plugin_tb *plugin_tb)
 
                 cbs = insn->insn_cbs;
                 for (i = 0, n = (cbs ? cbs->len : 0); i < n; i++) {
-                    struct qemu_plugin_dyn_cb *cb =
-                        &g_array_index(cbs, struct qemu_plugin_dyn_cb, i);
-
-                    switch (cb->type) {
-                    case PLUGIN_CB_REGULAR:
-                        gen_udata_cb(cb);
-                        break;
-                    case PLUGIN_CB_INLINE:
-                        gen_inline_cb(cb);
-                        break;
-                    default:
-                        g_assert_not_reached();
-                    }
+                    inject_cb(
+                        &g_array_index(cbs, struct qemu_plugin_dyn_cb, i));
                 }
                 break;
 
@@ -XXX,XX +XXX,XX @@ static void plugin_gen_inject(struct qemu_plugin_tb *plugin_tb)
         {
             TCGv_i64 addr = temp_tcgv_i64(arg_temp(op->args[0]));
             qemu_plugin_meminfo_t meminfo = op->args[1];
+            enum qemu_plugin_mem_rw rw =
+                (qemu_plugin_mem_is_store(meminfo)
+                 ? QEMU_PLUGIN_MEM_W : QEMU_PLUGIN_MEM_R);
             struct qemu_plugin_insn *insn;
             const GArray *cbs;
-            int i, n, rw;
+            int i, n;
 
             assert(insn_idx >= 0);
             insn = g_ptr_array_index(plugin_tb->insns, insn_idx);
-            rw = qemu_plugin_mem_is_store(meminfo) ? 2 : 1;
 
             tcg_ctx->emit_before_op = op;
 
             cbs = insn->mem_cbs;
             for (i = 0, n = (cbs ? cbs->len : 0); i < n; i++) {
-                struct qemu_plugin_dyn_cb *cb =
-                    &g_array_index(cbs, struct qemu_plugin_dyn_cb, i);
-
-                if (cb->rw & rw) {
-                    switch (cb->type) {
-                    case PLUGIN_CB_MEM_REGULAR:
-                        gen_mem_cb(cb, meminfo, addr);
-                        break;
-                    case PLUGIN_CB_INLINE:
-                        gen_inline_cb(cb);
-                        break;
-                    default:
-                        g_assert_not_reached();
-                    }
-                }
+                inject_mem_cb(&g_array_index(cbs, struct qemu_plugin_dyn_cb, i),
+                              rw, meminfo, addr);
             }
 
             tcg_ctx->emit_before_op = NULL;
-- 
2.34.1

Merge qemu_plugin_insn_alloc and qemu_plugin_tb_insn_get into
plugin_gen_insn_start, since neither is used anywhere else.

Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
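Note for reviewers: after the merge, plugin_gen_insn_start reuses the
record for the n-th instruction if one already exists from an earlier
translation, and appends a new one otherwise. The sketch below models
that reuse-or-append step with plain C arrays instead of GLib
containers. The type names, the insn_start function and the n_cbs
field are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for struct qemu_plugin_insn. */
struct insn_rec {
    unsigned long vaddr;
    int n_cbs;              /* models the length of the callback arrays */
};

/* Simplified stand-in for struct qemu_plugin_tb. */
struct tb_ctx {
    struct insn_rec **insns;
    size_t len;             /* records allocated so far */
    size_t cap;
    size_t n;               /* instructions in the current translation */
};

/* n is the 1-based index of the instruction being translated. */
static struct insn_rec *insn_start(struct tb_ctx *tb, size_t n,
                                   unsigned long pc)
{
    struct insn_rec *insn;

    assert(n >= 1);
    tb->n = n;
    if (n <= tb->len) {
        /* Reuse the record left over from a previous translation. */
        insn = tb->insns[n - 1];
    } else {
        /* Instructions arrive in order, so we grow by exactly one. */
        assert(n - 1 == tb->len);
        if (tb->len == tb->cap) {
            size_t cap = tb->cap ? tb->cap * 2 : 4;
            struct insn_rec **p = realloc(tb->insns, cap * sizeof(*p));
            if (!p) {
                abort();
            }
            tb->insns = p;
            tb->cap = cap;
        }
        insn = calloc(1, sizeof(*insn));
        if (!insn) {
            abort();
        }
        tb->insns[tb->len++] = insn;
    }

    /* Reset per-translation state on both paths. */
    insn->n_cbs = 0;
    insn->vaddr = pc;
    return insn;
}
```

Keeping the records across translations avoids an allocation per
instruction once the array has grown to the size of the largest block
seen so far.
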
 include/qemu/plugin.h  | 39 ---------------------------------------
 accel/tcg/plugin-gen.c | 39 ++++++++++++++++++++++++++++++++-------
 2 files changed, 32 insertions(+), 46 deletions(-)

diff --git a/include/qemu/plugin.h b/include/qemu/plugin.h
index XXXXXXX..XXXXXXX 100644
--- a/include/qemu/plugin.h
+++ b/include/qemu/plugin.h
@@ -XXX,XX +XXX,XX @@ static inline void qemu_plugin_insn_cleanup_fn(gpointer data)
     g_byte_array_free(insn->data, true);
 }
 
-static inline struct qemu_plugin_insn *qemu_plugin_insn_alloc(void)
-{
-    struct qemu_plugin_insn *insn = g_new0(struct qemu_plugin_insn, 1);
-
-    insn->data = g_byte_array_sized_new(4);
-    return insn;
-}
-
 /* Internal context for this TranslationBlock */
 struct qemu_plugin_tb {
     GPtrArray *insns;
@@ -XXX,XX +XXX,XX @@ struct qemu_plugin_tb {
     GArray *cbs;
 };
 
-/**
- * qemu_plugin_tb_insn_get(): get next plugin record for translation.
- * @tb: the internal tb context
- * @pc: address of instruction
- */
-static inline
-struct qemu_plugin_insn *qemu_plugin_tb_insn_get(struct qemu_plugin_tb *tb,
-                                                 uint64_t pc)
-{
-    struct qemu_plugin_insn *insn;
-
-    if (unlikely(tb->n == tb->insns->len)) {
-        struct qemu_plugin_insn *new_insn = qemu_plugin_insn_alloc();
-        g_ptr_array_add(tb->insns, new_insn);
-    }
-
-    insn = g_ptr_array_index(tb->insns, tb->n++);
-    g_byte_array_set_size(insn->data, 0);
-    insn->calls_helpers = false;
-    insn->mem_helper = false;
-    insn->vaddr = pc;
-    if (insn->insn_cbs) {
-        g_array_set_size(insn->insn_cbs, 0);
-    }
-    if (insn->mem_cbs) {
-        g_array_set_size(insn->mem_cbs, 0);
-    }
-
-    return insn;
-}
-
 /**
  * struct CPUPluginState - per-CPU state for plugins
  * @event_mask: plugin event bitmap. Modified only via async work.
diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/tcg/plugin-gen.c
+++ b/accel/tcg/plugin-gen.c
@@ -XXX,XX +XXX,XX @@ bool plugin_gen_tb_start(CPUState *cpu, const DisasContextBase *db,
 void plugin_gen_insn_start(CPUState *cpu, const DisasContextBase *db)
 {
     struct qemu_plugin_tb *ptb = tcg_ctx->plugin_tb;
-    struct qemu_plugin_insn *pinsn;
+    struct qemu_plugin_insn *insn;
+    size_t n = db->num_insns;
+    vaddr pc;
 
-    pinsn = qemu_plugin_tb_insn_get(ptb, db->pc_next);
-    tcg_ctx->plugin_insn = pinsn;
-    plugin_gen_empty_callback(PLUGIN_GEN_FROM_INSN);
+    assert(n >= 1);
+    ptb->n = n;
+    if (n <= ptb->insns->len) {
+        insn = g_ptr_array_index(ptb->insns, n - 1);
+        g_byte_array_set_size(insn->data, 0);
+    } else {
+        assert(n - 1 == ptb->insns->len);
+        insn = g_new0(struct qemu_plugin_insn, 1);
+        insn->data = g_byte_array_sized_new(4);
+        g_ptr_array_add(ptb->insns, insn);
+    }
+
+    tcg_ctx->plugin_insn = insn;
+    insn->calls_helpers = false;
+    insn->mem_helper = false;
+    if (insn->insn_cbs) {
+        g_array_set_size(insn->insn_cbs, 0);
+    }
+    if (insn->mem_cbs) {
+        g_array_set_size(insn->mem_cbs, 0);
+    }
+
+    pc = db->pc_next;
+    insn->vaddr = pc;
 
     /*
      * Detect page crossing to get the new host address.
@@ -XXX,XX +XXX,XX @@ void plugin_gen_insn_start(CPUState *cpu, const DisasContextBase *db)
      * fetching instructions from a region not backed by RAM.
      */
     if (ptb->haddr1 == NULL) {
-        pinsn->haddr = NULL;
+        insn->haddr = NULL;
     } else if (is_same_page(db, db->pc_next)) {
-        pinsn->haddr = ptb->haddr1 + pinsn->vaddr - ptb->vaddr;
+        insn->haddr = ptb->haddr1 + pc - ptb->vaddr;
     } else {
         if (ptb->vaddr2 == -1) {
             ptb->vaddr2 = TARGET_PAGE_ALIGN(db->pc_first);
             get_page_addr_code_hostp(cpu_env(cpu), ptb->vaddr2, &ptb->haddr2);
         }
-        pinsn->haddr = ptb->haddr2 + pinsn->vaddr - ptb->vaddr2;
+        insn->haddr = ptb->haddr2 + pc - ptb->vaddr2;
     }
+
+    plugin_gen_empty_callback(PLUGIN_GEN_FROM_INSN);
 }
 
 void plugin_gen_insn_end(void)
-- 
2.34.1

Each caller can use tcg_gen_plugin_cb directly.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/plugin-gen.c | 19 +++----------------
 1 file changed, 3 insertions(+), 16 deletions(-)

diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/tcg/plugin-gen.c
+++ b/accel/tcg/plugin-gen.c
@@ -XXX,XX +XXX,XX @@ enum plugin_gen_from {
     PLUGIN_GEN_AFTER_TB,
 };
 
-static void plugin_gen_empty_callback(enum plugin_gen_from from)
-{
-    switch (from) {
-    case PLUGIN_GEN_AFTER_INSN:
-    case PLUGIN_GEN_FROM_TB:
-    case PLUGIN_GEN_FROM_INSN:
-        tcg_gen_plugin_cb(from);
-        break;
-    default:
-        g_assert_not_reached();
-    }
-}
-
 /* called before finishing a TB with exit_tb, goto_tb or goto_ptr */
 void plugin_gen_disable_mem_helpers(void)
 {
@@ -XXX,XX +XXX,XX @@ bool plugin_gen_tb_start(CPUState *cpu, const DisasContextBase *db,
         ptb->mem_only = mem_only;
         ptb->mem_helper = false;
 
-        plugin_gen_empty_callback(PLUGIN_GEN_FROM_TB);
+        tcg_gen_plugin_cb(PLUGIN_GEN_FROM_TB);
     }
 
     tcg_ctx->plugin_insn = NULL;
@@ -XXX,XX +XXX,XX @@ void plugin_gen_insn_start(CPUState *cpu, const DisasContextBase *db)
         insn->haddr = ptb->haddr2 + pc - ptb->vaddr2;
     }
 
-    plugin_gen_empty_callback(PLUGIN_GEN_FROM_INSN);
+    tcg_gen_plugin_cb(PLUGIN_GEN_FROM_INSN);
 }
 
 void plugin_gen_insn_end(void)
 {
-    plugin_gen_empty_callback(PLUGIN_GEN_AFTER_INSN);
+    tcg_gen_plugin_cb(PLUGIN_GEN_AFTER_INSN);
 }
 
 /*
-- 
2.34.1

Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
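Note for reviewers: the updated documentation block describes marker
opcodes that are later expanded into code or removed. The sketch below
is a toy model of that pass over a flat array of opcodes. The opcode
values, the expand_markers function and the use of a single callback
count for every marker are assumptions made for illustration; the real
code walks the TCG op list and looks up the requests per event.

```c
#include <stddef.h>

/* Toy opcodes; the real marker is the "plugin_cb" op. */
enum { OP_GUEST = 0, OP_MARKER = 1, OP_CALLBACK = 2 };

/*
 * Copy ops from in[] to out[], replacing each marker with n_cbs
 * callback ops.  With n_cbs == 0 the marker simply disappears.
 * The caller must size out[] for n_in * (n_cbs ? n_cbs : 1) entries.
 * Returns the number of ops written.
 */
static size_t expand_markers(const int *in, size_t n_in,
                             int *out, size_t n_cbs)
{
    size_t n_out = 0;

    for (size_t i = 0; i < n_in; i++) {
        if (in[i] == OP_MARKER) {
            for (size_t j = 0; j < n_cbs; j++) {
                out[n_out++] = OP_CALLBACK;
            }
        } else {
            out[n_out++] = in[i];
        }
    }
    return n_out;
}
```

Because the marker is a single opcode, a single pass covers both
cases: the guest code is decoded once, and events nobody asked for
leave no ops behind.
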
 accel/tcg/plugin-gen.c | 31 ++++---------------------------
 1 file changed, 4 insertions(+), 27 deletions(-)

diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/tcg/plugin-gen.c
+++ b/accel/tcg/plugin-gen.c
@@ -XXX,XX +XXX,XX @@
  * Injecting the desired instrumentation could be done with a second
  * translation pass that combined the instrumentation requests, but that
  * would be ugly and inefficient since we would decode the guest code twice.
- * Instead, during TB translation we add "empty" instrumentation calls for all
- * possible instrumentation events, and then once we collect the instrumentation
- * requests from plugins, we either "fill in" those empty events or remove them
- * if they have no requests.
- *
- * When "filling in" an event we first copy the empty callback's TCG ops. This
- * might seem unnecessary, but it is done to support an arbitrary number
- * of callbacks per event. Take for example a regular instruction callback.
- * We first generate a callback to an empty helper function. Then, if two
- * plugins register one callback each for this instruction, we make two copies
- * of the TCG ops generated for the empty callback, substituting the function
- * pointer that points to the empty helper function with the plugins' desired
- * callback functions. After that we remove the empty callback's ops.
- *
- * Note that the location in TCGOp.args[] of the pointer to a helper function
- * varies across different guest and host architectures. Instead of duplicating
- * the logic that figures this out, we rely on the fact that the empty
- * callbacks point to empty functions that are unique pointers in the program.
- * Thus, to find the right location we just have to look for a match in
- * TCGOp.args[]. This is the main reason why we first copy an empty callback's
- * TCG ops and then fill them in; regardless of whether we have one or many
- * callbacks for that event, the logic to add all of them is the same.
- *
- * When generating more than one callback per event, we make a small
- * optimization to avoid generating redundant operations. For instance, for the
- * second and all subsequent callbacks of an event, we do not need to reload the
- * CPU's index into a TCG temp, since the first callback did it already.
+ * Instead, during TB translation we add "plugin_cb" marker opcodes
+ * for all possible instrumentation events, and then once we collect the
+ * instrumentation requests from plugins, we generate code for those markers
+ * or remove them if they have no requests.
  */
 #include "qemu/osdep.h"
 #include "qemu/plugin.h"
-- 
2.34.1