The following changes since commit 390e8fc6b0e7b521c9eceb8dfe0958e141009ab9:

  Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging (2023-06-26 16:05:45 +0200)

are available in the Git repository at:

  https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20230626

for you to fetch changes up to a0eaae08c7c6a59c185cf646b02f4167b2ac6ec0:

  accel/tcg: Renumber TLB_DISCARD_WRITE (2023-06-26 17:33:00 +0200)

----------------------------------------------------------------
accel/tcg: Replace target_ulong in some APIs
accel/tcg: Remove CONFIG_PROFILER
accel/tcg: Store some tlb flags in CPUTLBEntryFull
tcg: Issue memory barriers as required for the guest memory model
tcg: Fix temporary variable in tcg_gen_gvec_andcs

----------------------------------------------------------------
Alex Bennée (1):
      softfloat: use QEMU_FLATTEN to avoid mistaken isra inlining

Anton Johansson (11):
      accel: Replace target_ulong in tlb_*()
      accel/tcg/translate-all.c: Widen pc and cs_base
      target: Widen pc/cs_base in cpu_get_tb_cpu_state
      accel/tcg/cputlb.c: Widen CPUTLBEntry access functions
      accel/tcg/cputlb.c: Widen addr in MMULookupPageData
      accel/tcg/cpu-exec.c: Widen pc to vaddr
      accel/tcg: Widen pc to vaddr in CPUJumpCache
      accel: Replace target_ulong with vaddr in probe_*()
      accel/tcg: Replace target_ulong with vaddr in *_mmu_lookup()
      accel/tcg: Replace target_ulong with vaddr in translator_*()
      cpu: Replace target_ulong with hwaddr in tb_invalidate_phys_addr()

Fei Wu (1):
      accel/tcg: remove CONFIG_PROFILER

Max Chou (1):
      tcg: Fix temporary variable in tcg_gen_gvec_andcs

Richard Henderson (8):
      tests/plugin: Remove duplicate insn log from libinsn.so
      target/microblaze: Define TCG_GUEST_DEFAULT_MO
      tcg: Do not elide memory barriers for !CF_PARALLEL in system mode
      tcg: Add host memory barriers to cpu_ldst.h interfaces
      accel/tcg: Remove check_tcg_memory_orders_compatible
      accel/tcg: Store some tlb flags in CPUTLBEntryFull
      accel/tcg: Move TLB_WATCHPOINT to TLB_SLOW_FLAGS_MASK
      accel/tcg: Renumber TLB_DISCARD_WRITE

 meson.build                              |   2 -
 qapi/machine.json                        |  18 --
 accel/tcg/internal.h                     |  40 +++-
 accel/tcg/tb-hash.h                      |  12 +-
 accel/tcg/tb-jmp-cache.h                 |   2 +-
 include/exec/cpu-all.h                   |  27 ++-
 include/exec/cpu-defs.h                  |  10 +-
 include/exec/cpu_ldst.h                  |  10 +-
 include/exec/exec-all.h                  |  95 +++++----
 include/exec/translator.h                |   6 +-
 include/hw/core/cpu.h                    |   1 +
 include/qemu/plugin-memory.h             |   2 +-
 include/qemu/timer.h                     |   9 -
 include/tcg/tcg.h                        |  26 ---
 target/alpha/cpu.h                       |   4 +-
 target/arm/cpu.h                         |   4 +-
 target/avr/cpu.h                         |   4 +-
 target/cris/cpu.h                        |   4 +-
 target/hexagon/cpu.h                     |   4 +-
 target/hppa/cpu.h                        |   5 +-
 target/i386/cpu.h                        |   4 +-
 target/loongarch/cpu.h                   |   6 +-
 target/m68k/cpu.h                        |   4 +-
 target/microblaze/cpu.h                  |   7 +-
 target/mips/cpu.h                        |   4 +-
 target/nios2/cpu.h                       |   4 +-
 target/openrisc/cpu.h                    |   5 +-
 target/ppc/cpu.h                         |   8 +-
 target/riscv/cpu.h                       |   4 +-
 target/rx/cpu.h                          |   4 +-
 target/s390x/cpu.h                       |   4 +-
 target/sh4/cpu.h                         |   4 +-
 target/sparc/cpu.h                       |   4 +-
 target/tricore/cpu.h                     |   4 +-
 target/xtensa/cpu.h                      |   4 +-
 accel/stubs/tcg-stub.c                   |   6 +-
 accel/tcg/cpu-exec.c                     |  43 ++--
 accel/tcg/cputlb.c                       | 351 +++++++++++++++++--------------
 accel/tcg/monitor.c                      |  31 ---
 accel/tcg/tb-maint.c                     |   2 +-
 accel/tcg/tcg-accel-ops.c                |  10 -
 accel/tcg/tcg-all.c                      |  39 +---
 accel/tcg/translate-all.c                |  46 +---
 accel/tcg/translator.c                   |  10 +-
 accel/tcg/user-exec.c                    |  24 ++-
 cpu.c                                    |   2 +-
 fpu/softfloat.c                          |  22 +-
 softmmu/runstate.c                       |   9 -
 target/arm/helper.c                      |   4 +-
 target/ppc/helper_regs.c                 |   4 +-
 target/riscv/cpu_helper.c                |   4 +-
 tcg/tcg-op-gvec.c                        |   2 +-
 tcg/tcg-op-ldst.c                        |   2 +-
 tcg/tcg-op.c                             |  14 +-
 tcg/tcg.c                                | 214 -------------------
 tests/plugin/insn.c                      |   9 +-
 tests/qtest/qmp-cmd-test.c               |   3 -
 hmp-commands-info.hx                     |  15 --
 meson_options.txt                        |   2 -
 scripts/meson-buildoptions.sh            |   3 -
 tests/tcg/i386/Makefile.softmmu-target   |   9 -
 tests/tcg/i386/Makefile.target           |   6 -
 tests/tcg/x86_64/Makefile.softmmu-target |   9 -
 63 files changed, 469 insertions(+), 781 deletions(-)
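As background on the softfloat entry above: QEMU_FLATTEN wraps the GCC/Clang
flatten attribute, which inlines every call made inside the annotated function
and so keeps the compiler from calling partially-specialized ".isra" clones of
the helpers on the hot path. The following is a minimal standalone sketch of
that pattern with hypothetical functions, not the actual softfloat code:

    #include <stdio.h>

    #define QEMU_FLATTEN __attribute__((flatten))

    /* Stand-in for a small softfloat helper. */
    static int add_parts(int a, int b)
    {
        return a + b;
    }

    /* flatten forces add_parts() to be inlined here, so no out-of-line
     * (or compiler-cloned) copy of the helper is invoked. */
    QEMU_FLATTEN
    static int float_add(int a, int b)
    {
        return add_parts(a, b);
    }

    int main(void)
    {
        printf("%d\n", float_add(2, 3));
        return 0;
    }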
From: Anton Johansson <anjo@rev.ng>

Replaces target_ulong with vaddr for guest virtual addresses in tlb_*()
functions and auxiliary structs.

Signed-off-by: Anton Johansson <anjo@rev.ng>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230621135633.1649-2-anjo@rev.ng>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
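As an illustration of the API shape after this change, here is a hypothetical
caller (not part of this patch; it compiles only inside the QEMU tree, in a
target-specific unit where these headers are available). The point is that
addresses now travel as the target-independent 64-bit vaddr type rather than
the build-dependent target_ulong:

    #include "qemu/osdep.h"
    #include "exec/exec-all.h"

    /* Hypothetical helper: flush one guest page from every MMU index.
     * The same prototype now serves 32-bit and 64-bit guests, because
     * vaddr is 64 bits regardless of the target. */
    static void flush_one_page(CPUState *cpu, vaddr addr)
    {
        tlb_flush_page(cpu, addr & TARGET_PAGE_MASK);
    }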
 include/exec/cpu-defs.h      |   4 +-
 include/exec/exec-all.h      |  79 ++++++++--------
 include/qemu/plugin-memory.h |   2 +-
 accel/stubs/tcg-stub.c       |   2 +-
 accel/tcg/cputlb.c           | 177 +++++++++++++++++------------------
 accel/tcg/tb-maint.c         |   2 +-
 6 files changed, 131 insertions(+), 135 deletions(-)

diff --git a/include/exec/cpu-defs.h b/include/exec/cpu-defs.h
index XXXXXXX..XXXXXXX 100644
--- a/include/exec/cpu-defs.h
+++ b/include/exec/cpu-defs.h
@@ -XXX,XX +XXX,XX @@ typedef struct CPUTLBDesc {
      * we must flush the entire tlb.  The region is matched if
      * (addr & large_page_mask) == large_page_addr.
      */
-    target_ulong large_page_addr;
-    target_ulong large_page_mask;
+    vaddr large_page_addr;
+    vaddr large_page_mask;
     /* host time (in ns) at the beginning of the time window */
     int64_t window_begin_ns;
     /* maximum number of entries observed in the window */
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index XXXXXXX..XXXXXXX 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -XXX,XX +XXX,XX @@ void tlb_destroy(CPUState *cpu);
  * Flush one page from the TLB of the specified CPU, for all
  * MMU indexes.
  */
-void tlb_flush_page(CPUState *cpu, target_ulong addr);
+void tlb_flush_page(CPUState *cpu, vaddr addr);
 /**
  * tlb_flush_page_all_cpus:
  * @cpu: src CPU of the flush
@@ -XXX,XX +XXX,XX @@ void tlb_flush_page(CPUState *cpu, target_ulong addr);
  * Flush one page from the TLB of the specified CPU, for all
  * MMU indexes.
  */
-void tlb_flush_page_all_cpus(CPUState *src, target_ulong addr);
+void tlb_flush_page_all_cpus(CPUState *src, vaddr addr);
 /**
  * tlb_flush_page_all_cpus_synced:
  * @cpu: src CPU of the flush
@@ -XXX,XX +XXX,XX @@ void tlb_flush_page_all_cpus(CPUState *src, target_ulong addr);
  * the source vCPUs safe work is complete. This will depend on when
  * the guests translation ends the TB.
  */
-void tlb_flush_page_all_cpus_synced(CPUState *src, target_ulong addr);
+void tlb_flush_page_all_cpus_synced(CPUState *src, vaddr addr);
 /**
  * tlb_flush:
  * @cpu: CPU whose TLB should be flushed
@@ -XXX,XX +XXX,XX @@ void tlb_flush_all_cpus_synced(CPUState *src_cpu);
  * Flush one page from the TLB of the specified CPU, for the specified
  * MMU indexes.
  */
-void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr,
+void tlb_flush_page_by_mmuidx(CPUState *cpu, vaddr addr,
                               uint16_t idxmap);
 /**
  * tlb_flush_page_by_mmuidx_all_cpus:
@@ -XXX,XX +XXX,XX @@ void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr,
  * Flush one page from the TLB of all CPUs, for the specified
  * MMU indexes.
  */
-void tlb_flush_page_by_mmuidx_all_cpus(CPUState *cpu, target_ulong addr,
+void tlb_flush_page_by_mmuidx_all_cpus(CPUState *cpu, vaddr addr,
                                        uint16_t idxmap);
 /**
  * tlb_flush_page_by_mmuidx_all_cpus_synced:
@@ -XXX,XX +XXX,XX @@ void tlb_flush_page_by_mmuidx_all_cpus(CPUState *cpu, target_ulong addr,
  * complete once the source vCPUs safe work is complete. This will
  * depend on when the guests translation ends the TB.
  */
-void tlb_flush_page_by_mmuidx_all_cpus_synced(CPUState *cpu, target_ulong addr,
+void tlb_flush_page_by_mmuidx_all_cpus_synced(CPUState *cpu, vaddr addr,
                                               uint16_t idxmap);
 /**
  * tlb_flush_by_mmuidx:
@@ -XXX,XX +XXX,XX @@ void tlb_flush_by_mmuidx_all_cpus_synced(CPUState *cpu, uint16_t idxmap);
  *
  * Similar to tlb_flush_page_mask, but with a bitmap of indexes.
  */
-void tlb_flush_page_bits_by_mmuidx(CPUState *cpu, target_ulong addr,
+void tlb_flush_page_bits_by_mmuidx(CPUState *cpu, vaddr addr,
                                    uint16_t idxmap, unsigned bits);

 /* Similarly, with broadcast and syncing. */
-void tlb_flush_page_bits_by_mmuidx_all_cpus(CPUState *cpu, target_ulong addr,
+void tlb_flush_page_bits_by_mmuidx_all_cpus(CPUState *cpu, vaddr addr,
                                             uint16_t idxmap, unsigned bits);
 void tlb_flush_page_bits_by_mmuidx_all_cpus_synced
-    (CPUState *cpu, target_ulong addr, uint16_t idxmap, unsigned bits);
+    (CPUState *cpu, vaddr addr, uint16_t idxmap, unsigned bits);

 /**
  * tlb_flush_range_by_mmuidx
@@ -XXX,XX +XXX,XX @@ void tlb_flush_page_bits_by_mmuidx_all_cpus_synced
  * For each mmuidx in @idxmap, flush all pages within [@addr,@addr+@len),
  * comparing only the low @bits worth of each virtual page.
  */
-void tlb_flush_range_by_mmuidx(CPUState *cpu, target_ulong addr,
-                               target_ulong len, uint16_t idxmap,
+void tlb_flush_range_by_mmuidx(CPUState *cpu, vaddr addr,
+                               vaddr len, uint16_t idxmap,
                                unsigned bits);

 /* Similarly, with broadcast and syncing. */
-void tlb_flush_range_by_mmuidx_all_cpus(CPUState *cpu, target_ulong addr,
-                                        target_ulong len, uint16_t idxmap,
+void tlb_flush_range_by_mmuidx_all_cpus(CPUState *cpu, vaddr addr,
+                                        vaddr len, uint16_t idxmap,
                                         unsigned bits);
 void tlb_flush_range_by_mmuidx_all_cpus_synced(CPUState *cpu,
-                                               target_ulong addr,
-                                               target_ulong len,
+                                               vaddr addr,
+                                               vaddr len,
                                                uint16_t idxmap,
                                                unsigned bits);

@@ -XXX,XX +XXX,XX @@ void tlb_flush_range_by_mmuidx_all_cpus_synced(CPUState *cpu,
  * tlb_set_page_full:
  * @cpu: CPU context
  * @mmu_idx: mmu index of the tlb to modify
- * @vaddr: virtual address of the entry to add
+ * @addr: virtual address of the entry to add
  * @full: the details of the tlb entry
  *
  * Add an entry to @cpu tlb index @mmu_idx.  All of the fields of
@@ -XXX,XX +XXX,XX @@ void tlb_flush_range_by_mmuidx_all_cpus_synced(CPUState *cpu,
  * single TARGET_PAGE_SIZE region is mapped; @full->lg_page_size is only
  * used by tlb_flush_page.
  */
-void tlb_set_page_full(CPUState *cpu, int mmu_idx, target_ulong vaddr,
+void tlb_set_page_full(CPUState *cpu, int mmu_idx, vaddr addr,
                        CPUTLBEntryFull *full);

 /**
  * tlb_set_page_with_attrs:
  * @cpu: CPU to add this TLB entry for
- * @vaddr: virtual address of page to add entry for
+ * @addr: virtual address of page to add entry for
  * @paddr: physical address of the page
  * @attrs: memory transaction attributes
  * @prot: access permissions (PAGE_READ/PAGE_WRITE/PAGE_EXEC bits)
@@ -XXX,XX +XXX,XX @@ void tlb_set_page_full(CPUState *cpu, int mmu_idx, target_ulong vaddr,
  * @size: size of the page in bytes
  *
  * Add an entry to this CPU's TLB (a mapping from virtual address
- * @vaddr to physical address @paddr) with the specified memory
+ * @addr to physical address @paddr) with the specified memory
  * transaction attributes. This is generally called by the target CPU
  * specific code after it has been called through the tlb_fill()
  * entry point and performed a successful page table walk to find
@@ -XXX,XX +XXX,XX @@ void tlb_set_page_full(CPUState *cpu, int mmu_idx, target_ulong vaddr,
  * single TARGET_PAGE_SIZE region is mapped; the supplied @size is only
  * used by tlb_flush_page.
  */
-void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
+void tlb_set_page_with_attrs(CPUState *cpu, vaddr addr,
                              hwaddr paddr, MemTxAttrs attrs,
-                             int prot, int mmu_idx, target_ulong size);
+                             int prot, int mmu_idx, vaddr size);
 /* tlb_set_page:
  *
  * This function is equivalent to calling tlb_set_page_with_attrs()
  * with an @attrs argument of MEMTXATTRS_UNSPECIFIED. It's provided
  * as a convenience for CPUs which don't use memory transaction attributes.
  */
-void tlb_set_page(CPUState *cpu, target_ulong vaddr,
+void tlb_set_page(CPUState *cpu, vaddr addr,
                   hwaddr paddr, int prot,
-                  int mmu_idx, target_ulong size);
+                  int mmu_idx, vaddr size);
 #else
 static inline void tlb_init(CPUState *cpu)
 {
@@ -XXX,XX +XXX,XX @@ static inline void tlb_init(CPUState *cpu)
 static inline void tlb_destroy(CPUState *cpu)
 {
 }
-static inline void tlb_flush_page(CPUState *cpu, target_ulong addr)
+static inline void tlb_flush_page(CPUState *cpu, vaddr addr)
 {
 }
-static inline void tlb_flush_page_all_cpus(CPUState *src, target_ulong addr)
+static inline void tlb_flush_page_all_cpus(CPUState *src, vaddr addr)
 {
 }
-static inline void tlb_flush_page_all_cpus_synced(CPUState *src,
-                                                  target_ulong addr)
+static inline void tlb_flush_page_all_cpus_synced(CPUState *src, vaddr addr)
 {
 }
 static inline void tlb_flush(CPUState *cpu)
@@ -XXX,XX +XXX,XX @@ static inline void tlb_flush_all_cpus_synced(CPUState *src_cpu)
 {
 }
 static inline void tlb_flush_page_by_mmuidx(CPUState *cpu,
-                                            target_ulong addr, uint16_t idxmap)
+                                            vaddr addr, uint16_t idxmap)
 {
 }

@@ -XXX,XX +XXX,XX @@ static inline void tlb_flush_by_mmuidx(CPUState *cpu, uint16_t idxmap)
 {
 }
 static inline void tlb_flush_page_by_mmuidx_all_cpus(CPUState *cpu,
-                                                     target_ulong addr,
+                                                     vaddr addr,
                                                      uint16_t idxmap)
 {
 }
 static inline void tlb_flush_page_by_mmuidx_all_cpus_synced(CPUState *cpu,
-                                                            target_ulong addr,
+                                                            vaddr addr,
                                                             uint16_t idxmap)
 {
 }
@@ -XXX,XX +XXX,XX @@ static inline void tlb_flush_by_mmuidx_all_cpus_synced(CPUState *cpu,
 {
 }
 static inline void tlb_flush_page_bits_by_mmuidx(CPUState *cpu,
-                                                 target_ulong addr,
+                                                 vaddr addr,
                                                  uint16_t idxmap,
                                                  unsigned bits)
 {
 }
 static inline void tlb_flush_page_bits_by_mmuidx_all_cpus(CPUState *cpu,
-                                                          target_ulong addr,
+                                                          vaddr addr,
                                                           uint16_t idxmap,
                                                           unsigned bits)
 {
 }
 static inline void
-tlb_flush_page_bits_by_mmuidx_all_cpus_synced(CPUState *cpu, target_ulong addr,
+tlb_flush_page_bits_by_mmuidx_all_cpus_synced(CPUState *cpu, vaddr addr,
                                               uint16_t idxmap, unsigned bits)
 {
 }
-static inline void tlb_flush_range_by_mmuidx(CPUState *cpu, target_ulong addr,
-                                             target_ulong len, uint16_t idxmap,
+static inline void tlb_flush_range_by_mmuidx(CPUState *cpu, vaddr addr,
+                                             vaddr len, uint16_t idxmap,
                                              unsigned bits)
 {
 }
 static inline void tlb_flush_range_by_mmuidx_all_cpus(CPUState *cpu,
-                                                      target_ulong addr,
-                                                      target_ulong len,
+                                                      vaddr addr,
+                                                      vaddr len,
                                                       uint16_t idxmap,
                                                       unsigned bits)
 {
 }
 static inline void tlb_flush_range_by_mmuidx_all_cpus_synced(CPUState *cpu,
-                                                             target_ulong addr,
-                                                             target_long len,
+                                                             vaddr addr,
+                                                             vaddr len,
                                                              uint16_t idxmap,
                                                              unsigned bits)
 {
@@ -XXX,XX +XXX,XX @@ static inline void mmap_lock(void) {}
 static inline void mmap_unlock(void) {}

 void tlb_reset_dirty(CPUState *cpu, ram_addr_t start1, ram_addr_t length);
-void tlb_set_dirty(CPUState *cpu, target_ulong vaddr);
+void tlb_set_dirty(CPUState *cpu, vaddr addr);

 MemoryRegionSection *
 address_space_translate_for_iotlb(CPUState *cpu, int asidx, hwaddr addr,
diff --git a/include/qemu/plugin-memory.h b/include/qemu/plugin-memory.h
index XXXXXXX..XXXXXXX 100644
--- a/include/qemu/plugin-memory.h
+++ b/include/qemu/plugin-memory.h
@@ -XXX,XX +XXX,XX @@ struct qemu_plugin_hwaddr {
  * It would only fail if not called from an instrumented memory access
  * which would be an abuse of the API.
  */
-bool tlb_plugin_lookup(CPUState *cpu, target_ulong addr, int mmu_idx,
+bool tlb_plugin_lookup(CPUState *cpu, vaddr addr, int mmu_idx,
                        bool is_store, struct qemu_plugin_hwaddr *data);

 #endif /* PLUGIN_MEMORY_H */
diff --git a/accel/stubs/tcg-stub.c b/accel/stubs/tcg-stub.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/stubs/tcg-stub.c
+++ b/accel/stubs/tcg-stub.c
@@ -XXX,XX +XXX,XX @@ void tb_flush(CPUState *cpu)
 {
 }

-void tlb_set_dirty(CPUState *cpu, target_ulong vaddr)
+void tlb_set_dirty(CPUState *cpu, vaddr vaddr)
 {
 }

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -XXX,XX +XXX,XX @@ void tlb_flush_all_cpus_synced(CPUState *src_cpu)
 }

 static bool tlb_hit_page_mask_anyprot(CPUTLBEntry *tlb_entry,
-                                      target_ulong page, target_ulong mask)
+                                      vaddr page, vaddr mask)
 {
     page &= mask;
     mask &= TARGET_PAGE_MASK | TLB_INVALID_MASK;
@@ -XXX,XX +XXX,XX @@ static bool tlb_hit_page_mask_anyprot(CPUTLBEntry *tlb_entry,
            page == (tlb_entry->addr_code & mask));
 }

-static inline bool tlb_hit_page_anyprot(CPUTLBEntry *tlb_entry,
-                                        target_ulong page)
+static inline bool tlb_hit_page_anyprot(CPUTLBEntry *tlb_entry, vaddr page)
 {
     return tlb_hit_page_mask_anyprot(tlb_entry, page, -1);
 }
@@ -XXX,XX +XXX,XX @@ static inline bool tlb_entry_is_empty(const CPUTLBEntry *te)

 /* Called with tlb_c.lock held */
 static bool tlb_flush_entry_mask_locked(CPUTLBEntry *tlb_entry,
-                                        target_ulong page,
-                                        target_ulong mask)
+                                        vaddr page,
+                                        vaddr mask)
 {
     if (tlb_hit_page_mask_anyprot(tlb_entry, page, mask)) {
         memset(tlb_entry, -1, sizeof(*tlb_entry));
@@ -XXX,XX +XXX,XX @@ static bool tlb_flush_entry_mask_locked(CPUTLBEntry *tlb_entry,
     return false;
 }

-static inline bool tlb_flush_entry_locked(CPUTLBEntry *tlb_entry,
-                                          target_ulong page)
+static inline bool tlb_flush_entry_locked(CPUTLBEntry *tlb_entry, vaddr page)
 {
     return tlb_flush_entry_mask_locked(tlb_entry, page, -1);
 }

 /* Called with tlb_c.lock held */
 static void tlb_flush_vtlb_page_mask_locked(CPUArchState *env, int mmu_idx,
-                                            target_ulong page,
-                                            target_ulong mask)
+                                            vaddr page,
+                                            vaddr mask)
 {
     CPUTLBDesc *d = &env_tlb(env)->d[mmu_idx];
     int k;
@@ -XXX,XX +XXX,XX @@ static void tlb_flush_vtlb_page_mask_locked(CPUArchState *env, int mmu_idx,
 }

 static inline void tlb_flush_vtlb_page_locked(CPUArchState *env, int mmu_idx,
-                                              target_ulong page)
+                                              vaddr page)
 {
     tlb_flush_vtlb_page_mask_locked(env, mmu_idx, page, -1);
 }

-static void tlb_flush_page_locked(CPUArchState *env, int midx,
-                                  target_ulong page)
+static void tlb_flush_page_locked(CPUArchState *env, int midx, vaddr page)
 {
-    target_ulong lp_addr = env_tlb(env)->d[midx].large_page_addr;
-    target_ulong lp_mask = env_tlb(env)->d[midx].large_page_mask;
+    vaddr lp_addr = env_tlb(env)->d[midx].large_page_addr;
+    vaddr lp_mask = env_tlb(env)->d[midx].large_page_mask;

     /* Check if we need to flush due to large pages.  */
     if ((page & lp_mask) == lp_addr) {
-        tlb_debug("forcing full flush midx %d ("
-                  TARGET_FMT_lx "/" TARGET_FMT_lx ")\n",
+        tlb_debug("forcing full flush midx %d (%"
+                  VADDR_PRIx "/%" VADDR_PRIx ")\n",
                   midx, lp_addr, lp_mask);
         tlb_flush_one_mmuidx_locked(env, midx, get_clock_realtime());
     } else {
@@ -XXX,XX +XXX,XX @@ static void tlb_flush_page_locked(CPUArchState *env, int midx,
 * at @addr from the tlbs indicated by @idxmap from @cpu.
 */
 static void tlb_flush_page_by_mmuidx_async_0(CPUState *cpu,
-                                             target_ulong addr,
+                                             vaddr addr,
                                              uint16_t idxmap)
 {
     CPUArchState *env = cpu->env_ptr;
@@ -XXX,XX +XXX,XX @@ static void tlb_flush_page_by_mmuidx_async_0(CPUState *cpu,

     assert_cpu_is_self(cpu);

-    tlb_debug("page addr:" TARGET_FMT_lx " mmu_map:0x%x\n", addr, idxmap);
+    tlb_debug("page addr: %" VADDR_PRIx " mmu_map:0x%x\n", addr, idxmap);

     qemu_spin_lock(&env_tlb(env)->c.lock);
     for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
@@ -XXX,XX +XXX,XX @@ static void tlb_flush_page_by_mmuidx_async_0(CPUState *cpu,
 static void tlb_flush_page_by_mmuidx_async_1(CPUState *cpu,
                                              run_on_cpu_data data)
 {
-    target_ulong addr_and_idxmap = (target_ulong) data.target_ptr;
-    target_ulong addr = addr_and_idxmap & TARGET_PAGE_MASK;
+    vaddr addr_and_idxmap = data.target_ptr;
+    vaddr addr = addr_and_idxmap & TARGET_PAGE_MASK;
     uint16_t idxmap = addr_and_idxmap & ~TARGET_PAGE_MASK;

     tlb_flush_page_by_mmuidx_async_0(cpu, addr, idxmap);
 }

 typedef struct {
-    target_ulong addr;
+    vaddr addr;
     uint16_t idxmap;
 } TLBFlushPageByMMUIdxData;

@@ -XXX,XX +XXX,XX @@ static void tlb_flush_page_by_mmuidx_async_2(CPUState *cpu,
     g_free(d);
 }

-void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, uint16_t idxmap)
+void tlb_flush_page_by_mmuidx(CPUState *cpu, vaddr addr, uint16_t idxmap)
 {
-    tlb_debug("addr: "TARGET_FMT_lx" mmu_idx:%" PRIx16 "\n", addr, idxmap);
+    tlb_debug("addr: %" VADDR_PRIx " mmu_idx:%" PRIx16 "\n", addr, idxmap);

     /* This should already be page aligned */
     addr &= TARGET_PAGE_MASK;
@@ -XXX,XX +XXX,XX @@ void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, uint16_t idxmap)
     }
 }

-void tlb_flush_page(CPUState *cpu, target_ulong addr)
+void tlb_flush_page(CPUState *cpu, vaddr addr)
 {
     tlb_flush_page_by_mmuidx(cpu, addr, ALL_MMUIDX_BITS);
 }

-void tlb_flush_page_by_mmuidx_all_cpus(CPUState *src_cpu, target_ulong addr,
+void tlb_flush_page_by_mmuidx_all_cpus(CPUState *src_cpu, vaddr addr,
                                        uint16_t idxmap)
 {
-    tlb_debug("addr: "TARGET_FMT_lx" mmu_idx:%"PRIx16"\n", addr, idxmap);
+    tlb_debug("addr: %" VADDR_PRIx " mmu_idx:%"PRIx16"\n", addr, idxmap);

     /* This should already be page aligned */
     addr &= TARGET_PAGE_MASK;
@@ -XXX,XX +XXX,XX @@ void tlb_flush_page_by_mmuidx_all_cpus(CPUState *src_cpu, target_ulong addr,
     tlb_flush_page_by_mmuidx_async_0(src_cpu, addr, idxmap);
 }

-void tlb_flush_page_all_cpus(CPUState *src, target_ulong addr)
+void tlb_flush_page_all_cpus(CPUState *src, vaddr addr)
 {
     tlb_flush_page_by_mmuidx_all_cpus(src, addr, ALL_MMUIDX_BITS);
 }

 void tlb_flush_page_by_mmuidx_all_cpus_synced(CPUState *src_cpu,
-                                              target_ulong addr,
+                                              vaddr addr,
                                               uint16_t idxmap)
 {
-    tlb_debug("addr: "TARGET_FMT_lx" mmu_idx:%"PRIx16"\n", addr, idxmap);
+    tlb_debug("addr: %" VADDR_PRIx " mmu_idx:%"PRIx16"\n", addr, idxmap);

     /* This should already be page aligned */
     addr &= TARGET_PAGE_MASK;
@@ -XXX,XX +XXX,XX @@ void tlb_flush_page_by_mmuidx_all_cpus_synced(CPUState *src_cpu,
     }
 }

-void tlb_flush_page_all_cpus_synced(CPUState *src, target_ulong addr)
+void tlb_flush_page_all_cpus_synced(CPUState *src, vaddr addr)
 {
     tlb_flush_page_by_mmuidx_all_cpus_synced(src, addr, ALL_MMUIDX_BITS);
 }

 static void tlb_flush_range_locked(CPUArchState *env, int midx,
-                                   target_ulong addr, target_ulong len,
+                                   vaddr addr, vaddr len,
                                    unsigned bits)
 {
     CPUTLBDesc *d = &env_tlb(env)->d[midx];
     CPUTLBDescFast *f = &env_tlb(env)->f[midx];
-    target_ulong mask = MAKE_64BIT_MASK(0, bits);
+    vaddr mask = MAKE_64BIT_MASK(0, bits);

     /*
      * If @bits is smaller than the tlb size, there may be multiple entries
@@ -XXX,XX +XXX,XX @@ static void tlb_flush_range_locked(CPUArchState *env, int midx,
      */
     if (mask < f->mask || len > f->mask) {
         tlb_debug("forcing full flush midx %d ("
-                  TARGET_FMT_lx "/" TARGET_FMT_lx "+" TARGET_FMT_lx ")\n",
+                  "%" VADDR_PRIx "/%" VADDR_PRIx "+%" VADDR_PRIx ")\n",
                   midx, addr, mask, len);
         tlb_flush_one_mmuidx_locked(env, midx, get_clock_realtime());
         return;
@@ -XXX,XX +XXX,XX @@ static void tlb_flush_range_locked(CPUArchState *env, int midx,
      */
     if (((addr + len - 1) & d->large_page_mask) == d->large_page_addr) {
         tlb_debug("forcing full flush midx %d ("
-                  TARGET_FMT_lx "/" TARGET_FMT_lx ")\n",
+                  "%" VADDR_PRIx "/%" VADDR_PRIx ")\n",
                   midx, d->large_page_addr, d->large_page_mask);
         tlb_flush_one_mmuidx_locked(env, midx, get_clock_realtime());
         return;
     }

-    for (target_ulong i = 0; i < len; i += TARGET_PAGE_SIZE) {
-        target_ulong page = addr + i;
+    for (vaddr i = 0; i < len; i += TARGET_PAGE_SIZE) {
+        vaddr page = addr + i;
         CPUTLBEntry *entry = tlb_entry(env, midx, page);

         if (tlb_flush_entry_mask_locked(entry, page, mask)) {
@@ -XXX,XX +XXX,XX @@ static void tlb_flush_range_locked(CPUArchState *env, int midx,
 }

 typedef struct {
-    target_ulong addr;
-    target_ulong len;
+    vaddr addr;
+    vaddr len;
     uint16_t idxmap;
     uint16_t bits;
 } TLBFlushRangeData;
@@ -XXX,XX +XXX,XX @@ static void tlb_flush_range_by_mmuidx_async_0(CPUState *cpu,

     assert_cpu_is_self(cpu);

-    tlb_debug("range:" TARGET_FMT_lx "/%u+" TARGET_FMT_lx " mmu_map:0x%x\n",
+    tlb_debug("range: %" VADDR_PRIx "/%u+%" VADDR_PRIx " mmu_map:0x%x\n",
               d.addr, d.bits, d.len, d.idxmap);

     qemu_spin_lock(&env_tlb(env)->c.lock);
@@ -XXX,XX +XXX,XX @@ static void tlb_flush_range_by_mmuidx_async_0(CPUState *cpu,
      * overlap the flushed pages, which includes the previous.
      */
     d.addr -= TARGET_PAGE_SIZE;
-    for (target_ulong i = 0, n = d.len / TARGET_PAGE_SIZE + 1; i < n; i++) {
+    for (vaddr i = 0, n = d.len / TARGET_PAGE_SIZE + 1; i < n; i++) {
         tb_jmp_cache_clear_page(cpu, d.addr);
         d.addr += TARGET_PAGE_SIZE;
     }
@@ -XXX,XX +XXX,XX @@ static void tlb_flush_range_by_mmuidx_async_1(CPUState *cpu,
     g_free(d);
 }

-void tlb_flush_range_by_mmuidx(CPUState *cpu, target_ulong addr,
-                               target_ulong len, uint16_t idxmap,
+void tlb_flush_range_by_mmuidx(CPUState *cpu, vaddr addr,
+                               vaddr len, uint16_t idxmap,
                                unsigned bits)
 {
     TLBFlushRangeData d;
@@ -XXX,XX +XXX,XX @@ void tlb_flush_range_by_mmuidx(CPUState *cpu, target_ulong addr,
     }
 }

-void tlb_flush_page_bits_by_mmuidx(CPUState *cpu, target_ulong addr,
+void tlb_flush_page_bits_by_mmuidx(CPUState *cpu, vaddr addr,
                                    uint16_t idxmap, unsigned bits)
 {
     tlb_flush_range_by_mmuidx(cpu, addr, TARGET_PAGE_SIZE, idxmap, bits);
 }

 void tlb_flush_range_by_mmuidx_all_cpus(CPUState *src_cpu,
-                                        target_ulong addr, target_ulong len,
+                                        vaddr addr, vaddr len,
                                         uint16_t idxmap, unsigned bits)
 {
     TLBFlushRangeData d;
@@ -XXX,XX +XXX,XX @@ void tlb_flush_range_by_mmuidx_all_cpus(CPUState *src_cpu,
 }

 void tlb_flush_page_bits_by_mmuidx_all_cpus(CPUState *src_cpu,
-                                            target_ulong addr,
-                                            uint16_t idxmap, unsigned bits)
+                                            vaddr addr, uint16_t idxmap,
+                                            unsigned bits)
 {
     tlb_flush_range_by_mmuidx_all_cpus(src_cpu, addr, TARGET_PAGE_SIZE,
                                        idxmap, bits);
 }

 void tlb_flush_range_by_mmuidx_all_cpus_synced(CPUState *src_cpu,
-                                               target_ulong addr,
-                                               target_ulong len,
+                                               vaddr addr,
+                                               vaddr len,
                                                uint16_t idxmap,
                                                unsigned bits)
 {
@@ -XXX,XX +XXX,XX @@ void tlb_flush_range_by_mmuidx_all_cpus_synced(CPUState *src_cpu,
 }

 void tlb_flush_page_bits_by_mmuidx_all_cpus_synced(CPUState *src_cpu,
-                                                   target_ulong addr,
+                                                   vaddr addr,
                                                    uint16_t idxmap,
                                                    unsigned bits)
 {
@@ -XXX,XX +XXX,XX @@ void tlb_reset_dirty(CPUState *cpu, ram_addr_t start1, ram_addr_t length)

 /* Called with tlb_c.lock held */
 static inline void tlb_set_dirty1_locked(CPUTLBEntry *tlb_entry,
-                                         target_ulong vaddr)
+                                         vaddr addr)
 {
-    if (tlb_entry->addr_write == (vaddr | TLB_NOTDIRTY)) {
-        tlb_entry->addr_write = vaddr;
+    if (tlb_entry->addr_write == (addr | TLB_NOTDIRTY)) {
+        tlb_entry->addr_write = addr;
     }
 }

 /* update the TLB corresponding to virtual page vaddr
    so that it is no longer dirty */
-void tlb_set_dirty(CPUState *cpu, target_ulong vaddr)
+void tlb_set_dirty(CPUState *cpu, vaddr addr)
 {
     CPUArchState *env = cpu->env_ptr;
     int mmu_idx;

     assert_cpu_is_self(cpu);

-    vaddr &= TARGET_PAGE_MASK;
+    addr &= TARGET_PAGE_MASK;
     qemu_spin_lock(&env_tlb(env)->c.lock);
     for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
-        tlb_set_dirty1_locked(tlb_entry(env, mmu_idx, vaddr), vaddr);
+        tlb_set_dirty1_locked(tlb_entry(env, mmu_idx, addr), addr);
     }

     for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
         int k;
         for (k = 0; k < CPU_VTLB_SIZE; k++) {
-            tlb_set_dirty1_locked(&env_tlb(env)->d[mmu_idx].vtable[k], vaddr);
+            tlb_set_dirty1_locked(&env_tlb(env)->d[mmu_idx].vtable[k], addr);
         }
     }
     qemu_spin_unlock(&env_tlb(env)->c.lock);
@@ -XXX,XX +XXX,XX @@ void tlb_set_dirty(CPUState *cpu, target_ulong vaddr)
 /* Our TLB does not support large pages, so remember the area covered by
    large pages and trigger a full TLB flush if these are invalidated.  */
 static void tlb_add_large_page(CPUArchState *env, int mmu_idx,
-                               target_ulong vaddr, target_ulong size)
+                               vaddr addr, uint64_t size)
 {
-    target_ulong lp_addr = env_tlb(env)->d[mmu_idx].large_page_addr;
-    target_ulong lp_mask = ~(size - 1);
+    vaddr lp_addr = env_tlb(env)->d[mmu_idx].large_page_addr;
+    vaddr lp_mask = ~(size - 1);

-    if (lp_addr == (target_ulong)-1) {
+    if (lp_addr == (vaddr)-1) {
         /* No previous large page.  */
-        lp_addr = vaddr;
+        lp_addr = addr;
     } else {
         /* Extend the existing region to include the new page.
            This is a compromise between unnecessary flushes and
            the cost of maintaining a full variable size TLB. */
         lp_mask &= env_tlb(env)->d[mmu_idx].large_page_mask;
-        while (((lp_addr ^ vaddr) & lp_mask) != 0) {
+        while (((lp_addr ^ addr) & lp_mask) != 0) {
             lp_mask <<= 1;
         }
     }
@@ -XXX,XX +XXX,XX @@ static void tlb_add_large_page(CPUArchState *env, int mmu_idx,
 * critical section.
 */
 void tlb_set_page_full(CPUState *cpu, int mmu_idx,
-                       target_ulong vaddr, CPUTLBEntryFull *full)
+                       vaddr addr, CPUTLBEntryFull *full)
 {
     CPUArchState *env = cpu->env_ptr;
     CPUTLB *tlb = env_tlb(env);
     CPUTLBDesc *desc = &tlb->d[mmu_idx];
     MemoryRegionSection *section;
     unsigned int index;
-    target_ulong address;
-    target_ulong write_address;
+    vaddr address;
+    vaddr write_address;
     uintptr_t addend;
     CPUTLBEntry *te, tn;
     hwaddr iotlb, xlat, sz, paddr_page;
-    target_ulong vaddr_page;
+    vaddr addr_page;
     int asidx, wp_flags, prot;
     bool is_ram, is_romd;

@@ -XXX,XX +XXX,XX @@ void tlb_set_page_full(CPUState *cpu, int mmu_idx,
         sz = TARGET_PAGE_SIZE;
     } else {
         sz = (hwaddr)1 << full->lg_page_size;
-        tlb_add_large_page(env, mmu_idx, vaddr, sz);
+        tlb_add_large_page(env, mmu_idx, addr, sz);
     }
-    vaddr_page = vaddr & TARGET_PAGE_MASK;
+    addr_page = addr & TARGET_PAGE_MASK;
     paddr_page = full->phys_addr & TARGET_PAGE_MASK;

     prot = full->prot;
@@ -XXX,XX +XXX,XX @@ void tlb_set_page_full(CPUState *cpu, int mmu_idx,
                                                 &xlat, &sz, full->attrs, &prot);
     assert(sz >= TARGET_PAGE_SIZE);

-    tlb_debug("vaddr=" TARGET_FMT_lx " paddr=0x" HWADDR_FMT_plx
+    tlb_debug("vaddr=%" VADDR_PRIx " paddr=0x" HWADDR_FMT_plx
               " prot=%x idx=%d\n",
-              vaddr, full->phys_addr, prot, mmu_idx);
+              addr, full->phys_addr, prot, mmu_idx);

-    address = vaddr_page;
+    address = addr_page;
     if (full->lg_page_size < TARGET_PAGE_BITS) {
         /* Repeat the MMU check and TLB fill on every access.  */
         address |= TLB_INVALID_MASK;
@@ -XXX,XX +XXX,XX @@ void tlb_set_page_full(CPUState *cpu, int mmu_idx,
         }
     }

-    wp_flags = cpu_watchpoint_address_matches(cpu, vaddr_page,
+    wp_flags = cpu_watchpoint_address_matches(cpu, addr_page,
                                               TARGET_PAGE_SIZE);

-    index = tlb_index(env, mmu_idx, vaddr_page);
-    te = tlb_entry(env, mmu_idx, vaddr_page);
+    index = tlb_index(env, mmu_idx, addr_page);
+    te = tlb_entry(env, mmu_idx, addr_page);

     /*
      * Hold the TLB lock for the rest of the function. We could acquire/release
@@ -XXX,XX +XXX,XX @@ void tlb_set_page_full(CPUState *cpu, int mmu_idx,
     tlb->c.dirty |= 1 << mmu_idx;

     /* Make sure there's no cached translation for the new page.  */
-    tlb_flush_vtlb_page_locked(env, mmu_idx, vaddr_page);
+    tlb_flush_vtlb_page_locked(env, mmu_idx, addr_page);

     /*
      * Only evict the old entry to the victim tlb if it's for a
      * different page; otherwise just overwrite the stale data.
      */
-    if (!tlb_hit_page_anyprot(te, vaddr_page) && !tlb_entry_is_empty(te)) {
+    if (!tlb_hit_page_anyprot(te, addr_page) && !tlb_entry_is_empty(te)) {
         unsigned vidx = desc->vindex++ % CPU_VTLB_SIZE;
         CPUTLBEntry *tv = &desc->vtable[vidx];

@@ -XXX,XX +XXX,XX @@ void tlb_set_page_full(CPUState *cpu, int mmu_idx,
      * vaddr we add back in io_readx()/io_writex()/get_page_addr_code().
      */
     desc->fulltlb[index] = *full;
-    desc->fulltlb[index].xlat_section = iotlb - vaddr_page;
+    desc->fulltlb[index].xlat_section = iotlb - addr_page;
     desc->fulltlb[index].phys_addr = paddr_page;

     /* Now calculate the new entry */
-    tn.addend = addend - vaddr_page;
+    tn.addend = addend - addr_page;
     if (prot & PAGE_READ) {
         tn.addr_read = address;
         if (wp_flags & BP_MEM_READ) {
@@ -XXX,XX +XXX,XX @@ void tlb_set_page_full(CPUState *cpu, int mmu_idx,
     qemu_spin_unlock(&tlb->c.lock);
 }

-void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
+void tlb_set_page_with_attrs(CPUState *cpu, vaddr addr,
                              hwaddr paddr, MemTxAttrs attrs, int prot,
-                             int mmu_idx, target_ulong size)
+                             int mmu_idx, uint64_t size)
 {
     CPUTLBEntryFull full = {
         .phys_addr = paddr,
@@ -XXX,XX +XXX,XX @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
     };

     assert(is_power_of_2(size));
-    tlb_set_page_full(cpu, mmu_idx, vaddr, &full);
+    tlb_set_page_full(cpu, mmu_idx, addr, &full);
 }

-void tlb_set_page(CPUState *cpu, target_ulong vaddr,
+void tlb_set_page(CPUState *cpu, vaddr addr,
                   hwaddr paddr, int prot,
-                  int mmu_idx, target_ulong size)
+                  int mmu_idx, uint64_t size)
 {
-    tlb_set_page_with_attrs(cpu, vaddr, paddr, MEMTXATTRS_UNSPECIFIED,
+    tlb_set_page_with_attrs(cpu, addr, paddr, MEMTXATTRS_UNSPECIFIED,
                             prot, mmu_idx, size);
 }

@@ -XXX,XX +XXX,XX @@ void tlb_set_page(CPUState *cpu, target_ulong vaddr,
 * caller's prior references to the TLB table (e.g. CPUTLBEntry pointers) must
 * be discarded and looked up again (e.g. via tlb_entry()).
 */
-static void tlb_fill(CPUState *cpu, target_ulong addr, int size,
+static void tlb_fill(CPUState *cpu, vaddr addr, int size,
                      MMUAccessType access_type, int mmu_idx, uintptr_t retaddr)
 {
     bool ok;
@@ -XXX,XX +XXX,XX @@ static inline void cpu_transaction_failed(CPUState *cpu, hwaddr physaddr,
 }

 static uint64_t io_readx(CPUArchState *env, CPUTLBEntryFull *full,
-                         int mmu_idx, target_ulong addr, uintptr_t retaddr,
+                         int mmu_idx, vaddr addr, uintptr_t retaddr,
                          MMUAccessType access_type, MemOp op)
 {
     CPUState *cpu = env_cpu(env);
@@ -XXX,XX +XXX,XX @@ static void save_iotlb_data(CPUState *cs, MemoryRegionSection *section,
 }

 static void io_writex(CPUArchState *env, CPUTLBEntryFull *full,
-                      int mmu_idx, uint64_t val, target_ulong addr,
+                      int mmu_idx, uint64_t val, vaddr addr,
                       uintptr_t retaddr, MemOp op)
 {
     CPUState *cpu = env_cpu(env);
@@ -XXX,XX +XXX,XX @@ static void io_writex(CPUArchState *env, CPUTLBEntryFull *full,
 /* Return true if ADDR is present in the victim tlb, and has been copied
    back to the main tlb.  */
 static bool victim_tlb_hit(CPUArchState *env, size_t mmu_idx, size_t index,
-                           MMUAccessType access_type, target_ulong page)
+                           MMUAccessType access_type, vaddr page)
 {
     size_t vidx;

@@ -XXX,XX +XXX,XX @@ tb_page_addr_t get_page_addr_code_hostp(CPUArchState *env, target_ulong addr,
 * from the same thread (which a mem callback will be) this is safe.
 */

-bool tlb_plugin_lookup(CPUState *cpu, target_ulong addr, int mmu_idx,
+bool tlb_plugin_lookup(CPUState *cpu, vaddr addr, int mmu_idx,
                        bool is_store, struct qemu_plugin_hwaddr *data)
 {
     CPUArchState *env = cpu->env_ptr;
     CPUTLBEntry *tlbe = tlb_entry(env, mmu_idx, addr);
     uintptr_t index = tlb_index(env, mmu_idx, addr);
-    target_ulong tlb_addr = is_store ? tlb_addr_write(tlbe) : tlbe->addr_read;
+    vaddr tlb_addr = is_store ? tlb_addr_write(tlbe) : tlbe->addr_read;

     if (likely(tlb_hit(tlb_addr, addr))) {
         /* We must have an iotlb entry for MMIO */
diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/tcg/tb-maint.c
+++ b/accel/tcg/tb-maint.c
@@ -XXX,XX +XXX,XX @@ static void tb_remove_all(void)
 /* Call with mmap_lock held. */
 static void tb_record(TranslationBlock *tb, PageDesc *p1, PageDesc *p2)
 {
-    target_ulong addr;
+    vaddr addr;
     int flags;

     assert_memory_lock();
--
2.34.1

From: Anton Johansson <anjo@rev.ng>

Signed-off-by: Anton Johansson <anjo@rev.ng>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230621135633.1649-3-anjo@rev.ng>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
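The logging change in this patch swaps TARGET_FMT_lx for the VADDR_PRIx
macro; the pattern relies on C string-literal concatenation to splice the
conversion specifier into the format string. A standalone sketch, using local
stand-ins for the QEMU definitions:

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef uint64_t vaddr;         /* stand-in for QEMU's vaddr type */
    #define VADDR_PRIx PRIx64       /* stand-in for QEMU's VADDR_PRIx */

    int main(void)
    {
        vaddr pc = 0xffffffff80001000ULL;
        /* Adjacent string literals concatenate: "... to %" "llx" "\n". */
        printf("cpu_io_recompile: rewound execution of TB to %"
               VADDR_PRIx "\n", pc);
        return 0;
    }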
 accel/tcg/internal.h      |  6 +++---
 accel/tcg/translate-all.c | 10 +++++-----
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
index XXXXXXX..XXXXXXX 100644
--- a/accel/tcg/internal.h
+++ b/accel/tcg/internal.h
@@ -XXX,XX +XXX,XX @@ void tb_invalidate_phys_range_fast(ram_addr_t ram_addr,
 G_NORETURN void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr);
 #endif /* CONFIG_SOFTMMU */

-TranslationBlock *tb_gen_code(CPUState *cpu, target_ulong pc,
-                              target_ulong cs_base, uint32_t flags,
+TranslationBlock *tb_gen_code(CPUState *cpu, vaddr pc,
+                              uint64_t cs_base, uint32_t flags,
                               int cflags);
 void page_init(void);
 void tb_htable_init(void);
@@ -XXX,XX +XXX,XX @@ void cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
                                uintptr_t host_pc);

 /* Return the current PC from CPU, which may be cached in TB. */
-static inline target_ulong log_pc(CPUState *cpu, const TranslationBlock *tb)
+static inline vaddr log_pc(CPUState *cpu, const TranslationBlock *tb)
 {
     if (tb_cflags(tb) & CF_PCREL) {
         return cpu->cc->get_pc(cpu);
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -XXX,XX +XXX,XX @@ void page_init(void)
 * Return the size of the generated code, or negative on error.
 */
 static int setjmp_gen_code(CPUArchState *env, TranslationBlock *tb,
-                           target_ulong pc, void *host_pc,
+                           vaddr pc, void *host_pc,
                            int *max_insns, int64_t *ti)
 {
     int ret = sigsetjmp(tcg_ctx->jmp_trans, 0);
@@ -XXX,XX +XXX,XX @@ static int setjmp_gen_code(CPUArchState *env, TranslationBlock *tb,

 /* Called with mmap_lock held for user mode emulation. */
 TranslationBlock *tb_gen_code(CPUState *cpu,
-                              target_ulong pc, target_ulong cs_base,
+                              vaddr pc, uint64_t cs_base,
                               uint32_t flags, int cflags)
 {
     CPUArchState *env = cpu->env_ptr;
@@ -XXX,XX +XXX,XX @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
     cpu->cflags_next_tb = curr_cflags(cpu) | CF_MEMI_ONLY | CF_LAST_IO | n;

     if (qemu_loglevel_mask(CPU_LOG_EXEC)) {
-        target_ulong pc = log_pc(cpu, tb);
+        vaddr pc = log_pc(cpu, tb);
         if (qemu_log_in_addr_range(pc)) {
-            qemu_log("cpu_io_recompile: rewound execution of TB to "
-                     TARGET_FMT_lx "\n", pc);
+            qemu_log("cpu_io_recompile: rewound execution of TB to %"
+                     VADDR_PRIx "\n", pc);
         }
     }

--
2.34.1

152 | diff view generated by jsdifflib |
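
The MO_8/MO_16 paths above turn an unaligned element load into modular lane arithmetic: LVX fetches the 16-byte-aligned quadword at (offset & -16), and the element to splat is recovered from the low offset bits, xor-flipped on little-endian hosts. A minimal standalone sketch of just that index computation (plain C with a hypothetical helper name, not backend code):

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical: which lane LVX + VSPLTH picks for a halfword at `offset`. */
    static int splat_lane_mo16(uint64_t offset, int host_is_bigendian)
    {
        uint64_t base = offset & -16ull;   /* LVX loads this aligned quadword */
        int elt = (offset >> 1) & 7;       /* same as extract32(offset, 1, 3) */
        if (!host_is_bigendian) {
            elt ^= 7;                      /* LE hosts number lanes in reverse */
        }
        assert(offset - base < 16);        /* element lies inside the quadword */
        return elt;
    }

    int main(void)
    {
        assert(splat_lane_mo16(0x1006, 1) == 3);   /* byte 6 -> halfword 3 */
        assert(splat_lane_mo16(0x1006, 0) == 4);   /* lane flipped on LE   */
        return 0;
    }
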
1 | Altivec supports 32 128-bit vector registers, whose names are | 1 | From: Anton Johansson <anjo@rev.ng> |
---|---|---|---|
2 | by convention v0 through v31. | ||
3 | 2 | ||
3 | Signed-off-by: Anton Johansson <anjo@rev.ng> | ||
4 | Reviewed-by: Richard Henderson <richard.henderson@linaro.org> | ||
5 | Message-Id: <20230621135633.1649-4-anjo@rev.ng> | ||
4 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 6 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> |
5 | Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com> | ||
6 | --- | 7 | --- |
7 | tcg/ppc/tcg-target.h | 11 ++++- | 8 | target/alpha/cpu.h | 4 ++-- |
8 | tcg/ppc/tcg-target.inc.c | 88 +++++++++++++++++++++++++--------------- | 9 | target/arm/cpu.h | 4 ++-- |
9 | 2 files changed, 65 insertions(+), 34 deletions(-) | 10 | target/avr/cpu.h | 4 ++-- |
11 | target/cris/cpu.h | 4 ++-- | ||
12 | target/hexagon/cpu.h | 4 ++-- | ||
13 | target/hppa/cpu.h | 5 ++--- | ||
14 | target/i386/cpu.h | 4 ++-- | ||
15 | target/loongarch/cpu.h | 6 ++---- | ||
16 | target/m68k/cpu.h | 4 ++-- | ||
17 | target/microblaze/cpu.h | 4 ++-- | ||
18 | target/mips/cpu.h | 4 ++-- | ||
19 | target/nios2/cpu.h | 4 ++-- | ||
20 | target/openrisc/cpu.h | 5 ++--- | ||
21 | target/ppc/cpu.h | 8 ++++---- | ||
22 | target/riscv/cpu.h | 4 ++-- | ||
23 | target/rx/cpu.h | 4 ++-- | ||
24 | target/s390x/cpu.h | 4 ++-- | ||
25 | target/sh4/cpu.h | 4 ++-- | ||
26 | target/sparc/cpu.h | 4 ++-- | ||
27 | target/tricore/cpu.h | 4 ++-- | ||
28 | target/xtensa/cpu.h | 4 ++-- | ||
29 | accel/tcg/cpu-exec.c | 9 ++++++--- | ||
30 | accel/tcg/translate-all.c | 3 ++- | ||
31 | target/arm/helper.c | 4 ++-- | ||
32 | target/ppc/helper_regs.c | 4 ++-- | ||
33 | target/riscv/cpu_helper.c | 4 ++-- | ||
34 | 26 files changed, 58 insertions(+), 58 deletions(-) | ||
10 | 35 | ||
11 | diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h | 36 | diff --git a/target/alpha/cpu.h b/target/alpha/cpu.h |
12 | index XXXXXXX..XXXXXXX 100644 | 37 | index XXXXXXX..XXXXXXX 100644 |
13 | --- a/tcg/ppc/tcg-target.h | 38 | --- a/target/alpha/cpu.h |
14 | +++ b/tcg/ppc/tcg-target.h | 39 | +++ b/target/alpha/cpu.h |
15 | @@ -XXX,XX +XXX,XX @@ | 40 | @@ -XXX,XX +XXX,XX @@ void alpha_cpu_do_transaction_failed(CPUState *cs, hwaddr physaddr, |
16 | # define TCG_TARGET_REG_BITS 32 | 41 | MemTxResult response, uintptr_t retaddr); |
17 | #endif | 42 | #endif |
18 | 43 | ||
19 | -#define TCG_TARGET_NB_REGS 32 | 44 | -static inline void cpu_get_tb_cpu_state(CPUAlphaState *env, target_ulong *pc, |
20 | +#define TCG_TARGET_NB_REGS 64 | 45 | - target_ulong *cs_base, uint32_t *pflags) |
21 | #define TCG_TARGET_INSN_UNIT_SIZE 4 | 46 | +static inline void cpu_get_tb_cpu_state(CPUAlphaState *env, vaddr *pc, |
22 | #define TCG_TARGET_TLB_DISPLACEMENT_BITS 16 | 47 | + uint64_t *cs_base, uint32_t *pflags) |
23 | 48 | { | |
24 | @@ -XXX,XX +XXX,XX @@ typedef enum { | 49 | *pc = env->pc; |
25 | TCG_REG_R24, TCG_REG_R25, TCG_REG_R26, TCG_REG_R27, | 50 | *cs_base = 0; |
26 | TCG_REG_R28, TCG_REG_R29, TCG_REG_R30, TCG_REG_R31, | 51 | diff --git a/target/arm/cpu.h b/target/arm/cpu.h |
27 | 52 | index XXXXXXX..XXXXXXX 100644 | |
28 | + TCG_REG_V0, TCG_REG_V1, TCG_REG_V2, TCG_REG_V3, | 53 | --- a/target/arm/cpu.h |
29 | + TCG_REG_V4, TCG_REG_V5, TCG_REG_V6, TCG_REG_V7, | 54 | +++ b/target/arm/cpu.h |
30 | + TCG_REG_V8, TCG_REG_V9, TCG_REG_V10, TCG_REG_V11, | 55 | @@ -XXX,XX +XXX,XX @@ static inline bool arm_cpu_bswap_data(CPUARMState *env) |
31 | + TCG_REG_V12, TCG_REG_V13, TCG_REG_V14, TCG_REG_V15, | 56 | } |
32 | + TCG_REG_V16, TCG_REG_V17, TCG_REG_V18, TCG_REG_V19, | ||
33 | + TCG_REG_V20, TCG_REG_V21, TCG_REG_V22, TCG_REG_V23, | ||
34 | + TCG_REG_V24, TCG_REG_V25, TCG_REG_V26, TCG_REG_V27, | ||
35 | + TCG_REG_V28, TCG_REG_V29, TCG_REG_V30, TCG_REG_V31, | ||
36 | + | ||
37 | TCG_REG_CALL_STACK = TCG_REG_R1, | ||
38 | TCG_AREG0 = TCG_REG_R27 | ||
39 | } TCGReg; | ||
40 | diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c | ||
41 | index XXXXXXX..XXXXXXX 100644 | ||
42 | --- a/tcg/ppc/tcg-target.inc.c | ||
43 | +++ b/tcg/ppc/tcg-target.inc.c | ||
44 | @@ -XXX,XX +XXX,XX @@ | ||
45 | # define TCG_REG_TMP1 TCG_REG_R12 | ||
46 | #endif | 57 | #endif |
47 | 58 | ||
48 | +#define TCG_VEC_TMP1 TCG_REG_V0 | 59 | -void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc, |
49 | +#define TCG_VEC_TMP2 TCG_REG_V1 | 60 | - target_ulong *cs_base, uint32_t *flags); |
50 | + | 61 | +void cpu_get_tb_cpu_state(CPUARMState *env, vaddr *pc, |
51 | #define TCG_REG_TB TCG_REG_R31 | 62 | + uint64_t *cs_base, uint32_t *flags); |
52 | #define USE_REG_TB (TCG_TARGET_REG_BITS == 64) | 63 | |
53 | 64 | enum { | |
54 | @@ -XXX,XX +XXX,XX @@ bool have_isa_3_00; | 65 | QEMU_PSCI_CONDUIT_DISABLED = 0, |
66 | diff --git a/target/avr/cpu.h b/target/avr/cpu.h | ||
67 | index XXXXXXX..XXXXXXX 100644 | ||
68 | --- a/target/avr/cpu.h | ||
69 | +++ b/target/avr/cpu.h | ||
70 | @@ -XXX,XX +XXX,XX @@ enum { | ||
71 | TB_FLAGS_SKIP = 2, | ||
72 | }; | ||
73 | |||
74 | -static inline void cpu_get_tb_cpu_state(CPUAVRState *env, target_ulong *pc, | ||
75 | - target_ulong *cs_base, uint32_t *pflags) | ||
76 | +static inline void cpu_get_tb_cpu_state(CPUAVRState *env, vaddr *pc, | ||
77 | + uint64_t *cs_base, uint32_t *pflags) | ||
78 | { | ||
79 | uint32_t flags = 0; | ||
80 | |||
81 | diff --git a/target/cris/cpu.h b/target/cris/cpu.h | ||
82 | index XXXXXXX..XXXXXXX 100644 | ||
83 | --- a/target/cris/cpu.h | ||
84 | +++ b/target/cris/cpu.h | ||
85 | @@ -XXX,XX +XXX,XX @@ static inline int cpu_mmu_index (CPUCRISState *env, bool ifetch) | ||
86 | |||
87 | #include "exec/cpu-all.h" | ||
88 | |||
89 | -static inline void cpu_get_tb_cpu_state(CPUCRISState *env, target_ulong *pc, | ||
90 | - target_ulong *cs_base, uint32_t *flags) | ||
91 | +static inline void cpu_get_tb_cpu_state(CPUCRISState *env, vaddr *pc, | ||
92 | + uint64_t *cs_base, uint32_t *flags) | ||
93 | { | ||
94 | *pc = env->pc; | ||
95 | *cs_base = 0; | ||
96 | diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h | ||
97 | index XXXXXXX..XXXXXXX 100644 | ||
98 | --- a/target/hexagon/cpu.h | ||
99 | +++ b/target/hexagon/cpu.h | ||
100 | @@ -XXX,XX +XXX,XX @@ struct ArchCPU { | ||
101 | |||
102 | FIELD(TB_FLAGS, IS_TIGHT_LOOP, 0, 1) | ||
103 | |||
104 | -static inline void cpu_get_tb_cpu_state(CPUHexagonState *env, target_ulong *pc, | ||
105 | - target_ulong *cs_base, uint32_t *flags) | ||
106 | +static inline void cpu_get_tb_cpu_state(CPUHexagonState *env, vaddr *pc, | ||
107 | + uint64_t *cs_base, uint32_t *flags) | ||
108 | { | ||
109 | uint32_t hex_flags = 0; | ||
110 | *pc = env->gpr[HEX_REG_PC]; | ||
111 | diff --git a/target/hppa/cpu.h b/target/hppa/cpu.h | ||
112 | index XXXXXXX..XXXXXXX 100644 | ||
113 | --- a/target/hppa/cpu.h | ||
114 | +++ b/target/hppa/cpu.h | ||
115 | @@ -XXX,XX +XXX,XX @@ static inline target_ulong hppa_form_gva(CPUHPPAState *env, uint64_t spc, | ||
116 | #define TB_FLAG_PRIV_SHIFT 8 | ||
117 | #define TB_FLAG_UNALIGN 0x400 | ||
118 | |||
119 | -static inline void cpu_get_tb_cpu_state(CPUHPPAState *env, target_ulong *pc, | ||
120 | - target_ulong *cs_base, | ||
121 | - uint32_t *pflags) | ||
122 | +static inline void cpu_get_tb_cpu_state(CPUHPPAState *env, vaddr *pc, | ||
123 | + uint64_t *cs_base, uint32_t *pflags) | ||
124 | { | ||
125 | uint32_t flags = env->psw_n * PSW_N; | ||
126 | |||
127 | diff --git a/target/i386/cpu.h b/target/i386/cpu.h | ||
128 | index XXXXXXX..XXXXXXX 100644 | ||
129 | --- a/target/i386/cpu.h | ||
130 | +++ b/target/i386/cpu.h | ||
131 | @@ -XXX,XX +XXX,XX @@ static inline int cpu_mmu_index_kernel(CPUX86State *env) | ||
132 | #include "hw/i386/apic.h" | ||
55 | #endif | 133 | #endif |
56 | 134 | ||
135 | -static inline void cpu_get_tb_cpu_state(CPUX86State *env, target_ulong *pc, | ||
136 | - target_ulong *cs_base, uint32_t *flags) | ||
137 | +static inline void cpu_get_tb_cpu_state(CPUX86State *env, vaddr *pc, | ||
138 | + uint64_t *cs_base, uint32_t *flags) | ||
139 | { | ||
140 | *cs_base = env->segs[R_CS].base; | ||
141 | *pc = *cs_base + env->eip; | ||
142 | diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h | ||
143 | index XXXXXXX..XXXXXXX 100644 | ||
144 | --- a/target/loongarch/cpu.h | ||
145 | +++ b/target/loongarch/cpu.h | ||
146 | @@ -XXX,XX +XXX,XX @@ static inline int cpu_mmu_index(CPULoongArchState *env, bool ifetch) | ||
147 | #define HW_FLAGS_EUEN_FPE 0x04 | ||
148 | #define HW_FLAGS_EUEN_SXE 0x08 | ||
149 | |||
150 | -static inline void cpu_get_tb_cpu_state(CPULoongArchState *env, | ||
151 | - target_ulong *pc, | ||
152 | - target_ulong *cs_base, | ||
153 | - uint32_t *flags) | ||
154 | +static inline void cpu_get_tb_cpu_state(CPULoongArchState *env, vaddr *pc, | ||
155 | + uint64_t *cs_base, uint32_t *flags) | ||
156 | { | ||
157 | *pc = env->pc; | ||
158 | *cs_base = 0; | ||
159 | diff --git a/target/m68k/cpu.h b/target/m68k/cpu.h | ||
160 | index XXXXXXX..XXXXXXX 100644 | ||
161 | --- a/target/m68k/cpu.h | ||
162 | +++ b/target/m68k/cpu.h | ||
163 | @@ -XXX,XX +XXX,XX @@ void m68k_cpu_transaction_failed(CPUState *cs, hwaddr physaddr, vaddr addr, | ||
164 | #define TB_FLAGS_TRACE 16 | ||
165 | #define TB_FLAGS_TRACE_BIT (1 << TB_FLAGS_TRACE) | ||
166 | |||
167 | -static inline void cpu_get_tb_cpu_state(CPUM68KState *env, target_ulong *pc, | ||
168 | - target_ulong *cs_base, uint32_t *flags) | ||
169 | +static inline void cpu_get_tb_cpu_state(CPUM68KState *env, vaddr *pc, | ||
170 | + uint64_t *cs_base, uint32_t *flags) | ||
171 | { | ||
172 | *pc = env->pc; | ||
173 | *cs_base = 0; | ||
174 | diff --git a/target/microblaze/cpu.h b/target/microblaze/cpu.h | ||
175 | index XXXXXXX..XXXXXXX 100644 | ||
176 | --- a/target/microblaze/cpu.h | ||
177 | +++ b/target/microblaze/cpu.h | ||
178 | @@ -XXX,XX +XXX,XX @@ void mb_tcg_init(void); | ||
179 | /* Ensure there is no overlap between the two masks. */ | ||
180 | QEMU_BUILD_BUG_ON(MSR_TB_MASK & IFLAGS_TB_MASK); | ||
181 | |||
182 | -static inline void cpu_get_tb_cpu_state(CPUMBState *env, target_ulong *pc, | ||
183 | - target_ulong *cs_base, uint32_t *flags) | ||
184 | +static inline void cpu_get_tb_cpu_state(CPUMBState *env, vaddr *pc, | ||
185 | + uint64_t *cs_base, uint32_t *flags) | ||
186 | { | ||
187 | *pc = env->pc; | ||
188 | *flags = (env->iflags & IFLAGS_TB_MASK) | (env->msr & MSR_TB_MASK); | ||
189 | diff --git a/target/mips/cpu.h b/target/mips/cpu.h | ||
190 | index XXXXXXX..XXXXXXX 100644 | ||
191 | --- a/target/mips/cpu.h | ||
192 | +++ b/target/mips/cpu.h | ||
193 | @@ -XXX,XX +XXX,XX @@ void itc_reconfigure(struct MIPSITUState *tag); | ||
194 | /* helper.c */ | ||
195 | target_ulong exception_resume_pc(CPUMIPSState *env); | ||
196 | |||
197 | -static inline void cpu_get_tb_cpu_state(CPUMIPSState *env, target_ulong *pc, | ||
198 | - target_ulong *cs_base, uint32_t *flags) | ||
199 | +static inline void cpu_get_tb_cpu_state(CPUMIPSState *env, vaddr *pc, | ||
200 | + uint64_t *cs_base, uint32_t *flags) | ||
201 | { | ||
202 | *pc = env->active_tc.PC; | ||
203 | *cs_base = 0; | ||
204 | diff --git a/target/nios2/cpu.h b/target/nios2/cpu.h | ||
205 | index XXXXXXX..XXXXXXX 100644 | ||
206 | --- a/target/nios2/cpu.h | ||
207 | +++ b/target/nios2/cpu.h | ||
208 | @@ -XXX,XX +XXX,XX @@ FIELD(TBFLAGS, CRS0, 0, 1) /* Set if CRS == 0. */ | ||
209 | FIELD(TBFLAGS, U, 1, 1) /* Overlaps CR_STATUS_U */ | ||
210 | FIELD(TBFLAGS, R0_0, 2, 1) /* Set if R0 == 0. */ | ||
211 | |||
212 | -static inline void cpu_get_tb_cpu_state(CPUNios2State *env, target_ulong *pc, | ||
213 | - target_ulong *cs_base, uint32_t *flags) | ||
214 | +static inline void cpu_get_tb_cpu_state(CPUNios2State *env, vaddr *pc, | ||
215 | + uint64_t *cs_base, uint32_t *flags) | ||
216 | { | ||
217 | unsigned crs = FIELD_EX32(env->ctrl[CR_STATUS], CR_STATUS, CRS); | ||
218 | |||
219 | diff --git a/target/openrisc/cpu.h b/target/openrisc/cpu.h | ||
220 | index XXXXXXX..XXXXXXX 100644 | ||
221 | --- a/target/openrisc/cpu.h | ||
222 | +++ b/target/openrisc/cpu.h | ||
223 | @@ -XXX,XX +XXX,XX @@ static inline void cpu_set_gpr(CPUOpenRISCState *env, int i, uint32_t val) | ||
224 | env->shadow_gpr[0][i] = val; | ||
225 | } | ||
226 | |||
227 | -static inline void cpu_get_tb_cpu_state(CPUOpenRISCState *env, | ||
228 | - target_ulong *pc, | ||
229 | - target_ulong *cs_base, uint32_t *flags) | ||
230 | +static inline void cpu_get_tb_cpu_state(CPUOpenRISCState *env, vaddr *pc, | ||
231 | + uint64_t *cs_base, uint32_t *flags) | ||
232 | { | ||
233 | *pc = env->pc; | ||
234 | *cs_base = 0; | ||
235 | diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h | ||
236 | index XXXXXXX..XXXXXXX 100644 | ||
237 | --- a/target/ppc/cpu.h | ||
238 | +++ b/target/ppc/cpu.h | ||
239 | @@ -XXX,XX +XXX,XX @@ void cpu_write_xer(CPUPPCState *env, target_ulong xer); | ||
240 | #define is_book3s_arch2x(ctx) (!!((ctx)->insns_flags & PPC_SEGMENT_64B)) | ||
241 | |||
57 | #ifdef CONFIG_DEBUG_TCG | 242 | #ifdef CONFIG_DEBUG_TCG |
58 | -static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = { | 243 | -void cpu_get_tb_cpu_state(CPUPPCState *env, target_ulong *pc, |
59 | - "r0", | 244 | - target_ulong *cs_base, uint32_t *flags); |
60 | - "r1", | 245 | +void cpu_get_tb_cpu_state(CPUPPCState *env, vaddr *pc, |
61 | - "r2", | 246 | + uint64_t *cs_base, uint32_t *flags); |
62 | - "r3", | 247 | #else |
63 | - "r4", | 248 | -static inline void cpu_get_tb_cpu_state(CPUPPCState *env, target_ulong *pc, |
64 | - "r5", | 249 | - target_ulong *cs_base, uint32_t *flags) |
65 | - "r6", | 250 | +static inline void cpu_get_tb_cpu_state(CPUPPCState *env, vaddr *pc, |
66 | - "r7", | 251 | + uint64_t *cs_base, uint32_t *flags) |
67 | - "r8", | 252 | { |
68 | - "r9", | 253 | *pc = env->nip; |
69 | - "r10", | 254 | *cs_base = 0; |
70 | - "r11", | 255 | diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h |
71 | - "r12", | 256 | index XXXXXXX..XXXXXXX 100644 |
72 | - "r13", | 257 | --- a/target/riscv/cpu.h |
73 | - "r14", | 258 | +++ b/target/riscv/cpu.h |
74 | - "r15", | 259 | @@ -XXX,XX +XXX,XX @@ static inline uint32_t vext_get_vlmax(RISCVCPU *cpu, target_ulong vtype) |
75 | - "r16", | 260 | return cpu->cfg.vlen >> (sew + 3 - lmul); |
76 | - "r17", | 261 | } |
77 | - "r18", | 262 | |
78 | - "r19", | 263 | -void cpu_get_tb_cpu_state(CPURISCVState *env, target_ulong *pc, |
79 | - "r20", | 264 | - target_ulong *cs_base, uint32_t *pflags); |
80 | - "r21", | 265 | +void cpu_get_tb_cpu_state(CPURISCVState *env, vaddr *pc, |
81 | - "r22", | 266 | + uint64_t *cs_base, uint32_t *pflags); |
82 | - "r23", | 267 | |
83 | - "r24", | 268 | void riscv_cpu_update_mask(CPURISCVState *env); |
84 | - "r25", | 269 | |
85 | - "r26", | 270 | diff --git a/target/rx/cpu.h b/target/rx/cpu.h |
86 | - "r27", | 271 | index XXXXXXX..XXXXXXX 100644 |
87 | - "r28", | 272 | --- a/target/rx/cpu.h |
88 | - "r29", | 273 | +++ b/target/rx/cpu.h |
89 | - "r30", | 274 | @@ -XXX,XX +XXX,XX @@ void rx_cpu_unpack_psw(CPURXState *env, uint32_t psw, int rte); |
90 | - "r31" | 275 | #define RX_CPU_IRQ 0 |
91 | +static const char tcg_target_reg_names[TCG_TARGET_NB_REGS][4] = { | 276 | #define RX_CPU_FIR 1 |
92 | + "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7", | 277 | |
93 | + "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15", | 278 | -static inline void cpu_get_tb_cpu_state(CPURXState *env, target_ulong *pc, |
94 | + "r16", "r17", "r18", "r19", "r20", "r21", "r22", "r23", | 279 | - target_ulong *cs_base, uint32_t *flags) |
95 | + "r24", "r25", "r26", "r27", "r28", "r29", "r30", "r31", | 280 | +static inline void cpu_get_tb_cpu_state(CPURXState *env, vaddr *pc, |
96 | + "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7", | 281 | + uint64_t *cs_base, uint32_t *flags) |
97 | + "v8", "v9", "v10", "v11", "v12", "v13", "v14", "v15", | 282 | { |
98 | + "v16", "v17", "v18", "v19", "v20", "v21", "v22", "v23", | 283 | *pc = env->pc; |
99 | + "v24", "v25", "v26", "v27", "v28", "v29", "v30", "v31", | 284 | *cs_base = 0; |
100 | }; | 285 | diff --git a/target/s390x/cpu.h b/target/s390x/cpu.h |
286 | index XXXXXXX..XXXXXXX 100644 | ||
287 | --- a/target/s390x/cpu.h | ||
288 | +++ b/target/s390x/cpu.h | ||
289 | @@ -XXX,XX +XXX,XX @@ static inline int cpu_mmu_index(CPUS390XState *env, bool ifetch) | ||
101 | #endif | 290 | #endif |
102 | 291 | } | |
103 | @@ -XXX,XX +XXX,XX @@ static const int tcg_target_reg_alloc_order[] = { | 292 | |
104 | TCG_REG_R5, | 293 | -static inline void cpu_get_tb_cpu_state(CPUS390XState* env, target_ulong *pc, |
105 | TCG_REG_R4, | 294 | - target_ulong *cs_base, uint32_t *flags) |
106 | TCG_REG_R3, | 295 | +static inline void cpu_get_tb_cpu_state(CPUS390XState *env, vaddr *pc, |
107 | + | 296 | + uint64_t *cs_base, uint32_t *flags) |
108 | + /* V0 and V1 reserved as temporaries; V20 - V31 are call-saved */ | 297 | { |
109 | + TCG_REG_V2, /* call clobbered, vectors */ | 298 | if (env->psw.addr & 1) { |
110 | + TCG_REG_V3, | 299 | /* |
111 | + TCG_REG_V4, | 300 | diff --git a/target/sh4/cpu.h b/target/sh4/cpu.h |
112 | + TCG_REG_V5, | 301 | index XXXXXXX..XXXXXXX 100644 |
113 | + TCG_REG_V6, | 302 | --- a/target/sh4/cpu.h |
114 | + TCG_REG_V7, | 303 | +++ b/target/sh4/cpu.h |
115 | + TCG_REG_V8, | 304 | @@ -XXX,XX +XXX,XX @@ static inline void cpu_write_sr(CPUSH4State *env, target_ulong sr) |
116 | + TCG_REG_V9, | 305 | env->sr = sr & ~((1u << SR_M) | (1u << SR_Q) | (1u << SR_T)); |
117 | + TCG_REG_V10, | 306 | } |
118 | + TCG_REG_V11, | 307 | |
119 | + TCG_REG_V12, | 308 | -static inline void cpu_get_tb_cpu_state(CPUSH4State *env, target_ulong *pc, |
120 | + TCG_REG_V13, | 309 | - target_ulong *cs_base, uint32_t *flags) |
121 | + TCG_REG_V14, | 310 | +static inline void cpu_get_tb_cpu_state(CPUSH4State *env, vaddr *pc, |
122 | + TCG_REG_V15, | 311 | + uint64_t *cs_base, uint32_t *flags) |
123 | + TCG_REG_V16, | 312 | { |
124 | + TCG_REG_V17, | 313 | *pc = env->pc; |
125 | + TCG_REG_V18, | 314 | /* For a gUSA region, notice the end of the region. */ |
126 | + TCG_REG_V19, | 315 | diff --git a/target/sparc/cpu.h b/target/sparc/cpu.h |
127 | }; | 316 | index XXXXXXX..XXXXXXX 100644 |
128 | 317 | --- a/target/sparc/cpu.h | |
129 | static const int tcg_target_call_iarg_regs[] = { | 318 | +++ b/target/sparc/cpu.h |
130 | @@ -XXX,XX +XXX,XX @@ static void tcg_target_init(TCGContext *s) | 319 | @@ -XXX,XX +XXX,XX @@ trap_state* cpu_tsptr(CPUSPARCState* env); |
131 | tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R11); | 320 | #define TB_FLAG_HYPER (1 << 7) |
132 | tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R12); | 321 | #define TB_FLAG_ASI_SHIFT 24 |
133 | 322 | ||
134 | + tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V0); | 323 | -static inline void cpu_get_tb_cpu_state(CPUSPARCState *env, target_ulong *pc, |
135 | + tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V1); | 324 | - target_ulong *cs_base, uint32_t *pflags) |
136 | + tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V2); | 325 | +static inline void cpu_get_tb_cpu_state(CPUSPARCState *env, vaddr *pc, |
137 | + tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V3); | 326 | + uint64_t *cs_base, uint32_t *pflags) |
138 | + tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V4); | 327 | { |
139 | + tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V5); | 328 | uint32_t flags; |
140 | + tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V6); | 329 | *pc = env->pc; |
141 | + tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V7); | 330 | diff --git a/target/tricore/cpu.h b/target/tricore/cpu.h |
142 | + tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V8); | 331 | index XXXXXXX..XXXXXXX 100644 |
143 | + tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V9); | 332 | --- a/target/tricore/cpu.h |
144 | + tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V10); | 333 | +++ b/target/tricore/cpu.h |
145 | + tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V11); | 334 | @@ -XXX,XX +XXX,XX @@ FIELD(TB_FLAGS, PRIV, 0, 2) |
146 | + tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V12); | 335 | void cpu_state_reset(CPUTriCoreState *s); |
147 | + tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V13); | 336 | void tricore_tcg_init(void); |
148 | + tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V14); | 337 | |
149 | + tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V15); | 338 | -static inline void cpu_get_tb_cpu_state(CPUTriCoreState *env, target_ulong *pc, |
150 | + tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V16); | 339 | - target_ulong *cs_base, uint32_t *flags) |
151 | + tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V17); | 340 | +static inline void cpu_get_tb_cpu_state(CPUTriCoreState *env, vaddr *pc, |
152 | + tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V18); | 341 | + uint64_t *cs_base, uint32_t *flags) |
153 | + tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_V19); | 342 | { |
154 | + | 343 | uint32_t new_flags = 0; |
155 | s->reserved_regs = 0; | 344 | *pc = env->PC; |
156 | tcg_regset_set_reg(s->reserved_regs, TCG_REG_R0); /* tcg temp */ | 345 | diff --git a/target/xtensa/cpu.h b/target/xtensa/cpu.h |
157 | tcg_regset_set_reg(s->reserved_regs, TCG_REG_R1); /* stack pointer */ | 346 | index XXXXXXX..XXXXXXX 100644 |
158 | @@ -XXX,XX +XXX,XX @@ static void tcg_target_init(TCGContext *s) | 347 | --- a/target/xtensa/cpu.h |
159 | tcg_regset_set_reg(s->reserved_regs, TCG_REG_R13); /* thread pointer */ | 348 | +++ b/target/xtensa/cpu.h |
349 | @@ -XXX,XX +XXX,XX @@ static inline int cpu_mmu_index(CPUXtensaState *env, bool ifetch) | ||
350 | |||
351 | #include "exec/cpu-all.h" | ||
352 | |||
353 | -static inline void cpu_get_tb_cpu_state(CPUXtensaState *env, target_ulong *pc, | ||
354 | - target_ulong *cs_base, uint32_t *flags) | ||
355 | +static inline void cpu_get_tb_cpu_state(CPUXtensaState *env, vaddr *pc, | ||
356 | + uint64_t *cs_base, uint32_t *flags) | ||
357 | { | ||
358 | *pc = env->pc; | ||
359 | *cs_base = 0; | ||
360 | diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c | ||
361 | index XXXXXXX..XXXXXXX 100644 | ||
362 | --- a/accel/tcg/cpu-exec.c | ||
363 | +++ b/accel/tcg/cpu-exec.c | ||
364 | @@ -XXX,XX +XXX,XX @@ const void *HELPER(lookup_tb_ptr)(CPUArchState *env) | ||
365 | { | ||
366 | CPUState *cpu = env_cpu(env); | ||
367 | TranslationBlock *tb; | ||
368 | - target_ulong cs_base, pc; | ||
369 | + vaddr pc; | ||
370 | + uint64_t cs_base; | ||
371 | uint32_t flags, cflags; | ||
372 | |||
373 | cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags); | ||
374 | @@ -XXX,XX +XXX,XX @@ void cpu_exec_step_atomic(CPUState *cpu) | ||
375 | { | ||
376 | CPUArchState *env = cpu->env_ptr; | ||
377 | TranslationBlock *tb; | ||
378 | - target_ulong cs_base, pc; | ||
379 | + vaddr pc; | ||
380 | + uint64_t cs_base; | ||
381 | uint32_t flags, cflags; | ||
382 | int tb_exit; | ||
383 | |||
384 | @@ -XXX,XX +XXX,XX @@ cpu_exec_loop(CPUState *cpu, SyncClocks *sc) | ||
385 | |||
386 | while (!cpu_handle_interrupt(cpu, &last_tb)) { | ||
387 | TranslationBlock *tb; | ||
388 | - target_ulong cs_base, pc; | ||
389 | + vaddr pc; | ||
390 | + uint64_t cs_base; | ||
391 | uint32_t flags, cflags; | ||
392 | |||
393 | cpu_get_tb_cpu_state(cpu->env_ptr, &pc, &cs_base, &flags); | ||
394 | diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c | ||
395 | index XXXXXXX..XXXXXXX 100644 | ||
396 | --- a/accel/tcg/translate-all.c | ||
397 | +++ b/accel/tcg/translate-all.c | ||
398 | @@ -XXX,XX +XXX,XX @@ void tb_check_watchpoint(CPUState *cpu, uintptr_t retaddr) | ||
399 | /* The exception probably happened in a helper. The CPU state should | ||
400 | have been saved before calling it. Fetch the PC from there. */ | ||
401 | CPUArchState *env = cpu->env_ptr; | ||
402 | - target_ulong pc, cs_base; | ||
403 | + vaddr pc; | ||
404 | + uint64_t cs_base; | ||
405 | tb_page_addr_t addr; | ||
406 | uint32_t flags; | ||
407 | |||
408 | diff --git a/target/arm/helper.c b/target/arm/helper.c | ||
409 | index XXXXXXX..XXXXXXX 100644 | ||
410 | --- a/target/arm/helper.c | ||
411 | +++ b/target/arm/helper.c | ||
412 | @@ -XXX,XX +XXX,XX @@ static bool mve_no_pred(CPUARMState *env) | ||
413 | return true; | ||
414 | } | ||
415 | |||
416 | -void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc, | ||
417 | - target_ulong *cs_base, uint32_t *pflags) | ||
418 | +void cpu_get_tb_cpu_state(CPUARMState *env, vaddr *pc, | ||
419 | + uint64_t *cs_base, uint32_t *pflags) | ||
420 | { | ||
421 | CPUARMTBFlags flags; | ||
422 | |||
423 | diff --git a/target/ppc/helper_regs.c b/target/ppc/helper_regs.c | ||
424 | index XXXXXXX..XXXXXXX 100644 | ||
425 | --- a/target/ppc/helper_regs.c | ||
426 | +++ b/target/ppc/helper_regs.c | ||
427 | @@ -XXX,XX +XXX,XX @@ void hreg_update_pmu_hflags(CPUPPCState *env) | ||
428 | } | ||
429 | |||
430 | #ifdef CONFIG_DEBUG_TCG | ||
431 | -void cpu_get_tb_cpu_state(CPUPPCState *env, target_ulong *pc, | ||
432 | - target_ulong *cs_base, uint32_t *flags) | ||
433 | +void cpu_get_tb_cpu_state(CPUPPCState *env, vaddr *pc, | ||
434 | + uint64_t *cs_base, uint32_t *flags) | ||
435 | { | ||
436 | uint32_t hflags_current = env->hflags; | ||
437 | uint32_t hflags_rebuilt; | ||
438 | diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c | ||
439 | index XXXXXXX..XXXXXXX 100644 | ||
440 | --- a/target/riscv/cpu_helper.c | ||
441 | +++ b/target/riscv/cpu_helper.c | ||
442 | @@ -XXX,XX +XXX,XX @@ int riscv_cpu_mmu_index(CPURISCVState *env, bool ifetch) | ||
160 | #endif | 443 | #endif |
161 | tcg_regset_set_reg(s->reserved_regs, TCG_REG_TMP1); /* mem temp */ | 444 | } |
162 | + tcg_regset_set_reg(s->reserved_regs, TCG_VEC_TMP1); | 445 | |
163 | + tcg_regset_set_reg(s->reserved_regs, TCG_VEC_TMP2); | 446 | -void cpu_get_tb_cpu_state(CPURISCVState *env, target_ulong *pc, |
164 | if (USE_REG_TB) { | 447 | - target_ulong *cs_base, uint32_t *pflags) |
165 | tcg_regset_set_reg(s->reserved_regs, TCG_REG_TB); /* tb->tc_ptr */ | 448 | +void cpu_get_tb_cpu_state(CPURISCVState *env, vaddr *pc, |
166 | } | 449 | + uint64_t *cs_base, uint32_t *pflags) |
450 | { | ||
451 | CPUState *cs = env_cpu(env); | ||
452 | RISCVCPU *cpu = RISCV_CPU(cs); | ||
167 | -- | 453 | -- |
168 | 2.17.1 | 454 | 2.34.1 |
169 | |||
170 | diff view generated by jsdifflib |
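
After this widening, every caller of cpu_get_tb_cpu_state() follows the same shape: the pc local is a vaddr and cs_base a uint64_t, fixed-width regardless of the guest's target_ulong. A self-contained sketch of that calling pattern (demo types only; it assumes vaddr is a uint64_t typedef, as in QEMU):

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef uint64_t vaddr;                      /* assumed QEMU typedef */

    /* Demo stand-in for a target's CPU state. */
    typedef struct { uint64_t pc, cs_base; uint32_t hflags; } DemoEnv;

    /* Shape of the widened hook: vaddr pc out, uint64_t cs_base out. */
    static void demo_get_tb_cpu_state(DemoEnv *env, vaddr *pc,
                                      uint64_t *cs_base, uint32_t *flags)
    {
        *pc = env->pc;
        *cs_base = env->cs_base;
        *flags = env->hflags;
    }

    int main(void)
    {
        DemoEnv env = { 0xffffffff81000000ull, 0, 7 };
        vaddr pc;
        uint64_t cs_base;
        uint32_t flags;

        demo_get_tb_cpu_state(&env, &pc, &cs_base, &flags);
        printf("pc=%" PRIx64 " cs_base=%" PRIx64 " flags=%u\n",
               pc, cs_base, flags);
        return 0;
    }
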
Deleted patch | |||
---|---|---|---|
1 | Introduce macro VX4() used for encoding Altivec instructions. | ||
2 | 1 | ||
3 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | ||
4 | Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com> | ||
5 | --- | ||
6 | tcg/ppc/tcg-target.inc.c | 1 + | ||
7 | 1 file changed, 1 insertion(+) | ||
8 | |||
9 | diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c | ||
10 | index XXXXXXX..XXXXXXX 100644 | ||
11 | --- a/tcg/ppc/tcg-target.inc.c | ||
12 | +++ b/tcg/ppc/tcg-target.inc.c | ||
13 | @@ -XXX,XX +XXX,XX @@ static int tcg_target_const_match(tcg_target_long val, TCGType type, | ||
14 | #define XO31(opc) (OPCD(31)|((opc)<<1)) | ||
15 | #define XO58(opc) (OPCD(58)|(opc)) | ||
16 | #define XO62(opc) (OPCD(62)|(opc)) | ||
17 | +#define VX4(opc) (OPCD(4)|(opc)) | ||
18 | |||
19 | #define B OPCD( 18) | ||
20 | #define BC OPCD( 16) | ||
21 | -- | ||
22 | 2.17.1 | ||
23 | |||
24 | diff view generated by jsdifflib |
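
VX4() pairs primary opcode 4 with an 11-bit VX-form extended opcode. A quick sanity check of one encoding, assuming OPCD(opc) expands to ((opc) << 26) as in the rest of this backend:

    #include <assert.h>
    #include <stdint.h>

    #define OPCD(opc) ((uint32_t)(opc) << 26)  /* assumed backend definition */
    #define VX4(opc)  (OPCD(4) | (opc))

    int main(void)
    {
        /* VADDSBS is VX-form: primary opcode 4, extended opcode 768. */
        assert(VX4(768) == 0x10000300u);
        return 0;
    }
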
1 | This is only used for 32-bit hosts. | 1 | From: Anton Johansson <anjo@rev.ng> |
---|---|---|---|
2 | 2 | ||
3 | Signed-off-by: Anton Johansson <anjo@rev.ng> | ||
4 | Reviewed-by: Richard Henderson <richard.henderson@linaro.org> | ||
5 | Message-Id: <20230621135633.1649-5-anjo@rev.ng> | ||
3 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 6 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> |
4 | Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com> | ||
5 | --- | 7 | --- |
6 | tcg/ppc/tcg-target.inc.c | 9 +++++++++ | 8 | include/exec/cpu_ldst.h | 10 +++++----- |
7 | 1 file changed, 9 insertions(+) | 9 | accel/tcg/cputlb.c | 8 ++++---- |
10 | 2 files changed, 9 insertions(+), 9 deletions(-) | ||
8 | 11 | ||
9 | diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c | 12 | diff --git a/include/exec/cpu_ldst.h b/include/exec/cpu_ldst.h |
10 | index XXXXXXX..XXXXXXX 100644 | 13 | index XXXXXXX..XXXXXXX 100644 |
11 | --- a/tcg/ppc/tcg-target.inc.c | 14 | --- a/include/exec/cpu_ldst.h |
12 | +++ b/tcg/ppc/tcg-target.inc.c | 15 | +++ b/include/exec/cpu_ldst.h |
13 | @@ -XXX,XX +XXX,XX @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, | 16 | @@ -XXX,XX +XXX,XX @@ static inline void clear_helper_retaddr(void) |
14 | } | 17 | |
15 | break; | 18 | #include "tcg/oversized-guest.h" |
16 | 19 | ||
17 | + case INDEX_op_dup2_vec: | 20 | -static inline target_ulong tlb_read_idx(const CPUTLBEntry *entry, |
18 | + assert(TCG_TARGET_REG_BITS == 32); | 21 | - MMUAccessType access_type) |
19 | + /* With inputs a1 = xLxx, a2 = xHxx */ | 22 | +static inline uint64_t tlb_read_idx(const CPUTLBEntry *entry, |
20 | + tcg_out32(s, VMRGHW | VRT(a0) | VRA(a2) | VRB(a1)); /* a0 = xxHL */ | 23 | + MMUAccessType access_type) |
21 | + tcg_out_vsldoi(s, TCG_VEC_TMP1, a0, a0, 8); /* tmp = HLxx */ | 24 | { |
22 | + tcg_out_vsldoi(s, a0, a0, TCG_VEC_TMP1, 8); /* a0 = HLHL */ | 25 | /* Do not rearrange the CPUTLBEntry structure members. */ |
23 | + return; | 26 | QEMU_BUILD_BUG_ON(offsetof(CPUTLBEntry, addr_read) != |
24 | + | 27 | @@ -XXX,XX +XXX,XX @@ static inline target_ulong tlb_read_idx(const CPUTLBEntry *entry, |
25 | case INDEX_op_ppc_mrgh_vec: | 28 | #endif |
26 | insn = mrgh_op[vece]; | 29 | } |
27 | break; | 30 | |
28 | @@ -XXX,XX +XXX,XX @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op) | 31 | -static inline target_ulong tlb_addr_write(const CPUTLBEntry *entry) |
29 | case INDEX_op_ppc_mulou_vec: | 32 | +static inline uint64_t tlb_addr_write(const CPUTLBEntry *entry) |
30 | case INDEX_op_ppc_pkum_vec: | 33 | { |
31 | case INDEX_op_ppc_rotl_vec: | 34 | return tlb_read_idx(entry, MMU_DATA_STORE); |
32 | + case INDEX_op_dup2_vec: | 35 | } |
33 | return &v_v_v; | 36 | |
34 | case INDEX_op_not_vec: | 37 | /* Find the TLB index corresponding to the mmu_idx + address pair. */ |
35 | case INDEX_op_dup_vec: | 38 | static inline uintptr_t tlb_index(CPUArchState *env, uintptr_t mmu_idx, |
39 | - target_ulong addr) | ||
40 | + vaddr addr) | ||
41 | { | ||
42 | uintptr_t size_mask = env_tlb(env)->f[mmu_idx].mask >> CPU_TLB_ENTRY_BITS; | ||
43 | |||
44 | @@ -XXX,XX +XXX,XX @@ static inline uintptr_t tlb_index(CPUArchState *env, uintptr_t mmu_idx, | ||
45 | |||
46 | /* Find the TLB entry corresponding to the mmu_idx + address pair. */ | ||
47 | static inline CPUTLBEntry *tlb_entry(CPUArchState *env, uintptr_t mmu_idx, | ||
48 | - target_ulong addr) | ||
49 | + vaddr addr) | ||
50 | { | ||
51 | return &env_tlb(env)->f[mmu_idx].table[tlb_index(env, mmu_idx, addr)]; | ||
52 | } | ||
53 | diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c | ||
54 | index XXXXXXX..XXXXXXX 100644 | ||
55 | --- a/accel/tcg/cputlb.c | ||
56 | +++ b/accel/tcg/cputlb.c | ||
57 | @@ -XXX,XX +XXX,XX @@ static bool victim_tlb_hit(CPUArchState *env, size_t mmu_idx, size_t index, | ||
58 | assert_cpu_is_self(env_cpu(env)); | ||
59 | for (vidx = 0; vidx < CPU_VTLB_SIZE; ++vidx) { | ||
60 | CPUTLBEntry *vtlb = &env_tlb(env)->d[mmu_idx].vtable[vidx]; | ||
61 | - target_ulong cmp = tlb_read_idx(vtlb, access_type); | ||
62 | + uint64_t cmp = tlb_read_idx(vtlb, access_type); | ||
63 | |||
64 | if (cmp == page) { | ||
65 | /* Found entry in victim tlb, swap tlb and iotlb. */ | ||
66 | @@ -XXX,XX +XXX,XX @@ static int probe_access_internal(CPUArchState *env, target_ulong addr, | ||
67 | { | ||
68 | uintptr_t index = tlb_index(env, mmu_idx, addr); | ||
69 | CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr); | ||
70 | - target_ulong tlb_addr = tlb_read_idx(entry, access_type); | ||
71 | + uint64_t tlb_addr = tlb_read_idx(entry, access_type); | ||
72 | target_ulong page_addr = addr & TARGET_PAGE_MASK; | ||
73 | int flags = TLB_FLAGS_MASK; | ||
74 | |||
75 | @@ -XXX,XX +XXX,XX @@ bool tlb_plugin_lookup(CPUState *cpu, vaddr addr, int mmu_idx, | ||
76 | CPUArchState *env = cpu->env_ptr; | ||
77 | CPUTLBEntry *tlbe = tlb_entry(env, mmu_idx, addr); | ||
78 | uintptr_t index = tlb_index(env, mmu_idx, addr); | ||
79 | - vaddr tlb_addr = is_store ? tlb_addr_write(tlbe) : tlbe->addr_read; | ||
80 | + uint64_t tlb_addr = is_store ? tlb_addr_write(tlbe) : tlbe->addr_read; | ||
81 | |||
82 | if (likely(tlb_hit(tlb_addr, addr))) { | ||
83 | /* We must have an iotlb entry for MMIO */ | ||
84 | @@ -XXX,XX +XXX,XX @@ static bool mmu_lookup1(CPUArchState *env, MMULookupPageData *data, | ||
85 | target_ulong addr = data->addr; | ||
86 | uintptr_t index = tlb_index(env, mmu_idx, addr); | ||
87 | CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr); | ||
88 | - target_ulong tlb_addr = tlb_read_idx(entry, access_type); | ||
89 | + uint64_t tlb_addr = tlb_read_idx(entry, access_type); | ||
90 | bool maybe_resized = false; | ||
91 | |||
92 | /* If the TLB entry is for a different page, reload and try again. */ | ||
36 | -- | 93 | -- |
37 | 2.17.1 | 94 | 2.34.1 |
38 | |||
39 | diff view generated by jsdifflib |
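
The dup2_vec sequence above is easiest to verify on paper, so here is a host-independent simulation of the same three steps on 32-bit lanes (plain C stand-ins for VMRGHW and VSLDOI-by-8, written only to check the lane algebra, not the actual backend emitter):

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Lanes in big-endian element order, matching the mnemonic comments. */
    typedef struct { uint32_t w[4]; } V128;

    static V128 vmrghw(V128 a, V128 b)   /* interleave the high word pairs */
    {
        return (V128){ { a.w[0], b.w[0], a.w[1], b.w[1] } };
    }

    static V128 vsldoi8(V128 a, V128 b)  /* take 16 bytes of (a:b) from byte 8 */
    {
        return (V128){ { a.w[2], a.w[3], b.w[0], b.w[1] } };
    }

    int main(void)
    {
        enum { X = 0xdead, L = 1, H = 2 };
        V128 a1 = { { X, L, X, X } };    /* a1 = xLxx */
        V128 a2 = { { X, H, X, X } };    /* a2 = xHxx */

        V128 a0  = vmrghw(a2, a1);       /* a0 = xxHL */
        V128 tmp = vsldoi8(a0, a0);      /* tmp = HLxx */
        a0 = vsldoi8(a0, tmp);           /* a0 = HLHL */

        V128 want = { { H, L, H, L } };
        assert(memcmp(&a0, &want, sizeof(a0)) == 0);
        return 0;
    }
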
1 | Add various bits and pieces related mostly to load and store | 1 | From: Anton Johansson <anjo@rev.ng>
---|---|---|---|
2 | operations. In that context, Altivec logic, compare, and splat | 2 |
3 | instructions are used, so support for emitting them is also | 3 | Functions accessing MMULookupPageData are also updated.
4 | included in this patch. | 4 |
5 | 2 | ||
3 | Functions accessing MMULookupPageData are also updated. | ||
4 | |||
5 | Signed-off-by: Anton Johansson <anjo@rev.ng> | ||
6 | Reviewed-by: Richard Henderson <richard.henderson@linaro.org> | ||
7 | Message-Id: <20230621135633.1649-6-anjo@rev.ng> | ||
6 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 8 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> |
7 | Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com> | ||
8 | --- | 9 | --- |
9 | tcg/ppc/tcg-target.h | 6 +- | 10 | accel/tcg/cputlb.c | 30 +++++++++++++++--------------- |
10 | tcg/ppc/tcg-target.inc.c | 472 ++++++++++++++++++++++++++++++++++++--- | 11 | 1 file changed, 15 insertions(+), 15 deletions(-) |
11 | 2 files changed, 442 insertions(+), 36 deletions(-) | ||
12 | 12 | ||
13 | diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h | 13 | diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c |
14 | index XXXXXXX..XXXXXXX 100644 | 14 | index XXXXXXX..XXXXXXX 100644 |
15 | --- a/tcg/ppc/tcg-target.h | 15 | --- a/accel/tcg/cputlb.c |
16 | +++ b/tcg/ppc/tcg-target.h | 16 | +++ b/accel/tcg/cputlb.c |
17 | @@ -XXX,XX +XXX,XX @@ extern bool have_altivec; | 17 | @@ -XXX,XX +XXX,XX @@ bool tlb_plugin_lookup(CPUState *cpu, vaddr addr, int mmu_idx, |
18 | #define TCG_TARGET_HAS_v128 have_altivec | 18 | typedef struct MMULookupPageData { |
19 | #define TCG_TARGET_HAS_v256 0 | 19 | CPUTLBEntryFull *full; |
20 | 20 | void *haddr; | |
21 | -#define TCG_TARGET_HAS_andc_vec 0 | 21 | - target_ulong addr; |
22 | +#define TCG_TARGET_HAS_andc_vec 1 | 22 | + vaddr addr; |
23 | #define TCG_TARGET_HAS_orc_vec 0 | 23 | int flags; |
24 | -#define TCG_TARGET_HAS_not_vec 0 | 24 | int size; |
25 | +#define TCG_TARGET_HAS_not_vec 1 | 25 | } MMULookupPageData; |
26 | #define TCG_TARGET_HAS_neg_vec 0 | 26 | @@ -XXX,XX +XXX,XX @@ typedef struct MMULookupLocals { |
27 | #define TCG_TARGET_HAS_abs_vec 0 | 27 | static bool mmu_lookup1(CPUArchState *env, MMULookupPageData *data, |
28 | #define TCG_TARGET_HAS_shi_vec 0 | 28 | int mmu_idx, MMUAccessType access_type, uintptr_t ra) |
29 | #define TCG_TARGET_HAS_shs_vec 0 | ||
30 | #define TCG_TARGET_HAS_shv_vec 0 | ||
31 | -#define TCG_TARGET_HAS_cmp_vec 0 | ||
32 | +#define TCG_TARGET_HAS_cmp_vec 1 | ||
33 | #define TCG_TARGET_HAS_mul_vec 0 | ||
34 | #define TCG_TARGET_HAS_sat_vec 0 | ||
35 | #define TCG_TARGET_HAS_minmax_vec 0 | ||
36 | diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c | ||
37 | index XXXXXXX..XXXXXXX 100644 | ||
38 | --- a/tcg/ppc/tcg-target.inc.c | ||
39 | +++ b/tcg/ppc/tcg-target.inc.c | ||
40 | @@ -XXX,XX +XXX,XX @@ static const char *target_parse_constraint(TCGArgConstraint *ct, | ||
41 | ct->ct |= TCG_CT_REG; | ||
42 | ct->u.regs = 0xffffffff; | ||
43 | break; | ||
44 | + case 'v': | ||
45 | + ct->ct |= TCG_CT_REG; | ||
46 | + ct->u.regs = 0xffffffff00000000ull; | ||
47 | + break; | ||
48 | case 'L': /* qemu_ld constraint */ | ||
49 | ct->ct |= TCG_CT_REG; | ||
50 | ct->u.regs = 0xffffffff; | ||
51 | @@ -XXX,XX +XXX,XX @@ static int tcg_target_const_match(tcg_target_long val, TCGType type, | ||
52 | |||
53 | #define NOP ORI /* ori 0,0,0 */ | ||
54 | |||
55 | +#define LVX XO31(103) | ||
56 | +#define LVEBX XO31(7) | ||
57 | +#define LVEHX XO31(39) | ||
58 | +#define LVEWX XO31(71) | ||
59 | + | ||
60 | +#define STVX XO31(231) | ||
61 | +#define STVEWX XO31(199) | ||
62 | + | ||
63 | +#define VCMPEQUB VX4(6) | ||
64 | +#define VCMPEQUH VX4(70) | ||
65 | +#define VCMPEQUW VX4(134) | ||
66 | +#define VCMPGTSB VX4(774) | ||
67 | +#define VCMPGTSH VX4(838) | ||
68 | +#define VCMPGTSW VX4(902) | ||
69 | +#define VCMPGTUB VX4(518) | ||
70 | +#define VCMPGTUH VX4(582) | ||
71 | +#define VCMPGTUW VX4(646) | ||
72 | + | ||
73 | +#define VAND VX4(1028) | ||
74 | +#define VANDC VX4(1092) | ||
75 | +#define VNOR VX4(1284) | ||
76 | +#define VOR VX4(1156) | ||
77 | +#define VXOR VX4(1220) | ||
78 | + | ||
79 | +#define VSPLTB VX4(524) | ||
80 | +#define VSPLTH VX4(588) | ||
81 | +#define VSPLTW VX4(652) | ||
82 | +#define VSPLTISB VX4(780) | ||
83 | +#define VSPLTISH VX4(844) | ||
84 | +#define VSPLTISW VX4(908) | ||
85 | + | ||
86 | +#define VSLDOI VX4(44) | ||
87 | + | ||
88 | #define RT(r) ((r)<<21) | ||
89 | #define RS(r) ((r)<<21) | ||
90 | #define RA(r) ((r)<<16) | ||
91 | @@ -XXX,XX +XXX,XX @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type, | ||
92 | intptr_t value, intptr_t addend) | ||
93 | { | 29 | { |
94 | tcg_insn_unit *target; | 30 | - target_ulong addr = data->addr; |
95 | + int16_t lo; | 31 | + vaddr addr = data->addr; |
96 | + int32_t hi; | 32 | uintptr_t index = tlb_index(env, mmu_idx, addr); |
97 | 33 | CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr); | |
98 | value += addend; | 34 | uint64_t tlb_addr = tlb_read_idx(entry, access_type); |
99 | target = (tcg_insn_unit *)value; | 35 | @@ -XXX,XX +XXX,XX @@ static void mmu_watch_or_dirty(CPUArchState *env, MMULookupPageData *data, |
100 | @@ -XXX,XX +XXX,XX @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type, | 36 | MMUAccessType access_type, uintptr_t ra) |
101 | } | ||
102 | *code_ptr = (*code_ptr & ~0xfffc) | (value & 0xfffc); | ||
103 | break; | ||
104 | + case R_PPC_ADDR32: | ||
105 | + /* | ||
106 | + * We are abusing this relocation type. Again, this points to | ||
107 | + * a pair of insns, lis + load. This is an absolute address | ||
108 | + * relocation for PPC32 so the lis cannot be removed. | ||
109 | + */ | ||
110 | + lo = value; | ||
111 | + hi = value - lo; | ||
112 | + if (hi + lo != value) { | ||
113 | + return false; | ||
114 | + } | ||
115 | + code_ptr[0] = deposit32(code_ptr[0], 0, 16, hi >> 16); | ||
116 | + code_ptr[1] = deposit32(code_ptr[1], 0, 16, lo); | ||
117 | + break; | ||
118 | default: | ||
119 | g_assert_not_reached(); | ||
120 | } | ||
121 | @@ -XXX,XX +XXX,XX @@ static void tcg_out_mem_long(TCGContext *s, int opi, int opx, TCGReg rt, | ||
122 | |||
123 | static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg) | ||
124 | { | 37 | { |
125 | - tcg_debug_assert(TCG_TARGET_REG_BITS == 64 || type == TCG_TYPE_I32); | 38 | CPUTLBEntryFull *full = data->full; |
126 | - if (ret != arg) { | 39 | - target_ulong addr = data->addr; |
127 | - tcg_out32(s, OR | SAB(arg, ret, arg)); | 40 | + vaddr addr = data->addr; |
128 | + if (ret == arg) { | 41 | int flags = data->flags; |
129 | + return true; | 42 | int size = data->size; |
130 | + } | 43 | |
131 | + switch (type) { | 44 | @@ -XXX,XX +XXX,XX @@ static void mmu_watch_or_dirty(CPUArchState *env, MMULookupPageData *data, |
132 | + case TCG_TYPE_I64: | 45 | * Resolve the translation for the page(s) beginning at @addr, for MemOp.size |
133 | + tcg_debug_assert(TCG_TARGET_REG_BITS == 64); | 46 | * bytes. Return true if the lookup crosses a page boundary. |
134 | + /* fallthru */ | 47 | */ |
135 | + case TCG_TYPE_I32: | 48 | -static bool mmu_lookup(CPUArchState *env, target_ulong addr, MemOpIdx oi, |
136 | + if (ret < TCG_REG_V0 && arg < TCG_REG_V0) { | 49 | +static bool mmu_lookup(CPUArchState *env, vaddr addr, MemOpIdx oi, |
137 | + tcg_out32(s, OR | SAB(arg, ret, arg)); | 50 | uintptr_t ra, MMUAccessType type, MMULookupLocals *l) |
138 | + break; | 51 | { |
139 | + } else if (ret < TCG_REG_V0 || arg < TCG_REG_V0) { | 52 | unsigned a_bits; |
140 | + /* Altivec does not support vector/integer moves. */ | 53 | @@ -XXX,XX +XXX,XX @@ static uint64_t do_ld_mmio_beN(CPUArchState *env, MMULookupPageData *p, |
141 | + return false; | 54 | MMUAccessType type, uintptr_t ra) |
142 | + } | 55 | { |
143 | + /* fallthru */ | 56 | CPUTLBEntryFull *full = p->full; |
144 | + case TCG_TYPE_V64: | 57 | - target_ulong addr = p->addr; |
145 | + case TCG_TYPE_V128: | 58 | + vaddr addr = p->addr; |
146 | + tcg_debug_assert(ret >= TCG_REG_V0 && arg >= TCG_REG_V0); | 59 | int i, size = p->size; |
147 | + tcg_out32(s, VOR | VRT(ret) | VRA(arg) | VRB(arg)); | 60 | |
148 | + break; | 61 | QEMU_IOTHREAD_LOCK_GUARD(); |
149 | + default: | 62 | @@ -XXX,XX +XXX,XX @@ static uint64_t do_ld_8(CPUArchState *env, MMULookupPageData *p, int mmu_idx, |
150 | + g_assert_not_reached(); | 63 | return ret; |
151 | } | ||
152 | return true; | ||
153 | } | 64 | } |
154 | @@ -XXX,XX +XXX,XX @@ static void tcg_out_movi_int(TCGContext *s, TCGType type, TCGReg ret, | 65 | |
155 | static void tcg_out_dupi_vec(TCGContext *s, TCGType type, TCGReg ret, | 66 | -static uint8_t do_ld1_mmu(CPUArchState *env, target_ulong addr, MemOpIdx oi, |
156 | tcg_target_long val) | 67 | +static uint8_t do_ld1_mmu(CPUArchState *env, vaddr addr, MemOpIdx oi, |
68 | uintptr_t ra, MMUAccessType access_type) | ||
157 | { | 69 | { |
158 | - g_assert_not_reached(); | 70 | MMULookupLocals l; |
159 | + uint32_t load_insn; | 71 | @@ -XXX,XX +XXX,XX @@ tcg_target_ulong helper_ldub_mmu(CPUArchState *env, uint64_t addr, |
160 | + int rel, low; | 72 | return do_ld1_mmu(env, addr, oi, retaddr, MMU_DATA_LOAD); |
161 | + intptr_t add; | ||
162 | + | ||
163 | + low = (int8_t)val; | ||
164 | + if (low >= -16 && low < 16) { | ||
165 | + if (val == (tcg_target_long)dup_const(MO_8, low)) { | ||
166 | + tcg_out32(s, VSPLTISB | VRT(ret) | ((val & 31) << 16)); | ||
167 | + return; | ||
168 | + } | ||
169 | + if (val == (tcg_target_long)dup_const(MO_16, low)) { | ||
170 | + tcg_out32(s, VSPLTISH | VRT(ret) | ((val & 31) << 16)); | ||
171 | + return; | ||
172 | + } | ||
173 | + if (val == (tcg_target_long)dup_const(MO_32, low)) { | ||
174 | + tcg_out32(s, VSPLTISW | VRT(ret) | ((val & 31) << 16)); | ||
175 | + return; | ||
176 | + } | ||
177 | + } | ||
178 | + | ||
179 | + /* | ||
180 | + * Otherwise we must load the value from the constant pool. | ||
181 | + */ | ||
182 | + if (USE_REG_TB) { | ||
183 | + rel = R_PPC_ADDR16; | ||
184 | + add = -(intptr_t)s->code_gen_ptr; | ||
185 | + } else { | ||
186 | + rel = R_PPC_ADDR32; | ||
187 | + add = 0; | ||
188 | + } | ||
189 | + | ||
190 | + load_insn = LVX | VRT(ret) | RB(TCG_REG_TMP1); | ||
191 | + if (TCG_TARGET_REG_BITS == 64) { | ||
192 | + new_pool_l2(s, rel, s->code_ptr, add, val, val); | ||
193 | + } else { | ||
194 | + new_pool_l4(s, rel, s->code_ptr, add, val, val, val, val); | ||
195 | + } | ||
196 | + | ||
197 | + if (USE_REG_TB) { | ||
198 | + tcg_out32(s, ADDI | TAI(TCG_REG_TMP1, 0, 0)); | ||
199 | + load_insn |= RA(TCG_REG_TB); | ||
200 | + } else { | ||
201 | + tcg_out32(s, ADDIS | TAI(TCG_REG_TMP1, 0, 0)); | ||
202 | + tcg_out32(s, ADDI | TAI(TCG_REG_TMP1, TCG_REG_TMP1, 0)); | ||
203 | + } | ||
204 | + tcg_out32(s, load_insn); | ||
205 | } | 73 | } |
206 | 74 | ||
207 | static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg ret, | 75 | -static uint16_t do_ld2_mmu(CPUArchState *env, target_ulong addr, MemOpIdx oi, |
208 | @@ -XXX,XX +XXX,XX @@ static void tcg_out_mem_long(TCGContext *s, int opi, int opx, TCGReg rt, | 76 | +static uint16_t do_ld2_mmu(CPUArchState *env, vaddr addr, MemOpIdx oi, |
209 | align = 3; | 77 | uintptr_t ra, MMUAccessType access_type) |
210 | /* FALLTHRU */ | 78 | { |
211 | default: | 79 | MMULookupLocals l; |
212 | - if (rt != TCG_REG_R0) { | 80 | @@ -XXX,XX +XXX,XX @@ tcg_target_ulong helper_lduw_mmu(CPUArchState *env, uint64_t addr, |
213 | + if (rt > TCG_REG_R0 && rt < TCG_REG_V0) { | 81 | return do_ld2_mmu(env, addr, oi, retaddr, MMU_DATA_LOAD); |
214 | rs = rt; | ||
215 | break; | ||
216 | } | ||
217 | @@ -XXX,XX +XXX,XX @@ static void tcg_out_mem_long(TCGContext *s, int opi, int opx, TCGReg rt, | ||
218 | } | ||
219 | |||
220 | /* For unaligned, or very large offsets, use the indexed form. */ | ||
221 | - if (offset & align || offset != (int32_t)offset) { | ||
222 | + if (offset & align || offset != (int32_t)offset || opi == 0) { | ||
223 | if (rs == base) { | ||
224 | rs = TCG_REG_R0; | ||
225 | } | ||
226 | tcg_debug_assert(!is_store || rs != rt); | ||
227 | tcg_out_movi(s, TCG_TYPE_PTR, rs, orig); | ||
228 | - tcg_out32(s, opx | TAB(rt, base, rs)); | ||
229 | + tcg_out32(s, opx | TAB(rt & 31, base, rs)); | ||
230 | return; | ||
231 | } | ||
232 | |||
233 | @@ -XXX,XX +XXX,XX @@ static void tcg_out_mem_long(TCGContext *s, int opi, int opx, TCGReg rt, | ||
234 | base = rs; | ||
235 | } | ||
236 | if (opi != ADDI || base != rt || l0 != 0) { | ||
237 | - tcg_out32(s, opi | TAI(rt, base, l0)); | ||
238 | + tcg_out32(s, opi | TAI(rt & 31, base, l0)); | ||
239 | } | ||
240 | } | 82 | } |
241 | 83 | ||
242 | -static inline void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret, | 84 | -static uint32_t do_ld4_mmu(CPUArchState *env, target_ulong addr, MemOpIdx oi, |
243 | - TCGReg arg1, intptr_t arg2) | 85 | +static uint32_t do_ld4_mmu(CPUArchState *env, vaddr addr, MemOpIdx oi, |
244 | +static void tcg_out_vsldoi(TCGContext *s, TCGReg ret, | 86 | uintptr_t ra, MMUAccessType access_type) |
245 | + TCGReg va, TCGReg vb, int shb) | ||
246 | { | 87 | { |
247 | - int opi, opx; | 88 | MMULookupLocals l; |
248 | - | 89 | @@ -XXX,XX +XXX,XX @@ tcg_target_ulong helper_ldul_mmu(CPUArchState *env, uint64_t addr, |
249 | - tcg_debug_assert(TCG_TARGET_REG_BITS == 64 || type == TCG_TYPE_I32); | 90 | return do_ld4_mmu(env, addr, oi, retaddr, MMU_DATA_LOAD); |
250 | - if (type == TCG_TYPE_I32) { | ||
251 | - opi = LWZ, opx = LWZX; | ||
252 | - } else { | ||
253 | - opi = LD, opx = LDX; | ||
254 | - } | ||
255 | - tcg_out_mem_long(s, opi, opx, ret, arg1, arg2); | ||
256 | + tcg_out32(s, VSLDOI | VRT(ret) | VRA(va) | VRB(vb) | (shb << 6)); | ||
257 | } | 91 | } |
258 | 92 | ||
259 | -static inline void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg, | 93 | -static uint64_t do_ld8_mmu(CPUArchState *env, target_ulong addr, MemOpIdx oi, |
260 | - TCGReg arg1, intptr_t arg2) | 94 | +static uint64_t do_ld8_mmu(CPUArchState *env, vaddr addr, MemOpIdx oi, |
261 | +static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret, | 95 | uintptr_t ra, MMUAccessType access_type) |
262 | + TCGReg base, intptr_t offset) | ||
263 | { | 96 | { |
264 | - int opi, opx; | 97 | MMULookupLocals l; |
265 | + int shift; | 98 | @@ -XXX,XX +XXX,XX @@ tcg_target_ulong helper_ldsl_mmu(CPUArchState *env, uint64_t addr, |
266 | 99 | return (int32_t)helper_ldul_mmu(env, addr, oi, retaddr); | |
267 | - tcg_debug_assert(TCG_TARGET_REG_BITS == 64 || type == TCG_TYPE_I32); | ||
268 | - if (type == TCG_TYPE_I32) { | ||
269 | - opi = STW, opx = STWX; | ||
270 | - } else { | ||
271 | - opi = STD, opx = STDX; | ||
272 | + switch (type) { | ||
273 | + case TCG_TYPE_I32: | ||
274 | + if (ret < TCG_REG_V0) { | ||
275 | + tcg_out_mem_long(s, LWZ, LWZX, ret, base, offset); | ||
276 | + break; | ||
277 | + } | ||
278 | + tcg_debug_assert((offset & 3) == 0); | ||
279 | + tcg_out_mem_long(s, 0, LVEWX, ret, base, offset); | ||
280 | + shift = (offset - 4) & 0xc; | ||
281 | + if (shift) { | ||
282 | + tcg_out_vsldoi(s, ret, ret, ret, shift); | ||
283 | + } | ||
284 | + break; | ||
285 | + case TCG_TYPE_I64: | ||
286 | + if (ret < TCG_REG_V0) { | ||
287 | + tcg_debug_assert(TCG_TARGET_REG_BITS == 64); | ||
288 | + tcg_out_mem_long(s, LD, LDX, ret, base, offset); | ||
289 | + break; | ||
290 | + } | ||
291 | + /* fallthru */ | ||
292 | + case TCG_TYPE_V64: | ||
293 | + tcg_debug_assert(ret >= TCG_REG_V0); | ||
294 | + tcg_debug_assert((offset & 7) == 0); | ||
295 | + tcg_out_mem_long(s, 0, LVX, ret, base, offset & -16); | ||
296 | + if (offset & 8) { | ||
297 | + tcg_out_vsldoi(s, ret, ret, ret, 8); | ||
298 | + } | ||
299 | + break; | ||
300 | + case TCG_TYPE_V128: | ||
301 | + tcg_debug_assert(ret >= TCG_REG_V0); | ||
302 | + tcg_debug_assert((offset & 15) == 0); | ||
303 | + tcg_out_mem_long(s, 0, LVX, ret, base, offset); | ||
304 | + break; | ||
305 | + default: | ||
306 | + g_assert_not_reached(); | ||
307 | + } | ||
308 | +} | ||
309 | + | ||
310 | +static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg, | ||
311 | + TCGReg base, intptr_t offset) | ||
312 | +{ | ||
313 | + int shift; | ||
314 | + | ||
315 | + switch (type) { | ||
316 | + case TCG_TYPE_I32: | ||
317 | + if (arg < TCG_REG_V0) { | ||
318 | + tcg_out_mem_long(s, STW, STWX, arg, base, offset); | ||
319 | + break; | ||
320 | + } | ||
321 | + tcg_debug_assert((offset & 3) == 0); | ||
322 | + shift = (offset - 4) & 0xc; | ||
323 | + if (shift) { | ||
324 | + tcg_out_vsldoi(s, TCG_VEC_TMP1, arg, arg, shift); | ||
325 | + arg = TCG_VEC_TMP1; | ||
326 | + } | ||
327 | + tcg_out_mem_long(s, 0, STVEWX, arg, base, offset); | ||
328 | + break; | ||
329 | + case TCG_TYPE_I64: | ||
330 | + if (arg < TCG_REG_V0) { | ||
331 | + tcg_debug_assert(TCG_TARGET_REG_BITS == 64); | ||
332 | + tcg_out_mem_long(s, STD, STDX, arg, base, offset); | ||
333 | + break; | ||
334 | + } | ||
335 | + /* fallthru */ | ||
336 | + case TCG_TYPE_V64: | ||
337 | + tcg_debug_assert(arg >= TCG_REG_V0); | ||
338 | + tcg_debug_assert((offset & 7) == 0); | ||
339 | + if (offset & 8) { | ||
340 | + tcg_out_vsldoi(s, TCG_VEC_TMP1, arg, arg, 8); | ||
341 | + arg = TCG_VEC_TMP1; | ||
342 | + } | ||
343 | + tcg_out_mem_long(s, 0, STVEWX, arg, base, offset); | ||
344 | + tcg_out_mem_long(s, 0, STVEWX, arg, base, offset + 4); | ||
345 | + break; | ||
346 | + case TCG_TYPE_V128: | ||
347 | + tcg_debug_assert(arg >= TCG_REG_V0); | ||
348 | + tcg_out_mem_long(s, 0, STVX, arg, base, offset); | ||
349 | + break; | ||
350 | + default: | ||
351 | + g_assert_not_reached(); | ||
352 | } | ||
353 | - tcg_out_mem_long(s, opi, opx, arg, arg1, arg2); | ||
354 | } | 100 | } |
355 | 101 | ||
356 | static inline bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val, | 102 | -static Int128 do_ld16_mmu(CPUArchState *env, target_ulong addr, |
357 | @@ -XXX,XX +XXX,XX @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args, | 103 | +static Int128 do_ld16_mmu(CPUArchState *env, vaddr addr, |
358 | 104 | MemOpIdx oi, uintptr_t ra) | |
359 | int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece) | ||
360 | { | 105 | { |
361 | - g_assert_not_reached(); | 106 | MMULookupLocals l; |
362 | + switch (opc) { | 107 | @@ -XXX,XX +XXX,XX @@ static uint64_t do_st_mmio_leN(CPUArchState *env, MMULookupPageData *p, |
363 | + case INDEX_op_and_vec: | 108 | uint64_t val_le, int mmu_idx, uintptr_t ra) |
364 | + case INDEX_op_or_vec: | 109 | { |
365 | + case INDEX_op_xor_vec: | 110 | CPUTLBEntryFull *full = p->full; |
366 | + case INDEX_op_andc_vec: | 111 | - target_ulong addr = p->addr; |
367 | + case INDEX_op_not_vec: | 112 | + vaddr addr = p->addr; |
368 | + return 1; | 113 | int i, size = p->size; |
369 | + case INDEX_op_cmp_vec: | 114 | |
370 | + return vece <= MO_32 ? -1 : 0; | 115 | QEMU_IOTHREAD_LOCK_GUARD(); |
371 | + default: | 116 | @@ -XXX,XX +XXX,XX @@ void helper_stb_mmu(CPUArchState *env, uint64_t addr, uint32_t val, |
372 | + return 0; | 117 | do_st_1(env, &l.page[0], val, l.mmu_idx, ra); |
373 | + } | ||
374 | } | 118 | } |
375 | 119 | ||
376 | static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece, | 120 | -static void do_st2_mmu(CPUArchState *env, target_ulong addr, uint16_t val, |
377 | TCGReg dst, TCGReg src) | 121 | +static void do_st2_mmu(CPUArchState *env, vaddr addr, uint16_t val, |
122 | MemOpIdx oi, uintptr_t ra) | ||
378 | { | 123 | { |
379 | - g_assert_not_reached(); | 124 | MMULookupLocals l; |
380 | + tcg_debug_assert(dst >= TCG_REG_V0); | 125 | @@ -XXX,XX +XXX,XX @@ void helper_stw_mmu(CPUArchState *env, uint64_t addr, uint32_t val, |
381 | + tcg_debug_assert(src >= TCG_REG_V0); | 126 | do_st2_mmu(env, addr, val, oi, retaddr); |
382 | + | ||
383 | + /* | ||
384 | + * Recall we use (or emulate) VSX integer loads, so the integer is | ||
385 | + * right justified within the left (zero-index) double-word. | ||
386 | + */ | ||
387 | + switch (vece) { | ||
388 | + case MO_8: | ||
389 | + tcg_out32(s, VSPLTB | VRT(dst) | VRB(src) | (7 << 16)); | ||
390 | + break; | ||
391 | + case MO_16: | ||
392 | + tcg_out32(s, VSPLTH | VRT(dst) | VRB(src) | (3 << 16)); | ||
393 | + break; | ||
394 | + case MO_32: | ||
395 | + tcg_out32(s, VSPLTW | VRT(dst) | VRB(src) | (1 << 16)); | ||
396 | + break; | ||
397 | + case MO_64: | ||
398 | + tcg_out_vsldoi(s, TCG_VEC_TMP1, src, src, 8); | ||
399 | + tcg_out_vsldoi(s, dst, TCG_VEC_TMP1, src, 8); | ||
400 | + break; | ||
401 | + default: | ||
402 | + g_assert_not_reached(); | ||
403 | + } | ||
404 | + return true; | ||
405 | } | 127 | } |
406 | 128 | ||
407 | static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece, | 129 | -static void do_st4_mmu(CPUArchState *env, target_ulong addr, uint32_t val, |
408 | TCGReg out, TCGReg base, intptr_t offset) | 130 | +static void do_st4_mmu(CPUArchState *env, vaddr addr, uint32_t val, |
131 | MemOpIdx oi, uintptr_t ra) | ||
409 | { | 132 | { |
410 | - g_assert_not_reached(); | 133 | MMULookupLocals l; |
411 | + int elt; | 134 | @@ -XXX,XX +XXX,XX @@ void helper_stl_mmu(CPUArchState *env, uint64_t addr, uint32_t val, |
412 | + | 135 | do_st4_mmu(env, addr, val, oi, retaddr); |
413 | + tcg_debug_assert(out >= TCG_REG_V0); | ||
414 | + switch (vece) { | ||
415 | + case MO_8: | ||
416 | + tcg_out_mem_long(s, 0, LVEBX, out, base, offset); | ||
417 | + elt = extract32(offset, 0, 4); | ||
418 | +#ifndef HOST_WORDS_BIGENDIAN | ||
419 | + elt ^= 15; | ||
420 | +#endif | ||
421 | + tcg_out32(s, VSPLTB | VRT(out) | VRB(out) | (elt << 16)); | ||
422 | + break; | ||
423 | + case MO_16: | ||
424 | + tcg_debug_assert((offset & 1) == 0); | ||
425 | + tcg_out_mem_long(s, 0, LVEHX, out, base, offset); | ||
426 | + elt = extract32(offset, 1, 3); | ||
427 | +#ifndef HOST_WORDS_BIGENDIAN | ||
428 | + elt ^= 7; | ||
429 | +#endif | ||
430 | + tcg_out32(s, VSPLTH | VRT(out) | VRB(out) | (elt << 16)); | ||
431 | + break; | ||
432 | + case MO_32: | ||
433 | + tcg_debug_assert((offset & 3) == 0); | ||
434 | + tcg_out_mem_long(s, 0, LVEWX, out, base, offset); | ||
435 | + elt = extract32(offset, 2, 2); | ||
436 | +#ifndef HOST_WORDS_BIGENDIAN | ||
437 | + elt ^= 3; | ||
438 | +#endif | ||
439 | + tcg_out32(s, VSPLTW | VRT(out) | VRB(out) | (elt << 16)); | ||
440 | + break; | ||
441 | + case MO_64: | ||
442 | + tcg_debug_assert((offset & 7) == 0); | ||
443 | + tcg_out_mem_long(s, 0, LVX, out, base, offset & -16); | ||
444 | + tcg_out_vsldoi(s, TCG_VEC_TMP1, out, out, 8); | ||
445 | + elt = extract32(offset, 3, 1); | ||
446 | +#ifndef HOST_WORDS_BIGENDIAN | ||
447 | + elt = !elt; | ||
448 | +#endif | ||
449 | + if (elt) { | ||
450 | + tcg_out_vsldoi(s, out, out, TCG_VEC_TMP1, 8); | ||
451 | + } else { | ||
452 | + tcg_out_vsldoi(s, out, TCG_VEC_TMP1, out, 8); | ||
453 | + } | ||
454 | + break; | ||
455 | + default: | ||
456 | + g_assert_not_reached(); | ||
457 | + } | ||
458 | + return true; | ||
459 | } | 136 | } |
460 | 137 | ||
461 | static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, | 138 | -static void do_st8_mmu(CPUArchState *env, target_ulong addr, uint64_t val, |
462 | unsigned vecl, unsigned vece, | 139 | +static void do_st8_mmu(CPUArchState *env, vaddr addr, uint64_t val, |
463 | const TCGArg *args, const int *const_args) | 140 | MemOpIdx oi, uintptr_t ra) |
464 | { | 141 | { |
465 | - g_assert_not_reached(); | 142 | MMULookupLocals l; |
466 | + static const uint32_t | 143 | @@ -XXX,XX +XXX,XX @@ void helper_stq_mmu(CPUArchState *env, uint64_t addr, uint64_t val, |
467 | + eq_op[4] = { VCMPEQUB, VCMPEQUH, VCMPEQUW, 0 }, | 144 | do_st8_mmu(env, addr, val, oi, retaddr); |
468 | + gts_op[4] = { VCMPGTSB, VCMPGTSH, VCMPGTSW, 0 }, | ||
469 | + gtu_op[4] = { VCMPGTUB, VCMPGTUH, VCMPGTUW, 0 }; | ||
470 | + | ||
471 | + TCGType type = vecl + TCG_TYPE_V64; | ||
472 | + TCGArg a0 = args[0], a1 = args[1], a2 = args[2]; | ||
473 | + uint32_t insn; | ||
474 | + | ||
475 | + switch (opc) { | ||
476 | + case INDEX_op_ld_vec: | ||
477 | + tcg_out_ld(s, type, a0, a1, a2); | ||
478 | + return; | ||
479 | + case INDEX_op_st_vec: | ||
480 | + tcg_out_st(s, type, a0, a1, a2); | ||
481 | + return; | ||
482 | + case INDEX_op_dupm_vec: | ||
483 | + tcg_out_dupm_vec(s, type, vece, a0, a1, a2); | ||
484 | + return; | ||
485 | + | ||
486 | + case INDEX_op_and_vec: | ||
487 | + insn = VAND; | ||
488 | + break; | ||
489 | + case INDEX_op_or_vec: | ||
490 | + insn = VOR; | ||
491 | + break; | ||
492 | + case INDEX_op_xor_vec: | ||
493 | + insn = VXOR; | ||
494 | + break; | ||
495 | + case INDEX_op_andc_vec: | ||
496 | + insn = VANDC; | ||
497 | + break; | ||
498 | + case INDEX_op_not_vec: | ||
499 | + insn = VNOR; | ||
500 | + a2 = a1; | ||
501 | + break; | ||
502 | + | ||
503 | + case INDEX_op_cmp_vec: | ||
504 | + switch (args[3]) { | ||
505 | + case TCG_COND_EQ: | ||
506 | + insn = eq_op[vece]; | ||
507 | + break; | ||
508 | + case TCG_COND_GT: | ||
509 | + insn = gts_op[vece]; | ||
510 | + break; | ||
511 | + case TCG_COND_GTU: | ||
512 | + insn = gtu_op[vece]; | ||
513 | + break; | ||
514 | + default: | ||
515 | + g_assert_not_reached(); | ||
516 | + } | ||
517 | + break; | ||
518 | + | ||
519 | + case INDEX_op_mov_vec: /* Always emitted via tcg_out_mov. */ | ||
520 | + case INDEX_op_dupi_vec: /* Always emitted via tcg_out_movi. */ | ||
521 | + case INDEX_op_dup_vec: /* Always emitted via tcg_out_dup_vec. */ | ||
522 | + default: | ||
523 | + g_assert_not_reached(); | ||
524 | + } | ||
525 | + | ||
526 | + tcg_debug_assert(insn != 0); | ||
527 | + tcg_out32(s, insn | VRT(a0) | VRA(a1) | VRB(a2)); | ||
528 | +} | ||
529 | + | ||
530 | +static void expand_vec_cmp(TCGType type, unsigned vece, TCGv_vec v0, | ||
531 | + TCGv_vec v1, TCGv_vec v2, TCGCond cond) | ||
532 | +{ | ||
533 | + bool need_swap = false, need_inv = false; | ||
534 | + | ||
535 | + tcg_debug_assert(vece <= MO_32); | ||
536 | + | ||
537 | + switch (cond) { | ||
538 | + case TCG_COND_EQ: | ||
539 | + case TCG_COND_GT: | ||
540 | + case TCG_COND_GTU: | ||
541 | + break; | ||
542 | + case TCG_COND_NE: | ||
543 | + case TCG_COND_LE: | ||
544 | + case TCG_COND_LEU: | ||
545 | + need_inv = true; | ||
546 | + break; | ||
547 | + case TCG_COND_LT: | ||
548 | + case TCG_COND_LTU: | ||
549 | + need_swap = true; | ||
550 | + break; | ||
551 | + case TCG_COND_GE: | ||
552 | + case TCG_COND_GEU: | ||
553 | + need_swap = need_inv = true; | ||
554 | + break; | ||
555 | + default: | ||
556 | + g_assert_not_reached(); | ||
557 | + } | ||
558 | + | ||
559 | + if (need_inv) { | ||
560 | + cond = tcg_invert_cond(cond); | ||
561 | + } | ||
562 | + if (need_swap) { | ||
563 | + TCGv_vec t1; | ||
564 | + t1 = v1, v1 = v2, v2 = t1; | ||
565 | + cond = tcg_swap_cond(cond); | ||
566 | + } | ||
567 | + | ||
568 | + vec_gen_4(INDEX_op_cmp_vec, type, vece, tcgv_vec_arg(v0), | ||
569 | + tcgv_vec_arg(v1), tcgv_vec_arg(v2), cond); | ||
570 | + | ||
571 | + if (need_inv) { | ||
572 | + tcg_gen_not_vec(vece, v0, v0); | ||
573 | + } | ||
574 | } | 145 | } |
575 | 146 | ||
576 | void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece, | 147 | -static void do_st16_mmu(CPUArchState *env, target_ulong addr, Int128 val, |
577 | TCGArg a0, ...) | 148 | +static void do_st16_mmu(CPUArchState *env, vaddr addr, Int128 val, |
149 | MemOpIdx oi, uintptr_t ra) | ||
578 | { | 150 | { |
579 | - g_assert_not_reached(); | 151 | MMULookupLocals l; |
580 | + va_list va; | ||
581 | + TCGv_vec v0, v1, v2; | ||
582 | + | ||
583 | + va_start(va, a0); | ||
584 | + v0 = temp_tcgv_vec(arg_temp(a0)); | ||
585 | + v1 = temp_tcgv_vec(arg_temp(va_arg(va, TCGArg))); | ||
586 | + v2 = temp_tcgv_vec(arg_temp(va_arg(va, TCGArg))); | ||
587 | + | ||
588 | + switch (opc) { | ||
589 | + case INDEX_op_cmp_vec: | ||
590 | + expand_vec_cmp(type, vece, v0, v1, v2, va_arg(va, TCGArg)); | ||
591 | + break; | ||
592 | + default: | ||
593 | + g_assert_not_reached(); | ||
594 | + } | ||
595 | + va_end(va); | ||
596 | } | ||
597 | |||
598 | static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op) | ||
599 | @@ -XXX,XX +XXX,XX @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op) | ||
600 | = { .args_ct_str = { "r", "r", "r", "r", "rI", "rZM" } }; | ||
601 | static const TCGTargetOpDef sub2 | ||
602 | = { .args_ct_str = { "r", "r", "rI", "rZM", "r", "r" } }; | ||
603 | + static const TCGTargetOpDef v_r = { .args_ct_str = { "v", "r" } }; | ||
604 | + static const TCGTargetOpDef v_v = { .args_ct_str = { "v", "v" } }; | ||
605 | + static const TCGTargetOpDef v_v_v = { .args_ct_str = { "v", "v", "v" } }; | ||
606 | |||
607 | switch (op) { | ||
608 | case INDEX_op_goto_ptr: | ||
609 | @@ -XXX,XX +XXX,XX @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op) | ||
610 | return (TCG_TARGET_REG_BITS == 64 ? &S_S | ||
611 | : TARGET_LONG_BITS == 32 ? &S_S_S : &S_S_S_S); | ||
612 | |||
613 | + case INDEX_op_and_vec: | ||
614 | + case INDEX_op_or_vec: | ||
615 | + case INDEX_op_xor_vec: | ||
616 | + case INDEX_op_andc_vec: | ||
617 | + case INDEX_op_orc_vec: | ||
618 | + case INDEX_op_cmp_vec: | ||
619 | + return &v_v_v; | ||
620 | + case INDEX_op_not_vec: | ||
621 | + case INDEX_op_dup_vec: | ||
622 | + return &v_v; | ||
623 | + case INDEX_op_ld_vec: | ||
624 | + case INDEX_op_st_vec: | ||
625 | + case INDEX_op_dupm_vec: | ||
626 | + return &v_r; | ||
627 | + | ||
628 | default: | ||
629 | return NULL; | ||
630 | } | ||
631 | -- | 152 | -- |
632 | 2.17.1 | 153 | 2.34.1 |
633 | |||
1 | Add support for vector saturated add/subtract using Altivec | 1 | From: Anton Johansson <anjo@rev.ng> |
---|---|---|---|
2 | instructions: | ||
3 | VADDSBS, VADDSHS, VADDSWS, VADDUBS, VADDUHS, VADDUWS, and | ||
4 | VSUBSBS, VSUBSHS, VSUBSWS, VSUBUBS, VSUBUHS, VSUBUWS. | ||
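
For readers skimming the series, a minimal scalar sketch (illustration only, not part of the patch) of the per-lane behaviour that VADDSBS supplies on each of its 16 byte lanes; the unsigned and wider forms clamp analogously:

    #include <stdint.h>

    /* Signed saturating byte add, as VADDSBS performs per lane. */
    static int8_t ssadd8(int8_t a, int8_t b)
    {
        int sum = a + b;            /* widened, so no signed overflow */
        if (sum > INT8_MAX) {
            return INT8_MAX;        /* saturate high */
        }
        if (sum < INT8_MIN) {
            return INT8_MIN;        /* saturate low */
        }
        return sum;
    }
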
5 | 2 | ||
3 | Signed-off-by: Anton Johansson <anjo@rev.ng> | ||
4 | Reviewed-by: Richard Henderson <richard.henderson@linaro.org> | ||
5 | Message-Id: <20230621135633.1649-7-anjo@rev.ng> | ||
6 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 6 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> |
7 | Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com> | ||
8 | --- | 7 | --- |
9 | tcg/ppc/tcg-target.h | 2 +- | 8 | accel/tcg/cpu-exec.c | 34 +++++++++++++++++----------------- |
10 | tcg/ppc/tcg-target.inc.c | 36 ++++++++++++++++++++++++++++++++++++ | 9 | 1 file changed, 17 insertions(+), 17 deletions(-) |
11 | 2 files changed, 37 insertions(+), 1 deletion(-) | ||
12 | 10 | ||
13 | diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h | 11 | diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c |
14 | index XXXXXXX..XXXXXXX 100644 | 12 | index XXXXXXX..XXXXXXX 100644 |
15 | --- a/tcg/ppc/tcg-target.h | 13 | --- a/accel/tcg/cpu-exec.c |
16 | +++ b/tcg/ppc/tcg-target.h | 14 | +++ b/accel/tcg/cpu-exec.c |
17 | @@ -XXX,XX +XXX,XX @@ extern bool have_altivec; | 15 | @@ -XXX,XX +XXX,XX @@ uint32_t curr_cflags(CPUState *cpu) |
18 | #define TCG_TARGET_HAS_shv_vec 0 | 16 | } |
19 | #define TCG_TARGET_HAS_cmp_vec 1 | 17 | |
20 | #define TCG_TARGET_HAS_mul_vec 0 | 18 | struct tb_desc { |
21 | -#define TCG_TARGET_HAS_sat_vec 0 | 19 | - target_ulong pc; |
22 | +#define TCG_TARGET_HAS_sat_vec 1 | 20 | - target_ulong cs_base; |
23 | #define TCG_TARGET_HAS_minmax_vec 1 | 21 | + vaddr pc; |
24 | #define TCG_TARGET_HAS_bitsel_vec 0 | 22 | + uint64_t cs_base; |
25 | #define TCG_TARGET_HAS_cmpsel_vec 0 | 23 | CPUArchState *env; |
26 | diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c | 24 | tb_page_addr_t page_addr0; |
27 | index XXXXXXX..XXXXXXX 100644 | 25 | uint32_t flags; |
28 | --- a/tcg/ppc/tcg-target.inc.c | 26 | @@ -XXX,XX +XXX,XX @@ static bool tb_lookup_cmp(const void *p, const void *d) |
29 | +++ b/tcg/ppc/tcg-target.inc.c | 27 | return true; |
30 | @@ -XXX,XX +XXX,XX @@ static int tcg_target_const_match(tcg_target_long val, TCGType type, | 28 | } else { |
31 | #define STVX XO31(231) | 29 | tb_page_addr_t phys_page1; |
32 | #define STVEWX XO31(199) | 30 | - target_ulong virt_page1; |
33 | 31 | + vaddr virt_page1; | |
34 | +#define VADDSBS VX4(768) | 32 | |
35 | +#define VADDUBS VX4(512) | 33 | /* |
36 | #define VADDUBM VX4(0) | 34 | * We know that the first page matched, and an otherwise valid TB |
37 | +#define VADDSHS VX4(832) | 35 | @@ -XXX,XX +XXX,XX @@ static bool tb_lookup_cmp(const void *p, const void *d) |
38 | +#define VADDUHS VX4(576) | 36 | return false; |
39 | #define VADDUHM VX4(64) | 37 | } |
40 | +#define VADDSWS VX4(896) | 38 | |
41 | +#define VADDUWS VX4(640) | 39 | -static TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc, |
42 | #define VADDUWM VX4(128) | 40 | - target_ulong cs_base, uint32_t flags, |
43 | 41 | +static TranslationBlock *tb_htable_lookup(CPUState *cpu, vaddr pc, | |
44 | +#define VSUBSBS VX4(1792) | 42 | + uint64_t cs_base, uint32_t flags, |
45 | +#define VSUBUBS VX4(1536) | 43 | uint32_t cflags) |
46 | #define VSUBUBM VX4(1024) | 44 | { |
47 | +#define VSUBSHS VX4(1856) | 45 | tb_page_addr_t phys_pc; |
48 | +#define VSUBUHS VX4(1600) | 46 | @@ -XXX,XX +XXX,XX @@ static TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc, |
49 | #define VSUBUHM VX4(1088) | 47 | } |
50 | +#define VSUBSWS VX4(1920) | 48 | |
51 | +#define VSUBUWS VX4(1664) | 49 | /* Might cause an exception, so have a longjmp destination ready */ |
52 | #define VSUBUWM VX4(1152) | 50 | -static inline TranslationBlock *tb_lookup(CPUState *cpu, target_ulong pc, |
53 | 51 | - target_ulong cs_base, | |
54 | #define VMAXSB VX4(258) | 52 | - uint32_t flags, uint32_t cflags) |
55 | @@ -XXX,XX +XXX,XX @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece) | 53 | +static inline TranslationBlock *tb_lookup(CPUState *cpu, vaddr pc, |
56 | case INDEX_op_smin_vec: | 54 | + uint64_t cs_base, uint32_t flags, |
57 | case INDEX_op_umax_vec: | 55 | + uint32_t cflags) |
58 | case INDEX_op_umin_vec: | 56 | { |
59 | + case INDEX_op_ssadd_vec: | 57 | TranslationBlock *tb; |
60 | + case INDEX_op_sssub_vec: | 58 | CPUJumpCache *jc; |
61 | + case INDEX_op_usadd_vec: | 59 | @@ -XXX,XX +XXX,XX @@ static inline TranslationBlock *tb_lookup(CPUState *cpu, target_ulong pc, |
62 | + case INDEX_op_ussub_vec: | 60 | return tb; |
63 | return vece <= MO_32; | 61 | } |
64 | case INDEX_op_cmp_vec: | 62 | |
65 | return vece <= MO_32 ? -1 : 0; | 63 | -static void log_cpu_exec(target_ulong pc, CPUState *cpu, |
66 | @@ -XXX,XX +XXX,XX @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, | 64 | +static void log_cpu_exec(vaddr pc, CPUState *cpu, |
67 | eq_op[4] = { VCMPEQUB, VCMPEQUH, VCMPEQUW, 0 }, | 65 | const TranslationBlock *tb) |
68 | gts_op[4] = { VCMPGTSB, VCMPGTSH, VCMPGTSW, 0 }, | 66 | { |
69 | gtu_op[4] = { VCMPGTUB, VCMPGTUH, VCMPGTUW, 0 }, | 67 | if (qemu_log_in_addr_range(pc)) { |
70 | + ssadd_op[4] = { VADDSBS, VADDSHS, VADDSWS, 0 }, | 68 | qemu_log_mask(CPU_LOG_EXEC, |
71 | + usadd_op[4] = { VADDUBS, VADDUHS, VADDUWS, 0 }, | 69 | "Trace %d: %p [%08" PRIx64 |
72 | + sssub_op[4] = { VSUBSBS, VSUBSHS, VSUBSWS, 0 }, | 70 | - "/" TARGET_FMT_lx "/%08x/%08x] %s\n", |
73 | + ussub_op[4] = { VSUBUBS, VSUBUHS, VSUBUWS, 0 }, | 71 | + "/%" VADDR_PRIx "/%08x/%08x] %s\n", |
74 | umin_op[4] = { VMINUB, VMINUH, VMINUW, 0 }, | 72 | cpu->cpu_index, tb->tc.ptr, tb->cs_base, pc, |
75 | smin_op[4] = { VMINSB, VMINSH, VMINSW, 0 }, | 73 | tb->flags, tb->cflags, lookup_symbol(pc)); |
76 | umax_op[4] = { VMAXUB, VMAXUH, VMAXUW, 0 }, | 74 | |
77 | @@ -XXX,XX +XXX,XX @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, | 75 | @@ -XXX,XX +XXX,XX @@ static void log_cpu_exec(target_ulong pc, CPUState *cpu, |
78 | case INDEX_op_sub_vec: | 76 | } |
79 | insn = sub_op[vece]; | 77 | } |
80 | break; | 78 | |
81 | + case INDEX_op_ssadd_vec: | 79 | -static bool check_for_breakpoints_slow(CPUState *cpu, target_ulong pc, |
82 | + insn = ssadd_op[vece]; | 80 | +static bool check_for_breakpoints_slow(CPUState *cpu, vaddr pc, |
83 | + break; | 81 | uint32_t *cflags) |
84 | + case INDEX_op_sssub_vec: | 82 | { |
85 | + insn = sssub_op[vece]; | 83 | CPUBreakpoint *bp; |
86 | + break; | 84 | @@ -XXX,XX +XXX,XX @@ static bool check_for_breakpoints_slow(CPUState *cpu, target_ulong pc, |
87 | + case INDEX_op_usadd_vec: | 85 | return false; |
88 | + insn = usadd_op[vece]; | 86 | } |
89 | + break; | 87 | |
90 | + case INDEX_op_ussub_vec: | 88 | -static inline bool check_for_breakpoints(CPUState *cpu, target_ulong pc, |
91 | + insn = ussub_op[vece]; | 89 | +static inline bool check_for_breakpoints(CPUState *cpu, vaddr pc, |
92 | + break; | 90 | uint32_t *cflags) |
93 | case INDEX_op_smin_vec: | 91 | { |
94 | insn = smin_op[vece]; | 92 | return unlikely(!QTAILQ_EMPTY(&cpu->breakpoints)) && |
95 | break; | 93 | @@ -XXX,XX +XXX,XX @@ cpu_tb_exec(CPUState *cpu, TranslationBlock *itb, int *tb_exit) |
96 | @@ -XXX,XX +XXX,XX @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op) | 94 | cc->set_pc(cpu, last_tb->pc); |
97 | case INDEX_op_andc_vec: | 95 | } |
98 | case INDEX_op_orc_vec: | 96 | if (qemu_loglevel_mask(CPU_LOG_EXEC)) { |
99 | case INDEX_op_cmp_vec: | 97 | - target_ulong pc = log_pc(cpu, last_tb); |
100 | + case INDEX_op_ssadd_vec: | 98 | + vaddr pc = log_pc(cpu, last_tb); |
101 | + case INDEX_op_sssub_vec: | 99 | if (qemu_log_in_addr_range(pc)) { |
102 | + case INDEX_op_usadd_vec: | 100 | - qemu_log("Stopped execution of TB chain before %p [" |
103 | + case INDEX_op_ussub_vec: | 101 | - TARGET_FMT_lx "] %s\n", |
104 | case INDEX_op_smax_vec: | 102 | + qemu_log("Stopped execution of TB chain before %p [%" |
105 | case INDEX_op_smin_vec: | 103 | + VADDR_PRIx "] %s\n", |
106 | case INDEX_op_umax_vec: | 104 | last_tb->tc.ptr, pc, lookup_symbol(pc)); |
105 | } | ||
106 | } | ||
107 | @@ -XXX,XX +XXX,XX @@ static inline bool cpu_handle_interrupt(CPUState *cpu, | ||
108 | } | ||
109 | |||
110 | static inline void cpu_loop_exec_tb(CPUState *cpu, TranslationBlock *tb, | ||
111 | - target_ulong pc, | ||
112 | - TranslationBlock **last_tb, int *tb_exit) | ||
113 | + vaddr pc, TranslationBlock **last_tb, | ||
114 | + int *tb_exit) | ||
115 | { | ||
116 | int32_t insns_left; | ||
117 | |||
107 | -- | 118 | -- |
108 | 2.17.1 | 119 | 2.34.1 |
109 | |||
1 | These new instructions are conditional only on MSR.VEC and | 1 | From: Anton Johansson <anjo@rev.ng> |
---|---|---|---|
2 | are thus part of the Altivec instruction set, and not VSX. | ||
3 | This includes negation and compare not equal. | ||
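
As context for the cmp_vec changes below, a hypothetical helper (not the patch's code) showing how one 32-bit lane of a not-equal compare is produced with and without the v3.00 instructions:

    #include <stdint.h>
    #include <stdbool.h>

    /* One lane of cmp_vec(NE): v3.00 has a direct instruction,
     * otherwise the compare-equal mask is inverted afterward. */
    static uint32_t lane_cmpne32(uint32_t a, uint32_t b, bool have_isa_3_00)
    {
        if (have_isa_3_00) {
            return a != b ? -1u : 0;    /* single VCMPNEW */
        }
        return ~(a == b ? -1u : 0);     /* VCMPEQUW, then invert */
    }
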
4 | 2 | ||
5 | Reviewed-by: Aleksandar Markovic <amarkovic@wavecomp.com> | 3 | Related functions dealing with the jump cache are also updated. |
4 | |||
5 | Signed-off-by: Anton Johansson <anjo@rev.ng> | ||
6 | Reviewed-by: Richard Henderson <richard.henderson@linaro.org> | ||
7 | Message-Id: <20230621135633.1649-8-anjo@rev.ng> | ||
6 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 8 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> |
7 | --- | 9 | --- |
8 | tcg/ppc/tcg-target.h | 2 +- | 10 | accel/tcg/tb-hash.h | 12 ++++++------ |
9 | tcg/ppc/tcg-target.inc.c | 23 +++++++++++++++++++++++ | 11 | accel/tcg/tb-jmp-cache.h | 2 +- |
10 | 2 files changed, 24 insertions(+), 1 deletion(-) | 12 | accel/tcg/cputlb.c | 2 +- |
13 | 3 files changed, 8 insertions(+), 8 deletions(-) | ||
11 | 14 | ||
12 | diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h | 15 | diff --git a/accel/tcg/tb-hash.h b/accel/tcg/tb-hash.h |
13 | index XXXXXXX..XXXXXXX 100644 | 16 | index XXXXXXX..XXXXXXX 100644 |
14 | --- a/tcg/ppc/tcg-target.h | 17 | --- a/accel/tcg/tb-hash.h |
15 | +++ b/tcg/ppc/tcg-target.h | 18 | +++ b/accel/tcg/tb-hash.h |
16 | @@ -XXX,XX +XXX,XX @@ extern bool have_vsx; | 19 | @@ -XXX,XX +XXX,XX @@ |
17 | #define TCG_TARGET_HAS_andc_vec 1 | 20 | #define TB_JMP_ADDR_MASK (TB_JMP_PAGE_SIZE - 1) |
18 | #define TCG_TARGET_HAS_orc_vec have_isa_2_07 | 21 | #define TB_JMP_PAGE_MASK (TB_JMP_CACHE_SIZE - TB_JMP_PAGE_SIZE) |
19 | #define TCG_TARGET_HAS_not_vec 1 | 22 | |
20 | -#define TCG_TARGET_HAS_neg_vec 0 | 23 | -static inline unsigned int tb_jmp_cache_hash_page(target_ulong pc) |
21 | +#define TCG_TARGET_HAS_neg_vec have_isa_3_00 | 24 | +static inline unsigned int tb_jmp_cache_hash_page(vaddr pc) |
22 | #define TCG_TARGET_HAS_abs_vec 0 | 25 | { |
23 | #define TCG_TARGET_HAS_shi_vec 0 | 26 | - target_ulong tmp; |
24 | #define TCG_TARGET_HAS_shs_vec 0 | 27 | + vaddr tmp; |
25 | diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c | 28 | tmp = pc ^ (pc >> (TARGET_PAGE_BITS - TB_JMP_PAGE_BITS)); |
29 | return (tmp >> (TARGET_PAGE_BITS - TB_JMP_PAGE_BITS)) & TB_JMP_PAGE_MASK; | ||
30 | } | ||
31 | |||
32 | -static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc) | ||
33 | +static inline unsigned int tb_jmp_cache_hash_func(vaddr pc) | ||
34 | { | ||
35 | - target_ulong tmp; | ||
36 | + vaddr tmp; | ||
37 | tmp = pc ^ (pc >> (TARGET_PAGE_BITS - TB_JMP_PAGE_BITS)); | ||
38 | return (((tmp >> (TARGET_PAGE_BITS - TB_JMP_PAGE_BITS)) & TB_JMP_PAGE_MASK) | ||
39 | | (tmp & TB_JMP_ADDR_MASK)); | ||
40 | @@ -XXX,XX +XXX,XX @@ static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc) | ||
41 | #else | ||
42 | |||
43 | /* In user-mode we can get better hashing because we do not have a TLB */ | ||
44 | -static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc) | ||
45 | +static inline unsigned int tb_jmp_cache_hash_func(vaddr pc) | ||
46 | { | ||
47 | return (pc ^ (pc >> TB_JMP_CACHE_BITS)) & (TB_JMP_CACHE_SIZE - 1); | ||
48 | } | ||
49 | @@ -XXX,XX +XXX,XX @@ static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc) | ||
50 | #endif /* CONFIG_SOFTMMU */ | ||
51 | |||
52 | static inline | ||
53 | -uint32_t tb_hash_func(tb_page_addr_t phys_pc, target_ulong pc, | ||
54 | +uint32_t tb_hash_func(tb_page_addr_t phys_pc, vaddr pc, | ||
55 | uint32_t flags, uint64_t flags2, uint32_t cf_mask) | ||
56 | { | ||
57 | return qemu_xxhash8(phys_pc, pc, flags2, flags, cf_mask); | ||
58 | diff --git a/accel/tcg/tb-jmp-cache.h b/accel/tcg/tb-jmp-cache.h | ||
26 | index XXXXXXX..XXXXXXX 100644 | 59 | index XXXXXXX..XXXXXXX 100644 |
27 | --- a/tcg/ppc/tcg-target.inc.c | 60 | --- a/accel/tcg/tb-jmp-cache.h |
28 | +++ b/tcg/ppc/tcg-target.inc.c | 61 | +++ b/accel/tcg/tb-jmp-cache.h |
29 | @@ -XXX,XX +XXX,XX @@ static int tcg_target_const_match(tcg_target_long val, TCGType type, | 62 | @@ -XXX,XX +XXX,XX @@ struct CPUJumpCache { |
30 | #define VSUBUWM VX4(1152) | 63 | struct rcu_head rcu; |
31 | #define VSUBUDM VX4(1216) /* v2.07 */ | 64 | struct { |
32 | 65 | TranslationBlock *tb; | |
33 | +#define VNEGW (VX4(1538) | (6 << 16)) /* v3.00 */ | 66 | - target_ulong pc; |
34 | +#define VNEGD (VX4(1538) | (7 << 16)) /* v3.00 */ | 67 | + vaddr pc; |
35 | + | 68 | } array[TB_JMP_CACHE_SIZE]; |
36 | #define VMAXSB VX4(258) | 69 | }; |
37 | #define VMAXSH VX4(322) | 70 | |
38 | #define VMAXSW VX4(386) | 71 | diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c |
39 | @@ -XXX,XX +XXX,XX @@ static int tcg_target_const_match(tcg_target_long val, TCGType type, | 72 | index XXXXXXX..XXXXXXX 100644 |
40 | #define VCMPGTUH VX4(582) | 73 | --- a/accel/tcg/cputlb.c |
41 | #define VCMPGTUW VX4(646) | 74 | +++ b/accel/tcg/cputlb.c |
42 | #define VCMPGTUD VX4(711) /* v2.07 */ | 75 | @@ -XXX,XX +XXX,XX @@ static void tlb_window_reset(CPUTLBDesc *desc, int64_t ns, |
43 | +#define VCMPNEB VX4(7) /* v3.00 */ | 76 | desc->window_max_entries = max_entries; |
44 | +#define VCMPNEH VX4(71) /* v3.00 */ | 77 | } |
45 | +#define VCMPNEW VX4(135) /* v3.00 */ | 78 | |
46 | 79 | -static void tb_jmp_cache_clear_page(CPUState *cpu, target_ulong page_addr) | |
47 | #define VSLB VX4(260) | 80 | +static void tb_jmp_cache_clear_page(CPUState *cpu, vaddr page_addr) |
48 | #define VSLH VX4(324) | 81 | { |
49 | @@ -XXX,XX +XXX,XX @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece) | 82 | CPUJumpCache *jc = cpu->tb_jmp_cache; |
50 | case INDEX_op_shri_vec: | 83 | int i, i0; |
51 | case INDEX_op_sari_vec: | ||
52 | return vece <= MO_32 || have_isa_2_07 ? -1 : 0; | ||
53 | + case INDEX_op_neg_vec: | ||
54 | + return vece >= MO_32 && have_isa_3_00; | ||
55 | case INDEX_op_mul_vec: | ||
56 | switch (vece) { | ||
57 | case MO_8: | ||
58 | @@ -XXX,XX +XXX,XX @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, | ||
59 | static const uint32_t | ||
60 | add_op[4] = { VADDUBM, VADDUHM, VADDUWM, VADDUDM }, | ||
61 | sub_op[4] = { VSUBUBM, VSUBUHM, VSUBUWM, VSUBUDM }, | ||
62 | + neg_op[4] = { 0, 0, VNEGW, VNEGD }, | ||
63 | eq_op[4] = { VCMPEQUB, VCMPEQUH, VCMPEQUW, VCMPEQUD }, | ||
64 | + ne_op[4] = { VCMPNEB, VCMPNEH, VCMPNEW, 0 }, | ||
65 | gts_op[4] = { VCMPGTSB, VCMPGTSH, VCMPGTSW, VCMPGTSD }, | ||
66 | gtu_op[4] = { VCMPGTUB, VCMPGTUH, VCMPGTUW, VCMPGTUD }, | ||
67 | ssadd_op[4] = { VADDSBS, VADDSHS, VADDSWS, 0 }, | ||
68 | @@ -XXX,XX +XXX,XX @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, | ||
69 | case INDEX_op_sub_vec: | ||
70 | insn = sub_op[vece]; | ||
71 | break; | ||
72 | + case INDEX_op_neg_vec: | ||
73 | + insn = neg_op[vece]; | ||
74 | + a2 = a1; | ||
75 | + a1 = 0; | ||
76 | + break; | ||
77 | case INDEX_op_mul_vec: | ||
78 | tcg_debug_assert(vece == MO_32 && have_isa_2_07); | ||
79 | insn = VMULUWM; | ||
80 | @@ -XXX,XX +XXX,XX @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, | ||
81 | case TCG_COND_EQ: | ||
82 | insn = eq_op[vece]; | ||
83 | break; | ||
84 | + case TCG_COND_NE: | ||
85 | + insn = ne_op[vece]; | ||
86 | + break; | ||
87 | case TCG_COND_GT: | ||
88 | insn = gts_op[vece]; | ||
89 | break; | ||
90 | @@ -XXX,XX +XXX,XX @@ static void expand_vec_cmp(TCGType type, unsigned vece, TCGv_vec v0, | ||
91 | case TCG_COND_GTU: | ||
92 | break; | ||
93 | case TCG_COND_NE: | ||
94 | + if (have_isa_3_00 && vece <= MO_32) { | ||
95 | + break; | ||
96 | + } | ||
97 | + /* fall through */ | ||
98 | case TCG_COND_LE: | ||
99 | case TCG_COND_LEU: | ||
100 | need_inv = true; | ||
101 | @@ -XXX,XX +XXX,XX @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op) | ||
102 | case INDEX_op_dup2_vec: | ||
103 | return &v_v_v; | ||
104 | case INDEX_op_not_vec: | ||
105 | + case INDEX_op_neg_vec: | ||
106 | case INDEX_op_dup_vec: | ||
107 | return &v_v; | ||
108 | case INDEX_op_ld_vec: | ||
109 | -- | 84 | -- |
110 | 2.17.1 | 85 | 2.34.1 |
111 | |||
1 | For Altivec, vector multiply is always an expansion. | 1 | From: Anton Johansson <anjo@rev.ng> |
---|---|---|---|
2 | 2 | ||
3 | Functions for probing memory accesses (and functions that call these) | ||
4 | are updated to take a vaddr for guest virtual addresses over | ||
5 | target_ulong. | ||
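
To unpack the one-line note above that multiply is always an expansion: a scalar sketch of the net effect of the even/odd multiply, merge, and pack sequence added below for the 16-bit case (hypothetical helper, not the patch's code):

    #include <stdint.h>

    /* Net effect of VMULEUH/VMULOUH + VMRGHW/VMRGLW + VPKUWUM:
     * a lane-wise modular 16-bit multiply. */
    static void mul_u16x8(uint16_t d[8], const uint16_t a[8],
                          const uint16_t b[8])
    {
        uint32_t even[4], odd[4];

        for (int i = 0; i < 4; i++) {
            even[i] = (uint32_t)a[2 * i] * b[2 * i];        /* vmuleuh */
            odd[i] = (uint32_t)a[2 * i + 1] * b[2 * i + 1]; /* vmulouh */
        }
        /* The merges interleave the even/odd 32-bit products; the pack
         * then keeps the low 16 bits of each, restoring lane order. */
        for (int i = 0; i < 8; i++) {
            uint32_t wide = (i & 1) ? odd[i / 2] : even[i / 2];
            d[i] = (uint16_t)wide;
        }
    }
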
6 | |||
7 | Signed-off-by: Anton Johansson <anjo@rev.ng> | ||
8 | Reviewed-by: Richard Henderson <richard.henderson@linaro.org> | ||
9 | Message-Id: <20230621135633.1649-9-anjo@rev.ng> | ||
3 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 10 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> |
4 | Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com> | ||
5 | --- | 11 | --- |
6 | tcg/ppc/tcg-target.h | 2 +- | 12 | include/exec/exec-all.h | 14 +++++++------- |
7 | tcg/ppc/tcg-target.opc.h | 8 +++ | 13 | accel/stubs/tcg-stub.c | 4 ++-- |
8 | tcg/ppc/tcg-target.inc.c | 113 ++++++++++++++++++++++++++++++++++++++- | 14 | accel/tcg/cputlb.c | 12 ++++++------ |
9 | 3 files changed, 121 insertions(+), 2 deletions(-) | 15 | accel/tcg/user-exec.c | 8 ++++---- |
16 | 4 files changed, 19 insertions(+), 19 deletions(-) | ||
10 | 17 | ||
11 | diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h | 18 | diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h |
12 | index XXXXXXX..XXXXXXX 100644 | 19 | index XXXXXXX..XXXXXXX 100644 |
13 | --- a/tcg/ppc/tcg-target.h | 20 | --- a/include/exec/exec-all.h |
14 | +++ b/tcg/ppc/tcg-target.h | 21 | +++ b/include/exec/exec-all.h |
15 | @@ -XXX,XX +XXX,XX @@ extern bool have_altivec; | 22 | @@ -XXX,XX +XXX,XX @@ static inline void tlb_flush_range_by_mmuidx_all_cpus_synced(CPUState *cpu, |
16 | #define TCG_TARGET_HAS_shs_vec 0 | 23 | * Finally, return the host address for a page that is backed by RAM, |
17 | #define TCG_TARGET_HAS_shv_vec 1 | 24 | * or NULL if the page requires I/O. |
18 | #define TCG_TARGET_HAS_cmp_vec 1 | 25 | */ |
19 | -#define TCG_TARGET_HAS_mul_vec 0 | 26 | -void *probe_access(CPUArchState *env, target_ulong addr, int size, |
20 | +#define TCG_TARGET_HAS_mul_vec 1 | 27 | +void *probe_access(CPUArchState *env, vaddr addr, int size, |
21 | #define TCG_TARGET_HAS_sat_vec 1 | 28 | MMUAccessType access_type, int mmu_idx, uintptr_t retaddr); |
22 | #define TCG_TARGET_HAS_minmax_vec 1 | 29 | |
23 | #define TCG_TARGET_HAS_bitsel_vec 0 | 30 | -static inline void *probe_write(CPUArchState *env, target_ulong addr, int size, |
24 | diff --git a/tcg/ppc/tcg-target.opc.h b/tcg/ppc/tcg-target.opc.h | 31 | +static inline void *probe_write(CPUArchState *env, vaddr addr, int size, |
32 | int mmu_idx, uintptr_t retaddr) | ||
33 | { | ||
34 | return probe_access(env, addr, size, MMU_DATA_STORE, mmu_idx, retaddr); | ||
35 | } | ||
36 | |||
37 | -static inline void *probe_read(CPUArchState *env, target_ulong addr, int size, | ||
38 | +static inline void *probe_read(CPUArchState *env, vaddr addr, int size, | ||
39 | int mmu_idx, uintptr_t retaddr) | ||
40 | { | ||
41 | return probe_access(env, addr, size, MMU_DATA_LOAD, mmu_idx, retaddr); | ||
42 | @@ -XXX,XX +XXX,XX @@ static inline void *probe_read(CPUArchState *env, target_ulong addr, int size, | ||
43 | * Do handle clean pages, so exclude TLB_NOTDIRY from the returned flags. | ||
44 | * For simplicity, all "mmio-like" flags are folded to TLB_MMIO. | ||
45 | */ | ||
46 | -int probe_access_flags(CPUArchState *env, target_ulong addr, int size, | ||
47 | +int probe_access_flags(CPUArchState *env, vaddr addr, int size, | ||
48 | MMUAccessType access_type, int mmu_idx, | ||
49 | bool nonfault, void **phost, uintptr_t retaddr); | ||
50 | |||
51 | @@ -XXX,XX +XXX,XX @@ int probe_access_flags(CPUArchState *env, target_ulong addr, int size, | ||
52 | * and must be consumed or copied immediately, before any further | ||
53 | * access or changes to TLB @mmu_idx. | ||
54 | */ | ||
55 | -int probe_access_full(CPUArchState *env, target_ulong addr, int size, | ||
56 | +int probe_access_full(CPUArchState *env, vaddr addr, int size, | ||
57 | MMUAccessType access_type, int mmu_idx, | ||
58 | bool nonfault, void **phost, | ||
59 | CPUTLBEntryFull **pfull, uintptr_t retaddr); | ||
60 | @@ -XXX,XX +XXX,XX @@ struct MemoryRegionSection *iotlb_to_section(CPUState *cpu, | ||
61 | * | ||
62 | * Note: this function can trigger an exception. | ||
63 | */ | ||
64 | -tb_page_addr_t get_page_addr_code_hostp(CPUArchState *env, target_ulong addr, | ||
65 | +tb_page_addr_t get_page_addr_code_hostp(CPUArchState *env, vaddr addr, | ||
66 | void **hostp); | ||
67 | |||
68 | /** | ||
69 | @@ -XXX,XX +XXX,XX @@ tb_page_addr_t get_page_addr_code_hostp(CPUArchState *env, target_ulong addr, | ||
70 | * Note: this function can trigger an exception. | ||
71 | */ | ||
72 | static inline tb_page_addr_t get_page_addr_code(CPUArchState *env, | ||
73 | - target_ulong addr) | ||
74 | + vaddr addr) | ||
75 | { | ||
76 | return get_page_addr_code_hostp(env, addr, NULL); | ||
77 | } | ||
78 | diff --git a/accel/stubs/tcg-stub.c b/accel/stubs/tcg-stub.c | ||
25 | index XXXXXXX..XXXXXXX 100644 | 79 | index XXXXXXX..XXXXXXX 100644 |
26 | --- a/tcg/ppc/tcg-target.opc.h | 80 | --- a/accel/stubs/tcg-stub.c |
27 | +++ b/tcg/ppc/tcg-target.opc.h | 81 | +++ b/accel/stubs/tcg-stub.c |
28 | @@ -XXX,XX +XXX,XX @@ | 82 | @@ -XXX,XX +XXX,XX @@ void tcg_flush_jmp_cache(CPUState *cpu) |
29 | * emitted by tcg_expand_vec_op. For those familiar with GCC internals, | 83 | { |
30 | * consider these to be UNSPEC with names. | 84 | } |
31 | */ | 85 | |
32 | + | 86 | -int probe_access_flags(CPUArchState *env, target_ulong addr, int size, |
33 | +DEF(ppc_mrgh_vec, 1, 2, 0, IMPLVEC) | 87 | +int probe_access_flags(CPUArchState *env, vaddr addr, int size, |
34 | +DEF(ppc_mrgl_vec, 1, 2, 0, IMPLVEC) | 88 | MMUAccessType access_type, int mmu_idx, |
35 | +DEF(ppc_msum_vec, 1, 3, 0, IMPLVEC) | 89 | bool nonfault, void **phost, uintptr_t retaddr) |
36 | +DEF(ppc_muleu_vec, 1, 2, 0, IMPLVEC) | 90 | { |
37 | +DEF(ppc_mulou_vec, 1, 2, 0, IMPLVEC) | 91 | g_assert_not_reached(); |
38 | +DEF(ppc_pkum_vec, 1, 2, 0, IMPLVEC) | 92 | } |
39 | +DEF(ppc_rotl_vec, 1, 2, 0, IMPLVEC) | 93 | |
40 | diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c | 94 | -void *probe_access(CPUArchState *env, target_ulong addr, int size, |
95 | +void *probe_access(CPUArchState *env, vaddr addr, int size, | ||
96 | MMUAccessType access_type, int mmu_idx, uintptr_t retaddr) | ||
97 | { | ||
98 | /* Handled by hardware accelerator. */ | ||
99 | diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c | ||
41 | index XXXXXXX..XXXXXXX 100644 | 100 | index XXXXXXX..XXXXXXX 100644 |
42 | --- a/tcg/ppc/tcg-target.inc.c | 101 | --- a/accel/tcg/cputlb.c |
43 | +++ b/tcg/ppc/tcg-target.inc.c | 102 | +++ b/accel/tcg/cputlb.c |
44 | @@ -XXX,XX +XXX,XX @@ static int tcg_target_const_match(tcg_target_long val, TCGType type, | 103 | @@ -XXX,XX +XXX,XX @@ static void notdirty_write(CPUState *cpu, vaddr mem_vaddr, unsigned size, |
45 | #define VSRAB VX4(772) | ||
46 | #define VSRAH VX4(836) | ||
47 | #define VSRAW VX4(900) | ||
48 | +#define VRLB VX4(4) | ||
49 | +#define VRLH VX4(68) | ||
50 | +#define VRLW VX4(132) | ||
51 | + | ||
52 | +#define VMULEUB VX4(520) | ||
53 | +#define VMULEUH VX4(584) | ||
54 | +#define VMULOUB VX4(8) | ||
55 | +#define VMULOUH VX4(72) | ||
56 | +#define VMSUMUHM VX4(38) | ||
57 | + | ||
58 | +#define VMRGHB VX4(12) | ||
59 | +#define VMRGHH VX4(76) | ||
60 | +#define VMRGHW VX4(140) | ||
61 | +#define VMRGLB VX4(268) | ||
62 | +#define VMRGLH VX4(332) | ||
63 | +#define VMRGLW VX4(396) | ||
64 | + | ||
65 | +#define VPKUHUM VX4(14) | ||
66 | +#define VPKUWUM VX4(78) | ||
67 | |||
68 | #define VAND VX4(1028) | ||
69 | #define VANDC VX4(1092) | ||
70 | @@ -XXX,XX +XXX,XX @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece) | ||
71 | case INDEX_op_sarv_vec: | ||
72 | return vece <= MO_32; | ||
73 | case INDEX_op_cmp_vec: | ||
74 | + case INDEX_op_mul_vec: | ||
75 | case INDEX_op_shli_vec: | ||
76 | case INDEX_op_shri_vec: | ||
77 | case INDEX_op_sari_vec: | ||
78 | @@ -XXX,XX +XXX,XX @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, | ||
79 | smax_op[4] = { VMAXSB, VMAXSH, VMAXSW, 0 }, | ||
80 | shlv_op[4] = { VSLB, VSLH, VSLW, 0 }, | ||
81 | shrv_op[4] = { VSRB, VSRH, VSRW, 0 }, | ||
82 | - sarv_op[4] = { VSRAB, VSRAH, VSRAW, 0 }; | ||
83 | + sarv_op[4] = { VSRAB, VSRAH, VSRAW, 0 }, | ||
84 | + mrgh_op[4] = { VMRGHB, VMRGHH, VMRGHW, 0 }, | ||
85 | + mrgl_op[4] = { VMRGLB, VMRGLH, VMRGLW, 0 }, | ||
86 | + muleu_op[4] = { VMULEUB, VMULEUH, 0, 0 }, | ||
87 | + mulou_op[4] = { VMULOUB, VMULOUH, 0, 0 }, | ||
88 | + pkum_op[4] = { VPKUHUM, VPKUWUM, 0, 0 }, | ||
89 | + rotl_op[4] = { VRLB, VRLH, VRLW, 0 }; | ||
90 | |||
91 | TCGType type = vecl + TCG_TYPE_V64; | ||
92 | TCGArg a0 = args[0], a1 = args[1], a2 = args[2]; | ||
93 | @@ -XXX,XX +XXX,XX @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, | ||
94 | } | ||
95 | break; | ||
96 | |||
97 | + case INDEX_op_ppc_mrgh_vec: | ||
98 | + insn = mrgh_op[vece]; | ||
99 | + break; | ||
100 | + case INDEX_op_ppc_mrgl_vec: | ||
101 | + insn = mrgl_op[vece]; | ||
102 | + break; | ||
103 | + case INDEX_op_ppc_muleu_vec: | ||
104 | + insn = muleu_op[vece]; | ||
105 | + break; | ||
106 | + case INDEX_op_ppc_mulou_vec: | ||
107 | + insn = mulou_op[vece]; | ||
108 | + break; | ||
109 | + case INDEX_op_ppc_pkum_vec: | ||
110 | + insn = pkum_op[vece]; | ||
111 | + break; | ||
112 | + case INDEX_op_ppc_rotl_vec: | ||
113 | + insn = rotl_op[vece]; | ||
114 | + break; | ||
115 | + case INDEX_op_ppc_msum_vec: | ||
116 | + tcg_debug_assert(vece == MO_16); | ||
117 | + tcg_out32(s, VMSUMUHM | VRT(a0) | VRA(a1) | VRB(a2) | VRC(args[3])); | ||
118 | + return; | ||
119 | + | ||
120 | case INDEX_op_mov_vec: /* Always emitted via tcg_out_mov. */ | ||
121 | case INDEX_op_dupi_vec: /* Always emitted via tcg_out_movi. */ | ||
122 | case INDEX_op_dup_vec: /* Always emitted via tcg_out_dup_vec. */ | ||
123 | @@ -XXX,XX +XXX,XX @@ static void expand_vec_cmp(TCGType type, unsigned vece, TCGv_vec v0, | ||
124 | } | 104 | } |
125 | } | 105 | } |
126 | 106 | ||
127 | +static void expand_vec_mul(TCGType type, unsigned vece, TCGv_vec v0, | 107 | -static int probe_access_internal(CPUArchState *env, target_ulong addr, |
128 | + TCGv_vec v1, TCGv_vec v2) | 108 | +static int probe_access_internal(CPUArchState *env, vaddr addr, |
129 | +{ | 109 | int fault_size, MMUAccessType access_type, |
130 | + TCGv_vec t1 = tcg_temp_new_vec(type); | 110 | int mmu_idx, bool nonfault, |
131 | + TCGv_vec t2 = tcg_temp_new_vec(type); | 111 | void **phost, CPUTLBEntryFull **pfull, |
132 | + TCGv_vec t3, t4; | 112 | @@ -XXX,XX +XXX,XX @@ static int probe_access_internal(CPUArchState *env, target_ulong addr, |
133 | + | 113 | uintptr_t index = tlb_index(env, mmu_idx, addr); |
134 | + switch (vece) { | 114 | CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr); |
135 | + case MO_8: | 115 | uint64_t tlb_addr = tlb_read_idx(entry, access_type); |
136 | + case MO_16: | 116 | - target_ulong page_addr = addr & TARGET_PAGE_MASK; |
137 | + vec_gen_3(INDEX_op_ppc_muleu_vec, type, vece, tcgv_vec_arg(t1), | 117 | + vaddr page_addr = addr & TARGET_PAGE_MASK; |
138 | + tcgv_vec_arg(v1), tcgv_vec_arg(v2)); | 118 | int flags = TLB_FLAGS_MASK; |
139 | + vec_gen_3(INDEX_op_ppc_mulou_vec, type, vece, tcgv_vec_arg(t2), | 119 | |
140 | + tcgv_vec_arg(v1), tcgv_vec_arg(v2)); | 120 | if (!tlb_hit_page(tlb_addr, page_addr)) { |
141 | + vec_gen_3(INDEX_op_ppc_mrgh_vec, type, vece + 1, tcgv_vec_arg(v0), | 121 | @@ -XXX,XX +XXX,XX @@ static int probe_access_internal(CPUArchState *env, target_ulong addr, |
142 | + tcgv_vec_arg(t1), tcgv_vec_arg(t2)); | 122 | return flags; |
143 | + vec_gen_3(INDEX_op_ppc_mrgl_vec, type, vece + 1, tcgv_vec_arg(t1), | 123 | } |
144 | + tcgv_vec_arg(t1), tcgv_vec_arg(t2)); | 124 | |
145 | + vec_gen_3(INDEX_op_ppc_pkum_vec, type, vece, tcgv_vec_arg(v0), | 125 | -int probe_access_full(CPUArchState *env, target_ulong addr, int size, |
146 | + tcgv_vec_arg(v0), tcgv_vec_arg(t1)); | 126 | +int probe_access_full(CPUArchState *env, vaddr addr, int size, |
147 | + break; | 127 | MMUAccessType access_type, int mmu_idx, |
148 | + | 128 | bool nonfault, void **phost, CPUTLBEntryFull **pfull, |
149 | + case MO_32: | 129 | uintptr_t retaddr) |
150 | + t3 = tcg_temp_new_vec(type); | 130 | @@ -XXX,XX +XXX,XX @@ int probe_access_full(CPUArchState *env, target_ulong addr, int size, |
151 | + t4 = tcg_temp_new_vec(type); | 131 | return flags; |
152 | + tcg_gen_dupi_vec(MO_8, t4, -16); | 132 | } |
153 | + vec_gen_3(INDEX_op_ppc_rotl_vec, type, MO_32, tcgv_vec_arg(t1), | 133 | |
154 | + tcgv_vec_arg(v2), tcgv_vec_arg(t4)); | 134 | -int probe_access_flags(CPUArchState *env, target_ulong addr, int size, |
155 | + vec_gen_3(INDEX_op_ppc_mulou_vec, type, MO_16, tcgv_vec_arg(t2), | 135 | +int probe_access_flags(CPUArchState *env, vaddr addr, int size, |
156 | + tcgv_vec_arg(v1), tcgv_vec_arg(v2)); | 136 | MMUAccessType access_type, int mmu_idx, |
157 | + tcg_gen_dupi_vec(MO_8, t3, 0); | 137 | bool nonfault, void **phost, uintptr_t retaddr) |
158 | + vec_gen_4(INDEX_op_ppc_msum_vec, type, MO_16, tcgv_vec_arg(t3), | ||
159 | + tcgv_vec_arg(v1), tcgv_vec_arg(t1), tcgv_vec_arg(t3)); | ||
160 | + vec_gen_3(INDEX_op_shlv_vec, type, MO_32, tcgv_vec_arg(t3), | ||
161 | + tcgv_vec_arg(t3), tcgv_vec_arg(t4)); | ||
162 | + tcg_gen_add_vec(MO_32, v0, t2, t3); | ||
163 | + tcg_temp_free_vec(t3); | ||
164 | + tcg_temp_free_vec(t4); | ||
165 | + break; | ||
166 | + | ||
167 | + default: | ||
168 | + g_assert_not_reached(); | ||
169 | + } | ||
170 | + tcg_temp_free_vec(t1); | ||
171 | + tcg_temp_free_vec(t2); | ||
172 | +} | ||
173 | + | ||
174 | void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece, | ||
175 | TCGArg a0, ...) | ||
176 | { | 138 | { |
177 | @@ -XXX,XX +XXX,XX @@ void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece, | 139 | @@ -XXX,XX +XXX,XX @@ int probe_access_flags(CPUArchState *env, target_ulong addr, int size, |
178 | v2 = temp_tcgv_vec(arg_temp(a2)); | 140 | return flags; |
179 | expand_vec_cmp(type, vece, v0, v1, v2, va_arg(va, TCGArg)); | 141 | } |
180 | break; | 142 | |
181 | + case INDEX_op_mul_vec: | 143 | -void *probe_access(CPUArchState *env, target_ulong addr, int size, |
182 | + v2 = temp_tcgv_vec(arg_temp(a2)); | 144 | +void *probe_access(CPUArchState *env, vaddr addr, int size, |
183 | + expand_vec_mul(type, vece, v0, v1, v2); | 145 | MMUAccessType access_type, int mmu_idx, uintptr_t retaddr) |
184 | + break; | 146 | { |
185 | default: | 147 | CPUTLBEntryFull *full; |
186 | g_assert_not_reached(); | 148 | @@ -XXX,XX +XXX,XX @@ void *tlb_vaddr_to_host(CPUArchState *env, abi_ptr addr, |
187 | } | 149 | * NOTE: This function will trigger an exception if the page is |
188 | @@ -XXX,XX +XXX,XX @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op) | 150 | * not executable. |
189 | static const TCGTargetOpDef v_r = { .args_ct_str = { "v", "r" } }; | 151 | */ |
190 | static const TCGTargetOpDef v_v = { .args_ct_str = { "v", "v" } }; | 152 | -tb_page_addr_t get_page_addr_code_hostp(CPUArchState *env, target_ulong addr, |
191 | static const TCGTargetOpDef v_v_v = { .args_ct_str = { "v", "v", "v" } }; | 153 | +tb_page_addr_t get_page_addr_code_hostp(CPUArchState *env, vaddr addr, |
192 | + static const TCGTargetOpDef v_v_v_v | 154 | void **hostp) |
193 | + = { .args_ct_str = { "v", "v", "v", "v" } }; | 155 | { |
194 | 156 | CPUTLBEntryFull *full; | |
195 | switch (op) { | 157 | diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c |
196 | case INDEX_op_goto_ptr: | 158 | index XXXXXXX..XXXXXXX 100644 |
197 | @@ -XXX,XX +XXX,XX @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op) | 159 | --- a/accel/tcg/user-exec.c |
198 | 160 | +++ b/accel/tcg/user-exec.c | |
199 | case INDEX_op_add_vec: | 161 | @@ -XXX,XX +XXX,XX @@ int page_unprotect(target_ulong address, uintptr_t pc) |
200 | case INDEX_op_sub_vec: | 162 | return current_tb_invalidated ? 2 : 1; |
201 | + case INDEX_op_mul_vec: | 163 | } |
202 | case INDEX_op_and_vec: | 164 | |
203 | case INDEX_op_or_vec: | 165 | -static int probe_access_internal(CPUArchState *env, target_ulong addr, |
204 | case INDEX_op_xor_vec: | 166 | +static int probe_access_internal(CPUArchState *env, vaddr addr, |
205 | @@ -XXX,XX +XXX,XX @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op) | 167 | int fault_size, MMUAccessType access_type, |
206 | case INDEX_op_shlv_vec: | 168 | bool nonfault, uintptr_t ra) |
207 | case INDEX_op_shrv_vec: | 169 | { |
208 | case INDEX_op_sarv_vec: | 170 | @@ -XXX,XX +XXX,XX @@ static int probe_access_internal(CPUArchState *env, target_ulong addr, |
209 | + case INDEX_op_ppc_mrgh_vec: | 171 | cpu_loop_exit_sigsegv(env_cpu(env), addr, access_type, maperr, ra); |
210 | + case INDEX_op_ppc_mrgl_vec: | 172 | } |
211 | + case INDEX_op_ppc_muleu_vec: | 173 | |
212 | + case INDEX_op_ppc_mulou_vec: | 174 | -int probe_access_flags(CPUArchState *env, target_ulong addr, int size, |
213 | + case INDEX_op_ppc_pkum_vec: | 175 | +int probe_access_flags(CPUArchState *env, vaddr addr, int size, |
214 | + case INDEX_op_ppc_rotl_vec: | 176 | MMUAccessType access_type, int mmu_idx, |
215 | return &v_v_v; | 177 | bool nonfault, void **phost, uintptr_t ra) |
216 | case INDEX_op_not_vec: | 178 | { |
217 | case INDEX_op_dup_vec: | 179 | @@ -XXX,XX +XXX,XX @@ int probe_access_flags(CPUArchState *env, target_ulong addr, int size, |
218 | @@ -XXX,XX +XXX,XX @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op) | 180 | return flags; |
219 | case INDEX_op_st_vec: | 181 | } |
220 | case INDEX_op_dupm_vec: | 182 | |
221 | return &v_r; | 183 | -void *probe_access(CPUArchState *env, target_ulong addr, int size, |
222 | + case INDEX_op_ppc_msum_vec: | 184 | +void *probe_access(CPUArchState *env, vaddr addr, int size, |
223 | + return &v_v_v_v; | 185 | MMUAccessType access_type, int mmu_idx, uintptr_t ra) |
224 | 186 | { | |
225 | default: | 187 | int flags; |
226 | return NULL; | 188 | @@ -XXX,XX +XXX,XX @@ void *probe_access(CPUArchState *env, target_ulong addr, int size, |
189 | return size ? g2h(env_cpu(env), addr) : NULL; | ||
190 | } | ||
191 | |||
192 | -tb_page_addr_t get_page_addr_code_hostp(CPUArchState *env, target_ulong addr, | ||
193 | +tb_page_addr_t get_page_addr_code_hostp(CPUArchState *env, vaddr addr, | ||
194 | void **hostp) | ||
195 | { | ||
196 | int flags; | ||
227 | -- | 197 | -- |
228 | 2.17.1 | 198 | 2.34.1 |
229 | |||
1 | These new instructions are conditional on MSR.FP when TX=0 and | 1 | From: Anton Johansson <anjo@rev.ng> |
---|---|---|---|
2 | MSR.VEC when TX=1. Since we only care about the Altivec registers, | ||
3 | and force TX=1, we can consider these to be Altivec instructions. | ||
4 | Since Altivec is true for any use of vector types, we only need | ||
5 | to test have_isa_2_07. | ||
6 | 2 | ||
7 | This includes moves to and from the integer registers. | 3 | Update atomic_mmu_lookup() and cpu_mmu_lookup() to take the guest |
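
For context, a sketch (illustration only, not the patch's emitter) of the MTVSRD data movement this relies on, modeled on a two-doubleword view of a vector register; it matches the "integer in the left doubleword" convention used by the earlier load patches:

    #include <stdint.h>

    typedef struct { uint64_t dw[2]; } VSRegModel; /* dw[0] = left dword */

    /* mtvsrd: GPR -> doubleword 0 of the vector-scalar register.
     * The ISA leaves dw[1] undefined; zeroed here for determinism. */
    static VSRegModel mtvsrd_model(uint64_t gpr)
    {
        VSRegModel r = { { gpr, 0 } };
        return r;
    }
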
4 | virtual address as a vaddr instead of a target_ulong. | ||
8 | 5 | ||
9 | Reviewed-by: Aleksandar Markovic <amarkovic@wavecomp.com> | 6 | Signed-off-by: Anton Johansson <anjo@rev.ng> |
7 | Reviewed-by: Richard Henderson <richard.henderson@linaro.org> | ||
8 | Message-Id: <20230621135633.1649-10-anjo@rev.ng> | ||
10 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 9 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> |
11 | --- | 10 | --- |
12 | tcg/ppc/tcg-target.inc.c | 32 ++++++++++++++++++++++++++------ | 11 | accel/tcg/cputlb.c | 6 +++--- |
13 | 1 file changed, 26 insertions(+), 6 deletions(-) | 12 | accel/tcg/user-exec.c | 6 +++--- |
13 | 2 files changed, 6 insertions(+), 6 deletions(-) | ||
14 | 14 | ||
15 | diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c | 15 | diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c |
16 | index XXXXXXX..XXXXXXX 100644 | 16 | index XXXXXXX..XXXXXXX 100644 |
17 | --- a/tcg/ppc/tcg-target.inc.c | 17 | --- a/accel/tcg/cputlb.c |
18 | +++ b/tcg/ppc/tcg-target.inc.c | 18 | +++ b/accel/tcg/cputlb.c |
19 | @@ -XXX,XX +XXX,XX @@ static int tcg_target_const_match(tcg_target_long val, TCGType type, | 19 | @@ -XXX,XX +XXX,XX @@ static bool mmu_lookup(CPUArchState *env, vaddr addr, MemOpIdx oi, |
20 | #define XXPERMDI (OPCD(60) | (10 << 3) | 7) /* v2.06, force ax=bx=tx=1 */ | 20 | * Probe for an atomic operation. Do not allow unaligned operations, |
21 | #define XXSEL (OPCD(60) | (3 << 4) | 0xf) /* v2.06, force ax=bx=cx=tx=1 */ | 21 | * or io operations to proceed. Return the host address. |
22 | 22 | */ | |
23 | +#define MFVSRD (XO31(51) | 1) /* v2.07, force sx=1 */ | 23 | -static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr, |
24 | +#define MFVSRWZ (XO31(115) | 1) /* v2.07, force sx=1 */ | 24 | - MemOpIdx oi, int size, uintptr_t retaddr) |
25 | +#define MTVSRD (XO31(179) | 1) /* v2.07, force tx=1 */ | 25 | +static void *atomic_mmu_lookup(CPUArchState *env, vaddr addr, MemOpIdx oi, |
26 | +#define MTVSRWZ (XO31(243) | 1) /* v2.07, force tx=1 */ | 26 | + int size, uintptr_t retaddr) |
27 | + | 27 | { |
28 | #define RT(r) ((r)<<21) | 28 | uintptr_t mmu_idx = get_mmuidx(oi); |
29 | #define RS(r) ((r)<<21) | 29 | MemOp mop = get_memop(oi); |
30 | #define RA(r) ((r)<<16) | 30 | int a_bits = get_alignment_bits(mop); |
31 | @@ -XXX,XX +XXX,XX @@ static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg) | 31 | uintptr_t index; |
32 | tcg_debug_assert(TCG_TARGET_REG_BITS == 64); | 32 | CPUTLBEntry *tlbe; |
33 | /* fallthru */ | 33 | - target_ulong tlb_addr; |
34 | case TCG_TYPE_I32: | 34 | + vaddr tlb_addr; |
35 | - if (ret < TCG_REG_V0 && arg < TCG_REG_V0) { | 35 | void *hostaddr; |
36 | - tcg_out32(s, OR | SAB(arg, ret, arg)); | 36 | CPUTLBEntryFull *full; |
37 | - break; | 37 | |
38 | - } else if (ret < TCG_REG_V0 || arg < TCG_REG_V0) { | 38 | diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c |
39 | - /* Altivec does not support vector/integer moves. */ | 39 | index XXXXXXX..XXXXXXX 100644 |
40 | - return false; | 40 | --- a/accel/tcg/user-exec.c |
41 | + if (ret < TCG_REG_V0) { | 41 | +++ b/accel/tcg/user-exec.c |
42 | + if (arg < TCG_REG_V0) { | 42 | @@ -XXX,XX +XXX,XX @@ void page_reset_target_data(target_ulong start, target_ulong last) { } |
43 | + tcg_out32(s, OR | SAB(arg, ret, arg)); | 43 | |
44 | + break; | 44 | /* The softmmu versions of these helpers are in cputlb.c. */ |
45 | + } else if (have_isa_2_07) { | 45 | |
46 | + tcg_out32(s, (type == TCG_TYPE_I32 ? MFVSRWZ : MFVSRD) | 46 | -static void *cpu_mmu_lookup(CPUArchState *env, abi_ptr addr, |
47 | + | VRT(arg) | RA(ret)); | 47 | +static void *cpu_mmu_lookup(CPUArchState *env, vaddr addr, |
48 | + break; | 48 | MemOp mop, uintptr_t ra, MMUAccessType type) |
49 | + } else { | 49 | { |
50 | + /* Altivec does not support vector->integer moves. */ | 50 | int a_bits = get_alignment_bits(mop); |
51 | + return false; | 51 | @@ -XXX,XX +XXX,XX @@ uint64_t cpu_ldq_code_mmu(CPUArchState *env, abi_ptr addr, |
52 | + } | 52 | /* |
53 | + } else if (arg < TCG_REG_V0) { | 53 | * Do not allow unaligned operations to proceed. Return the host address. |
54 | + if (have_isa_2_07) { | 54 | */ |
55 | + tcg_out32(s, (type == TCG_TYPE_I32 ? MTVSRWZ : MTVSRD) | 55 | -static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr, |
56 | + | VRT(ret) | RA(arg)); | 56 | - MemOpIdx oi, int size, uintptr_t retaddr) |
57 | + break; | 57 | +static void *atomic_mmu_lookup(CPUArchState *env, vaddr addr, MemOpIdx oi, |
58 | + } else { | 58 | + int size, uintptr_t retaddr) |
59 | + /* Altivec does not support integer->vector moves. */ | 59 | { |
60 | + return false; | 60 | MemOp mop = get_memop(oi); |
61 | + } | 61 | int a_bits = get_alignment_bits(mop); |
62 | } | ||
63 | /* fallthru */ | ||
64 | case TCG_TYPE_V64: | ||
65 | -- | 62 | -- |
66 | 2.17.1 | 63 | 2.34.1 |
67 | |||
1 | Introduce all of the flags required to enable tcg backend vector support, | 1 | From: Anton Johansson <anjo@rev.ng> |
---|---|---|---|
2 | and a runtime flag to indicate the host supports Altivec instructions. | ||
3 | 2 | ||
4 | For now, do not actually set have_altivec to true, because we have not | 3 | Use vaddr for guest virtual address in translator_use_goto_tb() and |
5 | yet added all of the code to actually generate all of the required insns. | 4 | translator_loop(). |
6 | However, we must define these flags in order to disable ifndefs that create | ||
7 | stub versions of the functions added here. | ||
8 | 5 | ||
9 | The change to tcg_out_movi works around a buglet in tcg.c wherein if we | 6 | Signed-off-by: Anton Johansson <anjo@rev.ng> |
10 | do not define tcg_out_dupi_vec we get a declared but not defined Werror, | 7 | Reviewed-by: Richard Henderson <richard.henderson@linaro.org> |
11 | but if we only declare it we get a defined but not used Werror. We need | 8 | Message-Id: <20230621135633.1649-11-anjo@rev.ng> |
12 | this change to tcg_out_movi eventually anyway, so it's no biggie. | 11 | include/exec/translator.h | 6 +++--- |
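
A paraphrased sketch of the tcg.c stub pattern the paragraph above refers to — not the verbatim source; the macro spelling here is from memory:

    /* While no TCG_TARGET_HAS_v* flag is set, the core supplies
     * unreachable stubs; defining the flags disables this branch,
     * so the backend must provide real definitions (added here). */
    #if !TCG_TARGET_MAYBE_vec
    static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
                               unsigned vecl, unsigned vece,
                               const TCGArg *args, const int *const_args)
    {
        g_assert_not_reached();
    }
    #endif
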
10 | --- | ||
11 | include/exec/translator.h | 6 +++--- | ||
12 | accel/tcg/translator.c | 10 +++++----- | ||
13 | 2 files changed, 8 insertions(+), 8 deletions(-) | ||
13 | 14 | ||
14 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 15 | diff --git a/include/exec/translator.h b/include/exec/translator.h |
15 | Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com> | ||
16 | --- | ||
17 | tcg/ppc/tcg-target.h | 25 ++++++++++++++++ | ||
18 | tcg/ppc/tcg-target.opc.h | 5 ++++ | ||
19 | tcg/ppc/tcg-target.inc.c | 62 ++++++++++++++++++++++++++++++++++++++-- | ||
20 | 3 files changed, 89 insertions(+), 3 deletions(-) | ||
21 | create mode 100644 tcg/ppc/tcg-target.opc.h | ||
22 | |||
23 | diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h | ||
24 | index XXXXXXX..XXXXXXX 100644 | 16 | index XXXXXXX..XXXXXXX 100644 |
25 | --- a/tcg/ppc/tcg-target.h | 17 | --- a/include/exec/translator.h |
26 | +++ b/tcg/ppc/tcg-target.h | 18 | +++ b/include/exec/translator.h |
27 | @@ -XXX,XX +XXX,XX @@ typedef enum { | 19 | @@ -XXX,XX +XXX,XX @@ typedef struct TranslatorOps { |
28 | } TCGPowerISA; | 20 | * - When too many instructions have been translated. |
29 | 21 | */ | |
30 | extern TCGPowerISA have_isa; | 22 | void translator_loop(CPUState *cpu, TranslationBlock *tb, int *max_insns, |
31 | +extern bool have_altivec; | 23 | - target_ulong pc, void *host_pc, |
32 | 24 | - const TranslatorOps *ops, DisasContextBase *db); | |
33 | #define have_isa_2_06 (have_isa >= tcg_isa_2_06) | 25 | + vaddr pc, void *host_pc, const TranslatorOps *ops, |
34 | #define have_isa_3_00 (have_isa >= tcg_isa_3_00) | 26 | + DisasContextBase *db); |
35 | @@ -XXX,XX +XXX,XX @@ extern TCGPowerISA have_isa; | 27 | |
36 | #define TCG_TARGET_HAS_mulsh_i64 1 | 28 | /** |
37 | #endif | 29 | * translator_use_goto_tb |
38 | 30 | @@ -XXX,XX +XXX,XX @@ void translator_loop(CPUState *cpu, TranslationBlock *tb, int *max_insns, | |
39 | +/* | 31 | * Return true if goto_tb is allowed between the current TB |
40 | + * While technically Altivec could support V64, it has no 64-bit store | 32 | * and the destination PC. |
41 | + * instruction and substituting two 32-bit stores makes the generated | 33 | */ |
42 | + * code quite large. | 34 | -bool translator_use_goto_tb(DisasContextBase *db, target_ulong dest); |
43 | + */ | 35 | +bool translator_use_goto_tb(DisasContextBase *db, vaddr dest); |
44 | +#define TCG_TARGET_HAS_v64 0 | 36 | |
45 | +#define TCG_TARGET_HAS_v128 have_altivec | 37 | /** |
46 | +#define TCG_TARGET_HAS_v256 0 | 38 | * translator_io_start |
47 | + | 39 | diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c |
48 | +#define TCG_TARGET_HAS_andc_vec 0 | ||
49 | +#define TCG_TARGET_HAS_orc_vec 0 | ||
50 | +#define TCG_TARGET_HAS_not_vec 0 | ||
51 | +#define TCG_TARGET_HAS_neg_vec 0 | ||
52 | +#define TCG_TARGET_HAS_abs_vec 0 | ||
53 | +#define TCG_TARGET_HAS_shi_vec 0 | ||
54 | +#define TCG_TARGET_HAS_shs_vec 0 | ||
55 | +#define TCG_TARGET_HAS_shv_vec 0 | ||
56 | +#define TCG_TARGET_HAS_cmp_vec 0 | ||
57 | +#define TCG_TARGET_HAS_mul_vec 0 | ||
58 | +#define TCG_TARGET_HAS_sat_vec 0 | ||
59 | +#define TCG_TARGET_HAS_minmax_vec 0 | ||
60 | +#define TCG_TARGET_HAS_bitsel_vec 0 | ||
61 | +#define TCG_TARGET_HAS_cmpsel_vec 0 | ||
62 | + | ||
63 | void flush_icache_range(uintptr_t start, uintptr_t stop); | ||
64 | void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t); | ||
65 | |||
66 | diff --git a/tcg/ppc/tcg-target.opc.h b/tcg/ppc/tcg-target.opc.h | ||
67 | new file mode 100644 | ||
68 | index XXXXXXX..XXXXXXX | ||
69 | --- /dev/null | ||
70 | +++ b/tcg/ppc/tcg-target.opc.h | ||
71 | @@ -XXX,XX +XXX,XX @@ | ||
72 | +/* | ||
73 | + * Target-specific opcodes for host vector expansion. These will be | ||
74 | + * emitted by tcg_expand_vec_op. For those familiar with GCC internals, | ||
75 | + * consider these to be UNSPEC with names. | ||
76 | + */ | ||
77 | diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c | ||
78 | index XXXXXXX..XXXXXXX 100644 | 40 | index XXXXXXX..XXXXXXX 100644 |
79 | --- a/tcg/ppc/tcg-target.inc.c | 41 | --- a/accel/tcg/translator.c |
80 | +++ b/tcg/ppc/tcg-target.inc.c | 42 | +++ b/accel/tcg/translator.c |
81 | @@ -XXX,XX +XXX,XX @@ static tcg_insn_unit *tb_ret_addr; | 43 | @@ -XXX,XX +XXX,XX @@ static void gen_tb_end(const TranslationBlock *tb, uint32_t cflags, |
82 | |||
83 | TCGPowerISA have_isa; | ||
84 | static bool have_isel; | ||
85 | +bool have_altivec; | ||
86 | |||
87 | #ifndef CONFIG_SOFTMMU | ||
88 | #define TCG_GUEST_BASE_REG 30 | ||
89 | @@ -XXX,XX +XXX,XX @@ static void tcg_out_movi_int(TCGContext *s, TCGType type, TCGReg ret, | ||
90 | } | 44 | } |
91 | } | 45 | } |
92 | 46 | ||
93 | -static inline void tcg_out_movi(TCGContext *s, TCGType type, TCGReg ret, | 47 | -bool translator_use_goto_tb(DisasContextBase *db, target_ulong dest) |
94 | - tcg_target_long arg) | 48 | +bool translator_use_goto_tb(DisasContextBase *db, vaddr dest) |
95 | +static void tcg_out_dupi_vec(TCGContext *s, TCGType type, TCGReg ret, | ||
96 | + tcg_target_long val) | ||
97 | { | 49 | { |
98 | - tcg_out_movi_int(s, type, ret, arg, false); | 50 | /* Suppress goto_tb if requested. */ |
99 | + g_assert_not_reached(); | 51 | if (tb_cflags(db->tb) & CF_NO_GOTO_TB) { |
100 | +} | 52 | @@ -XXX,XX +XXX,XX @@ bool translator_use_goto_tb(DisasContextBase *db, target_ulong dest) |
101 | + | ||
102 | +static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg ret, | ||
103 | + tcg_target_long arg) | ||
104 | +{ | ||
105 | + switch (type) { | ||
106 | + case TCG_TYPE_I32: | ||
107 | + case TCG_TYPE_I64: | ||
108 | + tcg_debug_assert(ret < TCG_REG_V0); | ||
109 | + tcg_out_movi_int(s, type, ret, arg, false); | ||
110 | + break; | ||
111 | + | ||
112 | + case TCG_TYPE_V64: | ||
113 | + case TCG_TYPE_V128: | ||
114 | + tcg_debug_assert(ret >= TCG_REG_V0); | ||
115 | + tcg_out_dupi_vec(s, type, ret, arg); | ||
116 | + break; | ||
117 | + | ||
118 | + default: | ||
119 | + g_assert_not_reached(); | ||
120 | + } | ||
121 | } | 53 | } |
122 | 54 | ||
123 | static bool mask_operand(uint32_t c, int *mb, int *me) | 55 | void translator_loop(CPUState *cpu, TranslationBlock *tb, int *max_insns, |
124 | @@ -XXX,XX +XXX,XX @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args, | 56 | - target_ulong pc, void *host_pc, |
125 | } | 57 | - const TranslatorOps *ops, DisasContextBase *db) |
58 | + vaddr pc, void *host_pc, const TranslatorOps *ops, | ||
59 | + DisasContextBase *db) | ||
60 | { | ||
61 | uint32_t cflags = tb_cflags(tb); | ||
62 | TCGOp *icount_start_insn; | ||
63 | @@ -XXX,XX +XXX,XX @@ void translator_loop(CPUState *cpu, TranslationBlock *tb, int *max_insns, | ||
126 | } | 64 | } |
127 | 65 | ||
128 | +int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece) | 66 | static void *translator_access(CPUArchState *env, DisasContextBase *db, |
129 | +{ | 67 | - target_ulong pc, size_t len) |
130 | + g_assert_not_reached(); | 68 | + vaddr pc, size_t len) |
131 | +} | ||
132 | + | ||
133 | +static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece, | ||
134 | + TCGReg dst, TCGReg src) | ||
135 | +{ | ||
136 | + g_assert_not_reached(); | ||
137 | +} | ||
138 | + | ||
139 | +static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece, | ||
140 | + TCGReg out, TCGReg base, intptr_t offset) | ||
141 | +{ | ||
142 | + g_assert_not_reached(); | ||
143 | +} | ||
144 | + | ||
145 | +static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, | ||
146 | + unsigned vecl, unsigned vece, | ||
147 | + const TCGArg *args, const int *const_args) | ||
148 | +{ | ||
149 | + g_assert_not_reached(); | ||
150 | +} | ||
151 | + | ||
152 | +void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece, | ||
153 | + TCGArg a0, ...) | ||
154 | +{ | ||
155 | + g_assert_not_reached(); | ||
156 | +} | ||
157 | + | ||
158 | static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op) | ||
159 | { | 69 | { |
160 | static const TCGTargetOpDef r = { .args_ct_str = { "r" } }; | 70 | void *host; |
161 | @@ -XXX,XX +XXX,XX @@ static void tcg_target_init(TCGContext *s) | 71 | - target_ulong base, end; |
162 | 72 | + vaddr base, end; | |
163 | tcg_target_available_regs[TCG_TYPE_I32] = 0xffffffff; | 73 | TranslationBlock *tb; |
164 | tcg_target_available_regs[TCG_TYPE_I64] = 0xffffffff; | 74 | |
165 | + if (have_altivec) { | 75 | tb = db->tb; |
166 | + tcg_target_available_regs[TCG_TYPE_V64] = 0xffffffff00000000ull; | ||
167 | + tcg_target_available_regs[TCG_TYPE_V128] = 0xffffffff00000000ull; | ||
168 | + } | ||
169 | |||
170 | tcg_target_call_clobber_regs = 0; | ||
171 | tcg_regset_set_reg(tcg_target_call_clobber_regs, TCG_REG_R0); | ||
172 | -- | 76 | -- |
173 | 2.17.1 | 77 | 2.34.1 |
174 | |||
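A note on the g_assert_not_reached() stubs in the vector-enablement patch above: they are placeholders behind TCG's three-way capability query, which later patches in this series fill in. A minimal sketch of that contract, assuming the usual TCG convention (0 = expand generically, 1 = emit directly, -1 = the backend expands it itself via tcg_expand_vec_op):

    int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
    {
        switch (opc) {
        case INDEX_op_and_vec:
            return 1;     /* one host instruction suffices */
        case INDEX_op_cmp_vec:
            return -1;    /* backend rewrites it from simpler vec ops */
        default:
            return 0;     /* middle end falls back on its own expansion */
        }
    }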
1 | These new instructions are conditional only on MSR.VSX and | 1 | From: Anton Johansson <anjo@rev.ng> |
---|---|---|---|
2 | are thus part of the VSX instruction set, and not Altivec. | ||
3 | This includes single-word loads and stores. | 2 |
4 | 2 | ||
5 | Reviewed-by: Aleksandar Markovic <amarkovic@wavecomp.com> | 3 | Signed-off-by: Anton Johansson <anjo@rev.ng> |
4 | Reviewed-by: Richard Henderson <richard.henderson@linaro.org> | ||
5 | Message-Id: <20230621135633.1649-13-anjo@rev.ng> | ||
6 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 6 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> |
7 | --- | 7 | --- |
8 | tcg/ppc/tcg-target.inc.c | 10 ++++++++++ | 8 | include/exec/exec-all.h | 2 +-
9 | 1 file changed, 10 insertions(+) | 9 | cpu.c | 2 +-
10 | 2 files changed, 2 insertions(+), 2 deletions(-) | ||
10 | 11 | ||
11 | diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c | 12 | diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h |
12 | index XXXXXXX..XXXXXXX 100644 | 13 | index XXXXXXX..XXXXXXX 100644 |
13 | --- a/tcg/ppc/tcg-target.inc.c | 14 | --- a/include/exec/exec-all.h |
14 | +++ b/tcg/ppc/tcg-target.inc.c | 15 | +++ b/include/exec/exec-all.h |
15 | @@ -XXX,XX +XXX,XX @@ static int tcg_target_const_match(tcg_target_long val, TCGType type, | 16 | @@ -XXX,XX +XXX,XX @@ uint32_t curr_cflags(CPUState *cpu); |
16 | #define LVEWX XO31(71) | 17 | |
17 | #define LXSDX (XO31(588) | 1) /* v2.06, force tx=1 */ | 18 | /* TranslationBlock invalidate API */ |
18 | #define LXVDSX (XO31(332) | 1) /* v2.06, force tx=1 */ | 19 | #if defined(CONFIG_USER_ONLY) |
19 | +#define LXSIWZX (XO31(12) | 1) /* v2.07, force tx=1 */ | 20 | -void tb_invalidate_phys_addr(target_ulong addr); |
20 | 21 | +void tb_invalidate_phys_addr(hwaddr addr); | |
21 | #define STVX XO31(231) | 22 | #else |
22 | #define STVEWX XO31(199) | 23 | void tb_invalidate_phys_addr(AddressSpace *as, hwaddr addr, MemTxAttrs attrs); |
23 | #define STXSDX (XO31(716) | 1) /* v2.06, force sx=1 */ | 24 | #endif |
24 | +#define STXSIWX (XO31(140) | 1) /* v2.07, force sx=1 */ | 25 | diff --git a/cpu.c b/cpu.c |
25 | 26 | index XXXXXXX..XXXXXXX 100644 | |
26 | #define VADDSBS VX4(768) | 27 | --- a/cpu.c |
27 | #define VADDUBS VX4(512) | 28 | +++ b/cpu.c |
28 | @@ -XXX,XX +XXX,XX @@ static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret, | 29 | @@ -XXX,XX +XXX,XX @@ void list_cpus(void) |
29 | tcg_out_mem_long(s, LWZ, LWZX, ret, base, offset); | 30 | } |
30 | break; | 31 | |
31 | } | 32 | #if defined(CONFIG_USER_ONLY) |
32 | + if (have_isa_2_07 && have_vsx) { | 33 | -void tb_invalidate_phys_addr(target_ulong addr) |
33 | + tcg_out_mem_long(s, 0, LXSIWZX, ret, base, offset); | 34 | +void tb_invalidate_phys_addr(hwaddr addr) |
34 | + break; | 35 | { |
35 | + } | 36 | mmap_lock(); |
36 | tcg_debug_assert((offset & 3) == 0); | 37 | tb_invalidate_phys_page(addr); |
37 | tcg_out_mem_long(s, 0, LVEWX, ret, base, offset); | ||
38 | shift = (offset - 4) & 0xc; | ||
39 | @@ -XXX,XX +XXX,XX @@ static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg, | ||
40 | tcg_out_mem_long(s, STW, STWX, arg, base, offset); | ||
41 | break; | ||
42 | } | ||
43 | + if (have_isa_2_07 && have_vsx) { | ||
44 | + tcg_out_mem_long(s, 0, STXSIWX, arg, base, offset); | ||
45 | + break; | ||
46 | + } | ||
48 | tcg_debug_assert((offset & 3) == 0); | ||
49 | shift = (offset - 4) & 0xc; | ||
50 | if (shift) { | ||
51 | -- | 38 | -- |
52 | 2.17.1 | 39 | 2.34.1 |
53 | |||
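On the "force tx=1" / "force sx=1" comments in the LXSIWZX/STXSIWX defines above: a sketch of the encoding assumption, not text from the patch. X-form VSX opcodes carry a TX (or SX) bit that widens the 5-bit register field to 6 bits; hard-wiring it to 1 selects VSR32-VSR63, which alias the Altivec registers this backend actually allocates:

    #define OPCD(opc)   ((opc) << 26)             /* primary opcode field */
    #define XO31(opc)   (OPCD(31) | ((opc) << 1))

    #define LXSIWZX     (XO31(12)  | 1)           /* tx=1: target in VSR32..63 */
    #define STXSIWX     (XO31(140) | 1)           /* sx=1: source in VSR32..63 */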
1 | From: Alex Bennée <alex.bennee@linaro.org> | 1 | From: Alex Bennée <alex.bennee@linaro.org> |
---|---|---|---|
2 | 2 | ||
3 | qemu_cpu_kick is used for a number of reasons, including to indicate | 3 | Balaton discovered that asserts for the extract/deposit calls had a
4 | there is work to be done. However, when thread=single, the old | 4 | significant impact on a lame benchmark on qemu-ppc. Replicating with:
5 | qemu_cpu_kick_rr_cpu only advanced the vCPU to the next executing one | ||
6 | which can lead to a hang in the case that: | ||
7 | 5 | ||
8 | a) the kick is from outside the vCPUs (e.g. iothread) | 6 | ./qemu-ppc64 ~/lsrc/tests/lame.git-svn/builds/ppc64/frontend/lame \ |
9 | b) the timers are paused (i.e. iothread calling run_on_cpu) | 7 | -h pts-trondheim-3.wav pts-trondheim-3.mp3 |
10 | 8 | ||
11 | To avoid this, let's split qemu_cpu_kick_rr into two functions. One for | 11 |
12 | the timer which continues to advance to the next timeslice and another | 10 | should have done causing them to prominently figure in the profile: |
13 | for all other kicks. | ||
14 | 11 | ||
15 | Message-Id: <20191001160426.26644-1-alex.bennee@linaro.org> | 12 | 11.44% qemu-ppc64 qemu-ppc64 [.] unpack_raw64.isra.0 |
16 | Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> | 13 | 11.03% qemu-ppc64 qemu-ppc64 [.] parts64_uncanon_normal |
14 | 8.26% qemu-ppc64 qemu-ppc64 [.] helper_compute_fprf_float64 | ||
15 | 6.75% qemu-ppc64 qemu-ppc64 [.] do_float_check_status | ||
16 | 5.34% qemu-ppc64 qemu-ppc64 [.] parts64_muladd | ||
17 | 4.75% qemu-ppc64 qemu-ppc64 [.] pack_raw64.isra.0 | ||
18 | 4.38% qemu-ppc64 qemu-ppc64 [.] parts64_canonicalize | ||
19 | 3.62% qemu-ppc64 qemu-ppc64 [.] float64r32_round_pack_canonical | ||
20 | |||
21 | After this patch the same test runs 31 seconds faster with a profile | ||
22 | where the generated code dominates more: | ||
23 | |||
24 | + 14.12% 0.00% qemu-ppc64 [unknown] [.] 0x0000004000619420 | ||
25 | + 13.30% 0.00% qemu-ppc64 [unknown] [.] 0x0000004000616850 | ||
26 | + 12.58% 12.19% qemu-ppc64 qemu-ppc64 [.] parts64_uncanon_normal | ||
27 | + 10.62% 0.00% qemu-ppc64 [unknown] [.] 0x000000400061bf70 | ||
28 | + 9.91% 9.73% qemu-ppc64 qemu-ppc64 [.] helper_compute_fprf_float64 | ||
29 | + 7.84% 7.82% qemu-ppc64 qemu-ppc64 [.] do_float_check_status | ||
30 | + 6.47% 5.78% qemu-ppc64 qemu-ppc64 [.] parts64_canonicalize.constprop.0 | ||
31 | + 6.46% 0.00% qemu-ppc64 [unknown] [.] 0x0000004000620130 | ||
32 | + 6.42% 0.00% qemu-ppc64 [unknown] [.] 0x0000004000619400 | ||
33 | + 6.17% 6.04% qemu-ppc64 qemu-ppc64 [.] parts64_muladd | ||
34 | + 5.85% 0.00% qemu-ppc64 [unknown] [.] 0x00000040006167e0 | ||
35 | + 5.74% 0.00% qemu-ppc64 [unknown] [.] 0x0000b693fcffffd3 | ||
36 | + 5.45% 4.78% qemu-ppc64 qemu-ppc64 [.] float64r32_round_pack_canonical | ||
37 | |||
38 | Suggested-by: Richard Henderson <richard.henderson@linaro.org> | ||
39 | Message-Id: <ec9cfe5a-d5f2-466d-34dc-c35817e7e010@linaro.org> | ||
40 | [AJB: Patchified rth's suggestion] | ||
41 | Signed-off-by: Alex Bennée <alex.bennee@linaro.org> | ||
42 | Cc: BALATON Zoltan <balaton@eik.bme.hu> | ||
17 | Reviewed-by: Richard Henderson <richard.henderson@linaro.org> | 43 | Reviewed-by: Richard Henderson <richard.henderson@linaro.org> |
18 | Signed-off-by: Alex Bennée <alex.bennee@linaro.org> | 44 | Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> |
45 | Tested-by: BALATON Zoltan <balaton@eik.bme.hu> | ||
46 | Message-Id: <20230523131107.3680641-1-alex.bennee@linaro.org> | ||
19 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 47 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> |
20 | --- | 48 | --- |
21 | cpus.c | 24 ++++++++++++++++++------ | 49 | fpu/softfloat.c | 22 +++++++++++----------- |
22 | 1 file changed, 18 insertions(+), 6 deletions(-) | 50 | 1 file changed, 11 insertions(+), 11 deletions(-) |
23 | 51 | ||
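For readers unfamiliar with the attribute used in the softfloat patch: QEMU_FLATTEN wraps GCC's flatten attribute (include/qemu/compiler.h), which inlines the annotated function's entire call tree and so keeps -fipa-sra from outlining helpers as *.isra.0 clones. A standalone illustration with made-up names:

    #define QEMU_FLATTEN __attribute__((flatten))

    static int helper(int x)        /* a candidate for isra outlining */
    {
        return x * 2 + 1;
    }

    int QEMU_FLATTEN hot_path(int x)
    {
        /* flatten forces helper() to be inlined here, so its body is
         * optimized together with the caller instead of being split
         * into a hot_path + helper.isra.0 pair. */
        return helper(x);
    }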
24 | diff --git a/cpus.c b/cpus.c | 52 | diff --git a/fpu/softfloat.c b/fpu/softfloat.c |
25 | index XXXXXXX..XXXXXXX 100644 | 53 | index XXXXXXX..XXXXXXX 100644 |
26 | --- a/cpus.c | 54 | --- a/fpu/softfloat.c |
27 | +++ b/cpus.c | 55 | +++ b/fpu/softfloat.c |
28 | @@ -XXX,XX +XXX,XX @@ static inline int64_t qemu_tcg_next_kick(void) | 56 | @@ -XXX,XX +XXX,XX @@ static void unpack_raw64(FloatParts64 *r, const FloatFmt *fmt, uint64_t raw) |
29 | return qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + TCG_KICK_PERIOD; | 57 | }; |
30 | } | 58 | } |
31 | 59 | ||
32 | -/* Kick the currently round-robin scheduled vCPU */ | 60 | -static inline void float16_unpack_raw(FloatParts64 *p, float16 f) |
33 | -static void qemu_cpu_kick_rr_cpu(void) | 61 | +static void QEMU_FLATTEN float16_unpack_raw(FloatParts64 *p, float16 f) |
34 | +/* Kick the currently round-robin scheduled vCPU to next */ | ||
35 | +static void qemu_cpu_kick_rr_next_cpu(void) | ||
36 | { | 62 | { |
37 | CPUState *cpu; | 63 | unpack_raw64(p, &float16_params, f); |
38 | do { | ||
39 | @@ -XXX,XX +XXX,XX @@ static void qemu_cpu_kick_rr_cpu(void) | ||
40 | } while (cpu != atomic_mb_read(&tcg_current_rr_cpu)); | ||
41 | } | 64 | } |
42 | 65 | ||
43 | +/* Kick all RR vCPUs */ | 66 | -static inline void bfloat16_unpack_raw(FloatParts64 *p, bfloat16 f) |
44 | +static void qemu_cpu_kick_rr_cpus(void) | 67 | +static void QEMU_FLATTEN bfloat16_unpack_raw(FloatParts64 *p, bfloat16 f) |
45 | +{ | ||
46 | + CPUState *cpu; | ||
47 | + | ||
48 | + CPU_FOREACH(cpu) { | ||
49 | + cpu_exit(cpu); | ||
50 | + }; | ||
51 | +} | ||
52 | + | ||
53 | static void do_nothing(CPUState *cpu, run_on_cpu_data unused) | ||
54 | { | 68 | { |
69 | unpack_raw64(p, &bfloat16_params, f); | ||
55 | } | 70 | } |
56 | @@ -XXX,XX +XXX,XX @@ void qemu_timer_notify_cb(void *opaque, QEMUClockType type) | 71 | |
57 | static void kick_tcg_thread(void *opaque) | 72 | -static inline void float32_unpack_raw(FloatParts64 *p, float32 f) |
73 | +static void QEMU_FLATTEN float32_unpack_raw(FloatParts64 *p, float32 f) | ||
58 | { | 74 | { |
59 | timer_mod(tcg_kick_vcpu_timer, qemu_tcg_next_kick()); | 75 | unpack_raw64(p, &float32_params, f); |
60 | - qemu_cpu_kick_rr_cpu(); | ||
61 | + qemu_cpu_kick_rr_next_cpu(); | ||
62 | } | 76 | } |
63 | 77 | ||
64 | static void start_tcg_kick_timer(void) | 78 | -static inline void float64_unpack_raw(FloatParts64 *p, float64 f) |
65 | @@ -XXX,XX +XXX,XX @@ void qemu_cpu_kick(CPUState *cpu) | 79 | +static void QEMU_FLATTEN float64_unpack_raw(FloatParts64 *p, float64 f) |
66 | { | 80 | { |
67 | qemu_cond_broadcast(cpu->halt_cond); | 81 | unpack_raw64(p, &float64_params, f); |
68 | if (tcg_enabled()) { | 82 | } |
69 | - cpu_exit(cpu); | 83 | |
70 | - /* NOP unless doing single-thread RR */ | 84 | -static void floatx80_unpack_raw(FloatParts128 *p, floatx80 f) |
71 | - qemu_cpu_kick_rr_cpu(); | 85 | +static void QEMU_FLATTEN floatx80_unpack_raw(FloatParts128 *p, floatx80 f) |
72 | + if (qemu_tcg_mttcg_enabled()) { | 86 | { |
73 | + cpu_exit(cpu); | 87 | *p = (FloatParts128) { |
74 | + } else { | 88 | .cls = float_class_unclassified, |
75 | + qemu_cpu_kick_rr_cpus(); | 89 | @@ -XXX,XX +XXX,XX @@ static void floatx80_unpack_raw(FloatParts128 *p, floatx80 f) |
76 | + } | 90 | }; |
77 | } else { | 91 | } |
78 | if (hax_enabled()) { | 92 | |
79 | /* | 93 | -static void float128_unpack_raw(FloatParts128 *p, float128 f) |
94 | +static void QEMU_FLATTEN float128_unpack_raw(FloatParts128 *p, float128 f) | ||
95 | { | ||
96 | const int f_size = float128_params.frac_size - 64; | ||
97 | const int e_size = float128_params.exp_size; | ||
98 | @@ -XXX,XX +XXX,XX @@ static uint64_t pack_raw64(const FloatParts64 *p, const FloatFmt *fmt) | ||
99 | return ret; | ||
100 | } | ||
101 | |||
102 | -static inline float16 float16_pack_raw(const FloatParts64 *p) | ||
103 | +static float16 QEMU_FLATTEN float16_pack_raw(const FloatParts64 *p) | ||
104 | { | ||
105 | return make_float16(pack_raw64(p, &float16_params)); | ||
106 | } | ||
107 | |||
108 | -static inline bfloat16 bfloat16_pack_raw(const FloatParts64 *p) | ||
109 | +static bfloat16 QEMU_FLATTEN bfloat16_pack_raw(const FloatParts64 *p) | ||
110 | { | ||
111 | return pack_raw64(p, &bfloat16_params); | ||
112 | } | ||
113 | |||
114 | -static inline float32 float32_pack_raw(const FloatParts64 *p) | ||
115 | +static float32 QEMU_FLATTEN float32_pack_raw(const FloatParts64 *p) | ||
116 | { | ||
117 | return make_float32(pack_raw64(p, &float32_params)); | ||
118 | } | ||
119 | |||
120 | -static inline float64 float64_pack_raw(const FloatParts64 *p) | ||
121 | +static float64 QEMU_FLATTEN float64_pack_raw(const FloatParts64 *p) | ||
122 | { | ||
123 | return make_float64(pack_raw64(p, &float64_params)); | ||
124 | } | ||
125 | |||
126 | -static float128 float128_pack_raw(const FloatParts128 *p) | ||
127 | +static float128 QEMU_FLATTEN float128_pack_raw(const FloatParts128 *p) | ||
128 | { | ||
129 | const int f_size = float128_params.frac_size - 64; | ||
130 | const int e_size = float128_params.exp_size; | ||
80 | -- | 131 | -- |
81 | 2.17.1 | 132 | 2.34.1 |
82 | 133 | ||
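To make the hang scenario from the kick patch concrete, a simplified sequence (an illustration, not code from the tree):

    /* iothread, while the round-robin kick timer is paused: */
    run_on_cpu(cpu, do_work, RUN_ON_CPU_NULL);   /* queue work + kick */

    /* Before the fix, the kick merely advanced the scheduler to the
     * next vCPU.  With no vCPU currently executing and the kick timer
     * stopped, nothing ever woke the target vCPU to drain its work
     * queue, so run_on_cpu() blocked forever.  Kicking every vCPU
     * (cpu_exit() on each) guarantees that one wakes up. */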
1 | Add support for vector add/subtract using Altivec instructions: | 1 | This is a perfectly natural occurrence for x86 "rep movsb",
---|---|---|---|
2 | VADDUBM, VADDUHM, VADDUWM, VSUBUBM, VSUBUHM, VSUBUWM. | 2 | where the "rep" prefix forms a counted loop of the one insn. |
3 | |||
4 | During the tests/tcg/multiarch/memory test, this logging is | ||
5 | triggered over 350000 times. Within the context of cross-i386-tci | ||
6 | build, which is already slow by nature, the logging is sufficient | ||
7 | to push the test into timeout. | ||
3 | 8 | ||
4 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 9 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> |
5 | Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com> | ||
6 | --- | 10 | --- |
7 | tcg/ppc/tcg-target.inc.c | 20 ++++++++++++++++++++ | 11 | tests/plugin/insn.c | 9 +-------- |
8 | 1 file changed, 20 insertions(+) | 12 | tests/tcg/i386/Makefile.softmmu-target | 9 --------- |
13 | tests/tcg/i386/Makefile.target | 6 ------ | ||
14 | tests/tcg/x86_64/Makefile.softmmu-target | 9 --------- | ||
15 | 4 files changed, 1 insertion(+), 32 deletions(-) | ||
9 | 16 | ||
10 | diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c | 17 | diff --git a/tests/plugin/insn.c b/tests/plugin/insn.c |
11 | index XXXXXXX..XXXXXXX 100644 | 18 | index XXXXXXX..XXXXXXX 100644 |
12 | --- a/tcg/ppc/tcg-target.inc.c | 19 | --- a/tests/plugin/insn.c |
13 | +++ b/tcg/ppc/tcg-target.inc.c | 20 | +++ b/tests/plugin/insn.c |
14 | @@ -XXX,XX +XXX,XX @@ static int tcg_target_const_match(tcg_target_long val, TCGType type, | 21 | @@ -XXX,XX +XXX,XX @@ QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION; |
15 | #define STVX XO31(231) | 22 | #define MAX_CPUS 8 /* lets not go nuts */ |
16 | #define STVEWX XO31(199) | 23 | |
17 | 24 | typedef struct { | |
18 | +#define VADDUBM VX4(0) | 25 | - uint64_t last_pc; |
19 | +#define VADDUHM VX4(64) | 26 | uint64_t insn_count; |
20 | +#define VADDUWM VX4(128) | 27 | } InstructionCount; |
28 | |||
29 | @@ -XXX,XX +XXX,XX @@ static void vcpu_insn_exec_before(unsigned int cpu_index, void *udata) | ||
30 | { | ||
31 | unsigned int i = cpu_index % MAX_CPUS; | ||
32 | InstructionCount *c = &counts[i]; | ||
33 | - uint64_t this_pc = GPOINTER_TO_UINT(udata); | ||
34 | - if (this_pc == c->last_pc) { | ||
35 | - g_autofree gchar *out = g_strdup_printf("detected repeat execution @ 0x%" | ||
36 | - PRIx64 "\n", this_pc); | ||
37 | - qemu_plugin_outs(out); | ||
38 | - } | ||
39 | - c->last_pc = this_pc; | ||
21 | + | 40 | + |
22 | +#define VSUBUBM VX4(1024) | 41 | c->insn_count++; |
23 | +#define VSUBUHM VX4(1088) | 42 | } |
24 | +#define VSUBUWM VX4(1152) | 43 | |
25 | + | 44 | diff --git a/tests/tcg/i386/Makefile.softmmu-target b/tests/tcg/i386/Makefile.softmmu-target |
26 | #define VMAXSB VX4(258) | 45 | index XXXXXXX..XXXXXXX 100644 |
27 | #define VMAXSH VX4(322) | 46 | --- a/tests/tcg/i386/Makefile.softmmu-target |
28 | #define VMAXSW VX4(386) | 47 | +++ b/tests/tcg/i386/Makefile.softmmu-target |
29 | @@ -XXX,XX +XXX,XX @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece) | 48 | @@ -XXX,XX +XXX,XX @@ EXTRA_RUNS+=$(MULTIARCH_RUNS) |
30 | case INDEX_op_andc_vec: | 49 | |
31 | case INDEX_op_not_vec: | 50 | memory: CFLAGS+=-DCHECK_UNALIGNED=1 |
32 | return 1; | 51 | |
33 | + case INDEX_op_add_vec: | 52 | -# non-inline runs will trigger the duplicate instruction heuristics in libinsn.so |
34 | + case INDEX_op_sub_vec: | 53 | -run-plugin-%-with-libinsn.so: |
35 | case INDEX_op_smax_vec: | 54 | - $(call run-test, $@, \ |
36 | case INDEX_op_smin_vec: | 55 | - $(QEMU) -monitor none -display none \ |
37 | case INDEX_op_umax_vec: | 56 | - -chardev file$(COMMA)path=$@.out$(COMMA)id=output \ |
38 | @@ -XXX,XX +XXX,XX @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, | 57 | - -plugin ../../plugin/libinsn.so$(COMMA)inline=on \ |
39 | const TCGArg *args, const int *const_args) | 58 | - -d plugin -D $*-with-libinsn.so.pout \ |
40 | { | 59 | - $(QEMU_OPTS) $*) |
41 | static const uint32_t | 60 | - |
42 | + add_op[4] = { VADDUBM, VADDUHM, VADDUWM, 0 }, | 61 | # Running |
43 | + sub_op[4] = { VSUBUBM, VSUBUHM, VSUBUWM, 0 }, | 62 | QEMU_OPTS+=-device isa-debugcon,chardev=output -device isa-debug-exit,iobase=0xf4,iosize=0x4 -kernel |
44 | eq_op[4] = { VCMPEQUB, VCMPEQUH, VCMPEQUW, 0 }, | 63 | diff --git a/tests/tcg/i386/Makefile.target b/tests/tcg/i386/Makefile.target |
45 | gts_op[4] = { VCMPGTSB, VCMPGTSH, VCMPGTSW, 0 }, | 64 | index XXXXXXX..XXXXXXX 100644 |
46 | gtu_op[4] = { VCMPGTUB, VCMPGTUH, VCMPGTUW, 0 }, | 65 | --- a/tests/tcg/i386/Makefile.target |
47 | @@ -XXX,XX +XXX,XX @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, | 66 | +++ b/tests/tcg/i386/Makefile.target |
48 | tcg_out_dupm_vec(s, type, vece, a0, a1, a2); | 67 | @@ -XXX,XX +XXX,XX @@ else |
49 | return; | 68 | SKIP_I386_TESTS+=test-i386-fprem |
50 | 69 | endif | |
51 | + case INDEX_op_add_vec: | 70 | |
52 | + insn = add_op[vece]; | 71 | -# non-inline runs will trigger the duplicate instruction heuristics in libinsn.so |
53 | + break; | 72 | -run-plugin-%-with-libinsn.so: |
54 | + case INDEX_op_sub_vec: | 73 | - $(call run-test, $@, $(QEMU) $(QEMU_OPTS) \ |
55 | + insn = sub_op[vece]; | 74 | - -plugin ../../plugin/libinsn.so$(COMMA)inline=on \ |
56 | + break; | 75 | - -d plugin -D $*-with-libinsn.so.pout $*) |
57 | case INDEX_op_smin_vec: | 76 | - |
58 | insn = smin_op[vece]; | 77 | # Update TESTS |
59 | break; | 78 | I386_TESTS:=$(filter-out $(SKIP_I386_TESTS), $(ALL_X86_TESTS)) |
60 | @@ -XXX,XX +XXX,XX @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op) | 79 | TESTS=$(MULTIARCH_TESTS) $(I386_TESTS) |
61 | return (TCG_TARGET_REG_BITS == 64 ? &S_S | 80 | diff --git a/tests/tcg/x86_64/Makefile.softmmu-target b/tests/tcg/x86_64/Makefile.softmmu-target |
62 | : TARGET_LONG_BITS == 32 ? &S_S_S : &S_S_S_S); | 81 | index XXXXXXX..XXXXXXX 100644 |
63 | 82 | --- a/tests/tcg/x86_64/Makefile.softmmu-target | |
64 | + case INDEX_op_add_vec: | 83 | +++ b/tests/tcg/x86_64/Makefile.softmmu-target |
65 | + case INDEX_op_sub_vec: | 84 | @@ -XXX,XX +XXX,XX @@ EXTRA_RUNS+=$(MULTIARCH_RUNS) |
66 | case INDEX_op_and_vec: | 85 | |
67 | case INDEX_op_or_vec: | 86 | memory: CFLAGS+=-DCHECK_UNALIGNED=1 |
68 | case INDEX_op_xor_vec: | 87 | |
88 | -# non-inline runs will trigger the duplicate instruction heuristics in libinsn.so | ||
89 | -run-plugin-%-with-libinsn.so: | ||
90 | - $(call run-test, $@, \ | ||
91 | - $(QEMU) -monitor none -display none \ | ||
92 | - -chardev file$(COMMA)path=$@.out$(COMMA)id=output \ | ||
93 | - -plugin ../../plugin/libinsn.so$(COMMA)inline=on \ | ||
94 | - -d plugin -D $*-with-libinsn.so.pout \ | ||
95 | - $(QEMU_OPTS) $*) | ||
96 | - | ||
97 | # Running | ||
98 | QEMU_OPTS+=-device isa-debugcon,chardev=output -device isa-debug-exit,iobase=0xf4,iosize=0x4 -kernel | ||
69 | -- | 99 | -- |
70 | 2.17.1 | 100 | 2.34.1 |
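On the VX4() values used throughout the Altivec patches on the left: a sketch of the encoding assumption. VX-form Altivec instructions all sit under primary opcode 4 with an 11-bit extended opcode in the low bits, so the table entries reduce to:

    #define OPCD(opc)  ((opc) << 26)
    #define VX4(opc)   (OPCD(4) | (opc))   /* e.g. vadduwm is VX4(128) */

    /* A complete instruction word then ORs in the register fields:
     * vadduwm vD,vA,vB  ==  VX4(128) | VRT(vD) | VRA(vA) | VRB(vB). */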
71 | |||
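And for the plugin patch on the right, a guest-side illustration of the "one insn, many executions" pattern (my example, not from the test suite):

    /* One x86 instruction at one PC copies n bytes: the CPU repeats
     * "rep movsb" internally, so a per-insn plugin callback fires
     * with the same PC once per byte -- legitimate repetition, not
     * a plugin bug. */
    static void copy_bytes(void *dst, const void *src, unsigned long n)
    {
        asm volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :: "memory");
    }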
1 | Previously we've been hard-coding knowledge that Power7 has ISEL, but | 1 | From: Fei Wu <fei2.wu@intel.com> |
---|---|---|---|
2 | it was an optional instruction before that. Use the AT_HWCAP2 bit, | ||
3 | when present, to properly determine support. | 3 | TBStats will be introduced to replace CONFIG_PROFILER entirely; here,
4 | 4 | remove all CONFIG_PROFILER-related code first.
5 | Reviewed-by: Aleksandar Markovic <amarkovic@wavecomp.com> | 3 | TBStats will be introduced to replace CONFIG_PROFILER totally, here |
4 | remove all CONFIG_PROFILER related stuffs first. | ||
5 | |||
6 | Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com> | ||
7 | Signed-off-by: Alex Bennée <alex.bennee@linaro.org> | ||
8 | Signed-off-by: Fei Wu <fei2.wu@intel.com> | ||
9 | Reviewed-by: Richard Henderson <richard.henderson@linaro.org> | ||
10 | Message-Id: <20230607122411.3394702-2-fei2.wu@intel.com> | ||
6 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 11 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> |
7 | --- | 12 | --- |
8 | tcg/ppc/tcg-target.inc.c | 17 ++++++++++++----- | 13 | meson.build | 2 - |
9 | 1 file changed, 12 insertions(+), 5 deletions(-) | 14 | qapi/machine.json | 18 --- |
15 | include/qemu/timer.h | 9 -- | ||
16 | include/tcg/tcg.h | 26 ----- | ||
17 | accel/tcg/monitor.c | 31 ----- | ||
18 | accel/tcg/tcg-accel-ops.c | 10 -- | ||
19 | accel/tcg/translate-all.c | 33 ------ | ||
20 | softmmu/runstate.c | 9 -- | ||
21 | tcg/tcg.c | 214 ---------------------------------- | ||
22 | tests/qtest/qmp-cmd-test.c | 3 - | ||
23 | hmp-commands-info.hx | 15 --- | ||
24 | meson_options.txt | 2 - | ||
25 | scripts/meson-buildoptions.sh | 3 - | ||
26 | 13 files changed, 375 deletions(-) | ||
10 | 27 | ||
11 | diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c | 28 | diff --git a/meson.build b/meson.build |
12 | index XXXXXXX..XXXXXXX 100644 | 29 | index XXXXXXX..XXXXXXX 100644 |
13 | --- a/tcg/ppc/tcg-target.inc.c | 30 | --- a/meson.build |
14 | +++ b/tcg/ppc/tcg-target.inc.c | 31 | +++ b/meson.build |
32 | @@ -XXX,XX +XXX,XX @@ if numa.found() | ||
33 | dependencies: numa)) | ||
34 | endif | ||
35 | config_host_data.set('CONFIG_OPENGL', opengl.found()) | ||
36 | -config_host_data.set('CONFIG_PROFILER', get_option('profiler')) | ||
37 | config_host_data.set('CONFIG_RBD', rbd.found()) | ||
38 | config_host_data.set('CONFIG_RDMA', rdma.found()) | ||
39 | config_host_data.set('CONFIG_SAFESTACK', get_option('safe_stack')) | ||
40 | @@ -XXX,XX +XXX,XX @@ if 'objc' in all_languages | ||
41 | summary_info += {'QEMU_OBJCFLAGS': ' '.join(qemu_common_flags)} | ||
42 | endif | ||
43 | summary_info += {'QEMU_LDFLAGS': ' '.join(qemu_ldflags)} | ||
44 | -summary_info += {'profiler': get_option('profiler')} | ||
45 | summary_info += {'link-time optimization (LTO)': get_option('b_lto')} | ||
46 | summary_info += {'PIE': get_option('b_pie')} | ||
47 | summary_info += {'static build': get_option('prefer_static')} | ||
48 | diff --git a/qapi/machine.json b/qapi/machine.json | ||
49 | index XXXXXXX..XXXXXXX 100644 | ||
50 | --- a/qapi/machine.json | ||
51 | +++ b/qapi/machine.json | ||
15 | @@ -XXX,XX +XXX,XX @@ | 52 | @@ -XXX,XX +XXX,XX @@ |
16 | static tcg_insn_unit *tb_ret_addr; | 53 | 'if': 'CONFIG_TCG', |
17 | 54 | 'features': [ 'unstable' ] } | |
18 | TCGPowerISA have_isa; | 55 | |
19 | - | 56 | -## |
20 | -#define HAVE_ISEL have_isa_2_06 | 57 | -# @x-query-profile: |
21 | +static bool have_isel; | 58 | -# |
22 | 59 | -# Query TCG profiling information | |
23 | #ifndef CONFIG_SOFTMMU | 60 | -# |
24 | #define TCG_GUEST_BASE_REG 30 | 61 | -# Features: |
25 | @@ -XXX,XX +XXX,XX @@ static void tcg_out_setcond(TCGContext *s, TCGType type, TCGCond cond, | 62 | -# |
26 | /* If we have ISEL, we can implement everything with 3 or 4 insns. | 63 | -# @unstable: This command is meant for debugging. |
27 | All other cases below are also at least 3 insns, so speed up the | 64 | -# |
28 | code generator by not considering them and always using ISEL. */ | 65 | -# Returns: profile information |
29 | - if (HAVE_ISEL) { | 66 | -# |
30 | + if (have_isel) { | 67 | -# Since: 6.2 |
31 | int isel, tab; | 68 | -## |
32 | 69 | -{ 'command': 'x-query-profile', | |
33 | tcg_out_cmp(s, cond, arg1, arg2, const_arg2, 7, type); | 70 | - 'returns': 'HumanReadableText', |
34 | @@ -XXX,XX +XXX,XX @@ static void tcg_out_movcond(TCGContext *s, TCGType type, TCGCond cond, | 71 | - 'if': 'CONFIG_TCG', |
35 | 72 | - 'features': [ 'unstable' ] } | |
36 | tcg_out_cmp(s, cond, c1, c2, const_c2, 7, type); | 73 | - |
37 | 74 | ## | |
38 | - if (HAVE_ISEL) { | 75 | # @x-query-ramblock: |
39 | + if (have_isel) { | 76 | # |
40 | int isel = tcg_to_isel[cond]; | 77 | diff --git a/include/qemu/timer.h b/include/qemu/timer.h |
41 | 78 | index XXXXXXX..XXXXXXX 100644 | |
42 | /* Swap the V operands if the operation indicates inversion. */ | 79 | --- a/include/qemu/timer.h |
43 | @@ -XXX,XX +XXX,XX @@ static void tcg_out_cntxz(TCGContext *s, TCGType type, uint32_t opc, | 80 | +++ b/include/qemu/timer.h |
44 | } else { | 81 | @@ -XXX,XX +XXX,XX @@ static inline int64_t cpu_get_host_ticks(void) |
45 | tcg_out_cmp(s, TCG_COND_EQ, a1, 0, 1, 7, type); | 82 | } |
46 | /* Note that the only other valid constant for a2 is 0. */ | 83 | #endif |
47 | - if (HAVE_ISEL) { | 84 | |
48 | + if (have_isel) { | 85 | -#ifdef CONFIG_PROFILER |
49 | tcg_out32(s, opc | RA(TCG_REG_R0) | RS(a1)); | 86 | -static inline int64_t profile_getclock(void) |
50 | tcg_out32(s, tcg_to_isel[TCG_COND_EQ] | TAB(a0, a2, TCG_REG_R0)); | 87 | -{ |
51 | } else if (!const_a2 && a0 == a2) { | 88 | - return get_clock(); |
52 | @@ -XXX,XX +XXX,XX @@ static void tcg_target_init(TCGContext *s) | 89 | -} |
90 | - | ||
91 | -extern int64_t dev_time; | ||
92 | -#endif | ||
93 | - | ||
94 | #endif | ||
95 | diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h | ||
96 | index XXXXXXX..XXXXXXX 100644 | ||
97 | --- a/include/tcg/tcg.h | ||
98 | +++ b/include/tcg/tcg.h | ||
99 | @@ -XXX,XX +XXX,XX @@ static inline TCGRegSet output_pref(const TCGOp *op, unsigned i) | ||
100 | return i < ARRAY_SIZE(op->output_pref) ? op->output_pref[i] : 0; | ||
101 | } | ||
102 | |||
103 | -typedef struct TCGProfile { | ||
104 | - int64_t cpu_exec_time; | ||
105 | - int64_t tb_count1; | ||
106 | - int64_t tb_count; | ||
107 | - int64_t op_count; /* total insn count */ | ||
108 | - int op_count_max; /* max insn per TB */ | ||
109 | - int temp_count_max; | ||
110 | - int64_t temp_count; | ||
111 | - int64_t del_op_count; | ||
112 | - int64_t code_in_len; | ||
113 | - int64_t code_out_len; | ||
114 | - int64_t search_out_len; | ||
115 | - int64_t interm_time; | ||
116 | - int64_t code_time; | ||
117 | - int64_t la_time; | ||
118 | - int64_t opt_time; | ||
119 | - int64_t restore_count; | ||
120 | - int64_t restore_time; | ||
121 | - int64_t table_op_count[NB_OPS]; | ||
122 | -} TCGProfile; | ||
123 | - | ||
124 | struct TCGContext { | ||
125 | uint8_t *pool_cur, *pool_end; | ||
126 | TCGPool *pool_first, *pool_current, *pool_first_large; | ||
127 | @@ -XXX,XX +XXX,XX @@ struct TCGContext { | ||
128 | tcg_insn_unit *code_buf; /* pointer for start of tb */ | ||
129 | tcg_insn_unit *code_ptr; /* pointer for running end of tb */ | ||
130 | |||
131 | -#ifdef CONFIG_PROFILER | ||
132 | - TCGProfile prof; | ||
133 | -#endif | ||
134 | - | ||
135 | #ifdef CONFIG_DEBUG_TCG | ||
136 | int goto_tb_issue_mask; | ||
137 | const TCGOpcode *vecop_list; | ||
138 | @@ -XXX,XX +XXX,XX @@ static inline TCGv_ptr tcg_temp_new_ptr(void) | ||
139 | return temp_tcgv_ptr(t); | ||
140 | } | ||
141 | |||
142 | -int64_t tcg_cpu_exec_time(void); | ||
143 | void tcg_dump_info(GString *buf); | ||
144 | void tcg_dump_op_count(GString *buf); | ||
145 | |||
146 | diff --git a/accel/tcg/monitor.c b/accel/tcg/monitor.c | ||
147 | index XXXXXXX..XXXXXXX 100644 | ||
148 | --- a/accel/tcg/monitor.c | ||
149 | +++ b/accel/tcg/monitor.c | ||
150 | @@ -XXX,XX +XXX,XX @@ HumanReadableText *qmp_x_query_opcount(Error **errp) | ||
151 | return human_readable_text_from_str(buf); | ||
152 | } | ||
153 | |||
154 | -#ifdef CONFIG_PROFILER | ||
155 | - | ||
156 | -int64_t dev_time; | ||
157 | - | ||
158 | -HumanReadableText *qmp_x_query_profile(Error **errp) | ||
159 | -{ | ||
160 | - g_autoptr(GString) buf = g_string_new(""); | ||
161 | - static int64_t last_cpu_exec_time; | ||
162 | - int64_t cpu_exec_time; | ||
163 | - int64_t delta; | ||
164 | - | ||
165 | - cpu_exec_time = tcg_cpu_exec_time(); | ||
166 | - delta = cpu_exec_time - last_cpu_exec_time; | ||
167 | - | ||
168 | - g_string_append_printf(buf, "async time %" PRId64 " (%0.3f)\n", | ||
169 | - dev_time, dev_time / (double)NANOSECONDS_PER_SECOND); | ||
170 | - g_string_append_printf(buf, "qemu time %" PRId64 " (%0.3f)\n", | ||
171 | - delta, delta / (double)NANOSECONDS_PER_SECOND); | ||
172 | - last_cpu_exec_time = cpu_exec_time; | ||
173 | - dev_time = 0; | ||
174 | - | ||
175 | - return human_readable_text_from_str(buf); | ||
176 | -} | ||
177 | -#else | ||
178 | -HumanReadableText *qmp_x_query_profile(Error **errp) | ||
179 | -{ | ||
180 | - error_setg(errp, "Internal profiler not compiled"); | ||
181 | - return NULL; | ||
182 | -} | ||
183 | -#endif | ||
184 | - | ||
185 | static void hmp_tcg_register(void) | ||
186 | { | ||
187 | monitor_register_hmp_info_hrt("jit", qmp_x_query_jit); | ||
188 | diff --git a/accel/tcg/tcg-accel-ops.c b/accel/tcg/tcg-accel-ops.c | ||
189 | index XXXXXXX..XXXXXXX 100644 | ||
190 | --- a/accel/tcg/tcg-accel-ops.c | ||
191 | +++ b/accel/tcg/tcg-accel-ops.c | ||
192 | @@ -XXX,XX +XXX,XX @@ void tcg_cpus_destroy(CPUState *cpu) | ||
193 | int tcg_cpus_exec(CPUState *cpu) | ||
194 | { | ||
195 | int ret; | ||
196 | -#ifdef CONFIG_PROFILER | ||
197 | - int64_t ti; | ||
198 | -#endif | ||
199 | assert(tcg_enabled()); | ||
200 | -#ifdef CONFIG_PROFILER | ||
201 | - ti = profile_getclock(); | ||
202 | -#endif | ||
203 | cpu_exec_start(cpu); | ||
204 | ret = cpu_exec(cpu); | ||
205 | cpu_exec_end(cpu); | ||
206 | -#ifdef CONFIG_PROFILER | ||
207 | - qatomic_set(&tcg_ctx->prof.cpu_exec_time, | ||
208 | - tcg_ctx->prof.cpu_exec_time + profile_getclock() - ti); | ||
209 | -#endif | ||
210 | return ret; | ||
211 | } | ||
212 | |||
213 | diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c | ||
214 | index XXXXXXX..XXXXXXX 100644 | ||
215 | --- a/accel/tcg/translate-all.c | ||
216 | +++ b/accel/tcg/translate-all.c | ||
217 | @@ -XXX,XX +XXX,XX @@ void cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb, | ||
218 | uintptr_t host_pc) | ||
219 | { | ||
220 | uint64_t data[TARGET_INSN_START_WORDS]; | ||
221 | -#ifdef CONFIG_PROFILER | ||
222 | - TCGProfile *prof = &tcg_ctx->prof; | ||
223 | - int64_t ti = profile_getclock(); | ||
224 | -#endif | ||
225 | int insns_left = cpu_unwind_data_from_tb(tb, host_pc, data); | ||
226 | |||
227 | if (insns_left < 0) { | ||
228 | @@ -XXX,XX +XXX,XX @@ void cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb, | ||
229 | } | ||
230 | |||
231 | cpu->cc->tcg_ops->restore_state_to_opc(cpu, tb, data); | ||
232 | - | ||
233 | -#ifdef CONFIG_PROFILER | ||
234 | - qatomic_set(&prof->restore_time, | ||
235 | - prof->restore_time + profile_getclock() - ti); | ||
236 | - qatomic_set(&prof->restore_count, prof->restore_count + 1); | ||
237 | -#endif | ||
238 | } | ||
239 | |||
240 | bool cpu_restore_state(CPUState *cpu, uintptr_t host_pc) | ||
241 | @@ -XXX,XX +XXX,XX @@ static int setjmp_gen_code(CPUArchState *env, TranslationBlock *tb, | ||
242 | tcg_ctx->cpu = NULL; | ||
243 | *max_insns = tb->icount; | ||
244 | |||
245 | -#ifdef CONFIG_PROFILER | ||
246 | - qatomic_set(&tcg_ctx->prof.tb_count, tcg_ctx->prof.tb_count + 1); | ||
247 | - qatomic_set(&tcg_ctx->prof.interm_time, | ||
248 | - tcg_ctx->prof.interm_time + profile_getclock() - *ti); | ||
249 | - *ti = profile_getclock(); | ||
250 | -#endif | ||
251 | - | ||
252 | return tcg_gen_code(tcg_ctx, tb, pc); | ||
253 | } | ||
254 | |||
255 | @@ -XXX,XX +XXX,XX @@ TranslationBlock *tb_gen_code(CPUState *cpu, | ||
256 | tb_page_addr_t phys_pc; | ||
257 | tcg_insn_unit *gen_code_buf; | ||
258 | int gen_code_size, search_size, max_insns; | ||
259 | -#ifdef CONFIG_PROFILER | ||
260 | - TCGProfile *prof = &tcg_ctx->prof; | ||
261 | -#endif | ||
262 | int64_t ti; | ||
263 | void *host_pc; | ||
264 | |||
265 | @@ -XXX,XX +XXX,XX @@ TranslationBlock *tb_gen_code(CPUState *cpu, | ||
266 | |||
267 | tb_overflow: | ||
268 | |||
269 | -#ifdef CONFIG_PROFILER | ||
270 | - /* includes aborted translations because of exceptions */ | ||
271 | - qatomic_set(&prof->tb_count1, prof->tb_count1 + 1); | ||
272 | - ti = profile_getclock(); | ||
273 | -#endif | ||
274 | - | ||
275 | trace_translate_block(tb, pc, tb->tc.ptr); | ||
276 | |||
277 | gen_code_size = setjmp_gen_code(env, tb, pc, host_pc, &max_insns, &ti); | ||
278 | @@ -XXX,XX +XXX,XX @@ TranslationBlock *tb_gen_code(CPUState *cpu, | ||
279 | */ | ||
280 | perf_report_code(pc, tb, tcg_splitwx_to_rx(gen_code_buf)); | ||
281 | |||
282 | -#ifdef CONFIG_PROFILER | ||
283 | - qatomic_set(&prof->code_time, prof->code_time + profile_getclock() - ti); | ||
284 | - qatomic_set(&prof->code_in_len, prof->code_in_len + tb->size); | ||
285 | - qatomic_set(&prof->code_out_len, prof->code_out_len + gen_code_size); | ||
286 | - qatomic_set(&prof->search_out_len, prof->search_out_len + search_size); | ||
287 | -#endif | ||
288 | - | ||
289 | if (qemu_loglevel_mask(CPU_LOG_TB_OUT_ASM) && | ||
290 | qemu_log_in_addr_range(pc)) { | ||
291 | FILE *logfile = qemu_log_trylock(); | ||
292 | diff --git a/softmmu/runstate.c b/softmmu/runstate.c | ||
293 | index XXXXXXX..XXXXXXX 100644 | ||
294 | --- a/softmmu/runstate.c | ||
295 | +++ b/softmmu/runstate.c | ||
296 | @@ -XXX,XX +XXX,XX @@ static bool main_loop_should_exit(int *status) | ||
297 | int qemu_main_loop(void) | ||
298 | { | ||
299 | int status = EXIT_SUCCESS; | ||
300 | -#ifdef CONFIG_PROFILER | ||
301 | - int64_t ti; | ||
302 | -#endif | ||
303 | |||
304 | while (!main_loop_should_exit(&status)) { | ||
305 | -#ifdef CONFIG_PROFILER | ||
306 | - ti = profile_getclock(); | ||
307 | -#endif | ||
308 | main_loop_wait(false); | ||
309 | -#ifdef CONFIG_PROFILER | ||
310 | - dev_time += profile_getclock() - ti; | ||
311 | -#endif | ||
312 | } | ||
313 | |||
314 | return status; | ||
315 | diff --git a/tcg/tcg.c b/tcg/tcg.c | ||
316 | index XXXXXXX..XXXXXXX 100644 | ||
317 | --- a/tcg/tcg.c | ||
318 | +++ b/tcg/tcg.c | ||
319 | @@ -XXX,XX +XXX,XX @@ void tcg_op_remove(TCGContext *s, TCGOp *op) | ||
320 | QTAILQ_REMOVE(&s->ops, op, link); | ||
321 | QTAILQ_INSERT_TAIL(&s->free_ops, op, link); | ||
322 | s->nb_ops--; | ||
323 | - | ||
324 | -#ifdef CONFIG_PROFILER | ||
325 | - qatomic_set(&s->prof.del_op_count, s->prof.del_op_count + 1); | ||
326 | -#endif | ||
327 | } | ||
328 | |||
329 | void tcg_remove_ops_after(TCGOp *op) | ||
330 | @@ -XXX,XX +XXX,XX @@ static void tcg_out_st_helper_args(TCGContext *s, const TCGLabelQemuLdst *ldst, | ||
331 | tcg_out_helper_load_common_args(s, ldst, parm, info, next_arg); | ||
332 | } | ||
333 | |||
334 | -#ifdef CONFIG_PROFILER | ||
335 | - | ||
336 | -/* avoid copy/paste errors */ | ||
337 | -#define PROF_ADD(to, from, field) \ | ||
338 | - do { \ | ||
339 | - (to)->field += qatomic_read(&((from)->field)); \ | ||
340 | - } while (0) | ||
341 | - | ||
342 | -#define PROF_MAX(to, from, field) \ | ||
343 | - do { \ | ||
344 | - typeof((from)->field) val__ = qatomic_read(&((from)->field)); \ | ||
345 | - if (val__ > (to)->field) { \ | ||
346 | - (to)->field = val__; \ | ||
347 | - } \ | ||
348 | - } while (0) | ||
349 | - | ||
350 | -/* Pass in a zero'ed @prof */ | ||
351 | -static inline | ||
352 | -void tcg_profile_snapshot(TCGProfile *prof, bool counters, bool table) | ||
353 | -{ | ||
354 | - unsigned int n_ctxs = qatomic_read(&tcg_cur_ctxs); | ||
355 | - unsigned int i; | ||
356 | - | ||
357 | - for (i = 0; i < n_ctxs; i++) { | ||
358 | - TCGContext *s = qatomic_read(&tcg_ctxs[i]); | ||
359 | - const TCGProfile *orig = &s->prof; | ||
360 | - | ||
361 | - if (counters) { | ||
362 | - PROF_ADD(prof, orig, cpu_exec_time); | ||
363 | - PROF_ADD(prof, orig, tb_count1); | ||
364 | - PROF_ADD(prof, orig, tb_count); | ||
365 | - PROF_ADD(prof, orig, op_count); | ||
366 | - PROF_MAX(prof, orig, op_count_max); | ||
367 | - PROF_ADD(prof, orig, temp_count); | ||
368 | - PROF_MAX(prof, orig, temp_count_max); | ||
369 | - PROF_ADD(prof, orig, del_op_count); | ||
370 | - PROF_ADD(prof, orig, code_in_len); | ||
371 | - PROF_ADD(prof, orig, code_out_len); | ||
372 | - PROF_ADD(prof, orig, search_out_len); | ||
373 | - PROF_ADD(prof, orig, interm_time); | ||
374 | - PROF_ADD(prof, orig, code_time); | ||
375 | - PROF_ADD(prof, orig, la_time); | ||
376 | - PROF_ADD(prof, orig, opt_time); | ||
377 | - PROF_ADD(prof, orig, restore_count); | ||
378 | - PROF_ADD(prof, orig, restore_time); | ||
379 | - } | ||
380 | - if (table) { | ||
381 | - int i; | ||
382 | - | ||
383 | - for (i = 0; i < NB_OPS; i++) { | ||
384 | - PROF_ADD(prof, orig, table_op_count[i]); | ||
385 | - } | ||
386 | - } | ||
387 | - } | ||
388 | -} | ||
389 | - | ||
390 | -#undef PROF_ADD | ||
391 | -#undef PROF_MAX | ||
392 | - | ||
393 | -static void tcg_profile_snapshot_counters(TCGProfile *prof) | ||
394 | -{ | ||
395 | - tcg_profile_snapshot(prof, true, false); | ||
396 | -} | ||
397 | - | ||
398 | -static void tcg_profile_snapshot_table(TCGProfile *prof) | ||
399 | -{ | ||
400 | - tcg_profile_snapshot(prof, false, true); | ||
401 | -} | ||
402 | - | ||
403 | -void tcg_dump_op_count(GString *buf) | ||
404 | -{ | ||
405 | - TCGProfile prof = {}; | ||
406 | - int i; | ||
407 | - | ||
408 | - tcg_profile_snapshot_table(&prof); | ||
409 | - for (i = 0; i < NB_OPS; i++) { | ||
410 | - g_string_append_printf(buf, "%s %" PRId64 "\n", tcg_op_defs[i].name, | ||
411 | - prof.table_op_count[i]); | ||
412 | - } | ||
413 | -} | ||
414 | - | ||
415 | -int64_t tcg_cpu_exec_time(void) | ||
416 | -{ | ||
417 | - unsigned int n_ctxs = qatomic_read(&tcg_cur_ctxs); | ||
418 | - unsigned int i; | ||
419 | - int64_t ret = 0; | ||
420 | - | ||
421 | - for (i = 0; i < n_ctxs; i++) { | ||
422 | - const TCGContext *s = qatomic_read(&tcg_ctxs[i]); | ||
423 | - const TCGProfile *prof = &s->prof; | ||
424 | - | ||
425 | - ret += qatomic_read(&prof->cpu_exec_time); | ||
426 | - } | ||
427 | - return ret; | ||
428 | -} | ||
429 | -#else | ||
430 | void tcg_dump_op_count(GString *buf) | ||
431 | { | ||
432 | g_string_append_printf(buf, "[TCG profiler not compiled]\n"); | ||
433 | } | ||
434 | |||
435 | -int64_t tcg_cpu_exec_time(void) | ||
436 | -{ | ||
437 | - error_report("%s: TCG profiler not compiled", __func__); | ||
438 | - exit(EXIT_FAILURE); | ||
439 | -} | ||
440 | -#endif | ||
441 | - | ||
442 | - | ||
443 | int tcg_gen_code(TCGContext *s, TranslationBlock *tb, uint64_t pc_start) | ||
444 | { | ||
445 | -#ifdef CONFIG_PROFILER | ||
446 | - TCGProfile *prof = &s->prof; | ||
447 | -#endif | ||
448 | int i, start_words, num_insns; | ||
449 | TCGOp *op; | ||
450 | |||
451 | -#ifdef CONFIG_PROFILER | ||
452 | - { | ||
453 | - int n = 0; | ||
454 | - | ||
455 | - QTAILQ_FOREACH(op, &s->ops, link) { | ||
456 | - n++; | ||
457 | - } | ||
458 | - qatomic_set(&prof->op_count, prof->op_count + n); | ||
459 | - if (n > prof->op_count_max) { | ||
460 | - qatomic_set(&prof->op_count_max, n); | ||
461 | - } | ||
462 | - | ||
463 | - n = s->nb_temps; | ||
464 | - qatomic_set(&prof->temp_count, prof->temp_count + n); | ||
465 | - if (n > prof->temp_count_max) { | ||
466 | - qatomic_set(&prof->temp_count_max, n); | ||
467 | - } | ||
468 | - } | ||
469 | -#endif | ||
470 | - | ||
471 | if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP) | ||
472 | && qemu_log_in_addr_range(pc_start))) { | ||
473 | FILE *logfile = qemu_log_trylock(); | ||
474 | @@ -XXX,XX +XXX,XX @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb, uint64_t pc_start) | ||
53 | } | 475 | } |
54 | #endif | 476 | #endif |
55 | 477 | ||
56 | +#ifdef PPC_FEATURE2_HAS_ISEL | 478 | -#ifdef CONFIG_PROFILER |
57 | + /* Prefer explicit instruction from the kernel. */ | 479 | - qatomic_set(&prof->opt_time, prof->opt_time - profile_getclock()); |
58 | + have_isel = (hwcap2 & PPC_FEATURE2_HAS_ISEL) != 0; | 480 | -#endif |
59 | +#else | 481 | - |
60 | + /* Fall back to knowing Power7 (2.06) has ISEL. */ | 482 | tcg_optimize(s); |
61 | + have_isel = have_isa_2_06; | 483 | |
62 | +#endif | 484 | -#ifdef CONFIG_PROFILER |
63 | + | 485 | - qatomic_set(&prof->opt_time, prof->opt_time + profile_getclock()); |
64 | tcg_target_available_regs[TCG_TYPE_I32] = 0xffffffff; | 486 | - qatomic_set(&prof->la_time, prof->la_time - profile_getclock()); |
65 | tcg_target_available_regs[TCG_TYPE_I64] = 0xffffffff; | 487 | -#endif |
66 | 488 | - | |
489 | reachable_code_pass(s); | ||
490 | liveness_pass_0(s); | ||
491 | liveness_pass_1(s); | ||
492 | @@ -XXX,XX +XXX,XX @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb, uint64_t pc_start) | ||
493 | } | ||
494 | } | ||
495 | |||
496 | -#ifdef CONFIG_PROFILER | ||
497 | - qatomic_set(&prof->la_time, prof->la_time + profile_getclock()); | ||
498 | -#endif | ||
499 | - | ||
500 | if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP_OPT) | ||
501 | && qemu_log_in_addr_range(pc_start))) { | ||
502 | FILE *logfile = qemu_log_trylock(); | ||
503 | @@ -XXX,XX +XXX,XX @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb, uint64_t pc_start) | ||
504 | QTAILQ_FOREACH(op, &s->ops, link) { | ||
505 | TCGOpcode opc = op->opc; | ||
506 | |||
507 | -#ifdef CONFIG_PROFILER | ||
508 | - qatomic_set(&prof->table_op_count[opc], prof->table_op_count[opc] + 1); | ||
509 | -#endif | ||
510 | - | ||
511 | switch (opc) { | ||
512 | case INDEX_op_mov_i32: | ||
513 | case INDEX_op_mov_i64: | ||
514 | @@ -XXX,XX +XXX,XX @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb, uint64_t pc_start) | ||
515 | return tcg_current_code_size(s); | ||
516 | } | ||
517 | |||
518 | -#ifdef CONFIG_PROFILER | ||
519 | -void tcg_dump_info(GString *buf) | ||
520 | -{ | ||
521 | - TCGProfile prof = {}; | ||
522 | - const TCGProfile *s; | ||
523 | - int64_t tb_count; | ||
524 | - int64_t tb_div_count; | ||
525 | - int64_t tot; | ||
526 | - | ||
527 | - tcg_profile_snapshot_counters(&prof); | ||
528 | - s = &prof; | ||
529 | - tb_count = s->tb_count; | ||
530 | - tb_div_count = tb_count ? tb_count : 1; | ||
531 | - tot = s->interm_time + s->code_time; | ||
532 | - | ||
533 | - g_string_append_printf(buf, "JIT cycles %" PRId64 | ||
534 | - " (%0.3f s at 2.4 GHz)\n", | ||
535 | - tot, tot / 2.4e9); | ||
536 | - g_string_append_printf(buf, "translated TBs %" PRId64 | ||
537 | - " (aborted=%" PRId64 " %0.1f%%)\n", | ||
538 | - tb_count, s->tb_count1 - tb_count, | ||
539 | - (double)(s->tb_count1 - s->tb_count) | ||
540 | - / (s->tb_count1 ? s->tb_count1 : 1) * 100.0); | ||
541 | - g_string_append_printf(buf, "avg ops/TB %0.1f max=%d\n", | ||
542 | - (double)s->op_count / tb_div_count, s->op_count_max); | ||
543 | - g_string_append_printf(buf, "deleted ops/TB %0.2f\n", | ||
544 | - (double)s->del_op_count / tb_div_count); | ||
545 | - g_string_append_printf(buf, "avg temps/TB %0.2f max=%d\n", | ||
546 | - (double)s->temp_count / tb_div_count, | ||
547 | - s->temp_count_max); | ||
548 | - g_string_append_printf(buf, "avg host code/TB %0.1f\n", | ||
549 | - (double)s->code_out_len / tb_div_count); | ||
550 | - g_string_append_printf(buf, "avg search data/TB %0.1f\n", | ||
551 | - (double)s->search_out_len / tb_div_count); | ||
552 | - | ||
553 | - g_string_append_printf(buf, "cycles/op %0.1f\n", | ||
554 | - s->op_count ? (double)tot / s->op_count : 0); | ||
555 | - g_string_append_printf(buf, "cycles/in byte %0.1f\n", | ||
556 | - s->code_in_len ? (double)tot / s->code_in_len : 0); | ||
557 | - g_string_append_printf(buf, "cycles/out byte %0.1f\n", | ||
558 | - s->code_out_len ? (double)tot / s->code_out_len : 0); | ||
559 | - g_string_append_printf(buf, "cycles/search byte %0.1f\n", | ||
560 | - s->search_out_len ? | ||
561 | - (double)tot / s->search_out_len : 0); | ||
562 | - if (tot == 0) { | ||
563 | - tot = 1; | ||
564 | - } | ||
565 | - g_string_append_printf(buf, " gen_interm time %0.1f%%\n", | ||
566 | - (double)s->interm_time / tot * 100.0); | ||
567 | - g_string_append_printf(buf, " gen_code time %0.1f%%\n", | ||
568 | - (double)s->code_time / tot * 100.0); | ||
569 | - g_string_append_printf(buf, "optim./code time %0.1f%%\n", | ||
570 | - (double)s->opt_time / (s->code_time ? | ||
571 | - s->code_time : 1) | ||
572 | - * 100.0); | ||
573 | - g_string_append_printf(buf, "liveness/code time %0.1f%%\n", | ||
574 | - (double)s->la_time / (s->code_time ? | ||
575 | - s->code_time : 1) * 100.0); | ||
576 | - g_string_append_printf(buf, "cpu_restore count %" PRId64 "\n", | ||
577 | - s->restore_count); | ||
578 | - g_string_append_printf(buf, " avg cycles %0.1f\n", | ||
579 | - s->restore_count ? | ||
580 | - (double)s->restore_time / s->restore_count : 0); | ||
581 | -} | ||
582 | -#else | ||
583 | void tcg_dump_info(GString *buf) | ||
584 | { | ||
585 | g_string_append_printf(buf, "[TCG profiler not compiled]\n"); | ||
586 | } | ||
587 | -#endif | ||
588 | |||
589 | #ifdef ELF_HOST_MACHINE | ||
590 | /* In order to use this feature, the backend needs to do three things: | ||
591 | diff --git a/tests/qtest/qmp-cmd-test.c b/tests/qtest/qmp-cmd-test.c | ||
592 | index XXXXXXX..XXXXXXX 100644 | ||
593 | --- a/tests/qtest/qmp-cmd-test.c | ||
594 | +++ b/tests/qtest/qmp-cmd-test.c | ||
595 | @@ -XXX,XX +XXX,XX @@ static int query_error_class(const char *cmd) | ||
596 | { "query-balloon", ERROR_CLASS_DEVICE_NOT_ACTIVE }, | ||
597 | { "query-hotpluggable-cpus", ERROR_CLASS_GENERIC_ERROR }, | ||
598 | { "query-vm-generation-id", ERROR_CLASS_GENERIC_ERROR }, | ||
599 | -#ifndef CONFIG_PROFILER | ||
600 | - { "x-query-profile", ERROR_CLASS_GENERIC_ERROR }, | ||
601 | -#endif | ||
602 | /* Only valid with a USB bus added */ | ||
603 | { "x-query-usb", ERROR_CLASS_GENERIC_ERROR }, | ||
604 | /* Only valid with accel=tcg */ | ||
605 | diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx | ||
606 | index XXXXXXX..XXXXXXX 100644 | ||
607 | --- a/hmp-commands-info.hx | ||
608 | +++ b/hmp-commands-info.hx | ||
609 | @@ -XXX,XX +XXX,XX @@ SRST | ||
610 | Show host USB devices. | ||
611 | ERST | ||
612 | |||
613 | -#if defined(CONFIG_TCG) | ||
614 | - { | ||
615 | - .name = "profile", | ||
616 | - .args_type = "", | ||
617 | - .params = "", | ||
618 | - .help = "show profiling information", | ||
619 | - .cmd_info_hrt = qmp_x_query_profile, | ||
620 | - }, | ||
621 | -#endif | ||
622 | - | ||
623 | -SRST | ||
624 | - ``info profile`` | ||
625 | - Show profiling information. | ||
626 | -ERST | ||
627 | - | ||
628 | { | ||
629 | .name = "capture", | ||
630 | .args_type = "", | ||
631 | diff --git a/meson_options.txt b/meson_options.txt | ||
632 | index XXXXXXX..XXXXXXX 100644 | ||
633 | --- a/meson_options.txt | ||
634 | +++ b/meson_options.txt | ||
635 | @@ -XXX,XX +XXX,XX @@ option('qom_cast_debug', type: 'boolean', value: true, | ||
636 | option('gprof', type: 'boolean', value: false, | ||
637 | description: 'QEMU profiling with gprof', | ||
638 | deprecated: true) | ||
639 | -option('profiler', type: 'boolean', value: false, | ||
640 | - description: 'profiler support') | ||
641 | option('slirp_smbd', type : 'feature', value : 'auto', | ||
642 | description: 'use smbd (at path --smbd=*) in slirp networking') | ||
643 | |||
644 | diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh | ||
645 | index XXXXXXX..XXXXXXX 100644 | ||
646 | --- a/scripts/meson-buildoptions.sh | ||
647 | +++ b/scripts/meson-buildoptions.sh | ||
648 | @@ -XXX,XX +XXX,XX @@ meson_options_help() { | ||
649 | printf "%s\n" ' jemalloc/system/tcmalloc)' | ||
650 | printf "%s\n" ' --enable-module-upgrades try to load modules from alternate paths for' | ||
651 | printf "%s\n" ' upgrades' | ||
652 | - printf "%s\n" ' --enable-profiler profiler support' | ||
653 | printf "%s\n" ' --enable-rng-none dummy RNG, avoid using /dev/(u)random and' | ||
654 | printf "%s\n" ' getrandom()' | ||
655 | printf "%s\n" ' --enable-safe-stack SafeStack Stack Smash Protection (requires' | ||
656 | @@ -XXX,XX +XXX,XX @@ _meson_option_parse() { | ||
657 | --with-pkgversion=*) quote_sh "-Dpkgversion=$2" ;; | ||
658 | --enable-png) printf "%s" -Dpng=enabled ;; | ||
659 | --disable-png) printf "%s" -Dpng=disabled ;; | ||
660 | - --enable-profiler) printf "%s" -Dprofiler=true ;; | ||
661 | - --disable-profiler) printf "%s" -Dprofiler=false ;; | ||
662 | --enable-pvrdma) printf "%s" -Dpvrdma=enabled ;; | ||
663 | --disable-pvrdma) printf "%s" -Dpvrdma=disabled ;; | ||
664 | --enable-qcow1) printf "%s" -Dqcow1=enabled ;; | ||
67 | -- | 665 | -- |
68 | 2.17.1 | 666 | 2.34.1 |
69 | 667 | ||
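A closing note on the ISEL detection in the patch above: it leans on the Linux auxiliary vector. A minimal host-side probe, assuming a powerpc Linux host where glibc's <sys/auxv.h> and the kernel's PPC_FEATURE2_HAS_ISEL bit are available:

    #include <stdbool.h>
    #include <sys/auxv.h>        /* getauxval(), AT_HWCAP2 */
    #include <asm/cputable.h>    /* PPC_FEATURE2_HAS_ISEL */

    static bool host_has_isel(void)
    {
    #ifdef PPC_FEATURE2_HAS_ISEL
        /* Prefer the kernel's explicit report ... */
        return (getauxval(AT_HWCAP2) & PPC_FEATURE2_HAS_ISEL) != 0;
    #else
        /* ... otherwise fall back, as the patch does, on knowing that
         * ISA 2.06 (Power7) and newer implement isel. */
        return false;
    #endif
    }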
1 | These new instructions are conditional only on MSR.VEC and | 1 | From: Max Chou <max.chou@sifive.com> |
---|---|---|---|
2 | are thus part of the Altivec instruction set, and not VSX. | ||
3 | This includes lots of double-word arithmetic and a few extra | ||
4 | logical operations. | ||
5 | 2 | ||
6 | Reviewed-by: Aleksandar Markovic <amarkovic@wavecomp.com> | 3 | The 5th parameter of tcg_gen_gvec_2s should be replaced by the |
4 | temporary tmp variable in the tcg_gen_gvec_andcs function. | ||
5 | |||
6 | Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> | ||
7 | Signed-off-by: Max Chou <max.chou@sifive.com> | ||
8 | Message-Id: <20230622161646.32005-9-max.chou@sifive.com> | ||
7 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 9 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> |
8 | --- | 10 | --- |
9 | tcg/ppc/tcg-target.h | 4 +- | 11 | tcg/tcg-op-gvec.c | 2 +- |
10 | tcg/ppc/tcg-target.inc.c | 85 ++++++++++++++++++++++++++++++---------- | 12 | 1 file changed, 1 insertion(+), 1 deletion(-) |
11 | 2 files changed, 67 insertions(+), 22 deletions(-) | ||
12 | 13 | ||
13 | diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h | 14 | diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c |
14 | index XXXXXXX..XXXXXXX 100644 | 15 | index XXXXXXX..XXXXXXX 100644 |
15 | --- a/tcg/ppc/tcg-target.h | 16 | --- a/tcg/tcg-op-gvec.c |
16 | +++ b/tcg/ppc/tcg-target.h | 17 | +++ b/tcg/tcg-op-gvec.c |
17 | @@ -XXX,XX +XXX,XX @@ typedef enum { | 18 | @@ -XXX,XX +XXX,XX @@ void tcg_gen_gvec_andcs(unsigned vece, uint32_t dofs, uint32_t aofs, |
18 | typedef enum { | 19 | |
19 | tcg_isa_base, | 20 | TCGv_i64 tmp = tcg_temp_ebb_new_i64(); |
20 | tcg_isa_2_06, | 21 | tcg_gen_dup_i64(vece, tmp, c); |
21 | + tcg_isa_2_07, | 22 | - tcg_gen_gvec_2s(dofs, aofs, oprsz, maxsz, c, &g); |
22 | tcg_isa_3_00, | 23 | + tcg_gen_gvec_2s(dofs, aofs, oprsz, maxsz, tmp, &g); |
23 | } TCGPowerISA; | 24 | tcg_temp_free_i64(tmp); |
24 | 25 | } | |
25 | @@ -XXX,XX +XXX,XX @@ extern bool have_altivec; | 26 | |
26 | extern bool have_vsx; | ||
27 | |||
28 | #define have_isa_2_06 (have_isa >= tcg_isa_2_06) | ||
29 | +#define have_isa_2_07 (have_isa >= tcg_isa_2_07) | ||
30 | #define have_isa_3_00 (have_isa >= tcg_isa_3_00) | ||
31 | |||
32 | /* optional instructions automatically implemented */ | ||
33 | @@ -XXX,XX +XXX,XX @@ extern bool have_vsx; | ||
34 | #define TCG_TARGET_HAS_v256 0 | ||
35 | |||
36 | #define TCG_TARGET_HAS_andc_vec 1 | ||
37 | -#define TCG_TARGET_HAS_orc_vec 0 | ||
38 | +#define TCG_TARGET_HAS_orc_vec have_isa_2_07 | ||
39 | #define TCG_TARGET_HAS_not_vec 1 | ||
40 | #define TCG_TARGET_HAS_neg_vec 0 | ||
41 | #define TCG_TARGET_HAS_abs_vec 0 | ||
42 | diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c | ||
43 | index XXXXXXX..XXXXXXX 100644 | ||
44 | --- a/tcg/ppc/tcg-target.inc.c | ||
45 | +++ b/tcg/ppc/tcg-target.inc.c | ||
46 | @@ -XXX,XX +XXX,XX @@ static int tcg_target_const_match(tcg_target_long val, TCGType type, | ||
47 | #define VADDSWS VX4(896) | ||
48 | #define VADDUWS VX4(640) | ||
49 | #define VADDUWM VX4(128) | ||
50 | +#define VADDUDM VX4(192) /* v2.07 */ | ||
51 | |||
52 | #define VSUBSBS VX4(1792) | ||
53 | #define VSUBUBS VX4(1536) | ||
54 | @@ -XXX,XX +XXX,XX @@ static int tcg_target_const_match(tcg_target_long val, TCGType type, | ||
55 | #define VSUBSWS VX4(1920) | ||
56 | #define VSUBUWS VX4(1664) | ||
57 | #define VSUBUWM VX4(1152) | ||
58 | +#define VSUBUDM VX4(1216) /* v2.07 */ | ||
59 | |||
60 | #define VMAXSB VX4(258) | ||
61 | #define VMAXSH VX4(322) | ||
62 | #define VMAXSW VX4(386) | ||
63 | +#define VMAXSD VX4(450) /* v2.07 */ | ||
64 | #define VMAXUB VX4(2) | ||
65 | #define VMAXUH VX4(66) | ||
66 | #define VMAXUW VX4(130) | ||
67 | +#define VMAXUD VX4(194) /* v2.07 */ | ||
68 | #define VMINSB VX4(770) | ||
69 | #define VMINSH VX4(834) | ||
70 | #define VMINSW VX4(898) | ||
71 | +#define VMINSD VX4(962) /* v2.07 */ | ||
72 | #define VMINUB VX4(514) | ||
73 | #define VMINUH VX4(578) | ||
74 | #define VMINUW VX4(642) | ||
75 | +#define VMINUD VX4(706) /* v2.07 */ | ||
76 | |||
77 | #define VCMPEQUB VX4(6) | ||
78 | #define VCMPEQUH VX4(70) | ||
79 | #define VCMPEQUW VX4(134) | ||
80 | +#define VCMPEQUD VX4(199) /* v2.07 */ | ||
81 | #define VCMPGTSB VX4(774) | ||
82 | #define VCMPGTSH VX4(838) | ||
83 | #define VCMPGTSW VX4(902) | ||
84 | +#define VCMPGTSD VX4(967) /* v2.07 */ | ||
85 | #define VCMPGTUB VX4(518) | ||
86 | #define VCMPGTUH VX4(582) | ||
87 | #define VCMPGTUW VX4(646) | ||
88 | +#define VCMPGTUD VX4(711) /* v2.07 */ | ||
89 | |||
90 | #define VSLB VX4(260) | ||
91 | #define VSLH VX4(324) | ||
92 | #define VSLW VX4(388) | ||
93 | +#define VSLD VX4(1476) /* v2.07 */ | ||
94 | #define VSRB VX4(516) | ||
95 | #define VSRH VX4(580) | ||
96 | #define VSRW VX4(644) | ||
97 | +#define VSRD VX4(1732) /* v2.07 */ | ||
98 | #define VSRAB VX4(772) | ||
99 | #define VSRAH VX4(836) | ||
100 | #define VSRAW VX4(900) | ||
101 | +#define VSRAD VX4(964) /* v2.07 */ | ||
102 | #define VRLB VX4(4) | ||
103 | #define VRLH VX4(68) | ||
104 | #define VRLW VX4(132) | ||
105 | +#define VRLD VX4(196) /* v2.07 */ | ||
106 | |||
107 | #define VMULEUB VX4(520) | ||
108 | #define VMULEUH VX4(584) | ||
109 | +#define VMULEUW VX4(648) /* v2.07 */ | ||
110 | #define VMULOUB VX4(8) | ||
111 | #define VMULOUH VX4(72) | ||
112 | +#define VMULOUW VX4(136) /* v2.07 */ | ||
113 | +#define VMULUWM VX4(137) /* v2.07 */ | ||
114 | #define VMSUMUHM VX4(38) | ||
115 | |||
116 | #define VMRGHB VX4(12) | ||
117 | @@ -XXX,XX +XXX,XX @@ static int tcg_target_const_match(tcg_target_long val, TCGType type, | ||
118 | #define VNOR VX4(1284) | ||
119 | #define VOR VX4(1156) | ||
120 | #define VXOR VX4(1220) | ||
121 | +#define VEQV VX4(1668) /* v2.07 */ | ||
122 | +#define VNAND VX4(1412) /* v2.07 */ | ||
123 | +#define VORC VX4(1348) /* v2.07 */ | ||
124 | |||
125 | #define VSPLTB VX4(524) | ||
126 | #define VSPLTH VX4(588) | ||
127 | @@ -XXX,XX +XXX,XX @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece) | ||
128 | case INDEX_op_andc_vec: | ||
129 | case INDEX_op_not_vec: | ||
130 | return 1; | ||
131 | + case INDEX_op_orc_vec: | ||
132 | + return have_isa_2_07; | ||
133 | case INDEX_op_add_vec: | ||
134 | case INDEX_op_sub_vec: | ||
135 | case INDEX_op_smax_vec: | ||
136 | case INDEX_op_smin_vec: | ||
137 | case INDEX_op_umax_vec: | ||
138 | case INDEX_op_umin_vec: | ||
139 | + case INDEX_op_shlv_vec: | ||
140 | + case INDEX_op_shrv_vec: | ||
141 | + case INDEX_op_sarv_vec: | ||
142 | + return vece <= MO_32 || have_isa_2_07; | ||
143 | case INDEX_op_ssadd_vec: | ||
144 | case INDEX_op_sssub_vec: | ||
145 | case INDEX_op_usadd_vec: | ||
146 | case INDEX_op_ussub_vec: | ||
147 | - case INDEX_op_shlv_vec: | ||
148 | - case INDEX_op_shrv_vec: | ||
149 | - case INDEX_op_sarv_vec: | ||
150 | return vece <= MO_32; | ||
151 | case INDEX_op_cmp_vec: | ||
152 | - case INDEX_op_mul_vec: | ||
153 | case INDEX_op_shli_vec: | ||
154 | case INDEX_op_shri_vec: | ||
155 | case INDEX_op_sari_vec: | ||
156 | - return vece <= MO_32 ? -1 : 0; | ||
157 | + return vece <= MO_32 || have_isa_2_07 ? -1 : 0; | ||
158 | + case INDEX_op_mul_vec: | ||
159 | + switch (vece) { | ||
160 | + case MO_8: | ||
161 | + case MO_16: | ||
162 | + return -1; | ||
163 | + case MO_32: | ||
164 | + return have_isa_2_07 ? 1 : -1; | ||
165 | + } | ||
166 | + return 0; | ||
167 | case INDEX_op_bitsel_vec: | ||
168 | return have_vsx; | ||
169 | default: | ||
170 | @@ -XXX,XX +XXX,XX @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, | ||
171 | const TCGArg *args, const int *const_args) | ||
172 | { | ||
173 | static const uint32_t | ||
174 | - add_op[4] = { VADDUBM, VADDUHM, VADDUWM, 0 }, | ||
175 | - sub_op[4] = { VSUBUBM, VSUBUHM, VSUBUWM, 0 }, | ||
176 | - eq_op[4] = { VCMPEQUB, VCMPEQUH, VCMPEQUW, 0 }, | ||
177 | - gts_op[4] = { VCMPGTSB, VCMPGTSH, VCMPGTSW, 0 }, | ||
178 | - gtu_op[4] = { VCMPGTUB, VCMPGTUH, VCMPGTUW, 0 }, | ||
179 | + add_op[4] = { VADDUBM, VADDUHM, VADDUWM, VADDUDM }, | ||
180 | + sub_op[4] = { VSUBUBM, VSUBUHM, VSUBUWM, VSUBUDM }, | ||
181 | + eq_op[4] = { VCMPEQUB, VCMPEQUH, VCMPEQUW, VCMPEQUD }, | ||
182 | + gts_op[4] = { VCMPGTSB, VCMPGTSH, VCMPGTSW, VCMPGTSD }, | ||
183 | + gtu_op[4] = { VCMPGTUB, VCMPGTUH, VCMPGTUW, VCMPGTUD }, | ||
184 | ssadd_op[4] = { VADDSBS, VADDSHS, VADDSWS, 0 }, | ||
185 | usadd_op[4] = { VADDUBS, VADDUHS, VADDUWS, 0 }, | ||
186 | sssub_op[4] = { VSUBSBS, VSUBSHS, VSUBSWS, 0 }, | ||
187 | ussub_op[4] = { VSUBUBS, VSUBUHS, VSUBUWS, 0 }, | ||
188 | - umin_op[4] = { VMINUB, VMINUH, VMINUW, 0 }, | ||
189 | - smin_op[4] = { VMINSB, VMINSH, VMINSW, 0 }, | ||
190 | - umax_op[4] = { VMAXUB, VMAXUH, VMAXUW, 0 }, | ||
191 | - smax_op[4] = { VMAXSB, VMAXSH, VMAXSW, 0 }, | ||
192 | - shlv_op[4] = { VSLB, VSLH, VSLW, 0 }, | ||
193 | - shrv_op[4] = { VSRB, VSRH, VSRW, 0 }, | ||
194 | - sarv_op[4] = { VSRAB, VSRAH, VSRAW, 0 }, | ||
195 | + umin_op[4] = { VMINUB, VMINUH, VMINUW, VMINUD }, | ||
196 | + smin_op[4] = { VMINSB, VMINSH, VMINSW, VMINSD }, | ||
197 | + umax_op[4] = { VMAXUB, VMAXUH, VMAXUW, VMAXUD }, | ||
198 | + smax_op[4] = { VMAXSB, VMAXSH, VMAXSW, VMAXSD }, | ||
199 | + shlv_op[4] = { VSLB, VSLH, VSLW, VSLD }, | ||
200 | + shrv_op[4] = { VSRB, VSRH, VSRW, VSRD }, | ||
201 | + sarv_op[4] = { VSRAB, VSRAH, VSRAW, VSRAD }, | ||
202 | mrgh_op[4] = { VMRGHB, VMRGHH, VMRGHW, 0 }, | ||
203 | mrgl_op[4] = { VMRGLB, VMRGLH, VMRGLW, 0 }, | ||
204 | - muleu_op[4] = { VMULEUB, VMULEUH, 0, 0 }, | ||
205 | - mulou_op[4] = { VMULOUB, VMULOUH, 0, 0 }, | ||
206 | + muleu_op[4] = { VMULEUB, VMULEUH, VMULEUW, 0 }, | ||
207 | + mulou_op[4] = { VMULOUB, VMULOUH, VMULOUW, 0 }, | ||
208 | pkum_op[4] = { VPKUHUM, VPKUWUM, 0, 0 }, | ||
209 | - rotl_op[4] = { VRLB, VRLH, VRLW, 0 }; | ||
210 | + rotl_op[4] = { VRLB, VRLH, VRLW, VRLD }; | ||
211 | |||
212 | TCGType type = vecl + TCG_TYPE_V64; | ||
213 | TCGArg a0 = args[0], a1 = args[1], a2 = args[2]; | ||
214 | @@ -XXX,XX +XXX,XX @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, | ||
215 | case INDEX_op_sub_vec: | ||
216 | insn = sub_op[vece]; | ||
217 | break; | ||
218 | + case INDEX_op_mul_vec: | ||
219 | + tcg_debug_assert(vece == MO_32 && have_isa_2_07); | ||
220 | + insn = VMULUWM; | ||
221 | + break; | ||
222 | case INDEX_op_ssadd_vec: | ||
223 | insn = ssadd_op[vece]; | ||
224 | break; | ||
225 | @@ -XXX,XX +XXX,XX @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, | ||
226 | insn = VNOR; | ||
227 | a2 = a1; | ||
228 | break; | ||
229 | + case INDEX_op_orc_vec: | ||
230 | + insn = VORC; | ||
231 | + break; | ||
232 | |||
233 | case INDEX_op_cmp_vec: | ||
234 | switch (args[3]) { | ||
235 | @@ -XXX,XX +XXX,XX @@ static void expand_vec_cmp(TCGType type, unsigned vece, TCGv_vec v0, | ||
236 | { | ||
237 | bool need_swap = false, need_inv = false; | ||
238 | |||
239 | - tcg_debug_assert(vece <= MO_32); | ||
240 | + tcg_debug_assert(vece <= MO_32 || have_isa_2_07); | ||
241 | |||
242 | switch (cond) { | ||
243 | case TCG_COND_EQ: | ||
244 | @@ -XXX,XX +XXX,XX @@ static void expand_vec_mul(TCGType type, unsigned vece, TCGv_vec v0, | ||
245 | break; | ||
246 | |||
247 | case MO_32: | ||
248 | + tcg_debug_assert(!have_isa_2_07); | ||
249 | t3 = tcg_temp_new_vec(type); | ||
250 | t4 = tcg_temp_new_vec(type); | ||
251 | tcg_gen_dupi_vec(MO_8, t4, -16); | ||
252 | @@ -XXX,XX +XXX,XX @@ static void tcg_target_init(TCGContext *s) | ||
253 | if (hwcap & PPC_FEATURE_ARCH_2_06) { | ||
254 | have_isa = tcg_isa_2_06; | ||
255 | } | ||
256 | +#ifdef PPC_FEATURE2_ARCH_2_07 | ||
257 | + if (hwcap2 & PPC_FEATURE2_ARCH_2_07) { | ||
258 | + have_isa = tcg_isa_2_07; | ||
259 | + } | ||
260 | +#endif | ||
261 | #ifdef PPC_FEATURE2_ARCH_3_00 | ||
262 | if (hwcap2 & PPC_FEATURE2_ARCH_3_00) { | ||
263 | have_isa = tcg_isa_3_00; | ||
264 | -- | 27 | -- |
265 | 2.17.1 | 28 | 2.34.1 |
266 | |||
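
The tri-state that tcg_can_emit_vec_op() returns in the hunk above is easy to
misread, so here is a minimal standalone sketch of the contract as this patch
uses it for mul_vec (illustrative only; MO_* sizes written as their numeric
values):

    #include <stdbool.h>

    /*  1: the backend emits the opcode directly.
     * -1: generic code must expand it via tcg_expand_vec_op().
     *  0: unsupported for this element size. */
    static int can_emit_mul_vec(unsigned vece, bool have_isa_2_07)
    {
        switch (vece) {
        case 0: /* MO_8 */
        case 1: /* MO_16 */
            return -1;                      /* even/odd multiply expansion */
        case 2: /* MO_32 */
            return have_isa_2_07 ? 1 : -1;  /* VMULUWM on v2.07 */
        }
        return 0;                           /* no 64-bit vector multiply */
    }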
1 | Now that we have implemented the required tcg operations, | 1 | The microblaze architecture does not reorder instructions. |
---|---|---|---|
2 | we can enable detection of host vector support. | 2 | While there is an MBAR wait-for-data-access instruction, |
3 | this concerns synchronizing with DMA. | ||
3 | 4 | ||
4 | Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> (PPC32) | 5 | This should have been defined when enabling MTTCG. |
5 | Reviewed-by: Aleksandar Markovic <amarkovic@wavecomp.com> | 6 | |
7 | Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> | ||
8 | Reviewed-by: Edgar E. Iglesias <edgar@zeroasic.com> | ||
9 | Fixes: d449561b130 ("configure: microblaze: Enable mttcg") | ||
6 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 10 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> |
7 | --- | 11 | --- |
8 | tcg/ppc/tcg-target.inc.c | 4 ++++ | 12 | target/microblaze/cpu.h | 3 +++ |
9 | 1 file changed, 4 insertions(+) | 13 | 1 file changed, 3 insertions(+) |
10 | 14 | ||
11 | diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c | 15 | diff --git a/target/microblaze/cpu.h b/target/microblaze/cpu.h |
12 | index XXXXXXX..XXXXXXX 100644 | 16 | index XXXXXXX..XXXXXXX 100644 |
13 | --- a/tcg/ppc/tcg-target.inc.c | 17 | --- a/target/microblaze/cpu.h |
14 | +++ b/tcg/ppc/tcg-target.inc.c | 18 | +++ b/target/microblaze/cpu.h |
15 | @@ -XXX,XX +XXX,XX @@ static void tcg_target_init(TCGContext *s) | 19 | @@ -XXX,XX +XXX,XX @@ |
16 | have_isel = have_isa_2_06; | 20 | #include "exec/cpu-defs.h" |
17 | #endif | 21 | #include "qemu/cpu-float.h" |
18 | 22 | ||
19 | + if (hwcap & PPC_FEATURE_HAS_ALTIVEC) { | 23 | +/* MicroBlaze is always in-order. */ |
20 | + have_altivec = true; | 24 | +#define TCG_GUEST_DEFAULT_MO TCG_MO_ALL |
21 | + } | ||
22 | + | 25 | + |
23 | tcg_target_available_regs[TCG_TYPE_I32] = 0xffffffff; | 26 | typedef struct CPUArchState CPUMBState; |
24 | tcg_target_available_regs[TCG_TYPE_I64] = 0xffffffff; | 27 | #if !defined(CONFIG_USER_ONLY) |
25 | if (have_altivec) { | 28 | #include "mmu.h" |
26 | -- | 29 | -- |
27 | 2.17.1 | 30 | 2.34.1 |
28 | 31 | ||
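
For reference, the TCG_MO_ALL used by the microblaze definition above is the
union of four ordering-pair bits; the values below are quoted from tcg/tcg.h
as it stood for this series, shown only to make the definition concrete:

    /* Each bit names a "prior access -> later access" pair whose
     * program order the guest guarantees to preserve. */
    typedef enum {
        TCG_MO_LD_LD = 0x01,  /* earlier load before later load   */
        TCG_MO_ST_LD = 0x02,  /* earlier store before later load  */
        TCG_MO_LD_ST = 0x04,  /* earlier load before later store  */
        TCG_MO_ST_ST = 0x08,  /* earlier store before later store */
        TCG_MO_ALL   = 0x0f,  /* all four: a fully in-order guest */
    } TCGBarSketch;            /* hypothetical typedef name */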
1 | Add support for vector maximum/minimum using Altivec instructions | 1 | The virtio devices require proper memory ordering between |
---|---|---|---|
2 | VMAXSB, VMAXSH, VMAXSW, VMAXUB, VMAXUH, VMAXUW, and | 2 | the vcpus and the iothreads. |
3 | VMINSB, VMINSH, VMINSW, VMINUB, VMINUH, VMINUW. | ||
4 | 3 | ||
4 | Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> | ||
5 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 5 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> |
6 | Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com> | ||
7 | --- | 6 | --- |
8 | tcg/ppc/tcg-target.h | 2 +- | 7 | tcg/tcg-op.c | 14 +++++++++++++- |
9 | tcg/ppc/tcg-target.inc.c | 40 +++++++++++++++++++++++++++++++++++++++- | 8 | 1 file changed, 13 insertions(+), 1 deletion(-) |
10 | 2 files changed, 40 insertions(+), 2 deletions(-) | ||
11 | 9 | ||
12 | diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h | 10 | diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c |
13 | index XXXXXXX..XXXXXXX 100644 | 11 | index XXXXXXX..XXXXXXX 100644 |
14 | --- a/tcg/ppc/tcg-target.h | 12 | --- a/tcg/tcg-op.c |
15 | +++ b/tcg/ppc/tcg-target.h | 13 | +++ b/tcg/tcg-op.c |
16 | @@ -XXX,XX +XXX,XX @@ extern bool have_altivec; | 14 | @@ -XXX,XX +XXX,XX @@ void tcg_gen_br(TCGLabel *l) |
17 | #define TCG_TARGET_HAS_cmp_vec 1 | 15 | |
18 | #define TCG_TARGET_HAS_mul_vec 0 | 16 | void tcg_gen_mb(TCGBar mb_type) |
19 | #define TCG_TARGET_HAS_sat_vec 0 | 17 | { |
20 | -#define TCG_TARGET_HAS_minmax_vec 0 | 18 | - if (tcg_ctx->gen_tb->cflags & CF_PARALLEL) { |
21 | +#define TCG_TARGET_HAS_minmax_vec 1 | 19 | +#ifdef CONFIG_USER_ONLY |
22 | #define TCG_TARGET_HAS_bitsel_vec 0 | 20 | + bool parallel = tcg_ctx->gen_tb->cflags & CF_PARALLEL; |
23 | #define TCG_TARGET_HAS_cmpsel_vec 0 | 21 | +#else |
24 | 22 | + /* | |
25 | diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c | 23 | + * It is tempting to elide the barrier in a uniprocessor context. |
26 | index XXXXXXX..XXXXXXX 100644 | 24 | + * However, even with a single cpu we have i/o threads running in |
27 | --- a/tcg/ppc/tcg-target.inc.c | 25 | + * parallel, and lack of memory order can result in e.g. virtio |
28 | +++ b/tcg/ppc/tcg-target.inc.c | 26 | + * queue entries being read incorrectly. |
29 | @@ -XXX,XX +XXX,XX @@ static int tcg_target_const_match(tcg_target_long val, TCGType type, | 27 | + */ |
30 | #define STVX XO31(231) | 28 | + bool parallel = true; |
31 | #define STVEWX XO31(199) | 29 | +#endif |
32 | |||
33 | +#define VMAXSB VX4(258) | ||
34 | +#define VMAXSH VX4(322) | ||
35 | +#define VMAXSW VX4(386) | ||
36 | +#define VMAXUB VX4(2) | ||
37 | +#define VMAXUH VX4(66) | ||
38 | +#define VMAXUW VX4(130) | ||
39 | +#define VMINSB VX4(770) | ||
40 | +#define VMINSH VX4(834) | ||
41 | +#define VMINSW VX4(898) | ||
42 | +#define VMINUB VX4(514) | ||
43 | +#define VMINUH VX4(578) | ||
44 | +#define VMINUW VX4(642) | ||
45 | + | 30 | + |
46 | #define VCMPEQUB VX4(6) | 31 | + if (parallel) { |
47 | #define VCMPEQUH VX4(70) | 32 | tcg_gen_op1(INDEX_op_mb, mb_type); |
48 | #define VCMPEQUW VX4(134) | 33 | } |
49 | @@ -XXX,XX +XXX,XX @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece) | 34 | } |
50 | case INDEX_op_andc_vec: | ||
51 | case INDEX_op_not_vec: | ||
52 | return 1; | ||
53 | + case INDEX_op_smax_vec: | ||
54 | + case INDEX_op_smin_vec: | ||
55 | + case INDEX_op_umax_vec: | ||
56 | + case INDEX_op_umin_vec: | ||
57 | + return vece <= MO_32; | ||
58 | case INDEX_op_cmp_vec: | ||
59 | return vece <= MO_32 ? -1 : 0; | ||
60 | default: | ||
61 | @@ -XXX,XX +XXX,XX @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, | ||
62 | static const uint32_t | ||
63 | eq_op[4] = { VCMPEQUB, VCMPEQUH, VCMPEQUW, 0 }, | ||
64 | gts_op[4] = { VCMPGTSB, VCMPGTSH, VCMPGTSW, 0 }, | ||
65 | - gtu_op[4] = { VCMPGTUB, VCMPGTUH, VCMPGTUW, 0 }; | ||
66 | + gtu_op[4] = { VCMPGTUB, VCMPGTUH, VCMPGTUW, 0 }, | ||
67 | + umin_op[4] = { VMINUB, VMINUH, VMINUW, 0 }, | ||
68 | + smin_op[4] = { VMINSB, VMINSH, VMINSW, 0 }, | ||
69 | + umax_op[4] = { VMAXUB, VMAXUH, VMAXUW, 0 }, | ||
70 | + smax_op[4] = { VMAXSB, VMAXSH, VMAXSW, 0 }; | ||
71 | |||
72 | TCGType type = vecl + TCG_TYPE_V64; | ||
73 | TCGArg a0 = args[0], a1 = args[1], a2 = args[2]; | ||
74 | @@ -XXX,XX +XXX,XX @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, | ||
75 | tcg_out_dupm_vec(s, type, vece, a0, a1, a2); | ||
76 | return; | ||
77 | |||
78 | + case INDEX_op_smin_vec: | ||
79 | + insn = smin_op[vece]; | ||
80 | + break; | ||
81 | + case INDEX_op_umin_vec: | ||
82 | + insn = umin_op[vece]; | ||
83 | + break; | ||
84 | + case INDEX_op_smax_vec: | ||
85 | + insn = smax_op[vece]; | ||
86 | + break; | ||
87 | + case INDEX_op_umax_vec: | ||
88 | + insn = umax_op[vece]; | ||
89 | + break; | ||
90 | case INDEX_op_and_vec: | ||
91 | insn = VAND; | ||
92 | break; | ||
93 | @@ -XXX,XX +XXX,XX @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op) | ||
94 | case INDEX_op_andc_vec: | ||
95 | case INDEX_op_orc_vec: | ||
96 | case INDEX_op_cmp_vec: | ||
97 | + case INDEX_op_smax_vec: | ||
98 | + case INDEX_op_smin_vec: | ||
99 | + case INDEX_op_umax_vec: | ||
100 | + case INDEX_op_umin_vec: | ||
101 | return &v_v_v; | ||
102 | case INDEX_op_not_vec: | ||
103 | case INDEX_op_dup_vec: | ||
104 | -- | 35 | -- |
105 | 2.17.1 | 36 | 2.34.1 |
106 | 37 | ||
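
A standalone C11 sketch (not QEMU code) of the vcpu/iothread publish pattern
the commit message describes; on a weakly-ordered host, dropping the release
fence, which corresponds to eliding the guest barrier, lets the consumer
observe "published" before "payload":

    #include <stdatomic.h>

    static int payload;
    static atomic_int published;

    static void vcpu_publish(int v)          /* runs on the vcpu thread */
    {
        payload = v;
        atomic_thread_fence(memory_order_release);  /* the guest barrier */
        atomic_store_explicit(&published, 1, memory_order_relaxed);
    }

    static int iothread_consume(void)        /* runs on an iothread */
    {
        while (!atomic_load_explicit(&published, memory_order_relaxed)) {
        }
        atomic_thread_fence(memory_order_acquire);
        return payload;  /* now guaranteed to see the value stored above */
    }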
1 | Introduce macros VRT(), VRA(), VRB(), VRC(), used to encode the | 1 | Bring the helpers into line with the rest of tcg in respecting |
---|---|---|---|
2 | register operand fields of Altivec instructions. | 2 | guest memory ordering. |
3 | 3 | ||
4 | Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> | ||
4 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 5 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> |
5 | Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com> | ||
6 | --- | 6 | --- |
7 | tcg/ppc/tcg-target.inc.c | 5 +++++ | 7 | accel/tcg/internal.h | 34 ++++++++++++++++++++++++++++++++++ |
8 | 1 file changed, 5 insertions(+) | 8 | accel/tcg/cputlb.c | 10 ++++++++++ |
9 | 9 | accel/tcg/user-exec.c | 10 ++++++++++ | |
10 | diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c | 10 | 3 files changed, 54 insertions(+) |
11 | |||
12 | diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h | ||
11 | index XXXXXXX..XXXXXXX 100644 | 13 | index XXXXXXX..XXXXXXX 100644 |
12 | --- a/tcg/ppc/tcg-target.inc.c | 14 | --- a/accel/tcg/internal.h |
13 | +++ b/tcg/ppc/tcg-target.inc.c | 15 | +++ b/accel/tcg/internal.h |
14 | @@ -XXX,XX +XXX,XX @@ static int tcg_target_const_match(tcg_target_long val, TCGType type, | 16 | @@ -XXX,XX +XXX,XX @@ extern int64_t max_advance; |
15 | #define MB64(b) ((b)<<5) | 17 | |
16 | #define FXM(b) (1 << (19 - (b))) | 18 | extern bool one_insn_per_tb; |
17 | 19 | ||
18 | +#define VRT(r) (((r) & 31) << 21) | 20 | +/** |
19 | +#define VRA(r) (((r) & 31) << 16) | 21 | + * tcg_req_mo: |
20 | +#define VRB(r) (((r) & 31) << 11) | 22 | + * @type: TCGBar |
21 | +#define VRC(r) (((r) & 31) << 6) | 23 | + * |
24 | + * Filter @type to the barrier that is required for the guest | ||
25 | + * memory ordering vs the host memory ordering. A non-zero | ||
26 | + * result indicates that some barrier is required. | ||
27 | + * | ||
28 | + * If TCG_GUEST_DEFAULT_MO is not defined, assume that the | ||
29 | + * guest requires strict ordering. | ||
30 | + * | ||
31 | + * This is a macro so that it's constant even without optimization. | ||
32 | + */ | ||
33 | +#ifdef TCG_GUEST_DEFAULT_MO | ||
34 | +# define tcg_req_mo(type) \ | ||
35 | + ((type) & TCG_GUEST_DEFAULT_MO & ~TCG_TARGET_DEFAULT_MO) | ||
36 | +#else | ||
37 | +# define tcg_req_mo(type) ((type) & ~TCG_TARGET_DEFAULT_MO) | ||
38 | +#endif | ||
22 | + | 39 | + |
23 | #define LK 1 | 40 | +/** |
24 | 41 | + * cpu_req_mo: | |
25 | #define TAB(t, a, b) (RT(t) | RA(a) | RB(b)) | 42 | + * @type: TCGBar |
43 | + * | ||
44 | + * If tcg_req_mo indicates a barrier for @type is required | ||
45 | + * for the guest memory model, issue a host memory barrier. | ||
46 | + */ | ||
47 | +#define cpu_req_mo(type) \ | ||
48 | + do { \ | ||
49 | + if (tcg_req_mo(type)) { \ | ||
50 | + smp_mb(); \ | ||
51 | + } \ | ||
52 | + } while (0) | ||
53 | + | ||
54 | #endif /* ACCEL_TCG_INTERNAL_H */ | ||
55 | diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c | ||
56 | index XXXXXXX..XXXXXXX 100644 | ||
57 | --- a/accel/tcg/cputlb.c | ||
58 | +++ b/accel/tcg/cputlb.c | ||
59 | @@ -XXX,XX +XXX,XX @@ static uint8_t do_ld1_mmu(CPUArchState *env, vaddr addr, MemOpIdx oi, | ||
60 | MMULookupLocals l; | ||
61 | bool crosspage; | ||
62 | |||
63 | + cpu_req_mo(TCG_MO_LD_LD | TCG_MO_ST_LD); | ||
64 | crosspage = mmu_lookup(env, addr, oi, ra, access_type, &l); | ||
65 | tcg_debug_assert(!crosspage); | ||
66 | |||
67 | @@ -XXX,XX +XXX,XX @@ static uint16_t do_ld2_mmu(CPUArchState *env, vaddr addr, MemOpIdx oi, | ||
68 | uint16_t ret; | ||
69 | uint8_t a, b; | ||
70 | |||
71 | + cpu_req_mo(TCG_MO_LD_LD | TCG_MO_ST_LD); | ||
72 | crosspage = mmu_lookup(env, addr, oi, ra, access_type, &l); | ||
73 | if (likely(!crosspage)) { | ||
74 | return do_ld_2(env, &l.page[0], l.mmu_idx, access_type, l.memop, ra); | ||
75 | @@ -XXX,XX +XXX,XX @@ static uint32_t do_ld4_mmu(CPUArchState *env, vaddr addr, MemOpIdx oi, | ||
76 | bool crosspage; | ||
77 | uint32_t ret; | ||
78 | |||
79 | + cpu_req_mo(TCG_MO_LD_LD | TCG_MO_ST_LD); | ||
80 | crosspage = mmu_lookup(env, addr, oi, ra, access_type, &l); | ||
81 | if (likely(!crosspage)) { | ||
82 | return do_ld_4(env, &l.page[0], l.mmu_idx, access_type, l.memop, ra); | ||
83 | @@ -XXX,XX +XXX,XX @@ static uint64_t do_ld8_mmu(CPUArchState *env, vaddr addr, MemOpIdx oi, | ||
84 | bool crosspage; | ||
85 | uint64_t ret; | ||
86 | |||
87 | + cpu_req_mo(TCG_MO_LD_LD | TCG_MO_ST_LD); | ||
88 | crosspage = mmu_lookup(env, addr, oi, ra, access_type, &l); | ||
89 | if (likely(!crosspage)) { | ||
90 | return do_ld_8(env, &l.page[0], l.mmu_idx, access_type, l.memop, ra); | ||
91 | @@ -XXX,XX +XXX,XX @@ static Int128 do_ld16_mmu(CPUArchState *env, vaddr addr, | ||
92 | Int128 ret; | ||
93 | int first; | ||
94 | |||
95 | + cpu_req_mo(TCG_MO_LD_LD | TCG_MO_ST_LD); | ||
96 | crosspage = mmu_lookup(env, addr, oi, ra, MMU_DATA_LOAD, &l); | ||
97 | if (likely(!crosspage)) { | ||
98 | /* Perform the load host endian. */ | ||
99 | @@ -XXX,XX +XXX,XX @@ void helper_stb_mmu(CPUArchState *env, uint64_t addr, uint32_t val, | ||
100 | bool crosspage; | ||
101 | |||
102 | tcg_debug_assert((get_memop(oi) & MO_SIZE) == MO_8); | ||
103 | + cpu_req_mo(TCG_MO_LD_ST | TCG_MO_ST_ST); | ||
104 | crosspage = mmu_lookup(env, addr, oi, ra, MMU_DATA_STORE, &l); | ||
105 | tcg_debug_assert(!crosspage); | ||
106 | |||
107 | @@ -XXX,XX +XXX,XX @@ static void do_st2_mmu(CPUArchState *env, vaddr addr, uint16_t val, | ||
108 | bool crosspage; | ||
109 | uint8_t a, b; | ||
110 | |||
111 | + cpu_req_mo(TCG_MO_LD_ST | TCG_MO_ST_ST); | ||
112 | crosspage = mmu_lookup(env, addr, oi, ra, MMU_DATA_STORE, &l); | ||
113 | if (likely(!crosspage)) { | ||
114 | do_st_2(env, &l.page[0], val, l.mmu_idx, l.memop, ra); | ||
115 | @@ -XXX,XX +XXX,XX @@ static void do_st4_mmu(CPUArchState *env, vaddr addr, uint32_t val, | ||
116 | MMULookupLocals l; | ||
117 | bool crosspage; | ||
118 | |||
119 | + cpu_req_mo(TCG_MO_LD_ST | TCG_MO_ST_ST); | ||
120 | crosspage = mmu_lookup(env, addr, oi, ra, MMU_DATA_STORE, &l); | ||
121 | if (likely(!crosspage)) { | ||
122 | do_st_4(env, &l.page[0], val, l.mmu_idx, l.memop, ra); | ||
123 | @@ -XXX,XX +XXX,XX @@ static void do_st8_mmu(CPUArchState *env, vaddr addr, uint64_t val, | ||
124 | MMULookupLocals l; | ||
125 | bool crosspage; | ||
126 | |||
127 | + cpu_req_mo(TCG_MO_LD_ST | TCG_MO_ST_ST); | ||
128 | crosspage = mmu_lookup(env, addr, oi, ra, MMU_DATA_STORE, &l); | ||
129 | if (likely(!crosspage)) { | ||
130 | do_st_8(env, &l.page[0], val, l.mmu_idx, l.memop, ra); | ||
131 | @@ -XXX,XX +XXX,XX @@ static void do_st16_mmu(CPUArchState *env, vaddr addr, Int128 val, | ||
132 | uint64_t a, b; | ||
133 | int first; | ||
134 | |||
135 | + cpu_req_mo(TCG_MO_LD_ST | TCG_MO_ST_ST); | ||
136 | crosspage = mmu_lookup(env, addr, oi, ra, MMU_DATA_STORE, &l); | ||
137 | if (likely(!crosspage)) { | ||
138 | /* Swap to host endian if necessary, then store. */ | ||
139 | diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c | ||
140 | index XXXXXXX..XXXXXXX 100644 | ||
141 | --- a/accel/tcg/user-exec.c | ||
142 | +++ b/accel/tcg/user-exec.c | ||
143 | @@ -XXX,XX +XXX,XX @@ static uint8_t do_ld1_mmu(CPUArchState *env, abi_ptr addr, | ||
144 | uint8_t ret; | ||
145 | |||
146 | tcg_debug_assert((mop & MO_SIZE) == MO_8); | ||
147 | + cpu_req_mo(TCG_MO_LD_LD | TCG_MO_ST_LD); | ||
148 | haddr = cpu_mmu_lookup(env, addr, mop, ra, MMU_DATA_LOAD); | ||
149 | ret = ldub_p(haddr); | ||
150 | clear_helper_retaddr(); | ||
151 | @@ -XXX,XX +XXX,XX @@ static uint16_t do_ld2_mmu(CPUArchState *env, abi_ptr addr, | ||
152 | uint16_t ret; | ||
153 | |||
154 | tcg_debug_assert((mop & MO_SIZE) == MO_16); | ||
155 | + cpu_req_mo(TCG_MO_LD_LD | TCG_MO_ST_LD); | ||
156 | haddr = cpu_mmu_lookup(env, addr, mop, ra, MMU_DATA_LOAD); | ||
157 | ret = load_atom_2(env, ra, haddr, mop); | ||
158 | clear_helper_retaddr(); | ||
159 | @@ -XXX,XX +XXX,XX @@ static uint32_t do_ld4_mmu(CPUArchState *env, abi_ptr addr, | ||
160 | uint32_t ret; | ||
161 | |||
162 | tcg_debug_assert((mop & MO_SIZE) == MO_32); | ||
163 | + cpu_req_mo(TCG_MO_LD_LD | TCG_MO_ST_LD); | ||
164 | haddr = cpu_mmu_lookup(env, addr, mop, ra, MMU_DATA_LOAD); | ||
165 | ret = load_atom_4(env, ra, haddr, mop); | ||
166 | clear_helper_retaddr(); | ||
167 | @@ -XXX,XX +XXX,XX @@ static uint64_t do_ld8_mmu(CPUArchState *env, abi_ptr addr, | ||
168 | uint64_t ret; | ||
169 | |||
170 | tcg_debug_assert((mop & MO_SIZE) == MO_64); | ||
171 | + cpu_req_mo(TCG_MO_LD_LD | TCG_MO_ST_LD); | ||
172 | haddr = cpu_mmu_lookup(env, addr, mop, ra, MMU_DATA_LOAD); | ||
173 | ret = load_atom_8(env, ra, haddr, mop); | ||
174 | clear_helper_retaddr(); | ||
175 | @@ -XXX,XX +XXX,XX @@ static Int128 do_ld16_mmu(CPUArchState *env, abi_ptr addr, | ||
176 | Int128 ret; | ||
177 | |||
178 | tcg_debug_assert((mop & MO_SIZE) == MO_128); | ||
179 | + cpu_req_mo(TCG_MO_LD_LD | TCG_MO_ST_LD); | ||
180 | haddr = cpu_mmu_lookup(env, addr, mop, ra, MMU_DATA_LOAD); | ||
181 | ret = load_atom_16(env, ra, haddr, mop); | ||
182 | clear_helper_retaddr(); | ||
183 | @@ -XXX,XX +XXX,XX @@ static void do_st1_mmu(CPUArchState *env, abi_ptr addr, uint8_t val, | ||
184 | void *haddr; | ||
185 | |||
186 | tcg_debug_assert((mop & MO_SIZE) == MO_8); | ||
187 | + cpu_req_mo(TCG_MO_LD_ST | TCG_MO_ST_ST); | ||
188 | haddr = cpu_mmu_lookup(env, addr, mop, ra, MMU_DATA_STORE); | ||
189 | stb_p(haddr, val); | ||
190 | clear_helper_retaddr(); | ||
191 | @@ -XXX,XX +XXX,XX @@ static void do_st2_mmu(CPUArchState *env, abi_ptr addr, uint16_t val, | ||
192 | void *haddr; | ||
193 | |||
194 | tcg_debug_assert((mop & MO_SIZE) == MO_16); | ||
195 | + cpu_req_mo(TCG_MO_LD_ST | TCG_MO_ST_ST); | ||
196 | haddr = cpu_mmu_lookup(env, addr, mop, ra, MMU_DATA_STORE); | ||
197 | |||
198 | if (mop & MO_BSWAP) { | ||
199 | @@ -XXX,XX +XXX,XX @@ static void do_st4_mmu(CPUArchState *env, abi_ptr addr, uint32_t val, | ||
200 | void *haddr; | ||
201 | |||
202 | tcg_debug_assert((mop & MO_SIZE) == MO_32); | ||
203 | + cpu_req_mo(TCG_MO_LD_ST | TCG_MO_ST_ST); | ||
204 | haddr = cpu_mmu_lookup(env, addr, mop, ra, MMU_DATA_STORE); | ||
205 | |||
206 | if (mop & MO_BSWAP) { | ||
207 | @@ -XXX,XX +XXX,XX @@ static void do_st8_mmu(CPUArchState *env, abi_ptr addr, uint64_t val, | ||
208 | void *haddr; | ||
209 | |||
210 | tcg_debug_assert((mop & MO_SIZE) == MO_64); | ||
211 | + cpu_req_mo(TCG_MO_LD_ST | TCG_MO_ST_ST); | ||
212 | haddr = cpu_mmu_lookup(env, addr, mop, ra, MMU_DATA_STORE); | ||
213 | |||
214 | if (mop & MO_BSWAP) { | ||
215 | @@ -XXX,XX +XXX,XX @@ static void do_st16_mmu(CPUArchState *env, abi_ptr addr, Int128 val, | ||
216 | void *haddr; | ||
217 | |||
218 | tcg_debug_assert((mop & MO_SIZE) == MO_128); | ||
219 | + cpu_req_mo(TCG_MO_LD_ST | TCG_MO_ST_ST); | ||
220 | haddr = cpu_mmu_lookup(env, addr, mop, ra, MMU_DATA_STORE); | ||
221 | |||
222 | if (mop & MO_BSWAP) { | ||
26 | -- | 223 | -- |
27 | 2.17.1 | 224 | 2.34.1 |
28 | 225 | ||
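
A worked example of the tcg_req_mo() filter introduced above, using the
guest/host constants as they stand in the tree (x86 declares
TCG_MO_ALL & ~TCG_MO_ST_LD on both sides, the aarch64 backend declares 0);
treat the exact values as illustrative:

    #define MO_LD_LD 0x01
    #define MO_ST_LD 0x02
    #define MO_ALL   0x0f

    #define GUEST_MO_X86    (MO_ALL & ~MO_ST_LD)  /* TSO guest */
    #define HOST_MO_X86     (MO_ALL & ~MO_ST_LD)  /* TSO host  */
    #define HOST_MO_AARCH64 0                     /* weak host */

    #define req_mo(guest, host, type) ((type) & (guest) & ~(host))

    /* The load helpers request TCG_MO_LD_LD | TCG_MO_ST_LD == 0x03:
     *   req_mo(GUEST_MO_X86, HOST_MO_X86,     0x03) == 0x00 -> no barrier
     *   req_mo(GUEST_MO_X86, HOST_MO_AARCH64, 0x03) == 0x01 -> smp_mb() */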
1 | For Altivec, shift by immediate is done via vector shift by vector, | 1 | We now issue host memory barriers to match the guest memory order. |
---|---|---|---|
2 | after splatting the immediate into a register. | 2 | Continue to disable MTTCG only if the guest has not been ported. |
3 | 3 | ||
4 | Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> | ||
4 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 5 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> |
5 | Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com> | ||
6 | --- | 6 | --- |
7 | tcg/ppc/tcg-target.h | 2 +- | 7 | accel/tcg/tcg-all.c | 39 ++++++++++----------------------------- |
8 | tcg/ppc/tcg-target.inc.c | 58 ++++++++++++++++++++++++++++++++++++++-- | 8 | 1 file changed, 10 insertions(+), 29 deletions(-) |
9 | 2 files changed, 57 insertions(+), 3 deletions(-) | ||
10 | 9 | ||
11 | diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h | 10 | diff --git a/accel/tcg/tcg-all.c b/accel/tcg/tcg-all.c |
12 | index XXXXXXX..XXXXXXX 100644 | 11 | index XXXXXXX..XXXXXXX 100644 |
13 | --- a/tcg/ppc/tcg-target.h | 12 | --- a/accel/tcg/tcg-all.c |
14 | +++ b/tcg/ppc/tcg-target.h | 13 | +++ b/accel/tcg/tcg-all.c |
15 | @@ -XXX,XX +XXX,XX @@ extern bool have_altivec; | 14 | @@ -XXX,XX +XXX,XX @@ DECLARE_INSTANCE_CHECKER(TCGState, TCG_STATE, |
16 | #define TCG_TARGET_HAS_abs_vec 0 | 15 | * they can set the appropriate CONFIG flags in ${target}-softmmu.mak |
17 | #define TCG_TARGET_HAS_shi_vec 0 | 16 | * |
18 | #define TCG_TARGET_HAS_shs_vec 0 | 17 | * Once a guest architecture has been converted to the new primitives |
19 | -#define TCG_TARGET_HAS_shv_vec 0 | 18 | - * there are two remaining limitations to check. |
20 | +#define TCG_TARGET_HAS_shv_vec 1 | 19 | - * |
21 | #define TCG_TARGET_HAS_cmp_vec 1 | 20 | - * - The guest can't be oversized (e.g. 64 bit guest on 32 bit host) |
22 | #define TCG_TARGET_HAS_mul_vec 0 | 21 | - * - The host must have a stronger memory order than the guest |
23 | #define TCG_TARGET_HAS_sat_vec 1 | 22 | - * |
24 | diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c | 23 | - * It may be possible in future to support strong guests on weak hosts |
25 | index XXXXXXX..XXXXXXX 100644 | 24 | - * but that will require tagging all load/stores in a guest with their |
26 | --- a/tcg/ppc/tcg-target.inc.c | 25 | - * implicit memory order requirements which would likely slow things |
27 | +++ b/tcg/ppc/tcg-target.inc.c | 26 | - * down a lot. |
28 | @@ -XXX,XX +XXX,XX @@ static int tcg_target_const_match(tcg_target_long val, TCGType type, | 27 | + * there is one remaining limitation to check: |
29 | #define VCMPGTUH VX4(582) | 28 | + * - The guest can't be oversized (e.g. 64 bit guest on 32 bit host) |
30 | #define VCMPGTUW VX4(646) | 29 | */ |
31 | 30 | ||
32 | +#define VSLB VX4(260) | 31 | -static bool check_tcg_memory_orders_compatible(void) |
33 | +#define VSLH VX4(324) | 32 | -{ |
34 | +#define VSLW VX4(388) | 33 | -#if defined(TCG_GUEST_DEFAULT_MO) && defined(TCG_TARGET_DEFAULT_MO) |
35 | +#define VSRB VX4(516) | 34 | - return (TCG_GUEST_DEFAULT_MO & ~TCG_TARGET_DEFAULT_MO) == 0; |
36 | +#define VSRH VX4(580) | 35 | -#else |
37 | +#define VSRW VX4(644) | 36 | - return false; |
38 | +#define VSRAB VX4(772) | 37 | -#endif |
39 | +#define VSRAH VX4(836) | 38 | -} |
40 | +#define VSRAW VX4(900) | 39 | - |
41 | + | 40 | static bool default_mttcg_enabled(void) |
42 | #define VAND VX4(1028) | 41 | { |
43 | #define VANDC VX4(1092) | 42 | if (icount_enabled() || TCG_OVERSIZED_GUEST) { |
44 | #define VNOR VX4(1284) | 43 | return false; |
45 | @@ -XXX,XX +XXX,XX @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece) | 44 | - } else { |
46 | case INDEX_op_sssub_vec: | 45 | -#ifdef TARGET_SUPPORTS_MTTCG |
47 | case INDEX_op_usadd_vec: | 46 | - return check_tcg_memory_orders_compatible(); |
48 | case INDEX_op_ussub_vec: | 47 | -#else |
49 | + case INDEX_op_shlv_vec: | 48 | - return false; |
50 | + case INDEX_op_shrv_vec: | 49 | -#endif |
51 | + case INDEX_op_sarv_vec: | 50 | } |
52 | return vece <= MO_32; | 51 | +#ifdef TARGET_SUPPORTS_MTTCG |
53 | case INDEX_op_cmp_vec: | 52 | +# ifndef TCG_GUEST_DEFAULT_MO |
54 | + case INDEX_op_shli_vec: | 53 | +# error "TARGET_SUPPORTS_MTTCG without TCG_GUEST_DEFAULT_MO" |
55 | + case INDEX_op_shri_vec: | 54 | +# endif |
56 | + case INDEX_op_sari_vec: | 55 | + return true; |
57 | return vece <= MO_32 ? -1 : 0; | 56 | +#else |
58 | default: | 57 | + return false; |
59 | return 0; | 58 | +#endif |
60 | @@ -XXX,XX +XXX,XX @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, | ||
61 | umin_op[4] = { VMINUB, VMINUH, VMINUW, 0 }, | ||
62 | smin_op[4] = { VMINSB, VMINSH, VMINSW, 0 }, | ||
63 | umax_op[4] = { VMAXUB, VMAXUH, VMAXUW, 0 }, | ||
64 | - smax_op[4] = { VMAXSB, VMAXSH, VMAXSW, 0 }; | ||
65 | + smax_op[4] = { VMAXSB, VMAXSH, VMAXSW, 0 }, | ||
66 | + shlv_op[4] = { VSLB, VSLH, VSLW, 0 }, | ||
67 | + shrv_op[4] = { VSRB, VSRH, VSRW, 0 }, | ||
68 | + sarv_op[4] = { VSRAB, VSRAH, VSRAW, 0 }; | ||
69 | |||
70 | TCGType type = vecl + TCG_TYPE_V64; | ||
71 | TCGArg a0 = args[0], a1 = args[1], a2 = args[2]; | ||
72 | @@ -XXX,XX +XXX,XX @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, | ||
73 | case INDEX_op_umax_vec: | ||
74 | insn = umax_op[vece]; | ||
75 | break; | ||
76 | + case INDEX_op_shlv_vec: | ||
77 | + insn = shlv_op[vece]; | ||
78 | + break; | ||
79 | + case INDEX_op_shrv_vec: | ||
80 | + insn = shrv_op[vece]; | ||
81 | + break; | ||
82 | + case INDEX_op_sarv_vec: | ||
83 | + insn = sarv_op[vece]; | ||
84 | + break; | ||
85 | case INDEX_op_and_vec: | ||
86 | insn = VAND; | ||
87 | break; | ||
88 | @@ -XXX,XX +XXX,XX @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, | ||
89 | tcg_out32(s, insn | VRT(a0) | VRA(a1) | VRB(a2)); | ||
90 | } | 59 | } |
91 | 60 | ||
92 | +static void expand_vec_shi(TCGType type, unsigned vece, TCGv_vec v0, | 61 | static void tcg_accel_instance_init(Object *obj) |
93 | + TCGv_vec v1, TCGArg imm, TCGOpcode opci) | 62 | @@ -XXX,XX +XXX,XX @@ static void tcg_set_thread(Object *obj, const char *value, Error **errp) |
94 | +{ | 63 | warn_report("Guest not yet converted to MTTCG - " |
95 | + TCGv_vec t1 = tcg_temp_new_vec(type); | 64 | "you may get unexpected results"); |
96 | + | 65 | #endif |
97 | + /* Splat w/bytes for xxspltib. */ | 66 | - if (!check_tcg_memory_orders_compatible()) { |
98 | + tcg_gen_dupi_vec(MO_8, t1, imm & ((8 << vece) - 1)); | 67 | - warn_report("Guest expects a stronger memory ordering " |
99 | + vec_gen_3(opci, type, vece, tcgv_vec_arg(v0), | 68 | - "than the host provides"); |
100 | + tcgv_vec_arg(v1), tcgv_vec_arg(t1)); | 69 | - error_printf("This may cause strange/hard to debug errors\n"); |
101 | + tcg_temp_free_vec(t1); | 70 | - } |
102 | +} | 71 | s->mttcg_enabled = true; |
103 | + | 72 | } |
104 | static void expand_vec_cmp(TCGType type, unsigned vece, TCGv_vec v0, | 73 | } else if (strcmp(value, "single") == 0) { |
105 | TCGv_vec v1, TCGv_vec v2, TCGCond cond) | ||
106 | { | ||
107 | @@ -XXX,XX +XXX,XX @@ void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece, | ||
108 | { | ||
109 | va_list va; | ||
110 | TCGv_vec v0, v1, v2; | ||
111 | + TCGArg a2; | ||
112 | |||
113 | va_start(va, a0); | ||
114 | v0 = temp_tcgv_vec(arg_temp(a0)); | ||
115 | v1 = temp_tcgv_vec(arg_temp(va_arg(va, TCGArg))); | ||
116 | - v2 = temp_tcgv_vec(arg_temp(va_arg(va, TCGArg))); | ||
117 | + a2 = va_arg(va, TCGArg); | ||
118 | |||
119 | switch (opc) { | ||
120 | + case INDEX_op_shli_vec: | ||
121 | + expand_vec_shi(type, vece, v0, v1, a2, INDEX_op_shlv_vec); | ||
122 | + break; | ||
123 | + case INDEX_op_shri_vec: | ||
124 | + expand_vec_shi(type, vece, v0, v1, a2, INDEX_op_shrv_vec); | ||
125 | + break; | ||
126 | + case INDEX_op_sari_vec: | ||
127 | + expand_vec_shi(type, vece, v0, v1, a2, INDEX_op_sarv_vec); | ||
128 | + break; | ||
129 | case INDEX_op_cmp_vec: | ||
130 | + v2 = temp_tcgv_vec(arg_temp(a2)); | ||
131 | expand_vec_cmp(type, vece, v0, v1, v2, va_arg(va, TCGArg)); | ||
132 | break; | ||
133 | default: | ||
134 | @@ -XXX,XX +XXX,XX @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op) | ||
135 | case INDEX_op_smin_vec: | ||
136 | case INDEX_op_umax_vec: | ||
137 | case INDEX_op_umin_vec: | ||
138 | + case INDEX_op_shlv_vec: | ||
139 | + case INDEX_op_shrv_vec: | ||
140 | + case INDEX_op_sarv_vec: | ||
141 | return &v_v_v; | ||
142 | case INDEX_op_not_vec: | ||
143 | case INDEX_op_dup_vec: | ||
144 | -- | 74 | -- |
145 | 2.17.1 | 75 | 2.34.1 |
146 | 76 | ||
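
The mask in expand_vec_shi() above follows from the shift semantics: each
element only uses the low log2(element-bits) bits of the splatted count,
hence imm & ((8 << vece) - 1). A quick standalone check (vece written
numerically, 0/1/2 for MO_8/MO_16/MO_32):

    #include <stdio.h>

    int main(void)
    {
        for (unsigned vece = 0; vece <= 2; vece++) {
            unsigned bits = 8u << vece;                  /* 8, 16, 32 */
            printf("vece=%u mask=%u\n", vece, bits - 1); /* 7, 15, 31 */
        }
        return 0;
    }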
1 | This is identical to have_isa_2_06, so replace it. | 1 | We have run out of bits we can use within the CPUTLBEntry comparators, |
---|---|---|---|
2 | 2 | as TLB_FLAGS_MASK cannot overlap alignment. | |
3 | Reviewed-by: Aleksandar Markovic <amarkovic@wavecomp.com> | 3 | |
4 | Store slow_flags[] in CPUTLBEntryFull, and merge with the flags from | ||
5 | the comparator. A new TLB_FORCE_SLOW bit is set within the comparator | ||
6 | as an indication that the slow path must be used. | ||
7 | |||
8 | Move TLB_BSWAP to TLB_SLOW_FLAGS_MASK. Since we are out of bits, | ||
9 | we cannot create a new bit without moving an old one. | ||
10 | |||
11 | Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> | ||
4 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 12 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> |
5 | --- | 13 | --- |
6 | tcg/ppc/tcg-target.inc.c | 5 ++--- | 14 | include/exec/cpu-all.h | 21 +++++++-- |
7 | 1 file changed, 2 insertions(+), 3 deletions(-) | 15 | include/exec/cpu-defs.h | 6 +++ |
8 | 16 | include/hw/core/cpu.h | 1 + | |
9 | diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c | 17 | accel/tcg/cputlb.c | 98 ++++++++++++++++++++++++----------------- |
10 | index XXXXXXX..XXXXXXX 100644 | 18 | 4 files changed, 82 insertions(+), 44 deletions(-) |
11 | --- a/tcg/ppc/tcg-target.inc.c | 19 | |
12 | +++ b/tcg/ppc/tcg-target.inc.c | 20 | diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h |
13 | @@ -XXX,XX +XXX,XX @@ static tcg_insn_unit *tb_ret_addr; | 21 | index XXXXXXX..XXXXXXX 100644 |
14 | 22 | --- a/include/exec/cpu-all.h | |
15 | TCGPowerISA have_isa; | 23 | +++ b/include/exec/cpu-all.h |
16 | 24 | @@ -XXX,XX +XXX,XX @@ CPUArchState *cpu_copy(CPUArchState *env); | |
17 | -#define HAVE_ISA_2_06 have_isa_2_06 | 25 | #define TLB_MMIO (1 << (TARGET_PAGE_BITS_MIN - 3)) |
18 | #define HAVE_ISEL have_isa_2_06 | 26 | /* Set if TLB entry contains a watchpoint. */ |
19 | 27 | #define TLB_WATCHPOINT (1 << (TARGET_PAGE_BITS_MIN - 4)) | |
20 | #ifndef CONFIG_SOFTMMU | 28 | -/* Set if TLB entry requires byte swap. */ |
21 | @@ -XXX,XX +XXX,XX @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is_64) | 29 | -#define TLB_BSWAP (1 << (TARGET_PAGE_BITS_MIN - 5)) |
30 | +/* Set if the slow path must be used; more flags in CPUTLBEntryFull. */ | ||
31 | +#define TLB_FORCE_SLOW (1 << (TARGET_PAGE_BITS_MIN - 5)) | ||
32 | /* Set if TLB entry writes ignored. */ | ||
33 | #define TLB_DISCARD_WRITE (1 << (TARGET_PAGE_BITS_MIN - 6)) | ||
34 | |||
35 | -/* Use this mask to check interception with an alignment mask | ||
36 | +/* | ||
37 | + * Use this mask to check interception with an alignment mask | ||
38 | * in a TCG backend. | ||
39 | */ | ||
40 | #define TLB_FLAGS_MASK \ | ||
41 | (TLB_INVALID_MASK | TLB_NOTDIRTY | TLB_MMIO \ | ||
42 | - | TLB_WATCHPOINT | TLB_BSWAP | TLB_DISCARD_WRITE) | ||
43 | + | TLB_WATCHPOINT | TLB_FORCE_SLOW | TLB_DISCARD_WRITE) | ||
44 | + | ||
45 | +/* | ||
46 | + * Flags stored in CPUTLBEntryFull.slow_flags[x]. | ||
47 | + * TLB_FORCE_SLOW must be set in CPUTLBEntry.addr_idx[x]. | ||
48 | + */ | ||
49 | +/* Set if TLB entry requires byte swap. */ | ||
50 | +#define TLB_BSWAP (1 << 0) | ||
51 | + | ||
52 | +#define TLB_SLOW_FLAGS_MASK TLB_BSWAP | ||
53 | + | ||
54 | +/* The two sets of flags must not overlap. */ | ||
55 | +QEMU_BUILD_BUG_ON(TLB_FLAGS_MASK & TLB_SLOW_FLAGS_MASK); | ||
56 | |||
57 | /** | ||
58 | * tlb_hit_page: return true if page aligned @addr is a hit against the | ||
59 | diff --git a/include/exec/cpu-defs.h b/include/exec/cpu-defs.h | ||
60 | index XXXXXXX..XXXXXXX 100644 | ||
61 | --- a/include/exec/cpu-defs.h | ||
62 | +++ b/include/exec/cpu-defs.h | ||
63 | @@ -XXX,XX +XXX,XX @@ typedef struct CPUTLBEntryFull { | ||
64 | /* @lg_page_size contains the log2 of the page size. */ | ||
65 | uint8_t lg_page_size; | ||
66 | |||
67 | + /* | ||
68 | + * Additional tlb flags for use by the slow path. If non-zero, | ||
69 | + * the corresponding CPUTLBEntry comparator must have TLB_FORCE_SLOW. | ||
70 | + */ | ||
71 | + uint8_t slow_flags[MMU_ACCESS_COUNT]; | ||
72 | + | ||
73 | /* | ||
74 | * Allow target-specific additions to this structure. | ||
75 | * This may be used to cache items from the guest cpu | ||
76 | diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h | ||
77 | index XXXXXXX..XXXXXXX 100644 | ||
78 | --- a/include/hw/core/cpu.h | ||
79 | +++ b/include/hw/core/cpu.h | ||
80 | @@ -XXX,XX +XXX,XX @@ typedef enum MMUAccessType { | ||
81 | MMU_DATA_LOAD = 0, | ||
82 | MMU_DATA_STORE = 1, | ||
83 | MMU_INST_FETCH = 2 | ||
84 | +#define MMU_ACCESS_COUNT 3 | ||
85 | } MMUAccessType; | ||
86 | |||
87 | typedef struct CPUWatchpoint CPUWatchpoint; | ||
88 | diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c | ||
89 | index XXXXXXX..XXXXXXX 100644 | ||
90 | --- a/accel/tcg/cputlb.c | ||
91 | +++ b/accel/tcg/cputlb.c | ||
92 | @@ -XXX,XX +XXX,XX @@ static void tlb_add_large_page(CPUArchState *env, int mmu_idx, | ||
93 | env_tlb(env)->d[mmu_idx].large_page_mask = lp_mask; | ||
94 | } | ||
95 | |||
96 | +static inline void tlb_set_compare(CPUTLBEntryFull *full, CPUTLBEntry *ent, | ||
97 | + target_ulong address, int flags, | ||
98 | + MMUAccessType access_type, bool enable) | ||
99 | +{ | ||
100 | + if (enable) { | ||
101 | + address |= flags & TLB_FLAGS_MASK; | ||
102 | + flags &= TLB_SLOW_FLAGS_MASK; | ||
103 | + if (flags) { | ||
104 | + address |= TLB_FORCE_SLOW; | ||
105 | + } | ||
106 | + } else { | ||
107 | + address = -1; | ||
108 | + flags = 0; | ||
109 | + } | ||
110 | + ent->addr_idx[access_type] = address; | ||
111 | + full->slow_flags[access_type] = flags; | ||
112 | +} | ||
113 | + | ||
114 | /* | ||
115 | * Add a new TLB entry. At most one entry for a given virtual address | ||
116 | * is permitted. Only a single TARGET_PAGE_SIZE region is mapped, the | ||
117 | @@ -XXX,XX +XXX,XX @@ void tlb_set_page_full(CPUState *cpu, int mmu_idx, | ||
118 | CPUTLB *tlb = env_tlb(env); | ||
119 | CPUTLBDesc *desc = &tlb->d[mmu_idx]; | ||
120 | MemoryRegionSection *section; | ||
121 | - unsigned int index; | ||
122 | - vaddr address; | ||
123 | - vaddr write_address; | ||
124 | + unsigned int index, read_flags, write_flags; | ||
125 | uintptr_t addend; | ||
126 | CPUTLBEntry *te, tn; | ||
127 | hwaddr iotlb, xlat, sz, paddr_page; | ||
128 | @@ -XXX,XX +XXX,XX @@ void tlb_set_page_full(CPUState *cpu, int mmu_idx, | ||
129 | " prot=%x idx=%d\n", | ||
130 | addr, full->phys_addr, prot, mmu_idx); | ||
131 | |||
132 | - address = addr_page; | ||
133 | + read_flags = 0; | ||
134 | if (full->lg_page_size < TARGET_PAGE_BITS) { | ||
135 | /* Repeat the MMU check and TLB fill on every access. */ | ||
136 | - address |= TLB_INVALID_MASK; | ||
137 | + read_flags |= TLB_INVALID_MASK; | ||
138 | } | ||
139 | if (full->attrs.byte_swap) { | ||
140 | - address |= TLB_BSWAP; | ||
141 | + read_flags |= TLB_BSWAP; | ||
142 | } | ||
143 | |||
144 | is_ram = memory_region_is_ram(section->mr); | ||
145 | @@ -XXX,XX +XXX,XX @@ void tlb_set_page_full(CPUState *cpu, int mmu_idx, | ||
146 | addend = 0; | ||
147 | } | ||
148 | |||
149 | - write_address = address; | ||
150 | + write_flags = read_flags; | ||
151 | if (is_ram) { | ||
152 | iotlb = memory_region_get_ram_addr(section->mr) + xlat; | ||
153 | /* | ||
154 | @@ -XXX,XX +XXX,XX @@ void tlb_set_page_full(CPUState *cpu, int mmu_idx, | ||
155 | */ | ||
156 | if (prot & PAGE_WRITE) { | ||
157 | if (section->readonly) { | ||
158 | - write_address |= TLB_DISCARD_WRITE; | ||
159 | + write_flags |= TLB_DISCARD_WRITE; | ||
160 | } else if (cpu_physical_memory_is_clean(iotlb)) { | ||
161 | - write_address |= TLB_NOTDIRTY; | ||
162 | + write_flags |= TLB_NOTDIRTY; | ||
163 | } | ||
22 | } | 164 | } |
23 | } else { | 165 | } else { |
24 | uint32_t insn = qemu_ldx_opc[opc & (MO_BSWAP | MO_SSIZE)]; | 166 | @@ -XXX,XX +XXX,XX @@ void tlb_set_page_full(CPUState *cpu, int mmu_idx, |
25 | - if (!HAVE_ISA_2_06 && insn == LDBRX) { | 167 | * Reads to romd devices go through the ram_ptr found above, |
26 | + if (!have_isa_2_06 && insn == LDBRX) { | 168 | * but of course reads to I/O must go through MMIO. |
27 | tcg_out32(s, ADDI | TAI(TCG_REG_R0, addrlo, 4)); | 169 | */ |
28 | tcg_out32(s, LWBRX | TAB(datalo, rbase, addrlo)); | 170 | - write_address |= TLB_MMIO; |
29 | tcg_out32(s, LWBRX | TAB(TCG_REG_R0, rbase, TCG_REG_R0)); | 171 | + write_flags |= TLB_MMIO; |
30 | @@ -XXX,XX +XXX,XX @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is_64) | 172 | if (!is_romd) { |
173 | - address = write_address; | ||
174 | + read_flags = write_flags; | ||
31 | } | 175 | } |
32 | } else { | 176 | } |
33 | uint32_t insn = qemu_stx_opc[opc & (MO_BSWAP | MO_SIZE)]; | 177 | |
34 | - if (!HAVE_ISA_2_06 && insn == STDBRX) { | 178 | @@ -XXX,XX +XXX,XX @@ void tlb_set_page_full(CPUState *cpu, int mmu_idx, |
35 | + if (!have_isa_2_06 && insn == STDBRX) { | 179 | * TARGET_PAGE_BITS, and either |
36 | tcg_out32(s, STWBRX | SAB(datalo, rbase, addrlo)); | 180 | * + the ram_addr_t of the page base of the target RAM (RAM) |
37 | tcg_out32(s, ADDI | TAI(TCG_REG_TMP1, addrlo, 4)); | 181 | * + the offset within section->mr of the page base (I/O, ROMD) |
38 | tcg_out_shri64(s, TCG_REG_R0, datalo, 32); | 182 | - * We subtract the vaddr_page (which is page aligned and thus won't |
183 | + * We subtract addr_page (which is page aligned and thus won't | ||
184 | * disturb the low bits) to give an offset which can be added to the | ||
185 | * (non-page-aligned) vaddr of the eventual memory access to get | ||
186 | * the MemoryRegion offset for the access. Note that the vaddr we | ||
187 | @@ -XXX,XX +XXX,XX @@ void tlb_set_page_full(CPUState *cpu, int mmu_idx, | ||
188 | * vaddr we add back in io_readx()/io_writex()/get_page_addr_code(). | ||
189 | */ | ||
190 | desc->fulltlb[index] = *full; | ||
191 | - desc->fulltlb[index].xlat_section = iotlb - addr_page; | ||
192 | - desc->fulltlb[index].phys_addr = paddr_page; | ||
193 | + full = &desc->fulltlb[index]; | ||
194 | + full->xlat_section = iotlb - addr_page; | ||
195 | + full->phys_addr = paddr_page; | ||
196 | |||
197 | /* Now calculate the new entry */ | ||
198 | tn.addend = addend - addr_page; | ||
199 | - if (prot & PAGE_READ) { | ||
200 | - tn.addr_read = address; | ||
201 | - if (wp_flags & BP_MEM_READ) { | ||
202 | - tn.addr_read |= TLB_WATCHPOINT; | ||
203 | - } | ||
204 | - } else { | ||
205 | - tn.addr_read = -1; | ||
206 | - } | ||
207 | |||
208 | - if (prot & PAGE_EXEC) { | ||
209 | - tn.addr_code = address; | ||
210 | - } else { | ||
211 | - tn.addr_code = -1; | ||
212 | - } | ||
213 | + tlb_set_compare(full, &tn, addr_page, read_flags, | ||
214 | + MMU_INST_FETCH, prot & PAGE_EXEC); | ||
215 | |||
216 | - tn.addr_write = -1; | ||
217 | - if (prot & PAGE_WRITE) { | ||
218 | - tn.addr_write = write_address; | ||
219 | - if (prot & PAGE_WRITE_INV) { | ||
220 | - tn.addr_write |= TLB_INVALID_MASK; | ||
221 | - } | ||
222 | - if (wp_flags & BP_MEM_WRITE) { | ||
223 | - tn.addr_write |= TLB_WATCHPOINT; | ||
224 | - } | ||
225 | + if (wp_flags & BP_MEM_READ) { | ||
226 | + read_flags |= TLB_WATCHPOINT; | ||
227 | } | ||
228 | + tlb_set_compare(full, &tn, addr_page, read_flags, | ||
229 | + MMU_DATA_LOAD, prot & PAGE_READ); | ||
230 | + | ||
231 | + if (prot & PAGE_WRITE_INV) { | ||
232 | + write_flags |= TLB_INVALID_MASK; | ||
233 | + } | ||
234 | + if (wp_flags & BP_MEM_WRITE) { | ||
235 | + write_flags |= TLB_WATCHPOINT; | ||
236 | + } | ||
237 | + tlb_set_compare(full, &tn, addr_page, write_flags, | ||
238 | + MMU_DATA_STORE, prot & PAGE_WRITE); | ||
239 | |||
240 | copy_tlb_helper_locked(te, &tn); | ||
241 | tlb_n_used_entries_inc(env, mmu_idx); | ||
242 | @@ -XXX,XX +XXX,XX @@ static int probe_access_internal(CPUArchState *env, vaddr addr, | ||
243 | CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr); | ||
244 | uint64_t tlb_addr = tlb_read_idx(entry, access_type); | ||
245 | vaddr page_addr = addr & TARGET_PAGE_MASK; | ||
246 | - int flags = TLB_FLAGS_MASK; | ||
247 | + int flags = TLB_FLAGS_MASK & ~TLB_FORCE_SLOW; | ||
248 | + CPUTLBEntryFull *full; | ||
249 | |||
250 | if (!tlb_hit_page(tlb_addr, page_addr)) { | ||
251 | if (!victim_tlb_hit(env, mmu_idx, index, access_type, page_addr)) { | ||
252 | @@ -XXX,XX +XXX,XX @@ static int probe_access_internal(CPUArchState *env, vaddr addr, | ||
253 | } | ||
254 | flags &= tlb_addr; | ||
255 | |||
256 | - *pfull = &env_tlb(env)->d[mmu_idx].fulltlb[index]; | ||
257 | + *pfull = full = &env_tlb(env)->d[mmu_idx].fulltlb[index]; | ||
258 | + flags |= full->slow_flags[access_type]; | ||
259 | |||
260 | /* Fold all "mmio-like" bits into TLB_MMIO. This is not RAM. */ | ||
261 | if (unlikely(flags & ~(TLB_WATCHPOINT | TLB_NOTDIRTY))) { | ||
262 | @@ -XXX,XX +XXX,XX @@ static bool mmu_lookup1(CPUArchState *env, MMULookupPageData *data, | ||
263 | CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr); | ||
264 | uint64_t tlb_addr = tlb_read_idx(entry, access_type); | ||
265 | bool maybe_resized = false; | ||
266 | + CPUTLBEntryFull *full; | ||
267 | + int flags; | ||
268 | |||
269 | /* If the TLB entry is for a different page, reload and try again. */ | ||
270 | if (!tlb_hit(tlb_addr, addr)) { | ||
271 | @@ -XXX,XX +XXX,XX @@ static bool mmu_lookup1(CPUArchState *env, MMULookupPageData *data, | ||
272 | tlb_addr = tlb_read_idx(entry, access_type) & ~TLB_INVALID_MASK; | ||
273 | } | ||
274 | |||
275 | - data->flags = tlb_addr & TLB_FLAGS_MASK; | ||
276 | - data->full = &env_tlb(env)->d[mmu_idx].fulltlb[index]; | ||
277 | + full = &env_tlb(env)->d[mmu_idx].fulltlb[index]; | ||
278 | + flags = tlb_addr & (TLB_FLAGS_MASK & ~TLB_FORCE_SLOW); | ||
279 | + flags |= full->slow_flags[access_type]; | ||
280 | + | ||
281 | + data->full = full; | ||
282 | + data->flags = flags; | ||
283 | /* Compute haddr speculatively; depending on flags it might be invalid. */ | ||
284 | data->haddr = (void *)((uintptr_t)addr + entry->addend); | ||
285 | |||
39 | -- | 286 | -- |
40 | 2.17.1 | 287 | 2.34.1 |
41 | 288 | ||
42 | 289 | diff view generated by jsdifflib |
1 | The VSX instruction set includes double-word loads and | 1 | This frees up one bit of the primary tlb flags without |
---|---|---|---|
2 | stores, double-word load and splat, double-word permute, and bit | 2 | impacting the TLB_NOTDIRTY logic. |
3 | select, all of which require multiple operations in the Altivec | 3 | |
4 | instruction set. | ||
5 | 3 | ||
6 | Because the VSX registers map %vsr32 to %vr0, and we have no current | 4 | Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> |
7 | intention or need to use vector registers outside %vr0-%vr19, force | 5 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> |
8 | on the {ax,bx,cx,tx} bits within the added VSX insns so that we don't | 6 | --- |
9 | have to otherwise modify the VR[TABC] macros. | 7 | include/exec/cpu-all.h | 8 ++++---- |
8 | accel/tcg/cputlb.c | 18 ++++++++++++++---- | ||
9 | 2 files changed, 18 insertions(+), 8 deletions(-) | ||
10 | 10 | ||
11 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 11 | diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h |
12 | Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com> | ||
13 | --- | ||
14 | tcg/ppc/tcg-target.h | 5 ++-- | ||
15 | tcg/ppc/tcg-target.inc.c | 52 ++++++++++++++++++++++++++++++++++++---- | ||
16 | 2 files changed, 51 insertions(+), 6 deletions(-) | ||
17 | |||
18 | diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h | ||
19 | index XXXXXXX..XXXXXXX 100644 | 12 | index XXXXXXX..XXXXXXX 100644 |
20 | --- a/tcg/ppc/tcg-target.h | 13 | --- a/include/exec/cpu-all.h |
21 | +++ b/tcg/ppc/tcg-target.h | 14 | +++ b/include/exec/cpu-all.h |
22 | @@ -XXX,XX +XXX,XX @@ typedef enum { | 15 | @@ -XXX,XX +XXX,XX @@ CPUArchState *cpu_copy(CPUArchState *env); |
23 | 16 | #define TLB_NOTDIRTY (1 << (TARGET_PAGE_BITS_MIN - 2)) | |
24 | extern TCGPowerISA have_isa; | 17 | /* Set if TLB entry is an IO callback. */ |
25 | extern bool have_altivec; | 18 | #define TLB_MMIO (1 << (TARGET_PAGE_BITS_MIN - 3)) |
26 | +extern bool have_vsx; | 19 | -/* Set if TLB entry contains a watchpoint. */ |
27 | 20 | -#define TLB_WATCHPOINT (1 << (TARGET_PAGE_BITS_MIN - 4)) | |
28 | #define have_isa_2_06 (have_isa >= tcg_isa_2_06) | 21 | /* Set if the slow path must be used; more flags in CPUTLBEntryFull. */ |
29 | #define have_isa_3_00 (have_isa >= tcg_isa_3_00) | 22 | #define TLB_FORCE_SLOW (1 << (TARGET_PAGE_BITS_MIN - 5)) |
30 | @@ -XXX,XX +XXX,XX @@ extern bool have_altivec; | 23 | /* Set if TLB entry writes ignored. */ |
31 | * instruction and substituting two 32-bit stores makes the generated | 24 | @@ -XXX,XX +XXX,XX @@ CPUArchState *cpu_copy(CPUArchState *env); |
32 | * code quite large. | ||
33 | */ | 25 | */ |
34 | -#define TCG_TARGET_HAS_v64 0 | 26 | #define TLB_FLAGS_MASK \ |
35 | +#define TCG_TARGET_HAS_v64 have_vsx | 27 | (TLB_INVALID_MASK | TLB_NOTDIRTY | TLB_MMIO \ |
36 | #define TCG_TARGET_HAS_v128 have_altivec | 28 | - | TLB_WATCHPOINT | TLB_FORCE_SLOW | TLB_DISCARD_WRITE) |
37 | #define TCG_TARGET_HAS_v256 0 | 29 | + | TLB_FORCE_SLOW | TLB_DISCARD_WRITE) |
38 | 30 | ||
39 | @@ -XXX,XX +XXX,XX @@ extern bool have_altivec; | 31 | /* |
40 | #define TCG_TARGET_HAS_mul_vec 1 | 32 | * Flags stored in CPUTLBEntryFull.slow_flags[x]. |
41 | #define TCG_TARGET_HAS_sat_vec 1 | 33 | @@ -XXX,XX +XXX,XX @@ CPUArchState *cpu_copy(CPUArchState *env); |
42 | #define TCG_TARGET_HAS_minmax_vec 1 | 34 | */ |
43 | -#define TCG_TARGET_HAS_bitsel_vec 0 | 35 | /* Set if TLB entry requires byte swap. */ |
44 | +#define TCG_TARGET_HAS_bitsel_vec have_vsx | 36 | #define TLB_BSWAP (1 << 0) |
45 | #define TCG_TARGET_HAS_cmpsel_vec 0 | 37 | +/* Set if TLB entry contains a watchpoint. */ |
46 | 38 | +#define TLB_WATCHPOINT (1 << 1) | |
47 | void flush_icache_range(uintptr_t start, uintptr_t stop); | 39 | |
48 | diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c | 40 | -#define TLB_SLOW_FLAGS_MASK TLB_BSWAP |
41 | +#define TLB_SLOW_FLAGS_MASK (TLB_BSWAP | TLB_WATCHPOINT) | ||
42 | |||
43 | /* The two sets of flags must not overlap. */ | ||
44 | QEMU_BUILD_BUG_ON(TLB_FLAGS_MASK & TLB_SLOW_FLAGS_MASK); | ||
45 | diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c | ||
49 | index XXXXXXX..XXXXXXX 100644 | 46 | index XXXXXXX..XXXXXXX 100644 |
50 | --- a/tcg/ppc/tcg-target.inc.c | 47 | --- a/accel/tcg/cputlb.c |
51 | +++ b/tcg/ppc/tcg-target.inc.c | 48 | +++ b/accel/tcg/cputlb.c |
52 | @@ -XXX,XX +XXX,XX @@ static tcg_insn_unit *tb_ret_addr; | 49 | @@ -XXX,XX +XXX,XX @@ static void *atomic_mmu_lookup(CPUArchState *env, vaddr addr, MemOpIdx oi, |
53 | TCGPowerISA have_isa; | 50 | */ |
54 | static bool have_isel; | 51 | goto stop_the_world; |
55 | bool have_altivec; | 52 | } |
56 | +bool have_vsx; | 53 | - /* Collect TLB_WATCHPOINT for read. */ |
57 | 54 | + /* Collect tlb flags for read. */ | |
58 | #ifndef CONFIG_SOFTMMU | 55 | tlb_addr |= tlbe->addr_read; |
59 | #define TCG_GUEST_BASE_REG 30 | 56 | |
60 | @@ -XXX,XX +XXX,XX @@ static int tcg_target_const_match(tcg_target_long val, TCGType type, | 57 | /* Notice an IO access or a needs-MMU-lookup access */ |
61 | #define LVEBX XO31(7) | 58 | @@ -XXX,XX +XXX,XX @@ static void *atomic_mmu_lookup(CPUArchState *env, vaddr addr, MemOpIdx oi, |
62 | #define LVEHX XO31(39) | 59 | notdirty_write(env_cpu(env), addr, size, full, retaddr); |
63 | #define LVEWX XO31(71) | 60 | } |
64 | +#define LXSDX (XO31(588) | 1) /* v2.06, force tx=1 */ | 61 | |
65 | +#define LXVDSX (XO31(332) | 1) /* v2.06, force tx=1 */ | 62 | - if (unlikely(tlb_addr & TLB_WATCHPOINT)) { |
66 | 63 | - cpu_check_watchpoint(env_cpu(env), addr, size, full->attrs, | |
67 | #define STVX XO31(231) | 64 | - BP_MEM_READ | BP_MEM_WRITE, retaddr); |
68 | #define STVEWX XO31(199) | 65 | + if (unlikely(tlb_addr & TLB_FORCE_SLOW)) { |
69 | +#define STXSDX (XO31(716) | 1) /* v2.06, force sx=1 */ | 66 | + int wp_flags = 0; |
70 | |||
71 | #define VADDSBS VX4(768) | ||
72 | #define VADDUBS VX4(512) | ||
73 | @@ -XXX,XX +XXX,XX @@ static int tcg_target_const_match(tcg_target_long val, TCGType type, | ||
74 | |||
75 | #define VSLDOI VX4(44) | ||
76 | |||
77 | +#define XXPERMDI (OPCD(60) | (10 << 3) | 7) /* v2.06, force ax=bx=tx=1 */ | ||
78 | +#define XXSEL (OPCD(60) | (3 << 4) | 0xf) /* v2.06, force ax=bx=cx=tx=1 */ | ||
79 | + | 67 | + |
80 | #define RT(r) ((r)<<21) | 68 | + if (full->slow_flags[MMU_DATA_STORE] & TLB_WATCHPOINT) { |
81 | #define RS(r) ((r)<<21) | 69 | + wp_flags |= BP_MEM_WRITE; |
82 | #define RA(r) ((r)<<16) | ||
83 | @@ -XXX,XX +XXX,XX @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type, TCGReg ret, | ||
84 | add = 0; | ||
85 | } | ||
86 | |||
87 | - load_insn = LVX | VRT(ret) | RB(TCG_REG_TMP1); | ||
88 | - if (TCG_TARGET_REG_BITS == 64) { | ||
89 | - new_pool_l2(s, rel, s->code_ptr, add, val, val); | ||
90 | + if (have_vsx) { | ||
91 | + load_insn = type == TCG_TYPE_V64 ? LXSDX : LXVDSX; | ||
92 | + load_insn |= VRT(ret) | RB(TCG_REG_TMP1); | ||
93 | + if (TCG_TARGET_REG_BITS == 64) { | ||
94 | + new_pool_label(s, val, rel, s->code_ptr, add); | ||
95 | + } else { | ||
96 | + new_pool_l2(s, rel, s->code_ptr, add, val, val); | ||
97 | + } | 70 | + } |
98 | } else { | 71 | + if (full->slow_flags[MMU_DATA_LOAD] & TLB_WATCHPOINT) { |
99 | - new_pool_l4(s, rel, s->code_ptr, add, val, val, val, val); | 72 | + wp_flags |= BP_MEM_READ; |
100 | + load_insn = LVX | VRT(ret) | RB(TCG_REG_TMP1); | 73 | + } |
101 | + if (TCG_TARGET_REG_BITS == 64) { | 74 | + if (wp_flags) { |
102 | + new_pool_l2(s, rel, s->code_ptr, add, val, val); | 75 | + cpu_check_watchpoint(env_cpu(env), addr, size, |
103 | + } else { | 76 | + full->attrs, wp_flags, retaddr); |
104 | + new_pool_l4(s, rel, s->code_ptr, add, val, val, val, val); | ||
105 | + } | 77 | + } |
106 | } | 78 | } |
107 | 79 | ||
108 | if (USE_REG_TB) { | 80 | return hostaddr; |
109 | @@ -XXX,XX +XXX,XX @@ static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret, | ||
110 | /* fallthru */ | ||
111 | case TCG_TYPE_V64: | ||
112 | tcg_debug_assert(ret >= TCG_REG_V0); | ||
113 | + if (have_vsx) { | ||
114 | + tcg_out_mem_long(s, 0, LXSDX, ret, base, offset); | ||
115 | + break; | ||
116 | + } | ||
117 | tcg_debug_assert((offset & 7) == 0); | ||
118 | tcg_out_mem_long(s, 0, LVX, ret, base, offset & -16); | ||
119 | if (offset & 8) { | ||
120 | @@ -XXX,XX +XXX,XX @@ static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg, | ||
121 | /* fallthru */ | ||
122 | case TCG_TYPE_V64: | ||
123 | tcg_debug_assert(arg >= TCG_REG_V0); | ||
124 | + if (have_vsx) { | ||
125 | + tcg_out_mem_long(s, 0, STXSDX, arg, base, offset); | ||
126 | + break; | ||
127 | + } | ||
128 | tcg_debug_assert((offset & 7) == 0); | ||
129 | if (offset & 8) { | ||
130 | tcg_out_vsldoi(s, TCG_VEC_TMP1, arg, arg, 8); | ||
131 | @@ -XXX,XX +XXX,XX @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece) | ||
132 | case INDEX_op_shri_vec: | ||
133 | case INDEX_op_sari_vec: | ||
134 | return vece <= MO_32 ? -1 : 0; | ||
135 | + case INDEX_op_bitsel_vec: | ||
136 | + return have_vsx; | ||
137 | default: | ||
138 | return 0; | ||
139 | } | ||
140 | @@ -XXX,XX +XXX,XX @@ static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece, | ||
141 | tcg_out32(s, VSPLTW | VRT(dst) | VRB(src) | (1 << 16)); | ||
142 | break; | ||
143 | case MO_64: | ||
144 | + if (have_vsx) { | ||
145 | + tcg_out32(s, XXPERMDI | VRT(dst) | VRA(src) | VRB(src)); | ||
146 | + break; | ||
147 | + } | ||
148 | tcg_out_vsldoi(s, TCG_VEC_TMP1, src, src, 8); | ||
149 | tcg_out_vsldoi(s, dst, TCG_VEC_TMP1, src, 8); | ||
150 | break; | ||
151 | @@ -XXX,XX +XXX,XX @@ static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece, | ||
152 | tcg_out32(s, VSPLTW | VRT(out) | VRB(out) | (elt << 16)); | ||
153 | break; | ||
154 | case MO_64: | ||
155 | + if (have_vsx) { | ||
156 | + tcg_out_mem_long(s, 0, LXVDSX, out, base, offset); | ||
157 | + break; | ||
158 | + } | ||
159 | tcg_debug_assert((offset & 7) == 0); | ||
160 | tcg_out_mem_long(s, 0, LVX, out, base, offset & -16); | ||
161 | tcg_out_vsldoi(s, TCG_VEC_TMP1, out, out, 8); | ||
162 | @@ -XXX,XX +XXX,XX @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc, | ||
163 | } | ||
164 | break; | ||
165 | |||
166 | + case INDEX_op_bitsel_vec: | ||
167 | + tcg_out32(s, XXSEL | VRT(a0) | VRC(a1) | VRB(a2) | VRA(args[3])); | ||
168 | + return; | ||
169 | + | ||
170 | case INDEX_op_dup2_vec: | ||
171 | assert(TCG_TARGET_REG_BITS == 32); | ||
172 | /* With inputs a1 = xLxx, a2 = xHxx */ | ||
173 | @@ -XXX,XX +XXX,XX @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op) | ||
174 | case INDEX_op_st_vec: | ||
175 | case INDEX_op_dupm_vec: | ||
176 | return &v_r; | ||
177 | + case INDEX_op_bitsel_vec: | ||
178 | case INDEX_op_ppc_msum_vec: | ||
179 | return &v_v_v_v; | ||
180 | |||
181 | @@ -XXX,XX +XXX,XX @@ static void tcg_target_init(TCGContext *s) | ||
182 | |||
183 | if (hwcap & PPC_FEATURE_HAS_ALTIVEC) { | ||
184 | have_altivec = true; | ||
185 | + /* We only care about the portion of VSX that overlaps Altivec. */ | ||
186 | + if (hwcap & PPC_FEATURE_HAS_VSX) { | ||
187 | + have_vsx = true; | ||
188 | + } | ||
189 | } | ||
190 | |||
191 | tcg_target_available_regs[TCG_TYPE_I32] = 0xffffffff; | ||
192 | -- | 81 | -- |
193 | 2.17.1 | 82 | 2.34.1 |
194 | 83 | ||
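The left-hand series gates every new VSX path above on a single have_vsx flag, and sets it only when Altivec itself is present. A minimal standalone sketch of that detection, assuming Linux getauxval() and the hwcap bits from <asm/cputable.h> (the numeric fallbacks below are illustrative assumptions, not an ABI reference):

    #include <stdbool.h>
    #include <stdio.h>
    #include <sys/auxv.h>

    /* Fallbacks matching Linux <asm/cputable.h>; treat the values as
     * assumptions for this sketch rather than authoritative. */
    #ifndef PPC_FEATURE_HAS_ALTIVEC
    #define PPC_FEATURE_HAS_ALTIVEC 0x10000000
    #endif
    #ifndef PPC_FEATURE_HAS_VSX
    #define PPC_FEATURE_HAS_VSX     0x00000080
    #endif

    static bool have_altivec, have_vsx;

    int main(void)
    {
        unsigned long hwcap = getauxval(AT_HWCAP);

        if (hwcap & PPC_FEATURE_HAS_ALTIVEC) {
            have_altivec = true;
            /* Only the portion of VSX that overlaps the Altivec
             * registers is used, so VSX is never enabled alone. */
            if (hwcap & PPC_FEATURE_HAS_VSX) {
                have_vsx = true;
            }
        }
        printf("altivec=%d vsx=%d\n", have_altivec, have_vsx);
        return 0;
    }

Nesting the VSX test inside the Altivec one keeps the invariant "have_vsx implies have_altivec" in one place instead of at every use site.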
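On the right-hand side, the fast path now tests only TLB_FORCE_SLOW and defers the directional watchpoint bits to the slow_flags[] array in CPUTLBEntryFull. A simplified, compilable sketch of that folding step, with stand-in constants and types (the real definitions live in QEMU's cputlb code):

    #include <stdint.h>
    #include <stdio.h>

    /* Stand-in values; only the relationships between them matter. */
    #define TLB_FORCE_SLOW  (1u << 7)
    #define TLB_WATCHPOINT  (1u << 8)   /* kept only in slow_flags[] */
    #define BP_MEM_READ     1
    #define BP_MEM_WRITE    2

    enum { MMU_DATA_LOAD, MMU_DATA_STORE, MMU_ACCESS_COUNT };

    typedef struct {
        uint32_t slow_flags[MMU_ACCESS_COUNT];  /* per-direction flags */
    } EntryFullSketch;

    /* One cheap test on the comparator bits decides whether the
     * out-of-line, per-direction flags need to be consulted at all. */
    static int watchpoint_bp_flags(uint32_t tlb_addr,
                                   const EntryFullSketch *full)
    {
        int wp_flags = 0;

        if (tlb_addr & TLB_FORCE_SLOW) {
            if (full->slow_flags[MMU_DATA_STORE] & TLB_WATCHPOINT) {
                wp_flags |= BP_MEM_WRITE;
            }
            if (full->slow_flags[MMU_DATA_LOAD] & TLB_WATCHPOINT) {
                wp_flags |= BP_MEM_READ;
            }
        }
        return wp_flags;
    }

    int main(void)
    {
        EntryFullSketch f = {
            .slow_flags = { [MMU_DATA_LOAD] = TLB_WATCHPOINT }
        };
        printf("bp flags: %d\n", watchpoint_bp_flags(TLB_FORCE_SLOW, &f));
        return 0;
    }

Splitting the flags this way also lets a page be watched for reads and writes independently, which the single fast-path TLB_WATCHPOINT bit could not express.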
1 | Introduce an enum that orders the ISA levels base < 2.06 < 3.00.  Use macros to | 1 | Move TLB_DISCARD_WRITE to fill a hole in the set of bits. |
---|---|---|---|
2 | preserve the existing have_isa_2_06 and have_isa_3_00 predicates. | 2 | Reduce the total number of TLB flag bits by 1. |
3 | 3 | ||
4 | Reviewed-by: Aleksandar Markovic <amarkovic@wavecomp.com> | 4 | Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> |
5 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 5 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> |
6 | --- | 6 | --- |
7 | tcg/ppc/tcg-target.h | 12 ++++++++++-- | 7 | include/exec/cpu-all.h | 4 ++-- |
8 | tcg/ppc/tcg-target.inc.c | 8 ++++---- | 8 | tcg/tcg-op-ldst.c | 2 +- |
9 | 2 files changed, 14 insertions(+), 6 deletions(-) | 9 | 2 files changed, 3 insertions(+), 3 deletions(-) |
10 | 10 | ||
11 | diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h | 11 | diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h |
12 | index XXXXXXX..XXXXXXX 100644 | 12 | index XXXXXXX..XXXXXXX 100644 |
13 | --- a/tcg/ppc/tcg-target.h | 13 | --- a/include/exec/cpu-all.h |
14 | +++ b/tcg/ppc/tcg-target.h | 14 | +++ b/include/exec/cpu-all.h |
15 | @@ -XXX,XX +XXX,XX @@ typedef enum { | 15 | @@ -XXX,XX +XXX,XX @@ CPUArchState *cpu_copy(CPUArchState *env); |
16 | TCG_AREG0 = TCG_REG_R27 | 16 | #define TLB_NOTDIRTY (1 << (TARGET_PAGE_BITS_MIN - 2)) |
17 | } TCGReg; | 17 | /* Set if TLB entry is an IO callback. */ |
18 | 18 | #define TLB_MMIO (1 << (TARGET_PAGE_BITS_MIN - 3)) | |
19 | -extern bool have_isa_2_06; | 19 | +/* Set if TLB entry writes ignored. */ |
20 | -extern bool have_isa_3_00; | 20 | +#define TLB_DISCARD_WRITE (1 << (TARGET_PAGE_BITS_MIN - 4)) |
21 | +typedef enum { | 21 | /* Set if the slow path must be used; more flags in CPUTLBEntryFull. */ |
22 | + tcg_isa_base, | 22 | #define TLB_FORCE_SLOW (1 << (TARGET_PAGE_BITS_MIN - 5)) |
23 | + tcg_isa_2_06, | 23 | -/* Set if TLB entry writes ignored. */ |
24 | + tcg_isa_3_00, | 24 | -#define TLB_DISCARD_WRITE (1 << (TARGET_PAGE_BITS_MIN - 6)) |
25 | +} TCGPowerISA; | 25 | |
26 | + | 26 | /* |
27 | +extern TCGPowerISA have_isa; | 27 | * Use this mask to check interception with an alignment mask |
28 | + | 28 | diff --git a/tcg/tcg-op-ldst.c b/tcg/tcg-op-ldst.c |
29 | +#define have_isa_2_06 (have_isa >= tcg_isa_2_06) | ||
30 | +#define have_isa_3_00 (have_isa >= tcg_isa_3_00) | ||
31 | |||
32 | /* optional instructions automatically implemented */ | ||
33 | #define TCG_TARGET_HAS_ext8u_i32 0 /* andi */ | ||
34 | diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c | ||
35 | index XXXXXXX..XXXXXXX 100644 | 29 | index XXXXXXX..XXXXXXX 100644 |
36 | --- a/tcg/ppc/tcg-target.inc.c | 30 | --- a/tcg/tcg-op-ldst.c |
37 | +++ b/tcg/ppc/tcg-target.inc.c | 31 | +++ b/tcg/tcg-op-ldst.c |
38 | @@ -XXX,XX +XXX,XX @@ | 32 | @@ -XXX,XX +XXX,XX @@ static void check_max_alignment(unsigned a_bits) |
39 | 33 | * The requested alignment cannot overlap the TLB flags. | |
40 | static tcg_insn_unit *tb_ret_addr; | 34 | * FIXME: Must keep the count up-to-date with "exec/cpu-all.h". |
41 | 35 | */ | |
42 | -bool have_isa_2_06; | 36 | - tcg_debug_assert(a_bits + 6 <= tcg_ctx->page_bits); |
43 | -bool have_isa_3_00; | 37 | + tcg_debug_assert(a_bits + 5 <= tcg_ctx->page_bits); |
44 | +TCGPowerISA have_isa; | ||
45 | |||
46 | #define HAVE_ISA_2_06 have_isa_2_06 | ||
47 | #define HAVE_ISEL have_isa_2_06 | ||
48 | @@ -XXX,XX +XXX,XX @@ static void tcg_target_init(TCGContext *s) | ||
49 | unsigned long hwcap = qemu_getauxval(AT_HWCAP); | ||
50 | unsigned long hwcap2 = qemu_getauxval(AT_HWCAP2); | ||
51 | |||
52 | + have_isa = tcg_isa_base; | ||
53 | if (hwcap & PPC_FEATURE_ARCH_2_06) { | ||
54 | - have_isa_2_06 = true; | ||
55 | + have_isa = tcg_isa_2_06; | ||
56 | } | ||
57 | #ifdef PPC_FEATURE2_ARCH_3_00 | ||
58 | if (hwcap2 & PPC_FEATURE2_ARCH_3_00) { | ||
59 | - have_isa_3_00 = true; | ||
60 | + have_isa = tcg_isa_3_00; | ||
61 | } | ||
62 | #endif | 38 | #endif |
39 | } | ||
63 | 40 | ||
64 | -- | 41 | -- |
65 | 2.17.1 | 42 | 2.34.1 |
66 | 43 | ||
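The TCGPowerISA patch on the left works because the enumerators are declared in ascending ISA order, so each "have at least this level" predicate collapses to a single comparison. A compilable sketch of the idiom, condensed from the diff above (the main() usage is illustrative only):

    #include <stdio.h>

    typedef enum {
        tcg_isa_base,
        tcg_isa_2_06,
        tcg_isa_3_00,   /* must stay in ascending ISA order */
    } TCGPowerISA;

    static TCGPowerISA have_isa = tcg_isa_base;

    /* Each predicate is one ordered comparison, so existing users of
     * the old have_isa_2_06 / have_isa_3_00 booleans compile unchanged. */
    #define have_isa_2_06  (have_isa >= tcg_isa_2_06)
    #define have_isa_3_00  (have_isa >= tcg_isa_3_00)

    int main(void)
    {
        have_isa = tcg_isa_2_06;   /* as if set from getauxval(AT_HWCAP) */
        printf("2.06: %d  3.00: %d\n", have_isa_2_06, have_isa_3_00);
        return 0;
    }

Adding a future ISA level then means appending one enumerator and one macro, with no changes to the detection logic's callers.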
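For the renumbering on the right, a worked layout makes the "hole" visible. Assuming TARGET_PAGE_BITS_MIN = 12 and TLB_INVALID_MASK at bit -1 (not shown in the hunk, so an assumption here), the flags previously sat in bits 11, 10, 9, 7 and 6 with bit 8 unused; moving TLB_DISCARD_WRITE up to bit 8 packs them into bits 11..7, which is exactly why the alignment assertion in tcg-op-ldst.c relaxes from a_bits + 6 to a_bits + 5. A small sketch:

    #include <stdio.h>

    #define TARGET_PAGE_BITS_MIN 12   /* assumed example value */

    /* Layout after the patch; TLB_INVALID_MASK is assumed from context. */
    #define TLB_INVALID_MASK   (1 << (TARGET_PAGE_BITS_MIN - 1))
    #define TLB_NOTDIRTY       (1 << (TARGET_PAGE_BITS_MIN - 2))
    #define TLB_MMIO           (1 << (TARGET_PAGE_BITS_MIN - 3))
    #define TLB_DISCARD_WRITE  (1 << (TARGET_PAGE_BITS_MIN - 4))  /* moved */
    #define TLB_FORCE_SLOW     (1 << (TARGET_PAGE_BITS_MIN - 5))

    int main(void)
    {
        /* The lowest flag bit is now TARGET_PAGE_BITS_MIN - 5, so an
         * access alignment of up to a_bits + 5 fits below the flags. */
        printf("flags occupy bits %d..%d\n",
               TARGET_PAGE_BITS_MIN - 5, TARGET_PAGE_BITS_MIN - 1);
        return 0;
    }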